Development of Batch Machine Learning Techniques for Multi-concept Descriptions in the form of Feature Partitioning and Overlapping Feature Intervals


Abstract: Classification problems can be modeled as multi-concept description problems. Induction of multi-concept descriptions from classified examples is a field of machine learning research that has found a large number of applications to real-world problems. Our prior research on the incremental learning of concept descriptions in the form of feature partitions and overlapping feature intervals was successful. Our previous study has also shown that it is possible to obtain more accurate concept descriptions in these forms when they are learned in batch (non-incremental) mode. Here, the concepts are represented as a partitioning of the values of each feature, or as a set of overlapping feature intervals corresponding to concepts. The classification of an unseen instance is determined through a weighted voting scheme over the classifications made by the individual features. Because the weights of the features are also learned, it is possible to distinguish the relevant features from the irrelevant ones; moreover, it is possible to learn the relative importance of the features for the classification. We aim to investigate the use of genetic algorithms for learning the feature weights. Using the techniques developed over the course of the project, we expect to be able to mechanically learn multi-concept descriptions and feature weights in various domains. For this purpose, we will compile two medical datasets, one for the description of arrhythmia characteristics from EKG signals, and the other for the histopathological description of dermatological illnesses. We will evaluate the learning techniques developed in the project on these datasets, along with standard datasets from the UCI Repository.
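
The weighted voting scheme described in the abstract can be illustrated with a minimal Python sketch. It is not the project's implementation: the interval tables, the feature names `f1` and `f2`, the weights, and the rule that a feature splits its vote equally among all classes whose intervals contain the value are assumptions made only for illustration.

```python
from collections import defaultdict

# Hypothetical concept description: for each feature, a list of
# (low, high, class_label) intervals. Intervals of different classes may overlap.
INTERVALS = {
    "f1": [(0.0, 5.0, "A"), (4.0, 10.0, "B")],
    "f2": [(0.0, 3.0, "A"), (3.0, 8.0, "B")],
}

# Hypothetical feature weights; in the project these would be learned.
WEIGHTS = {"f1": 0.3, "f2": 0.7}


def classify(instance, intervals, weights):
    """Return the class with the largest weighted vote, or None if no feature votes."""
    votes = defaultdict(float)
    for feature, value in instance.items():
        # Classes whose interval on this feature contains the observed value.
        matching = {
            label
            for low, high, label in intervals.get(feature, [])
            if low <= value <= high
        }
        # Assumption: the feature's weight is shared equally among matching classes.
        for label in matching:
            votes[label] += weights.get(feature, 0.0) / len(matching)
    if not votes:
        return None
    # Sorting first makes tie-breaking deterministic (alphabetical order).
    return max(sorted(votes), key=votes.get)


if __name__ == "__main__":
    print(classify({"f1": 4.5, "f2": 6.0}, INTERVALS, WEIGHTS))
```

In this sketch a feature with a small weight has little influence on the outcome, which is how learned weights would separate relevant features from irrelevant ones.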

Keywords: Machine Learning, Genetic Algorithms, Concept Learning

Principal Investigator: H. Altay Guvenir, Ph.D.
Investigator: Aynur Akkus, B.Sc.
Investigator: Gulsen Demiroz, B.Sc.

Duration: August 1995 - August 1997

Sponsor: Scientific and Technical Research Council of Turkey

Grant No: EEEAG-153

Budget: 1,154,000,000 TL (approximately USD 25,000 in 1995)

Final Report (Postscript, compressed)