Data Mining of Heterogeneous Data with an ART Network

Description

The project focuses on developing a new data mining system for heterogeneous data that is based on a neural network from the ART (Adaptive Resonance Theory) family. Unlike existing approaches, it concentrates on multi-label classification of heterogeneous data, in which an object can have several class labels. The system is also intended to build a shared knowledge hierarchy automatically by combining different data sources. This makes it possible to derive hidden knowledge from heterogeneous data and to substantially improve the user's understanding of the processes that generate the data.
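To make the ART principle concrete, here is a minimal sketch of unsupervised Fuzzy ART (Carpenter, Grossberg and Rosen, 1991): inputs are complement-coded, existing categories are ranked by a choice function, and the best one is accepted only if it passes the vigilance test; if none passes, a new category is created. This is an illustration under our own assumptions, not the project's system: the class name FuzzyART and the parameter names rho, alpha and beta are hypothetical, and the supervised multi-label extension used in the project (ML-ARAM, named in the publication list) is not shown.

```python
import numpy as np


def complement_code(x):
    """Complement coding: a vector a in [0, 1]^d becomes [a, 1 - a]."""
    a = np.asarray(x, dtype=float)
    return np.concatenate([a, 1.0 - a])


class FuzzyART:
    """Minimal unsupervised Fuzzy ART, for illustration only (not ML-ARAM)."""

    def __init__(self, rho=0.75, alpha=0.001, beta=1.0):
        self.rho = rho      # vigilance: higher values create more, finer categories
        self.alpha = alpha  # choice parameter (small positive constant)
        self.beta = beta    # learning rate; 1.0 means "fast learning"
        self.w = []         # one weight vector (category prototype) per category

    def train_one(self, x):
        """Present one input; return the index of the category it resonates with."""
        i = complement_code(x)
        # Choice function T_j = |I ^ w_j| / (alpha + |w_j|), where ^ is element-wise min
        scores = np.array([np.minimum(i, w).sum() / (self.alpha + w.sum()) for w in self.w])
        # Search categories from the highest choice value downwards
        for j in np.argsort(-scores, kind="stable"):
            match = np.minimum(i, self.w[j]).sum() / i.sum()
            if match >= self.rho:  # vigilance test passed: resonance
                self.w[j] = self.beta * np.minimum(i, self.w[j]) + (1 - self.beta) * self.w[j]
                return int(j)
            # otherwise: reset this category and try the next one
        # No existing category passed the vigilance test: commit a new one
        self.w.append(i.copy())
        return len(self.w) - 1


net = FuzzyART(rho=0.8)
print([net.train_one(x) for x in ([0.1, 0.2], [0.12, 0.21], [0.9, 0.9])])  # [0, 0, 1]
```

The vigilance rho sets the granularity: with rho = 0.8 the second input joins the first category, while the distant third input opens a new one.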

Participants
Institutions
  • WG Berthold (Bioinformatics and Information Mining)
Publications
  Benites, Fernando; Sapozhnikova, Elena (2015): Hierarchical interestingness measures for association rules with generalization on both antecedent and consequent sides. Pattern Recognition Letters. 2015, 65, pp. 197-203. ISSN 0167-8655. eISSN 1872-7344. Available under: doi: 10.1016/j.patrec.2015.07.027

Hierarchical interestingness measures for association rules with generalization on both antecedent and consequent sides

Pairwise generalized association rules mined from raw data can be used to connect the concepts of multiple ontologies. In this case, the items of the rules are hierarchically organized, and the relations between them can be used to reduce rule redundancy. Recently proposed hierarchical interestingness measures address this issue by taking hierarchical information on the antecedent side into account. In this paper, we extend them to the case of two hierarchies, one on the antecedent side and one on the consequent side of a rule. The extended measures are then compared with their counterparts as well as with conventional measures. Three real-world datasets from the text mining domain, each with a predefined ground-truth set of associations, are used for the comparison within the framework of instance-based ontology mapping.
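The following sketch shows, under simplifying assumptions, what generalization on both sides means for rule statistics: each instance carries labels from two tree-shaped hierarchies, both label sets are extended with their ancestors, and the support and confidence of a rule a -> c are counted on the extended sets, so a and c may be leaves or inner nodes. The function names, the child -> parent dictionaries and the toy data are hypothetical; the paper's hierarchical interestingness measures themselves are not reproduced here.

```python
from typing import Dict, List, Set, Tuple

Hierarchy = Dict[str, str]  # child -> parent (tree-shaped hierarchy)


def with_ancestors(labels: Set[str], parent: Hierarchy) -> Set[str]:
    """Extend a label set with all ancestors of its labels."""
    out: Set[str] = set()
    for lab in labels:
        out.add(lab)
        while lab in parent:
            lab = parent[lab]
            out.add(lab)
    return out


def pairwise_rule_stats(data: List[Tuple[Set[str], Set[str]]],
                        h_ante: Hierarchy, h_cons: Hierarchy,
                        a: str, c: str) -> Tuple[float, float]:
    """Support and confidence of the generalized rule a -> c, where a is a node of
    the antecedent hierarchy and c a node of the consequent hierarchy; either may
    be an inner (generalized) node."""
    n_a = n_ac = 0
    for labels_a, labels_c in data:
        if a in with_ancestors(labels_a, h_ante):
            n_a += 1
            if c in with_ancestors(labels_c, h_cons):
                n_ac += 1
    return n_ac / len(data), (n_ac / n_a if n_a else 0.0)


h_a = {"a1": "A", "a2": "A"}
h_c = {"c1": "C", "c2": "C"}
data = [({"a1"}, {"c1"}), ({"a2"}, {"c2"}), ({"a1"}, {"c2"}), ({"x"}, {"c1"})]
print(pairwise_rule_stats(data, h_a, h_c, "a1", "c1"))  # (0.25, 0.5)
print(pairwise_rule_stats(data, h_a, h_c, "A", "C"))    # (0.75, 1.0)
```

Generalizing both sides turns the weak leaf rule a1 -> c1 into the strong rule A -> C, which is exactly the kind of redundancy between related rules that hierarchical measures try to resolve.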

  Benites, Fernando; Sapozhnikova, Elena (2014): Evaluation of Hierarchical Interestingness Measures for Mining Pairwise Generalized Association Rules. IEEE Transactions on Knowledge and Data Engineering. 2014, 26(12), pp. 3012-3025. ISSN 1041-4347. eISSN 1558-2191. Available under: doi: 10.1109/TKDE.2014.2320722

Evaluation of Hierarchical Interestingness Measures for Mining Pairwise Generalized Association Rules

In the literature on association analysis, many interestingness measures have been proposed to assess the quality of mined association rules and to select a small set of the most interesting ones. In the particular case of hierarchically organized items and generalized association rules connecting them, a measure that handles the hierarchy appropriately would be advantageous. Here we present further developments of a new class of such hierarchical interestingness measures and compare them with a large set of conventional measures and with three hierarchical pruning methods from the literature. The aim is to find interesting pairwise generalized association rules connecting the concepts of multiple ontologies. To evaluate interestingness measures empirically on a broad basis, we compared the rules obtained by 37 methods on four real-world datasets against predefined ground-truth sets of associations. To this end, we adopted a framework of instance-based ontology matching and extended the set of performance measures with two novel measures, relation learning recall and precision, which take hierarchical relationships into account.
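For reference, several of the conventional objective measures that hierarchical measures are typically compared against can be computed from four counts, as in this sketch. The selection (support, confidence, lift, leverage, conviction) is ours and is not the list of 37 methods used in the paper; the function name is hypothetical.

```python
import math


def conventional_measures(n, n_x, n_y, n_xy):
    """Objective measures of a rule X -> Y from absolute counts.

    n: number of transactions, n_x / n_y: transactions containing X / Y,
    n_xy: transactions containing both. Assumes n_x > 0 and n_y > 0.
    """
    support = n_xy / n
    confidence = n_xy / n_x
    lift = (n * n_xy) / (n_x * n_y)             # P(XY) / (P(X) P(Y))
    leverage = (n * n_xy - n_x * n_y) / n ** 2  # P(XY) - P(X) P(Y)
    # Conviction = (1 - P(Y)) / (1 - confidence); infinite if the rule never fails
    if n_xy == n_x:
        conviction = math.inf
    else:
        conviction = ((n - n_y) * n_x) / (n * (n_x - n_xy))
    return {"support": support, "confidence": confidence, "lift": lift,
            "leverage": leverage, "conviction": conviction}


print(conventional_measures(n=100, n_x=20, n_y=50, n_xy=15))
```

None of these measures looks at the item hierarchy, so a rule and its generalization are scored independently of each other.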

  Benites, Fernando; Simon, Svenja; Sapozhnikova, Elena (2014): Mining Rare Associations between Biological Ontologies. PLoS ONE. 2014, 9(1), e84475. eISSN 1932-6203. Available under: doi: 10.1371/journal.pone.0084475

Mining Rare Associations between Biological Ontologies

The constantly increasing volume and complexity of available biological data requires new methods for their management and analysis. An important challenge is the integration of information from different sources in order to discover possible hidden relations between already known data. In this paper we introduce a data mining approach that relates biological ontologies by mining cross-ontology and intra-ontology pairwise generalized association rules. Its main advantage is its sensitivity to rare associations, which are important to biologists. We propose a new class of interestingness measures designed for hierarchically organized rules. These measures allow one to select the most important rules while taking rare cases into account. They favor rules whose actual interestingness value exceeds the expected value, which is calculated from the parent rule. We demonstrate this approach by applying it to the analysis of data from the Gene Ontology and GPCR databases. Our objective is to discover interesting relations between two different ontologies or between parts of a single ontology. The association rules discovered in this way can provide the user with new knowledge about underlying biological processes or help improve annotation consistency. The results show that the produced rules represent meaningful and quite reliable associations.
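Judging a rule against an expectation derived from its parent rule has a classic form in the R-interestingness of Srikant and Agrawal (1995), sketched here as a point of comparison. It assumes that a specialized item is distributed within its generalization as it is in the data overall; the paper's own measures may define the expectation differently, and all function and parameter names here are illustrative.

```python
def expected_support(sup_parent_rule, sup_x, sup_x_hat, sup_y, sup_y_hat):
    """Expected support of x -> y given its parent rule x_hat -> y_hat, assuming
    x and y are distributed within x_hat and y_hat as in the whole data.
    Pass sup_x == sup_x_hat (or sup_y == sup_y_hat) for a side that is not generalized."""
    return sup_parent_rule * (sup_x / sup_x_hat) * (sup_y / sup_y_hat)


def expected_confidence(conf_parent_rule, sup_y, sup_y_hat):
    """Expected confidence of x -> y given the confidence of the parent rule."""
    return conf_parent_rule * (sup_y / sup_y_hat)


def is_r_interesting(sup_rule, conf_rule, exp_sup, exp_conf, r=1.1):
    """True if the rule's support or confidence is at least R times its expectation."""
    return sup_rule >= r * exp_sup or conf_rule >= r * exp_conf


# Parent rule support 0.2; x covers a quarter of x_hat, y covers half of y_hat
exp_sup = expected_support(0.2, sup_x=0.1, sup_x_hat=0.4, sup_y=0.25, sup_y_hat=0.5)
print(exp_sup)                       # approximately 0.025
print(0.05 / exp_sup)                # actual support is about twice the expectation
print(is_r_interesting(0.05, 0.3, exp_sup, expected_confidence(0.8, 0.25, 0.5), r=1.5))
```

A rare child rule can pass this test even with very low absolute support, because it is compared with what its parent predicts rather than with a global support threshold.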

  Benites, Fernando; Sapozhnikova, Elena (2014): Using Semantic Data Mining for Classification Improvement and Knowledge Extraction. In: Seidl, Thomas; Hassani, Marwan; Beecks, Christian (eds.): Proceedings of the 16th LWA Workshops: KDML, IR and FGWM, Aachen, Germany, September 8-10, 2014. CEUR-WS.org, 2014, pp. 150-155. CEUR Workshop Proceedings, vol. 1226

Using Semantic Data Mining for Classification Improvement and Knowledge Extraction

The objective of this position paper is to show that the integration of semantic data mining into the DAMIART data mining system can help further improve classification performance and knowledge extraction. DAMIART performs multi-label classification in the presence of multiple class ontologies, hierarchy extraction from multi-labels, and concept relation by association rule mining. Whereas DAMIART combines knowledge from multiple data sources and multiple class ontologies, the proposed extension should also exploit available ontologies over attributes. This will allow the system not only to produce more accurate classification results but also to improve their interpretability and to overcome problems such as data sparseness.
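One simple way to use an ontology over attributes against data sparseness, offered here only as an assumed illustration of the proposed extension and not as the DAMIART design, is to add each attribute's ontology ancestors as extra features, so that instances with different but related rare attributes share generalized features. The helper names and the toy ontology are hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List, Set


def generalize(attrs: Iterable[str], parent: Dict[str, str]) -> Set[str]:
    """Return the attributes together with all their ancestors in a tree-shaped
    attribute ontology given as a child -> parent map."""
    out: Set[str] = set()
    for a in attrs:
        node = a
        out.add(node)
        while node in parent:
            node = parent[node]
            out.add(node)
    return out


def shared_features(docs: List[Set[str]]) -> Set[str]:
    """Features that occur in every document."""
    counts = Counter(f for d in docs for f in d)
    return {f for f, c in counts.items() if c == len(docs)}


# Hypothetical attribute ontology: kinase and protease are both enzymes
parent = {"kinase": "enzyme", "protease": "enzyme", "enzyme": "protein"}
docs = [{"kinase"}, {"protease"}]
print(shared_features(docs))                                 # set()
print(sorted(shared_features([generalize(d, parent) for d in docs])))  # ['enzyme', 'protein']
```

The raw attribute sets have nothing in common, while the generalized ones overlap, which gives a classifier or rule miner more evidence per feature.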

  Benites, Fernando; Sapozhnikova, Elena (2013): Generalized Association Rules for Connecting Biological Ontologies. In: Fernandes, Pedro et al. (eds.): BIOINFORMATICS 2013: Proceedings of the International Conference on Bioinformatics Models, Methods and Algorithms. [S.l.]: SciTePress, 2013, pp. 230-237. ISBN 978-989-8565-35-8

Generalized Association Rules for Connecting Biological Ontologies

The constantly increasing volume and complexity of available biological data requires new methods for managing and analyzing them. An important challenge is the integration of information from different sources in order to discover possible hidden relations between already known data. In this paper we introduce a data mining approach that relates biological ontologies by mining generalized association rules connecting their categories. To select only the most important rules, we propose a new interestingness measure that is especially well suited to hierarchically organized rules. To demonstrate this approach, we applied it to the bioinformatics domain and, more specifically, to the analysis of data from the Gene Ontology, the Cell type Ontology and GPCR databases. Association rules found in this way that connect two biological ontologies can provide the user with new knowledge about underlying biological processes. The preliminary results show that the produced rules represent meaningful and quite reliable associations among the ontologies and help infer new knowledge.

  Benites, Fernando; Sapozhnikova, Elena (2012): Learning different concept hierarchies and the relations between them from classified data. In: Magdalena-Benedito, Rafael et al. (eds.): Intelligent Data Analysis for Real-Life Applications: Theory and Practice. Hershey, PA: Information Science Reference, 2012, pp. 18-34. ISBN 978-1-4666-1806-0. Available under: doi: 10.4018/978-1-4666-1806-0.ch002

Learning different concept hierarchies and the relations between them from classified data

Methods for the automatic extraction of taxonomies and concept hierarchies from data have recently become an essential aid to humans in ontology construction. The objective of this chapter is to show how the extraction of concept hierarchies and the discovery of relations between them can be effectively coupled with a multi-label classification task. The authors introduce a data mining system that performs classification and addresses both issues by means of association rule mining. The proposed system has been tested on two real-world datasets, with the class labels of each dataset coming from two different class hierarchies. Several experiments on hierarchy extraction and concept relation were conducted to evaluate the system, and three different interestingness measures were applied to select the most important relations between concepts, one of which was developed by the authors. The experimental results show that the system is able to infer quite accurate concept hierarchies and associations among the concepts. It is therefore well suited for classification-based reasoning.
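Hierarchy extraction from multi-label data by means of association rules can be sketched as follows, under assumptions of our own: label A becomes a candidate parent of label B when the rule B -> A holds with high confidence and A is more frequent than B, and the least frequent candidate is kept as the most specific parent. The threshold, the tie handling and the function name are hypothetical and are not taken from the chapter.

```python
from typing import Dict, List, Set


def extract_hierarchy(label_sets: List[Set[str]], min_conf: float = 0.95) -> Dict[str, str]:
    """Infer child -> parent links from multi-label data.

    A is a candidate parent of B if B -> A holds with confidence >= min_conf and
    A is strictly more frequent than B; the least frequent candidate is chosen.
    """
    freq: Dict[str, int] = {}
    for s in label_sets:
        for lab in s:
            freq[lab] = freq.get(lab, 0) + 1
    parent: Dict[str, str] = {}
    for b in freq:
        candidates = []
        for a in freq:
            if a == b or freq[a] <= freq[b]:
                continue
            co = sum(1 for s in label_sets if a in s and b in s)
            if co / freq[b] >= min_conf:  # confidence of the rule b -> a
                candidates.append(a)
        if candidates:
            parent[b] = min(candidates, key=lambda a: (freq[a], a))
    return parent


data = [{"sports", "football"}, {"sports", "tennis"}, {"sports", "football"},
        {"politics"}, {"sports"}]
print(extract_hierarchy(data))  # {'football': 'sports', 'tennis': 'sports'}
```

Labels that always co-occur with equal frequency are left unlinked by this rule; a real system would need an explicit policy for such cases.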

  Brucker, Florian; Benites, Fernando; Sapozhnikova, Elena (2011): An Empirical Comparison of Flat and Hierarchical Performance Measures for Multi-Label Classification with Hierarchy Extraction. In: König, Andreas; Dengel, Andreas; Hinkelmann, Knut; Kise, Koichi; Howlett, Robert J.; Jain, Lakhmi C. (eds.): Knowledge-Based and Intelligent Information and Engineering Systems. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, pp. 579-589. Lecture Notes in Computer Science, vol. 6881. ISBN 978-3-642-23850-5. Available under: doi: 10.1007/978-3-642-23851-2_59

An Empirical Comparison of Flat and Hierarchical Performance Measures for Multi-Label Classification with Hierarchy Extraction

Multi-label classification (MC) often deals with hierarchically organized class taxonomies. In contrast to hierarchical multi-label classification (HMC), where the class hierarchy is assumed to be known a priori, we are interested in the opposite case, where the hierarchy is unknown and must be extracted from multi-label data automatically. In this case, the predictive performance of a classifier can be assessed with well-known performance measures (PMs) from flat MC, such as precision and recall. However, these PMs treat all class labels as independent and thus ignore the hierarchical structure of the taxonomy. As an alternative, special hierarchical PMs can be used that apply hierarchy knowledge to the extracted hierarchy. Such hierarchical PMs have only recently been discussed in the literature. The aim of this study is, first, to verify whether HMC measures significantly improve quality assessment in this setting. In addition, we seek a measure that reflects the potential quality of the extracted hierarchies as well as possible. We empirically compare ten hierarchical and four traditional flat PMs in order to investigate the relations between them. The performance measurements obtained for the predictions of four multi-label classifiers (ML-ARAM, ML-kNN, BoosTexter and SVM) on four datasets from the text mining domain are analyzed by means of hierarchical clustering and by calculating pairwise statistical consistency and discriminancy.
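The difference between flat and hierarchical PMs can be seen with one widely used hierarchical variant, the ancestor-augmented precision and recall usually attributed to Kiritchenko et al.: both the true and the predicted label sets are extended with their ancestors before overlaps are counted, so predicting a sibling of the true label earns partial credit. It is shown as a generic example, not as one of the ten measures evaluated in the study; the function names and the toy hierarchy are hypothetical.

```python
from typing import Dict, Set, Tuple


def with_ancestors(labels: Set[str], parent: Dict[str, str]) -> Set[str]:
    """Extend a label set with all ancestors (hierarchy given as child -> parent)."""
    out: Set[str] = set()
    for lab in labels:
        out.add(lab)
        while lab in parent:
            lab = parent[lab]
            out.add(lab)
    return out


def precision_recall(true: Set[str], pred: Set[str]) -> Tuple[float, float]:
    """Flat set-based precision and recall for a single instance."""
    tp = len(true & pred)
    return (tp / len(pred) if pred else 0.0), (tp / len(true) if true else 0.0)


def hierarchical_precision_recall(true: Set[str], pred: Set[str],
                                  parent: Dict[str, str]) -> Tuple[float, float]:
    """Precision and recall computed on ancestor-augmented label sets."""
    return precision_recall(with_ancestors(true, parent), with_ancestors(pred, parent))


# Hypothetical (e.g. extracted) hierarchy: football and tennis are both under sports
parent = {"football": "sports", "tennis": "sports"}
print(precision_recall({"football"}, {"tennis"}))                           # (0.0, 0.0)
print(hierarchical_precision_recall({"football"}, {"tennis"}, parent))      # (0.5, 0.5)
```

The flat measures treat the sibling error as a complete miss, while the hierarchical ones reward the shared ancestor; with an extracted hierarchy, the credit therefore also depends on how good that hierarchy is.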

Funding sources
  • Name: Emmy-Noether-Programm; funding type: third-party funds; category: research funding program; project no.: 571/08
Further information
Period: 15.04.2008 – 31.10.2013