Integration of Data Mining with Database System


Significant research efforts have been made to enable in-database data mining, so that mining can take advantage of the DBMS's power of handling large volumes of data. Integrating data mining algorithms with a relational database management system is a challenging problem. Besides improving the data mining algorithms themselves, however, tight coupling of data mining and database systems is a key issue for efficient and scalable data mining in large databases. Tight coupling means not merely linking specific data mining algorithms to the database system, e.g. as stored procedures, but investigating which salient functionality of key mining algorithms should be integrated into the database system kernel, so as to avoid expensive data transport. Basically, the question is whether certain parts of the data mining functionality can only be implemented efficiently inside the DBMS server, because running them outside (i.e., on top of it) would incur too large a performance penalty. Our overall goal is to identify those data mining primitives that need to be included in the DBMS's low-level functional repertoire, much as some OLAP extensions (such as ROLLUP and CUBE) have already made it into the SQL standard and into the core DBMS algorithms.
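To make the data-transport argument concrete, here is a minimal sketch, assuming SQLite (through Python's standard `sqlite3` module) as a stand-in for a full DBMS. The `baskets` table and the function names are hypothetical and only illustrate the idea; they are not part of this project. It computes item support counts, the first pass of frequent-itemset mining, in two ways: once "on top" of the DBMS by shipping every row to the client, and once inside the DBMS by pushing the aggregation into SQL. Both yield the same counts, but the in-database version transfers only one row per item.

```python
import sqlite3


def build_db():
    """Create a tiny in-memory transaction table (hypothetical example data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE baskets (tid INTEGER, item TEXT)")
    rows = [(1, "bread"), (1, "milk"), (2, "bread"), (2, "butter"),
            (3, "bread"), (3, "milk"), (3, "butter"), (4, "milk")]
    conn.executemany("INSERT INTO baskets VALUES (?, ?)", rows)
    return conn


def support_outside(conn):
    """'On top' approach: transfer every row to the client and count there."""
    transferred = 0
    tids_per_item = {}
    for tid, item in conn.execute("SELECT tid, item FROM baskets"):
        transferred += 1
        tids_per_item.setdefault(item, set()).add(tid)
    counts = {item: len(tids) for item, tids in tids_per_item.items()}
    return counts, transferred


def support_inside(conn):
    """In-database approach: the DBMS aggregates; one row per item is transferred."""
    transferred = 0
    counts = {}
    query = "SELECT item, COUNT(DISTINCT tid) FROM baskets GROUP BY item"
    for item, n in conn.execute(query):
        transferred += 1
        counts[item] = n
    return counts, transferred


if __name__ == "__main__":
    conn = build_db()
    print(support_outside(conn))
    print(support_inside(conn))
```

With eight rows the difference is trivial, but the number of rows crossing the client/server boundary grows with the table size in the first approach and only with the number of distinct items in the second. That gap is what motivates moving such primitives into the DBMS.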

We further aim to identify computations that can be performed in advance on ready-to-mine data and then materialized as part of the database. Such pre-computations provide data mining algorithms with ready-to-consume values that would otherwise have to be computed from scratch on every run. They not only let data mining algorithms work more efficiently but also couple data mining more tightly with the database system.
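As one possible illustration of such a pre-computation, the sketch below materializes per-class sufficient statistics (count, sum, and sum of squares) in a summary table. A mining algorithm that needs class means and variances, for example a Gaussian classifier, can then read them without rescanning the raw data. This is a minimal sketch under stated assumptions: the schema, the table and function names, and the use of an SQLite trigger to keep the summary up to date are all hypothetical choices for illustration. The project text does not say which values are pre-computed or how they are maintained.

```python
import sqlite3

SCHEMA = """
CREATE TABLE measurements (label TEXT NOT NULL, x REAL NOT NULL);
-- Hypothetical materialized summary: count, sum and sum of squares per label.
CREATE TABLE stats (label TEXT PRIMARY KEY, n INTEGER NOT NULL,
                    s REAL NOT NULL, ss REAL NOT NULL);
-- Keep the summary in sync on every insert, inside the DBMS.
CREATE TRIGGER stats_on_insert AFTER INSERT ON measurements
BEGIN
    INSERT OR IGNORE INTO stats (label, n, s, ss) VALUES (NEW.label, 0, 0.0, 0.0);
    UPDATE stats
       SET n = n + 1, s = s + NEW.x, ss = ss + NEW.x * NEW.x
     WHERE label = NEW.label;
END;
"""


def open_db():
    """Create the raw table, the summary table and the maintenance trigger."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def class_moments(conn):
    """Mean and population variance per label, read from the summary table only."""
    moments = {}
    for label, n, s, ss in conn.execute("SELECT label, n, s, ss FROM stats"):
        mean = s / n
        # E[x^2] - E[x]^2: adequate for a sketch, but prone to cancellation
        # when values are large relative to their spread.
        var = ss / n - mean * mean
        moments[label] = (mean, var)
    return moments


if __name__ == "__main__":
    conn = open_db()
    conn.executemany("INSERT INTO measurements (label, x) VALUES (?, ?)",
                     [("a", 1.0), ("a", 2.0), ("a", 3.0), ("b", 10.0), ("b", 10.0)])
    print(class_moments(conn))
```

Because count, sum and sum of squares are additive, each new row updates the summary in constant time, and the mining side never touches `measurements` at all. For production use, a numerically more stable update scheme (e.g., Welford's method) would be preferable to raw sums of squares.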

  • FB Informatik und Informationswissenschaft