XQuery Your Filesystem - Enhancing filesystems using semi-structured database technology

Description

The long-term perspective of this project is to find synergies between filesystem and semi-structured database techniques.
While filesystems provide an easy and well-understood interface to the data, they lack important and frequently requested features such as the ability to query the data.
We will break with the long tradition of considering a file merely as a sequence of bytes. We will open the black box and let the classical file hierarchy extend into the files themselves. Considering content and structure opens the door for query languages that operate on semi-structured data.
Currently we are following the concept of a joint storage for both database and filesystem.
Ultimately, we will provide both proven and stable access to the data, leveraging filesystem techniques, and query support for all stored files.

Institutions
  • FB Informatik und Informationswissenschaft
Publications
Holupirek, Alexander (2012): Declarative Access to Filesystem Data : New application domains for XML database management systems

XML and state-of-the-art XML database management systems (XML-DBMSs) can play a leading role in far more application domains than is currently the case.

Even in their basic configuration, they include all components necessary to act as central systems for complex search and retrieval tasks. They provide language-specific indexing of full-text documents and can store structured, semi-structured and binary data.

In addition, they offer a wide variety of standardized languages (XQuery, XSLT, XQuery Full Text, etc.) to develop applications inside a pure XML technology stack. The benefits are clear: data, logic, and presentation tiers can operate on a single data model, and no conversions have to be applied when switching between them.

This thesis deals with the design and development of XML/XQuery-driven information architectures that process formerly heterogeneous data sources in a standardized and uniform manner. Filesystems and their vast number of different file types are a prime example of such a heterogeneous dataspace. A new XML dialect, the Filesystem Markup Language (FSML), is introduced to construct a database view of the filesystem and its contents. FSML provides a uniform view of the filesystem’s contents and allows developers to leverage the complete XML technology stack on filesystem data.

BaseX, a high-performance, native XML-DBMS developed at the University of Konstanz, is pushed to new application domains. We interface the database system with the operating system kernel and implement a database/filesystem hybrid (BaseX-FS), which operates on FSML database instances. A joint storage for both the filesystem and the database is established, which allows both developers and users to access data via the conventional and proven filesystem interface and, in addition, through a novel declarative, database-supported interface. As a direct consequence, XML languages such as XQuery can be used by applications and developers to analyze and process filesystem data. Smarter ways of accessing personal information stored in filesystems are achieved by retrieval strategies with no, partial, or full knowledge about the structure, format, and content of the data (“Query the filesystem like a database”).

In combination with BaseX-Web, a database extension that facilitates the development of desktop-like web applications, we present a system architecture that makes it easier for application developers to build content-oriented (data-centric) retrieval and search applications dealing with files and their contents. The proposed architecture is ready to drive (expert) information systems that work with distinct data sources, using an XQuery-driven development approach. As a concluding proof of concept, a complete development cycle for an OPAC (Online Public Access Catalogue) system is presented in detail.

Origin (projects)

Holupirek, Alexander; Grün, Christian; Scholl, Marc H. (2009): BaseX and DeepFS - Joint Storage for Filesystem and Database. In: Proceedings of the 12th International Conference on Extending Database Technology : Advances in Database Technology - EDBT '09. - New York, New York, USA : ACM Press, 2009. - pp. 1108-1111. - ISBN 978-1-60558-422-5

BaseX is an early adopter of the upcoming XQuery Full Text Recommendation. This paper presents some of the enhancements made to the XML database to fully support the language extensions. The system's data and index structures are described, and implementation details are given on the XQuery compiler, which supports sequential scanning, index-based, and hybrid processing of full-text queries. Experimental analysis and an insight into visual result presentation of query results conclude the presentation.

Holupirek, Alexander; Scholl, Marc H. (2008): Implementing Filesystems by Tree-aware DBMSs. In: Proceedings of the VLDB Endowment ; 1 (2008), 2. - pp. 1623-1630. - ISSN 2150-8097

With the rise of XML, the database community has been challenged by semi-structured data processing. Since the data type behind XML is the tree, state-of-the-art RDBMSs have learned to deal with such data (e.g., [18, 5, 6, 16]). This paper introduces a Ph.D. project focused on the question of to what extent the tree-awareness of recent RDBMSs can be used to, once again, try to implement filesystems using database technology. Our main goal is to provide means to query the data stored in filesystems and to find ways to enhance and combine the data storage and query capabilities of operating systems using semi-structured database technology. Two DBMSs with relational XML storage, built on top of the XPath accelerator numbering scheme [14], are the foundations for our work. First, with BaseX, an XML database, we establish a link between user, database and filesystem content. BaseX allows visual access to filesystem data stored in the database. An integrated query interface allows users to filter query results in interactive response time. Second, we establish a link between DBMS and OS. We implement a filesystem in userspace backed by the MonetDB/XQuery system, a well-known relational database system, which integrates the Pathfinder XQuery compiler [5] and the MonetDB kernel [4]. As a result, the DBMS is mounted as a conventional filesystem by the operating system kernel. Consequently, access via the established (virtual) filesystem interface as well as database-enhanced access to the same data is provided.
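
The XPath accelerator numbering scheme referred to in this abstract can be illustrated with a minimal Python sketch. It assigns each element a preorder and a postorder rank and evaluates the descendant axis as a range predicate over the resulting table. This is a simplified illustration of the general pre/post idea under stated assumptions, not the storage layout of BaseX or MonetDB/XQuery; the function names `encode` and `descendants` and the sample document are hypothetical.

```python
import xml.etree.ElementTree as ET


def encode(root):
    """Return one (pre, post, tag) row per element, in document order."""
    rows = []
    counters = {"pre": 0, "post": 0}

    def visit(node):
        pre = counters["pre"]
        counters["pre"] += 1
        index = len(rows)          # equals pre, so rows[pre] is this node
        rows.append(None)          # reserve the slot before visiting children
        for child in node:
            visit(child)
        # The postorder rank is assigned once all children are finished.
        rows[index] = (pre, counters["post"], node.tag)
        counters["post"] += 1

    visit(root)
    return rows


def descendants(rows, pre):
    """Descendant axis: rows with a larger pre and a smaller post rank."""
    _, post, _ = rows[pre]
    return [r for r in rows if r[0] > pre and r[1] < post]


if __name__ == "__main__":
    # Hypothetical sample document standing in for a small file hierarchy.
    doc = ET.fromstring("<root><docs><a/><b/></docs><readme/></root>")
    table = encode(doc)
    for row in table:
        print(row)
    print([tag for _, _, tag in descendants(table, 1)])  # descendants of <docs>
```

Because the axis test reduces to two comparisons on integer columns, it can be answered by a relational engine with ordinary range scans, which is what makes a tree-aware relational storage plausible for hierarchical filesystem data.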

Holupirek, Alexander; Scholl, Marc H. (2008): An XML Database as Filesystem in Userspace. In: Proceedings of the 20th GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken) : May 13 - May 16, 2008, Apolda, Germany / Höpfner, Hagen et al. (eds.). - Bruchsal : School of Information Technology, 2008. - pp. 31-35

Grün, Christian; Holupirek, Alexander; Scholl, Marc H. (2007): Visually Exploring and Querying XML with BaseX. In: Datenbanksysteme in Business, Technologie und Web (BTW) : 7. - 9.3.2007 in Aachen / Kemper, Alfons et al. (eds.). - Bonn : GI, 2007. - (Gesellschaft für Informatik...Fachtagung des GI-Fachbereichs Datenbanken und Informationssysteme (DBIS) ; 12). - pp. 629-632. - ISBN 978-3-88579-197-3

XML documents are widely used as a generic container for textual contents. As they are growing in size, XML databases are emerging to efficiently store and query their contents. In addition, due to the hierarchic structure of XML documents, hierarchic visualizations are needed to facilitate cognitive access to query results. BaseX is a simple database prototype that maps XML documents to a table-based tree encoding. An integrated treemap visualization and a query interface allow visual access to the documents and demonstrate the efficiency of the underlying data storage.

Holupirek, Alexander; Grün, Christian; Scholl, Marc H. (2007): Melting Pot XML : Bringing File Systems and Databases One Step Closer. In: Datenbanksysteme in Business, Technologie und Web (BTW) : 7. - 9.3.2007 in Aachen / Kemper, Alfons et al. (eds.). - Bonn : GI, 2007. - (Gesellschaft für Informatik...Fachtagung des GI-Fachbereichs Datenbanken und Informationssysteme (DBIS) ; 12). - pp. 309-323. - ISBN 978-3-88579-197-3

Ever-growing data volumes demand storage systems with abilities beyond those of current file systems, in particular a powerful querying capability. With the rise of XML, the database community has been challenged by semi-structured data processing, which has broadened its field of activity. Since file systems are structured hierarchically, they can be mapped to XML and, as such, stored in and queried by an XML-aware database. We provide an evaluation of a state-of-the-art XML-aware database implementing a file system.
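
The mapping from a file hierarchy to XML described in this abstract can be sketched in a few lines of Python. The sketch walks a directory, builds an XML tree, and runs a path query over it. The element and attribute names (`dir`, `file`, `name`, `size`) and all function names are assumptions made for illustration; they are not the vocabulary used in the paper or in FSML, and the standard library's limited XPath subset stands in for a full XQuery processor.

```python
import os
import tempfile
import xml.etree.ElementTree as ET


def dir_to_xml(path):
    """Recursively map a directory to a <dir> element with <file> children."""
    node = ET.Element("dir", name=os.path.basename(os.path.normpath(path)))
    for entry in sorted(os.listdir(path)):
        full = os.path.join(path, entry)
        # Symbolic links are not followed, to avoid cycles in this sketch.
        if os.path.isdir(full) and not os.path.islink(full):
            node.append(dir_to_xml(full))
        else:
            size = str(os.lstat(full).st_size)
            ET.SubElement(node, "file", name=entry, size=size)
    return node


def build_demo_tree(root):
    """Create a small, hypothetical file hierarchy below root."""
    os.makedirs(os.path.join(root, "docs", "drafts"))
    samples = [("readme.txt", "hello"),
               ("docs/a.xml", "<a/>"),
               ("docs/drafts/b.xml", "<b/>")]
    for rel, text in samples:
        with open(os.path.join(root, *rel.split("/")), "w") as fh:
            fh.write(text)


def xml_files_under_docs(tree):
    """Names of all .xml files at any depth below the 'docs' directory."""
    hits = tree.findall("./dir[@name='docs']//file")
    return sorted(f.get("name") for f in hits if f.get("name").endswith(".xml"))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        build_demo_tree(root)
        tree = dir_to_xml(root)
        print(ET.tostring(tree, encoding="unicode"))
        print(xml_files_under_docs(tree))
```

Once the hierarchy is available as XML, a question such as "all XML files anywhere below docs" becomes a single path expression instead of a hand-written directory traversal.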

Grün, Christian; Holupirek, Alexander; Kramis, Marc; Scholl, Marc H.; Waldvogel, Marcel (2006): Pushing XPath Accelerator to its Limits. In: Proceedings of the First International Workshop on Performance and Evaluation of Data Management Systems (EXPDB 2006). - Chicago, Ill. : ACM, 2006

Two competing encoding concepts are known to scale well with growing amounts of XML data: XPath Accelerator encoding implemented by MonetDB for in-memory documents and X-Hive's Persistent DOM for on-disk storage. We identified two ways to improve XPath Accelerator and present prototypes for the respective techniques: BaseX boosts in-memory performance with optimized data and value index structures while Idefix introduces native block-oriented persistence with logarithmic update behavior for true scalability, overcoming main-memory constraints.
An easy-to-use Java-based benchmarking framework was developed and used to consistently compare these competing techniques and perform scalability measurements. The established XMark benchmark was applied to all four systems under test. Additional fulltext-sensitive queries against the well-known DBLP database complement the XMark results.
Not only did the latest version of X-Hive surprise with good scalability and performance numbers; both BaseX and Idefix also keep their promise to push XPath Accelerator to its limits: BaseX efficiently exploits available main memory to speed up XML queries, while Idefix surpasses main-memory constraints and rivals the on-disk leadership of X-Hive. The competition between XPath Accelerator and Persistent DOM is definitely relaunched.

Further information
Dissertations
Title: Declarative Access to Filesystem Data - New application domains for XML database management systems
Author: Alexander Holupirek
Supervisors: Marc Scholl, Marcel Waldvogel