BaseX – Processing and Visualizing large XML Instances

Institutions
  • AG Scholl (Database and Information Systems)
Publications
Erat, Jens (2013): Fine Granular Locking in XML Databases

XML databases have gained considerably in popularity in recent years, and the queries run against them have become far more complex. Whereas they have mainly been used in single-threaded, often single-user applications, their use in real-time, multi-user, and parallel client-server environments is increasing. Along with that, the demand for higher concurrency is growing.

This bachelor thesis analyses the requirements for concurrency control and searches for algorithms suited to the sequential XML encoding based on the pre/post plane, which is widely used in native XML databases. To compare different concepts, two of them have been implemented for BaseX, one of those native database systems:

- Conservative and strict two-phase locking, which was identified as necessary to support all possible use cases, and
- optimistic concurrency control, a very different approach to achieving higher parallelism.
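
The first strategy in the list, conservative strict two-phase locking, can be illustrated with a small sketch: a transaction declares all databases it will read or write before it starts, acquires every lock up front (conservative), and releases them only after its body, including the commit, has finished (strict). The Python code below is a minimal single-process illustration under stated assumptions: whole databases are the locking granularity, and `LockManager`, `RWLock` and `transaction` are hypothetical names, not the BaseX API or the thesis implementation. Acquiring locks in a fixed sorted order is one common way to rule out deadlocks.

```python
import threading
from contextlib import contextmanager


class RWLock:
    """Minimal reader-preferring readers-writer lock (writers may starve)."""

    def __init__(self):
        self._cond = threading.Condition()
        self._readers = 0
        self._writer = False

    def acquire_read(self):
        with self._cond:
            while self._writer:
                self._cond.wait()
            self._readers += 1

    def release_read(self):
        with self._cond:
            self._readers -= 1
            if self._readers == 0:
                self._cond.notify_all()

    def acquire_write(self):
        with self._cond:
            while self._writer or self._readers > 0:
                self._cond.wait()
            self._writer = True

    def release_write(self):
        with self._cond:
            self._writer = False
            self._cond.notify_all()


class LockManager:
    """Hypothetical lock manager: one readers-writer lock per database name."""

    def __init__(self):
        self._locks = {}
        self._guard = threading.Lock()

    def _lock_for(self, name):
        with self._guard:
            return self._locks.setdefault(name, RWLock())

    @contextmanager
    def transaction(self, reads=(), writes=()):
        # Conservative: the complete read and write sets are declared up front.
        writes = set(writes)
        reads = set(reads) - writes          # a write lock also permits reading
        plan = sorted([(n, True) for n in writes] + [(n, False) for n in reads])
        held = []
        try:
            # Growing phase: take every lock before the body runs, in sorted
            # order, so two transactions can never wait on each other in a cycle.
            for name, exclusive in plan:
                lock = self._lock_for(name)
                if exclusive:
                    lock.acquire_write()
                else:
                    lock.acquire_read()
                held.append((lock, exclusive))
            yield
        finally:
            # Strict: nothing is released until the body (including commit) ends.
            for lock, exclusive in reversed(held):
                if exclusive:
                    lock.release_write()
                else:
                    lock.release_read()


# Usage: four threads update the same database; the write lock serializes them.
manager = LockManager()
store = {"db1": 0}

def worker():
    for _ in range(1000):
        with manager.transaction(writes=["db1"]):
            store["db1"] = store["db1"] + 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(store["db1"])  # 4000
```

Declaring the read and write sets in advance is what makes the protocol conservative; the price is that a transaction may hold locks on databases it ends up not touching.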

A brief look at other native XML database systems completes the evaluation of concurrency strategies.

While tree locking protocols were dismissed, possible ways to further improve concurrency control in BaseX are outlined and discussed.
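
The second strategy from the list above, optimistic concurrency control, can be sketched with serial backward validation: transactions read freely and buffer their writes in a private workspace, and at commit time they are checked against every transaction that committed after they started; if one of those wrote something the committing transaction read, it is aborted and may be restarted. This is a textbook variant chosen for illustration, assuming key-level conflict detection; `OCCManager`, `Transaction` and `Conflict` are hypothetical names, not the thesis implementation.

```python
import threading


class Conflict(Exception):
    """Raised when validation fails; the caller may restart the transaction."""


class Transaction:
    def __init__(self, manager, start_tn):
        self.manager = manager
        self.start_tn = start_tn   # number of commits seen when the transaction began
        self.read_set = set()
        self.writes = {}           # private workspace: writes are buffered here

    def read(self, key):
        self.read_set.add(key)
        if key in self.writes:
            return self.writes[key]
        return self.manager.data.get(key)

    def write(self, key, value):
        self.writes[key] = value


class OCCManager:
    """Hypothetical optimistic scheduler with serial backward validation."""

    def __init__(self):
        self.data = {}
        self.commit_tn = 0
        self.history = []          # (commit number, write set); never pruned here
        self._commit_lock = threading.Lock()

    def begin(self):
        with self._commit_lock:
            return Transaction(self, self.commit_tn)

    def commit(self, tx):
        # Validation and write phase form one critical section.
        with self._commit_lock:
            for tn, write_set in self.history:
                # Abort if a transaction that committed after tx started
                # wrote something that tx has read.
                if tn > tx.start_tn and write_set & tx.read_set:
                    raise Conflict(f"read set overlaps writes of commit {tn}")
            self.data.update(tx.writes)
            self.commit_tn += 1
            self.history.append((self.commit_tn, set(tx.writes)))


mgr = OCCManager()
mgr.data["counter"] = 0
t1, t2 = mgr.begin(), mgr.begin()
t1.write("counter", t1.read("counter") + 1)
t2.write("counter", t2.read("counter") + 1)
mgr.commit(t1)                 # nothing committed since t1 began: succeeds
try:
    mgr.commit(t2)             # t1 overwrote "counter", which t2 has read
except Conflict:
    retry = mgr.begin()        # redo t2's work on fresh data
    retry.write("counter", retry.read("counter") + 1)
    mgr.commit(retry)
print(mgr.data["counter"])     # 2
```

No transaction ever waits for a lock here; the cost of a conflict is paid as an abort and restart instead, which is why the approach pays off mainly when conflicts are rare.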

Research context (projects)

Holupirek, Alexander (2012): Declarative Access to Filesystem Data : New application domains for XML database management systems

XML and state-of-the-art XML database management systems (XML-DBMSs) can play a leading role in far more application domains than is currently the case.

Even in their basic configuration, they include all components necessary to act as central systems for complex search and retrieval tasks. They provide language-specific indexing of full-text documents and can store structured, semi-structured, and binary data.

In addition, they offer a wide variety of standardized languages (XQuery, XSLT, XQuery Full Text, etc.) for developing applications within a pure XML technology stack. The benefits are obvious: data, logic, and presentation tiers can operate on a single data model, and no conversions have to be applied when switching between them.

This thesis deals with the design and development of XML/XQuery-driven information architectures that process formerly heterogeneous data sources in a standardized and uniform manner. Filesystems, with their vast amounts of different file types, are a prime example of such a heterogeneous dataspace. A new XML dialect, the Filesystem Markup Language (FSML), is introduced to construct a database view of the filesystem and its contents. FSML provides a uniform view of the filesystem's contents and allows developers to leverage the complete XML technology stack on filesystem data.
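
The core idea of a database view of the filesystem, a directory hierarchy mapped onto an XML hierarchy, can be sketched with Python's standard library. The element and attribute names used here (`dir`, `file`, `name`, `suffix`, `size`) are assumptions chosen for illustration; they are not the actual FSML vocabulary, which according to the abstract also covers file contents.

```python
import os
import xml.etree.ElementTree as ET


def fs_to_xml(path):
    """Map a directory tree to a hypothetical FSML-like XML element."""
    elem = ET.Element("dir", name=os.path.basename(os.path.abspath(path)))
    with os.scandir(path) as it:
        entries = sorted(it, key=lambda e: e.name)   # deterministic document order
    for entry in entries:
        if entry.is_dir(follow_symlinks=False):
            elem.append(fs_to_xml(entry.path))       # subdirectory -> nested element
        elif entry.is_file(follow_symlinks=False):
            suffix = os.path.splitext(entry.name)[1].lstrip(".")
            ET.SubElement(elem, "file", name=entry.name, suffix=suffix,
                          size=str(entry.stat(follow_symlinks=False).st_size))
    return elem
```

For example, `ET.tostring(fs_to_xml("."), encoding="unicode")` returns the view of the current directory as an XML string. Symbolic links are skipped in this sketch to avoid cycles.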

BaseX, a high-performance native XML-DBMS developed at the University of Konstanz, is pushed into new application domains. We interface the database system with the operating system kernel and implement a database/filesystem hybrid (BaseX-FS) that operates on FSML database instances. A joint storage for both the filesystem and the database is established, which allows developers and users to access data via the conventional and proven filesystem interface and, in addition, through a novel declarative, database-supported interface. As a direct consequence, XML languages such as XQuery can be used by applications and developers to analyze and process filesystem data. Smarter ways of accessing personal information stored in filesystems are achieved by retrieval strategies with no, partial, or full knowledge about the structure, format, and content of the data (“Query the filesystem like a database”).
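
Once the filesystem is available as XML, a question such as "which PDF files exist anywhere below this directory?" becomes a path expression. BaseX would evaluate full XQuery here; the sketch below only approximates this with the limited XPath subset of Python's `xml.etree.ElementTree`, on a small hand-written document whose element and attribute names are hypothetical and not taken from FSML.

```python
import xml.etree.ElementTree as ET

# Hypothetical filesystem view; element and attribute names are illustrative only.
FS_VIEW = """
<dir name="home">
  <dir name="papers">
    <file name="edbt09.pdf" suffix="pdf" size="512000"/>
    <file name="notes.txt" suffix="txt" size="800"/>
  </dir>
  <dir name="music">
    <file name="song.mp3" suffix="mp3" size="4200000"/>
  </dir>
  <file name="todo.pdf" suffix="pdf" size="900"/>
</dir>
"""

root = ET.fromstring(FS_VIEW.strip())

# Structural query, roughly the XPath //file[@suffix = 'pdf']/@name
pdfs = [f.get("name") for f in root.findall(".//file[@suffix='pdf']")]

# Value-based query: ElementTree has no numeric predicates, so the
# size comparison is done in Python rather than in the path expression.
large = [f.get("name") for f in root.iter("file") if int(f.get("size")) > 1_000_000]

print(pdfs)   # ['edbt09.pdf', 'todo.pdf']
print(large)  # ['song.mp3']
```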

In combination with BaseX-Web, a database extension that facilitates the development of desktop-like web applications, we present a system architecture that makes it easier for application developers to build content-oriented (data-centric) retrieval and search applications dealing with files and their contents. The proposed architecture is ready to drive (expert) information systems that work with distinct data sources, using an XQuery-driven development approach. As a concluding proof of concept, a complete development cycle for an OPAC (Online Public Access Catalogue) system is presented in detail.

Miller, Wolfgang (2011): BaseX Tree View : Entwicklung einer datenbankgestützten Visualisierung von XML-Dokumenten [BaseX Tree View: Development of a database-supported visualization of XML documents]

This thesis documents the development of a traditional tree visualization, created as a project for the XML database BaseX and its graphical front end, the BaseX GUI. The goal was to implement a high-performance visualization that offers users an alternative view of imported XML documents.

Holupirek, Alexander; Grün, Christian; Scholl, Marc H. (2009): BaseX and DeepFS - Joint Storage for Filesystem and Database. In: Proceedings of the 12th International Conference on Extending Database Technology: Advances in Database Technology (EDBT '09). New York, NY, USA: ACM Press, 2009, pp. 1108-1111. ISBN 978-1-60558-422-5

Grün, Christian; Gath, Sebastian; Holupirek, Alexander; Scholl, Marc H. (2009): XQuery Full Text Implementation in BaseX. In: Database and XML Technologies: 6th International XML Database Symposium, XSym 2009, Lyon, France, August 24, 2009 / Bellahsène, Zohra et al. (eds.). Berlin [et al.]: Springer, 2009 (Lecture Notes in Computer Science; 5679), pp. 114-128. ISBN 978-3-642-03554-8

BaseX is an early adopter of the upcoming XQuery Full Text Recommendation. This paper presents some of the enhancements made to the XML database to fully support the language extensions. The system's data and index structures are described, and implementation details are given on the XQuery compiler, which supports sequential scanning, index-based, and hybrid processing of full-text queries. An experimental analysis and an insight into the visual presentation of query results conclude the paper.
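
The difference between sequential scanning and index-based processing of a full-text query can be illustrated with a toy inverted index over text nodes. A query such as `//text()[. contains text "full" ftand "text"]` roughly asks for text nodes containing both tokens. The sketch assumes text nodes identified by hypothetical pre values and naive lowercase tokenization without stemming or stop words; it is not BaseX's index structure. A hybrid strategy might pick one path or the other, for example based on estimated index hits.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Naive tokenizer: lowercase word-character runs (no stemming, no stop words)."""
    return re.findall(r"\w+", text.lower())


# Hypothetical text nodes, keyed by their pre value in the tree encoding
TEXT_NODES = {
    3: "XQuery Full Text adds search to XQuery",
    7: "BaseX is a native XML database",
    12: "Full text indexes speed up XML search",
}


def scan_contains_all(nodes, terms):
    """Sequential scan: tokenize every text node at query time."""
    wanted = {t.lower() for t in terms}
    return sorted(pre for pre, text in nodes.items() if wanted <= set(tokenize(text)))


def build_index(nodes):
    """Inverted index: token -> sorted list of pre values whose text contains it."""
    index = defaultdict(set)
    for pre, text in nodes.items():
        for token in tokenize(text):
            index[token].add(pre)
    return {token: sorted(pres) for token, pres in index.items()}


def index_contains_all(index, terms):
    """Index-based: intersect the posting lists of all query terms."""
    postings = [set(index.get(t.lower(), ())) for t in terms]
    return sorted(set.intersection(*postings)) if postings else []


index = build_index(TEXT_NODES)
print(scan_contains_all(TEXT_NODES, ["full", "text"]))  # [3, 12]
print(index_contains_all(index, ["XML", "search"]))     # [12]
```

The scan touches every text node for every query, while the index path only reads the posting lists of the query terms, at the cost of building and maintaining the index.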

Grün, Christian; Holupirek, Alexander; Scholl, Marc H. (2007): Visually Exploring and Querying XML with BaseX. In: Datenbanksysteme in Business, Technologie und Web (BTW): March 7-9, 2007 in Aachen / Kemper, Alfons et al. (eds.). Bonn: GI, 2007 (Gesellschaft für Informatik...Fachtagung des GI-Fachbereichs Datenbanken und Informationssysteme (DBIS); 12), pp. 629-632. ISBN 978-3-88579-197-3

XML documents are widely used as a generic container for textual content. As they keep growing in size, XML databases are emerging to store and query their contents efficiently. In addition, due to the hierarchical structure of XML documents, hierarchical visualizations are needed to facilitate cognitive access to query results. BaseX is a simple database prototype that maps XML documents to a table-based tree encoding. An integrated treemap visualization and a query interface allow visual access to the documents and demonstrate the efficiency of the underlying data storage.
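
A table-based tree encoding stores one row per node in document order. The sketch below uses a pre (row number), size (number of descendants) and level column per element, which is one common variant of the idea; which columns BaseX actually stores is not stated here, so this column set is an assumption. With it, the descendants of a node are simply the next `size` rows, and children are found by skipping over whole subtrees.

```python
import xml.etree.ElementTree as ET


def encode(xml_text):
    """Flatten the element tree into rows of (pre, size, level, tag)."""
    rows = []

    def visit(elem, level):
        pre = len(rows)
        rows.append(None)              # reserve the slot; size is known after the subtree
        for child in elem:
            visit(child, level + 1)
        size = len(rows) - pre - 1     # number of descendants
        rows[pre] = (pre, size, level, elem.tag)

    visit(ET.fromstring(xml_text), 0)
    return rows


def descendants(rows, pre):
    """All descendants of a node are the next `size` rows."""
    size = rows[pre][1]
    return rows[pre + 1 : pre + 1 + size]


def children(rows, pre):
    """Children: start at pre + 1 and jump over each child's subtree."""
    end = pre + rows[pre][1]
    result, cur = [], pre + 1
    while cur <= end:
        result.append(rows[cur])
        cur += rows[cur][1] + 1
    return result


DOC = "<site><people><person/><person/></people><items><item/></items></site>"
rows = encode(DOC)
for row in rows:
    print(row)                                # (0, 5, 0, 'site') ... (5, 0, 2, 'item')
print([r[3] for r in children(rows, 0)])     # ['people', 'items']
print([r[0] for r in descendants(rows, 1)])  # [2, 3]
```

Because subtrees occupy contiguous row ranges, structural navigation reduces to arithmetic on row numbers, which suits sequential, cache-friendly storage.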

Holupirek, Alexander; Grün, Christian; Scholl, Marc H. (2007): Melting Pot XML : Bringing File Systems and Databases One Step Closer. In: Datenbanksysteme in Business, Technologie und Web (BTW): March 7-9, 2007 in Aachen / Kemper, Alfons et al. (eds.). Bonn: GI, 2007 (Gesellschaft für Informatik...Fachtagung des GI-Fachbereichs Datenbanken und Informationssysteme (DBIS); 12), pp. 309-323. ISBN 978-3-88579-197-3

Ever-growing data volumes demand storage systems that go beyond the abilities of current file systems, in particular a powerful querying capability. With the rise of XML, the database community has been challenged by semi-structured data processing, which has broadened its field of activity. Since file systems are structured hierarchically, they can be mapped to XML and, as such, stored in and queried by an XML-aware database. We provide an evaluation of a state-of-the-art XML-aware database implementing a file system.

Grün, Christian; Holupirek, Alexander; Kramis, Marc; Scholl, Marc H.; Waldvogel, Marcel (2006): Pushing XPath Accelerator to its Limits. In: Proceedings of the First International Workshop on Performance and Evaluation of Data Management Systems (EXPDB 2006). Chicago, IL: ACM, 2006

Two competing encoding concepts are known to scale well with growing amounts of XML data: the XPath Accelerator encoding implemented by MonetDB for in-memory documents, and X-Hive's Persistent DOM for on-disk storage. We identified two ways to improve XPath Accelerator and present prototypes for the respective techniques: BaseX boosts in-memory performance with optimized data and value index structures, while Idefix introduces native block-oriented persistence with logarithmic update behavior for true scalability, overcoming main-memory constraints.
An easy-to-use Java-based benchmarking framework was developed and used to consistently compare these competing techniques and to perform scalability measurements. The established XMark benchmark was applied to all four systems under test. Additional full-text-sensitive queries against the well-known DBLP database complement the XMark results.
Not only did the latest version of X-Hive surprise with good scalability and performance numbers; both BaseX and Idefix also keep their promise to push XPath Accelerator to its limits: BaseX efficiently exploits available main memory to speed up XML queries, while Idefix overcomes main-memory constraints and rivals the on-disk leadership of X-Hive. The competition between XPath Accelerator and Persistent DOM has definitely been relaunched.
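
The XPath Accelerator encoding places every node at a point (pre, post) in a plane, where pre is its preorder rank and post its postorder rank. Relative to a context node v, the four major axes become rectangular regions: descendants have pre > pre(v) and post < post(v), ancestors pre < pre(v) and post > post(v), following nodes pre > pre(v) and post > post(v), and preceding nodes pre < pre(v) and post < post(v). The Python sketch below computes the ranks for elements only and answers an axis step as a region query with a plain linear scan; the optimized index structures of BaseX and the block-oriented storage of Idefix described above are not modeled.

```python
import xml.etree.ElementTree as ET


def plane(xml_text):
    """Return one (pre, post, tag) point per element, sorted by pre."""
    points = []
    post = 0

    def visit(elem):
        nonlocal post
        slot = len(points)             # pre rank = position in preorder
        points.append(None)
        for child in elem:
            visit(child)
        points[slot] = (slot, post, elem.tag)
        post += 1                      # post rank assigned after all children

    visit(ET.fromstring(xml_text))
    return points


# Each axis is a quadrant of the plane relative to the context point q.
AXES = {
    "descendant": lambda p, q: p[0] > q[0] and p[1] < q[1],
    "ancestor":   lambda p, q: p[0] < q[0] and p[1] > q[1],
    "following":  lambda p, q: p[0] > q[0] and p[1] > q[1],
    "preceding":  lambda p, q: p[0] < q[0] and p[1] < q[1],
}


def axis_step(points, context_pre, axis):
    """Region query in the pre/post plane relative to one context node."""
    q = points[context_pre]
    return [p[2] for p in points if AXES[axis](p, q)]


DOC = "<a><b><c/><d/></b><e><f/></e></a>"
pts = plane(DOC)
print(pts)  # [(0, 5, 'a'), (1, 2, 'b'), (2, 0, 'c'), (3, 1, 'd'), (4, 4, 'e'), (5, 3, 'f')]
print(axis_step(pts, 1, "descendant"))  # ['c', 'd']
print(axis_step(pts, 1, "following"))   # ['e', 'f']
```

Since the points are sorted by pre, a real implementation can restrict the descendant scan to the pre range right after the context node instead of visiting every point.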

Funding
Name: Deutsche Forschungsgemeinschaft
Reference number: unknown
Description: Research Training Group (Graduiertenkolleg) "Explorative Analysis and Visualization of Large Information Spaces"
Further information
Duration:
Dissertations
Title: Declarative Access to Filesystem Data - New application domains for XML database management systems
Author: Alexander Holupirek
Reviewers: Marc Scholl, Marcel Waldvogel