SmartDataLake - Sustainable Data Lakes for Extreme-Scale Analytics

Description

Data lakes are raw data ecosystems, where large amounts of diverse data are retained and coexist. They facilitate self-service analytics for flexible, fast, ad hoc decision making. SmartDataLake enables extreme-scale analytics over sustainable big data lakes. It provides an adaptive, scalable and elastic data lake management system that offers: (a) data virtualization for abstracting and optimizing access and queries over heterogeneous data, (b) data synopses for approximate query answering and analytics to enable interactive response times, and (c) automated placement of data in different storage tiers based on data characteristics and access patterns to reduce costs. The data lake’s contents are modelled and organised as a heterogeneous information network, containing multiple types of entities and relations. Efficient and scalable algorithms are provided for: (a) similarity search and exploration for discovering relevant information, (b) entity resolution and ranking for identifying and selecting important and representative entities across sources, (c) link prediction and clustering for unveiling hidden associations and patterns among entities, and (d) change detection and incremental update of analysis results to enable faster analysis of new data. Finally, interactive and scalable visual analytics are provided to include and empower the data scientist in the knowledge extraction loop. This includes functionalities for: (a) visually exploring and tuning the space of features, models and parameters, and (b) enabling large-scale visualizations of spatial, temporal and network data. The results of the project are evaluated in real-world use cases from the business intelligence domain, including scenarios for portfolio recommendation, production planning and pricing, and investment decision making. SmartDataLake will foster innovation and enable European SMEs to capitalize on the value of their own data lakes.

Institutions
  • WG Keim (Data Analysis and Visualization)

Publications
    Zagermann, Johannes; Hubenschmid, Sebastian; Balestrucci, Priscilla; Feuchtner, Tiare; Mayer, Sven; Ernst, Marc O.; Schmidt, Albrecht; Reiterer, Harald (2022): Complementary interfaces for visual computing. it - Information Technology. De Gruyter Oldenbourg. 2022, 64(4-5). ISSN 1611-2776. eISSN 2196-7032. Available under: doi: 10.1515/itit-2022-0031

Complementary interfaces for visual computing

With increasing complexity in visual computing tasks, a single device may not be sufficient to adequately support the user’s workflow. Here, we can employ multi-device ecologies such as cross-device interaction, where a workflow can be split across multiple devices, each dedicated to a specific role. But what makes these multi-device ecologies compelling? Based on insights from our research, each device or interface component must contribute a complementary characteristic to increase the quality of interaction and further support users in their current activity. We establish the term complementary interfaces for such meaningful combinations of devices and modalities and provide an initial set of challenges. In addition, we demonstrate the value of complementarity with examples from within our own research.

    Görtler, Jochen; Spinner, Thilo; Streeb, Dirk; Weiskopf, Daniel; Deussen, Oliver (2020): Uncertainty-Aware Principal Component Analysis. IEEE Transactions on Visualization and Computer Graphics. Institute of Electrical and Electronics Engineers (IEEE). 2020, 26(1), pp. 822-831. ISSN 1077-2626. eISSN 1941-0506. Available under: doi: 10.1109/TVCG.2019.2934812

Uncertainty-Aware Principal Component Analysis

We present a technique to perform dimensionality reduction on data that is subject to uncertainty. Our method is a generalization of traditional principal component analysis (PCA) to multivariate probability distributions. In comparison to non-linear methods, linear dimensionality reduction techniques have the advantage that the characteristics of such probability distributions remain intact after projection. We derive a representation of the PCA sample covariance matrix that respects potential uncertainty in each of the inputs, building the mathematical foundation of our new method: uncertainty-aware PCA. In addition to the accuracy and performance gained by our approach over sampling-based strategies, our formulation allows us to perform sensitivity analysis with regard to the uncertainty in the data. For this, we propose factor traces as a novel visualization that enables a better understanding of the influence of uncertainty on the chosen principal components. We provide multiple examples of our technique using real-world datasets. As a special case, we show how to propagate multivariate normal distributions through PCA in closed form. Furthermore, we discuss extensions and limitations of our approach.
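The idea of a covariance matrix that respects per-input uncertainty can be illustrated with a small sketch. It is not taken from the paper: it assumes each input is summarized by a mean vector and a covariance matrix, and it computes the covariance of the equally weighted mixture of those inputs (spread of the means plus average input covariance, normalized by N). The function names and the two-point example are hypothetical, and the principal-axis helper only covers the 2D case.

```python
import math


def uncertainty_aware_covariance(means, covariances):
    """Covariance of an equally weighted mixture of uncertain inputs.

    means:       list of N mean vectors, each of length d
    covariances: list of N d x d covariance matrices (one per input)
    Assumption: normalization by N, not N - 1.
    """
    n = len(means)
    d = len(means[0])
    centre = [sum(mu[k] for mu in means) / n for k in range(d)]
    cov = [[0.0] * d for _ in range(d)]
    for mu, sigma in zip(means, covariances):
        for r in range(d):
            for c in range(d):
                # Spread of the means around the centre ...
                between = (mu[r] - centre[r]) * (mu[c] - centre[c])
                # ... plus the input's own uncertainty.
                cov[r][c] += (between + sigma[r][c]) / n
    return cov


def principal_axis_2d(cov):
    """Unit vector along the major axis of a symmetric 2x2 matrix."""
    angle = 0.5 * math.atan2(2.0 * cov[0][1], cov[0][0] - cov[1][1])
    return (math.cos(angle), math.sin(angle))


# Two inputs whose means differ only along x.
means = [[-1.0, 0.0], [1.0, 0.0]]
# Case 1: no uncertainty, which reduces to the ordinary covariance of the means.
certain = [[[0.0, 0.0], [0.0, 0.0]] for _ in means]
# Case 2: each input is very uncertain along y (variance 4).
uncertain = [[[0.0, 0.0], [0.0, 4.0]] for _ in means]

print(principal_axis_2d(uncertainty_aware_covariance(means, certain)))
print(principal_axis_2d(uncertainty_aware_covariance(means, uncertain)))
```

In this toy setup the first principal axis lies along x when the inputs are certain and turns towards y once the y-uncertainty dominates, which is the kind of effect the abstract's sensitivity analysis is concerned with.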

    Blumenschein, Michael; Debbeler, Luka J.; Lages, Nadine C.; Renner, Britta; Keim, Daniel A.; El-Assady, Mennatallah (2020): v-plots : Designing Hybrid Charts for the Comparative Analysis of Data Distributions. Computer Graphics Forum. Wiley. 2020, 39(3), pp. 565-577. ISSN 0167-7055. eISSN 1467-8659. Available under: doi: 10.1111/cgf.14002

v-plots : Designing Hybrid Charts for the Comparative Analysis of Data Distributions

Comparing data distributions is a core focus in descriptive statistics, and part of most data analysis processes across disciplines. In particular, comparing distributions entails numerous tasks, ranging from identifying global distribution properties, comparing aggregated statistics (e.g., mean values), to the local inspection of single cases. While various specialized visualizations have been proposed (e.g., box plots, histograms, or violin plots), they are not usually designed to support more than a few tasks, unless they are combined. In this paper, we present the v-plot designer, a technique for authoring custom hybrid charts, combining mirrored bar charts, difference encodings, and violin-style plots. v-plots are customizable and enable the simultaneous comparison of data distributions on global, local, and aggregation levels. Our system design is grounded in an expert survey that compares and evaluates 20 common visualization techniques to derive guidelines for the task-driven selection of appropriate visualizations. This knowledge externalization step allowed us to develop a guiding wizard that can tailor v-plots to individual tasks and particular distribution properties. Finally, we confirm the usefulness of our system design and the user-guiding process by measuring the fitness for purpose and applicability in a second study with four domain and statistic experts.

    Bishop, Fearn; Zagermann, Johannes; Pfeil, Ulrike; Sanderson, Gemma; Reiterer, Harald; Hinrichs, Uta (2020): Construct-A-Vis : exploring the free-form visualization processes of children. IEEE Transactions on Visualization and Computer Graphics. Institute of Electrical and Electronics Engineers (IEEE). 2020, 26(1), pp. 451-460. ISSN 1077-2626. eISSN 1941-0506. Available under: doi: 10.1109/TVCG.2019.2934804

Construct-A-Vis : exploring the free-form visualization processes of children

Building data analysis skills is part of modern elementary school curricula. Recent research has explored how to facilitate children's understanding of visual data representations through completion exercises which highlight links between concrete and abstract mappings. This approach scaffolds visualization activities by presenting a target visualization to children. But how can we engage children in more free-form visual data mapping exercises that are driven by their own mapping ideas? How can we scaffold a creative exploration of visualization techniques and mapping possibilities? We present Construct-A-Vis, a tablet-based tool designed to explore the feasibility of free-form and constructive visualization activities with elementary school children. Construct-A-Vis provides adjustable levels of scaffolding visual mapping processes. It can be used by children individually or as part of collaborative activities. Findings from a study with elementary school children using Construct-A-Vis individually and in pairs highlight the potential of this free-form constructive approach, as visible in children's diverse visualization outcomes and their critical engagement with the data and mapping processes. Based on our study findings we contribute insights into the design of free-form visualization tools for children, including the role of tool-based scaffolding mechanisms and shared interactions to guide visualization activities with children.

    Spinner, Thilo; Schlegel, Udo; Schäfer, Hanna; El-Assady, Mennatallah (2020): explAIner : A Visual Analytics Framework for Interactive and Explainable Machine Learning. IEEE Transactions on Visualization and Computer Graphics. Institute of Electrical and Electronics Engineers (IEEE). 2020, 26(1), pp. 1064-1074. ISSN 1077-2626. eISSN 1941-0506. Available under: doi: 10.1109/TVCG.2019.2934629

explAIner : A Visual Analytics Framework for Interactive and Explainable Machine Learning

We propose a framework for interactive and explainable machine learning that enables users to (1) understand machine learning models; (2) diagnose model limitations using different explainable AI methods; as well as (3) refine and optimize the models. Our framework combines an iterative XAI pipeline with eight global monitoring and steering mechanisms, including quality monitoring, provenance tracking, model comparison, and trust building. To operationalize the framework, we present explAIner, a visual analytics system for interactive and explainable machine learning that instantiates all phases of the suggested pipeline within the commonly used TensorBoard environment. We performed a user-study with nine participants across different expertise levels to examine their perception of our workflow and to collect suggestions to fill the gap between our system and framework. The evaluation confirms that our tightly integrated system leads to an informed machine learning process while disclosing opportunities for further extensions.

    Zagermann, Johannes; Pfeil, Ulrike; Reiterer, Harald (2018): Studying Eye Movements as a Basis for Measuring Cognitive Load. Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems. New York, NY: ACM Press, 2018, LBW095. ISBN 978-1-4503-5621-3. Available under: doi: 10.1145/3170427.3188628

Studying Eye Movements as a Basis for Measuring Cognitive Load

Users' cognitive load while interacting with a system is a valuable metric for evaluations in HCI. We encourage the analysis of eye movements as an unobtrusive and widely available way to measure cognitive load. In this paper, we report initial findings from a user study with 26 participants working on three visual search tasks that represent different levels of difficulty. Also, we linearly increased the cognitive demand while solving the tasks. This allowed us to analyze the reaction of individual eye movements to different levels of task difficulty. Our results show how pupil dilation, blink rate, and the number of fixations and saccades per second individually react to changes in cognitive activity. We discuss how these measurements could be combined in future work to allow for a comprehensive investigation of cognitive load in interactive settings.

    Zagermann, Johannes; Pfeil, Ulrike; Fink, Daniel I.; von Bauer, Philipp; Reiterer, Harald (2017): Memory in Motion : The Influence of Gesture- and Touch-Based Input Modalities on Spatial Memory. CHI'17 : Proceedings of the 2017 CHI Conference on Human Factors in Computing Systems. New York, NY, USA: ACM, 2017, pp. 1899-1910. ISBN 978-1-4503-4655-9. Available under: doi: 10.1145/3025453.3026001

Memory in Motion : The Influence of Gesture- and Touch-Based Input Modalities on Spatial Memory

People's ability to remember and recall spatial information can be harnessed to improve navigation and search performances in interactive systems. In this paper, we investigate how display size and input modality influence spatial memory, especially in relation to efficiency and user satisfaction. Based on an experiment with 28 participants, we analyze the effect of three input modalities (trackpad, direct touch, and gesture-based motion controller) and two display sizes (10.6" and 55") on people's ability to navigate to spatially spread items and recall their positions. Our findings show that the impact of input modality and display size on spatial memory is not straightforward, but characterized by trade-offs between spatial memory, efficiency, and user satisfaction.

    Zagermann, Johannes; Pfeil, Ulrike; Schreiner, Mario; Rädle, Roman; Jetter, Hans-Christian; Reiterer, Harald (2015): Reporting Experiences on Group Activities in Cross-Device Settings. Accepted Paper for Surface 2015 : Workshop on Interacting with Multi-Device Ecologies in the Wild. 2015

Reporting Experiences on Group Activities in Cross-Device Settings

Even though mobile devices are ubiquitous and users often own several of them, using them in concert to achieve a common goal is not well supported and remains a challenge for HCI. In this paper, we report on our observations of cross-device usage within groups when they engaged in a dyadic collaborative sensemaking task. Based on our findings, we discuss limitations of a state-of-the-art cross-device setting and present a set of design recommendations. We then propose an alternative design that aims for greater flexibility when using mobile devices to enable a free configuration of workspaces depending on users’ current activity.

Funding sources

Name            Funding type       Category                  Project no.
European Union  third-party funds  research funding program  412/19

Further information

Period: 01.01.2019 – 31.12.2021