Exploration und Visualisierung großer Informationsmengen (Exploration and Visualization of Large Volumes of Information)

Description

The research program of the Graduate College centers on the development of methods, particularly in visualization and computer graphics, that support the exploration, analysis, and management of large data spaces. These data spaces may themselves be visual in nature, e.g. in the form of multimedia documents or complex geometric structures. The disciplines most relevant to the program are information visualization, computer graphics, human-computer interaction, intelligent data analysis, information retrieval, databases and information systems, and digital communication.

One goal of data exploration and analysis is to find new, previously unknown information that is useful to the user. The research aims to make existing methods more effective and more efficient, and to develop new methods of exploration and analysis that meet the special requirements of, for example, information stored on and transmitted over the Internet. The information is to be analyzed and grouped (clustered), and its quality assessed. This process first requires information representation methods from information science and data modeling methods from database systems. The resulting information model is the starting point for the actual exploration of the data, which is to combine automatic cluster analysis methods, knowledge-based semantic analysis, and interactive visualization of the data. An important application area for the methods to be developed is the explorative analysis of large collections of bioinformatics data, which ties together most of the research projects planned in the Graduate College.

The Graduate College implements new supervision and training structures for doctoral students. In the first training phase, which spans two semesters, special lectures are held on a regular basis, and the scholarship holders are introduced to the research tasks in working groups, lab courses, and seminars. In the subsequent main phase of scientific work, doctoral students and postdoctoral researchers also take on research organization tasks, planning and running workshops and international summer schools in addition to their own research projects. Research stays abroad are obligatory for all scholarship holders and round off the international profile of the Graduate College.

Institutions
  • Department of Computer and Information Science
Publications
  Dahmen, Thorsten (2016): Modeling, Simulation, and Optimization of Pacing Strategies for Road Cycling on Realistic Tracks

In this study, we develop methods to model and simulate road cycling on real-world courses, to analyze the performance of individual athletes, and to identify and quantify potential performance improvements. The goal is to instruct the athlete where and how to optimize his pacing strategy during a time trial.

We review the state-of-the-art mechanical model for road cycling power, which defines the relationship between pedaling power and cycling speed. It accounts for the power needed to overcome resistance due to inertia, rolling friction, road gradient, bearing friction, and aerodynamic drag.

Several model parameters prove difficult to measure. We therefore estimate four compound parameters by fitting the dynamic model to varying real-world power and speed measurements. The approach guarantees precise estimation even on courses with moderately varying slope, as long as that slope is known with sufficient precision. An experimental evaluation shows that our calibration significantly improves the model's speed estimation, both on the calibration course and on other courses with the same type of road surface. A sensitivity analysis allows us to compute the change in speed for small parameter perturbations and shows in detail that the coefficients for aerodynamic drag and rolling friction have the dominant influence.

We designed a simulator based on a Cyclus2 ergometer. The simulation includes real height profiles, virtual gears, video playback synchronized with the cyclist's current virtual position on the course, and online visualization of course and performance parameters. The ergometer brake is controlled so that it imitates the resistance predicted by the outdoor road cycling model. The software can partly compensate for the physical limitations of the eddy current brake.

The road cycling model, and thus the simulator resistance, depends sensitively on an accurate estimate of the slope of the road. Commercial GPS-enabled bicycle computers are not precise enough, because differentiating the height data to compute the slope amplifies high-frequency noise. A differential GPS device provides height data of sufficient quality, but only when the satellite signals are not blocked by obstacles such as houses, trees, or mountains, which is often a serious limitation. To address this, we also present a method that combines model-based slope estimates with noisy measurements from multiple GPS signals of different quality.

We validated both the model and the simulator with field data obtained on mountain courses. The model described the performance parameters accurately, with correlation coefficients of 0.96–0.99 and signal-to-noise ratios of 19.7–23.9 dB. We obtained similar quality measures when comparing the model estimates with our simulator. The model prediction errors can therefore be attributed to measurement errors in the differential GPS altitude and in the model parameters, but not to the ergometer control.

The athlete represents the motor of the system. Power supply models quantify his ability to sustain a time-varying power demand. We briefly review the Morton-Margaria model, which illustrates the interplay between aerobic and anaerobic metabolism as a hydraulic system. Because human physiology is complex and the required quantities cannot be measured, the model must be coarsely simplified before it can be used quantitatively in practice. We present three physiological power supply models:

1. The 3-parameter critical power model extends the classical critical power model, whose two parameters are critical power and anaerobic work capacity, by introducing a maximum power constraint. Its exertion rate depends linearly on the pedaling load.
2. Gordon's modification, referred to as the exertion model, suggests an alternative non-linear exertion rate that, in addition, defines an implicit maximum power constraint.
3. Our own 4-parameter model introduces an additional steering parameter for the nonlinearity and adopts the power constraint from the 3-parameter critical power model, thus combining, as we believe, some of the favorable properties of both models.

With the power demand model and the different supply models at hand, we compute minimum-time pacing strategies for both synthetic and real-world cycling courses as numerical solutions of optimal control problems using the Matlab package GPOPS-II.

To verify and discuss the numerical solutions, we derive candidate solutions for each problem. It turns out that the 3-parameter critical power model leads to a singular control problem and that, remarkably, the optimality criterion requires perfectly constant speed on sections where the slope varies only moderately.

Direct transcription methods such as those used in GPOPS-II often have severe numerical difficulties with singular optimal control problems. However, we found that if our problem is parametrized using kinetic energy instead of speed, significantly more detailed optimal strategies can be obtained on courses with real, complex slope data, and the computing time decreases.

We plot and discuss minimum-time pacing strategies for three real uphill courses in Switzerland, for which we have accurate height profile data, combined with the three physiological models. For Gordon's model, we conducted an experiment in which an athlete was instructed on our simulator to follow the optimal strategy and finished the course in less time than when pacing himself based on his experience.

Finally, we give a numerical example of how a weaker athlete can ride in the slipstream of a stronger leading competitor and overtake at just the right moment towards the end of the race in order to win the competition.
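The power demand side of the abstract can be illustrated with a minimal sketch. It uses a common textbook-style force balance (inertia, gradient, rolling friction, aerodynamic drag) and is not the calibrated model from the thesis: the function names, all default parameter values, and the single efficiency factor `eta` that lumps bearing and drivetrain losses are assumptions made for illustration only.

```python
import math

G = 9.81  # gravitational acceleration in m/s^2


def power_demand(v, a, slope, mass=80.0, c_rr=0.004, cd_a=0.35, rho=1.2, eta=0.97):
    """Pedaling power in watts needed to ride at speed v (m/s).

    a is the acceleration in m/s^2 and slope is rise over run.
    All default values are illustrative assumptions, not fitted parameters.
    """
    angle = math.atan(slope)
    f_inertia = mass * a
    f_gradient = mass * G * math.sin(angle)
    f_rolling = c_rr * mass * G * math.cos(angle)
    f_drag = 0.5 * rho * cd_a * v * v
    # eta lumps bearing and drivetrain losses into one factor (assumption).
    return (f_inertia + f_gradient + f_rolling + f_drag) * v / eta


def steady_speed(power, slope, lo=0.0, hi=40.0, tol=1e-9, **params):
    """Speed at which the demand equals a positive supplied power.

    Solved by bisection at zero acceleration; assumes the answer lies
    below hi (40 m/s), which can fail on very steep descents.
    """
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if power_demand(mid, 0.0, slope, **params) < power:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


if __name__ == "__main__":
    for slope in (0.0, 0.05, 0.10):
        print(f"slope {slope:.2f}: {steady_speed(250.0, slope):.2f} m/s at 250 W")
```

Because the gradient term scales with the slope, an error in the slope estimate feeds directly into the predicted resistance on climbs, which is consistent with the abstract's emphasis on accurate height data.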

Research context (projects)

  Rehman, Nafees; Mansmann, Svetlana; Weiler, Andreas; Scholl, Marc H. (2012): Building a Data Warehouse for Twitter Stream Exploration. In: 2012 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining. IEEE, 2012, pp. 1341-1348. ISBN 978-1-4673-2497-7

In recent years, Twitter has evolved into an extremely popular social network and has revolutionized the ways of interacting and exchanging information on the Internet. By making its public stream available through a set of APIs, Twitter has triggered a wave of research initiatives aimed at analysis and knowledge discovery from the data about its users and their messaging activities. While most of the projects and tools are tailored towards solving specific tasks, we pursue the goal of providing an application-independent and universal analytical platform for supporting any kind of analysis and knowledge discovery. We employ the well-established data warehousing technology with its underlying multidimensional data model, an ETL routine for loading and consolidating data from different sources, OLAP functionality for exploring the data, and data mining tools for more sophisticated analysis. In this work, we describe the process of transforming the original stream into a set of related multidimensional cubes and demonstrate how the resulting data warehouse can be used for solving a variety of analytical tasks. We expect our proposed approach to be applicable to analyzing the data of other social networks as well.
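The idea of turning a message stream into multidimensional cubes can be sketched in a few lines. The following is a toy illustration only, not the schema or tooling from the paper: the record fields (`date`, `lang`, `source`, `retweets`), the sample rows, and the function name `roll_up` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical, hand-made fact rows standing in for tweets after an ETL step.
TWEETS = [
    {"date": "2012-05-01", "lang": "en", "source": "web", "retweets": 3},
    {"date": "2012-05-01", "lang": "en", "source": "mobile", "retweets": 0},
    {"date": "2012-05-01", "lang": "de", "source": "web", "retweets": 1},
    {"date": "2012-05-02", "lang": "en", "source": "web", "retweets": 5},
    {"date": "2012-05-02", "lang": "de", "source": "mobile", "retweets": 2},
]


def roll_up(facts, dimensions, measure):
    """Aggregate a measure over the chosen dimensions.

    Returns a dict mapping each cell (a tuple of dimension values)
    to a pair (number of facts, sum of the measure).
    """
    cube = defaultdict(lambda: [0, 0])
    for fact in facts:
        cell = cube[tuple(fact[d] for d in dimensions)]
        cell[0] += 1
        cell[1] += fact[measure]
    return {key: tuple(cell) for key, cell in cube.items()}


if __name__ == "__main__":
    # Fine-grained view, then a roll-up to a single dimension.
    print(roll_up(TWEETS, ("date", "lang"), "retweets"))
    print(roll_up(TWEETS, ("lang",), "retweets"))
```

Dropping a dimension from the tuple corresponds to an OLAP roll-up, and adding one corresponds to a drill-down; a real data warehouse would precompute or index such aggregates rather than scanning the facts each time.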


  Nocaj, Arlind; Brandes, Ulrik (2012): Organizing search results with a reference map. IEEE Transactions on Visualization and Computer Graphics 18 (2012), 12, pp. 2546-2555. ISSN 1077-2626, eISSN 1941-0506

We propose a method to highlight query hits in hierarchically clustered collections of interrelated items such as digital libraries or knowledge bases. The method is based on the idea that organizing search results similarly to their arrangement on a fixed reference map facilitates orientation and assessment by preserving a user's mental map. Here, the reference map is built from an MDS layout of the items in a Voronoi treemap representing their hierarchical clustering, and we use techniques from dynamic graph layout to align query results with the map. The approach is illustrated on an archive of newspaper articles.
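As a rough illustration of the MDS step only, the sketch below places items in the plane so that their distances approximate given dissimilarities. It uses stress majorization (the SMACOF iteration with unit weights), which is one standard MDS technique and not necessarily the variant used in the paper; the Voronoi treemap and the dynamic-layout alignment are not covered. The function names and the toy data are assumptions.

```python
import math


def pairwise(points):
    """Matrix of Euclidean distances between all pairs of points."""
    return [[math.dist(p, q) for q in points] for p in points]


def stress(layout, target):
    """Raw stress: squared mismatch between layout and target distances."""
    d = pairwise(layout)
    n = len(layout)
    return sum((d[i][j] - target[i][j]) ** 2
               for i in range(n) for j in range(i + 1, n))


def guttman_step(layout, target):
    """One SMACOF update (Guttman transform) with unit weights."""
    n = len(layout)
    d = pairwise(layout)
    updated = []
    for i in range(n):
        x = y = 0.0
        for j in range(n):
            if i != j and d[i][j] > 0.0:
                ratio = target[i][j] / d[i][j]
                x += ratio * (layout[i][0] - layout[j][0])
                y += ratio * (layout[i][1] - layout[j][1])
        updated.append((x / n, y / n))
    return updated


def mds(target, start, steps=200):
    """Iterate the Guttman transform from a given start layout."""
    layout = list(start)
    for _ in range(steps):
        layout = guttman_step(layout, target)
    return layout


# Toy dissimilarities: the four corners of a unit square (illustrative data).
TARGET = pairwise([(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)])
START = [(0.1, 0.0), (1.3, 0.2), (0.9, 1.4), (-0.2, 0.8)]

if __name__ == "__main__":
    final = mds(TARGET, START)
    print(stress(START, TARGET), stress(final, TARGET))
```

Each Guttman step never increases the stress, so the layout settles into a fixed arrangement; a fixed arrangement of this kind is what a reference map relies on to preserve the user's mental map across queries.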


  Rehman, Nafees Ur; Mansmann, Svetlana; Weiler, Andreas; Scholl, Marc H. (2012): Discovering Dynamic Classification Hierarchies in OLAP Dimensions. In: Foundations of Intelligent Systems / Chen, Li; Felfernig, Alexander; Liu, Jiming; Raś, Zbigniew W. (eds.). Berlin: Springer, 2012 (Lecture Notes in Computer Science; 7661), pp. 425-434. ISBN 978-3-642-34623-1

The standard approach to OLAP requires the measures and dimensions of a cube to be known at the design stage. In addition, dimensions are required to be non-volatile, balanced, and normalized. These constraints appear too rigid for many data sets, especially semi-structured ones, such as user-generated content in social networks and other web applications. We enrich the multidimensional analysis of such data via content-driven discovery of dimensions and classification hierarchies. Discovered elements are dynamic by nature and evolve along with the underlying data set.

We demonstrate the benefits of our approach by building a data warehouse for the public stream of the popular social network and microblogging service Twitter. Our approach makes it possible to classify users by their activity, popularity, and behavior, as well as to organize messages by topic, impact, origin, method of generation, etc. Capturing the dynamic characteristics of the data in this way adds more intelligence to the analysis and extends the limits of OLAP.
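A dynamic classification of users by activity could look like the following minimal sketch. The rank-based split into equally sized classes, the class labels, the function name, and the sample counts are all assumptions chosen for illustration; the paper's actual discovery method is not reproduced here.

```python
def activity_classes(tweet_counts, labels=("low", "medium", "high")):
    """Derive an activity class for each user from the data itself.

    Users are ranked by tweet count (ties broken by name) and split into
    equally sized groups, so class boundaries move as the data changes.
    """
    ranked = sorted(tweet_counts, key=lambda user: (tweet_counts[user], user))
    n = len(ranked)
    return {user: labels[rank * len(labels) // n]
            for rank, user in enumerate(ranked)}


# Hypothetical per-user tweet counts at two points in time.
COUNTS_T1 = {"ann": 2, "bob": 40, "cy": 7, "dee": 120, "eve": 1, "fay": 15}
COUNTS_T2 = dict(COUNTS_T1, ann=60)  # ann becomes much more active

if __name__ == "__main__":
    print(activity_classes(COUNTS_T1))
    print(activity_classes(COUNTS_T2))
```

Recomputing the mapping after new data arrives moves users between classes without any change to a predefined schema, which is the sense in which such a hierarchy level is dynamic.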


Funding sources
  Name                             Reference number  Description  Duration
  Deutsche Forschungsgemeinschaft  530/04            GRK 1042/1   01.07.2004 – 31.12.2008
  Deutsche Forschungsgemeinschaft  501/09            GRK 1042/2   01.01.2009 – 30.06.2013
Further information
Duration: 01.07.2004 – 30.06.2013