The explosion of social network activity in recent years has led to the generation of massive volumes of user-related data, such as status updates, messages, blog and forum entries, recommendations, and connection requests and suggestions, and has given birth to novel analysis areas, such as social media analysis and social network analysis. This phenomenon can be viewed as part of the Big Data challenge, which is to cope with the rising flood of digital data from many sources, including mobile phones, the internet, videos, e-mails, and social network communication. The generated content is heterogeneous and encompasses textual, numeric, and multimedia data. Companies and institutions worldwide expect to gain valuable insights from big data and hope to improve their marketing, customer service, and public relations with the help of the acquired knowledge.
The established data warehousing technology with On-Line Analytical Processing (OLAP) and data mining (DM) functionality is known for its universality and high performance, but also for its rigidity and limitations when it comes to semi-structured, unstructured, or complex data. Various solutions have been proposed in theory and practice for warehousing and analyzing heterogeneous data. One class of solutions focuses on extending the capabilities of the predominant technologies, i.e., relational and multidimensional databases. Our approach is based on (1) discovering facts, dimensions, and hierarchies from semi-structured and unstructured data, (2) enriching the outcome by exploiting text mining techniques (e.g., entity detection, language detection, and sentiment analysis), and (3) extending the obtained structures via content-driven discovery of additional data characteristics. The benefit of obtaining a properly structured and consolidated dataset lies in the ability to use standard tools for data analysis, visualization, and mining to perform a variety of analysis tasks. This work enables OLAP for social media analysis and adds a new social dimension to the existing business data in the warehouse, which allows new and potentially useful insights into the data.
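To make the three steps more concrete, the following minimal Python sketch shows one way a nested, semi-structured record could be mapped to measures, dimension attributes, and hierarchy candidates. It is an illustration under stated assumptions, not the implementation described in this work: the sample record, its field names, and the helper functions (`flatten`, `split_fact`, `hierarchy_candidates`, `enrich`) are hypothetical, the rule "numeric leaf means measure" is a deliberately naive heuristic, and simple hashtag extraction stands in for the text mining techniques named above.

```python
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted paths, e.g. 'user.location.city'."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        path = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{path}."))
        else:
            flat[path] = value
    return flat


def split_fact(record: dict[str, Any]) -> tuple[dict[str, Any], dict[str, Any]]:
    """Step 1 (assumed heuristic): numeric leaves become measures,
    everything else becomes a dimension attribute."""
    measures: dict[str, Any] = {}
    dimensions: dict[str, Any] = {}
    for path, value in flatten(record).items():
        # bool is a subclass of int in Python, so treat it as categorical.
        if isinstance(value, (int, float)) and not isinstance(value, bool):
            measures[path] = value
        else:
            dimensions[path] = value
    return measures, dimensions


def hierarchy_candidates(dimensions: dict[str, Any]) -> dict[str, list[str]]:
    """Group nested attributes by parent path as hierarchy candidates."""
    groups: dict[str, list[str]] = {}
    for path in dimensions:
        parent, _, leaf = path.rpartition(".")
        if parent:  # top-level attributes have no parent path
            groups.setdefault(parent, []).append(leaf)
    return groups


def enrich(dimensions: dict[str, Any]) -> dict[str, Any]:
    """Steps 2-3 (stand-in): derive an extra attribute from the text content."""
    text = str(dimensions.get("text", ""))
    hashtags = {w.strip(".,!?").lower() for w in text.split() if w.startswith("#")}
    enriched = dict(dimensions)
    enriched["derived.hashtags"] = sorted(hashtags)
    return enriched


# Hypothetical social media record; field names are illustrative only.
post = {
    "text": "Loving the new phone! #gadgets #Review",
    "retweet_count": 12,
    "user": {
        "name": "alice",
        "followers_count": 340,
        "location": {"country": "DE", "city": "Konstanz"},
    },
}

measures, dimensions = split_fact(post)
hierarchies = hierarchy_candidates(dimensions)
enriched = enrich(dimensions)
```

In this sketch the nesting of the source record is what suggests hierarchy candidates (here, `country` and `city` under `user.location`). A real pipeline would need additional rules, for example to keep numeric identifiers from being treated as measures.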