The Internet is a pervasive medium, available throughout the world and accessible to a significant percentage of humanity. It is increasingly used for business, advertising, news dissemination, collaborative and social activity, and many other purposes. Its content is immediate and ever-changing; however, the verifiability, origin, and completeness of much of what is published are questionable or uncertain, and publicly available content is often fragmented. The Internet is also increasingly used as a source of information, insight, knowledge, and social comment. Although it was initially dominated by text, it is increasingly becoming the province of other media, including images, video, and sound. Most of this content (over 95%) is unstructured and thus inaccessible to traditional means of automated analysis and organization, although there is often at least an inferred structure within, and sometimes among, Internet documents and pages.
The main objective of this project is to develop novel and important capabilities for effectively analyzing and understanding Web pages that contain or link together multimedia content, specifically text, images, video, and audio. The plan is to deliver these capabilities as easy-to-use visual analytics tools and to place them in the hands of investigative analysts. The automated analyses will be closely coupled to user-directed exploratory visual analyses for the purposes of discovery, insight, interpretation, and understanding. To maintain speed and flexibility, the automated analysis will be mostly semantic-free; the user-directed visual analyses will provide semantics and meaning, which can then be fed back into both the automated and the interactive processes. This coupling is necessary because automated analysis cannot be expected to discover meanings and relations, or to provide insights, as humans can. On the other hand, automated analysis is required as a first step, and perhaps also afterward under user direction, because the initial collection, organization, and interrelation of thematic content for millions of multimedia objects are too expensive and time-consuming for human analysts.