1. New Paradigms in Information Retrieval and Web Search (Dr. Gao)

As the Web and IR systems grow rapidly in scale, traditional algorithmic search paradigms are becoming increasingly inadequate, and new paradigms need to be investigated to prepare for next-generation search engines. In this project, we explore one or more atypical search paradigms in the following directions: (a) personalized and mass-collaborative search, which lets users directly edit the ranking and organization of search results, leveraging the power of mass collaboration to improve search performance; (b) two-box search, which uses user-supplied contextual terms for query disambiguation, making it possible to prefer senses other than the dominant one without altering the original query; (c) clustering-based faceted search, which uses clustering techniques to generate topic-coherent groups from diverse search results and to facilitate faceted browsing; (d) scalable keyword search over graphs, which targets a generic scalability solution for any answer model of keyword search over graphs.
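As a rough illustration of direction (b), the sketch below re-ranks a result list using terms from a second input box while leaving the original query untouched. It is a toy under stated assumptions, not the project's method: the function name `rerank_with_context`, the `snippet` field, and the simple term-overlap score are all hypothetical.

```python
def rerank_with_context(results, context_terms):
    """Move results whose snippet mentions more context terms to the front.

    `results` is assumed to be a list of dicts with a "snippet" string,
    already ordered by the search engine for the original query.
    """
    wanted = {term.lower() for term in context_terms}

    def overlap(result):
        words = set(result["snippet"].lower().split())
        return len(wanted & words)

    # sorted() is stable, so results with equal overlap keep the engine's order.
    return sorted(results, key=overlap, reverse=True)


# Query box: "jaguar"; context box: "cat habitat".
results = [
    {"title": "Jaguar Cars", "snippet": "luxury car maker and dealer listings"},
    {"title": "Jaguar (animal)", "snippet": "large cat native to the Americas rainforest habitat"},
    {"title": "Jaguar F-Type review", "snippet": "sports car road test"},
]
reranked = rerank_with_context(results, ["cat", "habitat"])
titles = [r["title"] for r in reranked]
print(titles)
```

The non-dominant sense (the animal) moves to the top, while the remaining results keep their original relative order.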

2. Craigslist-mate: Structurizing Search for Online Communities (Dr. Gao)

Search facilities are an important component of unstructured data management and community information management, which are new trends in database research. Online community data are generally unstructured, and leveraging extracted structure can significantly improve search performance. In this project, we focus on structuring search for Craigslist by applying online, real-time information extraction (IE) to unpublished data, in contrast to offline IE on published data. In other words, our techniques should automatically fill out forms, in a timely and accurate manner, for users who are editing free-text ads. Craigslist-mate will also seek to provide other convenient tools for Craigslist users. One example is automatic matching of selling ads with buying ads. Another is facilitating situational chat that enables real-time interaction between buyers and sellers.
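To make the form-filling idea concrete, here is a minimal sketch that suggests form values from an ad draft as the user types. The field names (`price`, `year`, `phone`), the regular expressions, and the function `suggest_form_fields` are illustrative assumptions only; a real IE component would likely rely on learned extractors rather than hand-written patterns.

```python
import re

# Hypothetical patterns for three hypothetical form fields.
PRICE_RE = re.compile(r"\$\s*(\d[\d,]*)")
YEAR_RE = re.compile(r"\b(?:19|20)\d{2}\b")
PHONE_RE = re.compile(r"\b\d{3}[-.\s]\d{3}[-.\s]\d{4}\b")


def suggest_form_fields(ad_text):
    """Return a dict of suggested form values found in a free-text ad draft."""
    fields = {}
    match = PRICE_RE.search(ad_text)
    if match:
        fields["price"] = int(match.group(1).replace(",", ""))
    match = YEAR_RE.search(ad_text)
    if match:
        fields["year"] = int(match.group(0))
    match = PHONE_RE.search(ad_text)
    if match:
        fields["phone"] = match.group(0)
    return fields


draft = "Selling my 2006 Honda Civic, runs great. $4,500 obo. Call 512-555-0137."
suggested = suggest_form_fields(draft)
print(suggested)
```

Because the function only reads the current draft text, it can be re-run on every edit, which matches the online, pre-publication setting described above.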

3. WEAvE: Web Exploration and Analytic Engine for Scientists (Dr. Ngu)

In this project, we investigate a new paradigm for the retrieval and discovery of Deep Web sources based on a rich service class description. This will lead to the development of an automated utility for integrating and accessing a large number of dynamically changing scientific Deep Web sources. Today's search engines can index and retrieve surface Web sources, which are static HTML pages on the Web. However, they cannot retrieve content in Deep Web sources, which are HTML pages generated dynamically from searchable databases or documents. In the scientific domain, the continuous proliferation of Deep Web sources poses an even greater challenge to their effective use by scientists.

4. Exploration of New Paradigms in Multimedia Information (Image/Video) Retrieval (Dr. Lu)

Image and video retrieval is currently one of the hottest research topics in information retrieval, since it has become one of the most popular services in many search engines, such as Bing, Google and Yahoo!. However, most existing techniques rely mainly on textual information, because text-based search techniques are mature while the visual content of images is difficult or expensive to exploit. Hence, how to represent an image as a "text" document becomes a key and interesting problem. With the popularity of 3D cameras, 3D TVs and 3D displays, more and more consumer 3D images will be generated and shared on the Internet in the near future, and 3D image understanding and retrieval is becoming a new and active topic. This project explores these new paradigms in multimedia information retrieval: 1) build compact and descriptive visual elements for images that function similarly to textual words; 2) explore techniques to improve 2D image understanding by incorporating 3D information, and develop new 3D image retrieval algorithms.
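One common way to treat an image as a "text" document is a bag-of-visual-words representation, in which each local feature descriptor is mapped to its nearest entry in a visual vocabulary. The sketch below shows only that quantization step and is not necessarily the approach this project will take. The two-dimensional descriptors, the three-entry vocabulary, and the function names are made up for illustration; real descriptors have many more dimensions, and the vocabulary is usually learned by clustering.

```python
from collections import Counter


def nearest_word(descriptor, vocabulary):
    """Index of the vocabulary center closest to the descriptor (squared Euclidean)."""
    def sq_dist(center):
        return sum((d - c) ** 2 for d, c in zip(descriptor, center))

    return min(range(len(vocabulary)), key=lambda i: sq_dist(vocabulary[i]))


def image_to_visual_words(descriptors, vocabulary):
    """Turn an image's local descriptors into a list of word-like tokens."""
    return [f"vw{nearest_word(d, vocabulary)}" for d in descriptors]


# Toy vocabulary (cluster centers) and toy descriptors from one image.
vocabulary = [(0.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
descriptors = [(0.1, 0.0), (0.9, 1.1), (0.2, 0.1), (0.1, 0.9)]

words = image_to_visual_words(descriptors, vocabulary)
histogram = Counter(words)  # term frequencies, as for a text document
print(words, dict(histogram))
```

Once an image is a list of tokens, standard text retrieval machinery such as inverted indexes and term weighting can be applied to it.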

5. Spatio-Temporal Data Management as a Foundation for Information Retrieval Systems (Dr. McKenney)

Spatio-temporal databases, sometimes called moving objects databases, store and manage data that have a geographic or spatial component that changes over time. For example, a hurricane has a spatial component describing the geographic region in which high wind speeds occur, and a temporal component reflecting the changing shape and position of the hurricane over time. The representation of a hurricane as a moving object must address the spatial aspects of the hurricane's position in space, its thematic aspects, and its motion through space at infinite temporal resolution. A spatio-temporal database provides the foundational data management, querying, and operational capabilities to support information retrieval tasks over spatio-temporal data.
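As a simplified illustration of querying a moving object at any instant, the sketch below reduces the hurricane to a single moving point (its center) and linearly interpolates between timestamped samples. This is an assumption made for brevity: the project concerns moving regions and its own data models, and the function `position_at` and the sample track are hypothetical.

```python
from bisect import bisect_right


def position_at(samples, t):
    """Interpolate the (x, y) position at time t.

    `samples` is assumed to be a list of (time, x, y) tuples sorted by
    strictly increasing time.
    """
    times = [s[0] for s in samples]
    if not times[0] <= t <= times[-1]:
        raise ValueError("t is outside the observed time interval")
    i = bisect_right(times, t)
    if i == len(samples):
        # t equals the last sample time.
        return samples[-1][1:]
    t0, x0, y0 = samples[i - 1]
    t1, x1, y1 = samples[i]
    f = (t - t0) / (t1 - t0)  # fraction of the way from sample i-1 to sample i
    return (x0 + f * (x1 - x0), y0 + f * (y1 - y0))


# Toy track: (hours since first observation, longitude, latitude).
track = [(0.0, -80.0, 25.0), (6.0, -82.0, 26.0), (12.0, -85.0, 28.0)]
midpoint = position_at(track, 3.0)
print(midpoint)
```

Although only three observations are stored, the object can be queried at any time within the observed interval, which is the sense in which a moving object has a position at every instant.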

The amount of spatio-temporal data available is continuously growing with the development of sensor and imaging technology. For example, data describing hurricanes, rain clouds, temperature zones, migratory patterns of birds, and trajectories of vehicles are publicly available from various sources. However, spatio-temporal databases to manage and analyze these data, especially data that take the form of regions in space, do not exist. The objective of this project is to develop new data models and implementation technologies to integrate spatio-temporal data into existing spatial databases, to populate the resulting databases with freely available real-world data, and to investigate spatio-temporal information retrieval and analysis technologies using the implemented databases. This project represents a paradigm shift in spatio-temporal database work, since our recent research opens a radically new direction in moving object representation and processing. Furthermore, the new data models will be designed with performance as a goal, allowing efficient information retrieval algorithms to be built upon this work.