Research Projects

Current Projects

It is natural for humans to collaborate when dealing with complex problems. In my research I consider this process of collaboration in the context of information seeking. This research is driven by two dissatisfactions: (1) the majority of IR systems today do not facilitate collaboration directly, and (2) the concept of collaboration itself is not well understood. In the past I worked with Diane Kelly and Robert Fu on fusing queries from different users to provide "collaborative" searching. My work at FXPAL during summer 2007 involved mediating collaboration algorithmically. At present, at UNC, with Gary Marchionini and Diane Kelly, I am exploring Collaborative Information Seeking (CIS) from an even broader perspective, trying to understand how people collaborate and what kinds of tools and processes they can benefit from.

In today's Web 2.0 era, people are not only consuming information but also producing it. People are seeking more meaningful and customized information than what can be obtained through keyword-based queries and document retrieval with a search engine. I have been studying the domain of Social Q&A. In particular, I have built crawlers to obtain a large amount of data (questions, answers, comments, user profiles) from Google Answers and Yahoo! Answers. With this data we have analyzed how people seek information on such social sites and how they contribute to other users' information seeking.

People Analytics is a research project that aims to investigate and use various signals generated by people to study their behavior. These signals include, but are not limited to, social media data, implicit and explicit actions performed by people online, and body sensor logs. The analytics to understand and influence people come from two types of signals: social media, and mobile and wearable devices.

Social Media Data Mining and Analytics with SOCRATES

While People Analytics is about collecting and analyzing all sorts of signals generated by people, currently we are focusing on social media signals. To help us and other scholars interested in this kind of work, we are developing a new framework called SOCRATES. SOcial and CRowdsourced AcTivities Extraction System (SOCRATES) is meant to be a robust, highly usable social-computational platform that will transform the manner in which researchers and educators track, capture, visualize, explore, and analyze social media data and annotations.

Behavioral Modeling and Prediction with Wearable and Mobile Devices

Emerging trends in smartphones and wearable devices (smartwatches, fitness trackers) allow for the creation of a rich behavioral profile of users as they engage in search tasks. Such a behavioral (social, mobile, affective, and cognitive) profile goes beyond traditional browser-based context and allows one's personality type and social capital to become pivotal predictors of search behavior and performance. We are working on collecting and analyzing data from wearable and mobile devices, in addition to Web logs and social media streams, to identify various behavioral markers. This could help us understand individuals at a level not possible before. The new knowledge gained could help us predict various behavioral patterns of an individual, including their searching/browsing and consumption behaviors. For instance, we may find that one's social capital is positively correlated with one's ability to cover novel information, but negatively correlated with one's overall search performance. We may find that when an extrovert collaborates with an introvert, the pair has a higher likelihood of learning and unique coverage than extrovert-extrovert or introvert-introvert pairs.

Past Projects

Result Space Support for Personal and Group Information Seeking Over Time (2008-2010)
With Gary Marchionini and Rob Capra at UNC, I was involved in writing an NSF grant proposal, which has been funded, to develop techniques and systems that help people solve information problems that are complex, general, or ongoing, and for cases where information seeking takes place over multiple intervals or in collaboration with other people. The approach is to first study how people seek information and interpret results of searches as they use multiple systems over time and in collaboration, with emphasis given to managing and optionally sharing result sets and items. Second, based on these initial investigations we will build systems that support dynamic search and visualization and can serve both as a personal information manager and a group information manager. Third, we will evaluate these tools in field and laboratory settings. The research is linked to educational theories of active learning and is embedded in university student and research team information needs over multiple months. The results of this research will provide guidance for designers of the next generation of systems that support a full range of information seeking needs. The project will also contribute specific open source tools that people can easily adopt as plug-ins to popular web browsing software. This work will thus have broad impact on Internet-based information activities in schools, homes, offices, and research laboratories. More details can be found here.

Context Mining for Digital Preservation (2006-2010)
Digitizing information has become inexpensive and accessible. Just as advances in the bio-sciences and eco-sciences have made it possible to save endangered species, the digital revolution has made it possible to preserve valuable information of many kinds.

At UNC Chapel Hill, I was a member of the VidArch project, which broadly addressed various issues related to preserving video collections. Under the guidance of my advisor Gary Marchionini, I implemented a prototype system called ContextMiner that searched specialized databases based on various metadata fields and aided a curator in building a digital repository.

Faceted Interfaces and Exploratory Systems (2006-2007)
Are we all "Googlized"? Why do we see search replacing exploration? How do we study systems that employ exploratory interfaces and make them better?

With my advisor Gary Marchionini at UNC Chapel Hill, I am investigating the effectiveness of faceted systems and various issues related to their implementation. I have implemented a faceted interface to the OpenVideo website using Flamenco, which can be seen here, and I am now working on doing the same using the Relational Browser. I have prepared a proposal to conduct a user study in the spring to evaluate various aspects of exploratory systems.

User/System Relevance (2006-2007)
We use relevance in IR all the time, but what exactly is it? This is a tricky and highly subjective notion. I am interested in understanding how much a system's or algorithm's assessment of relevance differs from a user's perception of it.

With my advisor Diane Kelly, I am analyzing data from a user study to investigate this issue. In that study, users marked the terms they found relevant from a list of terms that a system suggested for a query.

Perception about Search Engines (2006-2007)
What are the factors that people like about their favorite search engine? Do they actually recognize what makes a search engine good or bad in a technical sense? Can a user tell the difference between different levels of precision in a ranked list?

My advisor Diane Kelly, fellow grad student Robert Fu, and I recently conducted a user study to investigate these questions. At present we are analyzing the rich data that we collected from this study.

Terminological Feedback (2005-2006)
This work, which investigates the evaluation and construction of terminological feedback systems, was done as an independent project with another grad student, Ramesh Nallapatti, at UMass. It is presented in this technical report.

Suggesting related terms for a given query has been a popular method for improving query performance. However, the effectiveness of these feedback terms has remained quite subjective. While most work has evaluated them with user studies, there have been some efforts to evaluate terminological feedback in other ways. We looked at this problem from the retrieval point of view. We argued that the feedback terms should not only be relevant, but should also cover unique aspects of the original query. Considering these requirements, we proposed some new measures for evaluating performance. We established the reliability of our measures by showing their high correlation with the measures used when judgments for the terms are available. We then analyzed some techniques for term selection and compared their performance using our evaluation measures. Using the characteristics of these techniques, we developed a new method for finding feedback terms. The results on TREC data with 150 queries showed that our proposed method improves the overall performance.

Topic Detection and Tracking (TDT) (2004-2006)
The Topic Detection and Tracking (TDT) research has provided a standard platform for addressing event-based organization of broadcast news and evaluating such systems. The governing motivation behind such research was to provide a core technology for a system that would monitor broadcast news and alert an analyst to new and interesting events happening around the world. The research program of TDT focuses on five tasks: story segmentation, first story detection, cluster detection, tracking, and story link detection. Each is viewed as a component technology whose solution will help address the broader problem of event-based news organization.

I started working on TDT in Fall 2004 with James Allen at UMass. We presented our results at TDT 2004. I continued working on TDT, mainly focusing on Story Link Detection (SLD).

Story Link Detection (SLD) (2004-2006)
The Story Link Detection (SLD) task evaluates a TDT system that detects whether two stories are "linked"; for TDT, two stories are linked if they discuss the same event. Unlike other TDT tasks, link detection was not motivated by a hypothetical application. Rather, detecting when stories are linked is a "kernel" function from which the other TDT tasks can be built.
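
To make the link decision concrete, here is a minimal sketch of one common baseline: represent each story as a bag of term counts, compare the two with cosine similarity, and call them linked when the score reaches a threshold. This is an illustration only and not the document representation studied in this project; the tokenizer, the threshold of 0.2, the function names, and the example stories are all assumptions.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Lowercase the story and count its alphanumeric tokens (assumed tokenizer)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def are_linked(story_a, story_b, threshold=0.2):
    """Decide 'linked' if similarity reaches the threshold (hypothetical value)."""
    return cosine(term_vector(story_a), term_vector(story_b)) >= threshold


# Made-up stories for illustration.
quake_1 = "Earthquake strikes coastal city, rescue teams search for survivors"
quake_2 = "Rescue teams continue search after earthquake in coastal city"
finance = "Central bank raises interest rates to curb inflation"

print(are_linked(quake_1, quake_2))  # the two earthquake stories share most terms
print(are_linked(quake_1, finance))  # no shared terms
```

In practice the threshold would be tuned on judged story pairs, and raw counts would typically be replaced by a weighted representation.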

I started working on SLD under the guidance of W. Bruce Croft and David Jensen at UMass in Spring 2005. We focused on the document representation aspect of SLD. The results were presented in a poster at CIKM 2006. I continued working on SLD during summer 2006 as a visiting research fellow at the National Institute of Informatics (NII) in Tokyo, Japan. The work with Koji Eguchi at NII resulted in a paper that will be presented at ECIR 2007.

High Accuracy Retrieval (2003-2004)
This work was done in my first year at UMass (Fall 2003-Spring 2004) with my advisor W. Bruce Croft. The paper based on this work was presented at SIGIR 2004.

Although information retrieval research has always been concerned with improving the effectiveness of search, in some applications, such as information analysis, a more specific requirement exists for high accuracy retrieval. This means that achieving high precision in the top document ranks is paramount. In this work we aimed at achieving high accuracy in ad-hoc document retrieval by incorporating approaches from question answering (QA). We focused on getting the first relevant result as high as possible in the ranked list and argued that traditional precision and recall are not appropriate measures for evaluating this task. We instead used the mean reciprocal rank (MRR) of the first relevant result. We evaluated three different methods for modifying queries to achieve high accuracy. The experiments done on TREC data provided support for the approach of using MRR and incorporating QA techniques for getting high accuracy in the ad-hoc retrieval task.
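
For readers unfamiliar with the measure, the following is a minimal sketch of how the mean reciprocal rank of the first relevant result is usually computed: each query scores 1/rank of its first relevant document (0 if none is retrieved), and the scores are averaged over queries. The function names and the toy relevance judgments are assumptions for illustration, not data from the experiments.

```python
from typing import Sequence


def reciprocal_rank(relevance: Sequence[bool]) -> float:
    """Return 1/rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, is_relevant in enumerate(relevance, start=1):
        if is_relevant:
            return 1.0 / rank
    return 0.0


def mean_reciprocal_rank(runs: Sequence[Sequence[bool]]) -> float:
    """Average the reciprocal rank over all queries."""
    if not runs:
        return 0.0
    return sum(reciprocal_rank(r) for r in runs) / len(runs)


# Toy judgments: one ranked list per query, True marks a relevant document.
runs = [
    [False, True, False],   # first relevant at rank 2 -> 1/2
    [True, False, False],   # first relevant at rank 1 -> 1
    [False, False, False],  # nothing relevant         -> 0
]
mrr = mean_reciprocal_rank(runs)
print(mrr)  # 0.5
```

Unlike precision and recall, this score depends only on where the first relevant document appears, which matches the goal of placing it as high as possible.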