Tools

I have developed several tools to support and promote research in various domains of information seeking and use. All of the tools are developed under an open-source model and are available for free from GitHub under Creative Commons licenses. See here how these tools have helped a number of individuals and organizations around the world in their projects.

[Research works citing and/or using Coagmento: 236 as per Google Scholar, February 19, 2017]

Coagmento is a browser plugin that installs (1) a toolbar with buttons to bookmark pages, save snippets of text, and open a collaborative text editor, and (2) a sidebar that displays the group's history of saved bookmarks and snippets, as well as a group chat. It also includes an integrated collaborative text editor. In addition, Coagmento provides a Web space, called CSpace, where one can visualize and manipulate collected information, individually or collectively. Coagmento also has iPhone, iPad, and Android apps.

As a research tool, Coagmento has been used in many studies involving individual and collaborative information seeking. It has been used worldwide, by research groups in the UK, Canada, China, the Netherlands, Chile, and the US.

SOcial and CRowdsourced AcTivities Extraction System (SOCRATES) is a robust, highly usable social-computational platform that is meant to transform the manner in which researchers and educators track, capture, visualize, explore, and analyze social media data and annotations.

SOCRATES has three primary components: Collect, Analyze, and Explore. Currently, SOCRATES can collect data from Facebook, Twitter, YouTube, Flickr, Reddit, and The New York Times. SOCRATES provides a number of analytics tools ranging from correlation to sentiment analysis.

[Research works citing and/or using ContextMiner: 70 as per Google Scholar, February 19, 2017]

ContextMiner is a framework to collect, analyze, and present contextual information along with the data. It is based on the idea that, when describing or archiving an object, contextual information helps to make sense of that object or to preserve it better. The ContextMiner website provides tools to collect data, metadata, and contextual information from the Web through automated crawls. At present, ContextMiner supports automated crawls of blogs, YouTube, Flickr, Twitter, and the open Web. It also collects inlink information for YouTube videos from the Web. Additional sources will continue to be added.

Let's say you are interested in what people are posting and saying about the recent outbreak of the H1N1 virus. With ContextMiner, you can set up a campaign, say "H1N1 outbreak". Within this campaign, you can add queries that you want ContextMiner to keep running on sources such as blogs, Twitter, YouTube, and Flickr. In this case, the queries could be 'swine flu', 'H1N1 virus', etc. ContextMiner will run these queries periodically (as per your preference) on the selected sources, then extract and store the data for you in a structured format. You may also want ContextMiner to monitor certain specific webpages. Once you provide these queries and URLs, ContextMiner continues monitoring them without any intervention on your part. Later, you can come back to ContextMiner to see what your campaign has collected. You can filter or search your collection, and even export it for further analysis with other tools of your choice.
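
The campaign described above can be pictured as a small data structure. The following Python sketch is purely illustrative: the Campaign class, its field names, the default interval, and the jobs helper are all hypothetical and do not reflect ContextMiner's actual data model or interface. It only shows how one campaign fans out into the (source, query) pairs that a periodic run would cover.

```python
from dataclasses import dataclass, field


@dataclass
class Campaign:
    """Hypothetical model of a campaign; not ContextMiner's real schema."""
    name: str
    queries: list[str] = field(default_factory=list)
    sources: list[str] = field(default_factory=list)
    urls: list[str] = field(default_factory=list)  # specific pages to monitor
    interval_hours: int = 24                       # assumed rerun preference

    def jobs(self) -> list[tuple[str, str]]:
        """Return every (source, query) pair that one periodic run would execute."""
        return [(s, q) for s in self.sources for q in self.queries]


campaign = Campaign(
    name="H1N1 outbreak",
    queries=["swine flu", "H1N1 virus"],
    sources=["blogs", "Twitter", "YouTube", "Flickr"],
)

for source, query in campaign.jobs():
    print(f"run {query!r} on {source}")
```

With two queries and four sources, a single run in this sketch covers eight (source, query) pairs.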

ContextMiner is available as a Web-based service as well as downloadable software.

[Research works citing and/or using TubeKit: 59 as per Google Scholar, February 19, 2017]

TubeKit is a toolkit for creating YouTube crawlers. It allows one to build one's own crawler that can crawl YouTube based on a set of seed queries and collect up to 17 different attributes. TubeKit assists in all phases of this process, from database creation to providing access to the collected data through browsing and searching interfaces.

In addition to a suite of components for query-based YouTube crawling, TubeKit includes various tools that allow one to grab various forms of data from YouTube without running queries. This data includes YouTube videos, video attributes, and user profiles.

TubeKit is free to use. It is an open-source project and is distributed under a Creative Commons license.

[Research works citing and/or using InfoExtractor: 47 as per Google Scholar, February 19, 2017]

InfoExtractor is a framework to extract relevant information from various sources such as blogs, YouTube, and Twitter.

As a web service, InfoExtractor helps one extract structured information from a supplied URL. For example, one can enter a URL of a YouTube video and InfoExtractor will extract a number of associated attributes (title, tags, view count, comments, etc.) in a format that can be easily exported, analyzed, or plugged into something else.
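
To make "a format that can be easily exported" concrete, here is a minimal Python sketch of what such a structured record might look like. The VideoRecord class, its field names, and the choice of JSON are assumptions made for illustration; the source does not specify InfoExtractor's actual output format, and the values below are placeholders rather than real extracted data.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class VideoRecord:
    """Hypothetical container for attributes extracted from one video URL."""
    url: str
    title: str
    tags: list[str] = field(default_factory=list)
    view_count: int = 0
    comments: list[str] = field(default_factory=list)


def to_json(record: VideoRecord) -> str:
    """Serialize a record so it can be exported or passed to another tool."""
    return json.dumps(asdict(record), indent=2)


# Placeholder values for illustration only; not real extracted data.
record = VideoRecord(
    url="https://www.youtube.com/watch?v=EXAMPLE",
    title="Example video",
    tags=["example", "demo"],
    view_count=123,
    comments=["First comment"],
)
print(to_json(record))
```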

Information Retrieval and Interaction System (IRIS) is a framework for investigating information retrieval and interactive activities, and a toolkit for implementing them.

For developers, the IRIS toolkit is provided as a set of APIs open to the public. It is open source and easily extensible. IRIS provides a set of simple yet modular document operators. These operators can be combined in various ways to create more interesting and advanced functionality. Our intention is for these combinations to produce higher levels of abstraction for information retrieval. Eventually, we hope that even very abstract concepts like "making sense" can be realized.
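
The idea of combining simple operators into higher-level ones can be sketched in a few lines of Python. This is not the IRIS API: the operator names (contains, shortest_first, limit), the pipeline helper, and the representation of documents as plain strings are all hypothetical, chosen only to illustrate composition.

```python
from functools import reduce
from typing import Callable

Docs = list[str]
Operator = Callable[[Docs], Docs]


def contains(term: str) -> Operator:
    """Operator that keeps documents mentioning `term` (case-insensitive)."""
    return lambda docs: [d for d in docs if term.lower() in d.lower()]


def shortest_first() -> Operator:
    """Operator that orders documents by length, shortest first."""
    return lambda docs: sorted(docs, key=len)


def limit(n: int) -> Operator:
    """Operator that keeps only the first `n` documents."""
    return lambda docs: docs[:n]


def pipeline(*ops: Operator) -> Operator:
    """Combine operators left to right into one higher-level operator."""
    return lambda docs: reduce(lambda acc, op: op(acc), ops, docs)


docs = [
    "Information retrieval systems rank documents.",
    "Cooking with garlic.",
    "Interactive retrieval supports sense-making over many documents and sessions.",
]

# A higher-level operator built from three simple ones.
brief_retrieval = pipeline(contains("retrieval"), shortest_first(), limit(1))
print(brief_retrieval(docs))
```

Because a pipeline is itself an operator, it can be nested inside another pipeline, which is one way to reach the higher levels of abstraction mentioned above.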

For end users, IRIS is available as a browser plug-in. As a browser extension, it can help a user make sense of information they encounter on the Web with the click of a button.


DiscoverInfo (Discontinued)

DiscoverInfo is a unique tool to explore a collection of documents (currently, The North Carolina Election of 1898 from UNC Library).

With the DiscoverInfo interface, one can run full-text searches over the collection. DiscoverInfo indexes text, HTML, XML, and PDF documents. The system prepares term clouds based on term occurrences in the collection as a whole as well as across individual documents. These clouds can provide a good overview of the underlying collection. One can browse through the clickable term clouds to discover documents. In addition, the system not only retrieves relevant information from the indexed collection, but can also evaluate how novel a piece of information (here, a document) is with respect to other documents. This can help one discover information that is not only relevant but also novel.
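
As a rough illustration of the two ideas above, the Python sketch below counts terms both across the whole collection and per document (the raw numbers a term cloud could be sized by), and scores novelty as the fraction of a document's distinct terms that appear in no other document. The source does not say how DiscoverInfo computes either quantity, so the function names, the tokenizer, and the novelty measure are all assumptions, and the sample texts are placeholders.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase a text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


def term_counts(docs: list[str]) -> tuple[Counter, Counter]:
    """Return (collection frequency, document frequency) for every term."""
    collection_freq, doc_freq = Counter(), Counter()
    for doc in docs:
        terms = tokenize(doc)
        collection_freq.update(terms)  # every occurrence counts
        doc_freq.update(set(terms))    # each document counts once per term
    return collection_freq, doc_freq


def novelty(doc: str, others: list[str]) -> float:
    """Fraction of the document's distinct terms unseen in the other documents."""
    terms = set(tokenize(doc))
    if not terms:
        return 0.0
    seen = set().union(*(tokenize(o) for o in others))
    return len(terms - seen) / len(terms)


# Placeholder texts, not drawn from the actual collection.
docs = [
    "the election of 1898",
    "the election results",
    "newspaper coverage in raleigh",
]

collection_freq, doc_freq = term_counts(docs)
print(collection_freq.most_common(3))
print(novelty(docs[2], docs[:2]))
```

A real system would likely remove stop words and weight terms before drawing a cloud; this sketch leaves both out to stay short.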

DIToolkit (Discontinued)

DIToolkit is a tool to visualize a collection. It grabs a website, indexes it, and prepares it for browsing, which includes typical IR search, term clouds, and novelty visualization.