Datasets


Collaborative Information Seeking Lab Experiments Dataset

The dataset shared here comes from a set of lab experiments conducted by Chirag Shah and Roberto Gonzalez-Ibanez at Rutgers University in 2010-2011. It contains interaction logs (queries, page visits, relevance judgments, and collected snippets) from a total of 160 participants (students) in 80 teams, with each team working on an exploratory search task for about 30 minutes in a controlled lab setting. Reasonable care has been taken to clean up the data and remove any identifying information about the participants, but anyone using this data is responsible for doing their own due diligence in this regard. Any publications and presentations resulting from the use of this data must cite and/or acknowledge it. It took months to design the study and the tools used for the experiments, and several more months to collect the data. How to cite this data:
Shah, C., & Gonzalez-Ibanez, R. (2017). Collaborative Information Seeking Lab Experiments Dataset. Available from http://infoseeking.org/data.php#cis2010

Papers to cite:

  • Shah, C., and Gonzalez-Ibanez, R. (2011). Evaluating the synergic effect of collaboration in information seeking. Proceedings of ACM SIGIR, pp. 913-922. Beijing, China.
  • Shah, C., and Gonzalez-Ibanez, R. (2012). Spatial context in collaborative information seeking. Journal of Information Science (JIS), 38(4), 333-349.
  • Gonzalez-Ibanez, R., Haseki, M., and Shah, C. (2013). Let's search together, but not too close! An analysis of communication and performance in collaborative information seeking. Information Processing & Management, 49(5), 1165-1179.

Format: MySQL; Size: 881 KB (with documentation); Download link (zip); Documentation


Community Q&A Experiments Dataset

This dataset was collected by Chirag Shah from Yahoo! Answers and assessed for content (answer) quality using Amazon's Mechanical Turk. The data was used in a SIGIR 2010 paper. Reasonable care has been taken to clean up the data and remove any identifying information about the participants, but anyone using this data is responsible for doing their own due diligence in this regard. Any publications and presentations resulting from the use of this data must cite and/or acknowledge it. How to cite this data:
Shah, C. (2017). Community Question-Answering (CQA) Dataset. Available from http://infoseeking.org/data.php#cqa2010

Papers to cite:

  • Shah, C., & Pomerantz, J. (2010). Evaluating and predicting answer quality in community QA. In Proceedings of the 33rd international ACM SIGIR conference on Research and development in information retrieval (pp. 411-418). ACM.

Format: CSV; Size: 239 KB; Download link (zip)