Spring 2017: MI 562: Problem Solving with Data


Class meetings: Wednesday, 3:10-5:50pm, SC&I-337.
Instructor: Dr. Chirag Shah
Phone: (848) 932-8807
Office: Room 302 in SC&I
Office hours: Wednesday 2pm, or by appointment
Instructional assistant: Yiwei Wang
Email: yw498@scarletmail.rutgers.edu
Phone: (848) 932-8763
Office: Room 303 (InfoSeeking Lab) in SC&I
Office hours: By appointment

Course Description
As rapidly growing amounts of data are constantly created and used in industry and research, there is an increasing demand for professionals who can pursue data-driven thinking and decision-making using meaningful insight derived from large and diverse data. This course offers students a practical introduction to the field of "Data Science" and to common methods for quantitative and computational analytics, giving them an overview of the key concepts, skills, and technologies used by data scientists. While the course covers several programming languages and tools, the focus is on solving problems, or "hacking". "Hacking", in this context, refers to being able to find ways to address a problem with anything and everything at one's disposal. Students will be introduced to several real-life problems that involve collecting and analyzing data, and it is in this context of solving problems that an appropriate set of tools and programming languages, including Python, PHP, R, and MySQL, will be taught.

Previous exposure to a technology course or training, including a beginner-level understanding of any programming language, is required. Courses that fulfill this requirement are 17:610:550, 01:198:113, 01:198:211, and 04:547:202. Only one of these courses is required. An exception may be made for a student who can demonstrate technical readiness through some other method, including a technology course taken elsewhere or industry experience. Consult with the course instructor for further information.

Course Materials
There is no textbook for this course. The instructor will provide a companion ebook for free.

Learning Objectives
By the end of the course, students should be able to:
  1. Apply programming languages and tools, such as UNIX commands, MySQL, Python, and R, to collect, clean, process, and analyze data.
  2. Exhibit familiarity with data science methods by learning and experiencing essential algorithms and approaches.
  3. Use statistical methods and visualization techniques to explore and analyze data, and visualize and present the results.
  4. Identify data-driven analytics problems, and design solutions and applications to solve them.

Instructional Methods
This course is about collecting, formatting, and analyzing data to derive important insights about a problem at hand. We will be looking for meaningful patterns in our data in order to find relations among variables of interest, and make predictions. To do so, we will need various programming, database, and statistical tools. At the same time, this course is not meant to provide anyone mastery of those tools. It is expected that the students have at least a basic understanding of them, but appropriate attempts will be made during the course to provide introductions to such tools. To that end the course will be taught as a mixture of lecture, discussion and lab, in an effort to provide an accelerated path to experience. Students will work individually as well as in teams. Occasionally, individuals and teams will swap code, in order to understand the utility of writing clear code and the challenge of working with code written by others. The first half of the course will focus on learning and practicing the platform (UNIX), the database (MySQL), and the coding tools (Python, R). The second half of the course will focus on solving various data science problems using the tools learned in the first part of the course. The course will also teach working with different project development environments such as SVN and Trello.

Labs (assignments), exams, and project
This is a practice-oriented course. That means most of the assignments (homework, in-class, exams) will be based on tackling real-life problems and applying skills learned in the class. The classes and assignments directly relate to the learning objectives (LO). Specifically,
  • LO-1 is associated with units 1-6, which cover UNIX, MySQL, Python, and R.
  • LO-2 corresponds to units 8-13, in which three practical problems are explored, as well as units 1-6.
  • LO-3 refers specifically to the statistical methods and visualization techniques in Python and R packages, which will be introduced in both the general language units (1-6) and the practice units (8-13).
  • LO-4 will be met through the mid-term and the final projects.
  • The weekly assignments will address LO 1-3.
Weekly assignments (45%): The course will have weekly assignments, one given with each class. An assignment will typically be an extension of what is covered during that week. In other words, a typical assignment will ask you to take what was taught and practiced during the class a few steps further. Expect to spend roughly 5-8 hours a week working on an assignment.
Mid-term project (20%): The mid-term exam will take place in the seventh week. It will be an open-book exam, which means you can use any and all resources you like, including online resources. If you have to miss that particular class, contact the instructor to arrange an alternative time and place to take the mid-term exam.
Final project (30%): The final project will be done individually or in a team. After the mid-term exam, you will be given time to find a topic of interest and write a brief proposal. You will then present the project and submit a written report.

Course Assessment
The content of this course is best understood by assimilating the lectures, by readings, by analyzing examples and by practice. The assessment for this course is based on a series of assignments that match the real-world process and on class participation. Assignments are of two types: smaller exercises and a multi-part course project. There will also be exercises that are not graded - in all cases, you will later use the same techniques/methods as a part of your project. Class participation includes participation in discussions; reading descriptions. Course grades are assigned according to the following:
  • A (91-100%): Outstanding and excellent work of the highest standard, mastery of the topic, evidence of clear thinking, good writing, work submitted on time, well organized and polished.
  • B+ (85-90%): Very good work, substantially better than the minimum standard, very good knowledge of the topic; error free.
  • B (80-84%): Good work, better than the minimum standard, good knowledge of the topic.
  • C+ (74-79%): Minimum standard work, adequate knowledge of the topic.
  • C (70-73%): Work barely meeting the minimum standard, barely adequate knowledge of the topic; errors.
  • D (65-69%): Writing not up to standard, disorganized, many errors.
  • F (< 65%): Unacceptable, inadequate work.
  • T Temporary.
The final grade will be weighted based on the following: Assignments: 45%, Mid-term project: 20%, Final project: 30%, Class participation: 5%.
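As an illustration of the weighting above, here is a minimal Python sketch that combines per-component percentages into a final percentage and looks up a letter grade. The function names, the dictionary keys, and the example scores are hypothetical. Treating the lower bound of each grade band as a cutoff on the unrounded score is also an assumption, since the syllabus does not say how fractional scores between bands (for example 90.5%) are handled.

```python
# Component weights as stated in the syllabus; they sum to 100%.
WEIGHTS = {
    "assignments": 0.45,
    "midterm": 0.20,
    "final_project": 0.30,
    "participation": 0.05,
}

# Lower bound of each letter band from the grading scale.
# Assumption: a score counts toward a band once it reaches the lower bound.
LETTER_CUTOFFS = [(91, "A"), (85, "B+"), (80, "B"), (74, "C+"), (70, "C"), (65, "D")]


def weighted_score(scores):
    """Return the final percentage from per-component percentages (0-100)."""
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


def letter_grade(score):
    """Map a final percentage to a letter using the assumed cutoffs."""
    for cutoff, letter in LETTER_CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


# Hypothetical student: 90% on assignments, 80% mid-term,
# 85% final project, full participation credit.
example = {"assignments": 90, "midterm": 80, "final_project": 85, "participation": 100}
final = weighted_score(example)
print(round(final, 2), letter_grade(final))
```

For this hypothetical student the components contribute 40.5 + 16 + 25.5 + 5 = 87 points, which falls in the B+ band.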

Course Policies
Announcements: Students are responsible for all announcements made in class, whether or not they are present when the announcements are made.
Late submissions: Deadlines are your responsibility. Late submissions may be accepted with a penalty. In the case of unforeseen emergencies (e.g. with a doctor's note), or with prior permission from the instructor (obtained before the due date), late submissions will be graded normally. Late submissions will not receive any verbal or written feedback.
Communication: For emails, Rutgers accounts are preferred. Always include your name (especially if emailing from a non-Rutgers account) and the course number (MI 562) in the subject line. If you don't, your email most likely will not be read. This course uses Sakai, primarily for submitting assignments and posting grades. Speaking of communication, please turn off or silence your cellphones and anything else that can spontaneously make noise before entering the class.
Attendance: Students are expected to attend all classes. If you expect to miss one or two classes, please use the University absence reporting website https://sims.rutgers.edu/ssra/ to indicate the date and reason for your absence. An email is automatically sent to me. Note that class participation accounts for 5% of the final grade (see the grading policy above). You are responsible for obtaining any material that might have been distributed in class the day when you were absent.

Academic Integrity
Academic integrity means, among other things:
  • Develop and write all of your own assignments.
  • Show in detail where the materials you use in your papers come from. Create citations whether you are paraphrasing authors or quoting them directly. Be sure always to show source and page number within the assignment and include a bibliography in the back.
  • Do not look over at the exams of others or use electronic equipment such as cell phones or MP3 players during exams.
  • Do not fabricate information or citations in your work.
  • Do not facilitate academic dishonesty for another student by allowing your own work to be submitted by others.
If you are doubtful about any issue related to plagiarism or scholastic dishonesty, please discuss it with the instructor.
The consequences of scholastic dishonesty are very serious. Rutgers' academic integrity policy is at this site. An overview of this policy may be found here. Multimedia presentations about academic integrity may be found here and here.

How to Succeed in this Course
  • Successful students will attend class regularly. If you know you must miss a class, please contact the instructor in advance, either by phone or email. You can obtain assignments or notes from a fellow classmate or from the instructor. In the case of a prolonged absence from class, you should schedule an appointment with the instructor so we can discuss the course material and concepts that you missed.
  • Successful students will pay close attention to the course goals and objectives, because they will help you master the course material. If you have any questions about any of the objectives, please ask the instructor. Questions are encouraged during class for clarification. Remember that you're probably not the only one in the class with the same question. If you have questions about material from previous classes, please email me prior to the next class session, and I'll address your question at the beginning of the class session, prior to any quizzes.
  • Successful students will talk to their classmates about the course material. You will find that they can help you understand many complex issues.
  • Successful students will come to class prepared, having completed the assigned readings for that class. This will help you comprehend the material for that class better. Regular assignments will also be given at the end of each class. Doing these assignments and turning them in on time (typically before the next class) will help you attain the higher-order learning goals for this course.

  1. Access the class material promptly and on time.
  2. Respect yourself, classmates, and the instructor.
  3. Participate in class discussions.
  4. Display preparedness for class through completing reading assignments.
  5. Present content knowledgeably with supported reasoning.

Biographical Information about the Instructor
Chirag Shah is an Associate Professor in both the School of Communication & Information (SC&I) and the Department of Computer Science at Rutgers University. His research interests include information seeking/retrieval in social and collaborative contexts. Shah received a PhD in information science from the University of North Carolina (UNC) at Chapel Hill. He directs the InfoSeeking Lab at Rutgers where he investigates issues related to information seeking, interactive information retrieval, and social media, supported by grants from National Science Foundation (NSF), Institute of Museum and Library Services (IMLS), Google, and Yahoo!
