|As industry and research environments create and use rapidly growing amounts of data, there is increasing demand for individuals and professionals who can practice data-driven thinking and decision-making using meaningful insights derived from large and diverse data. This course offers students a practical introduction to the field of "Data Science" and to common methods for quantitative and computational analytics, giving them an overview of the key concepts, skills, and technologies used by data scientists. While the course covers several programming languages and tools, the focus is on solving problems, or "hacking". "Hacking", in this context, means finding ways to address a problem with anything and everything at one's disposal. Students will be introduced to several real-life problems that involve collecting and analyzing data, and it is in this problem-solving context that an appropriate set of tools and programming languages, including Python, PHP, R, and MySQL, will be taught.|
|Previous exposure to a technology course or training, including a beginner-level understanding of any programming language, is required. Courses that fulfill this requirement are 17:610:550, 01:198:113, 01:198:211, and 04:547:202. Only one of these courses is required. An exception may be made for a student who can demonstrate technical readiness in some other way, such as a technology course taken elsewhere or industry experience. Consult the course instructor for further information.|
|There is no textbook for this course. The instructor will provide a companion ebook for free.|
|By the end of the course, students should be able to:
|This course is about collecting, formatting, and analyzing data to derive important insights about a problem at hand. We will look for meaningful patterns in our data in order to find relations among variables of interest and to make predictions. To do so, we will need various programming, database, and statistical tools. At the same time, this course is not meant to give anyone mastery of those tools. Students are expected to have at least a basic understanding of them, though the course will provide introductions where appropriate. To that end, the course will be taught as a mixture of lecture, discussion, and lab, in an effort to provide an accelerated path to experience. Students will work individually as well as in teams. Occasionally, individuals and teams will swap code in order to understand the value of writing clear code and the challenge of working with code written by others. The first half of the course will focus on learning and practicing the platform (UNIX), the database (MySQL), and the coding tools (Python, R). The second half of the course will focus on solving various data science problems using the tools learned in the first half. The course will also teach working with different project development environments such as SVN and Trello.|
|Labs (assignments), exams, and project|
|This is a practice-oriented course. That means most of the assignments (homework, in-class, exams) will be based on tackling real-life problems and applying skills learned in the class. The classes and assignments directly relate to the learning objectives (LO). Specifically,
|Weekly assignments (50%): The course will have weekly assignments, given with each class. An assignment will typically be an extension of what is covered during that week. In other words, a typical assignment will ask you to take what was taught and practiced during the class a few steps further. Expect to spend roughly 5-8 hours a week on an assignment.|
|Mid-term exam (20%): The mid-term exam will take place during the seventh week. It will be an open-book exam, which means you can use any and all resources you like, including online resources.|
|Final project (30%): The final project will be done individually or in a team. After the mid-term exam, you will be given time to find a topic of interest and write a brief proposal. You will then present the project and submit a written report.|
|The content of this course is best understood by assimilating the lectures, doing the readings, analyzing examples, and practicing. The assessment for this course is based on a series of assignments that match the real-world process and on class participation. Assignments are of two types: smaller exercises and a multi-part course project. There will also be exercises that are not graded; in all cases, you will later use the same techniques and methods as part of your project. Class participation includes taking part in discussions and reading descriptions. Course grades are assigned according to the following:
|Announcements: Students are responsible for all announcements made in class, whether or not they are present when the announcements are made.|
Late submissions: Deadlines are your responsibility. Late submissions may be accepted with a penalty. In the case of unforeseen emergencies (e.g. with a doctor's note), or with prior permission from the instructor (obtained before the due date), late submissions will be graded normally. Late submissions will not receive any verbal or written feedback.
Communication: For email, Rutgers accounts are preferred. Always include your name (especially if emailing from a non-Rutgers account) and always include the course number (MI 562) in the subject line. If you don't, your email most likely will not be read. This course uses Sakai, primarily for submitting assignments and posting grades. Speaking of communication, please turn off or silence your cellphones and anything else that can spontaneously make noise before entering the class.
Attendance: Students are expected to attend all classes. If you expect to miss one or two classes, please use the University absence reporting website https://sims.rutgers.edu/ssra/ to indicate the date and reason for your absence. An email is automatically sent to me. Note that class participation accounts for 5% of the final grade (see the grading policy above). You are responsible for obtaining any material that was distributed in class on the day you were absent.
|Academic integrity means, among other things:
The consequences of scholastic dishonesty are very serious. Rutgers' academic integrity policy is at this site. An overview of this policy may be found here. Multimedia presentations about academic integrity may be found here and here.
|How to Succeed in this Course|
|Biographical Information about the Instructor|
|Chirag Shah is an Associate Professor in both the School of Communication & Information (SC&I) and the Department of Computer Science at Rutgers University. His research interests include information seeking and retrieval in social and collaborative contexts. Shah received a PhD in information science from the University of North Carolina (UNC) at Chapel Hill. He directs the InfoSeeking Lab at Rutgers, where he investigates issues related to information seeking, interactive information retrieval, and social media, supported by grants from the National Science Foundation (NSF), the Institute of Museum and Library Services (IMLS), Google, and Yahoo!|