**Course Description**

Foundations of Data Science combines three perspectives: inferential thinking, computational thinking, and real-world relevance. Given data arising from some real-world phenomenon, how does one analyze that data so as to understand that phenomenon? The course teaches critical concepts and skills in computer programming and statistical inference, in conjunction with hands-on analysis of real-world datasets, including economic data, document collections, geographical data, and social networks. It delves into social issues surrounding data analysis such as privacy and design.

**Prerequisites**

This course does not have any prerequisites beyond high-school algebra. The curriculum and format are designed specifically for students who have not previously taken statistics or computer science. Students with some prior experience in either statistics or computing are welcome to enroll, though some parts of the course will feel slow. Students who have already taken both statistics and computer science courses at Cal should take a more advanced course instead.

**Materials & Resources**

Our primary text is called Computational and Inferential Thinking: The Foundations of Data Science. This text was written for the course by a team of course instructors.

The computing platform for the course is hosted at data8.berkeley.edu. Students are not required to have their own computer, although most choose to use one. To use the platform, navigate to data8.berkeley.edu in the Google Chrome browser from any machine. If you do not have access, notify the course staff.

**Support**

You are not alone in this course; the staff and instructors exist to support you as you learn the material. It's expected that some aspects of the course will take time to master, and the best way to master challenging material is to ask questions. For online questions, use Piazza. We will also hold office hours in the afternoons for in-person discussions.

**Labs**

Weekly labs are a required part of the course and should be submitted during your lab session. If you cannot attend lab, you may complete lab assignments remotely, but attendance is highly recommended. Each person must submit each lab independently, but you are welcome to collaborate with other students in your lab room. If you choose to complete labs remotely, you must work alone. Labs are due at 7pm on the Friday of the week that the lab is released.

**Projects**

Data science is about analyzing real-world datasets, so a series of projects involving real data is a required part of the course. You may work with a single partner on all projects, and we strongly recommend that you find a partner in your lab section. The course staff will help pair up students during the first few labs.

To ensure that students understand and progress through the projects, students will meet with course tutors during the project periods to share their work and receive guidance.

**Homework**

Weekly homework assignments are a required part of the course and should be submitted to gradescope.com. Each student must submit each homework independently, but you are allowed to discuss problems with other students.

Homework is due on Thursday at 5pm, but an early submission bonus point is given to all students who submit their homework by Wednesday at 5pm.

**Exams**

The midterm exam will be held in class from 10:10am to 10:55am on Wednesday, March 16. The final exam will be held from 3pm to 6pm on Tuesday, May 10. To take the course, you must either take both exams or arrange special accommodations with the instructor by January 31.

**Grading**

Grades will be assigned using the following weighted components:

- Lab 10%
- Homework 20%
- Projects 30%
- Midterm 10%
- Final 30%
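
As a rough illustration of how these weights combine, the sketch below computes an overall percentage from per-component scores. Only the weights come from the list above. The function name `overall_score`, the dictionary layout, and the sample scores are hypothetical, and the sketch ignores bonus points and late penalties. It is not the course's actual grading procedure.

```python
# Weights taken from the component list above; everything else is illustrative.
WEIGHTS = {
    "Lab": 0.10,
    "Homework": 0.20,
    "Projects": 0.30,
    "Midterm": 0.10,
    "Final": 0.30,
}


def overall_score(scores):
    """Return the weighted average of per-component scores (each 0-100)."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing components: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


# Hypothetical student scores, as percentages of the points available.
example = {"Lab": 100, "Homework": 90, "Projects": 85, "Midterm": 70, "Final": 80}
print(round(overall_score(example), 1))  # 84.5
```

Because the weights sum to 100%, a student who earns full marks on every component gets an overall score of 100 under this sketch.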

For students who have taken both a statistics and a computer science course already (excluding CS 10) and are taking Data 8 for a letter grade, an additional independent project will be required, and final grades will be influenced strongly by the quality of this project. I expect to give very few high grades to such students; you should be taking a more advanced course.

**Learning Cooperatively**

With the obvious exception of exams, we encourage you to discuss all of the course activities with your friends and classmates as you are working on them. You will definitely learn more in this class if you work with others than if you do not. Ask questions, answer questions, and share ideas liberally.

Since you're working collaboratively, keep your project partner and the course staff informed. If some medical or personal emergency takes you away from the course for an extended period, or if you decide to drop the course for any reason, please don't just disappear silently! You should inform your project partner, so that nobody is depending on you to do something you can't finish.

**Academic Honesty**

Cooperation has a limit, however. You should not share your code or answers directly with other students. Doing so doesn't help them; it just sets them up for trouble on exams. Feel free to discuss the problems with others beforehand, but not the solutions. Please complete your own work and keep it to yourself. The exception to this rule is that you can share everything related to a project with your project partner and turn in one project between you.

Penalties for cheating are severe: they range from a zero grade for the assignment or exam up to dismissal from the University for a second offense.

Rather than copying someone else's work, ask for help. You are not alone in this course! The course staff is here to help you succeed. If you invest the time to learn the material and complete the projects, you won't need to copy any answers.

**Late Submission**

If you want to receive credit for an assignment that you will turn in after the deadline, you must ask your GSI before the deadline. Otherwise, late homework and labs will not be accepted. Late projects will be accepted for half credit. Extensions will only be offered in advance of the deadline and for exceptional circumstances.

**A Parting Thought**

This page shouldn't end with a list of penalties for cheating or lateness, because penalties and grades aren't the purpose of the course. We actually just want you to learn. Please keep that goal in mind throughout the semester. Welcome to Data 8.