Statistics 159/259: Reproducible and Collaborative Statistical Data Science
Textbook
While not strictly a textbook for this course, we will rely heavily on the excellent, openly licensed book Research software engineering in Python. More resources are listed on the course references page.
Administrivia
Prerequisites
Statistics 133, 134, 135
Graduate standing is required to register for Statistics 259.
Willingness to learn programming languages and software tools independently (tools used will include Python; Jupyter Notebooks; the Python “scientific stack” of numpy, scipy, matplotlib, pandas, and scikit; git; GitHub; GitHub Actions; Docker; LaTeX, Markdown, and pandoc)
Willingness to learn some statistical methodology by reading on one’s own (materials and links will be provided, but not all topics required to do the homework will be covered in lecture).
Format and assessment
3 hours of lecture and 2 hours of lab per week
lectures will focus on theory, philosophy of science, foundations of statistics, scientific applications, software engineering, code reviews and group discussion.
lab will focus on computing, software tools, workflow, and collaboration; a short and easy quiz on that lab’s material will occur each session. You must attend the lab section you are officially enrolled in. If this presents a significant burden, please reach out to your lab TA.
For each assigned reading, you will submit a brief, one-paragraph response with your thoughts and impressions, due on Fridays at 11:59PM. The paragraph should briefly explore something that interested you (e.g., you may wish to focus on one aspect of the paper in more depth, or to discuss something in the reading that you disagree with). During lecture, we will draw upon your responses for group discussion.
Office hours
Perez: Monday, 10-11AM, 419 Evans Hall. I will normally also keep an open Zoom session for those needing to join remotely for Covid or other reasons.
Graduate Student Instructors
Labs: Friday 9AM-11AM & 1PM-3PM (340 Evans Hall).
Office hours: Wednesdays 1PM-2PM & Thursdays 10AM-11AM (428 Evans Hall).
Labs: Friday 3PM-5PM (342 Evans Hall).
Office Hours: Tuesdays 1PM-2PM (428 Evans Hall).
Communication
Please use the course Ed for questions about course material and logistics. For personal matters (illness, accommodations, etc.) that should remain private, please make a private Ed post that only the instructor and GSIs will see. You may of course email one of us privately if you need to, but in general we can handle class communications more efficiently if they stay on Ed.
During the work week, we expect to be able to reply to Ed posts and email within 24 hours. On weekends, we might need longer.
Grading
The course is not graded on a curve. It is possible for every student to earn an A. We encourage you to focus on mastering the material, not on your grade.
Homework (~6-7): 20% (10% assignment + 10% peer code review).
Reading Assignments (weekly, on average): 10%
Lab Quizzes: 10%
Project 1: 10%
Project 2: 20%
Final Project: 30%
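The weights above add up to 100%, so the course score is a weighted average of the component scores. As a minimal Python sketch, assuming every component is reported as a percentage (0-100) and using hypothetical component names (this is not the course's actual grading script, and letter-grade cutoffs are not modeled):

```python
# Hypothetical component names; weights taken from the grading breakdown above.
WEIGHTS = {
    "homework_assignment": 0.10,
    "homework_code_review": 0.10,
    "reading_assignments": 0.10,
    "lab_quizzes": 0.10,
    "project_1": 0.10,
    "project_2": 0.20,
    "final_project": 0.30,
}


def course_score(scores: dict[str, float]) -> float:
    """Weighted course percentage, assuming each component score is on a 0-100 scale."""
    missing = WEIGHTS.keys() - scores.keys()
    if missing:
        raise ValueError(f"missing component scores: {sorted(missing)}")
    return sum(weight * scores[name] for name, weight in WEIGHTS.items())


# Sanity check: the weights should sum to 1 (up to floating-point rounding).
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9
```

For example, scoring 100 on every component except a 70 on the final project gives 0.70 × 100 + 0.30 × 70 = 91. Since there is no curve, this number depends only on your own scores.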
In case of a medical exception, submit a private note on Ed to the instructors with medical documentation showing that you are unable to complete the assignment. We will grant an extra 48 hours to submit the reading assignment or homework, unless more time is required.
Homework
Homework deadlines will be posted immediately after the homework is released; deadlines will usually be on a Thursday at 11:59pm.
We will accept late homework up to 24 hours after the deadline. However, in those cases a 25% penalty will be applied to the assignment's final score.
All homework assignments must be completed individually.
Submitting assignments: Submit written assignments by making a pull request to your private repository within the Berkeley GitHub organization for the class, using the GitHub Classroom (you will practice all this, don’t worry).
Code review: Another component of each homework will be a GitHub-based peer code review. After turning in your homework, you will review another student’s code and give feedback. These will typically be due one week after that homework’s deadline.
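The late-homework rule can be expressed as a small function. This is a sketch only, assuming the 25% penalty multiplies the assignment's score (rather than subtracting 25 points) and that a submission exactly 24 hours after the deadline is still accepted as late; the function and parameter names are hypothetical, and medical extensions are not modeled.

```python
from datetime import datetime, timedelta
from typing import Optional

LATE_WINDOW = timedelta(hours=24)  # late homework accepted up to 24 hours after the deadline
LATE_PENALTY = 0.25                # assumed multiplicative: a late score keeps 75% of its value


def homework_score(raw_score: float, deadline: datetime,
                   submitted_at: datetime) -> Optional[float]:
    """Score after applying the late policy, or None if the submission is not accepted."""
    if submitted_at <= deadline:
        return raw_score
    if submitted_at <= deadline + LATE_WINDOW:
        return raw_score * (1 - LATE_PENALTY)
    return None  # more than 24 hours late


# Hypothetical Thursday 11:59pm deadline.
deadline = datetime(2024, 9, 12, 23, 59)
print(homework_score(80, deadline, deadline + timedelta(hours=3)))   # 60.0
print(homework_score(80, deadline, deadline + timedelta(hours=30)))  # None
```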
Reading Assignments
These will be posted on the course website under Assigned Readings. For each paper/reading in the weekly list, you should submit a paragraph highlighting your ideas and thoughts. You will submit your reading assignments in bCourses.
Reading assignments will be due Fridays at 11:59pm. Late reading assignments will not be accepted unless there is a medical exception. In that case, you will need to submit a private note on Ed to the instructors with medical documentation showing that you are unable to complete the assignment.
You may drop two readings without justification. Note that this applies to INDIVIDUAL readings. For example, if the weekly reading consists of 4 papers, you can drop at most two of them. If you drop two readings in one week, you cannot drop any others without penalty.
Each reading assignment is worth 1 point. Your points for a given week are the sum of the points for all readings that week. This means that the maximum credit you can obtain per week depends on the number of readings that week.
Lab Quizzes
In each lab, there will be a very short online quiz on that day’s lab material.
The quiz question(s) will be based on that day’s lab session and are meant to be easily completed if you attended.
You will be allowed to drop 2 of your lab quizzes for the semester, no questions asked. This gives you some flexibility if you need to miss lab due to illness, travel, etc. If you need to miss multiple labs due to extenuating circumstances, please contact your lab TA, who may grant exceptions or extra drops at their discretion.
Given course roster uncertainties in the first few weeks of any course, the first two lab quizzes will not be graded. Quizzes from Week 3 and onward will count towards the grade.
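Combining the quiz rules above (the first two quizzes are ungraded, and two graded quizzes may be dropped), a lab-quiz percentage could be computed as in the following sketch. It assumes each quiz is scored between 0 and 1, that a missed quiz counts as 0, that the two lowest graded quizzes are the ones dropped, and that all quizzes are weighted equally; the function is hypothetical, and extra drops granted by a lab TA would correspond to a larger `n_drops`.

```python
from typing import Optional, Sequence


def lab_quiz_percentage(scores: Sequence[float], n_ungraded: int = 2,
                        n_drops: int = 2) -> Optional[float]:
    """Average lab-quiz score as a percentage.

    scores: one score in [0, 1] per lab, in chronological order (0 for a missed lab).
    """
    graded = list(scores[n_ungraded:])  # skip the ungraded quizzes from the first weeks
    kept = sorted(graded)[n_drops:]     # discard the n_drops lowest graded scores
    if not kept:
        return None                     # nothing left to average
    return 100 * sum(kept) / len(kept)


# Two ungraded quizzes, then five graded ones; the 0 and one 0.5 are dropped.
print(lab_quiz_percentage([0, 0, 0, 0.5, 1, 0.5, 1]))  # about 83.33
```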
Projects (Project 1, Project 2, Final)
There will be a few projects throughout the semester, culminating in a final project where you will combine all of the tools and techniques you’ve been learning this semester into a single body of work. All projects will be completed in groups, which will be assigned at a later point.
Code of conduct; attribution of work
The high academic standard at the University of California, Berkeley, is reflected in each degree awarded. Every student is expected to maintain this high standard by ensuring that all academic work reflects unique ideas or properly attributes the ideas to the original sources.
These are some basic expectations of students with regard to academic integrity: any work submitted should be your own individual thoughts, and should not have been submitted for credit in another course unless you have prior written permission from this instructor to re-use it in this course.
All assignments must use “proper attribution,” meaning that you have identified the original source and extent of words or ideas that you reproduce or use in your assignment. This includes drafts and homework assignments! If you are unclear about expectations, ask your instructor.
Do not collaborate or work with other students on assignments or projects unless the instructor gives you permission or instruction to do so.
Disability accommodations
If you need an accommodation for a disability, if you have information you wish to share with the instructor about a medical emergency, or if you need special arrangements in case the building needs to be evacuated, please inform the instructor as soon as possible.
If you are not currently listed with DSP (the Disabled Students’ Program) and believe you might benefit from their support, please apply online at https://