Instructor

Amelia McNamara (amelia.mcnamara@stthomas.edu, OSS 407)

Student hours: Mondays from 1:00 to 2:00, Wednesdays from 11:00 to noon, and by appointment, in OSS 407.

Description

This course introduces students to an advanced statistical software package so that they can apply statistical methods effectively. Students create data sets from raw data files, create variables within a data set, append and modify data sets, create subsets, apply a wide range of statistical procedures, create graphs, and produce reports. The course is based on several leading advanced statistical software packages, chosen from semester to semester to match the needs of the community.

Prerequisite: STAT 220 or STAT 314

Textbooks

There are two required texts for this course:

• R for Data Science by Hadley Wickham and Garrett Grolemund
• Advanced R by Hadley Wickham
  • If you buy the physical book, this will be the first edition of the book, which is also available online
  • There is a second edition of the book that exists only online, but is perhaps better for reading electronically

These books are available for purchase in the university bookstore and from Amazon and other online retailers, and they are freely available on the websites linked above. You don’t need a physical copy of either book, but many students prefer to have a hard copy.

We will supplement these two texts with additional readings from other materials, including (but not limited to):

• Happy Git with R by Jenny Bryan, the STAT 545 TAs, and Jim Hester (online only)
• R Packages by Hadley Wickham (paper book available)

Computing

This course is structured to be as authentic as possible with regard to statistical computing. This means we will be coding in the R language, creating literate programming files (mixing text and code) in the R Markdown format, versioning code with git, sharing and collaborating on GitHub, and communicating via Slack. All of these technologies have a learning curve, but I believe they are useful both pedagogically and practically (in terms of getting a job).

Inclusive classroom

Because data is collected by and about humans, it necessarily encodes aspects of our proclivities and biases. As a result, this course will likely touch upon difficult topics related to race, gender, inequality, class, and oppression. We each come into this class with different perspectives that can be shared to enhance our understanding of these issues. I ask that you enter these conversations with respect, curiosity, and cultural humility. You should be open to alternative perspectives and willing to revise beliefs that are based on misinformation. As a general rule, your ideas and experiences can always be shared during these conversations, but please refrain from dismissing the experiences of others. Personal attacks of any kind will not be tolerated.

Please plan to treat me and your classmates with respect. This includes things like arriving at class on time, coming in quietly if you are late, and focusing on the task at hand, as well as using your classmates’ preferred name and pronouns.

Disability Statement

Academic accommodations will be provided for qualified students with documented disabilities including but not limited to mental health diagnoses, learning disabilities, Attention Deficit Disorder, Autism, chronic medical conditions, visual, mobility, and hearing disabilities. Students are invited to contact the Disability Resources office about accommodations early in the semester. Appointments can be made by calling 651-962-6315 or in person in Murray Herrick, room 110. For further information, you can locate the Disability Resources office on the web at http://www.stthomas.edu/enhancementprog/.

Collaboration

Much of this course will operate on a collaborative basis, and you are expected and encouraged to work together with a partner or in small groups to study, complete homework assignments, and work on projects. However, every word that you write must be your own. Copying and pasting sentences, paragraphs, or blocks of code from another student or the internet is not acceptable and will receive no credit. No interaction with anyone but the instructor is allowed on any exams or quizzes. All students are bound by the university Rules of Conduct. Cases of dishonesty, plagiarism, etc., will be reported to the dean.

Resources

Course website and other technology

The course website will be regularly updated with lecture handouts, project information, assignments, and other course resources. Course discussion will take place on Slack. Labs will be completed using R and git/GitHub.

Assignments

1. Assignments [40%]: Assignments will take the form of short coding exercises.

2. Labs [30%]: Labs will primarily be completed in the open-source programming language R. Students are encouraged to work in pairs during these labs.

3. Final Project [20%]: The final project will see you applying what you have learned to create an R package for a task of your choice. There will be several (graded) milestones along the way to help you prepare, and we will hold a demonstration session on the final day of class.

4. Class participation [10%]: Classes meet Tuesday and Thursday in OSS 415. Your participation is an important part of the learning process. Active participation in class will make up the remainder of your grade.

Tentative Schedule

The following is a brief outline of the course. Please refer to the complete day-to-day schedule for more detailed information.

Week | Reading | Topics
1 | r4ds 1-3 | Introduction to the course, data visualization
2 | r4ds 4-5, 27 | Workflow basics, data wrangling
3 | r4ds 6 | More data wrangling, scripts, intro to git and GitHub
4 | r4ds 12-13 | Tidy data and joins
5 | r4ds 14-16 | Special data types
6 | r4ds 17-21 | Intro to programming in R
7 | advR 2-4 | Data structures in R, subsetting, vocab
March 25-April 1 | | Midterm Break
9 | advR 5-6, Rpkg 3 | Style, functions, simulation
10 | advR 7 | OO field guide, spatial data
11 | advR 8-9, Rpkg 2, 4 | Environments, R package structure
12 | happygitwithR 22, 28, shiny basics | Shiny apps, collaboration
13 | R packages chapter Vignettes, Packaging data analysis reproducibly using R (and friends), NSE in dplyr | Vignettes, Non-standard evaluation
14 | advR non-standard evaluation, tidy eval | Non-standard evaluation
15 | | Project Work Time and Project Presentations
May 24 | | All work due