Welcome to our course on Statistical Programming in R!

This free, online course is designed to introduce newcomers to the fundamentals of R programming and statistical analysis using examples from a variety of contexts, disciplines, and data sources. This course was originally prepared for Vanderbilt University’s NSF Site on Accountability, Behavior, & Conflict in Democratic Politics.

Created by Jennifer Barnes and Alexander Tripp.


Lessons

Click on any lesson below to begin. Each lesson is a presentation that you can navigate slide-by-slide with code that you can try out yourself. At the end of each lesson is a link back to this home page and a link to the next lesson.


Lesson 1: Introduction to Data Processing with R

Get acquainted with R and RStudio, learn how to load data, and practice manipulating variables

Topics: R basics, working directories, global environment, data structures, variable creation and manipulation

Lesson 2: Introduction to Descriptive Statistics

Learn about the basic summary statistics, as well as how to calculate and interpret them

Topics: Mean, median, mode, variance, standard deviation, frequency tables

Lesson 3: Introduction to Descriptive Statistics II

Walk through common statistical distributions and a couple of foundational statistical theorems

Topics: Normal, t, uniform, log, and exponential distributions, Law of Large Numbers, Central Limit Theorem

Lesson 4: Introduction to Data Visualization

Gain the intuitions underlying good data visualizations and put that knowledge into practice using ggplot2

Topics: Bar plots, histograms, scatterplots, line graphs, visualization best practices, ggplot2

Lesson 5: Introduction to Correlational Analysis

Review the basics of hypothesis testing using correlational analyses and linear regression

Topics: Correlations, p-values, t-tests, linear regression

Lesson 6: Introduction to Correlational Analysis II

Practice running regression analyses with advice on when and how to 1) include control variables and interactions, 2) validate assumptions behind OLS regressions, and 3) prepare regression output for broader circulation

Topics: Control variables, interaction terms, coefficient plots, OLS assumptions


How to Navigate

  • Use arrow keys to move between slides
  • Press f for fullscreen mode
  • Press o for overview mode to see all slides

Dataset

Throughout this course, we use the IMDb Top 1000 Movies dataset (as of 2020) to motivate our examples. This dataset contains information about the highest-rated movies on IMDb from 1920 to 2020, including their audience and critic ratings, revenue, runtime, directors, and genre.


Prerequisites

To follow along with the code on your own computer, you’ll need to install R and RStudio, then install the following R packages from the R console:

install.packages(c("tidyverse", "lubridate", "stargazer", "sjPlot"))
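If you are not sure whether the installation worked, a quick check like the one below may help. This is only a minimal sketch: the variable names (pkgs, installed, missing_pkgs) are our own illustrative choices, and the check simply reports which of the four packages R can find without installing anything.

```r
# The four packages named in the install.packages() call above
pkgs <- c("tidyverse", "lubridate", "stargazer", "sjPlot")

# requireNamespace() returns TRUE when a package is installed and loadable;
# it does not attach the package to your session
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Collect the names of any packages R could not find
missing_pkgs <- pkgs[!installed]

if (length(missing_pkgs) > 0) {
  message("Still to install: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All course packages are installed.")
}
```

Once every package is reported as installed, you can attach one in a session with library(), for example library(tidyverse).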