Welcome to our course on Statistical Programming in R!
This free online course introduces newcomers to the fundamentals of R programming and statistical analysis, using examples from a variety of contexts, disciplines, and data sources. It was originally prepared for Vanderbilt University’s NSF Site on Accountability, Behavior, & Conflict in Democratic Politics.
Created by Jennifer Barnes and Alexander Tripp.
Lessons
Click on any lesson below to begin. Each lesson is a presentation that you can navigate slide-by-slide with code that you can try out yourself. At the end of each lesson is a link back to this home page and a link to the next lesson.
Lesson 1: Introduction to Data Processing with R
Get acquainted with R and RStudio, learn how to load data, and practice manipulating variables
Topics: R basics, working directories, global environment, data structures, variable creation and manipulation
Lesson 2: Introduction to Descriptive Statistics
Learn about the basic summary statistics, as well as how to calculate and interpret them
Topics: Mean, median, mode, variance, standard deviation, frequency tables
Lesson 3: Introduction to Descriptive Statistics II
Walk through common statistical distributions and a couple of foundational statistical theorems
Topics: Normal, t, uniform, log, and exponential distributions, Law of Large Numbers, Central Limit Theorem
Lesson 4: Introduction to Data Visualization
Build the intuition behind good data visualizations and put that knowledge into practice using ggplot2
Topics: Bar plots, histograms, scatterplots, line graphs, visualization best practices, ggplot2
Lesson 5: Introduction to Correlational Analysis
Get an overview of the basics of hypothesis testing using correlational analyses and linear regression
Topics: Correlations, p-values, t-tests, linear regression
Lesson 6: Introduction to Correlational Analysis II
Practice running regression analyses with advice on when and how to 1) include control variables and interactions, 2) validate assumptions behind OLS regressions, and 3) prepare regression output for broader circulation
Topics: Control variables, interaction terms, coefficient plots, OLS assumptions
Dataset
Throughout this course, we use the IMDb Top 1000 Movies dataset (as of 2020) to motivate our examples. This dataset contains information about the highest-rated movies on IMDb from 1920 to 2020, including their audience and critic ratings, revenue, runtime, directors, and genre.
Prerequisites
To follow along with the code on your own computer, you’ll need to download R, RStudio, and the following R packages:
install.packages(c("tidyverse", "lubridate", "stargazer", "sjPlot"))
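Once the install finishes, it can help to confirm that all four packages are actually available before starting Lesson 1. The snippet below is a minimal sketch of one way to do that using base R's requireNamespace(); the variable names pkgs, installed, and missing_pkgs are illustrative and not part of the course materials.

```r
# Packages named in the install.packages() call above
pkgs <- c("tidyverse", "lubridate", "stargazer", "sjPlot")

# requireNamespace() returns TRUE or FALSE without attaching the package
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Report anything that still needs to be installed
missing_pkgs <- pkgs[!installed]
if (length(missing_pkgs) > 0) {
  message("Still missing: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All course packages are installed.")
}
```

If any packages are reported as missing, rerunning install.packages() for just those names is usually enough.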