## File: Lab-7-Linear-Regression-2-Skeleton.R
## Committed by: Matthew K Defenderfer
## Overall Progression for Lab 7:
# 1. Load Packages and Data
# 2. Review our Simple Model
# 3. Checking Assumptions
#   a. Linearity
#   b. Homoscedasticity of residuals
#   c. Normality of residuals
#   d. Independence
#   e. Multicollinearity (for multiple linear model)

## Loading Packages and Data
## For this lab, we will be using the `tidyverse`, `broom`, `car`, and
## `pastecs` packages, as well as the same dataset as last lab:

library(pastecs)
library(broom)
library(car)
library(tidyverse)

theme_set(theme_bw())

load("~/Google Drive/Grad School/GR 770 Statistics/R Labs/Data/framingham_n50.RData")

# Reviewing the Simple Model
# Let's review our simple model of systolic blood pressure (sysBP) as a function
# of age alone. We will recalculate this using `lm`

sysBP.m <- lm()
summary()
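
## Example (a sketch, not the lab answer): the same two steps on R's built-in
## `mtcars` data, assuming a model of mpg as a function of weight (wt). The
## name `ex.m` is made up for this illustration.
ex.m <- lm(mpg ~ wt, data = mtcars)  # formula is outcome ~ predictor
summary(ex.m)                        # coefficients, R-squared, overall F-test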

# Checking Assumptions

## Linearity 
## You can assess linearity using a scatterplot. We can recreate that
## plot here, again adding a regression line using `geom_smooth`
ggplot(fhs, aes()) +
  geom_point() +
  geom_smooth()
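
## Example (a sketch on built-in `mtcars`, not the lab data): a scatterplot
## with a straight regression line. Passing `method = "lm"` is what makes
## geom_smooth draw a linear fit instead of its default smoother. The name
## `ex.scatter` is made up for this illustration.
library(ggplot2)  # already attached by tidyverse above; repeated so this runs alone
ex.scatter <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  geom_smooth(method = "lm", formula = y ~ x)
ex.scatter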


## Homoscedasticity
## We will be assessing homoscedasticity of the residuals. Residuals are
## awkward to extract from an lm object directly; the `augment` function from
## `broom` returns them, along with other per-observation information, as a
## data frame we can work with

## Let's augment sysBP.m and see what we get
sysBP.a <- augment()
head(sysBP.a, 10)

## Let's make sure our residuals average out to 0 first of all
mean()
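
## Example (a sketch on built-in `mtcars`, not the lab data): augment returns
## one row per observation, with the fitted values in `.fitted` and the
## residuals in `.resid`. `ex.m` and `ex.a` are made-up names.
ex.m <- lm(mpg ~ wt, data = mtcars)
ex.a <- broom::augment(ex.m)
head(ex.a, 10)
## For a least-squares model with an intercept, the residuals average to
## zero, up to floating-point rounding error
mean(ex.a$.resid)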

## For assessing homoscedasticity, we will make scatterplots of the residuals
## versus the fitted values. We will add a horizontal line at zero for comparison
ggplot(sysBP.a, aes()) +
  geom_point() +
  geom_hline()

## We want to see no pattern in the residuals: they should be scattered
## randomly around the horizontal line, with roughly constant spread across
## the range of fitted values
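
## Example (a sketch on built-in `mtcars`, not the lab data): residuals
## against fitted values with a reference line at zero. The dashed line type
## and the names `ex.a` and `ex.resid` are choices made for this illustration.
library(ggplot2)  # already attached by tidyverse above; repeated so this runs alone
ex.a <- broom::augment(lm(mpg ~ wt, data = mtcars))
ex.resid <- ggplot(ex.a, aes(x = .fitted, y = .resid)) +
  geom_point() +
  geom_hline(yintercept = 0, linetype = "dashed")
ex.resid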


## Normality of Residuals
## We will test normality in the same way as before, using a histogram, a Q-Q
## plot, a boxplot, and statistical tests

### Histogram
### Let's create a histogram of the residuals with 15 bins, overlaid with a
### normal density curve
ggplot(sysBP.a,aes()) +
  geom_histogram(aes(), bins = , fill = "orange", color = "black") +
  stat_function(fun = , args = list(), color = "red")
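
### Example (a sketch on built-in `mtcars`, not the lab data): for the normal
### curve to line up with the bars, the histogram has to be on the density
### scale, hence `y = after_stat(density)`. The curve here assumes the mean
### and sd of the residuals themselves. `ex.a` and `ex.hist` are made-up names.
library(ggplot2)  # already attached by tidyverse above; repeated so this runs alone
ex.a <- broom::augment(lm(mpg ~ wt, data = mtcars))
ex.hist <- ggplot(ex.a, aes(x = .resid)) +
  geom_histogram(aes(y = after_stat(density)), bins = 15,
                 fill = "orange", color = "black") +
  stat_function(fun = dnorm,
                args = list(mean = mean(ex.a$.resid), sd = sd(ex.a$.resid)),
                color = "red")
ex.hist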

### Q-Q Plot
ggplot(sysBP.a,aes()) +
  geom_qq() +
  geom_qq_line()
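
### Example (a sketch on built-in `mtcars`, not the lab data): the Q-Q geoms
### take the variable through the `sample` aesthetic rather than `x` or `y`.
### `ex.a` and `ex.qq` are made-up names.
library(ggplot2)  # already attached by tidyverse above; repeated so this runs alone
ex.a <- broom::augment(lm(mpg ~ wt, data = mtcars))
ex.qq <- ggplot(ex.a, aes(sample = .resid)) +
  geom_qq() +       # sample quantiles against theoretical normal quantiles
  geom_qq_line()    # reference line through the first and third quartiles
ex.qq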

### Boxplot
ggplot(sysBP.a,aes()) +
  geom_boxplot()

### Statistical Tests
### Let's calculate skew and kurtosis and perform a Shapiro-Wilk (SW) test on
### the residuals
stat.desc()
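
### Example (a sketch on built-in `mtcars`, not the lab data): with
### `norm = TRUE`, stat.desc adds skewness, kurtosis, and the Shapiro-Wilk
### statistic (normtest.W) and p-value (normtest.p). `basic = FALSE` is a
### choice made here to shorten the output. `ex.a` and `ex.d` are made-up names.
ex.a <- broom::augment(lm(mpg ~ wt, data = mtcars))
ex.d <- pastecs::stat.desc(ex.a$.resid, basic = FALSE, norm = TRUE)
round(ex.d, 3)
### skew.2SE and kurt.2SE are the statistic divided by twice its standard
### error; an absolute value above 1 suggests a significant departure at
### roughly p < 0.05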

## Independence
## We do not cover this assumption in this class

## Multicollinearity
## This only applies for multiple linear regression. We need to make sure that
## our predictor variables are not highly correlated with each other. We can
## test for multicollinearity using the `vif` function from the `car` package.
## As a common rule of thumb, a predictor with VIF > 10 is strongly collinear
## with the other predictors and is a candidate for removal from the model

## Let's look at our 2 variable model of sysBP predicted by main effects of age
## and glucose
sysBP.m.2 <- lm()
vif()
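
## Example (a sketch on built-in `mtcars`, not the lab data): VIFs for a model
## with two main effects, assuming mpg predicted by weight (wt) and horsepower
## (hp). `ex.m.2` and `ex.vif` are made-up names.
ex.m.2 <- lm(mpg ~ wt + hp, data = mtcars)
ex.vif <- car::vif(ex.m.2)
ex.vif
## With exactly two predictors, both VIFs equal 1 / (1 - r^2), where r is the
## correlation between the two predictors
1 / (1 - cor(mtcars$wt, mtcars$hp)^2)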