# Authored by Matthew K Defenderfer (commit 4df26284)
# Library Setup
# 1. `tidyverse`: contains `dplyr` and `ggplot2`
# 2. `broom`: contains the `tidy` function to convert a summary table to a data frame
# 3. `knitr`: contains the `kable` function to export tidied summary tables to text
library(knitr)
library(broom)
library(tidyverse)
theme_set(theme_bw())
# Data Setup
# We will be using a modified form of the Framingham Heart Study dataset stored in "framingham_n50.RData"
# (this call assumes the file sits in the current working directory)
load("framingham_n50.RData")
# Simple Linear Regression
# The command for any basic linear regression is `lm`. `lm` has the following form:
# `lm(formula, data, weights, ...)`
# formula: a formula expressing the dependent variable as a function of the
# predictor variables.
# data: the data frame the variables can be found in
# weights: You can assign each data point a weight here that
# will determine how much it influences the final model. NULL by default
## For example, let's go ahead and model systolic blood pressure as a function of age and store in sysBP.m
sysBP.m <- lm()
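## For reference, here is a sketch of the same kind of call on a different dataset.
## Assumptions: it uses R's built-in `mtcars` data rather than the Framingham
## data, and the object name `ex.m` is purely illustrative.
## The formula reads "mpg as a function of wt": the dependent variable goes on
## the left of `~` and the predictor on the right.
ex.m <- lm(mpg ~ wt, data = mtcars)
## coef() returns the fitted intercept and slope
coef(ex.m)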
## Let's view the output of the linear model using `summary`
summary(sysBP.m)
## Reporting a linear regression table
## The `tidy` function from the `broom` package does a nice job extracting the
## coefficient statistics from a summary table. The only input it needs is the
## model.
## Let's use tidy on sysBP.m and store in sysBP.tidy
sysBP.tidy <- tidy(sysBP.m)
sysBP.tidy
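## As a hedged illustration, this is what `tidy` does with a model fit to the
## built-in `mtcars` data (an assumption; the lab itself uses the Framingham
## data). The names `ex.tidy.m` and `ex.tidy` are illustrative only.
library(broom)
ex.tidy.m <- lm(mpg ~ wt, data = mtcars)
ex.tidy <- tidy(ex.tidy.m)
## The result has one row per coefficient, with the columns
## term, estimate, std.error, statistic and p.value
ex.tidy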
## Now you can see the coefficient outputs in a neat table. If you want to
## create a formatted text table to paste to a form, you can also use the
## `kable` function on the tidy table.
## Let's use `kable` on the sysBP.tidy dataframe we just made
kable(sysBP.tidy)
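## The table could use a little more tidiness, so let's round the output values and change the column names a bit
## (the number of digits and the column labels used here are one possible choice)
kable(sysBP.tidy, digits = 3, col.names = c("Term", "Estimate", "Std. Error", "t value", "p value"))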
# A table like this is worth adding to any test answer that requires a regression
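## A minimal sketch of the rounding and renaming step, run on the built-in
## `mtcars` data (an assumption, not the lab dataset). The object names and the
## column labels are illustrative choices.
library(broom)
library(knitr)
ex.kable.tidy <- tidy(lm(mpg ~ wt, data = mtcars))
## `digits` rounds the numeric columns; `col.names` replaces the header row.
## Note that very small p-values may display as 0 after rounding.
ex.kable <- kable(ex.kable.tidy, digits = 3,
                  col.names = c("Term", "Estimate", "Std. Error", "t value", "p value"))
ex.kable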
# Plotting the Regression
# Plotting a linear relationship between two variables is essentially the same
# as plotting a correlation line from last lab using `geom_smooth`
# Let's plot the regression of sysBP on age
ggplot() +
  geom_point() +
  geom_smooth()
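## A sketch of a complete regression plot, using the built-in `mtcars` data as a
## stand-in (an assumption; substitute the lab data frame and its columns).
## The predictor goes on the x axis and the dependent variable on the y axis.
library(ggplot2)
ex.plot <- ggplot(mtcars, aes(x = wt, y = mpg)) +
  geom_point() +
  ## method = "lm" draws the fitted straight line; without it, geom_smooth
  ## chooses a smoother automatically (loess for small datasets)
  geom_smooth(method = "lm", formula = y ~ x)
ex.plot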
# Multiple Linear Regression
## You can also make a model with more than one predictor variable. We can
## add terms to the formula using the `+` sign
## Let's model sysBP as a function of age and glucose level and store in sysBP.m.2vars
sysBP.m.2vars <- lm()
summary(sysBP.m.2vars)
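## For comparison, a two-predictor model on the built-in `mtcars` data (an
## assumption, not the lab dataset; `ex.m.2vars` is an illustrative name).
## Each term joined with `+` gets its own coefficient.
ex.m.2vars <- lm(mpg ~ wt + hp, data = mtcars)
summary(ex.m.2vars)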
## Using the `+` sign only adds main effects to the model. However, some
## independent variables may also interact with each other. You can add
## individual interaction terms using the `:` sign. Putting bare interaction
## terms into the model without their main effects is rare, though, so you will
## usually include the main effects as well. You can add the main effects and
## the interaction of two variables to a model in one step using the `*` sign
## Let's add the interaction of age and glucose to the model using `*` and store in sysBP.m.int
sysBP.m.int <- lm()
summary(sysBP.m.int)
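## A short sketch showing that `a * b` is shorthand for `a + b + a:b`, again on
## the built-in `mtcars` data (an assumption; the object names are illustrative).
ex.m.int  <- lm(mpg ~ wt * hp, data = mtcars)
ex.m.long <- lm(mpg ~ wt + hp + wt:hp, data = mtcars)
## Both models contain the two main effects plus the wt:hp interaction term
names(coef(ex.m.int))
## The fitted coefficients agree between the two ways of writing the formula
all.equal(coef(ex.m.int), coef(ex.m.long))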