# Library Setup
# 1. `tidyverse`: contains `dplyr` and `ggplot2`
# 2. `broom`: contains the `tidy` function to convert a summary table to a data frame
# 3. `knitr`: contains the `kable` function to export tidied summary tables to text
library(knitr)
library(broom)
library(tidyverse)
theme_set(theme_bw())

# Data Setup
# We will be using a modified form of the Framingham Heart Study dataset stored in "framingham_n50.RData"
# load() returns the names of the objects it restores. The name of the data frame
# inside the file is not stated here, so we look it up and refer to it as `fhs`
# (assumption: the file holds a single data frame).
loaded <- load("framingham_n50.RData")
fhs <- get(loaded[1])

# Simple Linear Regression
# The command for any basic linear regression is `lm`. `lm` has the following form:
# lm(formula, data, weights, ...)
#   formula: a formula expressing the dependent variable as a function of the
#            predictor variables.
#   data: the data frame the variables can be found in
#   weights: an optional weight for each data point that determines how much it
#            influences the final model. NULL by default

## For example, let's model systolic blood pressure as a function of age and store it in sysBP.m
## (assumption: the columns are named `sysBP` and `age`)
sysBP.m <- lm(sysBP ~ age, data = fhs)

## Let's view the output of the linear model using `summary`
summary(sysBP.m)

## Reporting a linear regression table
## The `tidy` function from the `broom` package does a nice job extracting the
## coefficient statistics from a summary table. The only input it needs is the model.

## Let's use tidy on sysBP.m and store in sysBP.tidy
sysBP.tidy <- tidy(sysBP.m)
sysBP.tidy

## Now you can see the coefficient outputs in a neat table. If you want to
## create a formatted text table to paste into a form, you can also use the
## `kable` function on the tidy table.
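
## A minimal sketch of what `tidy` returns, run on simulated data rather than the
## lab dataset. The names `demo.df`, `demo.m`, and `demo.tidy` and the simulated
## values are made up for illustration; only the column layout of the result matters.
set.seed(1)
demo.df <- data.frame(x = rnorm(50))
demo.df$y <- 2 + 3 * demo.df$x + rnorm(50)   # simulated outcome, not real data
demo.m <- lm(y ~ x, data = demo.df)
demo.tidy <- broom::tidy(demo.m)             # one row per coefficient
names(demo.tidy)                             # term, estimate, std.error, statistic, p.value
knitr::kable(demo.tidy, digits = 3)          # plain-text table of the same values
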
## Let's use `kable` on the sysBP.tidy data frame we just made
kable(sysBP.tidy)

## The table could use a little more tidiness, so let's round the output values and change the column names a bit
kable(sysBP.tidy, digits = 3,
      col.names = c("Term", "Estimate", "Std. Error", "t value", "p value"))
# This table would be a good thing to add to any test answer requiring regression

# Plotting the Regression
# Plotting a linear relationship between two variables is essentially the same
# as plotting a correlation line from last lab using `geom_smooth`

# Let's plot the regression of sysBP on age
# (`fhs` stands for the data frame loaded from "framingham_n50.RData"; the
# column names `sysBP`, `age`, and `glucose` are assumed)
ggplot(fhs, aes(x = age, y = sysBP)) +
  geom_point() +
  geom_smooth(method = "lm")

# Multiple Linear Regression
## You can also make a model with more than one predictor variable. We can
## add terms to the formula using the `+` sign

## Let's model sysBP as a function of age and glucose level and store in sysBP.m.2vars
sysBP.m.2vars <- lm(sysBP ~ age + glucose, data = fhs)
summary(sysBP.m.2vars)

## Using the `+` sign only adds main effects to the model. However, some
## independent variables may interact with each other as well. You can add
## individual interaction terms using the `:` sign. Putting bare interaction
## terms into the model is rare, though; you will usually include the main
## effects as well. You can add the main effects and the interaction of two
## variables to a model using the `*` sign

## Let's add the interaction of age and glucose to the model using `*` and store in sysBP.m.int
sysBP.m.int <- lm(sysBP ~ age * glucose, data = fhs)
summary(sysBP.m.int)
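
## A minimal sketch, on simulated data, checking that `*` is shorthand for the
## main effects plus the `:` interaction. The names `sim`, `m.star`, and
## `m.long` and the simulated values are made up for illustration.
set.seed(2)
sim <- data.frame(a = rnorm(50), b = rnorm(50))
sim$y <- 1 + 0.5 * sim$a - 0.3 * sim$b + 0.2 * sim$a * sim$b + rnorm(50)
m.star <- lm(y ~ a * b, data = sim)          # shorthand form
m.long <- lm(y ~ a + b + a:b, data = sim)    # spelled-out form
names(coef(m.star))                          # "(Intercept)" "a" "b" "a:b"
all.equal(coef(m.star), coef(m.long))        # TRUE: both formulas fit the same model
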