# Making Plots with Purrr

I was recently asked to make 4 plots for a collaborator. The plots are all the same, just a scatter plot and a non-linear trend line. Every time I have to do something repetitive, I wince, especially with respect to plots. I thought I would take this opportunity to write a short blog post on how to use functional programming in R to make the same plot for similar yet different data.

Side note: The result could probably be obtained through clever use of ggplot2’s geom_smooth, but I don’t think that workflow would generalize well.

# Step 1: Understand the Problem

The data contain some clinical variables and the concentration of a drug in the body. My collaborator just wants a scatter plot and a trend line. Linear regression on the raw concentrations isn’t appropriate because concentration must be positive and a preliminary look at the data shows a non-linear trend. We both agreed a log transform is appropriate for the concentration data. So the plan is to:

• Log the concentration data
• Perform a linear regression on the log scale
• Transform the data and the predictions back to the original scale
• Plot the resulting exponential trend as well as the original data.

I have to do that for 4 variables.
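Before scaling up, the four steps above can be sketched for a single covariate in base R. This is only an illustration under assumptions: the data are simulated, and the names `toy`, `fit`, and `grid` are made up for the example (only `Concentration` and `value` match the column names used later in the post).

```r
# Simulated stand-in data; names and numbers are assumptions for illustration.
set.seed(1)
toy <- data.frame(value = runif(50, 0, 10))
toy$Concentration <- exp(0.5 + 0.2 * toy$value + rnorm(50, sd = 0.3))

# Steps 1-2: fit a linear regression to log concentration.
fit <- lm(log(Concentration) ~ value, data = toy)

# Step 3: predict on an evenly spaced grid and back-transform with exp().
grid <- data.frame(value = seq(min(toy$value), max(toy$value), length.out = 20))
grid$pred <- exp(predict(fit, newdata = grid))

# Step 4: scatter plot of the raw data with the exponential trend on top.
plot(toy$value, toy$Concentration, xlab = "value", ylab = "Concentration")
lines(grid$value, grid$pred, col = "red")
```

Because the predictions are exponentiated, the trend line can never dip below zero, which is the point of working on the log scale.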

# Step 2: Put 4 Data Sets Into 1 Dataframe

I’ve loaded the data into a variable called c.data. There are 5 columns in all (concentration and 4 covariates). What I want is 4 separate data sets, each with 2 columns (1 for concentration, 1 for the covariate). If I use gather, group_by the covariate names, and then nest, I can get what I want. I would suggest you make your own data set as I have described and follow along so you can see what is happening.

library(tidyverse)

# Make 4 datasets and store in one dataset
step.2 = c.data %>%
  gather(var, value, -Concentration) %>%
  group_by(var) %>%
  nest(.key = 'obs.data')



Now I have 4 datasets which are essentially the same and they all live in step.2.
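If you want to follow along without the real data, here is one way you might simulate a stand-in. Everything about it is an assumption: the covariate names (`Age`, `Weight`, `Height`, `Dose`), the sample size, and the way `Concentration` is generated are invented for illustration. The sketch also spells the nesting step as `nest(obs.data = c(...))`, the column-naming syntax available from tidyr 1.0 onward, rather than `.key`.

```r
library(tidyverse)

set.seed(42)
n <- 100

# Hypothetical covariates; only the Concentration column name comes from the post.
toy.data <- tibble(
  Age    = runif(n, 20, 80),
  Weight = runif(n, 50, 110),
  Height = runif(n, 150, 200),
  Dose   = runif(n, 1, 10)
) %>%
  mutate(Concentration = exp(0.1 + 0.15 * Dose + rnorm(n, sd = 0.3)))

# Long format: one row per (observation, covariate) pair,
# then one nested data frame per covariate.
toy.nested <- toy.data %>%
  gather(var, value, -Concentration) %>%
  nest(obs.data = c(Concentration, value))

toy.nested                 # one row per covariate
toy.nested$obs.data[[1]]   # two columns: Concentration and value
```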

# Step 3: Map The Data To a Linear Regression

I can use map to store models fitted to each covariate. Each of the smaller datasets has a column called Concentration and another called value. If we had one of these smaller datasets in front of us, we would do lm(log(Concentration) ~ . , data = smaller.data) to fit the linear model. Since we have a column of datasets, I do


step.3 = step.2 %>%
  mutate(
    model = map(obs.data, ~ lm(log(Concentration) ~ ., data = .x)) # Fit models here
  )
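Once the models live in a list column, the same mapping idea can pull summaries out of them. The sketch below is an assumption-laden illustration, not part of the original analysis: the toy data, the covariate names, and the `slope` and `fold.change` columns are all made up. It uses `map_dbl` to extract the slope on the log scale, whose exponential is the multiplicative change in fitted concentration per unit increase in the covariate.

```r
library(tidyverse)

set.seed(1)
# Hypothetical nested data with the same shape as step.2 (names are assumptions).
toy <- tibble(
  var   = rep(c("Age", "Dose"), each = 30),
  value = runif(60, 1, 10)
) %>%
  mutate(Concentration = exp(0.2 * value + rnorm(60, sd = 0.2))) %>%
  nest(obs.data = c(Concentration, value))

fits <- toy %>%
  mutate(
    model = map(obs.data, ~ lm(log(Concentration) ~ ., data = .x)),
    # Slope on the log scale, one number per model.
    slope = map_dbl(model, ~ coef(.x)[["value"]]),
    # Multiplicative change in fitted concentration per unit of the covariate.
    fold.change = exp(slope)
  )

fits %>% select(var, slope, fold.change)
```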



# Step 4: Set Up New Data To Predict On

In order to plot a trend line, I need a grid of new data. The library modelr has some great utilities for this sort of thing. The data_grid function does just what I want.

library(modelr)

step.4 = step.3 %>%
  mutate(
    newdata = map(obs.data, ~ data_grid(.x, value = seq_range(value, 20)))
  )


Now I have another new column called newdata which has a dataframe of evenly spaced observations for each covariate. All that is left to do is to predict on this data using the model we fit in step 3.
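To see what seq_range and data_grid are doing, it can help to run them on a tiny data frame. The three-row `obs` data frame below is invented purely for illustration.

```r
library(modelr)

# Tiny made-up data set with the same column names as the nested data.
obs <- data.frame(value = c(2, 5, 11), Concentration = c(1.0, 1.8, 4.2))

# seq_range() returns n evenly spaced values spanning the range of its input.
seq_range(obs$value, 4)

# data_grid() builds a data frame from the expressions it is given,
# so the grid has a value column but no Concentration column.
grid <- data_grid(obs, value = seq_range(value, 4))
grid
```

Since the grid only contains the predictor, it can be handed straight to predict as newdata.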

# Step 5: Predict & Transform

Remember, I took the log of concentration, so I will have to exponentiate my predictions. I have to pass both my model and the new data into the predict function, so that means I have to use map2.

step.5 = step.4 %>%
  mutate(
    preds = map2(model, newdata, ~ exp(predict(.x, newdata = .y))) # Don't forget the exp!
  )


I’ll be using ggplot2 to make the plots, so it is best if I have the x’s and the y’s for the trend line in the same data frame.


step.5 = step.5 %>%
  mutate(
    pred.data = map2(newdata, preds, ~ .x %>% bind_cols(pred = .y))
  )



Now I have another column, pred.data which has the data I’ll need for drawing the trend line.
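As an aside, modelr's add_predictions can do the predict and bind steps in one go for a single data set. This is a sketch of an alternative rather than what the post does, and the toy data (an exact exponential curve, chosen so the result is easy to check) is an assumption.

```r
library(tidyverse)
library(modelr)

# Made-up data lying exactly on an exponential curve.
toy <- tibble(value = 1:10, Concentration = exp(0.3 * (1:10)))
fit <- lm(log(Concentration) ~ value, data = toy)

pred.data <- toy %>%
  data_grid(value = seq_range(value, 5)) %>%
  add_predictions(fit, var = "pred") %>%   # predictions on the log scale
  mutate(pred = exp(pred))                 # back to the original scale

pred.data
```

Inside the nested workflow this would still be called through map2, so it mainly saves the separate bind_cols step.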

# Step 6: Make a Function For the Plots

I’ve written a nice helper function to help me draw the plots I want. It will take in as arguments 2 data sets, plotting a scatter plot for 1 data set, and a line plot for the other. Here is the function:


make.plot<-function(pred.data, obs.data, vars){

  plot = ggplot()+
    geom_point(data = obs.data, aes(value, Concentration), shape = 21, fill = 'gray')+
    geom_line(data = pred.data, aes(value, pred), color = 'red')+
    labs(x = vars)+
    theme_classic()

  return(plot)

}



# Step 7: Reduce!

I can use pmap to pass in a named list to my make.plot function and then use patchwork and reduce to combine all the plots into a single figure!


library(patchwork)

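# pmap matches the list names to make.plot's argument names.
# The labels in vars must be in the same order as the rows of step.5.
arguments = list(pred.data = step.5$pred.data, obs.data = step.5$obs.data,
                 vars = c('Var1','Var2','Var3','Var4'))

plots<-pmap(arguments, make.plot)

final = reduce(plots, `+`)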

ggsave('purrrplot.png', plot= final, dpi = 800)



Here is the final product! Another for loop cleverly averted.
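If you would rather control the layout explicitly, patchwork also provides wrap_plots, which takes a list of plots directly. The sketch below compares the two approaches; the four placeholder plots built from the built-in mtcars data are stand-ins for the real output of pmap, and the names `by.reduce` and `by.wrap` are made up.

```r
library(ggplot2)
library(purrr)
library(patchwork)

# Four placeholder plots standing in for the output of pmap().
plots <- map(1:4, ~ ggplot(mtcars, aes(wt, mpg)) +
  geom_point() +
  labs(title = paste("Panel", .x)))

# Adding plots pairwise, as in the post.
by.reduce <- reduce(plots, `+`)

# Passing the whole list at once and fixing the number of columns.
by.wrap <- wrap_plots(plots, ncol = 2)
```

Both objects can be passed to ggsave in the same way as final above.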
