Making Plots with Purrr

I was recently asked to make 4 plots for a collaborator. The plots are all the same, just a scatter plot and a non-linear trend line. Every time I have to do something repetitive, I wince, especially with respect to plots. I thought I would take this opportunity to write a short blog post on how to use functional programming in R to make the same plot for similar yet different data.

Side note: The result could probably be obtained through clever use of ggplot2’s geom_smooth, but I don’t think that workflow would generalize well.

Step 1: Understand the Problem

The data contain some clinical variables and the concentration of a drug in the body. My collaborator just wants a scatter plot and a trend line. Linear regression isn’t appropriate because concentration must be positive and preliminary looks at the data show a non-linear trend. We both agreed a log transform is appropriate for the concentration data. So the plan is to:

  • Log the concentration data
  • Perform a linear regression on the log scale
  • Transform the data and the predictions back to the original scale
  • Plot the resulting exponential trend as well as the original data.

I have to do that for 4 variables.
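
For a single covariate, the plan above can be sketched in base R. This is only an illustration on simulated data: the data frame toy, the coefficients, and the noise level are made up and have nothing to do with the real clinical data.

```r
# Simulated stand-in for one covariate; all numbers here are invented.
set.seed(1)
toy <- data.frame(value = runif(50, 0, 10))
toy$Concentration <- exp(0.5 + 0.2 * toy$value + rnorm(50, sd = 0.3))

# Fit a straight line to log(Concentration).
fit <- lm(log(Concentration) ~ value, data = toy)

# Predict on an evenly spaced grid, then exponentiate back to the original scale.
grid <- data.frame(value = seq(min(toy$value), max(toy$value), length.out = 20))
grid$pred <- exp(predict(fit, newdata = grid))

# Scatter plot of the raw data with the back-transformed (exponential) trend.
plot(toy$value, toy$Concentration, xlab = "value", ylab = "Concentration")
lines(grid$value, grid$pred, col = "red")
```

Because the predictions are exponentiated, the trend line is always positive, which is the reason for working on the log scale in the first place.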

Step 2: Put 4 Data Sets Into 1 Dataframe

I’ve loaded the data into a variable called c.data. There are 5 columns in all (Concentration and 4 covariates). What I want is 4 separate data sets, each with 2 columns (1 for concentration, 1 for the covariate). If I use gather, group_by the covariate names, and then nest, I can get what I want. I would suggest you make your own data set as I have described and follow along so you can see what is happening.

library(tidyverse)

# Make 4 datasets and store in one dataset
step.2 = c.data %>% 
		 gather(var, value, -Concentration) %>% 
		 group_by(var) %>%
		 nest(.key = 'obs.data')


Now I have 4 datasets which are essentially the same and they all live in step.2.
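
If you don't have data of your own handy, something like the following could serve as a stand-in for c.data. The covariate names Var1 to Var4, their ranges, and the simulated relationship are all assumptions made purely so the pipeline has something to run on; toy.nested is likewise a hypothetical name.

```r
library(tidyverse)

set.seed(42)
n <- 30

# Hypothetical data: four made-up covariates and a positive concentration.
c.data <- tibble(
  Var1 = runif(n, 20, 80),
  Var2 = runif(n, 50, 120),
  Var3 = runif(n, 0.5, 1.5),
  Var4 = runif(n, 1, 10)
) %>%
  mutate(Concentration = exp(1 + 0.01 * Var1 + rnorm(n, sd = 0.2)))

# Same reshaping as in the post: long format, then one nested data set per covariate.
toy.nested <- c.data %>%
  gather(var, value, -Concentration) %>%
  group_by(var) %>%
  nest(.key = 'obs.data')

# Each element of obs.data is a small data frame with Concentration and value.
toy.nested$obs.data[[1]]
```

Printing toy.nested should show one row per covariate, with the observations tucked away in the obs.data list-column.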

Step 3: Map The Data To a Linear Regression

I can use map to store models fitted to each covariate. Each of the smaller datasets has a column called Concentration and another called value. If we had one of these smaller datasets in front of us, we would do lm(log(Concentration) ~ . , data = smaller.data) to fit the linear model. Since we have a column of datasets, I do


step.3 = step.2 %>% 
		mutate(
			model = map(obs.data, ~ lm(log(Concentration) ~ ., data = .x)) # Fit models here
			)

Step 4: Set Up New Data To Predict On

To plot a trend line, I need a grid of new data. The modelr package has some great utilities for this sort of thing. The data_grid function does just what I want.

library(modelr)

step.4 = step.3 %>%
		 mutate(
		 	newdata = map(obs.data, ~ data_grid(.x, value = seq_range(value,20)) ) 
		 	)

Now I have another new column called newdata which has a dataframe of evenly spaced observations for each covariate. All that is left to do is to predict on this data using the model we fit in step 3.
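
To see what data_grid and seq_range produce on their own, here is a small sketch on a made-up three-row data frame (toy and its values are invented for illustration):

```r
library(modelr)

# A tiny invented data set with the same column names as the nested data.
toy <- data.frame(Concentration = c(1.2, 3.4, 2.2), value = c(0, 4, 10))

# seq_range() spans min(value) to max(value) in 5 evenly spaced steps;
# data_grid() returns a data frame containing just that grid.
grid <- data_grid(toy, value = seq_range(value, 5))
grid
```

The grid only contains the value column, which is all predict needs here since value is the sole predictor in each model.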

Step 5: Predict & Transform

Remember, I took the log of concentration, so I will have to exponentiate my predictions. I have to pass both my model and the new data into the predict function, so that means I have to use map2.

step.5 = step.4 %>%
		 mutate(
		 	preds = map2(model, newdata, ~ exp(predict(.x, newdata = .y))) # Don't forget the exp!
		 	)
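
If map2 is unfamiliar, it may help to see it outside of mutate. The sketch below uses the built-in mtcars data purely as a stand-in; the two models and grids are arbitrary examples, not part of the analysis.

```r
library(purrr)

# Two arbitrary models and a matching grid for each (mtcars is only a stand-in).
fits  <- list(lm(mpg ~ wt, data = mtcars), lm(mpg ~ hp, data = mtcars))
grids <- list(data.frame(wt = c(2, 3)), data.frame(hp = c(100, 150)))

# map2() walks both lists in parallel: .x is the model, .y is its grid.
preds <- map2(fits, grids, ~ predict(.x, newdata = .y))
```

Each element of preds holds the predictions from one model on its own grid, which is the same pairing the model and newdata columns get above.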

I’ll be using ggplot2 to make the plots, so it is best if I have the x’s and the y’s for the trend line in the same data frame.


step.5 = step.5 %>%
		 mutate(
		 	pred.data = map2(newdata,preds, ~ .x %>% bind_cols(pred = .y))
		 	)

Now I have another column, pred.data, which has the data I’ll need for drawing the trend line.

Step 6: Make a Function For the Plots

I’ve written a nice helper function to help me draw the plots I want. It will take in as arguments 2 data sets, plotting a scatter plot for 1 data set, and a line plot for the other. Here is the function:


make.plot <- function(pred.data, obs.data, vars){

  # Scatter plot of the observed data plus the back-transformed trend line
  plot = ggplot() +
    geom_point(data = obs.data, aes(value, Concentration), shape = 21, fill = 'gray') +
    geom_line(data = pred.data, aes(value, pred), color = 'red') +
    labs(x = vars) +
    theme_classic()

  return(plot)

}

Step 7: Reduce!

I can use pmap to pass in a named list to my make.plot function and then use patchwork and reduce to combine all the plots into a single figure!


library(patchwork)


# step.5 holds the pred.data and obs.data list-columns built above
arguments = list(pred.data = step.5$pred.data, 
                 obs.data = step.5$obs.data,
                 vars = c('Var1','Var2','Var3','Var4'))

plots<-pmap(arguments, make.plot)


final = reduce(plots,`+`)

ggsave('purrrplot.png', plot= final, dpi = 800)
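
As a rough illustration of what pmap and reduce are doing here, the sketch below swaps make.plot for a hypothetical helper called describe that just returns a string, so it runs without any plotting. The helper, the tiny data frames, and arg.list are all made up for the example.

```r
library(purrr)

# Hypothetical stand-in for make.plot: same argument names, but returns text.
describe <- function(pred.data, obs.data, vars) {
  paste(vars, "has", nrow(obs.data), "points and", nrow(pred.data), "grid rows")
}

arg.list <- list(
  pred.data = list(data.frame(x = 1:20), data.frame(x = 1:20)),
  obs.data  = list(data.frame(x = 1:5),  data.frame(x = 1:8)),
  vars      = c("Var1", "Var2")
)

# pmap() matches the list names to the function's argument names and
# calls describe() once per position across the three elements.
labels <- pmap(arg.list, describe)

# reduce() folds a list pairwise from the left: ((1 + 2) + 3) + 4.
total <- reduce(list(1, 2, 3, 4), `+`)
```

With patchwork loaded, `+` on two ggplot objects combines them into one layout, so reducing the list of plots with `+` builds up the full figure one plot at a time. patchwork also provides wrap_plots, which accepts a list of plots directly and may be worth a look as an alternative.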


Here is the final product! Another for loop cleverly averted.