# A tibble: 10 × 4
id smoking hypertension diabetes
<int> <chr> <chr> <chr>
1 1 Y Y N
2 2 N Y Y
3 3 Y N Y
4 4 N N N
5 5 N N N
6 6 N N N
7 7 N Y Y
8 8 Y N N
9 9 N N N
10 10 N N N
I know the aphorism is “If you write a piece of code 3 times, write a function”, so I could do
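A minimal sketch of what that function-based version might look like. The `raw_data` values and the body of `clean_yn` here are assumptions made up for illustration, not the original definitions; only the column names come from the output shown in this post.

```r
library(dplyr)

# Hypothetical raw data with inconsistent yes/no encodings (illustrative only)
raw_data <- tibble::tibble(
  id = 1:3,
  smoking = c("yes", "No", " Y "),
  hypertension = c("Y", "y", "n"),
  diabetes = c("NO", "Yes", "yes")
)

# Hypothetical cleaner: trim whitespace, keep the first letter, upper-case it
clean_yn <- function(x) {
  toupper(substr(trimws(x), 1, 1))
}

# One function, applied to every yes/no column in a single mutate-across call
cleaned <- raw_data |>
  mutate(across(c(smoking, hypertension, diabetes), clean_yn))
```

Note that this keeps the original column names, which is exactly the limitation discussed next.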
# A tibble: 10 × 4
id smoking hypertension diabetes
<int> <chr> <chr> <chr>
1 1 Y Y N
2 2 N Y Y
3 3 Y N Y
4 4 N N N
5 5 N N N
6 6 N N N
7 7 N Y Y
8 8 Y N N
9 9 N N N
10 10 N N N
But I usually have to edit the column name too, because smoking is often actually Smoking (Yes or No) in the raw data (and while {janitor} helps, it still leaves some cruft on the end of column names). This makes the first pattern more attractive, despite the repetition.
I asked Gemini how it might handle this problem, and it showed me a neat little pattern I want to share. The mutate-across pattern is actually pretty useful, so let’s abstract it into a function and tack on a rename, like so:
apply_cleaners <- function(data, clean_func, rename_map) {
  data |>
    mutate(across(all_of(unname(rename_map)), ~ clean_func(.x))) |>
    rename(all_of(rename_map))
}
Now, I can specify a list of columns I want to clean with the same function, and the new names I want to use
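As a sketch of how such a call might look, assuming raw column names that carry the kind of cruft mentioned earlier. The raw data and the body of `clean_yn` are hypothetical stand-ins; `apply_cleaners` is the function from this post. The rename map is a named character vector in `c(new_name = "old name")` form.

```r
library(dplyr)

# apply_cleaners as defined in the post
apply_cleaners <- function(data, clean_func, rename_map) {
  data |>
    mutate(across(all_of(unname(rename_map)), ~ clean_func(.x))) |>
    rename(all_of(rename_map))
}

# Hypothetical cleaner (an assumption): first letter, upper-cased
clean_yn <- function(x) toupper(substr(trimws(x), 1, 1))

# Hypothetical raw data with messy column names (illustrative only)
raw_data <- tibble::tibble(
  id = 1:3,
  `Smoking (Yes or No)` = c("yes", "no", "Yes"),
  `Hypertension (Yes or No)` = c("Y", "y", "n"),
  `Diabetes (Yes or No)` = c("No", "yes", "YES")
)

# Names are the new column names, values are the existing ones
rename_map <- c(
  smoking_cleaned = "Smoking (Yes or No)",
  hypertension_cleaned = "Hypertension (Yes or No)",
  diabetes_cleaned = "Diabetes (Yes or No)"
)

result <- apply_cleaners(raw_data, clean_yn, rename_map)
```

The values of the map pick the columns to clean, and the names supply the replacements, so cleaning and renaming stay in one place.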
# A tibble: 10 × 4
id smoking_cleaned hypertension_cleaned diabetes_cleaned
<int> <chr> <chr> <chr>
1 1 Y Y N
2 2 N Y Y
3 3 Y N Y
4 4 N N N
5 5 N N N
6 6 N N N
7 7 N Y Y
8 8 Y N N
9 9 N N N
10 10 N N N
which doesn’t seem that cool, unless you realize that what is returned is a tibble, which can be passed through apply_cleaners again. Hence, reduce seems like a good tool here
# Needs to be a list of lists
cleaning_tasks <- list(
  "yn_cleaning_tasks" = list(
    rename_map = c(
      'smoking_cleaned' = 'smoking',
      'hypertension_cleaned' = 'hypertension',
      'diabetes_cleaned' = 'diabetes'
    ),
    func = clean_yn
  )
)

reduce(
  cleaning_tasks,
  function(data, task) {
    apply_cleaners(data, task$func, task$rename_map)
  },
  .init = raw_data
)
# A tibble: 10 × 4
id smoking_cleaned hypertension_cleaned diabetes_cleaned
<int> <chr> <chr> <chr>
1 1 Y Y N
2 2 N Y Y
3 3 Y N Y
4 4 N N N
5 5 N N N
6 6 N N N
7 7 N Y Y
8 8 Y N N
9 9 N N N
10 10 N N N
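The chaining matters more once there is a second task with a different cleaner. Here is a minimal sketch under stated assumptions: the `weight` column, the `parse_kg` helper, the sample values, and the body of `clean_yn` are all hypothetical and only there to show two tasks flowing through `reduce`.

```r
library(dplyr)
library(purrr)

# apply_cleaners as defined in the post
apply_cleaners <- function(data, clean_func, rename_map) {
  data |>
    mutate(across(all_of(unname(rename_map)), ~ clean_func(.x))) |>
    rename(all_of(rename_map))
}

# Hypothetical cleaners (assumptions for illustration)
clean_yn <- function(x) toupper(substr(trimws(x), 1, 1))
parse_kg <- function(x) as.numeric(sub("\\s*kg$", "", x))

# Hypothetical raw data
raw_data <- tibble::tibble(
  id = 1:2,
  smoking = c("yes", "No"),
  weight = c("70 kg", "82.5kg")
)

# Each task pairs a cleaning function with the columns it applies to
cleaning_tasks <- list(
  yn_cleaning_tasks = list(
    rename_map = c(smoking_cleaned = "smoking"),
    func = clean_yn
  ),
  weight_cleaning_tasks = list(
    rename_map = c(weight_kg = "weight"),
    func = parse_kg
  )
)

# reduce feeds the tibble returned by one task into the next
result <- reduce(
  cleaning_tasks,
  function(data, task) apply_cleaners(data, task$func, task$rename_map),
  .init = raw_data
)
```

Adding another kind of cleaning then means adding one more entry to `cleaning_tasks`, with no change to the pipeline itself.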
This is overkill for many cleaning tasks, but when you’re cleaning dozens of columns, I think this is nice! It is very targets-y, which I like.