On the 4th of February I presented a workshop on functions in R at the R Ladies Manchester meetup, kindly hosted at AutoTrader. During the session we dived straight into writing some example functions, discussed some function best practices and looked at more advanced techniques including lapply, map and pipes.
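To give a flavour of the kind of example we started with, here is a minimal sketch of writing a small function and applying it to every column of a data frame with lapply. The function name to_celsius and the temperatures data frame are hypothetical illustrations, not the workshop's actual materials.

```r
# Hypothetical example function: convert Fahrenheit to Celsius
to_celsius <- function(fahrenheit) {
  (fahrenheit - 32) * 5 / 9
}

# Hypothetical data: temperatures in Fahrenheit, one column per day
temperatures <- data.frame(mon = c(32, 212), tue = c(50, 68))

# lapply calls to_celsius on each column and returns a list of results
celsius <- lapply(temperatures, to_celsius)
```

Because lapply always returns a list, you then choose how to store the results, which is a point that comes up again further down.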
Functions are a confusing concept when you first learn them. I remember the first time I attended a workshop on how to write functions in R: I found them just as confusing as the people in my audience who were learning about functions for the first time. When you see them abstractly, without a real application, it’s difficult to see why you would use them instead of just writing code that isn’t wrapped up in a function. Try to start introducing functions into your day-to-day work and they will soon become a useful tool.
There are lots of different ways to get the same result. R is a great language for flexibility in coding style; however, this flexibility means there are many ways to write code that gives the same result. I’ve gone through functions in R a couple of times recently, and every time I go through them I end up saying “and you can do it this way, or this way, or even this way…”. Some people like base R, some like the tidyverse, some are more comfortable with the apply family, whereas others prefer map.
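As a small illustration of "this way, or this way, or even this way", here is a sketch computing the mean of every column of a hypothetical data frame in four base R styles that all give the same named vector. With the purrr package installed, purrr::map_dbl(df, mean) is another equivalent option; it is left out of the code so the block runs with base R alone.

```r
# Hypothetical data frame used only for illustration
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 40))

# 1. An explicit for loop filling a pre-allocated, named vector
means_loop <- numeric(ncol(df))
names(means_loop) <- names(df)
for (col in names(df)) {
  means_loop[col] <- mean(df[[col]])
}

# 2. sapply from the apply family simplifies the results to a vector
means_sapply <- sapply(df, mean)

# 3. vapply also checks that every result is a single number
means_vapply <- vapply(df, mean, numeric(1))

# 4. A built-in vectorised function for this specific task
means_builtin <- colMeans(df)
```

Which one you reach for is largely a matter of style: vapply is stricter about output types, while the loop is often easier to read when you are starting out.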
The most common discussion I’ve had is how to assign function results when applying functions over multiple columns in a data frame. Options include replacing columns, adding extra columns, or creating a new data frame (either one whose numbers of rows and columns are pre-determined, or one that starts as NULL). Try a few different methods until you find the one that suits your coding style.
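The options above might look something like the following sketch in base R. The data frame df and the function double_it are hypothetical stand-ins for your own data and function, and the "_doubled" suffix for new columns is just one possible naming choice.

```r
df <- data.frame(a = c(1, 2, 3), b = c(10, 20, 40))
double_it <- function(x) x * 2  # hypothetical example function

# Option 1: replace the existing columns (df_replaced[] keeps it a data frame)
df_replaced <- df
df_replaced[] <- lapply(df_replaced, double_it)

# Option 2: add extra columns alongside the originals
df_added <- df
for (col in names(df)) {
  df_added[[paste0(col, "_doubled")]] <- double_it(df[[col]])
}

# Option 3a: a new data frame whose rows and columns are pre-determined
df_prealloc <- data.frame(matrix(NA_real_, nrow = nrow(df), ncol = ncol(df)))
names(df_prealloc) <- names(df)
for (i in seq_along(df)) {
  df_prealloc[[i]] <- double_it(df[[i]])
}

# Option 3b: start from NULL and bind on each result as a new column
df_grown <- NULL
for (col in names(df)) {
  new_col <- data.frame(double_it(df[[col]]))
  names(new_col) <- col
  df_grown <- if (is.null(df_grown)) new_col else cbind(df_grown, new_col)
}
```

Pre-allocating (3a) avoids copying the data frame on every iteration, while growing from NULL (3b) is convenient when you don't know the final size up front.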
Long, self-contained functions or functions wrapped in other functions? Normally I write long functions which do everything I need in one place. Recently, however, I’ve started working with colleagues who break functions down into much smaller functions, and I’m beginning to see the benefit. When I write long functions, there’s often functionality that I then copy and paste into other functions, and which would benefit from being a function in its own right. Breaking functions down into smaller ones still feels a bit weird, but it’s something I’m making a conscious effort to do more and more where it seems appropriate.
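Here is a minimal sketch of the "smaller functions" approach: two single-purpose helpers combined into a higher-level function. All three function names are hypothetical. The point is that each helper can be reused elsewhere instead of copying and pasting its body.

```r
# Small helper: drop missing values from a vector
remove_missing <- function(x) {
  x[!is.na(x)]
}

# Small helper: rescale a numeric vector to the range 0 to 1
rescale01 <- function(x) {
  rng <- range(x)
  (x - rng[1]) / (rng[2] - rng[1])
}

# Higher-level function built by composing the helpers
clean_and_rescale <- function(x) {
  rescale01(remove_missing(x))
}

result <- clean_and_rescale(c(5, NA, 10, 15))
```

Each helper is also easier to test on its own than one long function that does everything.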
When using functions that require multiple columns of a data frame as inputs, should the function be function(data, ...) or function(data$x, data$y, ...)? This was one of the questions I was asked during my talk. I initially thought the answer was the latter, but then decided the former is best, especially if you want to use pipes (%>%), as the data frame can then be passed as data %>% function(...) and the required columns can be called within the function.
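The difference between the two signatures can be sketched as follows. The functions ratio_of_cols and ratio_of_vectors and the data are hypothetical. The code uses R's native pipe |> (available from R 4.1.0) so it runs without extra packages; with magrittr loaded, df %>% ratio_of_cols("distance", "time") behaves the same way here. Column names are passed as strings for simplicity. Tidyverse-style functions often accept unquoted column names instead, which needs tidy evaluation tooling not shown here.

```r
# Data-frame-first version: takes the whole data frame plus column names
ratio_of_cols <- function(data, x, y) {
  data[[x]] / data[[y]]
}

# Column-first version: takes the columns themselves
ratio_of_vectors <- function(x, y) {
  x / y
}

df <- data.frame(distance = c(100, 200, 300), time = c(10, 20, 25))

# Both give the same answer when called directly
r1 <- ratio_of_cols(df, "distance", "time")
r2 <- ratio_of_vectors(df$distance, df$time)

# Only the data-frame-first version slots straight into a pipe, because the
# pipe passes the data frame as the first argument
r3 <- df |> ratio_of_cols("distance", "time")
```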
A lot of the content for this talk came from the Advanced R book by Hadley Wickham (http://adv-r.had.co.nz/Functions.html) which is a very good resource for learning to use functions. There are some examples that you can work through to learn more.