R Basic Tips for Production Code

Disclaimer: You do not need to follow every rule, tip and trick you read.

You are still the best judge when it comes to writing code. Context is a huge factor and you should not rely on any particular advice. Start by writing something that works. It does not need to be perfect.

1. Microbenchmark performance-critical code.

Every decision is a trade-off in production. The microbenchmark function, from the package of the same name, can help you decide which piece of code to use. Test multiple scenarios with different inputs. Experimentation is key.
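
As a minimal sketch of that workflow, the snippet below times two candidate implementations on a small and a large input. The functions means_apply and means_builtin and the input sizes are hypothetical, chosen only for illustration, and the timing step runs only if the microbenchmark package is installed.

```r
# Two hypothetical candidates that compute the same column means
means_apply <- function(m) apply(X = m, MARGIN = 2, FUN = mean)
means_builtin <- function(m) colMeans(x = m)

# Different input sizes, since the ranking may change with scale
set.seed(42)
small <- matrix(data = rnorm(n = 100), ncol = 10)
large <- matrix(data = rnorm(n = 100000), ncol = 100)

# Confirm both candidates agree before comparing their speed
stopifnot(isTRUE(all.equal(means_apply(small), means_builtin(small))))

# Only time the candidates when the package is available
if (requireNamespace(package = "microbenchmark", quietly = TRUE)) {
  print(microbenchmark::microbenchmark(
    means_apply(small), means_builtin(small),
    means_apply(large), means_builtin(large),
    times = 100
  ))
}
```

Checking that the candidates return the same result first avoids benchmarking code that is fast but wrong.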

2. Use argument names in function calls with more than one argument.

Passing arguments by name instead of relying on position has two clear benefits: readability and future-proofing. First, it eliminates guesswork for the people reading your code. Second, if the function signature changes in the future, it reduces the number of errors caused by an argument no longer being at the expected position.

sample(1:9, 15, TRUE)
##  [1] 5 4 1 2 4 3 8 5 3 8 6 9 3 9 8
sample(x = 1:9, size = 15, replace = TRUE)
##  [1] 6 8 9 1 4 1 9 7 5 2 6 6 4 6 8

CRAN packages typically do not change the argument positions of their functions. In any case, better safe than sorry.

3. Subset with [[ instead of $.

Declare x as a list with element crowd.

x <- list(crowd = 10)

Retrieve a similarly named element crow using $. What do you expect?

x$crow
## [1] 10

Retrieve a similarly named element crow using [[. What do you expect?

x[["crow"]]
## NULL

Explanation: The $ operator does partial name matching on lists by default. Set the warnPartialMatchDollar option to get a warning whenever this happens.

options(warnPartialMatchDollar = TRUE)
x$crow
## Warning in x$crow: partial match of 'crow' to 'crowd'
## [1] 10

Use [[ to avoid partial matching. [[ was also roughly 25% to 33% faster in the benchmark below; exact numbers depend on your R version and machine.

microbenchmark::microbenchmark(x$crowd, x[["crowd"]], times = 10000)

You can learn about this behavior from the ?Extract documentation. If you feel adventurous, you can read the C definitions of these functions here: [[ source code | $ source code.

4. Replace is.na + == with %in% when operating on data frame columns.

The %in% operator always returns TRUE or FALSE, even when a value is NA.

c(NA, 2:5) == 5
## [1]    NA FALSE FALSE FALSE  TRUE
c(NA, 2:5) %in% 5
## [1] FALSE FALSE FALSE FALSE  TRUE

It can be useful when you have to check for NA and equality on vectors and data frame columns. Of course, %in% is also useful when you need to check whether a value is part of a predefined set.

y <- sample(x = c(NA, 2:5), size = 10000, replace = TRUE)
f1 <- function(x) ifelse(test = is.na(x), yes = FALSE, no = x == 5)
f2 <- function(x) x %in% 5
microbenchmark::microbenchmark(f1(y), f2(y))
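
The membership check mentioned above could look like the following sketch. The allowed_levels vector and the is_valid_level helper are hypothetical names used only for illustration.

```r
# Hypothetical set of accepted log levels
allowed_levels <- c("debug", "info", "warn", "error")

# TRUE only for a single value found in the allowed set;
# NA and NULL both yield FALSE instead of NA or an error
is_valid_level <- function(level) {
  length(level) == 1L && level %in% allowed_levels
}

is_valid_level("info")
## [1] TRUE
is_valid_level("verbose")
## [1] FALSE
is_valid_level(NA)
## [1] FALSE
```

Because %in% never returns NA, the result can be used directly in an if condition.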

5. Use an environment to store global variables.

Environments are practical for storing application state and other parameters. You keep them all in one place instead of overpopulating the global environment.

store <- new.env()
store[["appstate"]] <- "working"
store[["loglevel"]] <- "info"
store[["dbpoolsize"]] <- 15
ls(envir = store)
## [1] "appstate"   "dbpoolsize" "loglevel"
store[["appstate"]]
## [1] "working"
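
One reason this works well is that environments are modified in place, so a function can update the store without returning it. The sketch below assumes a hypothetical helper, set_loglevel, and uses parent = emptyenv() to keep the store isolated, which is a design choice of this example rather than a requirement.

```r
# Keep the store isolated from other environments (an assumption of this sketch)
store <- new.env(parent = emptyenv())
store[["loglevel"]] <- "info"

# Hypothetical helper: the change persists after the function returns,
# because environments are not copied when they are modified
set_loglevel <- function(level) {
  assign(x = "loglevel", value = level, envir = store)
  invisible(level)
}

set_loglevel(level = "debug")
store[["loglevel"]]
## [1] "debug"

# Convert to a list to inspect or log every setting at once
as.list(x = store)
```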

6. Check for NA, NULL and length one all at once with isTRUE.

isTRUE always returns a single TRUE or FALSE, whatever you feed it. It is a convenience function that can be used, for example, to validate user input.

isTRUE(mtcars)
## [1] FALSE
isTRUE(is.numeric(1:5))
## [1] TRUE
isTRUE(NA)
## [1] FALSE
isTRUE(NULL)
## [1] FALSE
isTRUE(c(TRUE, TRUE))
## [1] FALSE
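
As a sketch of the input validation use case, the hypothetical function describe_mode below accepts a flag from the caller and treats anything other than a single TRUE as "off".

```r
# Hypothetical function with a logical flag supplied by the caller
describe_mode <- function(verbose = FALSE) {
  # isTRUE guards against NA, NULL, non-logical and longer-than-one inputs
  if (isTRUE(verbose)) "verbose" else "quiet"
}

describe_mode(verbose = TRUE)
## [1] "verbose"
describe_mode(verbose = NA)
## [1] "quiet"
describe_mode(verbose = "yes")
## [1] "quiet"
```

Without isTRUE, if (verbose) would raise an error for NA or NULL.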

7. Consider any and all on logical vectors.

Use all when you need to know whether every value is TRUE, and any when you need to know whether at least one value is TRUE.
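
A short illustration with a made-up vector follows. Note how a missing value affects the result, and how the na.rm argument changes it.

```r
# Illustrative values, including a missing one
values <- c(3, 8, NA, 12)

# At least one comparison is TRUE, so the NA does not matter
any(values > 10)
## [1] TRUE

# The NA comparison is unknown, so all cannot confirm the result
all(values > 0)
## [1] NA

# Drop missing values before evaluating
all(values > 0, na.rm = TRUE)
## [1] TRUE
```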

8. Know the difference between &/| and &&/||.

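& and | operate elementwise and return a logical vector. && and || evaluate from left to right, stop as soon as the result is known, and return a single value. Since R 4.3.0 they raise an error when an operand has length greater than one; older versions silently used only the first element.

Conditions in if statements should use && and ||.

The ?Logic documentation has more details.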
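
The sketch below contrasts the two families using made-up vectors: the single-character operators compare vectors element by element, while && in an if condition skips its right-hand side when the left-hand side already decides the outcome.

```r
a <- c(TRUE, FALSE, TRUE)
b <- c(TRUE, TRUE, FALSE)

# Elementwise: one result per pair of elements
a & b
## [1]  TRUE FALSE FALSE
a | b
## [1] TRUE TRUE TRUE

# Short-circuit: the comparison is never evaluated when input is NULL,
# so the condition stays a single FALSE instead of raising an error
input <- NULL
if (!is.null(input) && input > 0) "positive" else "missing or not positive"
## [1] "missing or not positive"
```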

9. You can call names<- and other assignment functions directly.

Sometimes it is useful to call an assignment function directly to avoid creating temporary variables. You just have to wrap the function name in backticks.

`names<-`(x = 1:5, value = c("apple", "orange", "lemon", "grape", "banana"))
##  apple orange  lemon  grape banana 
##      1      2      3      4      5

You can use this trick with the apply family of functions. Say you want all b elements from nested lists.

x <- list(list(a=5, b=3), list(a=4, b=6), list(a=9, b=1))
sapply(X = x, FUN = `[[`, ... = "b")
## [1] 3 6 1

10. Substitute library with a quiet requireNamespace.

A library call assumes the required package is already installed on the host that runs your code. If you want to better manage what happens when a particular package is not available, you should use requireNamespace since it returns TRUE or FALSE.

The following code loads the xml2 namespace if the package is available; otherwise it installs the package and then loads the namespace. Unlike library, this does not attach the package, so call its functions with the xml2:: prefix.

pkg <- "xml2"
if (!requireNamespace(package = pkg, quietly = TRUE)) {
  install.packages(pkg)
  loadNamespace(pkg)
}

It assumes the host can connect to CRAN. This might not be the case, and you may want to issue a warning message instead.
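
A minimal sketch of the warning variant follows. The helper name has_package is hypothetical; it reports availability and leaves it to the caller to decide what to do next.

```r
# Hypothetical helper: warn instead of installing when a package is missing
has_package <- function(pkg) {
  if (!requireNamespace(package = pkg, quietly = TRUE)) {
    warning("Package '", pkg, "' is not installed; skipping.", call. = FALSE)
    return(FALSE)
  }
  TRUE
}

# The stats package ships with R, so this returns TRUE
has_package(pkg = "stats")
## [1] TRUE
```

The caller can then branch on the result, for example by falling back to a slower base R implementation.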

11. Delete temporary files with an on.exit expression.

Sometimes a function will require the creation of a temporary file. Since you should clean up after yourself, it is better to use an on.exit call to unlink the file.

By using on.exit you can put the deletion logic right after the file creation. Plus, in case of error, the file is still deleted.

Most of the time, on.exit should be used with the add argument set to TRUE to avoid replacing on.exit expressions that are already defined.

myfile <- tempfile()
on.exit(expr = unlink(myfile), add = TRUE)
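
Since on.exit only takes effect inside a function, here is a sketch of the same two lines in context. The function count_lines_via_tempfile is hypothetical and returns the file path only so the cleanup can be checked afterwards.

```r
# Hypothetical function that needs a temporary file for its work
count_lines_via_tempfile <- function(lines) {
  myfile <- tempfile()
  # Registered right after creation; runs on normal exit and on error
  on.exit(expr = unlink(myfile), add = TRUE)
  writeLines(text = lines, con = myfile)
  list(path = myfile, n = length(readLines(con = myfile)))
}

res <- count_lines_via_tempfile(lines = c("a", "b", "c"))
res[["n"]]
## [1] 3

# The file is gone once the function has returned
file.exists(res[["path"]])
## [1] FALSE
```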

12. Exploit attributes with attr and attributes.

Attributes in R are metadata about an object. They are used to store names, classes, levels or any sort of information about an R object.

For example, you can use attributes to store a model's author and creation date. Another common use case is storing information about a data frame, such as its source and description. You can store just about anything in an attribute.

dt <- mtcars
attr(x = dt, which = "source") <- "Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411."
attr(x = dt, which = "description") <- "The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models)."
attributes(dt)
## $names
##  [1] "mpg"  "cyl"  "disp" "hp"   "drat" "wt"   "qsec" "vs"   "am"   "gear"
## [11] "carb"
## 
## $row.names
##  [1] "Mazda RX4"           "Mazda RX4 Wag"       "Datsun 710"         
##  [4] "Hornet 4 Drive"      "Hornet Sportabout"   "Valiant"            
##  [7] "Duster 360"          "Merc 240D"           "Merc 230"           
## [10] "Merc 280"            "Merc 280C"           "Merc 450SE"         
## [13] "Merc 450SL"          "Merc 450SLC"         "Cadillac Fleetwood" 
## [16] "Lincoln Continental" "Chrysler Imperial"   "Fiat 128"           
## [19] "Honda Civic"         "Toyota Corolla"      "Toyota Corona"      
## [22] "Dodge Challenger"    "AMC Javelin"         "Camaro Z28"         
## [25] "Pontiac Firebird"    "Fiat X1-9"           "Porsche 914-2"      
## [28] "Lotus Europa"        "Ford Pantera L"      "Ferrari Dino"       
## [31] "Maserati Bora"       "Volvo 142E"         
## 
## $class
## [1] "data.frame"
## 
## $source
## [1] "Henderson and Velleman (1981), Building multiple regression models interactively. Biometrics, 37, 391–411."
## 
## $description
## [1] "The data was extracted from the 1974 Motor Trend US magazine, and comprises fuel consumption and 10 aspects of automobile design and performance for 32 automobiles (1973–74 models)."
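
To read a single attribute back, call attr without the assignment. The sketch below uses a made-up vector, weights, with a hypothetical "unit" attribute. It also shows two behaviors worth knowing: attr matches names partially unless exact = TRUE, and subsetting an atomic vector with [ drops custom attributes.

```r
# Made-up vector with a custom attribute
weights <- 1:5
attr(x = weights, which = "unit") <- "kg"

# Exact retrieval of one attribute
attr(x = weights, which = "unit", exact = TRUE)
## [1] "kg"

# Partial matching applies by default, much like $ on lists
attr(x = weights, which = "un")
## [1] "kg"

# Subsetting the vector drops the custom attribute
attr(x = weights[1:2], which = "unit", exact = TRUE)
## NULL
```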