Use the same code three times, create a function.
Use the same function across multiple projects, create a package?
At CANA, we use the statistical programming language R across several projects for its ease of creating data-driven applications and reproducible reports. It’s a great option for exploring data, prototyping solutions, or even taking a model to production. This last summer, several R programmers on the CANA team collaborated on an internal package, canaR, to share standard functions and formatting across our R products. In this post, we share some lessons learned from creating an internal package.
The DRY Principle
Most programmers are familiar with the DRY principle (Don’t Repeat Yourself) which aims at reducing repetition in code. Reasons to follow this principle are abundant: less effort for the programmer, reduced chance for error across multiple uses of the same code, and streamlined testing.
In R, this translates to the use of functions and packages as a best practice. Packages are a natural unit to distribute code within a team as they include functions, tests, documentation, and vignettes. The package development process has become increasingly accessible in recent years due to tools such as usethis, testthat, and roxygen2.
Package Development Approach
The canaR development process was a collaborative effort led by CANA’s R programmers. For our internal package, functions fell into three key categories: style and branding, rounding, and visualizations.
Style and branding functions in the canaR package provide uniformity across our R products by creating a standard for document, table, and plot appearance. These offer a jumping off point for tailoring by analysts as they utilize the formatting functions and geoms on different projects. The canaR development team worked with CANA’s graphic designer, Koa Beam, to create custom, color blind-friendly palettes to fit categorical, sequential, and diverging types of data visualizations.
Rounding functions are another crucial and often underestimated challenge for standardized results. Working with repetition results of a simulation and/or collaborating with team members that use a different software can lead to some tricky rounding challenges. Sample functions in the canaR package that tackle these challenges perform actions such as aligning rounding results to what is expected in Excel, or controlling rounding for aggregations.
Visualization functions in canaR create unique data visuals not covered by existing packages. One of the functions that is helpful for working with dateless planning scenarios is the ‘relative timeline,’ created by CANA team member, Aaron Luprek. This data visualization function creates a time line centered around zero, which allows for showing results that are time-based but not associated with a certain date.
Sample Relative Timeline Graphic
Once the functions were built, the package development team worked together to create clear documentation, examples, and vignettes for future users. We used the testthat package to streamline our testing approach.
The final touch was the package naming convention. In the tradition of some of our favorite tidyverse packages, we saw the opportunity to incorporate a little French (ala magrittr) and an animal reference (ala purrr). Hence, canaR as a sly reference to the ‘canard,’ French for duck, and a diverting duck logo.
Resources
Interested in creating your own R package? There are many resources available! Here are just a few that our team found helpful in developing canaR:
Hadley Wickham and Jenny Bryan - R Packages (Book)
Hilary Parker - Writing an R package from scratch (Blog)
Malcolm Barrett - Zen and the Art of R Package Development (Video)
John Muschelli - R Package Development Series (Video)
Lucia Darrow
Is a Senior Operations Research Analyst at CANA Advisors and can be reached through her LinkedIn profile, or via
email at: ldarrow@canallc.com
Comentários