
An Idea From UNIX for R Tidyverse Pipelines

Updated: Nov 28, 2023



The Tidyverse is a collection of R packages that share a common functional interface for data transformation and analysis.

To see it in action, let’s consider the following sequence of data transformations on some dataframe (called df) as an example.

library(tidyverse)

df <- read_csv("some_file.csv")
df <- group_by(df, name, a, b, c)
df <- summarize(df, qty = sum(qty))    # one row per group: total quantity
df <- ungroup(df)
df <- mutate(df, version = "20230130")

Each of the functions group_by(), summarize(), ungroup(), and mutate() is from the Tidyverse collection (specifically the dplyr package; read_csv() comes from readr). Notice that the first argument of each of these functions is the data (usually a dataframe) to be operated upon, which here is the data just updated on the previous line.


However, the standard practice is to code such a sequence, not as consecutive assignment statements, as above, but as ONE expression where each step is a “link” in a “chain” of transformations, as in the following code.

df <- read_csv("some_file.csv") %>%
      group_by(name, a, b, c)   %>%
      summarize(qty = sum(qty)) %>%
      ungroup()                 %>%
      mutate(version = "20230130")

Notice that each step of the transformation is now “linked” to the next with the infix operator %>% (sometimes called “pipe”). Also, note that the first argument of each Tidyverse function is now dropped. That is because the pipe operator %>% passes the result of its left-hand side as the first argument of the function call on its right. Eliminating the repeated “noise” on each line also makes the code more readable. The chain of such transformations is called a “pipeline.”
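To check that the two styles really are equivalent, here is a minimal sketch that runs both on a small made-up dataframe. The sample values are hypothetical stand-ins for some_file.csv, and sum(qty) is an aggregation assumed here for illustration.

```r
library(dplyr)

# Hypothetical sample data standing in for some_file.csv
df <- data.frame(
  name = c("x", "x", "y"),
  a    = c(1, 1, 2),
  b    = c(1, 1, 2),
  c    = c(1, 1, 2),
  qty  = c(10, 5, 7)
)

# Style 1: consecutive assignments, data passed explicitly each time
step_by_step <- group_by(df, name, a, b, c)
step_by_step <- summarize(step_by_step, qty = sum(qty))
step_by_step <- ungroup(step_by_step)
step_by_step <- mutate(step_by_step, version = "20230130")

# Style 2: one pipeline, data passed implicitly by %>%
piped <- df %>%
  group_by(name, a, b, c) %>%
  summarize(qty = sum(qty)) %>%
  ungroup() %>%
  mutate(version = "20230130")

# Compare the two results
same_result <- identical(step_by_step, piped)
print(same_result)
```

Both forms call the same functions in the same order; the pipeline simply removes the repeated variable name from each line.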


Code Usage Example

Now, here is a usage of the above idea that we see a lot in our code. If we want to create a dataframe A for output (and possibly use A for further processing), we like to write code like the following (where [step i] is not real R code, but represents some Tidyverse function call, as in the concrete example above).

A <- [step 1] %>%
     [step 2] %>%
     [step 3] %>%
     [step 4] %>%
     [step 5]


write_csv(A, "A.csv")

But in our codebase, we've seen a variant of this code too, where we want to peek at what the data looks like after step 3, but before going to step 4, as in the following:

A_intermediate <- [step 1] %>%
                  [step 2] %>%
                  [step 3]


write_csv(A_intermediate, "A_intermediate.csv")


A <- A_intermediate %>%
     [step 4] %>%
     [step 5]


write_csv(A, "A.csv")

In practice, the variable names often don't say "intermediate," and this fragmentation makes the pipeline harder to read, especially when this “peeking” method occurs several times in a long pipeline.


To the rescue comes tee, an old UNIX utility. UNIX shell expressions have pipelines too; I believe that R pipelines are inspired by them (or by something else that was in turn inspired by UNIX shell pipelines).

A UNIX pipeline looks like this:

[step 1] | [step 2] | [step 3]

where each [step i] is some UNIX command, and each command is “linked” to the next by the | character (analogous to %>% in Tidyverse pipelines). If a call to the utility tee is inserted into the pipeline, e.g., as follows:

[step 1] | [step 2] | tee foo.txt | [step 3]

then the file foo.txt has the intermediate result of the pipeline after step 2, but before step 3, and can be inspected for audit or debug purposes.

Using this idea, we write an R function called tee( ) so that our code looks more like the original (non-fragmented) version of the pipeline:

A <- [step 1] %>%
     [step 2] %>%
     [step 3] %>%
     tee("A_intermediate.csv") %>%
     [step 4] %>%
     [step 5]


write_csv(A, "A.csv")

The definition of tee( ) is simply: grab the given dataframe, write it to a file and then pass it along (for further processing down the rest of the pipeline).

# Write df to filename, then return df unchanged so the pipeline can continue.
tee <- function(df, filename)
{
  write_csv(df, filename)
  df
}
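As a quick illustration of tee( ) in use, the sketch below runs a tiny pipeline and reads the snapshot back. The orders dataframe, the filter and summarize steps, and the temporary file path are all hypothetical stand-ins for the [step i] placeholders, not code from our codebase.

```r
library(dplyr)
library(readr)

# tee() as defined in the text: write the data, then pass it along unchanged
tee <- function(df, filename) {
  write_csv(df, filename)
  df
}

# Hypothetical input data and a temporary file for the intermediate result
orders <- data.frame(name = c("x", "x", "y"), qty = c(10, 5, 7))
peek_file <- tempfile(fileext = ".csv")

A <- orders %>%
  filter(qty > 5) %>%            # stands in for the early steps
  tee(peek_file) %>%             # snapshot of the data at this point
  group_by(name) %>%             # stands in for the later steps
  summarize(qty = sum(qty))

# The snapshot holds the filtered rows, before aggregation
snapshot <- read.csv(peek_file)
print(snapshot)
print(A)
```

Because tee( ) returns its input unchanged, deleting the tee( ) line leaves the value of A the same; only the side-effect file disappears.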

More Tee Fun

We can simplify our new code even further by reusing tee( ) for writing the final version of A also.

A <- [step 1] %>%
     [step 2] %>%
     [step 3] %>%
     tee("A_intermediate.csv") %>%
     [step 4] %>%
     [step 5] %>%
     tee("A.csv")

This puts all the processing for dataframe A in one expression, making it easy for the reader to see it as completely self-contained. This final version is the form of the data transformation found in our codebases.


Summary

We can see how the Tidyverse package collection is an extremely useful toolkit for data analysis. Tidyverse functions have a common interface which can be leveraged so that we can code our data transformations into easy-to-read pipeline expressions. Easier reading entails easier development, fewer coding mistakes, and easier debugging, if it should come to that. We have additionally leveraged an idea from UNIX to help us “peek” into the partially transformed data at a particular point in the transformation pipeline.





Rick Hanson is our Senior Operations Research Analyst here at CANA. You can reach him at rhanson@canallc.com or on LinkedIn.
