Pitfalls of Using the data.table Package
- Aaron Luprek
- Jan 10, 2022

An R mystery
Someone recently posted a message in our development team's internal chat room with the subject line "[s]ome bizarre R code behavior." We had used a typical workflow in which we copy a data frame into a new variable, i.e., df_new <- df_original. This lets us do some data munging on the new data frame while the original is left intact. Because R behaves as if assignment creates a distinct copy in memory, any changes to the new data frame should have no impact on the original.
That is not what was happening.
Instead, modifying the column names of the new data frame also modified them in the original! This caused a crash further along in the script, where code depended on the original column names. In effect, R was behaving like languages in which assignment just creates a pointer or reference, so changing the value through one variable changes it for both.
A simplified version of the buggy code copied df_original into df_new, renamed the column col1 in df_new, and then went on to use col1 from df_original.
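Here is a minimal sketch of that pattern. The column col1 and the setnames() call come from this post; the second column col2, the sample values, the new name col1_renamed, and the final column lookup are hypothetical stand-ins for the real script. The lookup is wrapped in tryCatch() so the error message is captured instead of halting the script.

```r
library(data.table)  # provides setnames()

# Hypothetical toy data for illustration
df_original <- data.frame(col1 = 1:3, col2 = c(10, 20, 30))

# "Copy" the data frame so we can munge it (or so we think)
df_new <- df_original

# Rename col1 in the working copy only... or so we think
setnames(df_new, "col1", "col1_renamed")

# Hypothetical downstream code that still expects col1 in the original
result <- tryCatch(
  df_original[, "col1"],
  error = function(e) conditionMessage(e)
)
print(result)
```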


Running it fails with an error because the column col1 no longer exists in df_original.

The Problem
Why doesn't the column col1 exist in the data frame df_original? df_original should be unchanged, and we never deliberately removed the column.
The hint is in the packages that we used, specifically, data.table. data.table is a powerful package that includes many convenience functions. One huge advantage of data.table over base R objects like data.frame is that data.table has better performance on large datasets. But in this case, the catch is how it achieves this performance gain. According to the documentation, "all set* functions change their input by reference. That is, no copy is made at all, other than temporary working memory, which is as large as one column." [https://cran.r-project.org/web/packages/data.table/data.table.pdf]
In other words, these functions give a base R data.frame the reference semantics of some other programming languages: they modify the object in place in memory instead of making a potentially expensive copy. That sounds like exactly what is going on in our buggy code, where we don't seem to have a separate copy of the data frame.
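As a small illustration of what "by reference" means in practice, the sketch below (with made-up columns) calls two set* functions, setcolorder() and setnames(), on a data.table without ever assigning the result. The changes stick anyway, because each call modifies dt in place.

```r
library(data.table)

# Hypothetical toy data.table for illustration
dt <- data.table(col1 = 1:3, col2 = c(10, 20, 30))

# Neither call is assigned back with `<-`; each one modifies dt in place
setcolorder(dt, c("col2", "col1"))
setnames(dt, "col1", "id")

print(names(dt))
```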
In this case, a data.table function, setnames(), had slipped into our code without our noticing. Because we were mixing functions from several packages, the problem was not obvious to us.
So what is really going on?
A deep dive into R memory handling
Let’s use the lobstr package [https://cran.r-project.org/web/packages/lobstr/index.html] to investigate the structure of how these data frames are actually stored in memory.
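A minimal version of that investigation might look like the following, assuming a toy two-column data frame with illustrative values. lobstr's ref() prints a tree of memory addresses, and obj_addr() returns the address of a single object. Addresses change from run to run, so they will not match the ones quoted in this post.

```r
library(lobstr)  # ref(), obj_addr()

# Hypothetical toy data for illustration
df_original <- data.frame(col1 = 1:3, col2 = c(10, 20, 30))
df_new <- df_original  # plain assignment: no copy is made yet

# Print a tree of memory addresses for both data frames and their columns
ref(df_original, df_new)

# Both variables point at the very same object in memory
same_object <- obj_addr(df_original) == obj_addr(df_new)
print(same_object)
```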

This prints each data frame and its components, along with the address in memory of each one.

The possibly-intimidating-looking hex codes (e.g., 0x7fb6f3302008) give the address in memory of that object and all of its properties. If you look closely, you’ll see that all of the addresses are identical between df_original and df_new. So both variables are pointing at the same object in memory. How can this be, if R always makes a new copy when you assign a variable?
The answer is that, despite its copy semantics, R tries to be efficient with memory. When df_new is first assigned, it simply points to the same address in memory as df_original. Only when data is modified does R create a copy and modify that copy, a behavior often called copy-on-modify.
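The sketch below, again with a hypothetical two-column data frame, doubles col2 in df_new and then compares addresses. lobstr's obj_addrs() returns the address of each column, which makes the partial copy easy to check: the data frame itself gets a new address, col2 gets new memory, and col1 is still shared.

```r
library(lobstr)  # ref(), obj_addr(), obj_addrs()

# Hypothetical toy data for illustration
df_original <- data.frame(col1 = 1:3, col2 = c(10, 20, 30))
df_new <- df_original

# Modifying one column triggers copy-on-modify
df_new$col2 <- df_new$col2 * 2

ref(df_original, df_new)

# Addresses of each column, in order (col1, col2)
addrs_original <- obj_addrs(df_original)
addrs_new <- obj_addrs(df_new)

# The data frame object itself was copied...
print(obj_addr(df_original) != obj_addr(df_new))
# ...but only col2 got new memory; col1 is still shared
print(addrs_original == addrs_new)
```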


You can see that the memory address of the whole object has changed from 0x7fb6f3302008 to 0x7fb6f2f389c8. More importantly, the address of col2 has changed, while that of col1 has stayed the same.
We’re closing in on the problem with using a data.table function. Notice that the data frames have an attribute called names, the vector that holds the column names. Notice also that the address of this attribute is identical between the two data frames (0x7fb6f3a4dc88): R copied only the modified column, and the names vector is still shared. Recall from the data.table documentation that set* functions, including setnames(), change their input by reference. So calling setnames() on df_new changes the single names vector in memory that both variables point to, and df_original sees the new names as well.
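To see the shared names vector directly, this sketch (hypothetical data, same column names as the post) repeats the col2 modification, compares the addresses of the names attribute with obj_addr(), and then calls setnames() on df_new only. The new name shows up in df_original too.

```r
library(data.table)  # setnames()
library(lobstr)      # obj_addr()

# Hypothetical toy data for illustration
df_original <- data.frame(col1 = 1:3, col2 = c(10, 20, 30))
df_new <- df_original
df_new$col2 <- df_new$col2 * 2  # copies col2, but not the names vector

# Address of the names attribute in each data frame
names_addr_original <- obj_addr(attr(df_original, "names"))
names_addr_new <- obj_addr(attr(df_new, "names"))
print(names_addr_original == names_addr_new)

# setnames() overwrites that shared names vector in place...
setnames(df_new, "col1", "col1_renamed")

# ...so the original data frame sees the new name as well
print(names(df_original))
```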



If we instead use a base R function to rename the columns, such as names() or colnames(), R simply copies the names attribute, just as it copied col2 above when we set a value in it. So the fix for our buggy code is simply to replace the call to setnames() with names().
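A minimal sketch of the fix, assuming the same hypothetical toy columns: the base R replacement form names(x)[...] <- value builds a new names vector for df_new, so df_original keeps col1 and the downstream lookup succeeds. The logical-index form shown here is one option; colnames(df_new)[...] <- ... would behave the same way on a data frame.

```r
# Hypothetical toy data for illustration
df_original <- data.frame(col1 = 1:3, col2 = c(10, 20, 30))
df_new <- df_original

# Base R replacement function: R copies the names vector before changing it
names(df_new)[names(df_new) == "col1"] <- "col1_renamed"

print(names(df_new))       # renamed in the working copy
print(names(df_original))  # original names are untouched

# The downstream lookup on the original now works
print(df_original[, "col1"])
```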


Only the column in df_new is renamed, and no errors!
Lesson learned?
The conclusion I drew from this is that while data.table can be very useful and efficient, it has pitfalls that developers need to watch out for. You need to be intentional about using data.table functions, and especially careful about using them on plain data frames rather than actual data.table objects.
One suggestion is to prefix functions with their package name, e.g., data.table::setnames(), so that it is always clear which package a function comes from.
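Here is a sketch of the namespace-prefix habit, with hypothetical data. It also uses data.table::copy(), which is not part of the fix described above but is data.table's documented way to make a true deep copy. With a separate object in hand, the by-reference rename can no longer reach df_original.

```r
# Hypothetical toy data for illustration
df_original <- data.frame(col1 = 1:3, col2 = c(10, 20, 30))

# data.table::copy() makes a deep copy, so df_new owns its own names vector
df_new <- data.table::copy(df_original)

# The namespace prefix flags this as a by-reference data.table call
data.table::setnames(df_new, "col1", "col1_renamed")

print(names(df_new))       # renamed
print(names(df_original))  # unchanged
```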
Thanks to CANA’s Renee Carlucci, Rick Hanson, and Rocky Graciani for helping to solve this R mystery.

Aaron Luprek is a Senior Software Developer here at CANA. You can contact Aaron at aluprek@canallc.com or on LinkedIn.




