CANA Advisors

7371 Atlas Walk Way
Gainesville, Virginia 20155
Telephone (703) 317-7378
Facsimile (571) 248-2563

CANA Advisors: A Veteran-Owned and Woman-Owned Company. © CANA LLC 2019. All Rights Reserved.

Notes from a FiveThirtyEight Talk on Telling Stories

February 17, 2017

“This is the best talk I’ve attended in over a year.” - Harrison Schramm

You may know Harrison Schramm from his “5 Minute Analyst” articles and blog posts. When he isn’t thinking about the cost of the Death Star or solving the logistics problems of Harry Potter, he is one of CANA Advisors’ Principal Operations Research Analysts. He recently attended a FiveThirtyEight talk on telling stories with data at rstudio::conf. In his words, “[t]his is the best talk I’ve attended in over a year.” As a change of pace from a blog post or article about the talk, we asked Harrison to share his notes from the event, and he kindly passed them along. We hope these notes spark your interest in not just the ‘how’ but the ‘why’ of statistical analysis.

****From the Event Notebook of Harrison Schramm****

Data Journalism Principles:

The story leads; the data follows. Use methods that are rigorous but not interminable: be accurate, be fast, and be transparent.


Useful tools for R:

The tidyverse is the tool of choice for working with data. (The tidyverse is a set of packages that work in harmony because they share common data representations and API design. https://blog.rstudio.org/2016/09/15/tidyverse-1-0-0/)


In the interest of transparency, FiveThirtyEight publishes the data behind many of its stories on GitHub (https://github.com/fivethirtyeight), and the fivethirtyeight R package on CRAN bundles many of those data sets for use in R. (Nate Silver’s FiveThirtyEight uses statistical analysis, meaning hard numbers, to tell compelling stories about politics, sports, science, economics and culture.) For example, to see a breakdown of Avengers characters by years since joining, death status, and gender, you can do the following:

install.packages("fivethirtyeight")   # one-time install from CRAN

library(ggplot2)
library(magrittr)          # provides the %>% pipe
library(fivethirtyeight)   # provides the avengers data set

# death1 records whether the character has died at least once
avengers %>%
  ggplot(aes(factor(death1), years_since_joining)) +
  geom_violin() +
  facet_wrap(~gender) +
  xlab("Died at Least Once?") +
  ylab("Years Since Joining") +
  ggtitle("Avengers Characters Violin Plot - Death Status vs. Years")


 The Six Types of Data Stories

  1. Novelty

  2. Outlier

  3. Archetype

  4. Trend

  5. Debunking

  6. Forecast

 
Novelty Stories
  • Start with the basic questions.

  • Danger: Triviality

  • Remedy: Simple summaries

  • Ask yourself: Is this data meaningful to others?

Outlier Stories
  • Danger: Spurious Result

  • Tactic: Characters. Talk about who the outlier is: which person, which company, and so on.

  • Profile one of the characters from the outlier group, then introduce the statistics

  • Ask yourself: Is this really so different?
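One quick way to ask “Is this really so different?” is to check a value against a rule of thumb before building a story around it. The base R sketch below uses Tukey's 1.5 × IQR fences and a robust (median and MAD) z-score; the numbers are made up for illustration, and the 1.5 multiplier is a common convention, not something from the talk.

```r
# Made-up values for illustration only; one point looks unusual
values <- c(12, 15, 14, 13, 16, 15, 14, 41, 13, 15)

# Tukey's rule: flag points more than 1.5 * IQR beyond the quartiles
q <- quantile(values, c(0.25, 0.75))
iqr <- q[[2]] - q[[1]]
is_outlier <- values < q[[1]] - 1.5 * iqr | values > q[[2]] + 1.5 * iqr

# Second opinion: a robust z-score built from the median and the MAD,
# which a single extreme point cannot drag around the way it drags a mean
robust_z <- (values - median(values)) / mad(values)

data.frame(value = values, tukey_outlier = is_outlier, robust_z = round(robust_z, 1))
```

Here only the value 41 falls outside the fences. If a point fails both checks, it is probably worth profiling; if it passes them, the “outlier” may not be much of a story.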

Archetype Stories
  • Danger: Oversimplification

  • Tactic: Modeling

  • Ask yourself: What variables am I leaving out?
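“What variables am I leaving out?” matters because an omitted variable can make an unrelated one look important. A minimal base R simulation, under made-up assumptions: a hidden variable z drives both x and y, and x has no direct effect on y.

```r
set.seed(538)
n <- 1000

# Simulated data: z drives both x and y; x has no direct effect on y
z <- rnorm(n)
x <- z + rnorm(n)
y <- 2 * z + rnorm(n)

# Model that leaves z out: x picks up z's effect and looks important
short_model <- lm(y ~ x)

# Model that includes z: the coefficient on x moves back toward 0
full_model <- lm(y ~ x + z)

round(c(without_z = coef(short_model)[["x"]], with_z = coef(full_model)[["x"]]), 2)
```

By construction, the slope on x without z is cov(x, y) / var(x) = 2 / 2 = 1, even though x's true effect is 0.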

Trend
  • Example: Terrorism in the EU is declining overall, but religiously inspired attacks are rising.

  • Built with dplyr: data %>% group_by() %>% summarize() %>% ggplot()

  • Danger: Variance and regression to the mean

  • Tactic: Be conservative

  • Ask yourself: Is this signal or noise?

  • Fun quote: If you could always spot a valid trend, you would be trading on Wall Street, not telling data stories.
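The group_by, summarize, ggplot pattern from the notes looks roughly like this with dplyr and ggplot2. The data are simulated and hypothetical: the incidents data frame and its year and category columns are illustrative names, not a real data set. The confidence band from geom_smooth() is one conservative way to ask whether an apparent trend is signal or noise.

```r
library(dplyr)
library(ggplot2)

set.seed(2017)
# Hypothetical, simulated incident-level data: one row per incident
incidents <- data.frame(
  year = sample(2000:2015, 500, replace = TRUE),
  category = sample(c("A", "B"), 500, replace = TRUE)
)

# Count incidents per year and category
yearly <- incidents %>%
  group_by(year, category) %>%
  summarize(n = n(), .groups = "drop")

# Plot each category over time with a linear fit and its confidence band
p <- yearly %>%
  ggplot(aes(year, n, colour = category)) +
  geom_line() +
  geom_smooth(method = "lm", formula = y ~ x, se = TRUE) +
  labs(x = "Year", y = "Incidents", colour = "Category")
print(p)
```

Because these simulated data contain no real trend, any slope the fitted lines show is noise. That is exactly the trap the “signal or noise?” question guards against.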

Debunking
  • Example: The Bechdel test examines how women are portrayed in movies: (1) Are there two or more women? (2) Do they talk to each other? (3) Do they talk to each other about something other than men?

  • Danger: Confirmation bias, i.e., your own belief in the debunking

  • Tactic: Showcase failures

  • Ask yourself: How much do I want to debunk this?

  • Quote about p-hacking: “Warning: This is evil (statistical) work. Do not go to the dark side. Do not try this at home.” Note: You can read Harrison’s P-value primer in OR/MS Today here: https://www.informs.org/ORMS-Today/Public-Articles/June-Volume-43-Number-3/P-value-Primer-P-OR-P-values-in-operations-research-M-N-O-P-Q-R-S-T

  • Example of a p-hacked finding: “Eating potato chips leads to higher SAT Math scores.”
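To see why a result like the potato-chip finding is suspect, the base R sketch below tests 40 made-up snack variables against simulated SAT math scores that are pure noise. This is a simulation, not real data. Even with no real effect anywhere, on average about 2 of the 40 tests (5%) will clear p < 0.05 by chance. A multiple-comparison correction such as Bonferroni, shown here as one common option, is meant to catch that.

```r
set.seed(42)
n_students <- 200
n_snacks <- 40

# Simulated SAT math scores: pure noise, unrelated to any snack
sat_math <- rnorm(n_students, mean = 500, sd = 100)

# 40 hypothetical snack-consumption variables, also pure noise
snacks <- matrix(rnorm(n_students * n_snacks), nrow = n_students)

# Test every snack against the score and keep each p-value
p_values <- apply(snacks, 2, function(snack) cor.test(snack, sat_math)$p.value)

# Number of "significant" snacks found by chance alone
sum(p_values < 0.05)

# After a Bonferroni correction for running 40 tests
sum(p.adjust(p_values, method = "bonferroni") < 0.05)
```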

Forecast (you walk a narrow path here)
  • Danger: Overfitting

  • Tactic: Simulations and scenarios

  • Ask yourself: Am I properly conveying the uncertainty in my model?
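One way to put “simulations and scenarios” into practice is to simulate many possible outcomes and report ranges and probabilities instead of a single number. The base R sketch below assumes a hypothetical team with a fixed per-game win probability over an 82-game season. The probabilities and the 50-win threshold are illustrative choices, not figures from the talk.

```r
set.seed(538)
n_sims <- 10000
n_games <- 82
win_prob <- 0.55   # hypothetical per-game win probability

# Simulate many seasons instead of reporting a single point forecast
season_wins <- rbinom(n_sims, size = n_games, prob = win_prob)

# Convey uncertainty: a central estimate, an 80% range, and a probability
mean(season_wins)
quantile(season_wins, c(0.10, 0.90))
mean(season_wins >= 50)   # share of simulated seasons with 50+ wins

# Scenarios: rerun the simulation under different assumptions
scenarios <- c(pessimistic = 0.50, baseline = 0.55, optimistic = 0.60)
scenario_probs <- sapply(scenarios, function(p) mean(rbinom(n_sims, n_games, p) >= 50))
round(scenario_probs, 3)
```

Treating every game as an independent draw with the same probability is a strong simplification. The point of the sketch is the shape of the output (ranges and scenario probabilities), not the model itself.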

We hope these notes from Harrison Schramm on R, and on how to use it to tell a story with your statistical and analytical data, are useful.

Follow Harrison (@5MinuteAnalyst on Twitter) and the rest of the CANA Advisors’ Team (@CANAADVISORS on Facebook and Twitter) for more insights, blog posts, and articles delving into data, logistics, and analytics in creative and helpful ways.

Other interesting CANA Articles on R:

 

Blog Article: Document Preparation... in R?
http://www.canallc.com/single-post/2016/09/02/Document-Preparation-in-R

 

Blog Article: Notes on The Seven Pillars of Statistical Wisdom
http://www.canallc.com/single-post/2016/09/16/Notes-on-The-Seven-Pillars-of-Statistical-Wisdom
