“Big Data Cup” Experience



The Big Data Cup was an open-source hockey analytics contest, set up by stathletes.com, for students and professionals. Two data sets were available to analyze: scouting data from the men’s Canadian junior hockey league and a Women’s Olympic hockey data set. I was part of the CANA team assigned to this task, along with CANA’s Walt DeGrange, Lucia Darrow, and Thomas Scully.


This contest was the first time that I had participated in an analytics project, and it was extremely interesting and educational. We set up a hackathon to do the majority of the contest work in one day. A hackathon is a scheduled workshop where a group of people get together to collaborate on a specific project. Before the hackathon, we were tasked with going over the data set and understanding the meaning of each row and column. During the hackathon, we settled on the research question we wanted to explore and looked for an angle that went beyond the obvious. We knew that everyone was going to want to analyze something involving goals for or goals against, but that is already one of the most analyzed aspects of hockey. We decided to take a deeper dive into the passing strategies of each team and how they changed in response to events in the games.


Passing strategy dictates the flow of a hockey game and is a major factor in in-game coaching decisions. North-South and East-West passing, named for the direction of the pass, are the two key strategies considered in this analysis. In this report, we explore the question: which passing strategy did teams employ under different game scenarios? Using the Women’s Olympic hockey data set provided, we set out to uncover insights regarding the usage of the two strategies. We explore the relationship between score difference and passing strategy through visualization, clustering, and in-depth game analysis.


Exhibit A. A Sample of the Game Visualization and Analysis


This analysis took a bit longer than expected, leading to two hackathon sessions. In the first one, we spent a good amount of time developing the algorithm to identify which passes counted as North-South and which as East-West. The second hackathon was mainly focused on creating the visualizations for our analysis. The results showed us many things. The first was that teams that are losing usually become more aggressive with their passing strategy, shifting toward North-South passing. The analysis also highlighted the areas where passes were most often completed and most often intercepted, giving coaches a sense of where to pass and where not to pass on the ice. However, it was what our visuals did not show us that helped our analysis the most (see Exhibit A above). The empty areas of the ice in our visuals were where passes were rarely even attempted, which led us to believe that is where the opposing team was set up. This helps coaches create a defensive strategy to counter the passing strategy used by each team, or deploy the offensive passing strategy that will be most effective. Again, it all depends on the team you are facing: even though every team shows similar patterns, each team has its own strategy and set plays that you have to account for.
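To make the North-South versus East-West idea concrete, here is a minimal sketch of how a pass could be labeled by its direction. It is not the algorithm we built during the hackathon. It assumes each pass has start and end coordinates, with x running along the length of the rink (goal to goal) and y running across its width, and it uses a hypothetical 45-degree cutoff. The function names and the sample passes are made up for illustration.

```python
import math
from collections import Counter, defaultdict


def classify_pass(x1, y1, x2, y2, threshold_deg=45.0):
    """Label a pass by its dominant direction.

    Assumption: x runs goal to goal (North-South) and y runs
    board to board (East-West). The 45-degree cutoff is illustrative.
    """
    dx = x2 - x1
    dy = y2 - y1
    if dx == 0 and dy == 0:
        return None  # no movement, so no direction to label
    # Angle from the length axis: 0 = straight up the ice, 90 = straight across.
    angle = math.degrees(math.atan2(abs(dy), abs(dx)))
    return "North-South" if angle < threshold_deg else "East-West"


def game_state(score_diff):
    """Turn a score difference (own goals minus opponent goals) into a label."""
    if score_diff < 0:
        return "trailing"
    if score_diff > 0:
        return "leading"
    return "tied"


def direction_counts_by_state(passes):
    """Count pass directions for each game state.

    `passes` is an iterable of (x1, y1, x2, y2, score_diff) tuples,
    a hypothetical layout chosen for this sketch.
    """
    counts = defaultdict(Counter)
    for x1, y1, x2, y2, score_diff in passes:
        label = classify_pass(x1, y1, x2, y2)
        if label is not None:
            counts[game_state(score_diff)][label] += 1
    return counts


# Made-up passes, only to show how the functions fit together.
sample_passes = [
    (100, 40, 150, 45, -1),  # long pass up the ice while trailing
    (100, 20, 105, 60, 0),   # cross-ice pass while tied
    (60, 30, 120, 35, -2),   # another pass up the ice while trailing
]
summary = direction_counts_by_state(sample_passes)
```

Counting labels by game state like this is one simple way to compare how often each strategy shows up when a team is trailing, tied, or leading.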


Overall, this was an extremely fun event to be a part of, especially for someone who has a strong passion for hockey and is new to programming and analytics!


By: Jack Murray, Sports Analytics Intern




Jack is an intern with CANA Advisors. To learn more about CANA’s internship program, please contact Ms. Cherish Joostberns at cjoostberns@canallc.com.