DEEP LEARNING IN THE CANA CAR: A CANCER TRIAL DATA SCIENCE CHALLENGE
- Jerome Dixon 
- Oct 14, 2020
Updated: Oct 16, 2020

As a Senior Operations Research Analyst at CANA Advisors, I’ve had the opportunity to apply both my military logistics experience and healthcare analytics expertise to a variety of challenging problems. I recently shared a learning experience with peers at our CANA Analytics Roundtable (CAR). The CANA “CAR” is a monthly gathering, open to everyone in the company, that highlights employee-chosen topics related to analytics techniques and technologies. Each CAR typically hosts four to five short presentations. During our most recent roundtable, I talked about my strategy for, and initial execution of, a cancer trial data science challenge hosted by Oak Ridge National Laboratory.
The challenge was to analyze cancer patient information and data sets to determine whether an individual should be assigned to a given clinical trial. I identified 25 input features within the patient information database that would eventually feed into the target variable, “Selected for Study, Yes or No.” I also noted a specific need for improvement based on doctors’ feedback: trial names did not always match the relevant disease site, so a critical linkage point between patients and potential trial participation was being missed.

I used PyTextRank as one means to address the data challenge. Although “bag of words” is a common model in text mining, it focuses mostly on simple word identification and counts. In this instance, I felt the Python library PyTextRank was the right tool, given the types of trials and abstracts in the challenge, to classify titles to the correct cancer sites. PyTextRank can be used not only for identification and counting but also to select keywords, assign importance to each word, and build summary sentences from text. I used PyTextRank to review the summaries and to establish an eigenvector centrality metric that ranks node importance based not only on the number of connections but also on the quality of the connected nodes, essentially creating a network strength metric.
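
To make the centrality idea concrete, here is a minimal sketch in plain Python. It is not PyTextRank's implementation (the library builds on TextRank, a PageRank-style variant of the same idea); the co-occurrence window, the function names, and the sample sentence are all assumptions made for illustration. It links words that appear near each other and then scores each word by power iteration, so a word ranks highly when it is connected to other well-connected words.

```python
from collections import defaultdict
from math import sqrt


def cooccurrence_graph(tokens, window=2):
    """Build an undirected graph linking words that occur within `window` positions."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1 : i + 1 + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def eigenvector_centrality(graph, iterations=100, tol=1e-9):
    """Score nodes by power iteration on (A + I).

    Adding the identity keeps the same leading eigenvector as the adjacency
    matrix A but avoids oscillation on bipartite graphs.
    """
    score = {node: 1.0 for node in graph}
    for _ in range(iterations):
        # Each node keeps its own score and adds the scores of its neighbors.
        new = {
            node: score[node] + sum(score[nb] for nb in graph[node])
            for node in graph
        }
        norm = sqrt(sum(v * v for v in new.values()))
        new = {node: v / norm for node, v in new.items()}
        if max(abs(new[n] - score[n]) for n in graph) < tol:
            return new
        score = new
    return score


# Made-up sentence for illustration only; not taken from the challenge data.
tokens = "breast cancer trial for metastatic breast cancer patients".split()
scores = eigenvector_centrality(cooccurrence_graph(tokens))
for word, value in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word:12s} {value:.3f}")
```

In this toy graph, “breast” and “cancer” tie for the top score because each is linked to every other word, while “patients” ranks last because it has only two neighbors.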

Another critical tool was Keras, the deep learning API, which I used through its R interface. Although it appears difficult to work with, most of the heavy lifting turned out to be in data preprocessing, putting the data into matrix and vector format. Keras was critical to my intent to define and train a model that matches a large body of cancer trials to specific cancer anatomical sites (e.g., brain, breast, prostate), thereby enabling effective matching of patients to potentially useful trials. To put these different elements together, I used the reticulate package to call Python’s PyTextRank from R. This approach was fairly effective, and I was able to demonstrate initial iterations of my model.
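
As a rough illustration of the “matrix and vector format” step, the sketch below turns a few trial titles into a word-count matrix and one-hot label vectors, which is the general shape a Keras classifier expects. It is written in plain Python rather than R, and it is not the preprocessing used in the project: the titles, the helper names, and the whitespace tokenization are all assumptions chosen to keep the example small.

```python
def build_vocabulary(titles):
    """Map every distinct lowercase word to a column index."""
    words = sorted({w for title in titles for w in title.lower().split()})
    return {word: idx for idx, word in enumerate(words)}


def to_count_matrix(titles, vocab):
    """Return one row per title, with a word count in each vocabulary column."""
    matrix = []
    for title in titles:
        row = [0] * len(vocab)
        for word in title.lower().split():
            if word in vocab:
                row[vocab[word]] += 1
        matrix.append(row)
    return matrix


def one_hot(labels, classes):
    """Encode each label as a 0/1 vector with a single 1 at its class position."""
    index = {c: i for i, c in enumerate(classes)}
    return [[1 if index[label] == j else 0 for j in range(len(classes))]
            for label in labels]


# Hypothetical titles and anatomical-site labels, for illustration only.
titles = [
    "Radiation therapy for brain tumors",
    "Hormone therapy in breast cancer",
    "Screening study for prostate cancer",
]
sites = ["brain", "breast", "prostate"]

vocab = build_vocabulary(titles)
X = to_count_matrix(titles, vocab)            # inputs: titles x vocabulary
Y = one_hot(sites, ["brain", "breast", "prostate"])  # targets: titles x sites

print(len(X), "rows x", len(vocab), "columns")
print(Y)
```

In a real pipeline the count columns would likely be replaced or supplemented by richer features, such as the keyword ranks from the text-ranking step, before the matrix is passed to the model.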

As I continued through validation and analysis of my approach, I realized the classification model did not produce what I considered significant results. I need to do further feature engineering on the text corpus and improve the model's input features. This iterative process will help determine the features that best represent, classify, and connect the data flowing into the model. My next step is to experiment with the methods described in the tutorial at https://cloud4scieng.org/2020/08/28/deep-learning-on-graphs-a-tutorial/.
A graph-based deep learning approach may reveal more about the underlying structure of the cancer study data: it can define the nodes and edges that detail the data's connections and features, identify or predict links and communities, and help distinguish between classes. I intend to, quite literally, connect the dots of the data to solve this cancer clinical trial challenge.

Jerome Dixon is a Senior Operations Research Analyst at CANA Advisors. jdixon@canallc.com



