DEEP LEARNING IN THE CANA CAR: A CANCER TRIAL DATA SCIENCE CHALLENGE

Jerome Dixon
Oct 14, 2020
Updated: Oct 16, 2020

As a Senior Operations Research Analyst at CANA Advisors, I’ve had the opportunity to apply both my military logistics experience and healthcare analytics expertise to a variety of challenging problems. I recently shared a learning experience with peers at our monthly CANA Analytics Roundtable (CAR). The CANA “CAR” is a monthly gathering, open to everyone in the company, that highlights employee-chosen topics related to analytics techniques and technologies. Each monthly CAR typically hosts four to five short presentations. During our most recent roundtable, I talked about my strategy for, and initial execution of, a cancer trial data science challenge hosted by Oak Ridge National Laboratory.

The challenge was to analyze cancer patient information and data sets to determine an individual’s assignment to a select clinical trial. I identified 25 input features within the patients’ information database that would eventually feed into the target variable – “Selected for Study, Yes or No.” I noted a specific need for improvement based on doctors’ feedback: trial names did not always match the relevant disease site, so a critical linkage point between patients and potential trial participation was being missed.

Example of PyTextRank word association links

I used PyTextRank as one means to address the data challenge. Although “bag of words” is a common model in text mining, it focuses mostly on simple word identification and counting. In this instance, I felt Python’s PyTextRank library was the right tool, given the types of trials and abstracts in the challenge, to classify titles to the correct cancer sites. PyTextRank can be used not only for identification and counting but also to select keywords, assign importance to each word, and build summary sentences from text. I used PyTextRank to review the summaries and to establish an eigenvector centrality metric that ranks each node's importance based not only on its number of connections but also on the quality of the nodes it connects to, essentially creating a network strength metric.
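To make the centrality idea concrete, here is a minimal sketch in plain Python. It does not use PyTextRank and is not the pipeline from the challenge. It assumes a toy word co-occurrence graph built from three made-up trial titles, and the function names `cooccurrence_graph` and `eigenvector_centrality` are hypothetical helpers introduced only for illustration. A word scores highly when it is linked to other high-scoring words, not merely to many words.

```python
from collections import defaultdict


def cooccurrence_graph(token_lists, window=2):
    """Link each word to the words that follow it within `window` tokens."""
    graph = defaultdict(lambda: defaultdict(float))
    for tokens in token_lists:
        for i, word in enumerate(tokens):
            for other in tokens[i + 1 : i + 1 + window]:
                if other != word:
                    # Undirected edge; repeated co-occurrences add weight.
                    graph[word][other] += 1.0
                    graph[other][word] += 1.0
    return graph


def eigenvector_centrality(graph, iterations=200, tol=1e-9):
    """Approximate eigenvector centrality by power iteration on (A + I).

    Adding the identity leaves the eigenvectors unchanged but keeps the
    iteration from oscillating on bipartite-like graphs.
    """
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        updated = {
            node: scores[node]
            + sum(weight * scores[nbr] for nbr, weight in graph[node].items())
            for node in graph
        }
        top = max(updated.values())
        updated = {node: value / top for node, value in updated.items()}
        converged = max(abs(updated[n] - scores[n]) for n in graph) < tol
        scores = updated
        if converged:
            break
    return scores


# Made-up, pre-tokenized titles (assumption: not data from the challenge).
titles = [
    ["breast", "cancer", "chemotherapy", "trial"],
    ["prostate", "cancer", "radiation", "trial"],
    ["brain", "tumor", "radiation", "study"],
]

scores = eigenvector_centrality(cooccurrence_graph(titles))
for word, score in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(f"{word:13s} {score:.3f}")
```

Scores are normalized so the most central word gets 1.0. In a real corpus the graph would come from lemmatized, part-of-speech-filtered tokens, which is the kind of preprocessing a TextRank implementation handles.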

Another critical tool was Keras, the deep learning API, used here through its R interface. Although it can appear difficult to work with, most of the heavy lifting turned out to be in data preprocessing, to put the data into matrix and vector format. Keras was critical to my goal of defining and training a model that matches a large body of cancer trials to specific cancer anatomical sites, e.g., brain, breast, prostate, etc., thereby enabling effective matching of patients to potentially useful trials. To put these different elements together, I used the reticulate package to embed Python's PyTextRank in R. This approach was fairly effective, and I was able to demonstrate initial iterations of my model.
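As a rough illustration of what "matrix and vector format" means for a text classifier, the sketch below turns tokenized titles into a count matrix and one-hot label rows. It is written in plain Python rather than R so it runs without extra packages, and it is not the preprocessing used in the challenge. The titles, site labels, and the helper names `build_vocabulary`, `to_count_matrix`, and `one_hot` are all assumptions made for the example.

```python
def build_vocabulary(token_lists):
    """Assign each distinct token a column index, in order of first appearance."""
    vocab = {}
    for tokens in token_lists:
        for token in tokens:
            vocab.setdefault(token, len(vocab))
    return vocab


def to_count_matrix(token_lists, vocab):
    """One row per document, one column per vocabulary word, cells are counts."""
    matrix = []
    for tokens in token_lists:
        row = [0] * len(vocab)
        for token in tokens:
            if token in vocab:  # tokens outside the vocabulary are skipped
                row[vocab[token]] += 1
        matrix.append(row)
    return matrix


def one_hot(labels):
    """Encode class labels as rows with a single 1 in the class's column."""
    classes = sorted(set(labels))
    column = {name: i for i, name in enumerate(classes)}
    rows = [[1 if column[label] == j else 0 for j in range(len(classes))]
            for label in labels]
    return classes, rows


# Made-up titles and anatomical-site labels (assumption: illustrative only).
titles = [
    ["breast", "cancer", "chemotherapy", "trial"],
    ["prostate", "cancer", "radiation", "trial"],
    ["brain", "tumor", "radiation", "study"],
]
sites = ["breast", "prostate", "brain"]

vocab = build_vocabulary(titles)
x = to_count_matrix(titles, vocab)   # features: documents x vocabulary
classes, y = one_hot(sites)          # targets: documents x classes

print(len(x), "documents x", len(vocab), "features")
print("classes:", classes)
```

Arrays shaped like `x` and `y` are what a dense classification network would typically take as inputs and targets. Keyword weights or centrality scores could replace the raw counts as cell values.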

As I continued through validation and analysis of my approach, I realized the classification model did not produce what I considered significant results. I need to do further feature engineering on the text corpus and improve the model's input features. This iterative process will help determine the features that best represent, classify, and connect the data flowing into the model. My next steps are to experiment with and test the methods described in https://cloud4scieng.org/2020/08/28/deep-learning-on-graphs-a-tutorial/.

This deep learning approach may reveal more about the underlying structure of the cancer study data; define the nodes and edges that detail its connections and features; identify or predict links and communities; and enable classification between classes. I intend to, quite literally, connect the dots of the data to solve this cancer clinical trial challenge.

Jerome Dixon is a Senior Operations Research Analyst at CANA Advisors. jdixon@canallc.com
