Alex’s Lemonade Stand Foundation’s lab leverages big data in fight against cancer
September 11, 2019
“We thought, let’s go for something big,” said Jay Scott, co-executive director of Alex’s Lemonade Stand Foundation. “And let’s try to build a center that can use big data to help fight childhood cancer.”
The foundation, which raises funds for pediatric cancer research, established the Childhood Cancer Data Lab (CCDL) in 2017. The lab is based in Center City, and its data scientists, coders, and designers create data tools and trainings “to empower childhood cancer researchers to utilize vast amounts of data to make more robust discoveries and cures, faster and cheaper.”
The CCDL is hosting a Data Science Training workshop in Philadelphia from Oct. 14 to 16, which is already fully booked. The training’s four modules cover machine learning; processing bulk RNA sequencing data; processing single-cell RNA sequencing data; and R, a programming language for statistical computing.
Jaclyn Taroni, principal data scientist at the CCDL, said more needs to be done to train pediatric cancer experts. “Science works best when we all understand as much of the process as possible,” she added.
The training teaches the skills needed for gene expression analysis. Gene expression data is a kind of measurement, Taroni said. “It’s a molecular snapshot of what’s happening in that tissue,” she added. “So that could be cells in a dish, that could be a tumor. It could be a liver biopsy.”
The training materials were updated after a pilot training in Philadelphia in July 2018. This year, the CCDL has held trainings in Houston, Chicago and California’s Bay Area.
Robert Allaway, a research scientist at Sage Bionetworks, a nonprofit organization promoting open science for biomedical research, attended the Houston training in March.
“It was great to see how other people, like other labs, what their best practices are for a lot of this type of work,” Allaway said. “Whether it’s their approaches to doing the analysis or their approaches to tracking experiments.”
Attendees had varying levels of coding experience, he added. Allaway already had some computational biology experience, but the CCDL staff helped him analyze a dataset that had unique constraints.
At the training, Allaway used the CCDL’s data tool, refine.bio, a repository of harmonized transcriptome data from publicly available sources. Transcriptome data can be used to identify potential new therapies or to predict tumor progression, Allaway said.
“Because I’m working in rare disease, there aren’t many datasets out there,” Allaway added. “I was easily able to find everything that I could possibly find, in terms of publicly available data.”
Lindsay Williams, a postdoctoral fellow in the Department of Pediatrics at the University of Minnesota, attended the Houston training as well. Williams, an epidemiologist who specializes in population-based analysis, wasn’t familiar with molecular analysis.
“The training was wonderful,” Williams said. “I mean, they walk you through absolutely everything that you would need to know, to analyze RNA sequencing data from start to finish.”
The CCDL staff taught Williams how to find gene expression differences by sex in the data she brought to the training, she added.
Williams’s research interests include pediatric brain tumors, an area that has received more molecular research than epidemiological research. “For me, I kind of approach it as trying to bridge the two fields together,” Williams said.
The CCDL hopes these trainings will give researchers the skills to analyze more of their own data and to communicate better with analysts.
“If more people have this more basic set of skills that they get from training, they will be less reliant on people who are more specialized,” Taroni said. “The idea being that they will only [seek] help from someone more specialized, as far as computational skills, when they need that.”