Kamile Taouk, Bioinformatics Engineering, UNSW, and Sabrina Yan, Bioinformatics RA, Children’s Cancer Institute, sit down with John Furrier for DockerCon 2020.
Docker helps Australia cure cancer, one child at a time
Containerization is helping fight childhood cancer, as Docker Inc. makes big-data research pipelines scalable and simplifies the sharing of data sets.
Cancer kills more children in developed countries than any other disease. It is the number one cause of death in school-aged children, according to a study conducted by the New England Journal of Medicine, and the only disease to hit the top five causes of death across children from birth to 19 years of age. But that figure could drop to zero thanks to faster, more efficient research pipelines that allow researchers to create precision treatments that target unique cancers within each individual child.
“What we do is we find the mutations that are causing the cancer, and that helps us determine what treatments or what clinical trials might be most effective for the kid,” said Sabrina Yan (pictured, right), bioinformatics research assistant at the Children’s Cancer Institute in Sydney, Australia.
“We’ve made a substantial impact on the survivability of several high-risk cancers in pediatrics,” stated Kamile Taouk (pictured, left), who is a student and intern.
Taouk and Yan spoke with John Furrier, host of theCUBE, SiliconANGLE Media’s livestreaming studio, during DockerCon Live. They discussed how the Children’s Cancer Institute has “Dockerized” its big-data analysis pipeline, enabling fast, efficient and cost-effective individualized cancer research for pediatric patients. (* Disclosure below.)
Personalization makes cancer treatments more effective
As every engineer knows, even the most precisely planned processes can veer off course. The human cell is no exception. The first known diagnosis of a malignant tumor is shown on a papyrus from Ancient Egypt. And for centuries, the disease was thought to have no cure. Even now, many fear the diagnosis as a death sentence.
Despite the figures that show cancer as the leading cause of death for children, the survival rate is an optimistic 84% across children and adolescents in the U.S. But those figures drop dramatically when narrowed down to aggressive forms of cancer.
Children’s Cancer Institute is helping to lead the Zero Childhood Cancer Program, which includes a study of some of the approximately 200 young Australians diagnosed with high-risk cancers each year. These children currently face only a 30% chance of living to adulthood. That containerization could help change these odds may seem a far-fetched use of the technology. But cancer has gone from being untreatable to beatable, and modern data science is behind the change.
Most cancers are treated with a combination of surgery, radiation and chemotherapy. However, “children are unique in the sense that a lot of the typical treatments we use for adults may or may not work, or will have adverse side effects in kids,” Yan stated. This is where technology comes into play, enabling researchers to focus on the profile of a patient’s individual tumor through genetic sequencing.
“That allows us to specialize the medication and the treatment for that patient and essentially lets us improve the efficiency and the effectiveness of the treatment, which in turn obviously has an impact on the survivability of the cancer,” Taouk stated.
Terabytes of data per child
Discovering a patient’s genomic profile requires creating a whole genome and RNA sequencing process pipeline. “We sequence the healthy cells, and we sequence their tumor cells,” Yan said, adding that the information is then analyzed to pinpoint the mutations specific to that child.
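As a rough illustration of the comparison Yan describes, the Python sketch below treats each sample as a set of variants and keeps only those found in the tumor but not in the healthy cells. The `Variant` type, the `tumor_only_variants` function and the sample data are all hypothetical. The article does not describe the institute's actual tooling, and real mutation analysis is far more involved than a set difference.

```python
from typing import NamedTuple, Set


class Variant(NamedTuple):
    """A simplified variant record (hypothetical, for illustration only)."""
    chrom: str
    pos: int
    ref: str
    alt: str


def tumor_only_variants(normal: Set[Variant], tumor: Set[Variant]) -> Set[Variant]:
    """Return the variants seen in the tumor sample but not in the healthy sample."""
    return tumor - normal


# Made-up example data: two variants shared by both samples, one tumor-specific.
normal_sample = {
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 500, "C", "T"),
}
tumor_sample = {
    Variant("chr1", 1000, "A", "G"),
    Variant("chr2", 500, "C", "T"),
    Variant("chr17", 12345, "C", "T"),
}

for variant in sorted(tumor_only_variants(normal_sample, tumor_sample)):
    print(f"{variant.chrom}:{variant.pos} {variant.ref}>{variant.alt}")
```

Variants shared by both samples are filtered out, leaving only the changes specific to the tumor as candidates for further analysis.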
While the research process may sound simple when described that way, each patient has several hundred terabytes’ worth of data. And until recently, Taouk, Yan, and fellow researchers relied on an inflexible system of multiple, fine-tuned applications with multiple dependencies. This meant that setting up a single pipeline to research treatments for one child could take days, or even weeks.
“And even then, a lot of things didn’t work,” Yan stated.
The problem was that each pipeline required customization for tools with multiple different dependencies. “To install four different versions of Python, or three different versions of R, or different versions of Java onto one machine in order just to run [a process] is a bit of a pain,” Yan said.
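To make the dependency problem concrete, here is a minimal Python sketch of how containers can sidestep it: each pipeline step is mapped to its own image carrying the runtime version it needs, so nothing has to be installed side by side on the host. The step names, image names and paths are hypothetical and do not reflect the institute's real pipeline. The sketch only builds and prints `docker run` command lines; it does not execute them.

```python
import shlex
from typing import Dict, List

# Hypothetical mapping of pipeline steps to container images. Each image would
# bundle the specific Python, R or Java version that step depends on.
STEP_IMAGES: Dict[str, str] = {
    "quality-check": "example/qc:python3.8",
    "legacy-caller": "example/caller:python2.7",
    "statistics": "example/stats:r3.6",
    "annotation": "example/annotate:java8",
}


def docker_run_command(image: str, tool_args: List[str], data_dir: str) -> List[str]:
    """Build a `docker run` command that mounts data_dir at /data in the container."""
    return ["docker", "run", "--rm", "-v", f"{data_dir}:/data", image, *tool_args]


# Print one command per step; the tool arguments are placeholders.
for step, image in STEP_IMAGES.items():
    command = docker_run_command(image, [step, "/data/sample"], "/mnt/patient_001")
    print(shlex.join(command))
```

Because every step runs in its own image, conflicting versions of the same language never need to coexist on one machine.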