There are cancer luminaries who believe that if we stopped funding and performing cancer research now, we could still save thousands of lives by sharing data and knowledge that already exist. Whether this is true is open to debate, but when you consider the petabytes or maybe even exabytes of siloed scientific and clinical data, people like Jill Biden, formerly of the Biden Cancer Initiative, might be onto something.

Dr. Biden was one of many experts speaking at AACR 2019 about data sharing and other ongoing programs to encourage and facilitate a more collaborative approach to aggregation, access, and analysis of data. The Biden Cancer Initiative was formed to accelerate progress in cancer prevention, detection, diagnosis, research, and care, and the group has homed in on data sharing, patient access to data, and data standards as areas requiring attention.

The importance of data sharing was echoed by Jaime Guidry Auvil, director of the new NCI Office of Data Sharing. According to Dr. Auvil, “Data sharing has often been seen as something that was altruistic or maybe an afterthought to science but is becoming, especially in our digital age, more of a focus of discovery so not only can you replicate the science that is going on but you can make new discoveries from the information that is there.”

The ODS is currently working on a comprehensive data-sharing vision and strategy for the cancer research community that includes meaningful incentives to encourage collaborative work.

The NCI’s Genomic Data Commons (GDC) is currently undergoing a transformation, according to Robert Grossman, Ph.D., a professor at the University of Chicago as well as a principal investigator for the GDC. At the AACR meeting, he spoke about the efforts to transform GDC from a standalone repository to part of a cancer data ecosystem. This larger entity, which will include proteomics as well as imaging data, will make it easier to share and analyze data for the thousands of scientists all over the world who access GDC data, he said.

Building communities

Justin Guinney, Ph.D., vice president of computational oncology at Sage Bionetworks, is actively involved in encouraging researchers to study data from others and spoke about DREAM Challenges as a way to convene global communities and incentivize researchers to share data and propose solutions to fundamental biomedical questions.

Sage provides the expertise and infrastructure to host challenges, which are community designed and run. Ongoing challenges include the Malaria DREAM Challenge, Tumor Deconvolution DREAM Challenge, and IDG-DREAM Drug-Kinase Binding Prediction Challenge.

Overcoming health disparities

James Lillard, Ph.D., professor and associate dean at Morehouse School of Medicine, is also a PI in AACR’s 2020 by 2020 initiative, which has set out to perform genomic sequencing of both malignant and benign tumor tissues from 2,020 consented African American cancer patients by 2020.

The initiative seeks to improve the democratization of cancer precision medicine and bring the best discoveries, biomarkers, and therapies to all communities. Lillard said that when you look at specimens and data sets that are shared among cancer researchers, only 2–3% are from African American patients, when to be truly representative of the population, that percentage should be closer to 13%.

The AACR touts this program as a very important initiative to help address cancer health disparities. The 2020 by 2020 program is overseen by AACR and partners including Pelotonia, M2Gen, and ORIEN, and the goal is to create a unique, extensive, and freely available data set that “holds the promise for significantly enhancing cancer prevention, diagnosis, and treatment in a medically underserved population.”

Health data dashboard

Many of the data-sharing experts at AACR spoke about health data ownership as well as access and sharing. Tatyana Kanzaveli, CEO of Open Health Network, has created a health data exchange platform that allows patients to gather all their health data into one data dashboard. That product, called PatientSphere, uses cutting-edge technology, including AI to help patients make sense of their data, and blockchain for patient identity and consent management. The platform facilitates data sharing and potentially monetization of health data in a HIPAA-compliant way.

In what Kanzaveli referred to as “a new paradigm on how to engage with patients,” PatientSphere allows users to publish their metadata in a de-identified metadata server, where they can be “found” by researchers or pharma companies who might want to engage with them or pay them for use of their data.

Next steps

Data sharing in cancer research is definitely a work in progress with many new technologies, methodologies, and passionate groups working together for the benefit of patients.

What will it take to convince all investigators, no matter the location or career stage, to share data? Many funders and institutions are looking into how they can incentivize researchers to collaborate and share data. The University of Florida, for instance, has built a requirement for “team science” into its tenure and promotion process. So far the results have been encouraging and a collaborative spirit is rising in Florida as well as at many other institutions in the U.S. and worldwide.

Collaborative efforts advancing PDX models

Patient-derived xenograft (PDX) models are used extensively in cancer research and provide invaluable insights into many areas including cancer biology, identification of cancer biomarkers, and drug screening. The Jackson Laboratory, which has over 400 PDX models, is involved in a number of initiatives that have a collaborative component.

PDXNet is an NCI program that seeks to bring scientists together to collaborate on the development and preclinical testing of targeted therapeutic agents in PDX models. JAX and Seven Bridges form the data commons and coordination center (PDCCC) of the initiative. They are building a data-storage, sharing, and analysis platform that harmonizes PDXNet data with other large datasets and analysis workflows available in the NCI Cancer Genomics Cloud. They are also creating analysis workflows for many common genomics analysis tasks such as mutation calling and expression analysis, in addition to specialized workflows optimized for use on PDX data.

Also, PDX Finder was developed by EMBL-EBI and JAX to facilitate PDX model discovery. It is an open, online catalog of PDX models from academia, large consortia, and CROs. The web portal is intended to save researchers’ time because now they can search one central catalog instead of multiple individual resources.

PDX Finder is actively recruiting PDX model data. According to its website, it can collect and display data, making models more discoverable and visible for end users. It is supported by public funding and is completely free to providers and users. PDX Finder also encourages producers to “send data (OMIC or drug dosing studies, for example). PDX Finder can upload and store any OMIC data and drug dosing, making models even more discoverable.”