The NCI Genomic Data Commons (GDC) was launched in 2016 and is now used by over 100,000 researchers each year to interactively explore over 2.5 PB of cancer genomics and associated clinical, imaging and other data. Using cloud-based platforms to accelerate research and discovery, by harmonizing submitted data and sharing it with the research community through a variety of applications, is clearly having an impact. Other cloud-based data platforms have been developed over the past couple of years, including:
- the Kids First Data Resource for pediatric cancer and birth defects;
- the DataSTAGE platform that is being built for TOPMed and other data by NHLBI; and,
- the AnVIL platform that is being developed for genomics data by NHGRI.
As the number of cloud-based platforms for biomedical data grows, it is becoming increasingly important to understand how best to make these and other emerging platforms interoperate.
An NIH workshop on the interoperability of cloud-based data platforms took place in Chapel Hill, NC, on October 3-4, 2019, to begin to explore these and related issues.
I wrote a white paper for the workshop that you can find on Medium. I also posted my slides on Slideshare.