Gen3 is an open-source data platform that enables the creation of cloud-based data commons and ecosystems for omics data. It can be deployed on Amazon Web Services, Google Cloud Platform, Microsoft Azure, and OpenStack environments. Gen3 aims to accelerate scientific discovery by making it easy to explore, analyse and share data with researchers, developers and health institutions.
As a highly interoperable and portable data platform, Gen3 simplifies the process of setting up a data infrastructure, requiring much less effort than building monolithic applications from scratch. This ease of use is further supported by a strong community that helps individuals establish their own data commons. Gen3’s open-source nature and interoperability allow for the creation of a data mesh, in which multiple Gen3 data commons can interact and collaborate. This interconnected ecosystem maximises the value of data for researchers by breaking down silos and making research processes more efficient and effective.
Key features of Gen3 include:
- Open APIs: Each Gen3 service exposes APIs that support submitting data, indexing data objects, managing metadata, running custom queries and developing apps.
- Security and tiered data access: Gen3 allows for controlled user access to datasets, with optional tiered access, so users only see summary information for data they don’t have access to.
- Open-source community: Users can engage with the Gen3 community to provide feedback, influence the development of new features, discuss technical and scientific issues, and put questions to the Support team.
- Customised Gen3 experience: The modular nature of the Gen3 services lets each deployment choose which features to include, for example query requests, programmatic data search and custom app development. It can also provide a foundation for exploring new tools for sharing and analysing data across resources in the cloud.
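The tiered data access idea in the list above can be illustrated with a small sketch. This is not Gen3's implementation: the `Record` type, the `tiered_view` function and the dataset names are hypothetical and exist only to show the principle that a user receives full records for datasets they are authorised for and summary counts for everything else.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Hypothetical record: one subject belonging to one dataset."""
    dataset: str
    subject_id: str


def tiered_view(records, authorised_datasets):
    """Return full records for authorised datasets and counts for the rest.

    Illustrative only: real access control is enforced server-side by the
    platform, not by client code like this.
    """
    view = {}
    for record in records:
        if record.dataset in authorised_datasets:
            entry = view.setdefault(
                record.dataset, {"access": "full", "records": []}
            )
            entry["records"].append(record)
        else:
            # Unauthorised datasets expose only an aggregate count.
            entry = view.setdefault(
                record.dataset, {"access": "summary", "count": 0}
            )
            entry["count"] += 1
    return view


if __name__ == "__main__":
    data = [
        Record("study_a", "s1"),
        Record("study_a", "s2"),
        Record("study_b", "s3"),
    ]
    for dataset, entry in tiered_view(data, {"study_a"}).items():
        print(dataset, entry)
```

In this sketch a user authorised only for `study_a` sees both of its records, while `study_b` appears solely as a count.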
Gen3 aims to follow standards from the Global Alliance for Genomics and Health (GA4GH) to enable interoperability with other systems and to simplify the use of a Gen3 Data Commons. The GA4GH products included in Gen3 are the Data Repository Service (DRS), Data Connect, Passports and Visas, and the Task Execution Service (TES).
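As an illustration of what DRS support means in practice, the sketch below builds the object lookup URL defined by the GA4GH DRS v1 specification (`/ga4gh/drs/v1/objects/{object_id}`) and summarises the returned object. The host name `commons.example.org`, the object ID and the helper function names are assumptions made for this example. Authentication, which controlled-access data would normally require, is omitted.

```python
import json
from urllib.parse import quote, urljoin
from urllib.request import urlopen

# Object lookup path from the GA4GH DRS v1 specification.
DRS_OBJECTS_PATH = "/ga4gh/drs/v1/objects/"


def drs_object_url(base_url, object_id):
    """Build the DRS lookup URL for an object on a given server."""
    return urljoin(base_url, DRS_OBJECTS_PATH + quote(object_id))


def fetch_drs_object(base_url, object_id, timeout=30):
    """Fetch a DRS object as a dict (no authentication in this sketch)."""
    with urlopen(drs_object_url(base_url, object_id), timeout=timeout) as resp:
        return json.load(resp)


def summarise_drs_object(drs_object):
    """Pick out a few standard DRS fields from a DRS object document."""
    return {
        "id": drs_object.get("id"),
        "size": drs_object.get("size"),
        "access_types": [
            method.get("type")
            for method in drs_object.get("access_methods", [])
        ],
    }


if __name__ == "__main__":
    # Hypothetical host and object ID; replace with a real commons to try it.
    print(drs_object_url("https://commons.example.org", "abc-123"))
```

Because the path and response fields come from the standard rather than from Gen3 itself, the same client code should in principle work against any DRS-compliant server.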
Gen3 is currently maintained by the Center for Translational Data Science at the University of Chicago.
Additional Resources:
- BioCommons article - What is Gen3?
- BioCommons article - How do I establish/deploy a Gen3 instance?
- BioCommons article - How do I develop a Gen3 data dictionary?