What is OmixAtlas?
Vast and diverse biological data are being generated every year and deposited in repositories by academic labs and organizations worldwide. These data hold tremendous potential for reuse and drug discovery but are scattered across multiple, disparate sources and lack standardization. Thus, the availability of data does not equate to its easy usability.
Additionally, no single type of data, be it metabolomic, proteomic, or genomic, is sufficient to capture the complexity of biological phenomena. Different data types therefore need to be integrated before these vast amounts of data become usable. OmixAtlas aims to address these issues by curating and harmonizing metadata across data types and data sources so that the datasets are ready for downstream machine learning and analytical applications.
OmixAtlas is a collection of datasets with curated biomolecular data for understanding human physiology and disease pathology. It aims to adopt an integrated approach with access to ML-ready biology-centric data from public and proprietary repositories to generate new insights. Curated metadata makes the datasets more searchable and queryable.
What are the FAIR principles OmixAtlas is based upon?
OmixAtlas is a repository of FAIR (Findable, Accessible, Interoperable, Reusable) data. It is a collection of millions of datasets from public, proprietary, and licensed sources that have been curated, harmonized, and made ready for downstream machine learning and analytical applications.
All datasets on Polly go through a 2-step process:
- Data Engineering: This includes transforming data to fit a proprietary data schema that is uniform across several data types. The transformation brings data into one consistent schema and allows users to query multiple data types on a single data infrastructure.
- Metadata Harmonization: This involves tagging each sample and dataset with a uniform ontology.
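As a rough illustration of the two steps above, the sketch below maps records from two hypothetical sources onto one shared set of fields and then replaces free-text labels with controlled terms. The field names, the `FIELD_MAP` and `SYNONYMS` tables, and the vocabulary are assumptions made for this example; they are not Polly's proprietary schema, ontologies, or curation models.

```python
# Illustrative only: field names and vocabulary are hypothetical,
# not Polly's proprietary schema or curation models.

# Step 1 (data engineering): each source names its fields differently,
# so map them onto one shared set of keys.
FIELD_MAP = {
    "source_a": {"sample": "sample_id", "tissue_type": "tissue", "dx": "disease"},
    "source_b": {"id": "sample_id", "organ": "tissue", "condition": "disease"},
}

# Step 2 (metadata harmonization): free-text labels -> one controlled term.
SYNONYMS = {
    "tissue": {"liver": "liver", "hepatic tissue": "liver", "lung": "lung"},
    "disease": {
        "nsclc": "non-small cell lung carcinoma",
        "non small cell lung cancer": "non-small cell lung carcinoma",
        "hcc": "hepatocellular carcinoma",
    },
}


def to_uniform_schema(record, source):
    """Rename source-specific fields to the shared schema keys."""
    mapping = FIELD_MAP[source]
    return {mapping[key]: value for key, value in record.items() if key in mapping}


def harmonize(record):
    """Replace free-text values with controlled terms; flag unknown values."""
    result = dict(record)
    for field, vocabulary in SYNONYMS.items():
        raw = str(record.get(field, "")).strip().lower()
        result[field] = vocabulary.get(raw, "unmapped")
    return result


raw_records = [
    ({"sample": "S1", "tissue_type": "Hepatic tissue", "dx": "HCC"}, "source_a"),
    ({"id": "S2", "organ": "Lung", "condition": "NSCLC"}, "source_b"),
]

curated = [harmonize(to_uniform_schema(rec, src)) for rec, src in raw_records]
for row in curated:
    print(row)
```

After both steps, the two records share the same keys and the same terms, so a single filter on `tissue` or `disease` works across sources. Values that are not in the vocabulary are marked `unmapped` rather than guessed.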
What are the user benefits/features of OmixAtlas?
User benefits
Working with multi-omics data | How our OmixAtlases can help |
---|---|
Finding multi-omics data relevant to the user can take weeks, as data is semi-structured and scattered across several sources | Access to thousands of tissue-derived or disease-specific multi-omics datasets from multiple sources in one place |
Enriching data and preparing it for machine learning applications takes time and effort that could be better spent on insight discovery | OmixAtlas data is processed through a standard pipeline and enriched with harmonized, scientifically relevant metadata, ready for machine learning applications |
Users want access to curated data but do not want to switch to another computational platform | OmixAtlas data can be accessed and analyzed on your own existing computational infrastructure |
Key features
- Multi-omics data in one place: OmixAtlas provides access to over 26 different data types from over 30 data sources.
- Continuously evolving: In keeping with the rapid pace of biological data generation, OmixAtlas is updated in sync with the source repositories, wherever applicable.
- ML-ready data made available through curation: All types of data from diverse sources have been structured, and their metadata have been harmonized with controlled vocabularies and ontologies. Curated metadata makes datasets searchable and findable across data sources.
- Integrability: OmixAtlas is made available on Polly's data infrastructure, which allows querying of the data through point-and-click solutions and advanced querying through Polly Libraries. Code-based advanced access means that you can use these data on Polly or outside of Polly on a platform of your choice.
How is OmixAtlas different from available biomedical data repositories?
The data in OmixAtlas are curated through Polly's ML-based curation workflow, which structures different types of data, harmonizes metadata, and makes the data analysis-ready. All data available in OmixAtlas can be queried and used directly in downstream statistical or ML-based analyses. With OmixAtlas one can:
- Access data from heterogeneous sources: OmixAtlas has over 26 data types from over 30 public repositories and other sources in one central location. You can access studies, samples, data files, and associated metadata all in one place.
- Leverage curation infrastructure to standardize data: Build machine learning applications faster with analysis-ready data. Use Polly's curation models to process public and proprietary data through standard ingestion pipelines.
- Query heterogeneous biomolecular data with code: Perform integrative analysis with powerful code-based querying capabilities across the OmixAtlas data catalog. Use Polly Libraries for exploring data in depth through code.
- Access and analyze data in any computational environment: Stream data to your compute infrastructure from OmixAtlas. Focus on analysis while OmixAtlas takes care of data storage and management.
- Manage in-house data at scale with Enterprise OmixAtlas: Harmonize proprietary data using the scalable curation infrastructure of Polly, making it ML-ready for discovering new insights.
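To give a sense of why harmonized metadata makes code-based querying across data types possible, here is a minimal sketch that uses Python's built-in `sqlite3` module as a stand-in for a queryable catalog. It does not use Polly Libraries, and the table, columns, dataset IDs, and rows are invented for illustration.

```python
import sqlite3

# Stand-in catalog: an in-memory table with hypothetical columns.
# This is not the OmixAtlas schema or the Polly Libraries API.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE datasets (dataset_id TEXT, source TEXT, data_type TEXT, "
    "disease TEXT, tissue TEXT)"
)
conn.executemany(
    "INSERT INTO datasets VALUES (?, ?, ?, ?, ?)",
    [
        ("DS001", "source_a", "transcriptomics", "hepatocellular carcinoma", "liver"),
        ("DS002", "source_b", "proteomics", "hepatocellular carcinoma", "liver"),
        ("DS003", "source_a", "transcriptomics", "asthma", "lung"),
    ],
)


def find_datasets(connection, disease, tissue):
    """Return (dataset_id, data_type, source) rows matching harmonized terms."""
    cursor = connection.execute(
        "SELECT dataset_id, data_type, source FROM datasets "
        "WHERE disease = ? AND tissue = ? ORDER BY dataset_id",
        (disease, tissue),
    )
    return cursor.fetchall()


matches = find_datasets(conn, "hepatocellular carcinoma", "liver")
for dataset_id, data_type, source in matches:
    print(dataset_id, data_type, source)
```

Because both hypothetical sources use the same controlled terms for disease and tissue, one filter returns datasets of different data types from different sources. Without harmonization, the same search would need a separate condition for every spelling each source happens to use.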
Data Sources at OmixAtlases
There are two types of data sources at OmixAtlases:
1. Elucidata provides all the data, and users search it themselves and choose their datasets of interest.
2. Users request the kind of datasets they are interested in, and our experts find the relevant datasets.
For the first data source, we have a new offering called 'Polly Basic'. It has two types of OmixAtlases:
1. Source Atlas: where all the ML-ready curated datasets are stored. There are two source atlases: Bulk RNASeq OmixAtlas and Single Cell RNASeq OmixAtlas.
2. Destination Atlas: where users store their datasets of interest. Users can find datasets in the source atlas and move the ones they need to the destination atlas for analysis. Moving datasets uses prepaid credits. Users can do downstream analysis on their own, or they can contact our experts for special services or requests (such as curating a specific field or running downstream analysis), which are charged with extra credits.
Destination Atlas Services
We offer the following services related to OmixAtlas:
1. Data Curation: Our experts can curate data on demand if users need specific metadata added.
2. Downstream Analysis: Users can request downstream analysis of their datasets of interest.