Analyzing Data

The Analyze class contains functions which can be used to identify cohorts in datasets, perform differential expression and pathway analysis, and execute meta-analysis workflows.

Parameters:

token (str, default: None ) –

Authentication token from polly

Usage

from polly.analyze import Analyze

analysis = Analyze(token)

identify_cohorts

identify_cohorts(repo_key, dataset_id)

This function is used to get the cohorts that can be created from samples in a GEO dataset. Please note: Currently only Bulk RNASeq datasets from GEO source are supported. If results are generated for other datatypes or datasource, they may be inaccurate. If you want to use this functionality for any other data type and source, please reach out to polly.support@elucidata.io

Parameters:

repo_key (int / str) –

repo_id or repo_name in str or int format
dataset_id (str) –

dataset_id of the GEO dataset. eg. "GSE132270_GPL11154_raw"

Returns:

DataFrame –

Dataframe showing values of samples across factors/cohorts.

run_meta_analysis

run_meta_analysis(repo_key, workspace_id, analysis_name, design_formulas, samples_to_remove=[])

Use this function to execute the Polly DIY Meta-Analysis Pipeline. Only the 'geo_transcriptomics_omixatlas' omixatlas is supported currently.

Parameters:

repo_key (int / str) –

repo_id or repo_name in str or int format
workspace_id (int) –

the workspace in which the datasets and results should be stored
analysis_name (str) –

name of your analysis, eg. "MA_BRCA_run1". The reports, datasets, and results will be stored in a folder with this name.
design_formulas (dict) –

key:value pair of atleast 2 dataset ids and a list of design formulas. eg. dataset_id:[design formula control, design formula perturbation]
samples_to_remove (list, default: [] ) –

List of samples to omit, if any. Defaults to [].

Examples

# Install polly python
!sudo pip3 install polly-python --quiet

# Import libraries
from polly import analyze
AUTH_TOKEN=(os.environ['POLLY_REFRESH_TOKEN'])
analysis = analyze.Analyze(AUTH_TOKEN)

Identify Cohorts

analysis.identify_cohorts("geo_transcriptomics_omixatlas","GSE117482_GPL17021_raw")

Factors based on which cohorts can be created for this dataset: ['genotype', 'gender', 'age__at_the_time_of_sacrifice_', 'treatment', 'experiment', 'curated_genetic_modification_type']

	genotype	gender	age__at_the_time_of_sacrifice_	treatment	experiment	curated_genetic_modification_type
geo_accession
GSM3301731	ko	sex: female	4 weeks	nd	exp1	[knockout]
GSM3301732	ko	sex: female	4 weeks	nd	exp1	[knockout]
GSM3301736	control	sex: male	4 weeks	nd	exp1	[wildtype]
GSM3301737	control	sex: male	4 weeks	nd	exp1	[wildtype]
GSM3301739	control	sex: female	4 weeks	nd	exp1	[wildtype]
GSM3301742	ko	sex: female	21 weeks	nd	exp2	[knockout]
GSM3301744	ko	sex: female	21 weeks	nd	exp2	[knockout]
GSM3301749	control	sex: female	21 weeks	hfd	exp2	[wildtype]
GSM3301750	control	sex: female	21 weeks	nd	exp2	[wildtype]
GSM3301753	control	sex: female	21 weeks	hfd	exp2	[wildtype]
GSM3301754	control	sex: female	21 weeks	nd	exp2	[wildtype]
GSM3301757	ko	sex: female	21 weeks	hfd	exp2	[knockout]
GSM3301758	ko	sex: female	21 weeks	hfd	exp2	[knockout]
GSM3301759	ko	sex: female	21 weeks	hfd	exp2	[knockout]
GSM3301760	ko	sex: female	21 weeks	hfd	exp2	[knockout]
GSM3301761	ko	sex: female	21 weeks	nd	exp2	[knockout]
GSM3301727	control	sex: male	4 weeks	nd	exp1	[wildtype]
GSM3301728	ko	sex: male	4 weeks	nd	exp1	[knockout]
GSM3301729	ko	sex: female	4 weeks	nd	exp1	[knockout]
GSM3301730	control	sex: female	4 weeks	nd	exp1	[wildtype]
GSM3301733	ko	sex: female	4 weeks	nd	exp1	[knockout]
GSM3301734	ko	sex: female	4 weeks	nd	exp1	[knockout]
GSM3301735	control	sex: female	4 weeks	nd	exp1	[wildtype]
GSM3301738	control	sex: male	4 weeks	nd	exp1	[wildtype]
GSM3301740	control	sex: female	4 weeks	nd	exp1	[wildtype]
GSM3301741	control	sex: female	4 weeks	nd	exp1	[wildtype]
GSM3301743	ko	sex: female	21 weeks	nd	exp2	[knockout]
GSM3301745	ko	sex: female	21 weeks	nd	exp2	[knockout]
GSM3301746	control	sex: female	21 weeks	hfd	exp2	[wildtype]
GSM3301747	control	sex: female	21 weeks	hfd	exp2	[wildtype]
GSM3301748	control	sex: female	21 weeks	hfd	exp2	[wildtype]
GSM3301751	control	sex: female	21 weeks	nd	exp2	[wildtype]
GSM3301752	control	sex: female	21 weeks	nd	exp2	[wildtype]
GSM3301755	control	sex: female	21 weeks	nd	exp2	[wildtype]
GSM3301756	ko	sex: female	21 weeks	hfd	exp2	[knockout]

# If you want to plot a sunburst on specific columns of interest, please use the following code:
import plotly.express as px
metadata = identify_cohorts(repo_key, dataset_id)
fig = px.sunburst(metadata, path=['column_1','column_2','column_n'])
fig.show()

Run Meta-Analysis

repo = "bulkrnaseq_staging_oa"
datasets = ['GSE144269_GPL24676_raw','GSE114564_GPL11154_raw','GSE77314_GPL9052_raw']
designformula = {'GSE144269_GPL24676_raw' : 
                 [{'tumor_non_tumor':'non-tumor'}, {'tumor_non_tumor':'tumor'}],
                'GSE114564_GPL11154_raw' : 
                [{'curated_disease':'[Normal]'}, {'curated_disease':'[Carcinoma, Hepatocellular]'}],
                'GSE77314_GPL9052_raw' : 
                [{'curated_control':'0'}, {'curated_control':'1'}]
                }
ws_id= 14164
analysis_name = 'HCC_test1'

analysis.run_meta_analysis(repo, ws_id, analysis_name, designformula)

from polly import jobs
job = jobs.jobs()
job.job_status('14164','cb4074a13e464f3282310be2f5c60845')

	Job ID	Job Name	Job State
0	cb4074a13e464f3282310be2f5c60845	Polly DIY pipeline: Meta Analysis	SUCCEEDED

job.job_status('14164','47ab24a74c644a6089d13de828106b94')

	Job ID	Job Name	Job State
0	553b1df3fcff41c4a91b15ecbb582633	Polly DIY pipeline: Meta Analysis	SUCCEEDED

job.job_logs('14164','cb4074a13e464f3282310be2f5c60845')

A new version of Polly CLI is available. To update, execute the command npm update -g @elucidatainc/pollycli
Success: Logged in to Polly as shreya.raghavendra@elucidata.io
================================== POLLY LOGIN SUCCESS =======================================
“=============================== GIT CLONING POLLY-PYTHON ===================================”
Cloning into 'polly-python-code'...
“================================== GIT CLONING SUCCESS =======================================”
A new version of Polly CLI is available. To update, execute the command npm update -g @elucidatainc/pollycli
polly://HCC_test1_cohorts.csv
./cohort.csv
Success: Download complete
==================================== EXECUTING JOBS ==========================================
downloading data file:GSE77314_GPL9052_raw.gct: 100%|██████████| 29.4M/29.4M [00:00<00:00, 69.9MiB/s]
downloading data file:GSE144269_GPL24676_raw.gct: 100%|██████████| 34.3M/34.3M [00:01<00:00, 33.7MiB/s]
downloading data file:GSE114564_GPL11154_raw.gct: 100%|██████████| 58.7M/58.7M [00:01<00:00, 32.8MiB/s]
[1] "reading read_design_formula_csv "
[1] "done reading read_design_formula_csv "
[1] "process_gct_files started ....."
[1] "processing GCT files..."
parsing as GCT v1.3
datasets/GSE114564_GPL11154_raw.gct 62548 rows, 106 cols, 8 row descriptors, 40 col descriptors
parsing as GCT v1.3
datasets/GSE144269_GPL24676_raw.gct 29565 rows, 140 cols, 1 row descriptors, 42 col descriptors
parsing as GCT v1.3
datasets/GSE77314_GPL9052_raw.gct 35831 rows, 100 cols, 2 row descriptors, 42 col descriptors
================================= EXECUTING JOBS SUCCESS =====================================
==================================== SYNC RESULTS TO POLLY - STARTING ==========================================
A new version of Polly CLI is available. To update, execute the command npm update -g @elucidatainc/pollycli
Success: Sync complete
A new version of Polly CLI is available. To update, execute the command npm update -g @elucidatainc/pollycli
Success: Sync complete
================================= SYNC RESULTS TO POLLY - SUCCESS =====================================

Tutorial Notebooks

Identifying Cohorts for Multi-Factor Experiments From data findability, identifying cohorts, to running Meta-analysis