Atlas

How do Atlases work?

An atlas is a collection of tables. Each table holds data about a particular clinical factor or other criterion. An atlas can contain multiple tables.

You can access it through the Atlas class.

from polly.auth import Polly
from polly.atlas import Atlas, Table, Column

Polly.auth("<access_key>")

atlas = Atlas(atlas_id="atlas_1")

Here's how you can list all your atlases.

Atlas.list_atlases()

# [
#   Atlas(atlas_id=test_polly_atlas),
#   Atlas(atlas_id=data_model_demo),
#   Atlas(atlas_id=test_atlas)
# ]

Here's how you create a new atlas.

Atlas.create_atlas(atlas_id='my_atlas', atlas_name="My Atlas")

# Atlas(atlas_id=my_atlas)

Here's how you list the tables inside the atlas.

atlas.list_tables()

#[
# Table(
#  name='gene_table', 
#  columns=[
#   Column(name='gene', col_type='string', constraint='PRIMARY KEY'),
#   Column(name='basemean', col_type='float', constraint='None'),
#   Column(name='log2foldchange', col_type='float', constraint='None'),
#   Column(name='lfcse', col_type='float', constraint='None'),
#   Column(name='stat', col_type='float', constraint='None'),
#   Column(name='pvalue', col_type='float', constraint='None'),
#   Column(name='padj', col_type='float', constraint='None'), 
#   Column(name='negative_log10_padj', col_type='float', constraint='None'),
#   Column(name='data_type', col_type='string', constraint='None'),
#   Column(name='dataset_id', col_type='string', constraint='None')
#  ]
# ),
# Table(
#  name='patient',
#  columns=[
#    Column(name='curated_patient_id', col_type='string', constraint='PRIMARY KEY'),
#    Column(name='alcohol_history', col_type='boolean', constraint='None'),
#    Column(name='alcohol_intensity', col_type='string', constraint='None'),
#    Column(name='tobacco_smoking', col_type='integer', constraint='None')
#  ]
# )
#]

Each table is uniquely identified by its atlas_id and name, and has other attributes, such as its columns, associated with it.

table = atlas.get_table(table_name='patient')
print(table)

#Table(
# name='patient',
# columns=[
#   Column(name='curated_patient_id', col_type='string', constraint='PRIMARY KEY'), 
#   Column(name='alcohol_history', col_type='boolean', constraint='None'),
#   Column(name='alcohol_intensity', col_type='string', constraint='None'), 
#   Column(name='tobacco_smoking', col_type='integer', constraint='None')
# ]
#)

One can use the query function to query any table from the atlas.

atlas.query("SELECT * FROM patient LIMIT 10;")
#                              alcohol_history alcohol_intensity  tobacco_smoking
# curated_patient_id                                                             
# patient_1718265078961285000            False              None                4
# patient_1718265078961325000             True          Moderate                2
# patient_1718265078961336000            False          Moderate                2
# patient_1718265078961344000            False               Low                9
# patient_1718265078961350000            False               Low                4
# patient_1718265078961358000            False               Low                7
# patient_1718265078961364000            False              None                0
# patient_1718265078961371000             True               Low                1
# patient_1718265078961377000             True              None                7
# patient_1718265078961385000             True          Moderate                4
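
To try out query strings without touching a live atlas, the same kind of SELECT can be run against a local stand-in. The sketch below makes no claim about the SQL dialect Atlas supports. It uses Python's built-in sqlite3 module with a hypothetical in-memory `patient` table, and the sample rows are made up for illustration.

```python
import sqlite3

# Local stand-in for the `patient` table. This is NOT the Atlas backend.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE patient ("
    "curated_patient_id TEXT PRIMARY KEY, "
    "alcohol_history INTEGER, "
    "alcohol_intensity TEXT, "
    "tobacco_smoking INTEGER)"
)

# Made-up sample rows (booleans stored as 0/1 in SQLite).
sample_rows = [
    ("P0031", 1, "High", 10),
    ("P0032", 0, None, 0),
    ("P0033", 1, "Moderate", 5),
]
conn.executemany("INSERT INTO patient VALUES (?, ?, ?, ?)", sample_rows)

# Same query shape as the atlas.query() call above, with a filter added.
query = (
    "SELECT curated_patient_id, tobacco_smoking FROM patient "
    "WHERE alcohol_history = 1 ORDER BY curated_patient_id LIMIT 10;"
)
result = conn.execute(query).fetchall()
print(result)
```

Once the query string behaves as intended locally, it can be passed to atlas.query(), keeping in mind that dialect differences may require adjustments.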

To add a new table to an atlas

columns = [
    Column(name="curated_patient_id", col_type="string", constraint="PRIMARY KEY"),
    Column(name="alcohol_history", col_type="boolean", constraint=None),
    Column(name="alcohol_intensity", col_type="string"),
    Column(name="tobacco_smoking", col_type="integer"),
]
new_table = atlas.create_table(table_name="patient_exposure", columns=columns)
print(new_table)

# Table(
#   name='patient_exposure', 
#   columns = [
#     Column(name="curated_patient_id", col_type="string", constraint="PRIMARY KEY"),
#     Column(name="alcohol_history", col_type="boolean", constraint=None),
#     Column(name="alcohol_intensity", col_type="string"),
#     Column(name="tobacco_smoking", col_type="integer"),
#   ]     
# )

To add a table from a DataFrame to an atlas, first set the DataFrame's index. The index is used as the primary key of the table.

import pandas as pd

# Data cleaning
df = pd.read_csv('path/to/exposure_file.csv')
df.set_index("curated_patient_id", inplace=True)

# The curated_patient_id column will be used as the primary key for the table `exposure`
atlas.create_table_from_df(table_name="exposure", df=df)

Here's how you can delete an existing table from an atlas

atlas.delete_table(table_name="my_table")

How do Tables work?

A table is a collection of user data. The table represents a database table and stores the user's data.

You can access it through the Table class.

from polly.auth import Polly
from polly.atlas import Atlas, Table, Column

Polly.auth("<access_key>")

exposure_table = Table(atlas_id="atlas_1", name="patient_exposure")
print(exposure_table)

# Table(
#   name='patient_exposure', 
#   columns = [
#     Column(name="curated_patient_id", col_type="string", constraint="PRIMARY KEY"),
#     Column(name="alcohol_history", col_type="boolean", constraint=None),
#     Column(name="alcohol_intensity", col_type="string"),
#     Column(name="tobacco_smoking", col_type="integer"),
#   ]     
# )

To view the first 5 rows of the table

df = exposure_table.head()
print(df)

# curated_patient_id  alcohol_history alcohol_intensity  tobacco_smoking
#              P0031             True              High               10
#              P0032            False              None                0
#              P0033             True          Moderate                5
#              P0034             True               Low                2
#              P0035            False              None                0

To add a new column to the table

bmi_column = exposure_table.add_column(Column(name="bmi", col_type="integer"))
print(bmi_column)

# Column(name='bmi', col_type='integer', constraint='NONE')

To delete an existing column

exposure_table.delete_column(column_name="bmi")

To iterate over the rows of the table, use iter_rows(). It yields the rows in batches of 500 records.

for page in exposure_table.iter_rows():
  for record in page:
    print(record)

#{'curated_patient_id': 'P0031', 'alcohol_history': True, 'alcohol_intensity': 'High', 'tobacco_smoking': 10, 'bmi': None}
#{'curated_patient_id': 'P0032', 'alcohol_history': False, 'alcohol_intensity': 'None', 'tobacco_smoking': 0, 'bmi': None}
#{'curated_patient_id': 'P0033', 'alcohol_history': True, 'alcohol_intensity': 'Moderate', 'tobacco_smoking': 5, 'bmi': None}
#{'curated_patient_id': 'P0034', 'alcohol_history': True, 'alcohol_intensity': 'Low', 'tobacco_smoking': 2, 'bmi': None}
#{'curated_patient_id': 'P0035', 'alcohol_history': False, 'alcohol_intensity': 'None', 'tobacco_smoking': 0, 'bmi': None}
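
The paging pattern can be sketched without a live table. In the example below, `iter_pages` is a hypothetical stand-in for `iter_rows()` that slices a list of made-up records into pages of 500, the batch size stated above. The nested loop is the same shape a caller would use against a real Table.

```python
from typing import Any, Dict, Iterator, List

PAGE_SIZE = 500  # batch size stated in the text above


def iter_pages(
    rows: List[Dict[str, Any]], page_size: int = PAGE_SIZE
) -> Iterator[List[Dict[str, Any]]]:
    """Hypothetical stand-in for Table.iter_rows(): yield rows page by page."""
    for start in range(0, len(rows), page_size):
        yield rows[start:start + page_size]


# 1,200 made-up records, so the pages hold 500, 500 and 200 rows.
fake_rows = [
    {"curated_patient_id": f"P{i:04d}", "tobacco_smoking": i % 10}
    for i in range(1200)
]

page_sizes = []
smokers = 0
for page in iter_pages(fake_rows):
    page_sizes.append(len(page))
    # Aggregate per page instead of holding every row in memory at once.
    smokers += sum(1 for record in page if record["tobacco_smoking"] > 0)

print(page_sizes, smokers)
```

Processing page by page keeps memory use bounded by the page size, which is the main reason to prefer this pattern over loading the whole table for large tables.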

To load the entire table data into a dataframe

df = exposure_table.to_df()

# curated_patient_id  alcohol_history alcohol_intensity  tobacco_smoking   bmi
#              P0031             True              High               10  None
#              P0032            False              None                3  None
#              P0033             True          Moderate                5  None
#              P0034             True               Low                2  None
#              P0035            False              None                1  None
#              P0036            True               None                2  None
#              P0037            False              None                0  None

API Reference

Atlas

Attributes:

Name Type Description
atlas_id str

Atlas ID

__init__

Initializes the internal data Atlas with a given Atlas ID

Parameters:

Name Type Description Default
atlas_id str

The identifier for the Atlas

required

Examples:

>>> atlas = Atlas(atlas_id='atlas_1')

get_name

Retrieves the name of the Atlas using the Atlas ID

Returns:

Type Description
str

The name of the Atlas as a string

Examples:

>>> atlas = Atlas(atlas_id='atlas_1')
>>> atlas.get_name()
'My Atlas'

list_tables

Retrieves the list of tables associated with an Atlas.

Returns:

Type Description
List[Table]

A list of Table objects representing the tables associated with an Atlas.

Examples:

>>> atlas = Atlas(atlas_id='atlas_1')
>>> tables = atlas.list_tables()

get_table

Retrieves a specific table object by name.

Parameters:

Name Type Description Default
table_name str

The name of the table to retrieve.

required

Returns:

Type Description
Table

The Table object representing the specified table.

Notes

This loads the Table object, not the table data. Use the table's to_df() function to load the data.

Examples:

>>> atlas = Atlas(atlas_id='1234')
>>> table = atlas.get_table(table_name='my_table')

create_table

Creates a new table with the specified name and columns.

Parameters:

Name Type Description Default
table_name str

The name of the new table to create.

required
columns List[Column]

A list of Column objects representing the columns of the new table.

required
rows list

A list of key-value pairs representing the table data.

None

Returns:

Type Description
Table

The newly created Table object.

Examples:

>>> atlas = Atlas(atlas_id='my_atlas')
>>> columns = [
>>>    Column(name='patient_id', col_type='integer', constraint='PRIMARY KEY'),
>>>    Column(name='patient_name', col_type='string')
>>> ]
>>> patient_table = atlas.create_table(table_name='patient', columns=columns)

create_table_from_df

Creates a new table with the specified table name and schema derived from the Pandas DataFrame.

Optionally loads the data into the table.

Raises a validation error if a column's data type is not supported.

Supported column types are [int, float, bool, object].

Parameters:

Name Type Description Default
table_name str

The name of the new table to create.

required
df DataFrame

A Pandas DataFrame representing the data and schema for the new table

required

Returns:

Type Description
Table

The newly created Table object, showing the first 5 rows of the table.

Examples:

>>> atlas = Atlas(atlas_id='my_atlas')
>>> data = {'patient_id': ["P0031", "P0032"], 'patient_name': ['Sam', 'Ron']}
>>> df = pd.DataFrame(data)
>>> df.set_index('patient_id', inplace=True)
>>> new_table = atlas.create_table_from_df(table_name='patient', df=df)
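
Because unsupported dtypes raise a validation error, it can help to check a DataFrame's dtypes before calling create_table_from_df. The sketch below is an assumption-laden illustration. The mapping from dtype names to col_type strings is inferred from the col_type values used in the examples on this page, not taken from the library, and `infer_col_type` and `unsupported_columns` are hypothetical helpers.

```python
from typing import Dict, List

# Assumed mapping from pandas dtype names to the col_type strings seen on this page.
DTYPE_TO_COL_TYPE = {
    "int64": "integer",
    "float64": "float",
    "bool": "boolean",
    "object": "string",
}


def infer_col_type(dtype_name: str) -> str:
    """Return the assumed col_type for a dtype name, or raise ValueError."""
    if dtype_name not in DTYPE_TO_COL_TYPE:
        raise ValueError(f"Unsupported dtype: {dtype_name}")
    return DTYPE_TO_COL_TYPE[dtype_name]


def unsupported_columns(dtypes: Dict[str, str]) -> List[str]:
    """List the column names whose dtype is outside the supported set."""
    return [name for name, dtype in dtypes.items() if dtype not in DTYPE_TO_COL_TYPE]


# Made-up dtype names; a datetime column is not in the supported list.
dtypes = {"patient_age": "int64", "bmi": "float64", "visit_date": "datetime64[ns]"}
bad = unsupported_columns(dtypes)
print(bad)
```

With a real DataFrame, the `dtypes` dictionary could be built as `{name: str(dtype) for name, dtype in df.dtypes.items()}`. Columns reported as unsupported could then be converted, for example to strings, before the table is created.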

delete_table

Deletes the table from the atlas.

Parameters:

Name Type Description Default
table_name str

The name of the table to delete.

required

Examples:

>>> atlas = Atlas(atlas_id='atlas_1')
>>> atlas.delete_table(table_name='patient')

query

Executes a query on the Atlas tables.

Parameters:

Name Type Description Default
query str

The SQL query to execute.

required

Returns:

Type Description
Union[DataFrame, List[Dict]]

The result of the query execution.

Examples:

>>> atlas = Atlas(atlas_id='atlas_1')
>>> result = atlas.query(query='SELECT * FROM patient;')

Table

Attributes:

Name Type Description
atlas_id str

The unique identifier for the Atlas

name str

The name of the table

columns List[Column]

List of columns in the table

__init__

Initializes a Table instance from the Atlas identifier atlas_id, the table name, and an optional list of columns.

Parameters:

Name Type Description Default
atlas_id str

The unique identifier for the Atlas.

required
name str

The name of the table to be initialized.

required
columns List[Column]

List of column objects representing the columns in the table.

None

Examples:

>>> table = Table(atlas_id='1234', name='my_table')

list_columns

Retrieve the list of columns associated with the table.

Returns:

Type Description
List[Column]

A list of Column objects representing the columns in the table.

Examples:

>>> patient_table = Table(atlas_id='atlas_1', name='patient')
>>> columns = patient_table.list_columns()

get_column

Retrieves a specific column from the table based on its name.

Parameters:

Name Type Description Default
column_name str

The name of the column to retrieve.

required

Returns:

Type Description
Column

The Column object representing the specified column.

Raises:

Type Description
ValueError

If no column with the specified name is found in the table.

Examples:

>>> patient_table = Table(atlas_id='atlas_1', name='patient')
>>> column = patient_table.get_column(column_name='patient_id')

add_column

Adds a new column to the table.

Parameters:

Name Type Description Default
column Column

The Column object representing the column to add.

required

Returns:

Type Description
Column

The Column object that was added to the table.

Examples:

>>> patient_table = Table(atlas_id='atlas_1', name='patient')
>>> new_column = Column(name='patient_age', col_type='integer')
>>> added_column = patient_table.add_column(column=new_column)

delete_column

Deletes a column from the table based on its name.

Parameters:

Name Type Description Default
column_name str

The name of the column to be deleted

required

Examples:

>>> patient_table = Table(atlas_id='atlas_1', name='patient')
>>> patient_table.delete_column(column_name='patient_age')

add_rows

Adds new rows to the table.

Parameters:

Name Type Description Default
rows List[dict]

A list of key-value pairs representing rows to be added.

required

Examples:

>>> patient_table = Table(atlas_id='atlas_1', name='patient')
>>> rows = [
>>>     {"patient_id": "P0311", "patient_age": 23},
>>>     {"patient_id": "P0312", "patient_age": 24},
>>> ]
>>> patient_table.add_rows(rows)
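
When the rows come from a file, they first need to be shaped into a list of dictionaries. The sketch below shows one possible way to do that with Python's csv module. `rows_from_csv` is a hypothetical helper, the CSV text is made up, and the conversion of patient_age to an integer assumes the column was created with an integer type.

```python
import csv
import io
from typing import Any, Dict, List, TextIO

# Made-up CSV content standing in for a file on disk.
csv_text = "patient_id,patient_age\nP0311,23\nP0312,24\n"


def rows_from_csv(handle: TextIO) -> List[Dict[str, Any]]:
    """Hypothetical helper: read CSV records into the List[dict] shape add_rows expects."""
    rows = []
    for record in csv.DictReader(handle):
        rows.append(
            {
                "patient_id": record["patient_id"],
                # csv yields strings, so cast to match the assumed integer column.
                "patient_age": int(record["patient_age"]),
            }
        )
    return rows


rows = rows_from_csv(io.StringIO(csv_text))
print(rows)
```

The resulting list could then be passed as `patient_table.add_rows(rows)`.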

delete_rows

Deletes rows from the table based on their primary key values.

Parameters:

Name Type Description Default
rows List[dict]

A list of key-value pairs representing rows to delete, where the key is the primary key column name and value is the corresponding entry.

required

Examples:

>>> patient_table = Table(atlas_id='atlas_1', name='patient')
>>> rows = [
>>>     {'patient_id': 'P0311'},
>>>     {'patient_id': 'P0322'}
>>> ]
>>> patient_table.delete_rows(rows=rows)
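
If the IDs to delete are available as a plain list, the `rows` argument can be built programmatically. `primary_key_rows` below is a hypothetical helper, not part of the library. It produces one single-key dictionary per distinct ID and drops repeated IDs, on the assumption that listing the same key twice is unnecessary.

```python
from typing import Dict, Iterable, List


def primary_key_rows(key_column: str, ids: Iterable[str]) -> List[Dict[str, str]]:
    """Hypothetical helper: one {key_column: id} dict per distinct ID, in first-seen order."""
    seen = set()
    rows = []
    for value in ids:
        if value not in seen:
            seen.add(value)
            rows.append({key_column: value})
    return rows


# The repeated "P0311" is emitted only once.
rows = primary_key_rows("patient_id", ["P0311", "P0322", "P0311"])
print(rows)
```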

update_rows

Updates rows in the table based on provided row data.

Parameters:

Name Type Description Default
rows List[dict]

A list of dictionaries representing the rows to update.

required

Examples:

>>> patient_table = Table(atlas_id='atlas_1', name='patient')
>>> rows = [
>>>    {"patient_id": "P0311", "patient_age": 23},
>>>    {"patient_id": "P0322", "patient_age": 24},
>>> ]
>>> patient_table.update_rows(rows=rows)

head

Retrieves the first five rows of the table as a Pandas DataFrame.

Returns:

Type Description
DataFrame

A Pandas DataFrame containing the first five rows of the table.

Examples:

>>> patient_table = Table(atlas_id='atlas_1', name='patient')
>>> head_df = patient_table.head()

iter_rows

Iterates over the rows of the table in a paginated manner.

Yields:

Type Description
List[Dict[str, Any]]

A list of dictionaries representing rows of the table, with column names as keys and corresponding values.

Examples:

>>> patient_table = Table(atlas_id='atlas_1', name='patient')
>>> for page_rows in patient_table.iter_rows():
>>>     for row in page_rows:
>>>         print(row)

to_df

Returns the complete table as a Pandas DataFrame.

Returns:

Type Description
DataFrame

A Pandas DataFrame containing the data from the table.

Examples:

>>> patient_table = Table(atlas_id='atlas_1', name='patient')
>>> df = patient_table.to_df()

Column

Attributes:

Name Type Description
name

The name of the column

col_type

The type of the column

constraint

The constraint on the column (optional). Can be one of ["PRIMARY KEY", None].

__init__

Initializes a Column instance with a given name, type, and optional constraint.

Parameters:

Name Type Description Default
name str

The name of the column

required
col_type str

The type of the column

required
constraint Optional[str]

The constraint on the column. If not provided, it will be set to None.

None

Examples:

>>> column = Column(name='patient_id', col_type='string', constraint='PRIMARY KEY')
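
Since the constraint must be one of "PRIMARY KEY" or None, a misspelling such as 'PRIMARY_KEY' is easy to make. Whether the library itself rejects such values is not stated here, so the sketch below uses a hypothetical stand-in class, `ColumnSpec`, to show how column definitions could be checked locally before they are turned into Column objects.

```python
from dataclasses import dataclass
from typing import Optional

# Allowed constraint values as documented in the Column attributes above.
ALLOWED_CONSTRAINTS = ("PRIMARY KEY", None)


@dataclass(frozen=True)
class ColumnSpec:
    """Hypothetical stand-in for Column that validates the constraint eagerly."""

    name: str
    col_type: str
    constraint: Optional[str] = None

    def __post_init__(self) -> None:
        if self.constraint not in ALLOWED_CONSTRAINTS:
            raise ValueError(
                f"Invalid constraint {self.constraint!r} for column {self.name!r}"
            )


# A valid definition: constraint defaults to None when omitted.
age = ColumnSpec(name="patient_age", col_type="integer")

# A misspelled constraint is rejected immediately.
try:
    ColumnSpec(name="patient_id", col_type="string", constraint="PRIMARY_KEY")
    error_message = None
except ValueError as exc:
    error_message = str(exc)

print(age.constraint, error_message)
```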