Cellxgene
What is Cellxgene?
Cellxgene is an interactive, open-source explorer for single-cell transcriptomics data. It was developed by the Chan Zuckerberg Initiative (CZI) in collaboration with the open-source community. The project started in 2018 and is currently maintained and developed by CZI and the wider scientific community. Detailed documentation of the Cellxgene application can be found here.
When should Cellxgene be used?
- To examine categorical metadata: Categorical metadata (such as tissue of origin or cell type) can be visualized and examined in several ways within Cellxgene, such as coloring embedding plots (e.g., coloring a UMAP by cell type), looking at cell counts, making selections of cells, or viewing the interaction between different categorical metadata fields.
- To find cells where a gene is expressed: Numerical metadata, such as the expression of a gene or the number of genes detected per cell, can be examined on the embedding plot and used to filter and select cells.
- To select and subset cells: Cells in the embedding plot can be selected based on gene expression cutoffs and categorical metadata attributes.
- To compare the expression of multiple genes: Cellxgene allows the user to compare the expression of multiple genes via bivariate plots.
- To use gene sets to learn about the functional characteristics of cell populations: Cellxgene allows users to examine groups of genes via the gene sets feature.
- To find marker genes: Cellxgene allows the user to find marker genes between selected cell populations.
Cellxgene user journey through Polly OmixAtlas
After logging into Polly, select the single cell RNAseq OmixAtlas on the OmixAtlas homepage.
Find relevant datasets
Users can find relevant datasets using the powerful search bar on the OmixAtlas homepage. Salient features of the search bar are -
- The search bar is driven by Elasticsearch which allows users to search with keywords and long queries.
- The keywords are present across source metadata (title, description, and study design) and curated metadata (cell type, cell line, tissue, drug, etc.).
- It allows fuzzy search. For example, if users search for 'transcriptomics', it will also show results for 'transcript' and 'transcriptome'.
- There are operators such as 'exact', 'and', 'or', 'not', and 'group' for better search.
Users can filter the datasets using the configurable filters beside the search bar. A detailed description of how to find datasets can be found [here].
Starting Cellxgene
Step 1 - Selection of dataset of interest
Select the dataset of interest, click on 'Options', and then click on 'Analyze'.
Step 2 - Open the Cellxgene application
After clicking 'Analyze', a side window opens up. Select the 'Cellxgene' application and fill in the name of the analysis and the workspace where it is to be saved.
Cellxgene interface
The Cellxgene interface is divided into three sections -
- The left-hand sidebar contains categorical metadata and fields curated for each dataset ingested in OmixAtlas.
- The center panel contains an embedding plot where each dot represents a cell. Cellxgene also allows users to choose different embeddings based on their needs.
- The right-hand sidebar contains numerical metadata (QC metadata) and information about genes and gene sets. The gene plots on this sidebar appear automatically for each curated dataset opened through Polly OmixAtlas.
Examining categorical metadata
Categorical data can be examined in multiple ways -
- Color embedding plots - The plot can be colored by different metadata fields by clicking on the droplet icon next to the metadata of interest. The color code is displayed next to the cell count of the chosen metadata.
- Cell counts - Each metadata category will have the cell counts along with the color code.
- Making the selection of cells - Upon hovering over the metadata on the left side, users can see the type of cells highlighted in the embedding plot.
Users can click on the 'display categories' icon to display the labels over the cluster centroid.
Users can select/deselect the cells by choosing the checkbox next to the value in the categorical metadata field. The unselected cells will have a smaller point size on the embedding plot.
Users can use the checkmark on the parent metadata to deselect all the cells and then select only the cells of interest. This makes it easier to highlight, for example, the cells that pertain to a specific tissue.
- Viewing the relationship between different categorical fields - After users color by a particular metadata category, for example, 'Cell type', users can see the distribution of the cell type in any other category by expanding that categorical metadata field, for example, 'tissue'. In the plot below, cell types belonging to a particular tissue are clearly shown with the colors in the bar.
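The cell counts and the relationship between two categorical fields described above amount to a count and a cross-tabulation over per-cell metadata. The following is a minimal sketch of that idea in plain Python, not Cellxgene's implementation; the records, the field names `cell_type` and `tissue`, and the helper functions are all hypothetical and chosen only for illustration.

```python
from collections import Counter, defaultdict

# Hypothetical per-cell metadata; field names and values are made up.
cells = [
    {"cell_type": "T cell", "tissue": "blood"},
    {"cell_type": "T cell", "tissue": "lung"},
    {"cell_type": "B cell", "tissue": "blood"},
    {"cell_type": "B cell", "tissue": "blood"},
    {"cell_type": "monocyte", "tissue": "lung"},
]


def cell_counts(cells, field):
    """Count cells per value of one categorical field."""
    return Counter(cell[field] for cell in cells)


def cross_tabulate(cells, row_field, col_field):
    """For each value of row_field, count cells per value of col_field."""
    table = defaultdict(Counter)
    for cell in cells:
        table[cell[row_field]][cell[col_field]] += 1
    return {row: dict(counts) for row, counts in table.items()}


# Counts shown next to each category in the sidebar.
counts_by_type = cell_counts(cells, "cell_type")

# Composition of each tissue by cell type (the colored bar per tissue).
type_by_tissue = cross_tabulate(cells, "tissue", "cell_type")

print(counts_by_type)
print(type_by_tissue)
```

Each inner dictionary corresponds to one colored bar: the share of every cell type within a single tissue.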
Finding cells where a gene is expressed
Numerical metadata can be examined on the embedding plot and used to filter and select cells. The numerical data is present on the right-hand sidebar, and users can click on the droplet icon to color the plot by QC metrics (for example, n_genes_by_counts, the number of genes that have been detected in the cell).
To deal with outliers, users can clip the data by clicking on the 'clip' icon on top of the toolbar. For example, here we have set the range to the 0th to 99th percentile, and we can observe the change in the scale, graph, and embedding plot.
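To make the effect of clipping concrete, here is a minimal sketch in plain Python. It assumes that clipping clamps values outside the chosen percentile range to the range bounds and that percentiles are computed by linear interpolation; both are assumptions for illustration, and the function names and toy values are hypothetical rather than taken from Cellxgene.

```python
def percentile(sorted_values, q):
    """Linear-interpolation percentile (q in 0..100) of a sorted list."""
    if not sorted_values:
        raise ValueError("percentile() needs at least one value")
    position = (len(sorted_values) - 1) * q / 100.0
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (
        sorted_values[upper] - sorted_values[lower]
    )


def clip_to_percentiles(values, low_q=0.0, high_q=99.0):
    """Clamp every value into the [low_q, high_q] percentile range."""
    ordered = sorted(values)
    low = percentile(ordered, low_q)
    high = percentile(ordered, high_q)
    return [min(max(value, low), high) for value in values]


# Toy n_genes_by_counts values: 100 ordinary cells plus one extreme outlier.
n_genes_by_counts = list(range(1, 101)) + [10000]
clipped = clip_to_percentiles(n_genes_by_counts, low_q=0.0, high_q=99.0)

print(max(n_genes_by_counts), max(clipped))
```

With the outlier clamped, the color scale spans the range where most cells lie instead of being stretched by a single extreme value.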
To find the gene expression pattern for a particular gene, type the gene name in the search bar.
For example, if we search for STAT3, a graph will appear on the top right side that contains -
- Gene name
- A histogram depicting the gene expression
- Remove icon
- X-plot/Y-plot for bivariate plots
- Droplet icon to color the cells based on the gene expression
Users can use the clipping tool to remove the outliers to understand the gene expression better.
Selecting and subsetting cells
Cellxgene allows a user to select, subset, and filter cells based on gene expression cutoffs and categorical metadata attributes. There are multiple ways to do this. Here we will subset a population of 'B cells' in the ways described below -
- Lasso selection - Select the lasso tool and draw a lasso around the B-cell cluster. Then click on the subset icon to create the subset based on lasso selection. All other cells will disappear from the embedding plot. To bring back all the cells, click on the full dataset icon next to the subset icon.
- Categorical selection - Select the cell type of interest from the categorical metadata on the left-hand side bar and click on the subset icon.
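Conceptually, a categorical selection followed by a subset is a filter over per-cell records, and a gene expression cutoff is a second filter on top of it. The sketch below illustrates this in plain Python; the cell records, the expression values, and the helper names are hypothetical and are not part of Cellxgene.

```python
# Toy per-cell records; IDs, labels, and expression values are invented.
cells = [
    {"id": "c1", "cell_type": "B cell", "expression": {"STAT3": 2.4}},
    {"id": "c2", "cell_type": "B cell", "expression": {"STAT3": 0.3}},
    {"id": "c3", "cell_type": "T cell", "expression": {"STAT3": 1.9}},
    {"id": "c4", "cell_type": "monocyte", "expression": {"STAT3": 0.0}},
]


def subset_by_category(cells, field, selected_values):
    """Keep cells whose categorical field is one of the selected values."""
    selected = set(selected_values)
    return [cell for cell in cells if cell[field] in selected]


def subset_by_expression(cells, gene, minimum=float("-inf"), maximum=float("inf")):
    """Keep cells whose expression of `gene` lies within [minimum, maximum]."""
    return [
        cell
        for cell in cells
        if minimum <= cell["expression"].get(gene, 0.0) <= maximum
    ]


# Categorical selection: keep only the B cells.
b_cells = subset_by_category(cells, "cell_type", ["B cell"])

# Expression cutoff applied to the subset (the cutoff of 1.0 is arbitrary).
b_cells_high_stat3 = subset_by_expression(b_cells, "STAT3", minimum=1.0)

print([cell["id"] for cell in b_cells])
print([cell["id"] for cell in b_cells_high_stat3])
```

The original list is left untouched, which mirrors how the full dataset can be restored after subsetting.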
Comparing the expression of multiple genes
Cellxgene allows users to compare the expression of multiple genes via bivariate plots. Here, for example, we have chosen two genes - CD8A (specific to T cells) and CD14 (specific to monocytes/macrophages). First, search for the genes and create a subset of the associated cell types.
Then, using the clipping tool, remove the outliers. When users color the embedding plot by each gene, its expression should be evident in the associated cell type.
Users can create a bivariate plot by plotting one gene on the x-axis and another on the y-axis.
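A bivariate plot pairs, for every cell, the expression of one gene (x-axis) with the expression of another (y-axis). As a rough illustration in plain Python, the sketch below builds those pairs and counts how many cells fall on each side of a cutoff. The expression values, the cutoffs, and the helper names are hypothetical.

```python
# Toy per-cell expression values for the two genes; all numbers are invented.
cells = [
    {"cell_type": "T cell", "expression": {"CD8A": 2.1, "CD14": 0.0}},
    {"cell_type": "T cell", "expression": {"CD8A": 1.7, "CD14": 0.1}},
    {"cell_type": "monocyte", "expression": {"CD8A": 0.0, "CD14": 3.0}},
    {"cell_type": "monocyte", "expression": {"CD8A": 0.2, "CD14": 2.2}},
    {"cell_type": "B cell", "expression": {"CD8A": 0.0, "CD14": 0.0}},
]


def bivariate_points(cells, gene_x, gene_y):
    """Return one (x, y) expression pair per cell."""
    return [
        (cell["expression"].get(gene_x, 0.0), cell["expression"].get(gene_y, 0.0))
        for cell in cells
    ]


def quadrant_counts(points, x_cutoff, y_cutoff):
    """Count cells that are high for x only, y only, both, or neither."""
    counts = {"x_only": 0, "y_only": 0, "both": 0, "neither": 0}
    for x, y in points:
        x_high = x > x_cutoff
        y_high = y > y_cutoff
        if x_high and y_high:
            counts["both"] += 1
        elif x_high:
            counts["x_only"] += 1
        elif y_high:
            counts["y_only"] += 1
        else:
            counts["neither"] += 1
    return counts


points = bivariate_points(cells, "CD8A", "CD14")
quadrants = quadrant_counts(points, x_cutoff=1.0, y_cutoff=1.0)
print(quadrants)
```

For two genes that mark different cell types, most cells are expected to sit along one axis or the other, with few or none in the "both" quadrant.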
Finding marker genes
Cellxgene can be used to find marker genes for a cell type. Here we will compare the expression of genes in T cells versus B cells. Select T cells and B cells in the 'cell type' categorical metadata field and create a subset of these populations.
Using the lasso tool, select the cells of the first population and set them as population 1 on the toolbar. Similarly, define the second population.
Note: Cellxgene can only analyze 50,000 cells, so each population should not exceed this number.
Click on the differential expression icon in the toolbar and a list of genes differentially expressed will appear in the right-hand sidebar.
Click on the droplet icon to observe the gene expression in the two populations of cells.
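The idea behind comparing two populations can be sketched as a per-gene test statistic followed by a ranking. The statistical method Cellxgene uses is not described in this guide; the example below assumes a Welch t statistic purely for illustration. The expression values, the function names, and the `top_n` parameter are hypothetical, and the population size check reflects the 50,000-cell note above.

```python
import math
from statistics import mean, variance

MAX_CELLS_PER_POPULATION = 50_000  # limit taken from the note above


def welch_t(values_1, values_2):
    """Welch t statistic for two samples (each needs at least two values)."""
    standard_error = math.sqrt(
        variance(values_1) / len(values_1) + variance(values_2) / len(values_2)
    )
    if standard_error == 0:
        return 0.0
    return (mean(values_1) - mean(values_2)) / standard_error


def rank_marker_genes(population_1, population_2, top_n=10):
    """Rank shared genes by absolute Welch t statistic, largest first.

    Each population maps a gene name to a list of per-cell expression values.
    """
    for population in (population_1, population_2):
        for values in population.values():
            if len(values) > MAX_CELLS_PER_POPULATION:
                raise ValueError("population exceeds the 50,000-cell limit")
    shared_genes = sorted(set(population_1) & set(population_2))
    scored = [
        (gene, welch_t(population_1[gene], population_2[gene]))
        for gene in shared_genes
    ]
    scored.sort(key=lambda item: abs(item[1]), reverse=True)
    return scored[:top_n]


# Toy expression values for two populations; all numbers are invented.
t_cells = {
    "CD8A": [2.0, 2.5, 3.0, 2.5],
    "CD14": [0.0, 0.1, 0.0, 0.1],
    "STAT3": [1.0, 1.2, 0.9, 1.1],
}
b_cells = {
    "CD8A": [0.0, 0.1, 0.0, 0.1],
    "CD14": [0.1, 0.0, 0.1, 0.0],
    "STAT3": [1.1, 0.9, 1.0, 1.2],
}

ranked = rank_marker_genes(t_cells, b_cells)
for gene, t_statistic in ranked:
    print(gene, round(t_statistic, 2))
```

A positive statistic means the gene is higher in population 1, a negative one that it is higher in population 2. A real analysis would also report an effect size and a p-value corrected for multiple testing.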