Title: | Cell-Typing using Variance Adjusted Mahalanobis Distances with Multi-Labeling |
---|---|
Description: | Creates multi-label cell-types for single-cell RNA-sequencing data based on weighted VAM scoring of cell-type specific gene sets. Schiebout, Frost (2022) <https://psb.stanford.edu/psb-online/proceedings/psb22/schiebout.pdf>. |
Authors: | H. Robert Frost [aut], Courtney Schiebout [aut, cre] |
Maintainer: | Courtney Schiebout <[email protected]> |
License: | GPL (>= 2) |
Version: | 1.0.0 |
Built: | 2024-11-24 04:44:14 UTC |
Source: | https://github.com/cran/CAMML |
BuildGeneSets takes in an expression matrix, either as a Seurat Object or as a simple matrix (where cells are the columns, and genes the rows), and labels for each of the cells. EdgeR differential expression analysis is then run within the function and gene sets are built based on a log fold change (FC) cut-off (the default is 2). Cut-offs can also be set by FC and -log10(p-value). Of note, when more than one cut-off is given, genes must meet ALL criteria. Gene weights are recorded for each gene in the gene set as the log2(FC), FC, or -log10(p-value) (the default is log2(FC)). Each gene in the gene set is converted into its corresponding Ensembl ID, necessitating that users also provide the species of interest. Currently humans "Hs", mice "Mm", and zebrafish "Dr" are accepted (the default is humans).
BuildGeneSets(exp.data, labels = as.character(Idents(exp.data)), cutoff.type = "logfc", cutoff = 2, species = "Hs", weight.type = "logfc")
BuildGeneSets(exp.data, labels = as.character(Idents(exp.data)), cutoff.type = "logfc", cutoff = 2, species = "Hs", weight.type = "logfc")
exp.data |
A Seurat Object or expression matrix for the reference data that has previously been normalized and scaled. |
labels |
A vector of the cell type labels for each cell in the expression matrix. This must have a label for each cell and be in the order the cells appear as columns in the expression matrix. The default will be the Idents of the Seurat Object. |
cutoff.type |
One or more of the following: "logfc", "fc", and "-logp". This value will determine what value(s) genes must have to be included in a given gene set. "logfc" and "fc" cut-off genes based on their log fold change and fold change values respectively. "-logp" will cut-off genes based on the significance of their differential expression according to the -log10(p-value). When more than one cut-off is given, genes must meet ALL criteria. |
cutoff |
A number or vector of numbers that correspond to the cutoff.types. The value should be greater than 0 if cutoff.type is "logfc", greater than 1 if cutoff.type is "fc", and greater than 1 if cutoff.type is "-logp". The number of values given should match the number of cutoff types listed and should be in the same order as the cutoff types vector. |
species |
Either "Hs", "Mm", or "Dr" for human, mouse, and zebrafish respectively. Used to convert gene symbols into Ensembl IDs. |
weight.type |
Can be one of the following: "logfc","fc", or "-logp". The former two options will assign gene weights by their log2FC or FC respectively in differential expression analysis. The final option will assign weight by the negative log10 of the p-values for each gene from differential expression analysis. |
A data.frame
with the cell type, gene name, ensembl ID, and weight for each gene in each gene set.
edgeR,org.Hs.eg.db,org.Mm.eg.db
#Only run code if Seurat package is available if (requireNamespace("Seurat", quietly=TRUE) & requireNamespace("SeuratObject", quietly=TRUE)) { #See vignettes for more examples BuildGeneSets(exp.data=SeuratObject::pbmc_small, labels = c(rep(1,40),rep(2,40)), cutoff.type = "logfc", cutoff = 2, species = "Hs", weight.type = "logfc") }
#Only run code if Seurat package is available if (requireNamespace("Seurat", quietly=TRUE) & requireNamespace("SeuratObject", quietly=TRUE)) { #See vignettes for more examples BuildGeneSets(exp.data=SeuratObject::pbmc_small, labels = c(rep(1,40),rep(2,40)), cutoff.type = "logfc", cutoff = 2, species = "Hs", weight.type = "logfc") }
Multi-label cell-typing method for single-cell RNA-sequencing data. CAMML takes in cell type-specific gene sets, with weights for each gene, and builds weighted Variance-Adjusted Mahalanobis (VAM) scores for each of them. CAMML then outputs a Seurat Object with an assay for CAMML that has the weighted VAM score for each cell type in each cell. CAMML takes in several arguments: seurat
: a Seurat Object of the scRNA-seq data, gene.set.df
: a data frame with a row for each gene and the following required columns: "ensemb.id" and "cell.type" and optional columns of "gene.weight" and "gene.symbol".
CAMML(seurat, gene.set.df)
CAMML(seurat, gene.set.df)
seurat |
A Seurat Object that has previously been normalized and scaled. |
gene.set.df |
A list of lists of genes in each gene set, with each gene set list named for the cell type it represents. |
A SeuratObject
with a CAMML assay with the scores for each cell type in each cell. This will be in the form of a matrix with columns for each cell and rows for each cell type that was scored.
# Only run example code if Seurat package is available if (requireNamespace("Seurat", quietly=TRUE) & requireNamespace("SeuratObject", quietly=TRUE)) { # See vignettes for more examples seurat <- CAMML(seurat=SeuratObject::pbmc_small, gene.set.df = data.frame(cbind(ensembl.id = c("ENSG00000172005", "ENSG00000173114","ENSG00000139187"), cell.type = c("T cell","T cell","T cell")))) seurat@assays$CAMML@data }
# Only run example code if Seurat package is available if (requireNamespace("Seurat", quietly=TRUE) & requireNamespace("SeuratObject", quietly=TRUE)) { # See vignettes for more examples seurat <- CAMML(seurat=SeuratObject::pbmc_small, gene.set.df = data.frame(cbind(ensembl.id = c("ENSG00000172005", "ENSG00000173114","ENSG00000139187"), cell.type = c("T cell","T cell","T cell")))) seurat@assays$CAMML@data }
ChIMP takes in the output of CAMML and a list of the CITE-seq markers designated for each cell type. For each marker, a k=2 means clustering will be applied to discretize their presence, resulting in a 0 in cells where the marker expression is in the lower value cluster and a 1 in cells where the marker expression is in the higher value cluster. Additionally, if a quantile cutoff is desired instead, this method can be designated and a cutoff can be given (the default is .5). These discretized scores are then multiplied by the CAMML score for each cell type in each cell. The function also takes in a vector of booleans the length of the number of cell types being evaluated that designates whether each cell type is required to have all markers score 1 or any marker score a 1 in order for the CAMML score to be maintained. If the boolean is true, ChIMP will weight CAMML by the maximum marker score for each cell type. For example, if both CD4 and CD8 are listed markers for T cells and either marker scoring a 1 is sufficient, the boolean will be true. If it is false, all markers designated for a cell type need to be in the higher value cluster for a given cell. ChIMP can also use the absence of a CITE-seq marker as support for a cell type by designating it "FALSE" with the greater argument. For example, if one is looking to identify non-immune cell types, CD45 can be used with greater = FALSE to support cell-type scores for a non-immune cell type.
ChIMP(seurat, citelist, method = "k", cutoff = .5, anyMP = rep(T, length(rownames(seurat))), greater = rep(T, length(unlist(citelist))))
ChIMP(seurat, citelist, method = "k", cutoff = .5, anyMP = rep(T, length(rownames(seurat))), greater = rep(T, length(unlist(citelist))))
seurat |
A Seurat Object that has previously been run on CAMML. |
citelist |
A list of all the surface markers for each cell type, named by their cell type. |
method |
Either a "k" or a "q" to designate the desired method. "k" will use a k=2 k-means clustering method for discretization. "q" will use a quantile cutoff method. |
cutoff |
A value between 0 and 1 designating the cutoff to be used if the quantile method is selected. |
anyMP |
A vector of booleans regarding whether the CITE-seq weighting will take any positive marker protein score (TRUE) or requires all positive marker scores (FALSE) |
greater |
A vector of booleans for every CITE-seq marker designating whether to evaluate it as present (TRUE) in a cell type or absent (FALSE) in a cell type. |
A SeuratObject
with a ChIMP assay with the scores for each cell type in each cell, weighted by their CITE-seq score. This will be in the form of a matrix with columns for each cell and rows for each cell type that was scored.
# Only run example code if Seurat package is available if (requireNamespace("Seurat", quietly=TRUE) && requireNamespace("SeuratObject", quietly=TRUE)) { # See vignettes for more examples seurat <- CAMML(seurat=SeuratObject::pbmc_small, gene.set.df = data.frame(cbind(ensembl.id = c("ENSG00000172005", "ENSG00000173114","ENSG00000139187"), cell.type = c("T cell","T cell","T cell")))) cite <- matrix(c(rnorm(40), rnorm(40,2,1)), nrow = length(rownames(seurat@assays$CAMML)), ncol = length(colnames(seurat@assays$CAMML))) rownames(cite) <- "marker" colnames(cite) <- colnames(seurat) assay <- SeuratObject::CreateAssayObject(counts = cite) seurat[["ADT"]] <- assay citelist <- list() citelist[[1]] = "marker" names(citelist) = "T cell" seurat <- ChIMP(seurat, citelist) seurat@assays$ChIMP@data }
# Only run example code if Seurat package is available if (requireNamespace("Seurat", quietly=TRUE) && requireNamespace("SeuratObject", quietly=TRUE)) { # See vignettes for more examples seurat <- CAMML(seurat=SeuratObject::pbmc_small, gene.set.df = data.frame(cbind(ensembl.id = c("ENSG00000172005", "ENSG00000173114","ENSG00000139187"), cell.type = c("T cell","T cell","T cell")))) cite <- matrix(c(rnorm(40), rnorm(40,2,1)), nrow = length(rownames(seurat@assays$CAMML)), ncol = length(colnames(seurat@assays$CAMML))) rownames(cite) <- "marker" colnames(cite) <- colnames(seurat) assay <- SeuratObject::CreateAssayObject(counts = cite) seurat[["ADT"]] <- assay citelist <- list() citelist[[1]] = "marker" names(citelist) = "T cell" seurat <- ChIMP(seurat, citelist) seurat@assays$ChIMP@data }
This function takes in the Seurat Object output from the CAMML function and returns one of four labelling options. "top1" will return the top cell type for each cell. "top2" will return the top two highest scoring cell types for each cell. "top10p" will return the top scoring cell type and all other cell types with 10% of that score for each cell. "2xmean" will return all cell types with scores greater than twice the mean of all scores for a given cell.
GetCAMMLLabels(seurat, labels = "top1")
GetCAMMLLabels(seurat, labels = "top1")
seurat |
A Seurat Object with a CAMML assay with weighted VAM scores for each cell type in each query cell. This is the output from the CAMML function. |
labels |
One of the following: "top1", "top2", "top10p", or "top2xmean". "top1" will return the single-label for the top-scoring cell type for each cell. "top2" will return the labels for the two top-scoring cell types for each cell. "top10p" will return the top scoring cell type and any other cell types with scores within 10% of the top score for each cell. "top2xmean" will return any cell types with scores two times the average of all cell type scores for each cell. |
A list with the labels designated by the "labels" argument.
# Only run example code if Seurat and CAMML packages are available if (requireNamespace("Seurat", quietly=TRUE) & requireNamespace("SeuratObject", quietly=TRUE) & requireNamespace("CAMML", quietly=TRUE)) { # See vignettes for more examples seurat <- CAMML(seurat=SeuratObject::pbmc_small, gene.set.df=data.frame(cbind(ensembl.id = c("ENSG00000172005", "ENSG00000173114","ENSG00000139187"), cell.type = c("T cell","T cell","T cell")))) GetCAMMLLabels(seurat, labels = "top1") }
# Only run example code if Seurat and CAMML packages are available if (requireNamespace("Seurat", quietly=TRUE) & requireNamespace("SeuratObject", quietly=TRUE) & requireNamespace("CAMML", quietly=TRUE)) { # See vignettes for more examples seurat <- CAMML(seurat=SeuratObject::pbmc_small, gene.set.df=data.frame(cbind(ensembl.id = c("ENSG00000172005", "ENSG00000173114","ENSG00000139187"), cell.type = c("T cell","T cell","T cell")))) GetCAMMLLabels(seurat, labels = "top1") }
GetGeneSets takes in a one of the following: "immune.cells","skin.immune.cells","T.subset.cells", or "mouse.cells" and a Seurat Object that will be cell-typed using CAMML. The function will then build a gene.set.collection and a list of gene.weights based on one of the pre-built gene sets.
GetGeneSets(data = "immune.cells")
GetGeneSets(data = "immune.cells")
data |
One of the following: "immune.cells","skin.immune.cells","T.subset.cells", or "mouse.cells".
All datasets were built using differential expression of data in the package celldex using the package EdgeR. |
A data.frame
with the cell type, gene name, ensembl ID, and weight for each gene in each gene set.
GetGeneSets("immune.cells")
GetGeneSets("immune.cells")