PRECAST: Human Breast Cancer Data Analysis

Wei Liu

2024-01-24

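This vignette introduces the PRECAST workflow for the integrative analysis of multiple spatial transcriptomics datasets. The workflow consists of three steps: preprocessing and model setting, fitting the PRECAST model, and downstream analysis (visualization and combined differential expression analysis).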

We demonstrate PRECAST on two sections of human breast cancer Visium data, which can be downloaded to the current working directory with the following commands:

githubURL <- "https://github.com/feiyoung/PRECAST/blob/main/vignettes_data/bc2.rda?raw=true"
download.file(githubURL, "bc2.rda", mode = "wb")

Then load the data into R:

load("bc2.rda")

Download data from 10X: another method to access data

These data are also available from the 10x Genomics data website.

Users need two folders for each dataset: spatial and filtered_feature_bc_matrix. The data can then be read with the following commands.

dir.file <- "Section"  ## the folders Section1 and Section2, and each includes two folders spatial and filtered_feature_bc_matrix
seuList <- list()
for (r in 1:2) {
    message("r = ", r)
    seuList[[r]] <- DR.SC::read10XVisium(paste0(dir.file, r))
}
bc2 <- seuList

The packages can be loaded with the commands:

library(PRECAST)
library(Seurat)

View human breast cancer Visium data from DataPRECAST

bc2  ## a list of two Seurat objects

Check the content in bc2

head(bc2[[1]])

Create a PRECASTObject object

We show how to create a PRECASTObject object step by step. First, we create a Seurat list object from the count matrix and meta data of each data batch. Although bc2 is already a prepared Seurat list object, we re-create it here to show the details. At the same time, we check that the meta data include the spatial coordinates, which must be named “row” and “col”, respectively. If the columns have other names, rename them.

## Get the gene-by-spot read count matrices
## countList <- lapply(bc2, function(x) x[['RNA']]@counts)
countList <- lapply(bc2, function(x) {
    assay <- DefaultAssay(x)
    GetAssayData(x, assay = assay, slot = "counts")
})

M <- length(countList)
## Get the meta data of each spot for each data batch
metadataList <- lapply(bc2, function(x) x@meta.data)

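for (r in 1:M) {
    meta_data <- metadataList[[r]]
    ## stop early if the spatial coordinates are missing or misnamed
    stopifnot(all(c("row", "col") %in% colnames(meta_data)))
    ## values are not auto-printed inside a for loop, so print explicitly
    print(head(meta_data[, c("row", "col")]))
}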


## Ensure the row names of each meta data table in metadataList match the
## column names of the corresponding count matrix in countList

for (r in 1:M) {
    row.names(metadataList[[r]]) <- colnames(countList[[r]])
}


## Create the Seurat list object

seuList <- list()
for (r in 1:M) {
    seuList[[r]] <- CreateSeuratObject(counts = countList[[r]], meta.data = metadataList[[r]], project = "BreastCancerPRECAST")
}

bc2 <- seuList
rm(seuList)
head(meta_data[, c("row", "col")])
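
If the coordinate columns carry other names, they can be renamed before the Seurat objects are built. The following is a minimal sketch in base R on a toy meta data table; the column names imagerow and imagecol are hypothetical and stand in for whatever names a dataset actually uses.

```r
## Toy meta data whose coordinate columns are named differently
## ("imagerow" and "imagecol" are hypothetical names used for illustration)
meta_toy <- data.frame(imagerow = c(10, 12, 14), imagecol = c(3, 5, 7),
    row.names = c("spot1", "spot2", "spot3"))

## Map the existing names to the names PRECAST expects
old_names <- c("imagerow", "imagecol")
new_names <- c("row", "col")
colnames(meta_toy)[match(old_names, colnames(meta_toy))] <- new_names

## Confirm that the required columns are now present
stopifnot(all(c("row", "col") %in% colnames(meta_toy)))
head(meta_toy)
```

The same renaming can be applied to each element of metadataList inside a loop over the data batches.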

Prepare the PRECASTObject with preprocessing step.

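Next, we use CreatePRECASTObject() to create a PRECASTObject based on the Seurat list object bc2. This function does three things:

(1) filters low-quality spots and genes;
(2) selects the variable genes for each data batch and combines them into a common gene set;
(3) applies a stricter quality-control filter to the spots and genes retained after step (2).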

If the argument customGenelist is not NULL, this function performs only step (3) and skips steps (1) and (2). Users can retain the raw Seurat list object by setting rawData.preserve = TRUE.
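
A sketch of this call might look as follows. It assumes bc2 is the Seurat list prepared earlier. The argument names are written as they appear in the PRECAST documentation to the best of our knowledge, and the threshold values are illustrative rather than recommended; please check ?CreatePRECASTObject for the installed version.

```r
library(PRECAST)

## Assumes `bc2` is the Seurat list prepared earlier.
## Threshold values are illustrative, not recommendations.
set.seed(2022)
PRECASTObj <- CreatePRECASTObject(bc2, project = "BC2", gene.number = 2000,
    selectGenesMethod = "SPARK-X", premin.spots = 20, premin.features = 20,
    postmin.spots = 1, postmin.features = 10)

## To keep the raw Seurat list inside the object, add rawData.preserve = TRUE:
## PRECASTObj <- CreatePRECASTObject(bc2, rawData.preserve = TRUE)
```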

Add the model setting

Add the adjacency matrix list and the parameter settings of PRECAST. More model setting parameters are described in the package documentation.
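
As a hedged sketch, the two steps could be written as below. It assumes PRECASTObj was created with CreatePRECASTObject(); the setting values (Sigma_equal, verbose, maxIter) are examples only, so consult ?AddParSetting before relying on them.

```r
library(PRECAST)

## Build the spatial adjacency matrix for each section; the data are Visium
PRECASTObj <- AddAdjList(PRECASTObj, platform = "Visium")

## Store the model settings in the object before fitting.
## verbose = TRUE prints progress information during the algorithm.
PRECASTObj <- AddParSetting(PRECASTObj, Sigma_equal = FALSE, verbose = TRUE, maxIter = 30)
```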

Fit PRECAST using this data

Fit PRECAST

For the function PRECAST, users can either specify the number of clusters \(K\) directly or set K to an integer vector, in which case the modified BIC (MBIC) is used to determine \(K\). First, we fit the model with a user-specified number of clusters. For convenience, we use the number of clusters selected by MBIC (K = 14).
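
A minimal sketch of the fitting call, assuming PRECASTObj already carries the adjacency list and model settings. The candidate range in the commented line is a hypothetical example.

```r
library(PRECAST)

## Fit with a fixed number of clusters
PRECASTObj <- PRECAST(PRECASTObj, K = 14)

## Alternatively, pass candidate values and let MBIC choose among them
## (slower; the range 10:16 is only an example):
## PRECASTObj <- PRECAST(PRECASTObj, K = 10:16)
```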

Select the best model if \(K\) is an integer vector. Even if \(K\) is a scalar, this step is still necessary to reorganize the results in the PRECASTObj object.
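
This step could be sketched as follows, assuming PRECASTObj holds a fitted model. Keeping a copy of the raw results first is optional.

```r
library(PRECAST)

## Optionally back up the raw fitting results
resList <- PRECASTObj@resList

## Select the best model and reorganize the results
PRECASTObj <- SelectModel(PRECASTObj)
```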

Integrate the two samples using the IntegrateSpaData function. For computational efficiency, this function exclusively integrates the variable genes. Specifically, in cases where users do not specify the PRECASTObj@seuList or seuList argument within the IntegrateSpaData function, it automatically focuses on integrating only the variable genes. The default setting for PRECASTObj@seuList is NULL when rawData.preserve in CreatePRECASTObject is set to FALSE. For instance:
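
A hedged sketch of the default behaviour, assuming PRECASTObj has gone through model selection and was created with rawData.preserve = FALSE:

```r
library(PRECAST)

## NULL here, because the raw data were not preserved
print(PRECASTObj@seuList)

## Integrate the variable genes only
seuInt <- IntegrateSpaData(PRECASTObj, species = "Human")
seuInt
```

The returned seuInt is a Seurat object that is used in the downstream steps.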

Integrating all genes

There are two ways to use IntegrateSpaData to integrate all genes, both of which require more memory. We recommend running the all-gene integration on a server. The first is to set a value for PRECASTObj@seuList.
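
A sketch of the first method, assuming bc2 is the raw Seurat list and that the filtered list is stored in the slot PRECASTObj@seulist (lower-case l), as we understand the package to do. To keep the example light it adds only ten extra genes; setting seuList <- bc2 would integrate all genes.

```r
library(PRECAST)

## Variable genes plus ten additional genes, for illustration only;
## use `seuList <- bc2` to integrate all genes
genes <- union(row.names(PRECASTObj@seulist[[1]]), row.names(bc2[[1]])[1:10])
seuList <- lapply(bc2, function(x) x[genes, ])

## Assign the list to the slot, then integrate
PRECASTObj@seuList <- seuList
seuInt <- IntegrateSpaData(PRECASTObj, species = "Human")
seuInt
```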

The second method is to set a value for the argument seuList:
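
A sketch of the second method, assuming bc2 is the raw Seurat list. The slot is reset to NULL first so that only the argument is used.

```r
library(PRECAST)

## Clear the slot and pass the Seurat list through the argument instead
PRECASTObj@seuList <- NULL
seuInt <- IntegrateSpaData(PRECASTObj, species = "Human", seuList = bc2)
seuInt
```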

First, users can choose a color scheme using chooseColors().
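
For example, a palette with one color per cluster might be chosen as below. The palette name "Classic 20" is one option that we believe the package provides; see ?chooseColors for the available names.

```r
library(PRECAST)

## One color for each of the 14 clusters; plot_colors = TRUE previews the palette
cols_cluster <- chooseColors(palettes_name = "Classic 20", n_colors = 14, plot_colors = TRUE)
```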

Show the spatial scatter plot for clusters
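
A hedged sketch, assuming seuInt is the integrated Seurat object returned by IntegrateSpaData and cols_cluster is a vector of 14 colors, for example from chooseColors(). The point size and legend layout are arbitrary choices.

```r
library(PRECAST)

## Spatial scatter plot of the clusters, all sections combined in one figure
p12 <- SpaPlot(seuInt, item = "cluster", batch = NULL, point_size = 1,
    cols = cols_cluster, combine = TRUE, nrow.legend = 7)
p12
```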

Users can re-plot the above figures for specific needs by returning a ggplot list object. For example, we plot the spatial heatmap using a common legend.
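
This could be sketched as follows, under the same assumptions about seuInt and cols_cluster. Setting combine = FALSE returns one plot per section, and drawFigs() arranges them; the layout values are illustrative.

```r
library(PRECAST)

## One ggplot object per section
pList <- SpaPlot(seuInt, item = "cluster", batch = NULL, point_size = 1,
    cols = cols_cluster, combine = FALSE, nrow.legend = 7)

## Arrange the two sections side by side with a shared legend
drawFigs(pList, layout.dim = c(1, 2), common.legend = TRUE,
    legend.position = "right", align = "hv")
```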

Show the spatial UMAP/tSNE RGB plot to illustrate the performance in extracting features.
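
A minimal sketch, assuming seuInt is the integrated Seurat object. AddUMAP() is assumed to compute a three-dimensional UMAP by default, which SpaPlot() maps to RGB colors.

```r
library(PRECAST)

## Add the UMAP embedding used for the RGB coloring
seuInt <- AddUMAP(seuInt)
SpaPlot(seuInt, batch = NULL, item = "RGB_UMAP", point_size = 2,
    combine = TRUE, text_size = 15)

## The tSNE version follows the same pattern:
## seuInt <- AddTSNE(seuInt)
## SpaPlot(seuInt, batch = NULL, item = "RGB_TSNE", point_size = 2, combine = TRUE, text_size = 15)
```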

Show the tSNE plot based on the extracted features from PRECAST to check the performance of integration.
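
A hedged sketch, assuming seuInt is the integrated Seurat object and cols_cluster is a vector of 14 colors. Coloring the same embedding by cluster and by batch shows whether the two sections mix well; the styling arguments are illustrative.

```r
library(PRECAST)

## Two-dimensional tSNE of the PRECAST embeddings
seuInt <- AddTSNE(seuInt, n_comp = 2)

## Color by cluster and by batch
p1 <- dimPlot(seuInt, item = "cluster", point_size = 0.5, cols = cols_cluster,
    legend_pos = "right")
p2 <- dimPlot(seuInt, item = "batch", point_size = 0.5, legend_pos = "right")

drawFigs(list(p1, p2), layout.dim = c(1, 2), legend.position = "right", align = "hv")
```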

Combined differential expression analysis

Plot DE genes’ heatmap for each spatial domain identified by PRECAST.
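
The differential expression step and the heatmap could be sketched as below, assuming seuInt is the integrated Seurat object. The number of top genes (10) and the downsampling size (400 spots per domain) are arbitrary choices, and the doHeatmap() arguments are written as we understand them from the package documentation.

```r
library(PRECAST)
library(Seurat)
library(dplyr)

## Find marker genes for every spatial domain across both sections
dat_deg <- FindAllMarkers(seuInt)

## Keep the ten genes with the largest log fold change in each domain
top10 <- dat_deg %>%
    group_by(cluster) %>%
    slice_max(order_by = avg_log2FC, n = 10) %>%
    ungroup()

## Scale the data and downsample spots per domain to keep the plot readable
seuInt <- ScaleData(seuInt)
seus <- subset(seuInt, downsample = 400)

## Heatmap of the top DE genes, grouped by spatial domain
p1 <- doHeatmap(seus, features = top10$gene, cell_label = "Domain",
    grp_label = FALSE, pt_size = 6, slot = "scale.data")
p1
```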

Session Info