Spatial domain detection in the DLPFC dataset

This section presents the source code for our evaluation on DLPFC slice 151507.
The tutorial shows how to obtain the DLPFC data from st_datasets and how to train stCluster.
If your machine has more than one GPU, you may need to select the one you want; otherwise stCluster runs on GPU:0 by default.
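
One common way to pick a GPU is to restrict which devices CUDA exposes to the process. The sketch below assumes stCluster runs on PyTorch with CUDA (the training log reports a `torch.Size`); the helper name `select_gpu` is hypothetical and not part of stCluster. The environment variable must be set before CUDA is initialized, so it is safest to run this before importing torch or stCluster.

```python
import os


def select_gpu(index: int) -> str:
    """Expose only one physical GPU to this process (hypothetical helper).

    Call this before importing torch or stCluster, because the variable
    is read when CUDA is initialized.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Example: make only the second physical GPU (index 1) visible.
select_gpu(1)
```

Inside the process, the selected device is then addressed as GPU:0, so the default behaviour described above needs no further change.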

Data Preparation

First, we obtain the DLPFC data in AnnData format from st_datasets.

[1]:
import scanpy as sc
from st_datasets.dataset import get_data, get_dlpfc_data

adata, n_cluster = get_data(dataset_func=get_dlpfc_data, id='151507')
sc.pl.spatial(adata, color=['cluster'], title=['Ground truth'])
>>> INFO: Download dataset: 100%|██████████| 98.8M/98.8M [00:05<00:00, 18.1MB/s]
>>> INFO: dataset name: dorsolateral prefrontal cortex (DLPFC), slice: 151507, size: (4226, 33538), cluster: 7.(6.832s)
_images/section1_3_2.png

Train stCluster and cluster on the latent representation

Then we use adata to generate the latent representation with stCluster and evaluate the clustering performance.

[2]:
from stCluster.train import train
from stCluster.run import evaluate_embedding

adata, _ = train(adata, radius=150, ae_rate=0.8, adj_rate=0.2, pred_rate=0.3, seed=0)
adata, score = evaluate_embedding(adata=adata, n_cluster=n_cluster, cluster_method=['mclust'], cluster_score_method='ARI')

# Alternatively, learn the representation and evaluate the clustering result in one call with `train_and_evaluate`:
# from stCluster.run import train_and_evaluate
# adata, score = train_and_evaluate(adata, radius=150, ae_rate=0.8, adj_rate=0.2, pred_rate=0.3, seed=0, n_cluster=n_cluster, cluster_method=['mclust'], cluster_score_method='ARI')

sc.pl.spatial(adata, color=['cluster', 'mclust'], title=['Ground truth', 'stCluster (ARI={:.3f})'.format(score['mclust'])])
>>> INFO: Input size torch.Size([4226, 3000]).
>>> INFO: Graph contains 28996 edges, average 6.861 edges per node.
>>> INFO: Build graph success!
>>> INFO: Finish generate precluster embedding!
>>> INFO: Finish pre-cluster, result image is saved at "None", begin to prune graph.
>>> INFO: Finish pruning graph, result image is saved at "None".
>>> INFO: Graph contains 124500 edges, average 29.460 edges per node.
>>> INFO: Build graph success!
>>> INFO: Finish model preparations, begin to train model, input data size: (4226, 3000).
>>> INFO: Training: 100%|██████████| 1000/1000 [00:16<00:00, 60.09it/s]
R[write to console]:                    __           __
   ____ ___  _____/ /_  _______/ /_
  / __ `__ \/ ___/ / / / / ___/ __/
 / / / / / / /__/ / /_/ (__  ) /_
/_/ /_/ /_/\___/_/\__,_/____/\__/   version 5.4.10
Type 'citation("mclust")' for citing this R package in publications.

>>> INFO: Finish embedding process, total time: 25.643s.
fitting ...
  |======================================================================| 100%
_images/section1_5_3.png
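
The score shown in the plot title is the adjusted Rand index (ARI), which compares the predicted domains with the ground-truth labels and corrects for chance agreement. The following is a minimal pure-Python sketch of the metric for illustration only; it is not stCluster's own implementation, and the function name `adjusted_rand_index` is an assumption.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(truth, pred):
    """Adjusted Rand index between two labelings of the same items."""
    if len(truth) != len(pred):
        raise ValueError("truth and pred must have the same length")
    total_pairs = comb(len(truth), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: trivially identical

    # Count pairs of items that share a cell of the contingency table,
    # a ground-truth cluster, or a predicted cluster.
    same_both = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    same_truth = sum(comb(c, 2) for c in Counter(truth).values())
    same_pred = sum(comb(c, 2) for c in Counter(pred).values())

    expected = same_truth * same_pred / total_pairs
    maximum = (same_truth + same_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. both labelings form one cluster
    return (same_both - expected) / (maximum - expected)


# Label names do not matter, only the grouping does.
score = adjusted_rand_index(["L1", "L1", "L2", "L2"], ["b", "b", "a", "a"])
```

In practice, `sklearn.metrics.adjusted_rand_score` computes the same quantity.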

The DLPFC dataset contains two cluster counts, 7 and 5, so stCluster offers a hyperparameter configuration for each. The settings are as follows:

clusters number   cutting_prob_1   cutting_prob_2   ae_rate   adj_rate   pred_rate   seed
7                 0.05             0.1              0.8       0.2        0.3         0
5                 1.0              1.0              0.9       0.4        0.2         0
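
The two configurations can be kept in a small lookup keyed by the number of clusters. This is only a sketch: the values are copied from the settings above, but `DLPFC_PARAMS` and `dlpfc_params` are hypothetical names, and whether `train` accepts `cutting_prob_1` and `cutting_prob_2` as keyword arguments under exactly these names is an assumption to check against the stCluster source.

```python
# Hyperparameters per ground-truth cluster count, copied from the settings above.
DLPFC_PARAMS = {
    7: {"cutting_prob_1": 0.05, "cutting_prob_2": 0.1,
        "ae_rate": 0.8, "adj_rate": 0.2, "pred_rate": 0.3, "seed": 0},
    5: {"cutting_prob_1": 1.0, "cutting_prob_2": 1.0,
        "ae_rate": 0.9, "adj_rate": 0.4, "pred_rate": 0.2, "seed": 0},
}


def dlpfc_params(n_cluster: int) -> dict:
    """Return a copy of the settings for a slice with `n_cluster` clusters."""
    if n_cluster not in DLPFC_PARAMS:
        raise ValueError(f"no DLPFC configuration for {n_cluster} clusters")
    return dict(DLPFC_PARAMS[n_cluster])


params = dlpfc_params(7)
# Assumed usage, not verified against the stCluster API:
# adata, _ = train(adata, radius=150, **params)
```

Returning a copy keeps the shared table unchanged if a caller modifies the result.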

Visualization

We also visualize the latent representation with UMAP and show trajectory inference with the PAGA algorithm.

[3]:
# Drop spots without a ground-truth label; copy so later steps do not write to a view.
adata = adata[adata.obs['cluster'] != 'nan', :].copy()

sc.pp.neighbors(adata, use_rep='embedding')
sc.tl.umap(adata)
sc.tl.paga(adata, groups='cluster')

sc.pl.paga_compare(adata, title='Visualization in 151507', legend_fontsize=15, legend_fontoutline=3, size=50, frameon=True)
_images/section1_8_0.png