Generate an sc.AnnData from gene expression and spatial coordinate files

stCluster requires its input to be an sc.AnnData object. This section shows how to build one from CSV files.

Data generation

First, we save the gene expression, spatial coordinates, and spot metadata of DLPFC slice 151507 to CSV files. To keep the files small, we keep only the top 300 highly variable genes (HVGs).

[1]:
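import scanpy as sc
import pandas as pd
from st_datasets.dataset import get_data, get_dlpfc_data

# Load DLPFC slice 151507; top_genes=300 sets the number of highly variable genes
adata, n_cluster = get_data(dataset_func=get_dlpfc_data, id='151507', top_genes=300)
# Keep only the HVG columns
adata = adata[:, adata.var.highly_variable]
# Gene expression: one row per spot (barcode index), one column per gene
pd.DataFrame(adata.X.toarray(), index=adata.obs.index, columns=adata.var.index).to_csv('gene_exp.csv')
# Spatial coordinates: written without an index, so rows follow the adata.obs order
pd.DataFrame(adata.obsm['spatial']).to_csv('coors.csv', index=False)
# Spot metadata: one row per spot, one column per annotation
adata.obs.to_csv('metadata.csv')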
>>> INFO: Use local data.
>>> INFO: dataset name: dorsolateral prefrontal cortex (DLPFC), slice: 151507, size: (4226, 33538), cluster: 7.(0.381s)
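If you start from your own data instead of st_datasets, the three files only need the layout described above. The sketch below writes toy versions of them into a temporary directory; the helper name write_toy_inputs, the barcodes, gene names, and values are all placeholders and not part of stCluster.

```python
import os
import tempfile

import numpy as np
import pandas as pd


def write_toy_inputs(out_dir: str) -> dict:
    """Write toy gene_exp.csv / coors.csv / metadata.csv in the tutorial's layout.

    Hypothetical helper; all names and values are placeholders.
    """
    spots = ["spot_A", "spot_B", "spot_C"]  # hypothetical spot barcodes
    genes = ["GENE1", "GENE2"]              # hypothetical gene names
    expr = np.array([[0.0, 1.5], [2.0, 0.0], [0.5, 0.5]])
    coords = np.array([[10, 20], [30, 40], [50, 60]])
    meta = pd.DataFrame({"cluster": ["L1", "L2", "L1"]}, index=spots)

    paths = {
        "gene_exp": os.path.join(out_dir, "gene_exp.csv"),
        "coors": os.path.join(out_dir, "coors.csv"),
        "metadata": os.path.join(out_dir, "metadata.csv"),
    }
    # Expression: one row per spot (barcode index), one column per gene.
    pd.DataFrame(expr, index=spots, columns=genes).to_csv(paths["gene_exp"])
    # Coordinates: no index column, so rows match spots by position only.
    pd.DataFrame(coords).to_csv(paths["coors"], index=False)
    # Metadata: one row per spot (barcode index), one column per annotation.
    meta.to_csv(paths["metadata"])
    return paths


out_dir = tempfile.mkdtemp()  # separate directory, so the tutorial's files are not overwritten
paths = write_toy_inputs(out_dir)
print(pd.read_csv(paths["gene_exp"], index_col=0).shape)  # (3, 2)
```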

Load the files

Next, we load the three files back from disk.
In the gene expression file, each row is a spot (indexed by its barcode) and each column is a gene.
[2]:
gene_exp_file = pd.read_csv('gene_exp.csv', index_col=0)
gene_exp_file
[2]:
AL357140.1 EPHA2 C1QC AL009181.1 TEKT2 NT5C1A FAM183A KDM4A-AS1 AL158840.1 AC092813.2 ... YWHAH C22orf42 RFPL2 PVALB Z82188.2 FBLN1 CPT1B PCP4 TFF1 LINC01678
AAACAACGAATAGTTC-1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 2.446557 0.0 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.000000 0.0
AAACAAGTATCTCCCA-1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 3.450282 0.0 0.0 0.000000 0.0 1.208025 0.0 0.000000 1.739366 0.0
AAACAATCTACTAGCA-1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 0.000000 0.0 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.000000 0.0
AAACACCAATAACTGC-1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 1.937048 0.0 0.0 0.000000 0.0 0.000000 0.0 1.378545 0.000000 0.0
AAACAGCTTTCAGAAG-1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 2.403673 0.0 0.0 1.471228 0.0 0.000000 0.0 0.000000 0.000000 0.0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
TTGTTGTGTGTCAAGA-1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 2.896793 0.0 0.0 2.257376 0.0 0.000000 0.0 0.000000 0.000000 0.0
TTGTTTCACATCCAGG-1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 2.259678 0.0 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.000000 0.0
TTGTTTCATTAGTCTA-1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 2.580975 0.0 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.000000 0.0
TTGTTTCCATACAACT-1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 3.162901 0.0 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.000000 0.0
TTGTTTGTGTAAATTC-1 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 0.0 ... 1.645596 0.0 0.0 0.000000 0.0 0.000000 0.0 0.000000 0.000000 0.0

4226 rows × 300 columns
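The index_col=0 argument matters because the first CSV column holds the spot barcodes. A small self-contained illustration with an in-memory CSV (toy barcodes and genes, not DLPFC data):

```python
import io

import pandas as pd

# Same layout as gene_exp.csv: an unnamed first column holding spot barcodes.
csv_text = ",GENE1,GENE2\nspot_A,0.0,1.5\nspot_B,2.0,0.0\n"

# With index_col=0 the barcodes become the row index: 2 spots x 2 genes.
with_index = pd.read_csv(io.StringIO(csv_text), index_col=0)
# Without it, pandas keeps the barcodes as an extra column named "Unnamed: 0".
without_index = pd.read_csv(io.StringIO(csv_text))

print(with_index.shape)             # (2, 2)
print(list(without_index.columns))  # ['Unnamed: 0', 'GENE1', 'GENE2']
```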

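In the spatial coordinate file, each row is a spot and each of the two columns is one spatial axis. The file has no barcode column, so rows correspond to spots by position.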

[3]:
coors_file = pd.read_csv('coors.csv')
coors_file
[3]:
0 1
0 3276 2514
1 9178 8520
2 5133 2878
3 3462 9581
4 2779 7663
... ... ...
4221 7464 6239
4222 5045 9466
4223 4218 9703
4224 4017 7906
4225 5683 3359

4226 rows × 2 columns
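Because coors.csv carries no barcodes, a mismatch in row order would go unnoticed. A minimal sketch of re-attaching the barcodes from the expression table, assuming both files keep the same row order (index_coords is a hypothetical helper, not an stCluster function):

```python
import pandas as pd


def index_coords(coords: pd.DataFrame, barcodes) -> pd.DataFrame:
    """Return a copy of `coords` indexed by spot barcode.

    Assumes row i of `coords` belongs to barcodes[i]; the CSV itself cannot
    confirm this, so only the shapes are checked here.
    """
    barcodes = list(barcodes)
    if coords.shape[1] != 2:
        raise ValueError(f"expected 2 coordinate columns, got {coords.shape[1]}")
    if len(coords) != len(barcodes):
        raise ValueError(f"{len(coords)} coordinate rows vs {len(barcodes)} barcodes")
    out = coords.copy()  # leave the caller's DataFrame untouched
    out.index = pd.Index(barcodes)
    return out


# Usage with the tutorial objects (row order taken from the expression file):
# coors_indexed = index_coords(coors_file, gene_exp_file.index)
```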

In the metadata file, each row is a spot and each column is one annotation: the in-tissue flag, the array row and column, and the annotated layer in cluster.

[4]:
spot_metadata_file = pd.read_csv('metadata.csv', index_col=0)
spot_metadata_file
[4]:
in_tissue array_row array_col cluster
AAACAACGAATAGTTC-1 1 0 16 Layer_1
AAACAAGTATCTCCCA-1 1 50 102 Layer_3
AAACAATCTACTAGCA-1 1 3 43 Layer_1
AAACACCAATAACTGC-1 1 59 19 WM
AAACAGCTTTCAGAAG-1 1 43 9 Layer_6
... ... ... ... ...
TTGTTGTGTGTCAAGA-1 1 31 77 Layer_3
TTGTTTCACATCCAGG-1 1 58 42 Layer_6
TTGTTTCATTAGTCTA-1 1 60 30 WM
TTGTTTCCATACAACT-1 1 45 27 Layer_6
TTGTTTGTGTAAATTC-1 1 7 51 Layer_1

4226 rows × 4 columns
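Before building the AnnData object, it is worth confirming that the three tables describe the same spots in the same order. A sketch of such a check (check_alignment is a hypothetical helper, not part of stCluster):

```python
import pandas as pd


def check_alignment(gene_exp: pd.DataFrame, coords: pd.DataFrame, meta: pd.DataFrame) -> None:
    """Raise ValueError if the three tables do not line up spot by spot."""
    n = len(gene_exp)
    if len(coords) != n or len(meta) != n:
        raise ValueError(
            f"row counts differ: expr={n}, coords={len(coords)}, meta={len(meta)}"
        )
    if not gene_exp.index.is_unique:
        raise ValueError("duplicate spot barcodes in the expression table")
    # Expression and metadata both carry barcodes, so compare them in order.
    # Coordinates have no barcodes; only their row count can be checked.
    if not gene_exp.index.equals(meta.index):
        raise ValueError("expression and metadata barcodes differ or are in a different order")


# Usage with the tutorial objects:
# check_alignment(gene_exp_file, coors_file, spot_metadata_file)
```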

Generate adata

Finally, we build the sc.AnnData object with stCluster's gen_adata function, passing the three tables together with the gene names and spot barcodes.

[5]:
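from stCluster.utils import gen_adata

adata = gen_adata(
    gene_exp_file,                    # spot x gene expression table
    coors_file,                       # spatial coordinates, one row per spot
    spot_metadata_file,               # spot metadata
    gene_exp_file.columns.to_list(),  # gene names
    gene_exp_file.index.to_list(),    # spot barcodes
)
adata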
[5]:
AnnData object with n_obs × n_vars = 4226 × 300
    obs: 'in_tissue', 'array_row', 'array_col', 'cluster'
    obsm: 'spatial'

The gene expression matrix is stored in adata.X as a sparse matrix, the spatial coordinates are in adata.obsm['spatial'], and the spot metadata is in adata.obs.