Generate `sc.AnnData` from the gene expression file and the spatial coordinates

stCluster requires its input to be an sc.AnnData object. This section shows how to generate one from CSV files.

Data generation

First, we save the gene expression, spatial coordinates, and spot metadata of DLPFC slice 151507 to CSV files. To keep the example simple, we save only the 300 highly variable genes (HVGs) for each spot.

[1]:

import scanpy as sc
import pandas as pd
from st_datasets.dataset import get_data, get_dlpfc_data

adata, n_cluster = get_data(dataset_func=get_dlpfc_data, id='151507', top_genes=300)
adata = adata[:, adata.var.highly_variable]
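# adata.X is sparse; convert it to a dense array before writing (rows = spots, columns = genes)
pd.DataFrame(adata.X.toarray(), index=adata.obs.index, columns=adata.var.index).to_csv('gene_exp.csv')
# The coordinates are written without an index, so row order is the only link to the spots
pd.DataFrame(adata.obsm['spatial']).to_csv('coors.csv', index=False)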
adata.obs.to_csv('metadata.csv')

>>> INFO: Use local data.
>>> INFO: dataset name: dorsolateral prefrontal cortex (DLPFC), slice: 151507, size: (4226, 33538), cluster: 7.(0.381s)

Load the files

Then, we can load the data from the file paths.

In the gene expression file, each row is a spot and each column is a gene.

[2]:

gene_exp_file = pd.read_csv('gene_exp.csv', index_col=0)
gene_exp_file

[2]:

	AL357140.1	EPHA2	C1QC	AL009181.1	TEKT2	NT5C1A	FAM183A	KDM4A-AS1	AL158840.1	AC092813.2	...	YWHAH	C22orf42	RFPL2	PVALB	Z82188.2	FBLN1	CPT1B	PCP4	TFF1	LINC01678
AAACAACGAATAGTTC-1	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	2.446557	0.0	0.0	0.000000	0.0	0.000000	0.0	0.000000	0.000000	0.0
AAACAAGTATCTCCCA-1	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	3.450282	0.0	0.0	0.000000	0.0	1.208025	0.0	0.000000	1.739366	0.0
AAACAATCTACTAGCA-1	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	0.000000	0.0	0.0	0.000000	0.0	0.000000	0.0	0.000000	0.000000	0.0
AAACACCAATAACTGC-1	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	1.937048	0.0	0.0	0.000000	0.0	0.000000	0.0	1.378545	0.000000	0.0
AAACAGCTTTCAGAAG-1	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	2.403673	0.0	0.0	1.471228	0.0	0.000000	0.0	0.000000	0.000000	0.0
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
TTGTTGTGTGTCAAGA-1	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	2.896793	0.0	0.0	2.257376	0.0	0.000000	0.0	0.000000	0.000000	0.0
TTGTTTCACATCCAGG-1	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	2.259678	0.0	0.0	0.000000	0.0	0.000000	0.0	0.000000	0.000000	0.0
TTGTTTCATTAGTCTA-1	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	2.580975	0.0	0.0	0.000000	0.0	0.000000	0.0	0.000000	0.000000	0.0
TTGTTTCCATACAACT-1	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	3.162901	0.0	0.0	0.000000	0.0	0.000000	0.0	0.000000	0.000000	0.0
TTGTTTGTGTAAATTC-1	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	0.0	...	1.645596	0.0	0.0	0.000000	0.0	0.000000	0.0	0.000000	0.000000	0.0

4226 rows × 300 columns

In the spatial coordinate file, each row is a spot and each column is an axis.

[3]:

coors_file = pd.read_csv('coors.csv')
coors_file

[3]:

	0	1
0	3276	2514
1	9178	8520
2	5133	2878
3	3462	9581
4	2779	7663
...	...	...
4221	7464	6239
4222	5045	9466
4223	4218	9703
4224	4017	7906
4225	5683	3359

4226 rows × 2 columns
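
The difference between the two `read_csv` calls above follows from how the files were written: the gene expression file was saved with its spot index, while the coordinate file was saved without one. The sketch below reproduces this round trip on a small toy table held in memory; the spot and gene names are made up for illustration and are not part of the DLPFC data.

```python
import io
import pandas as pd

# Toy stand-ins for the real tables (hypothetical spot and gene names)
spots = ['spot_a', 'spot_b']
exp = pd.DataFrame([[0.0, 1.5], [2.0, 0.0]], index=spots, columns=['gene_1', 'gene_2'])
coors = pd.DataFrame([[10, 20], [30, 40]])

# Write as in cell [1]: expression keeps its index, coordinates drop it
exp_csv = exp.to_csv()
coors_csv = coors.to_csv(index=False)

# Read back: index_col=0 restores the spot names as the index,
# whereas the coordinate table gets a fresh 0..n-1 index
exp_back = pd.read_csv(io.StringIO(exp_csv), index_col=0)
coors_back = pd.read_csv(io.StringIO(coors_csv))

print(exp_back)
print(coors_back)
```

Note that after the round trip the coordinate column labels are the strings '0' and '1', and that the coordinates are linked to spots by row order only.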

In the metadata file, each row is a spot and each column is a metadata field.

[4]:

spot_metadata_file = pd.read_csv('metadata.csv', index_col=0)
spot_metadata_file

[4]:

	in_tissue	array_row	array_col	cluster
AAACAACGAATAGTTC-1	1	0	16	Layer_1
AAACAAGTATCTCCCA-1	1	50	102	Layer_3
AAACAATCTACTAGCA-1	1	3	43	Layer_1
AAACACCAATAACTGC-1	1	59	19	WM
AAACAGCTTTCAGAAG-1	1	43	9	Layer_6
...	...	...	...	...
TTGTTGTGTGTCAAGA-1	1	31	77	Layer_3
TTGTTTCACATCCAGG-1	1	58	42	Layer_6
TTGTTTCATTAGTCTA-1	1	60	30	WM
TTGTTTCCATACAACT-1	1	45	27	Layer_6
TTGTTTGTGTAAATTC-1	1	7	51	Layer_1

4226 rows × 4 columns
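
Because the coordinate file carries no spot names, it can be worth confirming that the three tables describe the same spots in the same order before combining them. The following is a minimal sketch of such a check; `check_alignment` is a hypothetical helper written for this illustration, not a function provided by stCluster.

```python
import pandas as pd

def check_alignment(gene_exp, coors, metadata):
    """Hypothetical helper: raise if the tables do not line up spot by spot."""
    if len(coors) != len(gene_exp):
        raise ValueError(
            f"coordinate rows ({len(coors)}) != expression rows ({len(gene_exp)})"
        )
    if not gene_exp.index.equals(metadata.index):
        raise ValueError("expression and metadata indices differ in content or order")
    return True

# Toy example with made-up spot names
spots = ['spot_a', 'spot_b', 'spot_c']
toy_exp = pd.DataFrame([[0.0, 1.5], [2.0, 0.0], [0.0, 0.0]],
                       index=spots, columns=['gene_1', 'gene_2'])
toy_coors = pd.DataFrame([[10, 20], [30, 40], [50, 60]])
toy_meta = pd.DataFrame({'cluster': ['Layer_1', 'WM', 'Layer_3']}, index=spots)

aligned = check_alignment(toy_exp, toy_coors, toy_meta)
```

With the files from this tutorial, the equivalent call would be `check_alignment(gene_exp_file, coors_file, spot_metadata_file)`.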

Generate adata

Next, we can generate the sc.AnnData object with stCluster's gen_adata function.

[5]:

from stCluster.utils import gen_adata

adata = gen_adata(gene_exp_file, coors_file, spot_metadata_file, gene_exp_file.columns.to_list(), gene_exp_file.index.to_list())
adata

[5]:

AnnData object with n_obs × n_vars = 4226 × 300
    obs: 'in_tissue', 'array_row', 'array_col', 'cluster'
    obsm: 'spatial'

The gene expression matrix is stored in adata.X as a sparse matrix. The spatial coordinates can be accessed at adata.obsm['spatial'].
