Tutorial 2: Detecting anomalous cells from scRNA-seq human lung datasets (in terminal)
M2ASDA also supports running in the terminal. In this mode, the Python package does not need to be installed, and a single command runs the whole pipeline, from training the model to saving the results.
In this tutorial, we show how to use M2ASDA to detect anomalous cells. We use two human scRNA-seq datasets: a healthy lung tissue dataset (hereafter 10xG-hHL) and a lung cancer dataset (hereafter 10xG-hLC-A). Detailed information about these datasets is provided below.
Setting the working path
import os
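# Change the working directory to the project's source folder, so that
# `python -m m2asda.anomaly` can find the (uninstalled) m2asda package and
# the relative '../results/' paths used below resolve correctly.
notebook_path = os.path.abspath('')  # in Jupyter, this is the notebook's folder
project_root = os.path.abspath(os.path.join(notebook_path, '..', '..'))  # two levels up
src_path = os.path.join(project_root, 'src')
os.chdir(src_path)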
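The cell above assumes the notebook sits two levels below the project root and that the sources live in `src/`. Below is a minimal sketch of the same step with `pathlib` that also checks for an `m2asda` folder before you change directory. The `src/m2asda` layout is an assumption inferred from the `python -m m2asda.anomaly` command, and `resolve_src_dir` is a hypothetical helper, not part of M2ASDA.

```python
from pathlib import Path


def resolve_src_dir(notebook_dir, levels_up=2, package="m2asda"):
    """Return <project_root>/src, where project_root is `levels_up` above notebook_dir.

    Assumes (hypothetically) that the package is a folder named `package`
    directly inside `src/`; raises FileNotFoundError otherwise.
    """
    project_root = Path(notebook_dir).resolve()
    for _ in range(levels_up):
        project_root = project_root.parent
    src_dir = project_root / "src"
    if not (src_dir / package).is_dir():
        raise FileNotFoundError(f"{package!r} not found under {src_dir}")
    return src_dir
```

In the notebook, `os.chdir(resolve_src_dir(os.getcwd()))` would then replace the last four lines of the cell above, failing early with a clear message if the layout differs.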
Running in the terminal
All accepted parameters and their descriptions can be listed as follows:
!python -m m2asda.anomaly --help
/usr/local/anaconda3/lib/python3.9/runpy.py:127: RuntimeWarning: 'm2asda.anomaly' found in sys.modules after import of package 'm2asda', but prior to execution of 'm2asda.anomaly'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
usage: anomaly.py [-h] [--ref_path REF_PATH] [--tgt_path TGT_PATH]
                  [--result_path RESULT_PATH] [--pth_path PTH_PATH]
                  [--n_epochs N_EPOCHS] [--batch_size BATCH_SIZE]
                  [--learning_rate LEARNING_RATE] [--n_critic N_CRITIC]
                  [--alpha ALPHA] [--beta BETA] [--gamma GAMMA]
                  [--lambda LAMBDA] [--GPU GPU]
                  [--random_state RANDOM_STATE] [--n_genes N_GENES]
                  [--run_gmm RUN_GMM]

M2ASDA for anomaly detection.

optional arguments:
  -h, --help            show this help message and exit

Data Parameters:
  --ref_path REF_PATH   Path to read the reference h5ad file
  --tgt_path TGT_PATH   Path to read the target h5ad file
  --result_path RESULT_PATH
                        Path to save the output csv file
  --pth_path PTH_PATH   Path to save the trained generator

AnomalyModel Parameters:
  --n_epochs N_EPOCHS   Number of epochs
  --batch_size BATCH_SIZE
                        Batch size
  --learning_rate LEARNING_RATE
                        Learning rate
  --n_critic N_CRITIC   Number of discriminator iterations per generator
                        iteration
  --alpha ALPHA         Loss weight alpha
  --beta BETA           Loss weight beta
  --gamma GAMMA         Loss weight gamma
  --lambda LAMBDA       Loss weight lambda
  --GPU GPU             GPU ID for training, e.g., cuda:0
  --random_state RANDOM_STATE
                        Random seed
  --n_genes N_GENES     Number of genes
  --run_gmm RUN_GMM     Run GMM for obtaining binary label
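The `--run_gmm` option converts continuous anomaly scores into binary labels with a Gaussian mixture model. M2ASDA's own thresholder is not shown in this tutorial, and its log reports a `prior_beta` setting, which suggests it differs from a plain GMM. The sketch below is therefore only a generic illustration of the idea: it fits a two-component 1-D Gaussian mixture to the scores by expectation maximization (EM) and labels a cell anomalous when the higher-mean component is more likely. `gmm_threshold` is a hypothetical helper that uses only the standard library. Its `max_iter` and `tol` arguments borrow the key names printed in `gmm_configs`, but how they are handled here is our own choice.

```python
import math
import statistics


def _e_step(xs, weights, means, variances):
    """Return per-score component responsibilities and the total log-likelihood."""
    resp, log_lik = [], 0.0
    for x in xs:
        dens = [
            w * math.exp(-((x - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
            for w, m, v in zip(weights, means, variances)
        ]
        total = sum(dens) or 1e-300  # guard against both densities underflowing to 0
        log_lik += math.log(total)
        resp.append([d / total for d in dens])
    return resp, log_lik


def gmm_threshold(scores, max_iter=100, tol=1e-5):
    """Binarize 1-D anomaly scores with a two-component Gaussian mixture.

    Generic EM illustration (hypothetical helper), not M2ASDA's thresholder.
    Returns 0/1 labels; 1 means the higher-mean component is more likely.
    """
    xs = [float(s) for s in scores]
    means = [min(xs), max(xs)]  # start one component at each extreme
    var0 = statistics.pvariance(xs) or 1.0
    variances = [var0, var0]
    weights = [0.5, 0.5]
    prev_ll = -math.inf
    for _ in range(max_iter):
        resp, log_lik = _e_step(xs, weights, means, variances)
        if abs(log_lik - prev_ll) < tol:  # log-likelihood has stopped improving
            break
        prev_ll = log_lik
        # M-step: re-estimate the weight, mean and variance of each component
        for k in range(2):
            nk = sum(r[k] for r in resp) or 1e-12
            weights[k] = nk / len(xs)
            means[k] = sum(r[k] * x for r, x in zip(resp, xs)) / nk
            variances[k] = max(
                sum(r[k] * (x - means[k]) ** 2 for r, x in zip(resp, xs)) / nk, 1e-6
            )
    resp, _ = _e_step(xs, weights, means, variances)  # final assignment
    high = 0 if means[0] > means[1] else 1
    return [int(r[high] > r[1 - high]) for r in resp]
```

For example, `gmm_threshold([0.1, 0.12, 0.09, 0.11, 0.1, 0.9, 0.95, 0.88])` separates the five low scores from the three high ones. A data-driven threshold like this avoids fixing a score cutoff by hand.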
Next, set the paths to: 1) read the reference dataset; 2) read the target dataset; 3) save the output CSV file; and 4) save the trained generator (optional). The saved generator is mainly needed for the next steps, batch alignment and subtyping.
!python -m m2asda.anomaly \
--ref_path '/volume3/kxu/scdata/Cancer/Process_A.h5ad' \
--tgt_path '/volume3/kxu/scdata/Cancer/Process_B.h5ad' \
--result_path '../results/anomaly.csv' \
--pth_path '../results/generator.pth'
/usr/local/anaconda3/lib/python3.9/runpy.py:127: RuntimeWarning: 'm2asda.anomaly' found in sys.modules after import of package 'm2asda', but prior to execution of 'm2asda.anomaly'; this may result in unpredictable behaviour
  warn(RuntimeWarning(msg))
=============== AnomalyModel Parameters ===============
n_epochs = 30
batch_size = 256
learning_rate = 0.0001
n_critic = 2
random_state = 2024
n_genes = 3000
device = cuda:0
loss_weight = {'alpha': 30, 'beta': 10, 'gamma': 1, 'lambda': 10}
g_configs = {'input_dim': 3000, 'hidden_dim': [1024, 512, 256], 'latent_dim': 256, 'memory_size': 512, 'threshold': 0.005, 'temperature': 0.1, 'normalization': True, 'activation': True, 'dropout': 0.1}
d_configs = {'input_dim': 3000, 'hidden_dim': [1024, 512, 256], 'latent_dim': 256, 'normalization': True, 'activation': True, 'dropout': 0.1}
gmm_configs = {'random_state': 2024, 'max_iter': 100, 'tol': 1e-05, 'prior_beta': [1, 10]}
=============== AnomalyModel Training ===============
Begin to train M2ASDA on the reference dataset...
Training Epochs: 100%|█| 30/30 [01:17<00:00, 2.57s/it, D_Loss=-1.32, G_Loss=7.3
Training process has been finished.
Begin to detect anomalies on the target dataset...
Anomalous spots have been detected.
Inference Epochs:  22%|█████▎ | 22/100 [00:00<00:01, 75.79it/s]
GMM-based thresholder has converged.
Inference Epochs:  22%|█████▎ | 22/100 [00:00<00:01, 72.43it/s]
=============== Result Saving ===============
Prediction result has been saved at ../results/anomaly.csv!
Generator has been saved at ../results/generator.pth!
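If you drive the pipeline from a Python script rather than a notebook `!` cell, you can assemble the same command as an argument list and pass it to `subprocess.run`. The sketch below uses only flags listed in the `--help` output. `build_anomaly_cmd` and `run_anomaly` are hypothetical helpers, not part of M2ASDA, and any paths you pass are placeholders for your own files.

```python
import subprocess
import sys


def build_anomaly_cmd(ref_path, tgt_path, result_path, pth_path=None, **options):
    """Assemble the `python -m m2asda.anomaly` command as an argument list.

    Extra keyword arguments (e.g. n_epochs=30, GPU="cuda:0") become
    `--<name> <value>` pairs; only flags shown by --help should be passed.
    """
    cmd = [sys.executable, "-m", "m2asda.anomaly",
           "--ref_path", str(ref_path),
           "--tgt_path", str(tgt_path),
           "--result_path", str(result_path)]
    if pth_path is not None:  # saving the generator is optional
        cmd += ["--pth_path", str(pth_path)]
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    return cmd


def run_anomaly(cmd, src_dir):
    """Run the command from the source directory; raises CalledProcessError on failure."""
    return subprocess.run(cmd, cwd=src_dir, check=True)
```

Because `lambda` is a Python keyword, pass that loss weight as `**{"lambda": 10}` rather than `lambda=10`. Setting `cwd=src_dir` has the same effect as the `os.chdir` call at the top of this tutorial, but without changing the working directory of the calling process.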