Using get_normalized_expression (scVI) vs. Log-Normalized Counts
10 weeks ago
Nicolas • 0

Hello

I’m working with four single-cell RNA-seq samples, each from a different genotype (WT, two single knockouts, and one double knockout). To integrate these samples, I used the scVI model, as shown below:

import scanpy as sc
import scvi

# Set up the AnnData object for this model
scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key='sample')

# Create the SCVI model
model_scvi = scvi.model.SCVI(adata)

# Train the scVI model
model_scvi.train(max_epochs=500, early_stopping=True)

After training, I obtained the latent representation and used it to compute a UMAP embedding:


# Get the latent representation of the SCVI model
adata.obsm["X_scVI"] = model_scvi.get_latent_representation()

# Calculate the neighbors and UMAP using the X_scVI embedding
sc.pp.neighbors(adata, use_rep="X_scVI")
sc.tl.umap(adata)

# Rename cell_type to paper_cell_type
adata.obs.rename(columns={'cell_type' : 'paper_cell_type'}, inplace=True)

# Cluster cells with the Leiden algorithm
sc.tl.leiden(adata, resolution=0.7, random_state=124)

# Plot the UMAP. Visually identify the Leiden clusters corresponding to the
# cell annotations in the paper
sc.pl.umap(adata, color=['leiden'])

Next, I wanted to examine gene expression across clusters. I tried two approaches:

1 - Using the normalized expression from scVI's get_normalized_expression:

# Get the scvi normalized counts
adata.layers['scvi_normalized'] = model_scvi.get_normalized_expression(library_size=1e4)

2 - Using standard log-normalized counts:

# Log normalize the counts
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

# Save a copy of the log_normalized counts in the layer log_normalized
adata.layers['log_normalized'] = adata.X.copy()
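
For reference, the arithmetic behind the second approach can be sketched in plain Python. This is a toy re-implementation for illustration only, not scanpy's actual code: the function name `log_normalize` and the example counts are hypothetical, and the handling of all-zero cells is an assumption of this sketch.

```python
import math


def log_normalize(counts, target_sum=1e4):
    """Scale each cell (row) to `target_sum` total counts, then apply log(1 + x)."""
    normalized = []
    for cell in counts:
        total = sum(cell)
        # Assumption of this sketch: cells with zero total counts stay all-zero,
        # which avoids a division by zero.
        scale = target_sum / total if total > 0 else 0.0
        normalized.append([math.log1p(c * scale) for c in cell])
    return normalized


# Hypothetical raw counts: two cells (rows) by three genes (columns).
toy_counts = [
    [10, 0, 90],  # cell with 100 total counts
    [1, 1, 8],    # cell with 10 total counts
]
toy_log_normalized = log_normalize(toy_counts)
```

Zero counts remain exactly zero after this transform, so a log-normalized matrix keeps the sparsity of the raw counts, and the resulting values are on a log scale.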

Which of these normalized matrices is better suited for downstream analysis? I haven't found benchmarks comparing the results of scVI's get_normalized_expression with standard log normalization. When I plot gene expression on the UMAP using both methods, the results look quite different:

With the log-normalized matrix:

[Figure: UMAP colored by gene expression, log-normalized matrix]

With the get_normalized_expression matrix:

[Figure: UMAP colored by gene expression, get_normalized_expression matrix]
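
One likely contributor to the visual difference is scale: get_normalized_expression returns model-estimated expression rescaled to the requested library size, without a log transform, whereas the log-normalized layer has been passed through log1p. The sketch below shows how the two could be put on a comparable scale before comparing them for a single gene. All numbers are made up for illustration, and `pearson` is a hypothetical helper written for this example, not part of scvi-tools or scanpy.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences of numbers."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical values for one gene across five cells (not real data).
log_normalized_gene = [0.0, 0.0, 2.3, 0.0, 3.1]     # sparse, already on a log scale
scvi_normalized_gene = [0.4, 0.6, 9.5, 0.5, 20.0]   # dense, linear scale (per 1e4)

# Put the scVI values on the same log(1 + x) scale before comparing.
scvi_log_gene = [math.log1p(v) for v in scvi_normalized_gene]

# Agreement between the two representations for this gene.
r = pearson(log_normalized_gene, scvi_log_gene)
```

On the real object, the analogous step would presumably be to apply a log1p transform to the `scvi_normalized` layer before plotting it next to `log_normalized`. Differences that remain after matching the scale would then reflect the model's smoothing rather than the transform.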

Thank you!

scVI normalization RNA-seq single-cell UMAP