Question

Using get_normalized_expression (scVI) vs. Log-Normalized Counts

10 weeks ago

Nicolas • 0

Hello

I’m working with four single-cell RNA-seq samples, each from a different genotype (WT, two single knockouts, and one double knockout). To integrate these samples, I used the scVI model, as shown below:

import scanpy as sc
import scvi

# Set up the AnnData object for this model (raw counts live in the "counts" layer)
scvi.model.SCVI.setup_anndata(adata, layer="counts", batch_key='sample')

# Create the SCVI model
model_scvi = scvi.model.SCVI(adata)

# Train the scVI model
model_scvi.train(max_epochs=500, early_stopping=True)

After training, I obtained the latent representation and used it to compute a UMAP embedding:


# Get the latent representation of the SCVI model
adata.obsm["X_scVI"] = model_scvi.get_latent_representation()

# Calculate the neighbors and UMAP using the X_scVI embedding
sc.pp.neighbors(adata, use_rep="X_scVI")
sc.tl.umap(adata)

# Rename cell_type to paper_cell_type
adata.obs.rename(columns={'cell_type' : 'paper_cell_type'}, inplace=True)

# Cluster cells with the Leiden algorithm
sc.tl.leiden(adata, resolution=0.7, random_state=124)

# Plot the UMAP colored by Leiden cluster, to visually match clusters
# to the cell annotations in the paper
sc.pl.umap(adata, color=['leiden'])

Next, I wanted to examine gene expression across clusters. I tried two approaches:

1 - Using the normalized expression from scVI's get_normalized_expression:

# Get the scvi normalized counts
adata.layers['scvi_normalized'] = model_scvi.get_normalized_expression(library_size=1e4)

2 - Using standard log-normalized counts:

# Log-normalize the counts (both calls modify adata.X in place, so adata.X must hold raw counts here)
sc.pp.normalize_total(adata, target_sum=1e4)
sc.pp.log1p(adata)

# Save a copy of the log_normalized counts in the layer log_normalized
adata.layers['log_normalized'] = adata.X.copy()
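
For reference, a minimal NumPy sketch of what I understand the two calls above to compute on a dense matrix: each cell is scaled to a total of 1e4 counts, then log(1 + x) is applied. The helper name `log_normalize` and the toy matrix are hypothetical, for illustration only, and this is not scanpy's actual implementation.

```python
import numpy as np

def log_normalize(counts, target_sum=1e4):
    """Scale each cell (row) to `target_sum` total counts, then apply log1p."""
    counts = np.asarray(counts, dtype=float)
    totals = counts.sum(axis=1, keepdims=True)
    totals[totals == 0] = 1.0  # assumption: leave all-zero cells as zeros
    return np.log1p(counts / totals * target_sum)

# Toy example: two cells with the same composition but 10x different depth
counts = np.array([[1, 3, 0],
                   [10, 30, 0]])
log_norm = log_normalize(counts)
```

Both toy cells end up with identical values, because only the per-cell proportions survive the scaling step.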

Which of these normalized matrices is better suited for downstream analysis? I haven't found benchmarks comparing the results of scVI's get_normalized_expression with standard log normalization. When I plot gene expression on the UMAP using both methods, the results look quite different:

With the log-normalized matrix:

[Figure: UMAP colored by gene expression, log_normalized layer]

With get_normalized_expression matrix:

[Figure: UMAP colored by gene expression, scvi_normalized layer]
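
To put a number on the visual difference, one option might be a per-gene Pearson correlation between the two layers. The sketch below is only that, a sketch: `per_gene_correlation` is a hypothetical helper, the data are synthetic, and it assumes dense cells x genes arrays. Since get_normalized_expression returns values on a rescaled count scale rather than a log scale, log1p is applied to the scVI values first so both matrices are compared on a similar scale.

```python
import numpy as np

def per_gene_correlation(a, b):
    """Pearson correlation between matching columns (genes) of two cells x genes arrays."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    a_c = a - a.mean(axis=0)
    b_c = b - b.mean(axis=0)
    denom = np.sqrt((a_c ** 2).sum(axis=0) * (b_c ** 2).sum(axis=0))
    with np.errstate(divide="ignore", invalid="ignore"):
        return (a_c * b_c).sum(axis=0) / denom  # NaN for genes that are constant

# Synthetic stand-ins for the two layers (50 cells, 4 genes)
rng = np.random.default_rng(0)
log_norm = rng.random((50, 4))
scvi_norm = np.expm1(log_norm)                   # genes 0-2 agree after log1p
scvi_norm[:, 3] = np.expm1(1 - log_norm[:, 3])   # gene 3 is deliberately reversed

r = per_gene_correlation(np.log1p(scvi_norm), log_norm)
```

With the real object, the inputs would presumably be `np.log1p(adata.layers['scvi_normalized'])` and `adata.layers['log_normalized']`, converted to dense arrays first if they are sparse. Genes with low correlation would be the ones where the two normalizations disagree most.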

Thank you!

scVI normalization RNA-seq single-cell UMAP