Open
Description
I would like to use variations of CLR transformation.
e.g., non-zero or iqlr CLR, both using only a subset of the features to estimate the geometric mean.
I can transform a phyloseq object before the taxatree_models, but then it complains that the input contains non-zero values, which messes up the aggregation. I found no way to turn off the aggregation.
Ideally, one could add a custom function to tax_aggregate or directly implement these transformations in microViz.
Here is my function:
#' @title get otu table as a matrix with taxa are columns format
#' @description A wrapper to get otu table as a matrix with taxa are columns format
#' @param pseq The phyloseq object.
#' @return otu table as a matrix with taxa are columns format
#' @examples
#' get_otu_matrix(pseq)
#' @export
#' @import phyloseq
otu_matrix <- function(pseq) {
OTU <- as(phyloseq::otu_table(pseq), "matrix")
if (phyloseq::taxa_are_rows(pseq)) {
OTU <- t(OTU)
}
OTU
}
#' @title robust (non zero) centered log transformation
#' @description A wrapper to perform Bayesian replacement of zeroes and CLR-transformation. It is rebust in the sense that it takes only values that are not zero in the calculation of the geometric mean.
#' @param pseq The phyloseq object.
#' @param log_fun The log function to use. Default is log2.
#' @return phyloseq object with CLR-transformed abundances.
#' @export
#'
rclr_transformation <- function(pseq, log_fun = log2) {
if (class(pseq) != "phyloseq") {
stop("pseq must be a phyloseq object.")
}
abundances <- otu_matrix(pseq)
mdat <- microbiome::meta(pseq)
if (!identical(rownames(abundances), rownames(mdat))) {
stop("OTU table and metadata are not in order.")
}
# remove all columns with all zeros
abundances <- abundances[, colSums(abundances) > 0]
abundances.log <- zCompositions::cmultRepl(abundances, label = 0, method = "CZM",z.delete = FALSE) |>
log_fun() |>
as.matrix()
abundances.log.nz <- abundances.log
abundances.log.nz[abundances == 0] <- NaN
geo_means <- rowMeans(abundances.log.nz, na.rm = TRUE)
transformed <- abundances.log - geo_means
ps <- phyloseq::phyloseq(
phyloseq::otu_table(t(transformed), taxa_are_rows = TRUE),
phyloseq::sample_data(pseq),
phyloseq::phy_tree(pseq, errorIfNULL = FALSE),
phyloseq::tax_table(pseq)
)
return(ps)
}