Add neighborhood diversity calculation and plotting #138

Open
wants to merge 2 commits into
base: devel

dtm2451 (Owner) commented Dec 8, 2023

This PR adds new functionality and plotters to go with it:

  • calculation of diversity within cells' nearest neighbors

Motivation: Plotting such data can be VERY useful for batch effect assessment.

The goal is to create 3 primary functions:

  1. calcNeighborMetadataDiversity(): to perform the diversity calculations
  2. dittoNeighborDiversityPlot(): to automate plotting these data using dittoDimPlot()
  3. dittoNeighborDiversityHex(): to automate plotting these data using dittoDimHex()
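To make the idea of "diversity within cells' nearest neighbors" concrete, here is a minimal base-R sketch. It is not the PR's calcNeighborMetadataDiversity(): the function name neighbor_diversity_sketch, the brute-force neighbor search, and the choice of scaled Shannon entropy as the diversity measure are all assumptions made for illustration.

```r
# Hypothetical helper, not the PR's calcNeighborMetadataDiversity().
# embedding: numeric matrix, one row per cell (e.g. PCA coordinates).
# labels:    vector with one metadata value per cell (e.g. batch).
# k:         number of nearest neighbors to inspect per cell.
neighbor_diversity_sketch <- function(embedding, labels, k) {
  stopifnot(nrow(embedding) == length(labels), k >= 1, k < nrow(embedding))
  labels <- factor(labels)
  # Brute-force pairwise Euclidean distances; fine for a small demo only.
  d <- as.matrix(dist(embedding))
  vapply(seq_len(nrow(d)), function(i) {
    # With a single label level there is nothing to mix.
    if (nlevels(labels) < 2) return(0)
    # Drop the cell itself by index, then keep its k closest neighbors.
    others <- seq_len(nrow(d))[-i]
    nn <- others[order(d[i, others])[seq_len(k)]]
    p <- table(labels[nn]) / k
    p <- p[p > 0]
    # Shannon entropy (an assumed choice), scaled to [0, 1].
    -sum(p * log(p)) / log(nlevels(labels))
  }, numeric(1))
}

# Toy data: two well-separated batches, so every neighborhood
# should contain a single batch and score zero.
set.seed(1)
emb <- rbind(matrix(rnorm(40, mean = 0), ncol = 2),
             matrix(rnorm(40, mean = 50), ncol = 2))
batch <- rep(c("A", "B"), each = 20)
div <- neighbor_diversity_sketch(emb, batch, k = 5)
```

Under these assumptions, scores near 0 mark neighborhoods dominated by one batch (a possible batch effect), while scores near 1 mark well-mixed neighborhoods.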

Steps:

  • create calcNeighborMetadataDiversity()
    • initial documentation
    • tests
  • create dittoNeighborDiversityPlot()
    • initial documentation
    • tests
  • create .default_neighbors() for backend (auto-determination of the Neighbors data to use without the user giving it)
    • Establish all default neighbors graph names of Seurat
  • create dittoNeighborDiversityHex()
    • initial documentation
    • tests
  • Investigate methodologies for SCEs to ensure not creating too much extra work for SCE users
  • Finalize documentation
  • Finalize tests
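The .default_neighbors() step could work along the lines of the sketch below. This is not the PR's implementation: pick_default_neighbors is a hypothetical name, and the candidate names are an assumption based on Seurat's assay-prefixed naming for the RNA assay; the actual list is what the "Establish all default neighbors graph names" step is meant to settle.

```r
# Hypothetical sketch of auto-selecting neighbors data by name.
# available:  names of neighbor/graph data present on the object.
# candidates: names to try, in priority order (assumed, not confirmed by the PR).
pick_default_neighbors <- function(available,
                                   candidates = c("RNA.nn", "RNA_nn", "RNA_snn")) {
  hit <- candidates[candidates %in% available]
  if (length(hit) == 0) {
    stop("No default neighbors data found; please name one explicitly.")
  }
  # The first match in priority order wins.
  hit[1]
}

chosen <- pick_default_neighbors(c("RNA_nn", "RNA_snn"))
```

Failing loudly when nothing matches keeps the user from silently getting results computed on unexpected neighbors data.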

dtm2451 (Owner, Author) commented Dec 8, 2023

This work is underway, and function and input names are all subject to change until it is merged into devel. But as this also seems to be a major hole in the field, anyone wishing to use or test this functionality early can install from this branch with remotes::install_github("dtm2451/dittoSeq@neighbor-diversity"), then restart R if dittoSeq was already loaded.

dtm2451 (Owner, Author) commented Dec 21, 2023

One thing to note, which I'll aim to work into the documentation: the algorithm I'm basing the calcNeighborMetadataDiversity() calculation on used a NearestNeighbors calculation run specifically for this purpose, with k=sqrt(ncells). That many neighbors may be overkill for gigantic datasets, but the important thing is that using only the 20-or-so neighbors regularly recorded after a "standard" Seurat algorithm run 1) cannot yield results that are quite as accurate, and 2) negates some of the utility of a quantile cutoff on neighbor distances, due to the relative inflation of such distances.

So:

  • adjust documentation to recommend running the neighbors calculation with a higher number of neighbors retained
  • adjust defaulting to 1) warn and 2) not use the distance quantile trimming unless the number of neighbors in neighbors is at least 30
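The two adjustments above can be sketched in base R as follows. Everything here is illustrative: suggest_k and trim_neighbors_by_distance are hypothetical names, the 0.9 quantile and the rounding of sqrt(ncells) are assumed values, and applying one global cutoff (rather than a per-cell one) is also an assumption.

```r
# Hypothetical helpers; names, the 0.9 quantile, and the rounding are assumptions.
suggest_k <- function(ncells) ceiling(sqrt(ncells))

# dists: cells x k matrix of neighbor distances (one row per cell).
# Returns a logical matrix marking which neighbors to keep.
trim_neighbors_by_distance <- function(dists, quantile_cutoff = 0.9, min_k = 30) {
  k <- ncol(dists)
  if (k < min_k) {
    # Too few neighbors recorded: warn and keep them all.
    warning("Only ", k, " neighbors per cell; skipping distance-quantile trimming.")
    return(matrix(TRUE, nrow(dists), k))
  }
  # Keep neighbors whose distance is at or below the global quantile cutoff.
  dists <= quantile(dists, probs = quantile_cutoff)
}

suggest_k(10000)  # 100
```

With a 10,000-cell dataset this suggests retaining 100 neighbors per cell, well above the 20 or so kept by a standard run and above the threshold of 30 at which trimming is enabled.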
