Payne-Wang/GPTCelltype

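Note: use a highly capable model, at least at the level of claude-3-5-sonnet-20240620 or gpt-4o. Weaker models often fail to follow the prompt and do not return output in the expected pattern. The regex-based check then fails and the code falls into an infinite loop. The part that is prone to looping forever is the following: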

allres <- sapply(1:cutnum, function(i) {
  id <- which(cid == i)
  flag <- 0
  while (flag == 0) {
    k <- openai::create_chat_completion(
      model = model, base_url = base_url,
      message = list(list("role" = "user", "content" = paste0('Identify cell types of ', tissuename, ' cells using the following markers separately for each\n row. Only provide the cell type name. Do not show numbers before the name.\n Some can be a mixture of multiple cell types.\n', paste(input[id], collapse = '\n'))))
    )
    # Note this line: the original code split the reply on a single newline only.
    # Here one or more consecutive newlines are matched, because models with weaker
    # instruction-following may put several newlines (blank lines) between answers.
    res <- strsplit(k$choices[, 'message.content'], '(\n){1,}')[[1]]
    print(res)
    # Retry until the model returns exactly one line per cluster
    if (length(res) == length(id))
      flag <- 1
  }
  names(res) <- names(input)[id]
  res
}, simplify = FALSE)
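Because the `while (flag == 0)` loop above only exits when the number of returned lines equals the number of clusters, a model that keeps returning a different count will retry forever. Below is a minimal sketch of a bounded alternative. It assumes a hypothetical `query_fn` that stands in for the `openai::create_chat_completion()` call and returns the reply text as a single string; `parse_reply()`, `annotate_with_retry()` and `max_tries` are hypothetical names, not part of GPTCelltype. The extra `trimws()` and empty-string filtering are also assumptions that go beyond what the fork does.

```r
# Hypothetical helper: split a model reply into one label per row.
# '\n+' is equivalent to the '(\n){1,}' pattern above: runs of blank lines
# collapse into a single separator.
parse_reply <- function(text) {
  res <- strsplit(text, "\n+")[[1]]
  res <- trimws(res)
  res[nzchar(res)]  # drop empty entries, e.g. from a leading newline
}

# Hypothetical bounded retry: call query_fn() until the reply has exactly
# n_expected lines, but give up after max_tries attempts instead of looping forever.
annotate_with_retry <- function(query_fn, n_expected, max_tries = 5) {
  for (attempt in seq_len(max_tries)) {
    res <- parse_reply(query_fn())
    if (length(res) == n_expected) return(res)
    message(sprintf("Attempt %d: expected %d lines, got %d; retrying.",
                    attempt, n_expected, length(res)))
  }
  stop(sprintf("No reply with %d lines after %d attempts.", n_expected, max_tries))
}

# With a single-newline pattern, a blank line produces an empty entry:
strsplit("T cells\n\nB cells", "\n")[[1]]  # c("T cells", "", "B cells")

# Usage with a fake model that first answers badly, then correctly.
replies <- c("T cells", "T cells\n\nB cells\n\nNK cells\n")
i <- 0
fake_query <- function() { i <<- i + 1; replies[min(i, length(replies))] }
labels <- annotate_with_retry(fake_query, n_expected = 3)
print(labels)
```

Raising an error after `max_tries` keeps a misbehaving model from hanging an R session. The caller can then switch to a stronger model or rerun the annotation for that batch of clusters.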

GPTCelltype: Automatic cell type annotation with GPT-4

Installation

To install the latest version of the GPTCelltype package from GitHub, run the following commands in R:

install.packages("openai")
if (!requireNamespace("remotes", quietly = TRUE)) install.packages("remotes")
remotes::install_github("Winnie09/GPTCelltype")

🚀 Quick start with Seurat pipeline


# IMPORTANT! Assign your OpenAI API key. See Vignette for details
Sys.setenv(OPENAI_API_KEY = 'your_openai_API_key')

# Load packages (Seurat provides Idents() and DimPlot(), used below)
library(GPTCelltype)
library(openai)
library(Seurat)

# Assume you have already run the Seurat pipeline https://satijalab.org/seurat/
# "obj" is the Seurat object; "markers" is the output from FindAllMarkers(obj)
# Cell type annotation by GPT-4
res <- gptcelltype(markers, model = 'gpt-4')

# Assign cell type annotation back to Seurat object
obj@meta.data$celltype <- as.factor(res[as.character(Idents(obj))])

# Visualize cell type annotation on UMAP
DimPlot(obj,group.by='celltype')
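The `as.character()` call in the assignment step matters. When you index a named vector with a factor, R uses the factor's integer codes, not its labels. Here is a small self-contained illustration. The toy vectors stand in for `res` and `Idents(obj)`, so no Seurat object is needed, and the cluster IDs and labels are made up:

```r
# Toy stand-ins: 'res' mimics gptcelltype() output (names = cluster IDs),
# 'idents' mimics Idents(obj), a factor of per-cell cluster IDs.
res <- c("1" = "T cells", "2" = "B cells", "3" = "NK cells")
idents <- factor(c("3", "1", "3"))  # levels are "1" and "3" only

# Wrong: a factor index is treated as its integer codes (2, 1, 2).
wrong <- res[idents]
# Right: look labels up by cluster name.
right <- res[as.character(idents)]

print(unname(wrong))  # "B cells" "T cells" "B cells"
print(unname(right))  # "NK cells" "T cells" "NK cells"
celltype <- as.factor(right)
```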

⚠️ Warning: avoid sharing your API key with others or uploading it to public spaces.

Vignette

You can view the complete vignette here.

Troubleshooting

GPTCelltype can be installed from GitHub in seconds. Users need R version 3.5.x or later, which can be downloaded here: http://www.r-project.org/.

Windows users also need to install Rtools, which can be downloaded here: https://cloud.r-project.org/bin/windows/Rtools/. For R version 3.5.x, Rtools35.exe is recommended. Use the default settings during installation.

macOS users who run into installation problems should try downloading and installing clang-8.0.0.pkg from the following URL: https://cloud.r-project.org/bin/macosx/tools/clang-8.0.0.pkg

For increased accuracy, you can pass the optional argument tissuename to gptcelltype, e.g. gptcelltype(markers, tissuename = 'your_tissue_name', model = 'gpt-4').
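The sketch below shows where `tissuename` ends up. It rebuilds the prompt text used in the loop quoted at the top of this page from a toy data frame shaped like `FindAllMarkers()` output (columns `cluster`, `gene`, `avg_log2FC`). Several details are assumptions for illustration: the step that keeps positively enriched genes and the top `topgenenumber` genes per cluster, the default value of 10, and the `build_prompt()` helper itself. Check the package source for the exact preprocessing.

```r
# Toy FindAllMarkers()-like table (made-up genes and values).
markers <- data.frame(
  cluster    = factor(c("0", "0", "0", "1", "1")),
  gene       = c("CD3D", "CD3E", "IL7R", "MS4A1", "CD79A"),
  avg_log2FC = c(2.1, 1.8, -0.5, 3.0, 2.4)
)

# Hypothetical helper: one comma-separated marker line per cluster, keeping up
# to 'topgenenumber' positively enriched genes, then the prompt text itself.
build_prompt <- function(markers, tissuename, topgenenumber = 10) {
  up <- markers[markers$avg_log2FC > 0, , drop = FALSE]
  input <- tapply(up$gene, up$cluster,
                  function(g) paste(head(g, topgenenumber), collapse = ","))
  prompt <- paste0("Identify cell types of ", tissuename,
                   " cells using the following markers separately for each\n row. ",
                   "Only provide the cell type name. Do not show numbers before the name.\n ",
                   "Some can be a mixture of multiple cell types.\n",
                   paste(input, collapse = "\n"))
  list(input = input, prompt = prompt)
}

out <- build_prompt(markers, tissuename = "human PBMC")
cat(out$prompt)
```

The tissue name is inserted into the first sentence of the prompt, so the model interprets each marker line in the context of that tissue.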

Introduction

Cell type annotation is an essential step in single-cell RNA-seq analysis. However, it is time-consuming and often requires expertise in collecting canonical marker genes and manually annotating cell types. Automated cell type annotation methods typically require high-quality reference datasets and the development of additional pipelines. We assessed the performance of GPT-4, a highly capable large language model, for cell type annotation, and demonstrated that it can automatically and accurately annotate cell types using marker gene information generated by standard single-cell RNA-seq analysis pipelines. Evaluated across hundreds of tissue types and cell types, GPT-4 generates cell type annotations that show strong concordance with manual annotations, and it has the potential to considerably reduce the effort and expertise needed for cell type annotation. We also developed GPTCelltype, an open-source R package that facilitates cell type annotation with GPT-4.

Citation

Hou, W. and Ji, Z., 2023. Assessing GPT-4 for cell type annotation in single-cell RNA-seq analysis. bioRxiv, pp.2023-04, doi: https://doi.org/10.1101/2023.04.16.537094.

Contact

Authors: Wenpin Hou (wh2526@cumc.columbia.edu), Zhicheng Ji (zhicheng.ji@duke.edu).

Report bugs and send suggestions by emailing the maintainer, Dr. Wenpin Hou (wh2526@cumc.columbia.edu), or by opening a new issue on this GitHub page.