- Shanghai, China
- https://huoyijie.cn
- https://www.zhihu.com/people/huoyijie
Starred repositories
The official repo for [CVPR'23] "DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting" & [ArXiv'23] "DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multi…
A curated list of resources dedicated to table recognition
A Curated List of Awesome Table Structure Recognition (TSR) Research. Including models, papers, datasets and codes. Continuously updating.
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.
Algorithms, papers, datasets, performance comparisons for Document AI. Continuously updating.
A curated list of papers about key information extraction.
Real-Time and Accurate Full-Body Multi-Person Pose Estimation & Tracking System
OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation
tiktoken is a fast BPE tokeniser for use with OpenAI's models.
Use PEFT or full-parameter training to fine-tune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-V…
Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.
docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.
Vary-tiny codebase built on LAVIS (for training from scratch) and PDF image-text pair data (about 600k pairs, in English and Chinese)
Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.
huoyijie / Qwen
Forked from QwenLM/Qwen
The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.
huoyijie / FunCodec
Forked from modelscope/FunCodec
FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis, music generation, etc.
RetinaFace gets 80.99% on the WIDER FACE hard validation set using mobilenet0.25.
Sequence modeling benchmarks and temporal convolutional networks
Pretrain and fine-tune ANY AI model of ANY size on multiple GPUs or TPUs with zero code changes.
Vector (and Scalar) Quantization, in PyTorch
🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.
A multi-voice TTS system trained with an emphasis on quality
Implementation of Voicebox, the new SOTA text-to-speech network from Meta AI, in PyTorch
PyTorch implementation of VALL-E(Zero-Shot Text-To-Speech), Reproduced Demo https://lifeiteng.github.io/valle/index.html
State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.
A fundamental end-to-end speech recognition toolkit and open-source SOTA pretrained models, supporting speech recognition, voice activity detection, text post-processing, etc.