Starred repositories of huoyijie


The official repo for [CVPR'23] "DeepSolo: Let Transformer Decoder with Explicit Points Solo for Text Spotting" & [ArXiv'23] "DeepSolo++: Let Transformer Decoder with Explicit Points Solo for Multi…

Python · 247 stars · 35 forks · Updated Aug 9, 2024

A curated list of resources dedicated to table recognition

369 stars · 50 forks · Updated Jan 28, 2024

A Curated List of Awesome Table Structure Recognition (TSR) Research, including models, papers, datasets and code. Continuously updating.

133 stars · 6 forks · Updated Sep 9, 2024

Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.

Python · 2,861 stars · 170 forks · Updated Oct 4, 2024

A collection of original, innovative ideas and algorithms towards Advanced Literate Machinery. This project is maintained by the OCR Team in the Language Technology Lab, Tongyi Lab, Alibaba Group.

C++ · 1,450 stars · 173 forks · Updated Sep 30, 2024

Algorithms, papers, datasets, performance comparisons for Document AI. Continuously updating.

159 stars · 3 forks · Updated Aug 29, 2024

A curated list of papers about key information extraction.

78 stars · 7 forks · Updated Aug 14, 2024

Real-Time and Accurate Full-Body Multi-Person Pose Estimation & Tracking System

Python · 8,010 stars · 1,970 forks · Updated May 13, 2024

OpenPose: Real-time multi-person keypoint detection library for body, face, hands, and foot estimation

C++ · 31,145 stars · 7,856 forks · Updated Aug 3, 2024

tiktoken is a fast BPE tokeniser for use with OpenAI's models.

Python · 12,278 stars · 831 forks · Updated Oct 3, 2024

Use PEFT or Full-parameter to finetune 400+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-V…

Python · 4,057 stars · 358 forks · Updated Nov 1, 2024

Detectron2 is a platform for object detection, segmentation and other visual recognition tasks.

Python · 30,400 stars · 7,468 forks · Updated Oct 14, 2024

docTR (Document Text Recognition) - a seamless, high-performing & accessible library for OCR-related tasks powered by Deep Learning.

Python · 3,791 stars · 434 forks · Updated Oct 29, 2024

Vary-tiny codebase built on LAVIS (for training from scratch) and a PDF image-text pair dataset (about 600k pairs, covering English and Chinese)

Python · 66 stars · 4 forks · Updated Sep 21, 2024

Data processing with ML and LLM

Python · 3,593 stars · 373 forks · Updated Oct 24, 2024

Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.

Shell · 9,210 stars · 571 forks · Updated Oct 30, 2024

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Python · 1 star · Updated Jun 27, 2024

The official repo of Qwen (通义千问) chat & pretrained large language model proposed by Alibaba Cloud.

Python · 13,880 stars · 1,127 forks · Updated Sep 24, 2024

FunCodec is a research-oriented toolkit for audio quantization and downstream applications, such as text-to-speech synthesis and music generation.

Python · 1 star · Updated Jul 16, 2024

Retinaface gets 80.99% on the WIDER FACE hard validation set using mobilenet0.25.

Python · 2,619 stars · 769 forks · Updated Jun 28, 2023

Sequence modeling benchmarks and temporal convolutional networks

Python · 4,162 stars · 876 forks · Updated Mar 28, 2022

Pretrain and finetune ANY AI model of ANY size on multiple GPUs or TPUs with zero code changes.

Python · 28,299 stars · 3,378 forks · Updated Oct 31, 2024

Vector (and Scalar) Quantization, in PyTorch

Python · 2,566 stars · 205 forks · Updated Oct 23, 2024

🤗 Transformers: State-of-the-art Machine Learning for PyTorch, TensorFlow, and JAX.

Python · 134,277 stars · 26,847 forks · Updated Oct 31, 2024

A multi-voice TTS system trained with an emphasis on quality

Jupyter Notebook · 13,145 stars · 1,814 forks · Updated Aug 19, 2024

Implementation of Voicebox, the new SOTA text-to-speech network from MetaAI, in PyTorch

Python · 608 stars · 51 forks · Updated Oct 1, 2024

PyTorch implementation of VALL-E (zero-shot text-to-speech). Reproduced demo: https://lifeiteng.github.io/valle/index.html

Python · 2,037 stars · 320 forks · Updated Nov 14, 2023

State-of-the-art deep learning based audio codec supporting both mono 24 kHz audio and stereo 48 kHz audio.

Python · 3,491 stars · 305 forks · Updated Jan 4, 2024

A Fundamental End-to-End Speech Recognition Toolkit and Open Source SOTA Pretrained Models, supporting speech recognition, voice activity detection, text post-processing, etc.

Python · 6,761 stars · 717 forks · Updated Nov 1, 2024