Magnificent app which corrects your previous console command.

Python 84,942 3,428 Updated Jul 19, 2024

Grok open release

Python 49,458 8,324 Updated Aug 30, 2024

Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in PyTorch

Python 460 27 Updated Aug 15, 2024
Python 7,095 549 Updated Aug 12, 2024

Library for fast text representation and classification.

HTML 25,870 4,711 Updated Mar 22, 2024

PyTorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets

HTML 296 33 Updated Dec 26, 2023

All-in-one text de-duplication

Python 593 69 Updated May 21, 2024

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Python 1,971 138 Updated Oct 3, 2024

Running large language models on a single GPU for throughput-oriented scenarios.

Python 9,153 543 Updated Sep 27, 2024
Python 2,492 304 Updated May 19, 2024

pix2tex: Using a ViT to convert images of equations into LaTeX code.

Python 12,141 998 Updated Jul 5, 2024

Incredibly fast Whisper-large-v3

Jupyter Notebook 1,839 103 Updated Feb 16, 2024

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Python 11,755 1,242 Updated Aug 21, 2024

Mamba SSM architecture

Python 12,742 1,074 Updated Sep 26, 2024

Manipulate audio with a simple and easy high level interface

Python 8,833 1,038 Updated Jul 25, 2024

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Jupyter Notebook 6,030 758 Updated Sep 26, 2024

Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"

Python 283 23 Updated Dec 20, 2023

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

Python 2,122 210 Updated Sep 26, 2024

DeepSeek Coder: Let the Code Write Itself

Python 6,619 460 Updated May 21, 2024

Large-scale, Informative, and Diverse Multi-round Chat Data (and Models)

Python 2,223 114 Updated Mar 13, 2024

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Python 7,711 454 Updated May 3, 2024

State-of-the-Art Text Embeddings

Python 14,978 2,444 Updated Sep 30, 2024

WanJuan 1.0 multimodal corpus

537 26 Updated Oct 20, 2023

Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting…

Jupyter Notebook 12,013 1,841 Updated Oct 4, 2024

Implementation of Nougat Neural Optical Understanding for Academic Documents

Python 8,834 559 Updated Apr 16, 2024

Making Reddit data accessible to researchers, moderators and everyone else. Interact with the data through large dumps, an API or web interface.

TypeScript 252 19 Updated Oct 5, 2024

A high-throughput and memory-efficient inference and serving engine for LLMs

Python 27,869 4,110 Updated Oct 5, 2024

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

HTML 8,665 707 Updated Oct 5, 2024

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Python 16,014 1,570 Updated Oct 3, 2024

Inference code for CodeLlama models

Python 15,927 1,850 Updated Aug 12, 2024