tianxie-9

Tian Xie tianxie-9

Research Engineer @ Character.ai

17 followers · 32 following

Character.ai
Palo Alto, CA
14:45 (UTC -07:00)
https://www.linkedin.com/in/tian-xie-a13287128/

Stars

nvbn / thefuck

Magnificent app which corrects your previous console command.

Python · 84,942 stars · 3,428 forks · Updated Jul 19, 2024

xai-org / grok-1

Grok open release

Python · 49,458 stars · 8,324 forks · Updated Aug 30, 2024

lucidrains / ring-attention-pytorch

Implementation of 💍 Ring Attention, from Liu et al. at Berkeley AI, in Pytorch

Python · 460 stars · 27 forks · Updated Aug 15, 2024

LargeWorldModel / LWM

Python · 7,095 stars · 549 forks · Updated Aug 12, 2024

facebookresearch / fastText

Library for fast text representation and classification.

HTML · 25,870 stars · 4,711 forks · Updated Mar 22, 2024

sangmichaelxie / doremi

Pytorch implementation of DoReMi, a method for optimizing the data mixture weights in language modeling datasets

HTML · 296 stars · 33 forks · Updated Dec 26, 2023

ChenghaoMou / text-dedup

All-in-one text de-duplication

Python · 593 stars · 69 forks · Updated May 21, 2024

huggingface / datatrove

Freeing data processing from scripting madness by providing a set of platform-agnostic customizable pipeline processing blocks.

Python · 1,971 stars · 138 forks · Updated Oct 3, 2024

FMInference / FlexiGen

Running large language models on a single GPU for throughput-oriented scenarios.

Python · 9,153 stars · 543 forks · Updated Sep 27, 2024

openai / weak-to-strong

Python · 2,492 stars · 304 forks · Updated May 19, 2024

lukas-blecher / LaTeX-OCR

pix2tex: Using a ViT to convert images of equations into LaTeX code.

Python · 12,141 stars · 998 forks · Updated Jul 5, 2024

chenxwh / insanely-fast-whisper

Forked from Vaibhavs10/insanely-fast-whisper

Incredibly fast Whisper-large-v3

Jupyter Notebook · 1,839 stars · 103 forks · Updated Feb 16, 2024

m-bain / whisperX

WhisperX: Automatic Speech Recognition with Word-level Timestamps (& Diarization)

Python · 11,755 stars · 1,242 forks · Updated Aug 21, 2024

state-spaces / mamba

Mamba SSM architecture

Python · 12,742 stars · 1,074 forks · Updated Sep 26, 2024

jiaaro / pydub

Manipulate audio with a simple and easy high level interface

Python · 8,833 stars · 1,038 forks · Updated Jul 25, 2024

pyannote / pyannote-audio

Neural building blocks for speaker diarization: speech activity detection, speaker change detection, overlapped speech detection, speaker embedding

Jupyter Notebook · 6,030 stars · 758 forks · Updated Sep 26, 2024

lm-sys / llm-decontaminator

Code for the paper "Rethinking Benchmark and Contamination for Language Models with Rephrased Samples"

Python · 283 stars · 23 forks · Updated Dec 20, 2023

intel / intel-extension-for-transformers

⚡ Build your chatbot within minutes on your favorite device; offer SOTA compression techniques for LLMs; run LLMs efficiently on Intel Platforms⚡

Python · 2,122 stars · 210 forks · Updated Sep 26, 2024

deepseek-ai / DeepSeek-Coder

DeepSeek Coder: Let the Code Write Itself

Python · 6,619 stars · 460 forks · Updated May 21, 2024

thunlp / UltraChat

Large-scale, Informative, and Diverse Multi-round Chat Data (and Models)

Python · 2,223 stars · 114 forks · Updated Mar 13, 2024

jzhang38 / TinyLlama

The TinyLlama project is an open endeavor to pretrain a 1.1B Llama model on 3 trillion tokens.

Python · 7,711 stars · 454 forks · Updated May 3, 2024

UKPLab / sentence-transformers

State-of-the-Art Text Embeddings

Python · 14,978 stars · 2,444 forks · Updated Sep 30, 2024

opendatalab / WanJuan1.0

WanJuan 1.0 multimodal corpus

537 stars · 26 forks · Updated Oct 20, 2023

meta-llama / llama-recipes

Scripts for fine-tuning Meta Llama with composable FSDP & PEFT methods to cover single/multi-node GPUs. Supports default & custom datasets for applications such as summarization and Q&A. Supporting…

Jupyter Notebook · 12,013 stars · 1,841 forks · Updated Oct 4, 2024

facebookresearch / nougat

Implementation of Nougat Neural Optical Understanding for Academic Documents

Python · 8,834 stars · 559 forks · Updated Apr 16, 2024

ArthurHeitmann / arctic_shift

Making Reddit data accessible to researchers, moderators and everyone else. Interact with the data through large dumps, an API or web interface.

TypeScript · 252 stars · 19 forks · Updated Oct 5, 2024

vllm-project / vllm

A high-throughput and memory-efficient inference and serving engine for LLMs

Python · 27,869 stars · 4,110 forks · Updated Oct 5, 2024

Unstructured-IO / unstructured

Open source libraries and APIs to build custom preprocessing pipelines for labeling, training, or production machine learning pipelines.

HTML · 8,665 stars · 707 forks · Updated Oct 5, 2024

huggingface / peft

🤗 PEFT: State-of-the-art Parameter-Efficient Fine-Tuning.

Python · 16,014 stars · 1,570 forks · Updated Oct 3, 2024

meta-llama / codellama

Inference code for CodeLlama models

Python · 15,927 stars · 1,850 forks · Updated Aug 12, 2024