Shanghai Jiao Tong University
Shanghai, China
https://drsy.github.io/
Stars
Inpaint anything using Segment Anything and inpainting models.
📖 Full Stack Practice of the Large Language Model Training @ RLChina 2024
Use PEFT or Full-parameter to finetune 350+ LLMs or 100+ MLLMs. (LLM: Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, Gemma2, ...; MLLM: Qwen2-VL, Qwen2-Audio, Llama3.2-V…
Extensible, parallel implementations of t-SNE
A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems
Official repo for "VideoScore: Building Automatic Metrics to Simulate Fine-grained Human Feedback for Video Generation" [EMNLP 2024]
Vchitect-2.0: Parallel Transformer for Scaling Up Video Diffusion Models
Qwen2-VL is the multimodal large language model series developed by Qwen team, Alibaba Cloud.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
Together Mixture-Of-Agents (MoA) – 65.1% on AlpacaEval with OSS models
Video+code lecture on building nanoGPT from scratch
Arena-Hard-Auto: An automatic LLM benchmark.
Qwen2.5 is the large language model series developed by Qwen team, Alibaba Cloud.
Gemma 2B with 10M context length using Infini-attention.
Benchmarking LLMs with Challenging Tasks from Real Users
Simple macOS menu bar application to view and interact with reminders. Developed with SwiftUI and using Apple Reminders as a source.
A work in progress. Trying to write about all interesting or necessary pieces in the current development of LLMs and generative AI. Gradually adding more topics.
llama3 implementation one matrix multiplication at a time
GEAR: An Efficient KV Cache Compression Recipe for Near-Lossless Generative Inference of LLMs
flash attention tutorial written in python, triton, cuda, cutlass
Advanced quantization algorithms for LLMs. This is the official implementation of "Optimize Weight Rounding via Signed Gradient Descent for the Quantization of LLMs"
DeepSeek-V2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model