Dalian University of Technology
Dalian, Liaoning, China
Stars
OmniTokenizer: one model and one weight for image-video joint tokenization.
An elegant \LaTeX\ résumé template. Mainland China mirror: https://gods.coding.net/p/resume/git
SeqTrackv2: Unified Sequence-to-Sequence Learning for Single- and Multi-Modal Visual Object Tracking
MambaOut: Do We Really Need Mamba for Vision?
An efficient pure-PyTorch implementation of Kolmogorov-Arnold Network (KAN).
Llama Chinese community. Llama3 online demo and fine-tuned models are now available, with the latest Llama3 learning resources compiled in real time. All code has been updated for Llama3, aiming to build the best Chinese Llama LLM — fully open source and commercially usable.
[MICCAI 2024] Official Code for "MambaMIL: Enhancing Long Sequence Modeling with Sequence Reordering in Computational Pathology"
(ICML 2024) Spider: A Unified Framework for Context-dependent Concept Segmentation
[IJCAI-24] Spatial-Temporal-Decoupled Masked Pre-training for Spatiotemporal Forecasting
YOLOv5 🚀 in PyTorch > ONNX > CoreML > TFLite
A simple and efficient Mamba implementation in pure PyTorch and MLX.
(CVPR2023/TPAMI2024) Integrally Pre-Trained Transformer Pyramid Networks -- A Hierarchical Vision Transformer for Masked Image Modeling
ViT Prisma is a mechanistic interpretability library for Vision Transformers (ViTs).
Collection of AWESOME vision-language models for vision tasks
[NeurIPS 2024] VastTrack: Vast Category Visual Object Tracking
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
PyTorch code and models for the DINOv2 self-supervised learning method.
[CVPR 2024 Oral] InternVL Family: A Pioneering Open-Source Alternative to GPT-4o. An open-source multimodal dialogue model approaching GPT-4o performance.
Visual Object Tracking
MiniCPM3-4B: An edge-side LLM that surpasses GPT-3.5-Turbo.
The simplest, fastest repository for training/finetuning medium-sized GPTs.
[ICML 2024] Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
This repository provides the code and model checkpoints of the research paper: Scalable Pre-training of Large Autoregressive Image Models