- Tongji University
- Shanghai, China
Starred repositories
OpenGVLab / OV-OAD
Forked from ZQSIAT/OV-OAD
This repo takes the initial step towards leveraging text learning for online action detection without explicit human supervision.
Progress-Aware Online Action Segmentation for Egocentric Procedural Task Videos
A deep metric learning approach for action segmentation
LongMIT: Essential Factors in Crafting Effective Long Context Multi-Hop Instruction Datasets
[NeurIPS 2024] SimPO: Simple Preference Optimization with a Reference-Free Reward
The official implementation of Self-Play Fine-Tuning (SPIN)
This is work done by the Oxen.ai Community, trying to reproduce the Self-Rewarding Language Model paper from MetaAI.
Implementation of the ICML 2024 paper "Training Large Language Models for Reasoning through Reverse Curriculum Reinforcement Learning" presented by Zhiheng Xi et al.
g1: Using Llama-3.1 70b on Groq to create o1-like reasoning chains
Implementation of the Quiet-STAR paper (https://arxiv.org/pdf/2403.09629.pdf)
Official PyTorch implementation of CODA-LM (https://arxiv.org/abs/2404.10595)
[ECCV 2024 Oral] DriveLM: Driving with Graph Visual Question Answering
A VideoQA dataset based on the videos from ActivityNet
✨✨Latest Advances on Multimodal Large Language Models
An open-source multimodal large language model that can hear and talk while thinking, featuring real-time end-to-end speech input and streaming audio output for conversation.
For the paper "Learning Discriminative Action Representations in Videos via Embedding Distance Correlation"
🔍 An LLM-based Multi-agent Framework of Web Search Engine (like Perplexity.ai Pro and SearchGPT)
PyTorch implementation of Depthwise Separable Convolution
Long context evaluation for large language models
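Free ChatGPT API Key: a free ChatGPT API with GPT4 API support (free), and a free forwarding API for ChatGPT that works within China over a direct connection, no proxy required. It can be used with software and plugins such as ChatBox, greatly reducing API usage costs. Chat freely and without restrictions from within China.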
This is the official code of VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding (ECCV 2024)
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.
Implementation of Depthwise Separable Convolution (pytorch)
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)