Stars
TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control
An open-source implementation for training LLaVA-NeXT.
VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)
Accelerating the development of large multimodal models (LMMs) with lmms-eval
Evaluation code and datasets for the ACL 2024 paper, VISTA: Visualized Text Embedding for Universal Multi-Modal Retrieval. The original code and model can be accessed at FlagEmbedding.
Long Context Transfer from Language to Vision
The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models".
🔥🔥MLVU: Multi-task Long Video Understanding Benchmark
Retrieval and Retrieval-augmented LLMs
✨✨Latest Advances on Multimodal Large Language Models
[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding
Awesome papers & datasets specifically focused on long-term videos.
Official code of SmartEdit [CVPR-2024 Highlight]
Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
The official project for the paper "Perceiving Ambiguity and Semantics without Recognition: An Efficient and Effective Ambiguous Scene Text Detector" (ACM MM 2023)
The official project of paper "Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual Text Processing"
Official code repo for "Editing Implicit Assumptions in Text-to-Image Diffusion Models"
A paper collection of recent diffusion models for text-image generation tasks, e.g., visual text generation, font generation, text removal, text image super resolution, text editing, handwritten ge…
CVPR and NeurIPS poster examples and templates. May we have in-person poster session soon!
State-of-the-Art Text Embeddings
Optical Character Recognition (OCR / HTR) using Transformers
This is some code for multilingual machine translation (English, Korean, Japanese, Arabic)
Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement - AAAI 2023
A simple method for handwritten text dataset generation, which is beneficial for handwritten text detection and segmentation
🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022