shuyansy
  • BAAI
  • China


TextCtrl: Diffusion-based Scene Text Editing with Prior Guidance Control

Python 3 1 Updated Oct 10, 2024

An open-source implementation for training LLaVA-NeXT.

Python 267 11 Updated Sep 26, 2024

VILA - a multi-image visual language model with training, inference and evaluation recipe, deployable from cloud to edge (Jetson Orin and laptops)

Python 1,888 150 Updated Sep 25, 2024

Accelerating the development of large multimodal models (LMMs) with lmms-eval

Python 1,464 125 Updated Oct 15, 2024

Evaluation code and datasets for the ACL 2024 paper, VISTA: Visualized Text Embedding for Universal Multi-Modal Retrieval. The original code and model can be accessed at FlagEmbedding.

Python 13 2 Updated Sep 4, 2024

Long Context Transfer from Language to Vision

Python 310 17 Updated Aug 26, 2024

The official repo for "SpatialBot: Precise Spatial Understanding with Vision Language Models".

Python 149 9 Updated Sep 19, 2024

🔥🔥MLVU: Multi-task Long Video Understanding Benchmark

Python 145 Updated Oct 12, 2024

Retrieval and Retrieval-augmented LLMs

Python 7,137 520 Updated Oct 10, 2024

✨✨Latest Advances on Multimodal Large Language Models

12,192 778 Updated Oct 9, 2024

[CVPR 2024] MovieChat: From Dense Token to Sparse Memory for Long Video Understanding

Python 506 41 Updated Sep 6, 2024

A family of lightweight multimodal models.

Python 905 68 Updated Sep 18, 2024

Awesome papers & datasets specifically focused on long-term videos.

178 7 Updated Oct 3, 2024

Python 161 7 Updated Jul 12, 2024

Official code of SmartEdit [CVPR-2024 Highlight]

Python 240 8 Updated Jun 21, 2024

Code for In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering

Python 133 6 Updated Jul 6, 2024

The official project for the paper "Perceiving Ambiguity and Semantics without Recognition: An Efficient and Effective Ambiguous Scene Text Detector" (ACM MM 2023)

Python 27 3 Updated Dec 3, 2023

The official project of paper "Visual Text Meets Low-level Vision: A Comprehensive Survey on Visual Text Processing"

44 1 Updated Oct 7, 2024

Official code repo for "Editing Implicit Assumptions in Text-to-Image Diffusion Models"

Python 80 2 Updated Mar 15, 2023

A paper collection of recent diffusion models for text-image generation tasks, e.g., visual text generation, font generation, text removal, text image super resolution, text editing, handwritten ge…

193 4 Updated Aug 2, 2024

CVPR and NeurIPS poster examples and templates. May we have in-person poster session soon!

1,456 136 Updated May 9, 2023

State-of-the-Art Text Embeddings

Python 15,051 2,451 Updated Oct 15, 2024

Optical Character Recognition (OCR / HTR) using Transformers

Python 10 1 Updated Aug 20, 2022

This is some code for multilingual machine translation (English, Korean, Japanese, Arabic)

Python 1 Updated Sep 7, 2023

Text-DIAE: A Self-Supervised Degradation Invariant Autoencoders for Text Recognition and Document Enhancement - AAAI 2023

Python 23 6 Updated Jul 12, 2023

This is a simple method for handwritten text dataset generation, which is beneficial for handwritten text detection and segmentation

Python 1 Updated Sep 7, 2023

🦙 LaMa Image Inpainting, Resolution-robust Large Mask Inpainting with Fourier Convolutions, WACV 2022

Jupyter Notebook 7,939 842 Updated Jul 26, 2024