Stars
Qwen2-VL is the multimodal large language model series developed by the Qwen team, Alibaba Cloud.
GROOViST: A Metric for Grounding Objects in Visual Storytelling – EMNLP 2023
Official PyTorch implementation of Synthesizing Coherent Story with Auto-Regressive Latent Diffusion Models
This repository contains the code for SFT, RLHF, and DPO, designed for vision-based LLMs, including the LLaVA models and the LLaMA-3.2-vision models.
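Provides a practical interactive interface for LLMs such as GPT/GLM, specially optimized for reading, polishing, and writing papers. Modular design; supports custom shortcut buttons & function plugins; supports project analysis & self-explanation for Python, C++, and other projects; PDF/LaTeX paper translation & summarization; parallel querying of multiple LLMs; and local models such as chatglm3. Integrates Tongyi Qianwen (通义千问), deepseekcoder, iFlytek Spark (讯飞星火), ERNIE Bot (文心一言), llama2, rwkv, claude2, m…
Chinese NLP solutions (large models, data, models, training, inference)
My own summary of the experience gained over more than ten years of Qt development, together with Qt-related "secret manual" e-books. It will be updated and expanded continuously; everyone is welcome to leave comments to add content or make suggestions. Thank you! WeChat official account: Qt实战/Qt入门和进阶/Qt教程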
[CVPR 2024] Intelligent Grimm - Open-ended Visual Storytelling via Latent Diffusion Models
[ECCV 2024] Official code for "Long-CLIP: Unlocking the Long-Text Capability of CLIP"
[A toolbox for fun.] Transform Image into Unique Paragraph with ChatGPT, BLIP2, OFA, GRIT, Segment Anything, ControlNet.
🧑‍🏫 60+ Implementations/tutorials of deep learning papers with side-by-side notes 📝; including transformers (original, xl, switch, feedback, vit, ...), optimizers (adam, adabelief, sophia, ...), ga…
Bottom-up attention model for image captioning and VQA, based on Faster R-CNN and Visual Genome
A recurrent neural network for generating little stories about images
Code for the paper "Attractive Storyteller: Stylized Visual Storytelling with Unpaired Text"
Stable Diffusion web UI
Nightly release of ControlNet 1.1
Analyze unstructured data with Towhee, such as reverse image search, reverse video search, audio classification, question answering systems, molecular search, etc.
Image to prompt with BLIP and CLIP
Chinese version of GPT2 training code, using BERT tokenizer.