vlm

Here are 183 public repositories matching this topic...

sgl-project / sglang

SGLang is a fast serving framework for large language models and vision language models.

cuda inference pytorch transformer moe llama vlm llm llm-serving llava llama2 deepseek-llm deepseek llama3 llama3-1 deepseek-v3

Updated Jan 11, 2025
Python

NexaAI / nexa-sdk

Nexa SDK is a comprehensive toolkit for supporting GGML and ONNX models. It supports text generation, image generation, vision-language models (VLM), audio language models, automatic speech recognition (ASR), and text-to-speech (TTS).

audio sdk transformers tts language-model whisper asr vlm sdk-python edge-computing on-device-ml on-device-ai llm stable-diffusion

Updated Jan 8, 2025
Python

CVHub520 / X-AnyLabeling

Effortless data labeling with AI support from Segment Anything and other awesome models.

Updated Jan 6, 2025
Python

BAAI-Agents / Cradle

The Cradle framework is a first attempt at General Computer Control (GCC). Cradle supports agents to ace any computer task by enabling strong reasoning abilities, self-improvement, and skill curation, in a standardized general environment with minimal requirements.

ai gcc multimodality vlm cradle computer-control lmm grounding ai-agent large-language-models llm generative-ai vision-language-model ai-agents-framework general-computer-control personoid foundation-agent

Updated Nov 7, 2024
Python

QiuYannnn / Local-File-Organizer

An AI-powered file management tool that ensures privacy by organizing local texts and images. Using Llama3.2 3B and Llava v1.6 models with the Nexa SDK, it intuitively scans, restructures, and organizes files for quick, seamless access and easy retrieval.

vlm file-organizer on-device-ai llm llama3

Updated Oct 21, 2024
Python

xlang-ai / OSWorld

[NeurIPS 2024] OSWorld: Benchmarking Multimodal Agents for Open-Ended Tasks in Real Computer Environments

agent cli benchmark natural-language-processing gui reinforcement-learning artificial-intelligence code-generation language-model vlm rpa multimodal llm large-action-model

Updated Dec 20, 2024
Python

om-ai-lab / OmAgent

Build multimodal language agents for fast prototyping and production

Updated Jan 10, 2025
Python

coderonion / awesome-yolo-object-detection

🚀🚀🚀 A collection of some awesome public YOLO object detection series projects and the related object detection datasets.

Updated Jan 8, 2025

heshengtao / comfyui_LLM_party

LLM Agent Framework in ComfyUI includes MCP server, Omost, GPT-sovits, ChatTTS, GOT-OCR2.0, and FLUX prompt nodes, access to Feishu and Discord, and adapts to all LLMs with similar openai / aisuite interfaces, such as o1, ollama, gemini, grok, qwen, GLM, deepseek, kimi, doubao. Adapted to local LLMs, VLMs, and GGUF models such as llama-3.3. Linkage graphRAG / RAG

Updated Jan 11, 2025
Python

ThuCCSLab / Awesome-LM-SSP

A reading list for large model safety, security, and privacy (including Awesome LLM Security, Safety, etc.).

nlp security privacy jailbreak safety awesome-list language-model vlm adversarial-attacks diffusion-models llm

Updated Jan 10, 2025

BAAI-DCAI / Bunny

A family of lightweight multimodal models.

english chinese vlm gpt-4 chatgpt mllm multimodal-large-language-models

Updated Nov 18, 2024
Python

peterdsharpe / AeroSandbox

Aircraft design optimization made fast through computational graph transformations (e.g., automatic differentiation). Composable analysis tools for aerodynamics, propulsion, structures, trajectory design, and much more.

python analysis simulation optimization aerospace automatic-differentiation airplane cfd aircraft aerodynamics vlm xfoil aerospace-engineering aircraft-design mdo mdao aerodynamic-analysis 3d-panel

Updated Jan 9, 2025
Jupyter Notebook

zubair-irshad / Awesome-Robotics-3D

A curated list of 3D vision papers relating to the robotics domain in the era of large models (i.e., LLMs/VLMs), inspired by awesome-computer-vision, including papers, code, and related websites

computer-vision robotics navigation benchmarks simulations manipulation scene-graph grasping nerf 3d pointclouds vlm diffusion-models pretraining policy-learning foundation-models llm vision-language-model gaussian-splatting

Updated Nov 4, 2024

coderonion / awesome-llm-and-aigc

🚀🚀🚀 A collection of some awesome public projects about Large Language Model (LLM), Vision Language Model (VLM), Vision Language Action (VLA), AI Generated Content (AIGC), and the related datasets and applications.

Updated Jan 8, 2025

gokayfem / awesome-vlm-architectures

Famous Vision Language Models and Their Architectures

awesome awesome-list kosmos clip image-encoder vlm blip multimodal text-encoder vision-language-model llava internlm cogvlm qwen-vl

Updated Sep 8, 2024
Markdown

THUDM / CogAgent

An open-sourced end-to-end VLM-based GUI Agent

agent glm vlm computer-use gui-agent

Updated Jan 6, 2025
Python

mbzuai-oryx / GeoChat

[CVPR 2024 🔥] GeoChat, the first grounded Large Vision Language Model for Remote Sensing

remote-sensing vlm

Updated Nov 28, 2024
Python

gokayfem / ComfyUI_VLM_nodes

Custom ComfyUI nodes for Vision Language Models, Large Language Models, Image to Music, Text to Music, Consistent and Random Creative Prompt Generation

image-captioning nodes vlm custom-nodes img2text llm mllm llava comfyui siglip phi15 joytag img2sfx

Updated Nov 6, 2024
Python

yueliu1999 / Awesome-Jailbreak-on-LLMs

Awesome-Jailbreak-on-LLMs is a collection of state-of-the-art, novel, exciting jailbreak methods on LLMs. It contains papers, codes, datasets, evaluations, and analyses.

security privacy ai jailbreak safety vlm llm llms vlms

Updated Jan 6, 2025

niuzaisheng / ScreenAgent

ScreenAgent: A Computer Control Agent Driven by Visual Language Large Model (IJCAI-24)

agent ai vlm llm

Updated Nov 25, 2024
Python
