multimodal

Here are 942 public repositories matching this topic...

Mintplex-Labs / anything-llm

The all-in-one Desktop & Docker AI application with built-in RAG, AI agents, and more.

nodejs desktop-app webui ai-agents multimodal rag vector-database llm localai local-llm ollama llm-webui lmstudio llm-application agent-framework-javascript crewai llama3 custom-ai-agents

Updated Jan 14, 2025
JavaScript

jina-ai / serve

☁️ Build multimodal AI applications with cloud-native stack

Updated Dec 20, 2024
Python

haotian-liu / LLaVA

[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V level capabilities and beyond.

chatbot llama multimodal multi-modality gpt-4 foundation-models visual-language-learning chatgpt instruction-tuning vision-language-model llava llama2 llama-2

Updated Aug 12, 2024
Python

microsoft / unilm

Large-scale Self-supervised Pre-training Across Tasks, Languages, and Modalities

Updated Jan 7, 2025
Python

NVIDIA / NeMo

A scalable generative AI framework built for researchers and developers working on Large Language Models, Multimodal, and Speech AI (Automatic Speech Recognition and Text-to-Speech)

machine-translation tts speech-synthesis neural-networks deeplearning speaker-recognition asr multimodal speech-translation large-language-models speaker-diariazation generative-ai

Updated Jan 15, 2025
Python

mediar-ai / screenpipe

Library and platform to build, distribute, and monetize AI apps that have full context (like Rewind, Granola, etc.). Open source, 100% local, developer-friendly. 24/7 screen, mic, and keyboard recording and control.

machine-learning ai computer-vision ml agi vision agents multimodal llm

Updated Jan 15, 2025
TypeScript

rerun-io / rerun

Visualize streams of multimodal data. Free, fast, easy to use, and simple to integrate. Built in Rust.

visualization python rust computer-vision cpp robotics multimodal

Updated Jan 15, 2025
Rust

bentoml / BentoML

The easiest way to serve AI apps and models - Build Model Inference APIs, Job queues, LLM apps, Multi-model pipelines, and more!

python machine-learning deep-learning model-serving multimodal mlops ml-engineering ai-inference llm generative-ai llmops llm-serving model-inference-service llm-inference inference-platform

Updated Jan 14, 2025
Python

enricoros / big-AGI

AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, and much more. Deploy on-prem or in the cloud.

ui beam agi openai gpt mistral multimodal groq openai-api gpt-4 large-language-models stable-diffusion generative-ai chatgpt chatgpt-ui gpt-5 anthropic

Updated Jan 14, 2025
TypeScript

SkalskiP / courses

This repository is a curated collection of links to various courses and resources about Artificial Intelligence (AI)

nlp machine-learning natural-language-processing tutorial deep-neural-networks computer-vision deep-learning transformers generative-model multimodal mlops stable-diffusion

Updated Apr 22, 2024
Python

facebookresearch / mmf

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

deep-learning dialog pytorch vqa pretrained-models captioning multimodal multi-tasking textvqa hateful-memes

Updated Nov 15, 2024
Python

swyxio / ai-notes

Notes for software engineers getting up to speed on new AI developments. Serves as a datastore for https://latent.space writing and product brainstorming, with cleaned-up canonical references under the /Resources folder.

ai openai gpt multimodal gpt-3 prompt-engineering stable-diffusion

Updated Jan 15, 2025
HTML

modelscope / ms-swift

Use PEFT or full-parameter training to fine-tune 400+ LLMs (Qwen2.5, Llama3.2, GLM4, Internlm2.5, Yi1.5, Mistral, Baichuan2, DeepSeek, ...) and 150+ MLLMs (Qwen2-VL, Qwen2-Audio, Llama3.2-Vision, Llava, InternVL2.5, MiniCPM-V-2.6, GLM4v, Xcomposer2.5, Yi-VL, DeepSeek-VL2, Phi3.5-Vision, GOT-OCR2, ...).

agent deploy llama lora liger peft multimodal sft dpo pre-training llm modelscope vllm qwen2 llama3 internvl qwen2-vl phi4

Updated Jan 15, 2025
Python

livekit / agents

Build real-time multimodal AI applications 🤖🎙️📹

real-time video ai voice agents voice-assistant multimodal

Updated Jan 15, 2025
Python

kyegomez / tree-of-thoughts

Plug-and-play implementation of Tree of Thoughts: Deliberate Problem Solving with Large Language Models, which elevates model reasoning by at least 70%

deep-learning prompt artificial-intelligence multimodal gpt4 prompt-learning prompt-tuning prompt-engineering chatgpt

Updated Oct 29, 2024
Python

kyegomez / swarms

The Enterprise-Grade, Production-Ready Multi-Agent Orchestration Framework. Join our community: https://discord.gg/jM3Z6M9uMq

Updated Jan 13, 2025
Python

IDEA-CCNL / Fengshenbang-LM

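Fengshenbang-LM (封神榜大模型) is an open-source large-model ecosystem led by the Cognitive Computing and Natural Language Research Center at the IDEA Research Institute, intended to serve as infrastructure for Chinese AIGC and cognitive intelligence.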

transformers pytorch chinese-nlp pretrained-models distributed-training multimodal aigc

Updated Aug 13, 2024
Python

luban-agi / Awesome-AIGC-Tutorials

Curated tutorials and resources for Large Language Models, AI Painting, and more.

nlp awesome ai deep-learning tutorials multimodal courses-resource aigc llm midjourney prompt-engineering stable-diffusion chatgpt

Updated Mar 31, 2024

TEN-framework / TEN-Agent

TEN Agent is a conversational AI powered by TEN, integrating Gemini 2.0 Live, OpenAI Realtime, RTC, and more. It delivers real-time capabilities to see, hear, and speak, while being fully compatible with popular workflow platforms like Dify and Coze.

Updated Jan 14, 2025
Python

jina-ai / discoart

🪩 Create Disco Diffusion artworks in one line

generative-art cross-modal diffusion prompts creative-ai creative-art multimodal clip-guided-diffusion dalle disco-diffusion midjourney imgen discodiffusion latent-diffusion stable-diffusion

Updated May 16, 2023
Python
