Awesome-Vison2Audio

A curated list of Vision to Audio Generation

Paper List

2024

  • 2024 🎶 VMAs: Video-to-Music Generation via Semantic Alignment in Web Music Videos, ByteDance, 🌐 Demo
  • 2024 🎶 MM-LDM: Multi-Modal Latent Diffusion Model for Sounding Video Generation, Institute of Automation, MM'24
  • 2024 Jul. 🔉 FoleyCrafter: Bring Silent Videos to Life with Lifelike and Synchronized Sounds. Shanghai Artificial Intelligence Laboratory, Chinese University of Hong Kong, Shenzhen. 🌐 Demo 🔗 Code 🤗 HF Datasets (VGGSound, AVSync15)
  • 2024 Jul. 🔉 FRIEREN: Efficient Video-to-Audio Generation with Rectified Flow Matching. Zhejiang University. 🌐 Demo Datasets (VGGSound)
  • 2024 Jul. 🔉 Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity. Dolby Lab. ECCV'24. 🌐 Demo Datasets (VGGSound)
  • 2024 Jul. 🔉 Read, Watch and Scream! Sound Generation from Text and Video, NAVER. 🌐 Demo Datasets (VGGSound)
  • 2024 Jun. 🎶 VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling, HKUST, Microsoft Research Asia. 🔗 Code Datasets (V2M)
  • 2024 May. 🔉 Visual Echoes: A Simple Unified Transformer for Audio-Visual Generation, Sony. [image2audio]. 🌐 Demo Datasets (VGGSound)
  • 2024 Feb. 🔉 Seeing and Hearing: Open-domain Visual-Audio Generation with Diffusion Latent Aligners, HKUST, CVPR'24. 🌐 Demo 🔗 Code Datasets (VGGSound)
  • 2024 🎶 V2Meow: Meowing to the Visual Beat via Video-to-Music Generation, Google. AAAI'24. 🌐 Demo Dataset (MV100K)
  • 2024 🎶 Diff-BGM: A Diffusion Model for Video Background Music Generation, PKU, CVPR'24. 🔗 Code Datasets (BGM909)
  • 2024 🎶 MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models, Adobe, CVPR'24. 🔗 Code 🌐 Demo
  • 2024 🔉 From Vision to Audio and Beyond: A Unified Model for Audio-Visual Representation and Generation, University of Washington, ICML'24.
  • 2024 🔉 SonicVisionLM: Playing Sound with Vision Language Models, Shanghai University, CVPR'24 🌐 Demo
  • 2024 🎶 Video2Music: Suitable music generation from videos using an Affective Multimodal Transformer model, SUTD, Expert Systems with Applications'24. 🔗 Code. Datasets (MuVi-Sync)
  • 2024 🔉 V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models, Dolby, AAAI'24. 🌐 Demo
  • 2024 🎶 DanceComposer: Dance-to-Music Generation Using a Progressive Conditional Music Generator, Sun Yat-sen University, TMM'24
  • 2024 🔉 Video-Foley: Two-Stage Video-To-Sound Generation via Temporal Event Condition For Foley Sound, KAIST. 🌐 Demo
  • 2024 🔉 LoVA: Long-form Video-to-Audio Generation, RUC. Datasets (AudioSet, VGGSound, UnAV100)

2023

  • 2023 Aug. 🎶 Video Background Music Generation: Dataset, Method and Evaluation, Beihang University, ICCV'23. 🔗 Code Datasets (SymMV)
  • 2023 Jun. 🔉 Diff-Foley: Synchronized Video-to-Audio Synthesis with Latent Diffusion Models, Tsinghua University, NeurIPS'23. 🌐 Demo 🔗 Code Datasets (VGGSound, AudioSet)
  • 2023 Feb. 🎶 Discrete Contrastive Diffusion for Cross-Modal Music and Image Generation, Illinois Institute of Technology, ICLR'23. 🌐 Demo 🔗 Code Datasets (AIST++, Tiktok Dance-Music)
  • 2023 🔉 MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation, Renmin University of China, CVPR'23. 🔗 Code Datasets (Landscape, AIST++)
  • 2023 🔉 Conditional Generation of Audio from Video via Foley Analogies, University of Michigan, Adobe, CVPR'23 🌐 Demo 🔗 Code
  • 2023 🎶 Long-Term Rhythmic Video Soundtracker, Shanghai Artificial Intelligence Laboratory, ICML'23 🌐 Demo 🔗 Code
  • 2023 🔉 CMMD: Contrastive Multi-Modal Diffusion for Video-Audio Conditional Modeling, Microsoft

2022

  • 2022 Jul. 🎶 Quantized GAN for Complex Music Generation from Dance Videos, Illinois Institute of Technology, ECCV'22. 🌐 Demo 🔗 Code Datasets (AIST++, Tiktok Dance-Music)

2021

  • 2021 Nov. 🎶 Video Background Music Generation with Controllable Music Transformer, Beihang University, MM'21. 🌐 Demo 🔗 Code 🌟

2020

  • 2020 Jul. 🔉 Generating Visually Aligned Sound from Videos, South China University of Technology, TIP'20. 🌐 Demo 🔗 Code
  • 2020 Jul. 🎶 Foley Music: Learning to Generate Music from Videos, MIT.
  • 2020 Jun. 🔉 Audeo: Audio Generation for a Silent Performance Video, University of Washington.

2019

  • 2019 🎶 AIST Dance Video Database: Multi-Genre, Multi-Dancer, and Multi-Camera Database for Dance Information Processing, AIST, ISMIR'19.

2018

  • 2018 Jun. 🔉 Visual to Sound: Generating Natural Sound for Videos in the Wild, University of North Carolina, CVPR'18.
  • 2018 🔉 Visually Indicated Sound Generation by Perceptually Optimized Classification, University of Southern California, ECCV MULA workshop'18. 🔗 Code

2016

Survey

  • 2024 Aug. Foundation Models for Music: A Survey, Queen Mary University of London.
  • 2024 Jun. LLMs Meet Multimodal Generation and Editing: A Survey, HKUST. 🔗 Code.
  • 2023 Sep. Sparks of Large Audio Models: A Survey and Outlook, Queensland University of Technology.
  • 2022 Aug. Learning in Audio-visual Context: A Review, Analysis, and New Perspective, Renmin University of China.

Datasets

  • 🎶 V2M (Unpublished): VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling. (Movie trailer, 190K for training, 20K for finetuning, 300 for benchmarking).
  • 🔉 VGGSound: VGGSound: A Large-Scale Audio-Visual Dataset. ICASSP'20
  • Landscape
  • 🎶 Tiktok Dance Dataset
  • 🔉 AVSync15: Audio-synchronized visual animation.
  • 🎶 BGM909. Piano version music. Diff-BGM: A Diffusion Model for Video Background Music Generation
  • 🎶 MV100K: V2Meow
  • 🎶 MMTrailer: A Multimodal Trailer Video Dataset with Language and Music Descriptions.
  • 🎶 SymMV: Video Background Music Generation: Dataset, Method and Evaluation.
  • 🔉 VAS: Generating Visually Aligned Sound from Videos.
  • 🎶 AIST++. Dance-to-Music
  • 🎶 AIST. Dance-to-Music
  • 🎶 MuVi-Sync. Video2Music: Suitable Music Generation from Videos using an Affective Multimodal Transformer model

Evaluation Metrics
