Ever wished your AI-generated videos could sound as realistic as they look?
Meet Hunyuan Video Foley, Tencent’s powerful new video-to-sound model that adds lifelike, synchronized audio to your AI footage.
In this post, I’ll walk you through how I integrated Hunyuan Video Foley into ComfyUI using a custom node I developed — optimized for low VRAM GPUs. You’ll learn how to install it, set up the model files, and see real examples of what this model can do.
What Is Hunyuan Video Foley?
Hunyuan Video Foley is Tencent’s latest AI model that automatically generates high-quality, synchronized 48 kHz audio directly from video input.
It doesn’t just slap sounds onto visuals — it analyzes both visual and textual cues to produce context-aware audio, from ambient soundscapes to complex Foley details like footsteps, engines, or background tones.
Unlike traditional Foley, which takes hours of manual audio design, this model does it in seconds, with professional-grade clarity.
Vantage HunyuanFoley Custom ComfyUI Node — Built for Low VRAM GPUs
At the time of testing, there was no official ComfyUI node for Hunyuan Foley.
Some early attempts required 20 GB of VRAM or more, which made them impractical for most creators.
So, I decided to build my own ComfyUI node, tuned to run efficiently on lower-end GPUs like the RTX 3060 (12 GB).
- ✅ VRAM usage: peaks at just ~89% of a 12 GB card
- ✅ Supports local model loading
- ✅ Generates synchronized Foley in under 2 minutes
- ✅ Fully integrated into ComfyUI workflows
You can install the node directly through the ComfyUI Custom Node Manager — just search for “Vantage Hunyuan Foley” by Vantage with AI.
You can also install it manually from GitHub — https://github.com/vantagewithai/Vantage-HunyuanFoley.
Installation Guide
Here’s how you can install and configure the node step-by-step:
1. Install the Custom Node
If using the ComfyUI Manager:
- Open Custom Nodes Manager
- Search for “Vantage Hunyuan Foley”
- Select the one by Vantage with AI
- Click Install and Restart ComfyUI
Manual Install:
- Clone the repository into your `custom_nodes` directory
- Install dependencies via `requirements.txt`
- Restart ComfyUI
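For the manual route, the steps above boil down to something like this — a sketch that assumes a default ComfyUI checkout, so adjust the paths to your own install:

```shell
# Manual install sketch; paths assume a default ComfyUI checkout.
cd ComfyUI/custom_nodes
git clone https://github.com/vantagewithai/Vantage-HunyuanFoley.git
pip install -r Vantage-HunyuanFoley/requirements.txt
# Restart ComfyUI afterwards so the new node is registered.
```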
That’s it — your node is ready to go!
2. Download and Place Model Files
You’ll need to manually download the model files and place them correctly.
Go to your ComfyUI models directory and create a new folder: `models/Hunyuan_Foley/`
Then download and arrange the following:
Base Model
- From the official Hunyuan Video Foley page on Hugging Face
- Place all base model files directly inside `Hunyuan_Foley/`
SIGLIP Vision Model
Create a subfolder: `models/Hunyuan_Foley/SIGLIP2/`
Download:
- `model.safetensors`
- `config.json`
- `preprocessor_config.json`
CLAP Text Model
Create another subfolder: `models/Hunyuan_Foley/clap/`
Download:
- `model.safetensors`
- `config.json`
- `merges.txt`
- `vocab.json`
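If you prefer the command line, the Hugging Face CLI can pull the base model in one step. Treat this as a sketch: the repo ID below is my assumption for the official Hunyuan Video Foley repo, and the SIGLIP2/CLAP files should come from the repos linked in the Downloads section, so verify both before running:

```shell
# Sketch only — verify the repo ID against the official Hugging Face page.
pip install -U "huggingface_hub[cli]"
huggingface-cli download tencent/HunyuanVideo-Foley \
  --local-dir models/Hunyuan_Foley
```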
Your final folder structure should look like this:
Hunyuan_Foley/
├── model.safetensors
├── config.json
├── SIGLIP2/
│   ├── model.safetensors
│   ├── config.json
│   └── preprocessor_config.json
└── clap/
    ├── model.safetensors
    ├── config.json
    ├── merges.txt
    └── vocab.json
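Before launching the workflow, a quick shell check can confirm every file from the tree above is in place. `check_foley_models` is just a throwaway helper name I made up, and the default path assumes you run it from your ComfyUI root:

```shell
# Throwaway helper (assumed name) to verify the layout described above.
check_foley_models() {
  root="${1:-models/Hunyuan_Foley}"
  missing=0
  for f in model.safetensors config.json \
           SIGLIP2/model.safetensors SIGLIP2/config.json SIGLIP2/preprocessor_config.json \
           clap/model.safetensors clap/config.json clap/merges.txt clap/vocab.json; do
    # Report each expected file that is not present under $root
    [ -f "$root/$f" ] || { echo "MISSING: $f"; missing=1; }
  done
  [ "$missing" -eq 0 ] && echo "All model files in place."
  return "$missing"
}

# Run from your ComfyUI root (or pass the model folder as an argument).
check_foley_models || echo "Fix the missing files above before running the workflow."
```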
Now, you’re ready to run the workflow.
Test 1: Default Audio (No Prompt)
For the first run, I left the prompt empty and let the model decide how to interpret the scene.
It generated a suspenseful background score with subtle environmental details — like twigs snapping and ambient tension, perfectly synced to the visuals.
- Clip length: 5 seconds
- Render time: 98 seconds
- Audio: Dynamic and scene-accurate
Downloads & Links
- Custom Node
- Hunyuan-Foley Models (`models/Hunyuan_Foley`)
- SigLIP Vision Model (`models/Hunyuan_Foley/SIGLIP2`)
- CLAP Text Model (`models/Hunyuan_Foley/clap`)
- Workflows
Final Thoughts
Hunyuan Video Foley represents a huge leap forward in AI-generated audio.
When paired with WAN 2.2, it allows creators to produce fully cinematic clips — complete with dynamic visuals and synchronized sound — all inside ComfyUI.
And with the custom low VRAM node, you can now enjoy this workflow on mid-range GPUs without sacrificing quality.
This is just the beginning — as Tencent refines Hunyuan Foley and as ComfyUI evolves, the gap between AI video and full cinematic production keeps shrinking.
