LTX-2.3 Released: Major Upgrade for AI Video Generation (ComfyUI Workflow Guide)


Lightricks has released LTX-2.3, a significant upgrade to the LTX-2 video generation model. This new version introduces improvements across visual detail, motion stability, prompt understanding, and audio generation, making it one of the most capable open AI video models currently available.

The update builds on the powerful foundation of LTX-2 but refines several key areas of the generation pipeline, improving both video quality and overall reliability.

In this article, we’ll explore what’s new in LTX-2.3 and highlight a unified ComfyUI workflow that allows creators to generate videos using text prompts, images, or custom audio inputs.

For a complete walkthrough of the workflow and generation tests, watch the tutorial video included below.

Watch the Full LTX-2.3 Workflow Tutorial

The video tutorial demonstrates the complete workflow and generation process, including:

  • Required models for LTX-2.3
  • ComfyUI workflow structure
  • Text-to-Video generation
  • Image-to-Video animation
  • Talking avatar generation with custom audio
  • Multi-stage sampling and upscaling


What’s New in LTX-2.3

Sharper Fine Detail

One of the biggest improvements in LTX-2.3 is a rebuilt latent space and updated VAE trained on higher-quality data.

This change helps preserve fine visual details such as:

  • Hair and skin textures
  • Small objects and edges
  • Text and graphical elements

The result is cleaner and more consistent outputs throughout the generation pipeline.


Improved Prompt Adherence

LTX-2.3 also improves how the model understands prompts by introducing a 4× larger text connector.

This allows the model to better interpret:

  • Multiple characters or subjects
  • Spatial relationships within scenes
  • Complex stylistic instructions

Prompts that previously produced inconsistent results now resolve more accurately and predictably.


Better Image-to-Video Motion

Image-to-video generation has also been improved.

Compared to earlier versions, LTX-2.3 produces:

  • More natural motion
  • Better visual continuity from the input frame
  • Fewer frozen frames or artificial camera movements

This makes it much more reliable for animating still images into dynamic scenes.


Cleaner Audio Generation

Another major upgrade is the quality of generated audio.

With filtered training data and a new vocoder, LTX-2.3 generates audio with:

  • Fewer artifacts
  • Better synchronization with video
  • More stable speech generation

This improvement is particularly useful for talking avatars and audio-driven video workflows.


Native Portrait Video Generation

LTX-2.3 now supports native portrait video generation up to 1080 × 1920 resolution.

Unlike previous approaches that simply cropped landscape outputs, the model was trained on portrait-orientation data, allowing it to generate vertical video more naturally.

This makes it well suited for social media content and mobile-first video formats.

LTX-2.3 Model Versions

Similar to LTX-2, the new release includes two main model variants.

Dev Model (Maximum Quality)

The Dev model provides the highest generation quality and stability.

Typical settings include:

  • CFG around 4
  • 20 diffusion steps

This configuration prioritizes fidelity and accuracy.


Distilled Model (Maximum Speed)

The distilled model is optimized for faster generation.

Typical settings:

  • CFG 1
  • 8 diffusion steps

This version runs significantly faster while still producing strong visual results.


Distilled LoRA

Lightricks also provides a distilled LoRA for LTX-2.3.

Applying this LoRA to the Dev model allows it to run with:

  • CFG 1
  • 8 steps

This effectively combines the quality of the Dev model with the speed of the distilled setup.
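The three configurations above can be summarized as a small preset table. This is a minimal sketch, not an official API: the variant labels ("dev", "distilled", "dev+lora") and the function name are illustrative; only the CFG and step values come from the settings described above.

```python
# Hypothetical preset table for the sampler settings described above.
# Labels are illustrative, not official model identifiers.
PRESETS = {
    "dev":       {"cfg": 4.0, "steps": 20},  # maximum quality
    "distilled": {"cfg": 1.0, "steps": 8},   # maximum speed
    "dev+lora":  {"cfg": 1.0, "steps": 8},   # Dev model + distilled LoRA
}

def sampler_settings(variant: str) -> dict:
    """Return the CFG scale and step count for a given model variant."""
    try:
        return PRESETS[variant]
    except KeyError:
        raise ValueError(f"unknown variant: {variant!r}") from None
```

In practice you would plug these values into whichever sampler node your ComfyUI workflow uses.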

New Upscaling Options

LTX-2.3 also introduces additional upscaling models designed for multi-stage video generation workflows.

These include:

  • 1.5× Spatial Upscaler
  • 2× Spatial Upscaler
  • 2× Temporal Upscaler

The spatial upscalers allow progressive resolution increases across multiple stages, while the temporal upscaler increases frame rate directly in the latent space.

This approach allows creators to scale videos to higher resolution or higher FPS without regenerating the entire sequence.
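To make the compounding effect concrete, here is a sketch of the arithmetic only: how the spatial and temporal factors above multiply across stages. The factors (1.5× and 2× spatial, 2× temporal) come from the article; the function name, stage ordering, and the example base resolution are assumptions for illustration.

```python
def apply_stages(width, height, fps, stages):
    """Apply a sequence of ("spatial", factor) / ("temporal", factor) stages.

    Spatial stages scale resolution; temporal stages scale frame rate.
    """
    for kind, factor in stages:
        if kind == "spatial":
            width, height = int(width * factor), int(height * factor)
        elif kind == "temporal":
            fps = int(fps * factor)
        else:
            raise ValueError(f"unknown stage kind: {kind!r}")
    return width, height, fps

# Example: a hypothetical 768x512 base generation at 24 fps,
# followed by 1.5x spatial, 2x spatial, then 2x temporal upscaling.
print(apply_stages(768, 512, 24, [("spatial", 1.5), ("spatial", 2), ("temporal", 2)]))
# → (2304, 1536, 48)
```

This is why multi-stage workflows generate at a low base resolution first: each upscaler only has to refine, not regenerate, the sequence.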


Unified LTX-2.3 ComfyUI Workflow

To demonstrate how these capabilities can be used in practice, the tutorial video below features a unified ComfyUI workflow designed specifically for LTX-2.3.

This workflow supports multiple generation modes within a single pipeline.

Supported Generation Modes

  • Text-to-Video (T2V)
    Generate video directly from text prompts.
  • Image-to-Video (I2V)
    Animate a starting image into a moving scene.
  • Custom Audio-to-Video with Lip Sync (A2V)
    Use your own audio file to drive speech and lip-sync animation.
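One way to picture a unified pipeline is as a single entry point that validates inputs per mode. The sketch below is an assumption, not the workflow's actual logic: the mode names follow the article (T2V, I2V, A2V), but the requirement that A2V takes both an image and an audio file is inferred from the talking-avatar use case.

```python
# Hypothetical input requirements per generation mode; only the mode
# names come from the article.
REQUIRED_INPUTS = {
    "T2V": {"prompt"},
    "I2V": {"prompt", "image"},
    "A2V": {"prompt", "image", "audio"},  # assumed: avatar image + audio file
}

def check_inputs(mode, **inputs):
    """Raise if a required input for the chosen mode is missing or None."""
    provided = {k for k, v in inputs.items() if v is not None}
    missing = REQUIRED_INPUTS[mode] - provided
    if missing:
        raise ValueError(f"{mode} mode is missing inputs: {sorted(missing)}")
    return True
```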

Flexible Model Support

The workflow also supports loading models in multiple formats:

  • Safetensors models
  • GGUF quantized models

This allows the workflow to run on a wide range of hardware configurations, from high-end GPUs to more limited systems.
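Conceptually, supporting both formats comes down to dispatching on the checkpoint's file extension. The loader names below are placeholders, not actual ComfyUI node names; this is only a sketch of the dispatch idea.

```python
from pathlib import Path

def pick_loader(checkpoint: str) -> str:
    """Choose a loader (placeholder names) based on checkpoint file extension."""
    suffix = Path(checkpoint).suffix.lower()
    if suffix == ".safetensors":
        return "safetensors_loader"  # full/half-precision weights
    if suffix == ".gguf":
        return "gguf_loader"         # quantized weights for low-VRAM systems
    raise ValueError(f"unsupported checkpoint format: {suffix!r}")
```

GGUF quantization trades some precision for a much smaller memory footprint, which is what lets the same workflow run on more limited hardware.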


Final Thoughts

LTX-2.3 represents a significant upgrade over LTX-2, bringing improvements across the entire AI video generation pipeline.

With better prompt adherence, sharper visual detail, improved motion, and cleaner audio, it offers a more reliable platform for building advanced video generation workflows.

When combined with a unified ComfyUI workflow supporting T2V, I2V, and audio-driven video generation, LTX-2.3 becomes an extremely flexible tool for creators exploring the next generation of AI video production.

If you want to see the workflow in action, be sure to check out the video tutorial above, where the full pipeline is explained step-by-step.
