LTX Video 2

19B parameter video generation model with integrated audio, released January 5, 2026

šŸ“… Last updated: February 1, 2026 šŸ’¬ Source: ~44,500 Discord messages šŸ“Š ~4,345 knowledge items

šŸ“– Overview

LTX Video 2 is a 19B parameter video generation model released by Lightricks in partnership with NVIDIA on January 5, 2026. It features integrated audio generation, built-in upscaling, and supports text-to-video, image-to-video, and video-to-video generation.

Key Features

  • Integrated Audio: Generates spatially-aware audio that responds to visual content
  • Fast Generation: Near real-time with distilled model (~6 seconds for 121 frames at 720p on RTX 4090)
  • Built-in Upscaling: Spatial and temporal upscalers included
  • 24 FPS Output: Higher frame rate than most competitors
  • Low VRAM Option: Can run on 8GB VRAM with RAM offloading

Model Variants

| Model | Size | Notes |
|---|---|---|
| ltx-video-2-1 (fp8) | ~27GB | Full model with VAE + audio; recommended for quality |
| ltx-video-2-1 (GGUF Q8) | ~20GB | Quantized version; good balance of quality and size |
| Distilled LoRA | +384MB | 8-step generation instead of 20; slight quality trade-off |

šŸ–„ļø Hardware Requirements

Good news for low-VRAM users: LTX Video 2 can run on 8GB VRAM with 64GB+ system RAM using offloading. Use the --reserve-vram 20 flag.
| GPU | VRAM | Capability |
|---|---|---|
| RTX 5090 | 32GB | 832x480, 241 frames with fp8 + distill LoRA |
| RTX 4090 | 24GB | 720p, 121 frames (~6 sec with distilled) |
| RTX 3090 | 24GB | 720p generation confirmed working |
| RTX 4070 Ti Super | 16GB | Works with GGUF models + offloading |
| 8GB cards | 8GB | Possible with 64GB+ RAM, heavy offloading |

RAM Requirements for Offloading

When using VRAM offloading, ensure you have sufficient system RAM. 64GB recommended for comfortable operation with 8GB VRAM GPUs.

— Kijai

āš™ļø Recommended Settings

| Parameter | Recommended | Notes |
|---|---|---|
| Resolution | 1280x720 or higher | Must be divisible by 32; below 720p tends to perform poorly |
| Duration | ≤10 seconds (official) | 20s works for some users; quality degrades at 30s+ |
| Steps | 20 (base) / 8 (distilled) | Use distilled LoRA for faster generation |
| Scheduler | Euler (default) | Euler_A better for anime/art content |
| Model precision | fp8 | Preferred over GGUF for quality |
| Frame count formula | (8n)+1 | VAE compresses 8 frames to 1 latent |

Resolution Warning: Resolutions must be divisible by 32. Portrait orientations don't work well and cause quality/motion issues.
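
These constraints are easy to check up front. Below is a minimal sketch (plain Python, no dependencies; the function names are illustrative, not part of any LTX or ComfyUI tooling) that validates a resolution and snaps a frame count to the nearest valid (8n)+1 value:

```python
def resolution_is_valid(width: int, height: int) -> bool:
    """Both dimensions must be divisible by 32."""
    return width % 32 == 0 and height % 32 == 0

def snap_frame_count(frames: int) -> int:
    """Round to the nearest valid frame count of the form (8n)+1."""
    n = max(round((frames - 1) / 8), 0)
    return 8 * n + 1

print(resolution_is_valid(832, 480))  # True  (832 = 26*32, 480 = 15*32)
print(snap_frame_count(120))          # 121   (n = 15)
print(snap_frame_count(241))          # 241   (already valid, n = 30)
```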

šŸ”¬ Technical Discoveries

Audio Generation

Audio is spatially aware

Audio changes based on position: footsteps get louder as a character approaches the camera.

— Lodis

Audio continuation maintains voice consistency

Can take audio from a video input and continue generating with the same voice characteristics.

— harelcain

Multi-language support

Supports multiple languages including Hindi, Russian, and Chinese for audio generation.

— Govind Singh

Context-aware accents

The model generated an Indian accent when Indian doctors appeared in the video, without being prompted for an accent.

— Tachyon

Architecture & Performance

All-in-one model file

The 27GB fp8 model includes the video model (19B params), audio processing, and the VAE in a single file.

— Ada

VAE compression ratio

VAE compresses 8 frames to 1 latent frame. 16 latents decode to 121 pixel frames.

— Dragonyte
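
As a quick sanity check of that ratio (plain Python, illustrative only): the first latent decodes to one frame and each additional latent decodes to eight, which is the same (8n)+1 relationship as the frame-count formula above.

```python
def frames_from_latents(latents: int) -> int:
    # First latent decodes to 1 frame; each additional latent adds 8.
    return 1 + 8 * (latents - 1)

def latents_from_frames(frames: int) -> int:
    # Inverse mapping, assuming frames already has the form (8n)+1.
    return (frames - 1) // 8 + 1

print(frames_from_latents(16))   # 121 pixel frames, matching the quote
print(latents_from_frames(121))  # 16 latent frames
print(121 / 24)                  # ~5.04 seconds of video at 24 FPS
```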

Near real-time generation

121 frames at 720p generate in ~6 seconds on an RTX 4090 with the distilled model.

— Kijai

Depth information included

LTX Video 2 includes depth information in decoded latents.

— Kijai

I2V & Generation Modes

I2V works better at higher resolutions

1280x720 works well; anything below that tends to perform poorly.

— Tachyon

Multiple input modes supported

Text-to-video, image-to-video, video-to-video, and audio-to-video generation capabilities.

— l҈u҈c҈i҈f҈e҈r҈

Prompt strength vs LoRA

Camera movement prompts override static camera LoRA settings.

— burgstall

āš ļø Known Limitations

Portrait orientation issues

Portrait aspect ratios cause issues with generation quality and motion. Stick to landscape.

— Cubey

Duration limits

Out-of-distribution breakdown occurs at 30 seconds; it's probably best to keep clips to 20 seconds max.

— sometimesTwitchy

Complex motion breakdown

The model struggles with complex motion like gymnastics, where anatomy breaks down during flips.

— dj47

Text generation is weak

The model struggles to generate proper-looking text.

— harelcain

Limited anime training

The dataset is mainly cinematic landscape videos; the model wasn't trained on much anime.

— dj47

832x480 performs poorly

Most results had no motion, and the outputs generally didn't look good at this resolution.

— Cubey

šŸ”§ Troubleshooting

Common Errors & Fixes

Problem: ModuleNotFoundError for audio_vae

Solution: Install version ≄0.3.0 of the ComfyUI-LTXVideo custom node.

Problem: Sampling errors during generation

Solution: Disable live preview in ComfyUI settings.

— Cubey

Problem: CUDA out of memory

Solution: Use GGUF Q8 model instead of fp8, enable RAM offloading with --reserve-vram flag, or reduce resolution/frame count.
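
If you're unsure whether a given setup will fit, a quick diagnostic like the sketch below (assumes PyTorch is installed; the ~27GB figure comes from the model variants table above) reports free VRAM before you commit to fp8 vs. GGUF:

```python
import torch

if torch.cuda.is_available():
    # Free and total memory in bytes for the current CUDA device.
    free, total = torch.cuda.mem_get_info()
    free_gb = free / 1024**3
    print(f"Free VRAM: {free_gb:.1f} GB of {total / 1024**3:.1f} GB")
    if free_gb < 27:  # the fp8 model alone is ~27GB
        print("Consider the ~20GB GGUF Q8 variant, --reserve-vram "
              "offloading, or a lower resolution/frame count.")
else:
    print("No CUDA device detected.")
```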

Problem: No motion in output

Solution: Increase resolution to at least 720p. Lower resolutions like 832x480 often produce static results.

Problem: Audio not generating

Solution: Ensure you're using the full model file that includes audio VAE, not a video-only variant.

Quality Issues

Problem: Blurry output at 4K

Solution: Generate at 720p/1080p and use external upscaler. Native 4K still has blurriness issues.

Problem: Anatomy breakdown in complex motion

Solution: Avoid complex motions like flips/gymnastics. Use shorter clips and simpler movements.

Problem: Quality degradation at long durations

Solution: Keep clips under 20 seconds. For longer content, generate multiple clips and stitch them together, as sketched below.
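
One way to do the stitching is ffmpeg's concat demuxer, driven from Python. This is a sketch assuming ffmpeg is on your PATH and all clips share the same codec, resolution, and frame rate; the file names are placeholders:

```python
import subprocess
from pathlib import Path

clips = ["clip_01.mp4", "clip_02.mp4", "clip_03.mp4"]  # placeholder names

# The concat demuxer reads its inputs from a text file, one per line.
list_file = Path("clips.txt")
list_file.write_text("".join(f"file '{c}'\n" for c in clips))

# -c copy concatenates without re-encoding, so it only works when all
# clips were generated with identical codec/resolution/frame-rate settings.
subprocess.run(
    ["ffmpeg", "-f", "concat", "-safe", "0",
     "-i", str(list_file), "-c", "copy", "stitched.mp4"],
    check=True,
)
```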

šŸ”„ Workflows & Tips

Vid2vid with partial latent masking

Use latent masking to preserve specific regions while regenerating others. Useful for fixing artifacts or changing specific elements.

Use fp8 instead of GGUFs

fp8 models generally produce better quality than GGUF quantized versions when VRAM allows.

Temporal upscaler for smooth motion

Use the built-in temporal latent upscaler to effectively double the frame rate and reduce deformations.

— harelcain

Distilled LoRA for iteration speed

Use ltx-2-19b-distilled-lora-384.safetensors at 0.6 weight for 8-step generation when iterating on prompts.

— Ada

āš–ļø Model Comparisons

LTX Video 2 vs Wan 2.2

| Aspect | LTX Video 2 | Wan 2.2 |
|---|---|---|
| Speed | Faster (near real-time with distilled) | Slower |
| Audio | Built-in, spatially aware | No native audio |
| Dynamic motion | Much better in the same duration | More conservative |
| Fidelity | Higher | Good |
| Control options | Limited currently | VACE, Fun models, more mature |

Community consensus: "LTX2 using full pagefile is still faster than wan2.2, has higher fidelity and better audio" — boop

šŸŽ“ Training & LoRAs

Training challenges: The distilled model makes LoRA training difficult. Most community training efforts target the base model.

Available LoRAs

  • Distilled LoRA: ltx-2-19b-distilled-lora-384.safetensors - 8-step generation
  • Static Camera LoRA: Reduces camera movement (but can be overridden by prompts)

Training Tips

  • Target the base model, not distilled
  • Use diverse training data with multiple angles/lighting
  • Short clips (5-10 seconds) work best for training data

šŸ”— Resources

Official Links

ComfyUI Integration

Community Resources