Case 02

Lipsync system — 99%+ cost reduction versus premium video AI

We replaced premium video AI ($3–5/min) with an open-source ComfyUI workflow. Same quality, costs measured in cents.

Role: RTP Agency · Timeline: 6+ months in production · Status: 3+ commercial deployments
Premium video AI API: $3–5 / min
Custom ComfyUI workflow: cents / min
Cost reduction: 99%+

The business problem

A motion-design agency producing advertising creatives needed lipsync video generation at scale. They were paying a premium for a leading proprietary API — roughly $0.05–0.08 per second of video, which added up to:

  • $3–5 per minute of generated video
  • Dozens of dollars per finished creative
  • Unsustainable economics at their order volume
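As a quick sanity check on these figures, a per-second price converts to a per-minute price by multiplying by 60. A minimal Python sketch using only the rates quoted above (the function name is ours, for illustration):

```python
def per_minute_cost(rate_per_second: float) -> float:
    """Convert a per-second video generation price (USD) to a per-minute price."""
    return rate_per_second * 60


# Rates quoted for the proprietary API, in USD per second of generated video.
low, high = 0.05, 0.08
print(f"${per_minute_cost(low):.2f}–${per_minute_cost(high):.2f} per minute")
```

This prints $3.00–$4.80 per minute, which the section rounds to $3–5.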

Beyond cost, they were hitting API limits, a quality ceiling, and a lack of customization that constrained their creative work. They needed a solution that was significantly cheaper, free of dependence on external APIs, and customizable for their specific tasks.

Our approach

Most teams would have either accepted the premium pricing or tried to build their own model. We took a third path: we built production infrastructure around the best open-source AI models with cost-optimized GPU orchestration.

After evaluating the available options, we chose Infinity Talk (built on Wan 2.1) as the foundation for lipsync. The key reasons:

  • There was no comparable open-source alternative at the time
  • The ComfyUI architecture allowed deep customization through workflow modifications
  • Quality was on par with the premium API on the agency's tasks — and surpassed it in some scenarios
  • It could be self-hosted, completely removing dependence on the proprietary API

Production architecture

The hard part wasn't running the model; it was bringing it up to production grade.

We built a containerized deployment infrastructure that provides:

  • A Telegram bot interface (via a local Bot API server for large media files beyond Telegram's standard limits)
  • Workflow orchestration for ComfyUI pipelines
  • Heavy file handling (large input and output videos)
  • Polling and webhook integration with GPU providers
  • A Docker template we reuse across similar projects — drop in the config, deploy, ready in minutes
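The orchestration layer can be sketched against ComfyUI's built-in HTTP API: a workflow exported in API format is a JSON object of nodes keyed by id, each with a `class_type` and `inputs`; it is queued with `POST /prompt`, and its result appears under `GET /history/<prompt_id>` once finished. This is a minimal polling sketch, not the project's actual code; the helper names, node ids, and timing defaults are assumptions, and it uses only the Python standard library.

```python
import copy
import json
import time
import urllib.request
import uuid
from typing import Any, Callable

COMFY_URL = "http://127.0.0.1:8188"  # ComfyUI's default listen address and port


def patch_workflow(workflow: dict, overrides: dict[str, dict[str, Any]]) -> dict:
    """Return a copy of an API-format workflow with some node inputs replaced.

    `overrides` maps node id -> {input_name: new_value}; node ids depend on
    the exported workflow (any ids used with this sketch are hypothetical).
    """
    patched = copy.deepcopy(workflow)
    for node_id, inputs in overrides.items():
        patched[node_id]["inputs"].update(inputs)
    return patched


def queue_prompt(workflow: dict, base_url: str = COMFY_URL) -> str:
    """POST the workflow to /prompt and return the prompt_id ComfyUI assigns."""
    body = json.dumps({"prompt": workflow, "client_id": str(uuid.uuid4())}).encode()
    request = urllib.request.Request(
        f"{base_url}/prompt", data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["prompt_id"]


def fetch_history(prompt_id: str, base_url: str = COMFY_URL) -> dict:
    """GET /history/<prompt_id>; the result is empty until the job finishes."""
    with urllib.request.urlopen(f"{base_url}/history/{prompt_id}") as response:
        return json.loads(response.read())


def wait_for_result(
    prompt_id: str,
    fetch: Callable[[str], dict] = fetch_history,
    interval: float = 5.0,
    timeout: float = 3600.0,
) -> dict:
    """Poll until the prompt shows up in the history, then return its entry."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        history = fetch(prompt_id)
        if prompt_id in history:
            return history[prompt_id]  # entry includes "outputs" and "status"
        time.sleep(interval)
    raise TimeoutError(f"prompt {prompt_id} did not finish within {timeout} s")
```

For example, `wait_for_result(queue_prompt(patch_workflow(wf, {"12": {"image": "portrait.png"}})))` would swap the input image in a hypothetical node `"12"` of a loaded workflow `wf` and block until ComfyUI reports the job done. Injecting `fetch` keeps the polling loop testable without a running server; a webhook-based GPU provider would replace the loop with a callback.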

The infrastructure design is modular and repeatable — since then we've used the same Docker template foundation to deploy similar AI pipelines for other clients with minimal changes.
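The "drop in the config" step might look like the following: everything client-specific lives in environment variables, and the rest of the template is shared. The variable names, service hostnames, and defaults here are hypothetical; the parts taken from Telegram's documentation are the `<base>/bot<token>/<method>` URL shape and port 8081, the default HTTP port of the local `telegram-bot-api` server.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class DeploymentConfig:
    """Per-client settings; everything else in the template stays the same."""

    bot_token: str
    bot_api_base: str  # local Bot API server instead of api.telegram.org
    comfy_url: str
    workflow_path: str

    @classmethod
    def from_env(cls, env: Mapping[str, str] = os.environ) -> "DeploymentConfig":
        # Hypothetical variable names and docker-compose service hostnames.
        return cls(
            bot_token=env["BOT_TOKEN"],
            bot_api_base=env.get("BOT_API_BASE", "http://telegram-bot-api:8081"),
            comfy_url=env.get("COMFY_URL", "http://comfyui:8188"),
            workflow_path=env.get("WORKFLOW_PATH", "workflows/lipsync_api.json"),
        )

    def method_url(self, method: str) -> str:
        # Bot API calls take the form <base>/bot<token>/<method>.
        return f"{self.bot_api_base}/bot{self.bot_token}/{method}"
```

Reusing the template for a new client then means supplying a different environment file rather than changing code.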

Cost engineering

This is where the economics get interesting.

Initial premium API costs (their previous solution)

  • $3–5 per minute of video
  • Dozens of dollars per finished creative
  • Constraints from API limits

Our first implementation (self-hosted GPU on VAST AI)

  • $2/hour to rent an H200 GPU
  • Batch processing: dozens of videos per hour on a single GPU instance
  • Cost per video: cents instead of dollars
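The per-video figure follows from amortizing the hourly rental over throughput. A minimal sketch; the throughput values are illustrative stand-ins for "dozens of videos per hour", and the calculation ignores idle time, storage, and instance startup:

```python
def cost_per_video(hourly_rate: float, videos_per_hour: float) -> float:
    """Amortized GPU rental cost per video on a single instance (USD)."""
    if videos_per_hour <= 0:
        raise ValueError("videos_per_hour must be positive")
    return hourly_rate / videos_per_hour


# $2/hour H200 rental from above; throughput values are illustrative only.
for throughput in (24, 36, 48):
    print(f"{throughput} videos/hour -> ${cost_per_video(2.0, throughput):.3f} per video")
```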

Current optimized version (RunningHub)

  • $15/mo fixed subscription for the client (50K tokens + access to premium GPUs)
  • Effectively unlimited generation within practical use
  • Cost per video in tokens: ~200 tokens (negligible at this volume)
Net cost reduction: 99%+ compared with premium API pricing at their volume.
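Because the current cost is a fixed $15/month, the percentage saving depends on how much the client would otherwise have generated: saving = 1 - 15 / (minutes × per-minute rate). A minimal sketch, assuming the subscription is the only remaining cost and covers the whole volume (engineering time and the earlier GPU rental are ignored); the function names are ours:

```python
def monthly_reduction(minutes_per_month: float, api_rate_per_min: float,
                      fixed_monthly_cost: float = 15.0) -> float:
    """Fractional saving of a fixed subscription versus per-minute API billing."""
    api_cost = minutes_per_month * api_rate_per_min
    if api_cost <= 0:
        raise ValueError("API cost must be positive")
    return 1 - fixed_monthly_cost / api_cost


def break_even_minutes(target: float, api_rate_per_min: float,
                       fixed_monthly_cost: float = 15.0) -> float:
    """Monthly volume (minutes) at which the saving reaches `target`, e.g. 0.99."""
    return fixed_monthly_cost / ((1 - target) * api_rate_per_min)
```

Under these assumptions, the saving reaches 99% once the avoided API bill is $1,500 per month: about 500 minutes at $3/min, or 300 minutes at $5/min.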

The optimization journey itself illustrates a key consulting principle: continuous iteration on infrastructure choice. At first VAST AI was the right answer, but when its pricing changed and better alternatives appeared, switching to RunningHub delivered another leap in economics.

Photo-to-video versus video-to-video

We implemented both modes with a deliberate split by use case:

  • Photo-to-video — faster generation, fewer hallucinations, often higher quality. The default for most tasks.
  • Video-to-video — needed by specific clients with long-form content (5–10 minute workflows). This mode was initially broken in the available implementations; we debugged it and got it working, which became a key differentiator.
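The photo-to-video / video-to-video split can be expressed as a small routing rule. This is a hypothetical sketch: the suffix lists, mode names, and the per-client `client_allows_v2v` flag are our assumptions, not the deployed configuration.

```python
from pathlib import Path

# Hypothetical input types accepted by each mode.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".webp"}
VIDEO_SUFFIXES = {".mp4", ".mov", ".webm", ".mkv"}


def choose_mode(source: str, client_allows_v2v: bool = False) -> str:
    """Pick the generation mode for a lipsync job.

    Photo-to-video is the default; video-to-video is used only when the
    input is a video and the client's deployment has V2V enabled.
    """
    suffix = Path(source).suffix.lower()
    if suffix in IMAGE_SUFFIXES:
        return "photo_to_video"
    if suffix in VIDEO_SUFFIXES:
        if client_allows_v2v:
            return "video_to_video"
        raise ValueError("video input requires a deployment with V2V enabled")
    raise ValueError(f"unsupported input type: {suffix or 'no extension'}")
```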

At the time, no one in the open-source community had a working V2V implementation. Our next client found us directly through a technical article we published on the Infinity Talk implementation.

Recognition and knowledge sharing

We published a detailed technical breakdown of the Infinity Talk implementation that earned editorial recognition and a strong community response. It became the main reference for people getting into this topic and led to direct client acquisition.

The result

99%+: cost reduction versus the proprietary API
6+ mo: continuous operation in production
3: paid commercial deployments
$15/mo: current infrastructure cost

For the original client: the same volume of lipsync video at a fraction of the previous cost. No API limits. A customizable workflow tailored to specific creative tasks. 6+ months of continuous operation in production.

Broader commercial impact: 3 paid deployments across different clients with different needs, each customized through workflow modifications (V2V for some, I2V for others). The same infrastructure foundation has been reused across several AI projects.

Technology stack

AI models: Infinity Talk (built on Wan 2.1)
Workflow engine: ComfyUI
GPU compute: VAST AI · RunningHub
Interface: Telegram Bot API (local server)
Infrastructure: Docker · orchestration in Python

What this demonstrates

  • Production-grade open-source AI expertise — not experiments, but commercial deployments
  • Cost-optimization thinking — understanding when API services make sense and when self-hosted/alternative providers deliver huge savings
  • Production-infrastructure thinking — reusable Docker templates, correct file handling, messenger integration
  • Continuous improvement — willingness to switch infrastructure providers when the economics or capabilities change
  • Expert content — knowledge sharing generates inbound leads

Similar challenge?

Tell us what you're building — we'd be glad to talk it through.
