Case 01

Metra AI — Production SaaS for content automation in Telegram

We built a turnkey SaaS platform with multi-agent LLM orchestration. From architecture to launch in 3 months.

Role: RTP Agency·Timeline: 3 months·Status: Live, with paying users·metra-ai.org →

Structure parserparses anatomy, preserves links

Content rewriterparagraph by paragraph, isolated calls

Stylingemoji and channel formatting

Rule validatorchannel-specific rules

The business problem

Telegram channel owners and content teams spend an enormous amount of time manually creating posts. Standard solutions either don't fit Telegram's specifics (Buffer, Hootsuite are tailored for Instagram/Twitter) or rely on raw ChatGPT output, which produces low-quality, generic content that requires lengthy manual editing.

The main pain point: content teams spend 60–80% of their time on production instead of strategy, and quality suffers because AI-generated content usually lacks brand voice, channel context, and real-time relevance.

What we built

A full-fledged SaaS platform that automates the entire content workflow in Telegram:

AI-powered post generation that preserves brand voice and channel lore
A scheduling system with reusable weekly presets
Real-time data integration for news content
A built-in Telegram CRM for working with leads without exposing the owner's account data
Multi-account infrastructure with multiple operators for agencies running many channels

Architecture: why multi-agent, not a single LLM call

The key technical innovation is a multi-stage LLM orchestration pipeline instead of single API calls. This is a deliberate architectural choice grounded in an important insight:

LLMs perform poorly when given too many simultaneous constraints. A single 3000-token prompt asking to "rewrite the post in voice X, with lore Y, in format Z, following rules A/B/C" produces unstable results, because attention is diluted across the requirements.

The solution: break post generation into specialized stages, each with one clear area of responsibility.

Standard posting pipeline

Structure parser — extracts the anatomy of the post and preserves links (which premium LLMs strip out by default)
Content rewriter — processes each paragraph with isolated calls, preserving structural integrity
Styling — adds emoji and formatting to match the channel's persona
Rule validator — applies channel-specific rules (for example, remove punctuation, limit length)

Extended pipeline (generation from scratch)

Archetype selector — chooses the post structure based on content type and length parameters
Block-by-block generator — writes each section with focused context
Applying style and formatting
Validation against auto-rules

This architecture eliminates the typical AI failures — mid-text hallucinations, structural drift, over-application of lore, format breakage — that single LLM calls produce.

Key technical findings

1. The prompt normalization layer

The image model rejected legitimate prompts too often — false positives on ordinary requests like generating people. Instead of switching to a more expensive model, we built a prompt normalization layer that rephrases harmless user input so it isn't blocked by mistake, while preserving the original meaning. This let us keep quality high without hosting more expensive alternatives.

2. Lore compression and translation

The channel lore provided by the user (often 3000+ tokens of unstructured text) is compressed and translated into the channel's posting language at upload time. The AI receives a structured summary in the required language instead of raw lore — this sharply increases content relevance and reduces token costs.

3. Lore as context, not a constraint

Through iteration we discovered that the AI over-applies lore when it's given as a direct instruction. We designed lore as soft context that influences but does not dominate generation — content comes out more natural.

4. Premium LLM selection strategy

We selected specific LLMs for specific tasks:

Fewer false refusals on legitimate edge cases
Better real-time news integration via a Perplexity-style API
Cost optimization at scale

Infrastructure

Several Ubuntu servers in production (main service, CRM, staging)
16 Docker containers with a clean separation of responsibilities
The backend as the single source of truth — all client requests go through the backend, never directly to AI providers or the database
Monitoring stack: Prometheus, Grafana, Sentry for errors, Uptime Kuma for service health
Encryption: all sensitive data (phone numbers, messages, passwords) is encrypted with proper salt+pepper and GPU-resistant hashing. Decryption keys are stored off-server.
Security: 2FA, JWT rotation, session fingerprinting, domain proxying

The result

3 mo

From development to launch

Docker containers in production

Week 1

First paying users

25/day

Auto-posts per channel on the paid plan

Launched into production within the 3-month development window. The multi-account CRM lets agencies manage operators without exposing channel owner data. Active early traction with a growing user base.

Technology stack

Backend	FastAPI · Python · Celery
Frontend	React · TypeScript · Next.js
Database	PostgreSQL · Redis
Infrastructure	Docker · Nginx · Multiple Ubuntu servers
Monitoring	Prometheus · Grafana · Sentry · Uptime Kuma
AI / LLM	Multi-provider stack (proprietary + open source)

What this demonstrates

The ability to design and build a turnkey production SaaS
A deep understanding of LLM constraints and how to work around them with architecture
Production-grade security and infrastructure
Pragmatic cost optimization at the architectural level
End-to-end product thinking: business problem → technical solution → deployment → operations

Similar challenge?

Tell us what you're building — we'd be glad to talk it through.

Let's talk →

← All case studies Next: Lipsync system →