§ 01 — Hero
On-device AI research

Small models. Serious speech.

SynthDream Labs builds compact AI models that run entirely on small edge devices. Our first release is a proprietary text-to-speech engine: 50 MB, CPU-only, and indistinguishable from the best cloud TTS on the market.

§ 02 — Product
Our first release

A text-to-speech engine that runs where your users do.

We built a proprietary neural text-to-speech engine from the ground up. No third-party base models. No cloud fallback. No API dependency.

The result is a 50 MB model that produces speech at the quality of leading cloud TTS providers — but runs instantly on a phone CPU, in a browser tab, or on an embedded device. Your users never wait for an API call, and their text never leaves the device.

§ 03 — Pillars
What makes it different

Four things you won’t find in a cloud TTS.

01

Fully proprietary models

Architecture, training-data pipelines, inference runtime: all designed, built, and owned in-house. The entire IP stack is proprietary.

02

Privacy by design

Audio and text never leave the device. No cloud round-trip, no logging, no dependency on external infrastructure. Aligned with GDPR, HIPAA, and enterprise privacy requirements by architecture — not by policy.

03

No API tokens, ever

Once installed, the model runs indefinitely on the user’s own hardware. No per-character billing. No inference bills that scale with usage. A one-time integration, no ongoing cloud cost.

04

Already shipping

Working builds for Android, Chromium-based browsers, and Firefox. Model size and inference speed have improved with every monthly release. Not a research demo: a deployment-ready engine.

§ 04 — Specs
Technical snapshot

The specs, without the marketing.

Model size
50 MB and still shrinking
Parameters
~25 M, roughly two orders of magnitude below the industry average
Hardware
CPU only; no GPU, NPU, or accelerator required
Inference latency
Near-instant on consumer mobile CPUs
Languages
6 and expanding: English, Czech, Slovak, Spanish, Ukrainian, Vietnamese
New language
~5 hours of audio is all we need to train a new language
New accent
< 1 hour of audio is all we need to adapt to a specialized regional accent
Expression
Emotional range across voices — calm, urgent, warm, assertive
Data leaving device
Zero
Recurring cost
Zero
True localization
We don't just add languages; we deliver specialized regional accents that sound native to the people who actually live there.

§ 05 — Platforms
Platforms

Running today.

Three production-ready deployments. Each one runs the same core model, wrapped in a platform-specific runtime.

Android

Native mobile SDK running the engine directly on device CPU. Tested across budget and flagship Android hardware.

Chromium-based browsers

Browser extension for Chrome, Edge, Brave, Arc, and every Chromium fork. Runs fully offline — no permissions beyond the active tab.

Firefox

Native Firefox extension with the same on-device architecture. Runs as fast and efficiently as the Chromium counterpart.

§ 06 — Demo
Hear it

Play a sample.

Same 50 MB model, three English accents. Switch between voices — each one runs locally, with no cloud call between tap and sound.

Voice 01 · EN-US · American English

“The next meaningful frontier in applied AI is not bigger models — it is models small enough to run everywhere, at the quality bar set by the largest models today.”

§ 07 — Team
Team

The team behind the model.

A small group of researchers and engineers with deep backgrounds in self-supervised speech systems, voice conversion, and compact model design.

Ian Sosa

Co-Founder, Product Systems Lead

Physicist, AI developer, and entrepreneur. Valedictorian at the University of Buenos Aires and the Instituto Balseiro (BSc+MSc), plus an MSc in Statistical Physics & Chaos specializing in complex-systems modeling. Research background in AI-driven molecular simulations, developing multiscale methods that embed long-range quantum effects into large-scale atomistic models. As co-founder, architected low-latency inference pipelines, built on-prem GPU clusters, and developed data-efficient TTS and voice-conversion models for low-resource languages. Led multidisciplinary teams across data acquisition, model design, and production deployment — coordinating roadmap, hiring, MLOps, and operations end-to-end, and helping the company win the IB50K competition and launch commercial voices.

Julián Neñer

Co-Founder, Lead AI Engineer

Physicist and AI engineer from the Instituto Balseiro, with a master's in interdisciplinary econophysics, modeling capital systems with agent-based AI. At SynthDream, designs noise-robust, lightweight architectures for real-time voice conversion and low-resource text-to-speech, enabling strong performance with limited data and efficient inference at scale. Previously built AI pipelines for object detection in satellite imagery for an Argentinian government-funded project. Beyond modeling, leads the full software and MLOps stack (APIs, cloud deployment, evaluation frameworks), manages engineering squads, and coordinates systematic experimentation, data collection, and QA.

Francisco Videla

AI Engineer, Speech Systems

Contributes to model training, evaluation pipelines, and acoustic quality tuning across voices and accents. Focused on closing the gap between compact on-device models and state-of-the-art cloud baselines.

Franco Schenone

AI Engineer, Runtime & Tooling

Works on quantization, runtime optimization, and cross-platform deployment tooling. Maintains the build systems that keep Android, Chromium, and Firefox targets shipping from a single model source.

Selected prior work — Ian & Julián

Research foundations behind the model.

01 — Voice

Prosody-Preserving Voice Conversion

  • Speech-to-speech voice conversion systems designed to alter speaker identity while preserving the prosodic structure of the source utterance — rhythm, intonation, and speaking style.
  • Self-supervised speech representations (HuBERT) as disentangled content carriers, exploiting their reduced speaker sensitivity to separate linguistic content from identity-specific features prior to reconstruction.
  • Information-bottleneck architectures using speaker embedding networks to suppress unwanted speaker leakage from content representations, enforcing a clean separation between what is said and who is saying it.
  • Conversion quality evaluated across multiple axes — speaker similarity, prosody retention, and intelligibility — using both objective metrics and perceptual listening studies.
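
The bottleneck-plus-conditioning idea above can be sketched in a few lines of NumPy. Everything here is illustrative, not the production architecture: the dimensions, the random stand-ins for HuBERT features and speaker embeddings, and the fixed linear projection (a stand-in for a learned bottleneck network) are all assumptions for the sketch.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy dimensions (real systems would use learned networks and HuBERT features):
T, D_CONTENT, D_SPK, D_BOTTLENECK = 50, 768, 64, 128

content_feats = rng.normal(size=(T, D_CONTENT))  # frame-level "HuBERT-like" features
tgt_spk = rng.normal(size=(D_SPK,))              # target-speaker embedding

# Information bottleneck: a narrow projection that limits how much
# speaker-specific detail can survive in the content pathway.
W_bottleneck = rng.normal(size=(D_CONTENT, D_BOTTLENECK)) / np.sqrt(D_CONTENT)
content_code = content_feats @ W_bottleneck      # (T, 128): "what is said"

# Condition reconstruction on the *target* speaker, not the source:
# broadcast the target embedding across all frames and concatenate.
decoder_in = np.concatenate(
    [content_code, np.tile(tgt_spk, (T, 1))], axis=1
)                                                # (T, 128 + 64): "who says it"

print(decoder_in.shape)  # → (50, 192)
```

A decoder trained on inputs shaped like `decoder_in` can only recover speaker identity from the appended embedding, which is what makes swapping the source speaker for a target speaker possible at inference time.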

02 — Accent

Accent Conversion

  • Accent-alteration systems targeting the transfer of phonetic and prosodic patterns characteristic of a target accent while preserving the speaker's voice identity and content.
  • Multiple prosodic representation strategies explored — pitch and energy curves, discrete prosodic tokens, and learned latent spaces — to identify representations that transfer cleanly across accent domains without collapsing speaker-specific variation.
  • Speaker and accent embedding networks used as information bottlenecks, conditioning the conversion model on target-accent speaker embeddings to steer synthesis toward the desired phonetic and prosodic target without requiring parallel training data.
  • Architectural and training-time constraints applied to address entanglement between accent, prosody, and speaker identity — improving controllability and reducing unintended co-variation between conversion axes.

§ 08 — Research
Where we’re heading

TTS is the first model. It won’t be the last.

SynthDream Labs is a research company with a specific thesis: the next meaningful frontier in applied AI is not bigger models — it is models small enough to run everywhere, at the quality bar set by the largest models today.

TTS was the first proof.

§ 09 — Contact
Contact

Talk to the people who built it.

Live demos, technical deep-dives, and partnership conversations — handled directly by the founders, not a sales team.

Email the founders
SynthDream Labs Inc.
323 S 21st Ave, STE C
Hollywood, FL 33020
United States