Introducing Turtles
Turtles emerges from stealth, with $250M+ GMV, to build the largest interactive shopping platform

Shopping once felt like magic.
In 1969, Sears captivated millions with its iconic "Wish Book," the first hint that shopping could be delightful and immersive. QVC took that magic live, turning TV into a real-time treasure hunt. Then Amazon brought the world's shelves online, making spontaneous discovery just a click away. But the latest wave of platforms reduced shopping to cheap gimmicks and disposable goods. The magic faded. Until now.
Today, we’re excited to introduce Turtles, the engine for interactive shopping. Emerging from stealth with $250M+ of GMV, Turtles is built for a new era: one where the line between shopping and entertainment disappears. Our infrastructure powers everything from
Today, we’re launching new speech-to-text and text-to-speech audio models in the API—making it possible to build more powerful, customizable, and intelligent voice agents that offer real value. Our latest speech-to-text models set a new state-of-the-art benchmark, outperforming existing solutions in accuracy and reliability—especially in challenging scenarios involving accents, noisy environments, and varying speech speeds. These improvements increase transcription reliability, making the models especially well-suited for use cases like customer call centers, meeting note transcription, and more.

The Problem
For the first time, developers can also instruct the text-to-speech model to speak in a specific way—for example, “talk like a sympathetic customer service agent”—unlocking a new level of customization for voice agents. This enables a wide range of tailored applications, from more empathetic and dynamic customer service voices to expressive narration for creative storytelling experiences.
We launched our first audio model in 2022 and since then, we’ve committed to improving the intelligence, accuracy, and reliability of these models. With these new audio models, developers can build more accurate and robust speech-to-text systems and expressive, characterful text-to-speech voices—all within the API.
The 3 pillars
1 - Shopping