Abhimanyu "Chitra" Selvan is a Developer Relations Leader and an engineer with expertise across multiple high-tech domains. His experience spans programming digital cockpit systems for Boeing 787 simulators, controlling automated industrial robots, and engineering FDA-approved medical IoT systems. Over the last five years, he has delved into the cloud-native ecosystem, particularly Kubernetes. Currently, he is tinkering with GPUs and helping developers build and break things at DigitalOcean.
When Abhimanyu isn't wrestling with his single monitor setup, he's busy outrunning his son in solo marathons or perfecting his "don't-fall-in" dance on a paddleboard at the local lake.
GenAI is no longer a new buzzword, and we now stand at the convergence of speech, vision and intelligence. The boundaries of human interaction with machines are blurring into a more seamless, interactive ecosystem. This talk will present an approach to building distributed, multi-modal AI agents at scale.
Our live demonstration will showcase:
- Voice Capture: The Automatic Speech Recognition (ASR) service will transcribe the voice input into a precise text prompt.
- Visual Generation: The text prompt will be used to generate a contextually rich image.
- Text Generation: The generated image will be analyzed to produce an ALT text description.
We’ll explore how these agents can be run at scale, using a globally distributed system. We’ll share the thought process behind making these architectural decisions and what it can take to go from prototype to production in this space. We’ve also instrumented the entire system with distributed tracing to provide visibility into each stage of the pipeline.
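As a rough illustration of the pipeline shape described above (speech to prompt, prompt to image, image to ALT text, with one trace span per stage), here is a minimal Python sketch. It is not the implementation shown in the talk: the functions `transcribe`, `generate_image` and `describe_image` are hypothetical stand-ins for the real model services, and the `Trace` class is a simplified in-process substitute for a real distributed tracing system.

```python
import time
import uuid
from contextlib import contextmanager
from dataclasses import dataclass, field


@dataclass
class Trace:
    """Collects one timed span per pipeline stage under a shared trace ID."""
    trace_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    spans: list = field(default_factory=list)

    @contextmanager
    def span(self, name):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the stage name and its duration in seconds.
            self.spans.append((name, time.perf_counter() - start))


# Hypothetical stand-ins for the real ASR, image and vision services.
def transcribe(audio: bytes) -> str:
    return f"prompt from {len(audio)} bytes of audio"


def generate_image(prompt: str) -> bytes:
    return prompt.encode("utf-8")


def describe_image(image: bytes) -> str:
    return f"ALT text for a {len(image)}-byte image"


def run_pipeline(audio: bytes):
    """Run the three stages in order, recording one span for each."""
    trace = Trace()
    with trace.span("asr"):
        prompt = transcribe(audio)
    with trace.span("image_generation"):
        image = generate_image(prompt)
    with trace.span("alt_text"):
        alt_text = describe_image(image)
    return alt_text, trace


if __name__ == "__main__":
    alt_text, trace = run_pipeline(b"\x00" * 16)
    print(trace.trace_id, alt_text)
    for name, seconds in trace.spans:
        print(f"{name}: {seconds:.6f}s")
```

In a distributed deployment each stage would typically run as its own service, and the trace ID would be propagated across network calls rather than held in a local object.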
Whether you’re curious about building AI agents or want to take your GenAI projects to production, this talk will show you what that journey can look like—end to end.