I built an open-source, screen-free, storytelling AI toy for my nephew, who uses a Yoto player. My sister told me he sometimes talks to the stories, and I thought it would be cool if he could actually talk back to the characters, with AI models (STT, LLM, TTS) running locally on her MacBook instead of sending the conversation transcript to cloud models.
This is my voice AI stack:
- ESP32 on Arduino to interface with the Voice AI pipeline
- mlx-audio for STT (Whisper) and TTS with streaming (`qwen3-tts` / `chatterbox-turbo`)
- mlx-vlm to use vision-language models like Qwen3.5-9B and Mistral
- mlx-lm to use LLMs like Qwen3, Llama3.2, Gemma3
- Secure WebSockets to interface with a MacBook
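To give a feel for the ESP32-to-Mac link, here's a hedged, stdlib-only sketch of a tiny binary framing scheme for streaming audio over a WebSocket. The header layout and message types are illustrative assumptions, not the repo's actual wire format:

```python
import struct

# Hypothetical 3-byte frame header: 1-byte message type + 2-byte payload length.
# (Illustrative only -- the actual OpenToys wire format may differ.)
MSG_AUDIO = 0x01   # raw PCM chunk from the ESP32 mic
MSG_TTS = 0x02     # synthesized audio going back to the speaker
MSG_END = 0x03     # end-of-utterance marker

def pack_frame(msg_type: int, payload: bytes) -> bytes:
    """Prefix a payload with its type and length so the peer can re-split the stream."""
    return struct.pack(">BH", msg_type, len(payload)) + payload

def unpack_frames(buf: bytes):
    """Yield (msg_type, payload) tuples from a buffer of concatenated frames."""
    offset = 0
    while offset + 3 <= len(buf):
        msg_type, length = struct.unpack_from(">BH", buf, offset)
        offset += 3
        yield msg_type, buf[offset:offset + length]
        offset += length
```

On the Arduino side the same three header bytes would just be written before each binary send; length-prefixed frames keep mic chunks and TTS chunks from blurring together on a single socket.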
This repo supports inference on Apple Silicon chips (M1/2/3/4/5), but I'm planning to add Windows support soon. Would love to hear your thoughts on the project.
Many parents are concerned about sending their children's chat transcripts to the cloud, and privacy is often the first thing that comes up when we talk about AI toys.
So I built OpenToys so that anyone with an ESP32 can create their own AI toys that run inference locally (starting with Apple Silicon) and keep their data from leaving their home network.
The repo currently supports voice cloning and multilingual conversations in 10 languages, all locally. The app is a Rust Tauri app with a Python sidecar that runs the voice pipeline. The stack uses Whisper for STT, any MLX LLM, and Qwen3-TTS or Chatterbox-Turbo for TTS.
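The STT-to-LLM-to-TTS hand-off described above can be sketched with pluggable callables, which is roughly why "any MLX LLM" can drop in. The stand-in lambdas below are toy stubs for mlx-audio (Whisper), mlx-lm, and the TTS models; the real pipeline streams, but the control flow is the same:

```python
from dataclasses import dataclass
from typing import Callable, Iterable

@dataclass
class VoicePipeline:
    # Each stage is just a callable, so any STT/LLM/TTS backend can be swapped in.
    stt: Callable[[bytes], str]            # audio in -> transcript
    llm: Callable[[str], str]              # transcript -> reply text
    tts: Callable[[str], Iterable[bytes]]  # reply text -> audio chunks

    def run(self, audio: bytes) -> list[bytes]:
        transcript = self.stt(audio)
        reply = self.llm(transcript)
        return list(self.tts(reply))  # chunks get streamed back to the toy

# Toy stand-ins so the sketch is self-contained (not the real model calls):
pipeline = VoicePipeline(
    stt=lambda audio: "tell me a story",
    llm=lambda text: f"Once upon a time... (you said: {text})",
    tts=lambda text: (word.encode() for word in text.split()),
)
```

Keeping the stages as separate nodes like this is also what makes it easy to test each model in isolation before wiring in the hardware.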
Yes, you can; I was just testing it. I made a "My Custom Voices" tab where you can record a small sample of your own voice, or upload a sample of whatever voice, and then use it. I'm also in the process of training a model of my voice to see how it handles it, using the 1.7B.
Works surprisingly well on a 4090. I'll also try it on a 5090. This is the best one I've seen so far, NGL. 11Labs is cooked lol.
I'd highly recommend Gemini 2.5 Pro too for its speech quality. It's priced lower and the quality is top-notch on their API. I made an implementation here in case you're interested: https://www.github.com/akdeb/ElatoAI, but it's on hardware, so maybe not totally relevant.
I'm using LiveKit, and I have indeed tested Gemini, but it appears to be broken, or at least incompatible with OpenAI. Not sure if this is a LiveKit issue or a Gemini issue. Anyway, I decided to go back to just using the LLM, STT, and TTS as separate nodes. I've also been looking into the Deepgram Voice Agent API, but LiveKit doesn't support it (yet?).
It's as if the rubber duck were actually on the desk while you're programming, and if we had an MCP server that could get live access to the code, it could give you real-time advice.
Wow, that's really cool, thanks for open-sourcing! I might dig into your MCP; I've been meaning to learn how to do that.
I genuinely think this could be great for toys that kids grow up with, i.e. the toy could adjust the way it talks depending on the kid's age and remember key moments in their life. Could be pretty magical for a kid.
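The age-adaptive idea could boil down to building the system prompt from the child's age plus a small memory store. None of this is from the project; it's just a hypothetical sketch of one way it might look:

```python
# Hypothetical sketch: adjust the toy's speaking style by age and surface
# remembered moments in the system prompt. Age bands and wording are made up.

def build_system_prompt(age: int, memories: list[str]) -> str:
    if age <= 5:
        style = "Use very short sentences and simple words."
    elif age <= 9:
        style = "Use playful language and ask curious questions."
    else:
        style = "Talk like a friendly older sibling."
    memory_lines = "\n".join(f"- {m}" for m in memories)
    return (
        f"You are a storytelling toy for a {age}-year-old. {style}\n"
        f"Things you remember about them:\n{memory_lines}"
    )
```

Since everything runs locally, the memory list could just be a file on the parent's machine that never leaves the home network.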