
I work specifically in voice AI and am very familiar with how these tools and systems work.

> I would expect an "AI Note Taker" to faithfully transcribe the entire conversation. With the same quality I see in a lot of automated video subtitles.. ie they use the wrong word a lot but it's easy to tell what they mean by context.

That's a reasonable expectation, but it wouldn't be a safe one. Not all transcription tools are made the same. First, it depends on what kind of STT/ASR (speech-to-text / automatic speech recognition) model they're using. A lot of tools like to use some flavor of OpenAI's Whisper model. It generally works well, but I would never use it in a critical use case like healthcare, because it can hallucinate. That's specific to its architecture and how it was trained.
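Just to make that concrete, here's roughly what batch transcription with the open-source openai-whisper package looks like, with a check on Whisper's per-segment confidence stats instead of trusting the text blindly (the file name and thresholds are made up, purely illustrative -- not how any particular vendor's note-taker works):

    # Minimal sketch using the open-source openai-whisper package
    # (pip install openai-whisper). Illustrative only.
    import whisper

    model = whisper.load_model("base")  # a batch/offline model, not a streaming one
    result = model.transcribe("clinic_visit.wav", language="en")

    for seg in result["segments"]:
        # Whisper reports per-segment stats you can threshold to flag spans
        # that are more likely garbled or hallucinated.
        suspicious = seg["avg_logprob"] < -1.0 or seg["no_speech_prob"] > 0.5
        flag = " [REVIEW]" if suspicious else ""
        print(f"[{seg['start']:7.1f}s] {seg['text'].strip()}{flag}")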

There's a fairly large variety of architectures that can be used for STT/ASR. Some of them are designed for "offline" / "batch" / pre-recorded audio. Some are designed for fast real-time streaming transcription.

There are more factors too, like training data. And not just the demographics of the speakers in the training data, but the audio environments too. Was the model trained on echo-y doctors' offices with two people being recorded by a crappy smartphone mic or desktop mic? (It could've been! But it's an important distinction.)

And there are more factors than that, but you get the picture (e.g., are they trying to "clean up" the transcript afterwards by feeding it to an LLM? Are they pre-processing the audio before transcription in another attempt to boost accuracy?).

There are a lot of ways to do it, which means there are a lot of ways to screw it up.


Yup, spot on. There's a capability-reliability gap that the industry does not like to talk about too much.

It often feels like the AI industry is continually glossing over the fact that capability and reliability are fundamentally different qualities. We tend to use "accurate" and "reliable" interchangeably, but they describe different things. A model can ace a benchmark (capability/accuracy) and still be a liability in production (reliability).

Just look at the recent reactions to yet another release from METR showing improved capabilities. The less-talked-about part is that their headline measure is the task time horizon at a 50% success rate (and their even-less-discussed secondary measure, at an 80% success rate, has a drastically shorter time horizon). https://metr.org/

I implement AI systems for enterprises and I don't know any that would ever be okay with 80% reliability (let alone 50%).
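To make that concrete, here's a back-of-the-envelope sketch (purely illustrative numbers, not METR's data) of what a per-task success rate turns into once an agent has to get several dependent steps right in a row:

    # Purely illustrative: how a per-task success rate compounds across a
    # chain of dependent steps. All numbers are made up.
    for per_task in (0.5, 0.8, 0.95, 0.99):
        for steps in (1, 5, 10, 20):
            end_to_end = per_task ** steps
            print(f"per-task {per_task:.0%}, {steps:2d} steps -> {end_to_end:.1%} end-to-end")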


This capability-reliability gap (excellent term btw, more people need to think in these terms or we'll be in real trouble) is also infecting LLM-assisted outputs. I just tried VSCode again tonight after a ~3yr hiatus and goddamn has it deteriorated. Lots of new features, lots of interesting-looking plugins, but 3 of the 5 plugins I tried for code CAD (the reason I downloaded VSCode again at all) were completely unusable--like, couldn't even be made to work at all--and the other two didn't do anything like what they claimed. VSCode itself also got into some kind of spastic loop trying to log me into GitHub, and seemed incapable of recognizing the virtual environment in a Python project's workspace... It also feels like the UI got even slower. This situation is bad.

Not my term! Some real academics came up with it: https://www.normaltech.ai/p/new-paper-towards-a-science-of-a...

Interesting article, thanks for the link.

They absolutely need deterministic tools. What you just described is exactly how the current popular AI agents work. They use "harnesses", which to me is just a rebranding of what we've known all along about building useful and reliable software: composable, orchestrated systems where a variety of different pieces, selected for their capabilities and constraints, get glued together for specific outcomes.
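A bare-bones version of that idea, just to illustrate (my own toy example -- call_llm and the tool set are placeholders, not any particular framework's API):

    # Bare-bones "harness" sketch: the model only picks which deterministic
    # tool to run; the actual work is done by ordinary, testable functions.
    import json

    def lookup_order(order_id: str) -> dict:
        # Deterministic: same input, same output; hits your own systems.
        return {"order_id": order_id, "status": "shipped"}

    def issue_refund(order_id: str, amount: float) -> dict:
        # Deterministic, with its own validation and audit trail.
        return {"order_id": order_id, "refunded": amount}

    TOOLS = {"lookup_order": lookup_order, "issue_refund": issue_refund}

    def call_llm(prompt: str) -> str:
        # Placeholder -- swap in your actual model API. Here it just returns a
        # canned tool choice so the sketch runs end to end.
        return json.dumps({"tool": "lookup_order", "args": {"order_id": "A123"}})

    def run_step(user_request: str) -> dict:
        # The model's only job: emit a tool name plus arguments as JSON.
        choice = json.loads(call_llm(f"Pick one tool from {list(TOOLS)} for: {user_request}"))
        tool = TOOLS[choice["tool"]]    # unknown tool names fail loudly here
        return tool(**choice["args"])   # the deterministic piece does the work

    print(run_step("Where is order A123?"))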

It just feels like for some reason this is all being relearned with LLMs. I guess shortcuts have always been tempting. And the idea of a "digital panacea" is too hard to resist.


Yes! I was just about to comment the same thing. I sank so many hours into that Dragon Ball Z game. Was called Dragon Ball Z Tournament. And its background music was an instrumental version of Sisqo's Thong Song. Wild.

https://www.youtube.com/watch?v=Cdaf8ehjuX4


I think "AI-tone" is a much better way to characterize this stuff than accusing people of using AI. The problem has always been the same. Putting out slop feels disrespectful to the people you want to read/watch your stuff.

Makes me think of how, pre-ChatGPT, I could still barely handle most recipe blogs because of their well-known attempts at "filling space". And yeah, the problem is significantly magnified now everywhere else.

Anyway, my point is, whether or not someone uses AI is almost secondary in a way (even though it can seem pretty obvious to most of us when it's being used). All that matters is whether the writing seems to care more about throwing words at people than about actually conveying its points in a way that elicits understanding.


Their comment does not really give off "LLM-written"... It drives forward actual points without superfluous segments. I don't think it's helpful to try to discredit people whenever we want by throwing around accusations of "LLM-written".


Sam, we're not going to use your weird eye scanning orb.


Because you can't have a tech company offering third-party identity verification solutions if you just go with something like an RTA header.
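For context, the RTA (Restricted To Adults) label is just a fixed string a site serves in a meta tag or HTTP response header -- no verification vendor involved. Roughly like this (header name and value as commonly documented; treat it as illustrative):

    # Illustrative only: attach the RTA label to an outgoing response.
    RTA_LABEL = "RTA-5042-1996-1400-1577-RTA"

    def add_rta_header(headers: dict) -> dict:
        """Add the RTA rating header to an HTTP response's header dict."""
        headers["Rating"] = RTA_LABEL
        return headers

    print(add_rta_header({"Content-Type": "text/html"}))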


About the friction, not the capabilities... I haven't switched away from the biz calendar/appointment provider I'm paying for, even though I've kinda outgrown it.

I wouldn't underestimate switching friction.


How much does your friction avoidance cost, if you don't mind my asking?


Handy has Windows support. https://handy.computer/

