I'm a healthcare CIO of 12 years, and I've evaluated 4 and deployed 2 of these tools, one of which is currently in use at my current healthcare employer. I am very measured on AI, but the results I've seen from these virtual scribes are HUGE. In every case we have IMMEDIATELY seen improvements in patient NPS scores, provider satisfaction, and note quality. Notes are more standardized as well as more verbose and detailed, which makes it easier for future providers to understand the case. These better notes reduce our claim rejection rate.
And what converted me was direct patient response. Across the board patient feedback is extremely positive, with the most common comment being along the lines of "I really felt like the doctor connected with me better and they were more present in the visit."
These AI scribes really DO improve patient care, I've seen it with my own eyes.
=> the error rate was 7.4% in the version generated by speech recognition software, 0.4% after transcriptionist review, and 0.3% in the final version signed by physicians. Among the errors at each stage, 15.8%, 26.9%, and 25.9% involved clinical information, and 5.7%, 8.9%, and 6.4% were clinically significant, respectively.
=> Omissions dominated error counts (83.8%, p<<0.001), with CAISs varying widely in error frequency and severity, and a median of 1–6 omissions per consultation (depending on CAIS). Although less frequent, hallucinations and factual inaccuracies were more often clinically serious. No tested CAIS produced error-free summaries.
On the gripping hand, people who work in the management end of the US healthcare industry can't be trusted with healthcare or information security to begin with.
My dad likes to joke around and his doctor uses some kind of transcription service. Time for fun!
His doctor asked him about using drugs and he made a joke that was something like "I only use coke" - meaning coca-cola. Of course his doctor knew he was kidding about drinking too much soda because he eats/drinks too much sugar. So they had a little laugh and moved on.
BUT now it's in his medical transcripts. My mom said it "transcribed" it as something like "the patient responded he has used cocaine recently".
I guess his doctor doesn't go in and actually fix things or even read over what the transcription says...
Also both of my parents have accents and have reported really weird transcriptions that don't match what they actually said.
So now my mom has told my dad he can't make jokes with the doctor anymore because even if the doctor knows he's joking it's going to get noted down as a "fact".
This feels like a compelling reason to joke around more.
If inaccuracies make it into your patient record, that's defamatory. Your doctor must sign off on the transcript, and if they're letting poor results through, make it their problem to fix. That'll either force the tech to get better or force a fallback to better note-taking practices.
my father has cardiac issues, serious ones. When a doctor asks what he wants to do he routinely says "Sail around the world, solo!" because that's about the stupidest most risky thing a person with a bad heart could consider.
So now every single doctor reads the transcript and starts with saying "I think it'd be really poorly advised for you to keep considering your worldwide solo voyage."
AI summarization doesn't carry tone well. All but the most humorless listeners would catch from the way he says it that it's a joke.
Imagine if his health insurance premiums got raised because of it, if he loses a job opportunity due to background checks or if he gets arrested because of it. Even going through customs or getting a visa can be tricky with a history of cocaine on your record.
Errors can be a significant problem in manual charting as well.
I know a medical professional who applies an evaluation process similar to the one outlined in your second link, but to human-written charts. They then use that feedback to guide the department on how to improve its charting.
So, don't presume that those error rates cited in those studies should be compared to a baseline rate of zero. If you review human-written charts, you will often also not have an error rate of zero.
It’s been a year or so since I last read The Mote in God’s Eye / The Gripping Hand, but I was randomly thinking of it this morning. Very funny that I would see a reference to it the same day.
Be careful with initial impressions of metrics. We as humans have a heavy tendency to anchor on our first judgments or impressions. We see a win and assume the win is long-term, has no downsides, and is attributable to the new information/change.
Combine that with the Hawthorne effect, and new business or health initiatives can look great simply because participants notice the change and the increased attention. But many human patterns have a tendency to regress to the mean.
Personally I have seen this a lot with developer tools and DevOps. A new SEV/incident/disaster happens and everyone rushes to create or onboard a tool that would help. Around the office everyone raves about it and is sure it will fix all the issues. The number of commits goes up, or the number of SEVs in an area decreases for a while, because people are paying attention. After a while the tool's use slows down; it has rough edges nobody saw, or scenarios that were supposed to be supported never get fully integrated. Eventually the patterns regress, but with more tools and more complexity.
This feels wild to me. I think I am pretty well privacy obsessed, but I don't see it here (fwiw, my wonderful doctor has been using these services for years; originally with overseas human labor, now with AI). First off it presupposes some level of privacy with one's GP that I would only want from a therapist. I don't want health information going beyond my doctor? What about him talking to specialists or getting another opinion in the break room?
Ship's sailed on that level of privacy anyway the second you bill an insurance carrier in the US. I am willing to take this particular risk if something I said two years ago pops up to help explain what I am currently experiencing. I understand not everyone is me and I am lucky to be in relatively good health and not have anything going on that might put employment, etc at risk so I can understand where some people may want to refuse. But the knee-jerk "FUCK NO BECAUSE PRIVACY" is almost as bad as writing a post based on a side plot in The Pitt when said side plot was 110% heightening the stress between Dr. Robby and Dr. Al Hashimi, not a goddamn double-blind study of the effectiveness of AI transcripto-bots.
And if you're going to take lessons from The Pitt about medical record transcription, why isn't it Dr. Santos repeatedly falling asleep while transcribing records?
For now. It always begins as voluntary. But then doctors will start to treat people who opt out the way TSA treats me when I opt out: a hostile adversary.
I already get glares and sighs when I dare to actually read every word of a multipage form I am expected to sign without reading. Was told once I would lose my appointment if I took longer than a few minutes to read more than 10 pages because I could not be checked in until I signed. Other patients are waiting, your exercise of your human rights is inefficient.
Then soon I'll have to pay a higher copay to opt out. Then I won't be able to opt out at all.
All in the name of optimizing patient NPS scores and patient throughput.
>For now. It always begins as voluntary. But then doctors will start to treat people who opt out the way TSA treats me when I opt out: a hostile adversary.
I've never had this problem. IME every doctor's office recommends showing up 15-20 minutes early to a new-patient appointment for the explicit reason of filling out paperwork.
> Was told once I would lose my appointment if I took longer than a few minutes to read more than 10 pages.
I'd be finding a new doctor at that point. Ridiculous. I love how doctors can be 30 minutes late because their appointment delays cascade all day, but if the patient reads a document for 5 minutes, they're the problem!
You can do that by recording and transcribing (many methods) or your doctor has to write on the fly, or worse, has their head in their computer while you talk in their general direction.
Letting doctors talk and examine and not write is a wholly better experience.
Offsite third parties are the problem here. If this was done automatically without data leaving the room, is there a problem? Do you have the same objections to how your digital notes are stored?
I got an erroneous Type II diabetes diagnosis dropped into the note by the AI scribe at my last appointment because my PCP discussed the A1C test he was ordering. Would not recommend. That isn't to say that manually typed notes or speech to text dictated notes are perfect (dot phrases have ended up "documenting" plenty of conversations that never happened), but a false diagnosis of a chronic disease seems like a really bad failure.
I’ve been in tech and medicine too. Consider that any “HUGE” effect in this context is likely exaggerated, especially for something as prosaic as a note-taking assistant.
As a patient sitting with a doctor, I don’t care how standardized the notes are. I don’t care about anyone’s NPS score. I do want the doctor to connect with me, but I also remember not too long ago when doctors did this anyway, without any assistance from robots.
> I also remember not too long ago when doctors did this anyway, without any assistance from robots.
Or with assistance from other humans.
The last time I had surgery, every time I met with the surgeon (about six times), he had an intern following him around with a Thinkpad, typing in everything said.
The intern has the ability to understand context, idiomatic expressions, emotion, and a dozen other important and useful things that an AI transcription will never capture.
That’s probably not an intern. Doctors with enough pull can get dedicated scribes like this, but they aren’t cheap, which is why most doctors don’t get them.
> improvements in patient NPS scores, provider satisfaction, and note quality
How are note quality improvements measured? Vibe-notes might be more verbose and better sounding (which would explain the NPS and satisfaction metrics), but still not actually match the doctor's actual words or intent. Are the AI-generated notes actually compared with ground truth to prove they are accurate?
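One way to make "compared with ground truth" concrete is to score the AI note against a human-verified reference with word error rate. This is only an illustrative sketch of such a check, not any vendor's actual methodology; the example strings below are made up:

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level edit distance divided by reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    # Standard dynamic-programming edit distance over word tokens.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i
    for j in range(len(hyp) + 1):
        d[0][j] = j
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            cost = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + cost)  # substitution
    return d[len(ref)][len(hyp)] / max(len(ref), 1)

# A single substituted word flips the clinical meaning entirely,
# yet barely moves the aggregate score.
print(wer("patient denies cocaine use", "patient reports cocaine use"))  # 0.25
```

Note the limitation the example exposes: "denies" vs "reports" is a 0.25 WER but a catastrophic clinical error, which is why the studies upthread grade errors by clinical significance rather than raw mismatch counts.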
Yep, I would agree as a patient. My current doctor types so slowly that 6 of the 10 short minutes in an appointment just disappear while he types. Even with other docs who can touch type, it would free them up to focus completely on the appointment and reduce the hours they spend charting afterwards.
In an article critiquing over-use of AI assistants, the author confesses at the end that the article itself was partly authored by Claude, which introduced errors in the citations, lol.
Nonetheless, I come away from this article with the sense that ambient devices automating documentation of an encounter are still a net win, with the caveat that the doctor needs to polish the note to reflect his or her own narrative voice.
"I am not saying ambient scribes are bad technology."
is this a counterpoint? he just seems wary of the risk, without taking a firm position, and decided to personally stop using it. people often overestimate their own skills and think their own charting is better than everyone else's; that doesn't mean the tech doesn't work.
1) in the event you find yourself partially or totally disabled but the records don’t really make a good case for it and your provider has a dismissive attitude about filling out additional documentation to substantiate what they failed to in your records.
You’re not necessarily going to get approved for FMLA, STD, LTD, SS etc based on a diagnosis or test results alone. They will nitpick over say, heart failure, as if that’s magically and spontaneously going to go away. If you’re telling your provider that you’re limited by things like oh I don’t know, “I’m only awake for 2-4 hours before I need to sleep again” or “some days I just can’t do it and sleep 20 hours” but it’s not in your chart… expect denials and clarifications and a huge burden on you to prove why it’s limiting.
2) continuity of care, so you don’t end up explaining everything from the top to a specialist or having them re-run all these tests and procedures from square one, when there are months-long backlogs, we already did all this, and you need treatment, but there wasn’t much to work with in your referring chart.
You might not appreciate the “intrusion” if you’re healthy and just worried about your privacy.
If/When things go south and you find yourself fighting these entities for a year or two or three while they nitpick and delay and deny and drag their feet, you’ll be glad an “AI” kept up meticulous records, because this is phenomenally stressful and an endless burden on you when they don’t.
So, their AI slop can vomit out all this extra info on why insurance companies should pay them or why your condition is in fact disabling, and now their AI slop can comb through it looking for all that. Because they will try to avoid paying or approving any kind of leave or benefits if it’s not there
And god forbid you hand them a form where they’re being asked to explain themselves. 50/50 on them being eager to help out or rolling their eyes and saying something really nasty about the imposition. And then even when they do that, they almost never file a copy in your chart so your chart STILL doesn’t substantiate your claims. I’m all for an “ai” doing the progress notes in a case where the facility or provider can’t be fucked to do so.
Happily that’s not true of my current provider, who just, does that anyway (?) But I’ve been around enough to know they’re an exception. Even when providers are on your side and mean well, and want to bend over backwards to help you in any way they can — and I want to just acknowledge that’s the situation I’m in today — honestly , sometimes they just forget some of the details when they do their notes.
That’s why some places make the provider do it in real time while they’re talking to you, so they don’t forget something relevant thirty minutes later. The other side of the coin is that some providers find it distracting or off-putting to be typing away like a stenographer while they’re examining you…
I think it would be fair to say this can all be tedious and a burden for both patients and providers. There’s just a world of difference between a provider who wants to do this to provide excellency in care, and a provider who wants to do this because they resent it and think it’s beneath them.
How do you control for quality variation between patients? In my experience, AI note taking tools display a clear bias against participants who are {quieter, ESL, women, ...}. How can you evaluate whether these biases show up in a medical setting?
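One concrete way to run such an audit: have humans review a sample of transcripts, tally errors per patient group, and compare rates. A minimal sketch; the group labels and counts below are entirely hypothetical:

```python
from collections import defaultdict

def error_rate_by_group(records):
    """records: iterable of (group_label, num_errors, num_words).
    Returns per-group aggregate error rate so disparities are visible."""
    totals = defaultdict(lambda: [0, 0])  # group -> [errors, words]
    for group, errors, words in records:
        totals[group][0] += errors
        totals[group][1] += words
    return {g: e / w for g, (e, w) in totals.items()}

# Hypothetical review tallies from manually checked transcripts.
sample = [
    ("native_speaker", 3, 400),
    ("native_speaker", 1, 350),
    ("esl", 9, 380),
    ("esl", 7, 300),
]
rates = error_rate_by_group(sample)
print(rates)  # in this made-up sample, the ESL rate is several times higher
```

With real data you'd also want enough reviewed encounters per group for a significance test, since small samples can make a harmless gap look like bias (or hide a real one).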
Good, I'm glad. Now find a way to do it in-house. Shipping our conversation to some random-ass fly-by-night SaaS who pinky-swear-promises they're HIPAA-compliant is a non-starter for a medical professional I'd actually want to give money to.
You had it 20 years ago: doctors spoke into recorders, transcriptionists turned that into notes, the docs reviewed them.
The first study I cited replaces the "spoke into recorders" stage with non-AI voice recognition.
The second study replaces the "spoke into recorders" stage with LLM voice recognition, and... crucially... also replaces the educated transcriptionist step with nothing.
I imagine that the real problem is that the voice recognition can be classic or LLM and it just doesn't matter as much as having two humans in the loop instead of one. But that's not a story which gets you to replace cheap voicerec with expensive AI.
If I allow it, is the data from my meeting sent offsite at any stage, for example to an LLM service (e.g., Anthropic, OpenAI, etc.)? Or do the LLM vendors (or any others) have access to the internal data at any stage?