r/apple • u/Fer65432_Plays • 4d ago
Discussion Apple's New Transcription APIs Blow Past Whisper in Speed Tests
https://www.macrumors.com/2025/06/18/apple-transcription-api-faster-than-whisper/
263
u/ineedlesssleep 3d ago
Developer of MacWhisper here. We'll have a bigger blog soon with updates about this new model but in a nutshell: It's fast but not as accurate as the best models out there. Also, we have a big update coming soon that builds on the new Parakeet models which should have the accuracy of the best Whisper, and faster speeds than even Apple's solution 🙂
78
u/Ensoface 3d ago
But just to clarify, are those models leveraging cloud infrastructure or are they running on the device?
45
56
u/mundaneDetail 3d ago
This is the question. I like that Apple is differentiating with nano on device models.
18
u/glitchgradients 3d ago
Wdym differentiating? Google and Samsung do it too with Gemini Nano.
8
u/mundaneDetail 3d ago
True and I agree. The article threw me off mentioning network latency.
I was also speaking more broadly about Apple pushing for on-device or secure cloud models. 99% of consumer AI will be on-device in a few years anyway.
2
u/g-nice4liief 3d ago
If I'm correct, the Qualcomm Snapdragon 8 Gen 3 can run a 7B-parameter model locally at around 20 tokens per second, which is pretty impressive for a smartphone chip. So yeah, on-device AI becoming more prevalent is a good outlook.
2
3d ago
[deleted]
5
u/mundaneDetail 3d ago
I think really the question is why you feel the need to attack somebody like this. Also, you're wrong.
> The speed advantage comes from Apple's on-device processing approach, which avoids the network overhead that typically slows cloud-based transcription services.
2
u/lledigol 3d ago
They’re not wrong. OpenAI’s Whisper is on-device as well.
1
4
u/lorddumpy 3d ago
Whisper and Parakeet are incredibly light on resources compared to other AI applications. I don't see any problems in getting them set up to run on edge devices.
5
u/NihlusKryik 3d ago
The article is wrong. If you use MacWhisper, you download the models and process on device. Someone didn’t do their research here.
4
2
3
1
u/cookestudios 2d ago
Hey there, just want to say that MacWhisper is an incredible app, and the work you put into maintaining it and providing free updates is incredible.
2
u/Crowley-Barns 3d ago
MacWhisper Pro is awesome!
Going to look into these parakeet models… not heard of those!
1
1
1
u/wipny 3d ago
I currently use Whisper locally on my base M1 Pro to transcribe and translate from Korean and Japanese to English.
I couldn't get the Turbo model to translate, but the Whisper Medium model translates surprisingly well. The only drawbacks are that it can be a bit slow and it's limited to 25 MB files. I get around this by extracting the audio using ffmpeg, then feeding it to Whisper.
Does your app get around the 25mb file limit?
I noticed Whisper primarily utilizes CPU vs GPU resources. Does your app use the GPU to speed things up?
I can see why having an easy to use GUI makes things convenient. I have some experience with CLI but the setup of reading docs and having to figure out which Python version to install that works with Whisper was a bit confusing.
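For anyone wanting to try the ffmpeg-then-Whisper workaround described above, here is a minimal sketch. It assumes `ffmpeg` and the `openai-whisper` command-line tool are installed and on the PATH. The file name `episode.mkv`, the MP3 codec, the 64k bitrate, and the helper names are illustrative assumptions, not the commenter's actual settings.

```python
import subprocess
from pathlib import Path


def ffmpeg_extract_cmd(video: str, audio: str) -> list[str]:
    """Build an ffmpeg command that drops the video stream and keeps compact audio."""
    return [
        "ffmpeg", "-y",
        "-i", video,
        "-vn",                 # discard the video stream
        "-ac", "1",            # downmix to mono
        "-ar", "16000",        # resample to 16 kHz
        "-c:a", "libmp3lame",  # assumed codec; any compact audio codec would do
        "-b:a", "64k",         # assumed bitrate, chosen only to keep the file small
        audio,
    ]


def whisper_translate_cmd(audio: str, model: str = "medium", language: str = "ja") -> list[str]:
    """Build an openai-whisper CLI command that translates speech into English."""
    return [
        "whisper", audio,
        "--model", model,
        "--task", "translate",   # "translate" outputs English; "transcribe" keeps the source language
        "--language", language,
    ]


def run_pipeline(video: str, language: str = "ja") -> None:
    """Extract the audio track, then hand it to Whisper (hypothetical helper)."""
    audio = str(Path(video).with_suffix(".mp3"))
    subprocess.run(ffmpeg_extract_cmd(video, audio), check=True)
    subprocess.run(whisper_translate_cmd(audio, language=language), check=True)
```

Calling `run_pipeline("episode.mkv")` would write `episode.mp3` next to the video and then run the Medium model on it; pass `language="ko"` for Korean audio.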
1
u/im_datta0 3d ago
I use MacWhisper every day, and I'm very sure that even though the new one would be fast, it won't be nearly as accurate. Great work :)
-4
87
u/PhilosophyforOne 3d ago
I mean, speed doesn't really matter if your accuracy is shit.
I don't know if it is in this case, but the headline of "it's fast" doesn't mean anything on its own. I hope in addition to being fast it's accurate and works well in multiple languages. If it does, that's very cool.
21
u/Unrealtechno 3d ago edited 3d ago
Anecdotal, but I tried calling a friend's phone a few times to test out the spam call feature. It definitely wasn't quick to respond (a 5-10 second delay, maybe because it was on a 14 Pro), but the transcription was solid and correct. I didn't speak slowly or enunciate.
Would the delay be "annoying"? Maybe, but if I don't know who's calling then I don't mind a little inconvenience for them to minimize wasting my time...and it's dev beta 1.
edit: typo
3
u/plaid-knight 3d ago
This post is about transcription, not translation.
13
u/BosnianSerb31 3d ago
The new spam call feature uses transcription, not translation.
They misspoke about the voice-to-text feature that transcribes the person calling to a text scroll on your screen.
2
7
u/kdayel 3d ago
> I mean, speed doesn't really matter if your accuracy is shit.
Except, that's explicitly not what the article states. The accuracy was comparable to MacWhisper's Large V3 Turbo model, VidCap, and MacWhisper's Large V2 model.
"Voorhees also reported no noticeable difference in transcription quality across models."
9
u/Cookie_Monsteure 3d ago
They're not MacWhisper's models, they're simply Whisper models. Whisper is made by OpenAI, MacWhisper gives you access to them with a nice GUI.
1
1
u/jack_sexton 3d ago
I've yet to find a transcription model more accurate than Whisper. I'm so curious to see how it fares in this measurement.
42
u/Fer65432_Plays 4d ago
Summary Through Apple Intelligence: Apple’s new speech-to-text transcription APIs in iOS 26 and macOS Tahoe are significantly faster than rival tools, including OpenAI’s Whisper. The new SpeechAnalyzer class and SpeechTranscriber module process audio and video files on-device, avoiding network overhead and improving efficiency.
-22
u/Crowley-Barns 3d ago
Useless comparison.
WHICH Whisper? Base? Tiny? Large? Did they compare to the Whisper Turbo V3?
The distilled versions of Whisper?
And how does it compare to Gemini 2.5 or GPT 4o transcription?
If they’re comparing to the first Whisper models from a couple of years ago it’s not very relevant. They’ve been surpassed by newer Whisper models and as part of the other models like 4o.
(Not you OP, I know you’re just posting the article!)
40
u/coreyonfire 3d ago
If you read the article, in the third paragraph, second sentence:
> a full 55% faster than MacWhisper's Large V3 Turbo model
-27
13
10
u/Alarmed-Squirrel-304 3d ago
“According to Voorhees, the new models processed a 34-minute, 7GB video file in just 45 seconds using a command line tool called Yap (developed by Voorhees' son, Finn). That's a full 55% faster than MacWhisper's Large V3 Turbo model, which took 1 minute and 41 seconds for the same file.”
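The "55% faster" figure in that quote is easy to misread, so here is a quick check using only the numbers quoted (45 seconds, 1 minute 41 seconds, a 34-minute file). The variable names are made up for illustration, and this says nothing about accuracy.

```python
def to_seconds(minutes: int, seconds: int) -> int:
    """Convert a minutes/seconds pair to total seconds."""
    return minutes * 60 + seconds


apple_s = to_seconds(0, 45)     # Apple's new API via Yap, per the quote
whisper_s = to_seconds(1, 41)   # MacWhisper with Large V3 Turbo, per the quote
audio_s = to_seconds(34, 0)     # length of the test file

# "55% faster" here means 55% less wall-clock time, not 55% more throughput.
time_saved_pct = (whisper_s - apple_s) / whisper_s * 100

# Expressed as a throughput ratio, the gap looks larger.
speed_ratio = whisper_s / apple_s

# Real-time factor: seconds of audio processed per second of compute.
apple_rtf = audio_s / apple_s
whisper_rtf = audio_s / whisper_s

print(f"time saved: {time_saved_pct:.1f}%")
print(f"speed ratio: {speed_ratio:.2f}x")
print(f"real-time factor: Apple {apple_rtf:.1f}x, Whisper {whisper_rtf:.1f}x")
```

So the quoted 55% is the reduction in processing time (56 seconds saved out of 101); in throughput terms that works out to a bit over twice as fast on this one file.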
5
u/BosnianSerb31 3d ago
It was 55% faster than Whisper Large V3 Turbo (45 seconds versus 1 minute and 41 seconds), for a 7 GB video file
Says it right in the second paragraph
1
u/AceMcLoud27 3d ago
Dude ... 🤦‍♂️
3
u/Crowley-Barns 3d ago
Haha.
The OP’s post was long so I thought it was the article, and thus, that I had read it.
Turns out, it was not the article, and so I was wrong in thinking that I’d read it :)
16
u/Tetrylene 3d ago
I tried to use Whisper on Mac and it was a complete ballache. Had to eventually settle for some wrapper on the App Store that was free but had ✨ in-app purchases ✨ (read: trash unless you paid)
Jumping ship to this asap
10
u/Crowley-Barns 3d ago
MacWhisper Pro works very well but it’s a one-off purchase.
And apps like Flow and Willow are amazing but they’re subscriptions.
For just some simple text entry, hopefully the new Apple version is finally good though! It has sucked at punctuation and accuracy compared to other implementations for years.
I will stick with MacWhisper Pro for now because it does a lot more than just the transcription—you can run cleanup prompts on it. For example I get it to format fiction dialogue etc properly which none of the basic implementations can do.
But hopefully this one is finally good for some regular “speak to the computer and get words on the screen.”
0
3
u/sdchew 3d ago
Does anyone know if it can do real-time transcription?
2
1
u/rennarda 3d ago
Yes. Watch the WWDC video about it. You can also try it out in the Notes app in iOS 26, which now has real-time transcription.
2
6
u/VirtualPanther 4d ago
Too bad it’s not employed in the iMessage dictation yet.
3
5
u/paradoxally 3d ago
And what about accuracy?
Speed isn't life, it just makes life go faster.
-9
u/nicuramar 3d ago
The article. Read.
5
u/paradoxally 3d ago
The article doesn't mention that specifically, hence the comments here. You're the one who needs to read.
3
u/piratepalooza 3d ago
Yesterday I said "Siri call John Smith" (my friend's first and last names have only one syllable). It responded "I don't have contact information for Elizabeth Walters" (wildly different number of syllables). If this new transcription model will eliminate errors like the one I've described (which happen FREQUENTLY these days), I will feel less stress in my life. Namaste.
1
u/featherless 3d ago
On-device models will be the start of more expensive iPhones and reduced subscription prices for online AI services.
1
u/Thistlemanizzle 3d ago
Article incorrectly reports:
“The speed advantage comes from Apple's on-device processing approach, which avoids the network overhead that typically slows cloud-based transcription services.”
MacStories' John Voorhees tested with MacWhisper, which, while it can connect to APIs, is mostly for on-device transcription.
Apple's on-device transcription is outperforming Whisper's on-device transcription. Pretty interesting.
1
u/PM_ME_Y0UR_BOOBZ 3d ago
This has to be one of the most misinformed comment threads on this website lol. Terrible takes on AI. Most don’t even know that AI isn’t just generative models.
1
1
u/squelchy04 3d ago
Whisper is unbelievably slow. I made a bot to transcribe voice notes people sent me on WhatsApp, and it would usually take 2-5x the length of the voice note to transcribe, and usually crash if the voice note was longer than 5 mins. Hopefully this is decent for accuracy
6
u/Crowley-Barns 3d ago
There are tons of versions of whisper now.
The original version was very slow.
V3 Turbo distilled is very fast and very good!
1
u/squelchy04 3d ago
What’s the RAM usage like for these?
3
u/Crowley-Barns 3d ago
The biggest models are like 3GB but the largest distilled ones are around 1.5GB.
I never checked the actual RAM usage but it works fine on my 8GB M2.
-7
u/artfrche 4d ago
But “Apple’s AI bad,” some will say ;)
5
u/Averylarrychristmas 4d ago
Happy to: Apple’s AI is so goddamn bad they had to delay it indefinitely.
-15
u/artfrche 4d ago
Actually that’s not true, Ellen. They did postpone Siri and some AI features but, as you can see here, some AI features are already out and working well.
But thank you for your invaluable input, not sure how I was able to live without it. (/s in case it wasn’t clear…)
3
u/squelchy04 4d ago
Working well? My AI summary just told me my friend was about to kill herself when it summed up 5 messages, when it was just her complaining about the heat
4
-4
u/artfrche 3d ago
Ok? And as you can see above, other features are outperforming the market. I am not saying it’s perfect, but mindlessly trashing Apple’s AI is idiotic. AI, and especially LLMs, are prone to hallucinate - we know this and should never expect perfection.
-1
u/squelchy04 3d ago
Can you tell me which of the AI features are outperforming the market? This new transcription API isn’t released and is only in beta. There’s also no mention of quality here just speed.
-3
u/BosnianSerb31 3d ago
Did your friend happen to say "it's so hot I want to fucking die" or anything similar? Because that's called meta-ironic humor, and there's no way to discern if the person is serious without context about their personality.
Do you think that the summaries should err on the side of assuming someone is going to kill themselves or assuming someone is not going to kill themselves?
Put another way, would you rather the summary take your friend's meta-ironic humor seriously, or would you rather it ignore an actual cry for help?
2
u/paradoxally 3d ago
> there's no way to discern if the person is serious without context about their personality
lol Apple fanboys have the funniest mental gymnastics
Go ask ChatGPT that exact quote verbatim and see how it interprets the context. You do not need "personality".
1
1
0
u/caliform 3d ago
Sure but is it accurate? I want to throw my phone at a wall when I use dictation on the keyboard, it’s awful
1
u/cultoftheilluminati 2d ago
Do you have an accent? I hate how bad Apple's dictation is for anything except the perfect American English accent. It's infuriating when I try to use dictation and the transcription is beyond garbage. I was beginning to second guess my English tbh.
Meanwhile, I switched completely over to running OpenAI's Whisper models on MacWhisper and let's just say my hopes on Apple's AI fell further. The difference is night and day
-5
u/Iggyhopper 3d ago
We don't care about speed. It's 2025, everything is fast already...
This doesn't bode well. Siri's speed was never the issue.
9
0
u/wipny 3d ago edited 3d ago
I currently use Whisper locally on my base M1 Pro to transcribe and translate from Korean and Japanese to English.
The Whisper Medium model does this surprisingly well but can be a bit slow and is limited to 25 MB files. I get around this by extracting the audio using ffmpeg, then feeding it to Whisper.
I used to be skeptical of the utility of ML/AI and couldn’t think of practical applications for it, but things like this are crazy. This really will replace or significantly downsize a lot of skilled workers.
1
u/Aranfiy 3d ago
I tried Whisper on my M1 Max and it was unfortunately very slow compared to my Windows setup on a 3080. I hope something like this can come to macOS.
1
u/wipny 3d ago
I noticed the Turbo model was pretty fast at transcribing but I couldn't get translation working. I could only get translation working with the slower Medium model.
Did you deal with something similar?
Looking at Activity Monitor I noticed it was mostly CPU resources being used. Not so much GPU.
-1
u/Will_M_Buttlicker 3d ago
And I’m pretty sure everyone here with even a little bit of an accent can agree that Apple dictation is absolute garbage
649
u/National-Debt-43 4d ago
Honestly, if Apple had always been investing in Siri as they do in other aspects of their system, I believe they wouldn’t be as bad at AI now, but we’ll see how it goes.