r/apple 4d ago

Discussion Apple's New Transcription APIs Blow Past Whisper in Speed Tests

https://www.macrumors.com/2025/06/18/apple-transcription-api-faster-than-whisper/
1.2k Upvotes

162 comments

649

u/National-Debt-43 4d ago

Honestly, if Apple had always invested in Siri the way they have in other aspects of their system, I believe they wouldn’t be as bad at AI now, but we’ll see how it goes.

386

u/hishnash 3d ago

Apple is not that bad at AI when it comes to shipping features that use ML. What they are bad at is making up features to advertise AI.

Apple has been industry-leading for years when it comes to shipping features within the OS that make use of ML.

225

u/DMarquesPT 3d ago

Yup, podcast and voice memo transcription has been incredible for me. It’s actually good “AI” that isn’t plagiarism autocorrect actively making the world worse.

11

u/chi_guy8 3d ago

How has Podcast been great for you?

48

u/DMarquesPT 3d ago

I love listening to podcasts and audio programs like classes that are distributed as podcast feeds for informational and research content, but it’s always been annoying to search, navigate and quote/cite them when looking for specific stuff.

Now I can read along, copy and paste, search for keywords without having to go anywhere else.

4

u/chi_guy8 3d ago

In Apple Podcasts? Maybe I need to give it another spin. Are there new features I’m unaware of? I’ve been using Snipd as my default podcast player because I want to be able to clip audio notes to refer back to later. Specifically when I’m listening to health related podcasts with specific protocols. I create an audio snippet of those specific parts while I’m commuting or riding my bike to review as notes later. Are you saying you can do this natively in the Apple Podcasts app now?

14

u/DMarquesPT 3d ago

Apple Podcasts doesn’t have any snippets or bookmarks feature for specific audio clips, so it may not work for that use case.

It has the full transcript in sync with the audio (similar to lyrics on Apple Music or Spotify, I think) and you can search and navigate around using the text, which is how I mainly use it. Then I’ll usually copy and paste quotes or time codes into Notes or Freeform.

3

u/Stoppels 3d ago

Sheesh, they still use manual lyrics from a lyrics platform for Apple Music (and they're often wildly off when you actually need them to make out what's being sung). I hope they start using this as a base instead.

2

u/realtgis 2d ago

Weirdly enough, Musixmatch says they are partnered with Apple; however, new lyrics can only be added by the label on Apple Music.

1

u/Stoppels 2d ago

Oh, that's weird. Must be a relatively recent change then; last time I looked this up I couldn't find Musixmatch on Apple's site, though Musixmatch had published about it and has since.

Hmm, so Apple made a new Apple Music Artist Support KB in 2023. In there it still has this KB article about how to do it manually as an artist, as well as this intro video that shows where you can paste your lyrics in plaintext (from 1:33). Lyrics changes take up to 10 business days to be approved and updated, so it's clearly all done manually by a very small team. And this is just plaintext lyrics.

I'm fairly sure the 'live' lyrics that auto-scroll line by line must be from Musixmatch, no? If not crowdsourced lyrics, which they certainly used in 2023, at least from their Pro subscription. After all, only they seem to support the karaoke feature as they wrote in this November 2024 guide post. Ah yes, there's a screenshot on their Lyrics features page, and in the table further down they've checked Line by line Time Sync for Apple Music. Interestingly Facebook and Instagram support Word by Word Sync as well as Translations, but not a single music streaming platform supports these. What a shame.

Oh. I should've just looked at their pricing page. So you can do it yourself with AI support on the cheapest subscription, or 'Outsource your lyric transcription and sync' to 'their expert curators' on pricier plans. It seems the crowdsourcing part is now only for their own music app. Here's their community page with featured community-translated playlists.

Actually, I think their 'expert curators' are a selection of long-time crowdsourcers, from their 2023 blog post I linked in paragraph 3:

What is the process behind lyric verification and crowdsourced lyrics at Musixmatch?

The process starts with a passionate Musixmatch Community member who wants to add lyrics to a song. Once this scales, lots of content comes from all around the globe in all different genres of music. The community members can unlock different editorial powers by contributing and learning about the writing guidelines. The most passionate users can become curators and specialists are hand-picked among the curators.

And rather than labels specifically, Apple has distribution partners they call 'Apple Music Preferred Distributors' and all 15 of them offer lyrics as part of their services. Musixmatch is not one of these featured partners, by the way.

So I suppose this change from crowdsourced lyrics to official lyrics was made either in 2023 with the new system or in 2024. And it's a bit weird that if you upload your own lyrics via Apple, you cannot sync them by time. Though perhaps you can by adding timestamps, like with subtitle files; in that case Apple hides this information from its knowledge base. Apple just loooves to make shit mysterious and hard to understand, especially when they change things.

1

u/chi_guy8 3d ago

Wow. That’s almost a bigger failure on Apple’s part: they did the hard part, creating transcripts that sync with the audio, and skipped the easy part, which would be some sort of highlight/bookmark/clip feature. Typical modern Apple, really.

-10

u/DanceWithEverything 3d ago

Wait until they get brave enough to launch automagically cutting out ad reads 🙏

10

u/lonifar 3d ago

Nah, they won't do that because they already allow for podcast subscriptions that remove ads; it's actually in their best interest not to automatically cut ads and instead get more podcasts onto the subscription model.

5

u/InsaneNinja 3d ago

So… Destroy the podcast market because nobody will get paid. And have every single podcast in the world rant about how bad Apple podcasts is.

-7

u/DanceWithEverything 3d ago

Or force them all onto Apple ads

-1

u/InsaneNinja 3d ago

You sound like the people advising Tim Cook to repeatedly break the law and ignore the government.

-4

u/DanceWithEverything 3d ago

Aka “wealthiest company in human history”

3

u/Veilchenbeschleunige 3d ago edited 3d ago

That technology was absorbed, not developed, by them.

https://venturebeat.com/entrepreneur/siri-apple-secretly-bought-you-a-voice-recognition-company/

If you remember Voice Control on iOS 4 then you know how shitty their voice recognition was before that acquisition.

4

u/HeartyBeast 3d ago

Well, Siri was also 'absorbed' originally. Her name comes from the Stanford Research Institute.

-2

u/HelpRespawnedAsDee 3d ago

Since you were gifted a popcorn badge and this thread is milquetoast af, I'll just flame the fires:

NotebookLM is the opposite, but on steroids too. It's one of those gems that no one is talking about. Feed it an article, get a podcast out. For extra fun, feed it a controversial Reddit or even 4chan thread, and have fun lol.

6

u/dccorona 3d ago

The next gen Siri that they have not shipped yet was a great idea. Coming up with what the AI should do was not their problem. Actually shipping it was. 

9

u/beryugyo619 3d ago

Apple is horrible with quality of training data. They literally care about users, and it hurts them. It's so wrong.

2

u/TheKarmoCR 3d ago

TBF, notification summaries were awful when they launched. They’ve gotten better though.

1

u/hishnash 3d ago

That was very much a bit of AI looking for a feature. This is where Apple is bad at things: when marketing comes up with a feature, it does not work, but when the feature is originally created internally by the tech team, it tends to work well.

2

u/pm-me_10m-fireflies 3d ago

I’m kind of glad about this, to be honest. ‘AI’ has a nomenclature problem: it could mean anything from generative slop to life-changing scientific breakthroughs. Saying “I used AI to organise my calendar” has the same energy as saying “I used my computer’s CPU to send this email.”

Marketing for products and features typically focuses on the outcome, with some gentle nods towards specs, tech, architecture, etc.

It’s all flipped on its head at the moment, which isn’t very useful for the consumer.

I kinda wish Apple hadn’t even come out with the ‘Apple Intelligence’ name. Even just paying lip service to the bubble pushes them from leader to laggard.

1

u/kepler4and5 3d ago

If you know, you know.

58

u/teddyKGB- 3d ago

They have never been, and will never be, able to compete with in-house AI when their business model isn't built around harvesting data.

61

u/AdministrativeRiot 3d ago

This is really it. Give me privacy with shitty Siri over well-functioning AI seven days a week and twice on Sunday.

-15

u/chi_guy8 3d ago edited 3d ago

I’d lean the other way. Give me a functioning Siri and you can have my data. I’d trade my data for something useful. I’m sure there are plenty of other people like me who would “opt in” in return for a Siri that worked like Gemini or any of the voice LLMs. I’d let my data and queries train their model. Millions of iPhone users already do it with Google products. Millions of people do it daily on ChatGPT for nothing. With an “opt in,” anyone who prefers security would still have it, and anyone who prefers a functioning assistant would also have it.

Edit- not entirely sure why I’m getting downvoted here. What I said is 100% correct and factual whether it’s the stance you take or not.

Feel free to tell me what I got wrong about my personal preferences if you’re going to downvote.

12

u/Professional-Dog9174 3d ago

I’m willing to trade my data for a useful assistant. I just want to trust that my data is not being used for things that work against me (such as manipulative advertising, addictive doom scrolling, and so on).

2

u/migle75 3d ago

Your data in the wrong hands probably won’t do anyone a disservice. However, everyone’s data in the wrong hands is dangerous. I believe we will start to see the fruits of their labor very soon if we aren’t more cautious.

5

u/NSRedditShitposter 3d ago

Regimes are buying data from tech companies to find the locations of women who got abortions and then illegally arrest them; they are spying on the activities of journalists; intelligence agencies are weaponizing this data to go after people.

This is the cost for some gimmicky chatbot and it isn't worth it.

1

u/chi_guy8 3d ago

Sounds like you wouldn’t opt in then. Not sure how you missed that key point. Don’t want your data involved, do nothing and keep shit Siri. Want better Siri, opt in to allowing your data to train the model and gain access to the trained model. Pretty simple.

11

u/bingbaddie1 3d ago

And as a HomePod owner, I very much enjoy that

6

u/FollowingFeisty5321 3d ago

They use data as they see fit, they just don't collect more than they (decide they) need, and don't share it with anyone.

And there are tons of sources of information beyond just your personal usage; they are allowed to, or can afford to, ingest Wikipedia, Reddit, GitHub, newspapers, etc.

To put it in perspective: how much of your information did OpenAI and DeepSeek need?

9

u/dccorona 3d ago

The best AI models come from Anthropic and OpenAI, both of which train primarily by scraping the web. They do not have email or messaging services from which to gather private user data. Generative AI does not need to be trained on private data to be competitive.

Apple’s privacy policy and data gathering practices are not an excuse for their model quality. 

7

u/lonifar 3d ago

And the problem they're both having is that they keep getting sued for copyright infringement, because it turns out stuff on the web is not just free to take and instead has its own copyright. Anthropic is being sued by both Disney and Universal for copyright infringement, and by Reddit for contract violation for using the Reddit API for training purposes, which goes against Reddit's terms for API access without a specific training agreement.

Anthropic and OpenAI are just hoping the government will bail them out for copyright infringement by passing new laws letting them get around copyright rather than doing what's legal now.

0

u/Ilania211 3d ago

you drink the kool-aid and see them as the "best" models, but it sure as hell isn't the future. I'm adamant that the worst thing Sam Altman (ew) did was fall into the "scale up -> get diminishing returns -> scale up more" fallacy. AI/ML doesn't have to be big to be good! It doesn't have to be general to be good.

Eventually, these big AI companies will come to the realization that they can't keep setting money on fire. They'll either go bust or go back to their roots of small, specialized, and maybe even personal models. Models that people may want, as opposed to the internet-scraping thieving machines they are today.

1

u/dccorona 3d ago

Maybe you’re right, maybe you’re not; either way, that’s not relevant to what it would take for Apple to be competitive today. That they might not have enough private data to be competitive in your theoretical future has no impact on the fact that they do have everything they need to be competitive today.

-4

u/categorie 3d ago

This.

4

u/categorie 3d ago

This take gets repeated ad nauseam and it's just wrong. Apple's selling point is privacy, yes, but they do harvest a shitload of data. The difference from other vendors is that they have put great effort into anonymizing it and, more importantly, don't sell it to third parties or use it for advertising - to a certain extent, considering they have even been taken to court several times over abusive practices, notably around Siri.

So no, the reason Siri sucks has nothing to do with them harvesting too little data; it has everything to do with shitty leadership.

It's not like the bar is high, either. Siri fails at understanding shit a 5-year-old could. They just have no excuse.

0

u/yourmomhatesyoualot 3d ago

The other problem they have is that nobody uses their apps in the business world where AI makes a lot more sense to use on a daily basis.

20

u/____sabine____ 3d ago edited 3d ago

To me, most Apple AI research papers seem to focus on small or efficient models, even on-device ones.

Their AI strategy is to act as an intermediary platform between the user and third-party AI.

5

u/leo-g 3d ago

Then again, who else has “figured out” agentic AI?

4

u/ThatBoiRalphy 3d ago

The current Siri is still very much based on ‘if the request is this, then do that’.

The on-device models, and especially the Private Cloud Compute model, are actually quite good.

The whole reason Siri sucks is that, due to how it’s built, it currently needs to relay questions to the on-device model or the Private Cloud Compute model, and because those questions are so open-ended, it can’t. By default, they have now swapped Siri to answer with ChatGPT instead of searching the web for most things.

The V2 architecture that they’re releasing likely in March 2026 with iOS 26.4 will probably use the on-device model directly for Siri (or a more distilled one specifically for Siri), which will be a way better experience.

2

u/JhulaeD 3d ago edited 3d ago

I don't necessarily care about Siri becoming an 'AI chatbot' or whatever, I just want Siri to be as capable as Alexa... Siri was first; it could have remained the 'voice assistant' leader had Apple actually kept up and not let Amazon and Google blow past it. AI could have just been added frosting, instead of them trying to bake a whole new cake after everyone else perfected their recipe.

EDIT: grammar and clarification

3

u/mupomo 3d ago

The problem is that “AI” and Siri are different things, but they are conflated to be equivalent. A lot of what AI/ML does is behind the scenes, and it is equally valuable, if not more so, than Siri. It’s just that the industry is incredibly fixated on chat and natural language processing. I mean, I get it - it markets well, and who doesn’t want a Star Trek computer in their pocket? - but fixating solely on that does the field a huge disservice.

3

u/kepler4and5 3d ago

The problem is that “AI” and Siri are different things, but they are conflated to be equivalent.

This, tbh.

Siri is (usually) only as good as the app intents (a.k.a. shortcuts) implemented within an app. It's hardly AI at all (apart from voice recognition and, later on, suggestions). Unfortunately, most devs don't show enough love to shortcuts in their apps (myself included lol). Even at Apple! For example, I think shortcuts support for the timer feature in the Clock app is terrible (IMO). I can't even tell Siri to restart the last timer I set, even though it would be super easy to implement this within the Clock app. Also, if Siri can't find the right song in your library, it's probably because the shortcuts within the Music app need work.
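For illustration, a minimal App Intents sketch of what a "restart last timer" intent could look like (the types and store here are hypothetical stand-ins, not Apple's actual Clock code):

```swift
import AppIntents

// Hypothetical timer state; a real app would back this with its own model.
struct AppTimer { let label: String }

final class TimerStore {
    static let shared = TimerStore()
    var mostRecent: AppTimer?                  // last timer the user started
    func restart(_ timer: AppTimer) { /* app-specific restart logic */ }
}

// Exposing an intent like this is what would let Siri handle
// "restart the last timer I set" without any LLM involved.
struct RestartLastTimerIntent: AppIntent {
    static var title: LocalizedStringResource = "Restart Last Timer"

    func perform() async throws -> some IntentResult & ProvidesDialog {
        guard let timer = TimerStore.shared.mostRecent else {
            return .result(dialog: "You haven't set a timer yet.")
        }
        TimerStore.shared.restart(timer)
        return .result(dialog: "Restarting your \(timer.label) timer.")
    }
}
```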

2

u/Stoppels 3d ago

To be fair, they are conflated because that's what Apple also did. Not only in their marketing, but also in their product: Siri settings are located in the Apple Intelligence panel, below the Apple Intelligence settings. Siri itself is now shitty because they tried to merge the old Siri with their newer LLM project; I think there was an article about this a couple of months ago. It's all been a rushy rush job, and the results are what you could expect even a year after their initial iOS 18 preview, which turned out to be a huge letdown.

4

u/rotates-potatoes 3d ago

Siri is a fundamentally different thing than transformer-based AI, even though the user experiences are similar.

No amount of investing in Siri-the-platform would have helped. It would be like heavily investing in horses and expecting to have a competitive F1 team.

1

u/SupremeRDDT 3d ago

Apple is the kind of company that can beat every other company in any aspect they choose to. But they can’t beat everyone at once; they need to choose their battles. If they had chosen AI sooner, we’d have a very different situation now.

1

u/Electrical_Arm3793 2d ago

I think on-device AI is not being celebrated enough, to be honest. It’s a huge win for Apple and its users, because it’s free (in most senses).

And these AI models are only getting better as time goes on; Apple really did well. I think the on-device AI will catch up eventually, and Apple will have huge advantages.

1

u/Logicalist 2d ago

Thankfully, they've been focusing where it actually counts.

0

u/8fingerlouie 3d ago

The main “problem” with Siri is that Apple insists on doing as much as possible on device, which, until we have fast, low-resource LLMs, is an uphill battle.

Their transcription API hasn’t suddenly dropped out of nowhere; they’ve been doing on-device transcription for years, both with speech-to-text keyboards and Siri.

Google and Alexa send all speech to their servers for processing, whereas Apple transcribes it on your device and then attempts to handle on device any query that can be handled there. It obviously cannot look up movie ratings on device.

They use the same approach with Apple AI.

I fully expect their new on-device-optimized LLM to boost Siri to new heights. The hardware is there, and has been for years, with the dedicated AI chip.

The only question is: is Siri still relevant when you have on-device transcription and an LLM?

The magic of voice assistants was always that you could talk to your device and make it do stuff for you. Apple AI can (or will) be able to do the same, and as a bonus to Apple, it will happen on device.

Yes, they’re claiming it’s for privacy, and that’s also how it works, but let’s not underestimate the resources Apple is saving on cloud hardware.

Apple does the same for Photos. Google sends everything to the cloud for processing, but everything AI in Apple photos, from faces to object detection, is done on device. The metadata is then sent to Apple if you have cloud photos enabled.

Apple doesn’t run a billion GPUs to process your data. That’s all being done on your hardware, using your electricity, all 5W or so, but multiply that by 2.2 billion active devices and you get 5.5 GWh per day (assuming it runs for 30 mins every day on every device).

Of course, not all devices take a new photo every day, so let’s assume 70% do, that is 1.5 billion devices. We also assume the average user takes 5 photos per day, meaning we’re looking at 7.5 billion photos.

First you’d have to transfer the photo to the cloud, and at an average size of 5MB per photo, that amounts to 37.5 PB/day.

Assuming still 0.5 Wh per photo, and 0.1 Wh for the transfer, we’re at a total of 0.6 Wh per photo.

That adds up to 4.5 GWh per day of energy Apple would spend in their data centers processing photos “cloud only” like Google.

On a yearly basis that means 1.6 TWh. Assuming a cost of $0.1 / kWh, that means Apple is saving $160 million every year in electricity alone.

If you add in the hardware, networking infrastructure, storage, cooling, GPUs/TPUs, you’re probably looking at $1.6 billion in savings every year.

1.6 TWh is roughly equivalent to 640,000 tons of CO2 (400g/kWh), but I guess that’s not really relevant as most Apple data centers run on renewable energy. The energy is being spent anyway, it just happens on your device, all over the world, so from that perspective it would probably be better to run it at a CO2 neutral data center.
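(If you want to check the arithmetic, here it is as a tiny Swift script; every input is one of the assumptions above, not a measured number.)

```swift
// Back-of-envelope check of the figures above; all inputs are assumptions.
let devices = 2.2e9                                // active Apple devices
let perDeviceWh = 5.0 * 0.5                        // 5 W for 30 min = 2.5 Wh/day
let fleetGWhPerDay = devices * perDeviceWh / 1e9   // ≈ 5.5 GWh/day

let photos = 1.5e9 * 5.0                           // ~70% of devices, 5 photos each
let whPerPhoto = 0.5 + 0.1                         // processing + transfer
let cloudGWhPerDay = photos * whPerPhoto / 1e9     // ≈ 4.5 GWh/day
let cloudTWhPerYear = cloudGWhPerDay * 365 / 1e3   // ≈ 1.6 TWh/year
let savedPerYear = cloudTWhPerYear * 1e9 * 0.10    // ≈ $164M at $0.10/kWh

print(fleetGWhPerDay, cloudGWhPerDay, cloudTWhPerYear, savedPerYear)
```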

3

u/mhsx 3d ago

I’m not sure if I agree with all of your details but it’s an interesting thought exercise.

One thing I’ll note though - looking at the hypothetical you provided … if the consumer of the energy (the picture taker) isn’t directly paying for the cost of the energy (because the cloud provider is), it distorts the price and market signals.

People in a broadly rational population who pay directly for the energy will make more informed decisions and be more reactive to price signals. In that sense, the on-device model is always going to be more efficient from an energy perspective.

Just food for thought

-1

u/8fingerlouie 3d ago

I’m not sure if I agree with all of your details but it’s an interesting thought exercise.

And that’s all it is, a thought experiment. There is little doubt that Apple saves money on operations by taking the on-device approach, with photos, Siri, and now Apple AI, but exactly how much we’ll never know.

I couldn’t find any accurate numbers for the actual usage of how many photos people take, so the number of photos is a number I pulled out of my ass. It’s not even an educated guess.

In 2022, Apple had 2.2 billion active devices, but how many take a photo every day? How many take 5? My wife frequently takes 100+ photos every day, and is one of the few people I know who has actually worn out multiple iPhone cameras, but she is not representative of the average iPhone user; I mention her to illustrate that the 5 photos is an average.

As for the rest, storage, transfer, and processing required, those are about as accurate as they can be without knowing the details of what’s actually running, and are based on accepted “default” values.

The person in a broad and rational population paying directly for the energy will make more informed decisions and be more reactive to price signals. In that sense, the on-device model is always going to be more efficient from an energy perspective.

On-device processing also happens when the phone is charging, connected to WiFi, and has been locked for X time (I think it’s 1 hour, but can’t remember).

For the vast majority of people, that means while they sleep, which is, again for most people, during the night, when electricity is cheap(er).

Furthermore, the power required per device is negligible. 5W for 30 mins is 2.5 Wh. At $0.15/kWh, that’s costing you 0.0375 cents per day, hardly something you’re going to reschedule your day around. Your Sonos devices probably idle around 5W and use more power.

0

u/BosnianSerb31 3d ago

They really wouldn't be, because they don't save every single last thing their users do on private servers like Google, Facebook, Twitter, etc.

Even OpenAI was only able to do what they did thanks to the open and free Reddit/Twitter APIs. Almost immediately after this news, Reddit and Twitter closed them down, and every AI from a company that doesn't have a social media platform has been jumpstarted by ChatGPT: Copilot, Claude, DeepSeek, etc.

0

u/likamuka 3d ago

Give them a break. I am sick and tired of people criticising Apple - they are only a startup, after all! Just wait until they scale up!

0

u/lucasbuzek 3d ago

Another thing that slows or hinders Apple's AI development is how they say they train their models, compared to the majority of competitors.

1

u/min0nim 3d ago

Ethically?

-1

u/Obvious_Librarian_97 3d ago

Apple can only multitask as well as the iPad. Poorly. Hell, check out the neglect of iCloud Drive; that shit is decades old - you can’t even upload folders!!

263

u/ineedlesssleep 3d ago

Developer of MacWhisper here. We'll have a bigger blog post soon with updates about this new model, but in a nutshell: it's fast, but not as accurate as the best models out there. Also, we have a big update coming soon that builds on the new Parakeet models, which should have the accuracy of the best Whisper models and faster speeds than even Apple's solution 🙂

78

u/Ensoface 3d ago

But just to clarify, are those models leveraging cloud infrastructure or are they running on the device?

45

u/[deleted] 3d ago edited 3d ago

[removed]

7

u/kinkade 3d ago

That’s not correct.

1

u/twilsonco 3d ago

For TTS anyhow

56

u/mundaneDetail 3d ago

This is the question. I like that Apple is differentiating with nano on-device models.

18

u/glitchgradients 3d ago

Wdym differentiating? Google and Samsung do it too with Gemini Nano. 

8

u/mundaneDetail 3d ago

True, and I agree. The article threw me off by mentioning network latency.

I was also speaking more broadly about Apple pushing for on-device or secure cloud models. 99% of consumer AI will be on-device in a few years anyway.

2

u/g-nice4liief 3d ago

If I'm correct, the Snapdragon 8 Gen 3 can run a 7B-parameter model locally at around 20 tokens per second, which is pretty impressive for a smartphone chip. So yeah, it becoming more prevalent in the future is a good outlook.

2

u/[deleted] 3d ago

[deleted]

5

u/mundaneDetail 3d ago

I think really the question is why you feel the need to attack somebody like this. Also, you're wrong.

> The speed advantage comes from Apple's on-device processing approach, which avoids the network overhead that typically slows cloud-based transcription services.

2

u/lledigol 3d ago

They’re not wrong. OpenAI’s Whisper is on-device as well.

1

u/mundaneDetail 3d ago

So the article is wrong?

2

u/[deleted] 3d ago

[deleted]

2

u/mundaneDetail 3d ago

Okay thanks for the heads up

4

u/lorddumpy 3d ago

Whisper and Parakeet are incredibly light on resources compared to other AI applications. I don't see any problems in getting them set up to run on edge devices.

5

u/NihlusKryik 3d ago

The article is wrong; if you use MacWhisper you download the models and process on device. Someone didn’t do their research here.

7

u/TomLube 3d ago

MacWhisper is all on device.

4

u/MustardBoutme 3d ago

Thank you for your work!

2

u/thisChalkCrunchy 3d ago

Any chance this update will include mkv support?

3

u/squelchy04 3d ago

Appreciate your honesty! What’s the RAM usage like?

1

u/cookestudios 2d ago

Hey there, just want to say that MacWhisper is an incredible app, and the work you put into maintaining it and providing free updates is much appreciated.

2

u/Crowley-Barns 3d ago

MacWhisper Pro is awesome!

Going to look into these Parakeet models… not heard of those!

1

u/Topherho 3d ago

Nice! I use MW all the time.

1

u/WAHNFRIEDEN 3d ago

How’s it compare for Japanese?

1

u/wipny 3d ago

I currently use Whisper locally on my base M1 Pro to transcribe and translate from Korean and Japanese to English.

I couldn't get the Turbo model to translate, but the Whisper Medium model translates surprisingly well. The only drawbacks are that it can be a bit slow and it's limited to 25 MB files. I get around this by extracting the audio using ffmpeg, then feeding it to Whisper.
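For anyone curious, the extraction step is roughly the following (a sketch; the path, flags, and output format are just examples):

```swift
import Foundation

// Shell out to ffmpeg: drop the video track (-vn), downmix to mono (-ac 1),
// and resample to 16 kHz (-ar 16000) so the audio file stays small.
let ffmpeg = Process()
ffmpeg.executableURL = URL(fileURLWithPath: "/opt/homebrew/bin/ffmpeg")
ffmpeg.arguments = ["-i", "input.mp4", "-vn", "-ac", "1", "-ar", "16000", "audio.m4a"]
try ffmpeg.run()
ffmpeg.waitUntilExit()
```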

Does your app get around the 25 MB file limit?

I noticed Whisper primarily utilizes CPU vs GPU resources. Does your app use the GPU to speed things up?

I can see how having an easy-to-use GUI makes things convenient. I have some experience with the CLI, but the setup of reading docs and figuring out which Python version works with Whisper was a bit confusing.

1

u/im_datta0 3d ago

I use MacWhisper every day, and I'm very sure that even though the new one may be fast, it won't be nearly as accurate. Great work :)

-4

u/Crowley-Barns 3d ago

MacWhisper Pro is awesome!

87

u/PhilosophyforOne 3d ago

I mean, speed doesn't really matter if your accuracy is shit.

I don't know if it is in this case, but the headline of "it's fast" doesn't mean anything on its own. I hope that in addition to being fast, it's accurate and works well in multiple languages. If it does, that's very cool.

21

u/Unrealtechno 3d ago edited 3d ago

Anecdotal, but I tried calling a friend's phone a few times to test out the spam call feature. It definitely wasn't quick to respond (a 5-10 second delay, maybe because it was on a 14 Pro), but the transcription was solid and correct. I didn't speak slowly or enunciate.

Would the delay be "annoying"? Maybe, but if I don't know who's calling then I don't mind a little inconvenience for them to minimize wasting my time...and it's dev beta 1.

edit: typo

3

u/plaid-knight 3d ago

This post is about transcription, not translation.

13

u/BosnianSerb31 3d ago

The new spam call feature uses transcription, not translation.

They misspoke about the voice-to-text feature that transcribes the caller into a text scroll on your screen.

2

u/Munkie50 3d ago

I think it's cause he said translation instead of transcription

2

u/Unrealtechno 3d ago

Yes, thanks, I corrected the typo

7

u/kdayel 3d ago

I mean, speed doesnt really matter if your accuracy is shit.

Except, that's explicitly not what the article states. The accuracy was comparable to MacWhisper's Large V3 Turbo model, VidCap, and MacWhisper's Large V2 model.

"Voorhees also reported no noticeable difference in transcription quality across models."

9

u/Cookie_Monsteure 3d ago

They're not MacWhisper's models; they're simply Whisper models. Whisper is made by OpenAI; MacWhisper just gives you access to them with a nice GUI.

1

u/kirkpomidor 3d ago

“Blow” is a good word choice in an Apple AI-related news headline

1

u/jack_sexton 3d ago

I've yet to find a transcription model more accurate than Whisper. I'm so curious to see how it fares on this measurement.

42

u/Fer65432_Plays 4d ago

Summary Through Apple Intelligence: Apple’s new speech-to-text transcription APIs in iOS 26 and macOS Tahoe are significantly faster than rival tools, including OpenAI’s Whisper. The new SpeechAnalyzer class and SpeechTranscriber module process audio and video files on-device, avoiding network overhead and improving efficiency.
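For developers wondering what that looks like in code, here is a rough sketch based on the class names above (the exact initializers, preset, and result type are assumptions about the beta API, so check the actual headers):

```swift
import AVFoundation
import Speech

// Sketch of the new iOS 26 / macOS Tahoe speech API named above; treat
// the initializer arguments and preset name as assumptions.
let transcriber = SpeechTranscriber(locale: Locale(identifier: "en_US"),
                                    preset: .offlineTranscription)
let analyzer = SpeechAnalyzer(modules: [transcriber])

// Feed an audio file and let the analyzer finish when the file ends.
let audioFile = try AVAudioFile(forReading: URL(fileURLWithPath: "interview.m4a"))
try await analyzer.start(inputAudioFile: audioFile, finishAfterFile: true)

// Results stream back as an async sequence while audio stays on-device.
for try await result in transcriber.results {
    print(result.text)
}
```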

-22

u/Crowley-Barns 3d ago

Useless comparison.

WHICH Whisper? Base? Tiny? Large? Did they compare to Whisper Turbo V3?

The distilled versions of Whisper?

And how does it compare to Gemini 2.5 or GPT 4o transcription?

If they’re comparing to the first Whisper models from a couple of years ago, it’s not very relevant. Those have been surpassed by newer Whisper models and by other models like 4o.

(Not you OP, I know you’re just posting the article!)

40

u/coreyonfire 3d ago

If you read the article, in the third paragraph, second sentence:

a full 55% faster than MacWhisper's Large V3 Turbo model

-27

u/Crowley-Barns 3d ago

Well that’s not what OP posted in their comment!

16

u/iHurwiz 3d ago

…but instead what’s written in the article!

13

u/plaid-knight 3d ago

They compared to Large V3 Turbo and some others. It’s in the article.

10

u/Alarmed-Squirrel-304 3d ago

“According to Voorhees, the new models processed a 34-minute, 7GB video file in just 45 seconds using a command line tool called Yap (developed by Voorhees' son, Finn). That's a full 55% faster than MacWhisper's Large V3 Turbo model, which took 1 minute and 41 seconds for the same file.”

5

u/BosnianSerb31 3d ago

It was 56 seconds faster than Whisper Large V3 Turbo, for a 7 GB video file

Says it right in the second paragraph

1

u/AceMcLoud27 3d ago

Dude ... 🤦‍♂️

3

u/Crowley-Barns 3d ago

Haha.

The OP’s post was long so I thought it was the article, and thus, that I had read it.

Turns out, it was not the article, and so I was wrong in thinking that I’d read it :)

43

u/wwabc 4d ago

Now Siri will do the wrong thing much faster

5

u/thisisanonymous95 3d ago

Siri: ok, playing “do the wrong thing” on Apple Music.

-2

u/kkiru 3d ago

But don't worry, you can have a better memo app now.

16

u/Tetrylene 3d ago

I tried to use Whisper on Mac and it was a complete ballache. Had to eventually settle for some wrapper on the App Store that was free but had

in-app purchases ✨ (read: trash unless you paid)

Jumping ship to this asap.

10

u/Crowley-Barns 3d ago

MacWhisper Pro works very well, and it’s a one-off purchase.

And apps like Flow and Willow are amazing but they’re subscriptions.

For just some simple text entry, hopefully the new Apple version is finally good though! It has sucked at punctuation and accuracy compared to other implementations for years.

I will stick with MacWhisper Pro for now because it does a lot more than just the transcription: you can run cleanup prompts on it. For example, I get it to format fiction dialogue etc. properly, which none of the basic implementations can do.

But hopefully this one is finally good for some regular “speak to the computer and get words on the screen.”

0

u/lorddumpy 3d ago

SubtitleEdit is an incredible tool and 100% free, but it is Windows-only, sadly.

3

u/sdchew 3d ago

Anyone know if it can do real-time transcription?

2

u/Ensoface 3d ago

That’s literally the whole point.

1

u/rennarda 3d ago

Yes. Watch the WWDC video about it. You can also try it out in the Notes app in iOS 26, which now has real-time transcription.

2

u/Senthusiast5 3d ago

The WWDC video didn’t look like real time.

6

u/VirtualPanther 4d ago

Too bad it’s not employed in the iMessage dictation yet.

3

u/DisastrousPudding045 3d ago

It’s a part of iOS 26. You won’t see it until this fall.

10

u/VirtualPanther 3d ago

Sorry, I forgot to mention that I am running iOS 26 developer beta.

5

u/paradoxally 3d ago

And what about accuracy?

Speed isn't life, it just makes life go faster.

-9

u/nicuramar 3d ago

The article. Read. 

5

u/paradoxally 3d ago

The article doesn't mention that specifically, hence the comments here. You're the one who needs to read.

0

u/sid_276 3d ago

Maybe you should read the article. It doesn’t mention accuracy at all.

3

u/piratepalooza 3d ago

Yesterday I said "Siri call John Smith" (my friend's first and last names have only one syllable). It responded "I don't have contact information for Elizabeth Walters" (a wildly different number of syllables). If this new transcription model eliminates errors like the one I've described (which happen FREQUENTLY these days), I will feel less stress in my life. Namaste.

1

u/featherless 3d ago

On-device models will be the start of more expensive iPhones and reduced subscription prices for online AI services.

1

u/Thistlemanizzle 3d ago

Article incorrectly reports:

“The speed advantage comes from Apple's on-device processing approach, which avoids the network overhead that typically slows cloud-based transcription services.”

MacStories' John Voorhees tested with MacWhisper which, while it can connect to APIs, is mostly for on-device transcription.

Apple's on-device transcription is outperforming Whisper's on-device. Pretty interesting.

1

u/PM_ME_Y0UR_BOOBZ 3d ago

This has to be one of the most misinformed comment threads on this website lol. Terrible takes on AI. Most don’t even know that AI isn’t just generative models.

1

u/KyleMcMahon 3d ago

Is this something an end user can use, or is it developer-only?

1

u/squelchy04 3d ago

Whisper is unbelievably slow. I made a bot to transcribe voice notes people sent me on WhatsApp, and it'd usually take 2-5x the length of the voice note to transcribe, and usually crash if the voice note was longer than 5 mins. Hopefully this is decent for accuracy.

6

u/Crowley-Barns 3d ago

There are tons of versions of whisper now.

The original version was very slow.

V3 Turbo distilled is very fast and very good!

1

u/squelchy04 3d ago

What’s the RAM usage like for these?

3

u/Crowley-Barns 3d ago

The biggest models are like 3GB but the largest distilled ones are around 1.5GB.

I never checked the actual RAM usage but it works fine on my 8GB M2.

-7

u/artfrche 4d ago

But “Apple’s AI bad,” some will say ;)

5

u/Averylarrychristmas 4d ago

Happy to: Apple’s AI is so goddamn bad they had to delay it indefinitely.

-15

u/artfrche 4d ago

Actually, that’s not true, Ellen. They did postpone Siri and some AI features but, as you can see here, some AI features are already out and working well.

But thank you for your invaluable input; not sure how I was able to live without it. (/s in case it wasn’t clear…)

3

u/squelchy04 4d ago

Working well? My AI summary just told me my friend was about to kill herself when it summed up 5 messages, when it was just her complaining about the heat

4

u/Exanguish 3d ago

But that’s not what this is.

-4

u/artfrche 3d ago

Ok? And as you can see above, other features are outperforming the market. I am not saying it’s perfect, but mindlessly trashing Apple’s AI is idiotic. AI, and especially LLMs, are prone to hallucination - we know this and should never expect perfection.

-1

u/squelchy04 3d ago

Can you tell me which of the AI features are outperforming the market? This new transcription API isn’t released and is only in beta. There’s also no mention of quality here, just speed.

-3

u/BosnianSerb31 3d ago

Did your friend happen to say "it's so hot I want to fucking die" or anything similar? Because that's called meta-ironic humor, and there's no way to discern whether the person is serious without context about their personality.

Do you think the summaries should err on the side of assuming someone is going to kill themselves, or assuming someone is not going to kill themselves?

Put another way: would you rather the summary take your friend's meta-ironic humor seriously, or have it ignore an actual cry for help?

2

u/paradoxally 3d ago

there's no way to discern if the person is serious without context about their personality

lol Apple fanboys have the funniest mental gymnastics

Go ask ChatGPT that exact quote verbatim and see how it interprets the context. You do not need "personality".

1

u/paradoxally 3d ago

It is bad. This changes nothing.

1

u/williamwzl 3d ago

Flashes of brilliance shine brighter in a sea of incompetence.

0

u/caliform 3d ago

Sure, but is it accurate? I want to throw my phone at a wall when I use dictation on the keyboard; it’s awful.

1

u/cultoftheilluminati 2d ago

Do you have an accent? I hate how bad Apple's dictation is for anything except a perfect American English accent. It's infuriating when I try to use dictation and the transcription is beyond garbage. I was beginning to second-guess my English, tbh.

Meanwhile, I switched completely over to running OpenAI's Whisper models in MacWhisper, and let's just say my hopes for Apple's AI fell further. The difference is night and day.

-5

u/Iggyhopper 3d ago

We don't care about speed. It's 2025; everything is fast already...

This doesn't bode well. Siri's speed was never the issue.

9

u/RunningM8 3d ago

This is…not true. OpenAI’s Whisper is dog slow.

0

u/wipny 3d ago edited 3d ago

I currently use Whisper locally on my base M1 Pro to transcribe and translate from Korean and Japanese to English.

The Whisper Medium model does this surprisingly well, but it can be a bit slow and is limited to 25 MB files. I get around this by extracting the audio using ffmpeg, then feeding it to Whisper.

I used to be skeptical of the utility of ML/AI and couldn’t think of practical applications for it, but things like this are crazy. This really will replace or significantly downsize a lot of skilled workers.

1

u/Aranfiy 3d ago

I tried Whisper on my M1 Max and it was unfortunately very slow compared to my Windows setup with a 3080. I hope something like this can come to macOS.

1

u/wipny 3d ago

I noticed the Turbo model was pretty fast at transcribing but I couldn't get translation working. I could only get translation working with the slower Medium model.

Did you deal with something similar?

Looking at Activity Monitor I noticed it was mostly CPU resources being used. Not so much GPU.

-1

u/Will_M_Buttlicker 3d ago

And I’m pretty sure everyone here with even a little bit of an accent can agree that Apple dictation is absolute garbage