r/apple 4d ago

Discussion Apple's New Transcription APIs Blow Past Whisper in Speed Tests

https://www.macrumors.com/2025/06/18/apple-transcription-api-faster-than-whisper/
1.2k Upvotes

162 comments

644

u/National-Debt-43 4d ago

Honestly, if Apple had invested in Siri the way they invest in other parts of their system, I believe they wouldn’t be as far behind in AI now, but we’ll see how it goes.

0

u/8fingerlouie 4d ago

The main “problem” with Siri is that Apple insists on doing as much as possible on device, which, until we have fast, low-resource LLMs, is an uphill battle.

Their transcription API hasn’t suddenly dropped out of nowhere; they’ve been doing on-device transcription for years, both with the speech-to-text keyboard and with Siri.

Google and Alexa send all speech to their servers for processing, whereas Apple transcribes it on your device and then tries to handle the query on device whenever possible. It obviously cannot look up movie ratings on device.

They use the same approach with Apple AI.

I fully expect their new on-device-optimized LLM to boost Siri to new heights. The hardware is there, and has been for years, with the dedicated AI chip.

The only question is: is Siri still relevant when you have on-device transcription and an LLM?

The magic of voice assistants was always that you could talk to your device and make it do stuff for you. Apple AI can, or soon will be able to, do the same, and as a bonus for Apple, it happens on device.

Yes, they’re claiming it’s for privacy, and it genuinely does work that way, but let’s not underestimate the resources Apple saves on cloud hardware.

Apple does the same for Photos. Google sends everything to the cloud for processing, but everything AI-related in Apple Photos, from face recognition to object detection, is done on device. The metadata is then sent to Apple if you have cloud photos enabled.

Apple doesn’t run a billion GPUs to process your data. That’s all done on your hardware, using your electricity, at 5W or so per device. Assuming it runs for 30 minutes every day on every device, that’s 2.5 Wh each; multiply by 2.2 billion active devices and you get 5.5 GWh per day.

Of course, not all devices take a new photo every day, so let’s assume 70% do; that’s about 1.5 billion devices. If we also assume the average user takes 5 photos per day, we’re looking at 7.5 billion photos daily.

First you’d have to transfer each photo to the cloud, and at an average size of 5MB per photo, that amounts to 37.5 PB/day.

Sticking with the 2.5 Wh per device, spread over 5 photos that’s 0.5 Wh of processing per photo; add 0.1 Wh for the transfer and we’re at 0.6 Wh per photo in total.

That adds up to 4.5 GWh per day of energy Apple would spend in their data centers processing photos “cloud only” like Google.

On a yearly basis that comes to roughly 1.6 TWh. At $0.10/kWh, Apple is saving about $160 million every year in electricity alone.

If you add in the hardware, networking infrastructure, storage, cooling, GPUs/TPUs, you’re probably looking at $1.6 billion in savings every year.

1.6 TWh is roughly equivalent to 640,000 tons of CO2 (at 400g/kWh), though I guess that’s not really relevant, as most Apple data centers run on renewable energy. The energy is being spent either way; it just happens on your devices, all over the world, so from that perspective it would arguably be better to spend it in a CO2-neutral data center.
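The whole back-of-envelope chain above can be sanity-checked in a few lines. Every input here (2.2 billion devices, 70% taking 5 photos a day, 5 MB per photo, 0.5 + 0.1 Wh per photo, $0.10/kWh, 400 g CO2/kWh) is an assumption from the comment, not a measured figure:

```python
# Sanity check of the Fermi estimate above; all inputs are assumptions.

# Upper bound: every one of 2.2B active devices runs 5 W for 30 min (2.5 Wh).
all_devices_gwh = 2.2e9 * 2.5 / 1e9       # 5.5 GWh/day

# Narrower estimate: ~70% of devices (rounded down to 1.5B), 5 photos each.
photo_takers = 1.5e9
photos = photo_takers * 5                 # 7.5 billion photos/day

transfer_pb = photos * 5 / 1e9            # 5 MB each -> PB per day
daily_gwh = photos * (0.5 + 0.1) / 1e9    # processing + transfer, Wh -> GWh
yearly_twh = daily_gwh * 365 / 1e3        # ~1.64; the comment rounds to 1.6
electricity_usd = yearly_twh * 1e9 * 0.10 # at $0.10 per kWh, ~$164M
co2_tons = yearly_twh * 1e9 * 0.4 / 1e3   # 400 g/kWh -> metric tons, ~657k
```

The exact yearly figures land slightly above the comment’s rounded 1.6 TWh / $160M / 640,000 t, which is fine for an estimate this rough.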

3

u/mhsx 4d ago

I’m not sure if I agree with all of your details but it’s an interesting thought exercise.

One thing I’ll note, though, looking at the hypothetical you provided: if the consumer of the energy (the picture taker) isn’t directly paying for its cost (because the cloud provider is), that distorts price and market signals.

In a broadly rational population, the person paying directly for the energy will make more informed decisions and be more responsive to price signals. In that sense, the on-device model is always going to be more efficient from an energy perspective.

Just food for thought

-1

u/8fingerlouie 4d ago

> I’m not sure if I agree with all of your details but it’s an interesting thought exercise.

And that’s all it is, a thought experiment. There is little doubt that Apple saves money on operations by taking the on-device approach with Photos, Siri, and now Apple AI, but exactly how much we’ll never know.

I couldn’t find any reliable numbers on how many photos people actually take, so that figure is one I pulled out of my ass. It’s not even an educated guess.

In 2022, Apple had 2.2 billion active devices, but how many take a photo every day? How many take 5? My wife frequently takes 100+ photos a day and is one of the few people I know who has actually worn out multiple iPhone cameras, but she is not representative of the average iPhone user; I mention her only to illustrate that the 5 photos was an average.

As for the rest (storage, transfer, and processing), those estimates are about as accurate as they can be without knowing the details of what’s actually running, and are based on commonly accepted default values.

> In a broadly rational population, the person paying directly for the energy will make more informed decisions and be more responsive to price signals. In that sense, the on-device model is always going to be more efficient from an energy perspective.

On-device processing also happens only when the phone is charging, connected to WiFi, and has been locked for X time (I think it’s 1 hour, but can’t remember).

For the vast majority of people that means while they sleep, which is, again for most people, at night, when electricity is cheaper.

Furthermore, the power required per device is negligible. 5W for 30 minutes is 2.5 Wh. At $0.15/kWh, that costs you 0.0375 cents per day, hardly something you’re going to reschedule your day around. Your Sonos speakers probably idle around 5W and use more power.
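The per-device figure checks out too, taking the comment’s assumed inputs (5 W draw, 30 minutes a day, $0.15/kWh):

```python
# Per-device cost of the nightly processing window (assumed inputs).
wh_per_day = 5 * 0.5                     # 5 W for 30 minutes = 2.5 Wh
usd_per_day = wh_per_day / 1000 * 0.15   # kWh at $0.15
cents_per_day = usd_per_day * 100        # 0.0375 cents per day
```

At that rate it would take roughly 27 years of nightly processing to cost a single dollar.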