In current machines, that's way off depending on how you choose to count "1 CPU instruction" for the metaphor.
Take Apple's latest laptops. They have 16 CPU cores: 12 of them clocking at 4.5 GHz and able to decode/dispatch up to 10 instructions per cycle, and 4 of them clocking at 2.6 GHz; I'm not sure about their decode/dispatch width, but let's assume 10 as well. Those decoder widths don't translate to that many instructions per cycle in practice, but let's roll with it because the order of magnitude is close enough.
If the instructions are just right, that's about 644 instructions per nanosecond (12 × 4.5 GHz × 10 plus 4 × 2.6 GHz × 10). Or, roughly a million times faster than the 6502 in the Apple II! Computers really have got faster, and we haven't even counted all the cores yet.
Scaling those to one per second, a RAM fetch taking 100ns would scale to 64,400 seconds, which is about 17.9 hours, most of a day.
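Here's the same back-of-envelope arithmetic as a small Python sketch; the core counts, clock speeds and 10-wide dispatch are just the assumptions above, not measured figures:

    # Paper throughput of the CPU cores, using the assumed figures above.
    P_CORES, P_CLOCK_GHZ, P_WIDTH = 12, 4.5, 10   # performance cores
    E_CORES, E_CLOCK_GHZ, E_WIDTH = 4, 2.6, 10    # efficiency cores (width is a guess)

    # GHz x instructions/cycle gives instructions per nanosecond per core.
    instr_per_ns = (P_CORES * P_CLOCK_GHZ * P_WIDTH
                    + E_CORES * E_CLOCK_GHZ * E_WIDTH)
    print(f"{instr_per_ns:.0f} instructions per ns")               # ~644

    # Metaphor: stretch one instruction to one second, then see how long
    # a 100 ns RAM fetch becomes.
    scaled_seconds = instr_per_ns * 100
    print(f"{scaled_seconds:,.0f} s = {scaled_seconds / 3600:.1f} hours")  # 64,400 s, ~17.9 h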
Fine, but we forgot about the 40 GPU cores and the 16 ANE cores! More instructions per ns!
Now we're definitely into "days".
For the purpose of the metaphor, perhaps we should also count the multiple lanes of each vector instruction on the CPU, and the lanes on the GPU cores, as if they were separate instructions.
One way to measure that, which seems fair and useful to me, is to look at TOPS instead - tera operations per second. How many operations can the processor complex do per second? I wasn't able to find good figures for the Apple M4 Max as a whole, only for the ANE component, for which 38 TOPS is claimed. For various reasons it's reasonable to estimate the GPU is the same order of magnitude in TOPS on those chips.
If you count 38 TOPS as equivalent to "CPU instructions" in the metaphor, then scale those to 1 per second, a RAM fetch taking 100ns scales to a whopping 43.9 days on a current laptop!
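The same scaling, sketched out; the 38 TOPS is Apple's claimed figure for the Neural Engine, taken at face value as an order-of-magnitude input:

    # Scale the ANE's claimed 38 TOPS the same way: one operation = one second.
    TOPS = 38
    ops_per_ns = TOPS * 1e12 / 1e9     # 38,000 operations per nanosecond
    scaled_seconds = ops_per_ns * 100  # a 100 ns RAM fetch in metaphor-time
    print(f"{scaled_seconds / 86400:.1f} days")   # ~44 days (the "43.9 days" above)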
If you're counting all instructions executing in parallel with the maximum on-paper IPC on all CPUs, accelerators, and GPUs, the number you get has no clear relation to RAM latency. It really is comparing apples and oranges.
This scenario where all your 16 cores are doing 10 instructions per clock assumes everything is running without waiting, at full instruction-level and CPU-level parallelism. It's a measure of the maximum paper throughput when you're not blocked waiting on memory.
You could compare that to the maximum throughput of the RAM and the memory subsystem, and that would give you meaningful numbers (for instance, how many bytes/cycle can my cores handle? How many GB/s can my whole system process?).
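For instance, a minimal sketch of that kind of comparison; the 546 GB/s is, I believe, Apple's headline memory-bandwidth figure for the M4 Max, and the instruction rate is the on-paper estimate from above, so treat both as rough assumptions:

    # Compare aggregate compute throughput to aggregate memory throughput.
    mem_bandwidth_bytes_s = 546e9   # Apple's headline bandwidth figure (assumed)
    instr_per_s = 644e9             # on-paper CPU estimate from above

    # Bytes of RAM traffic available per instruction slot, if every core
    # issues at full width while the memory system streams at full rate.
    print(f"{mem_bandwidth_bytes_s / instr_per_s:.2f} bytes per instruction")  # ~0.85

    # Roughly how many bytes per cycle per core, treating all 16 cores as
    # if they ran at 4.5 GHz (a simplification).
    print(f"{mem_bandwidth_bytes_s / 16 / 4.5e9:.1f} bytes/cycle/core")        # ~7.6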
Trying to add up the combined throughput of everything you can find on one side, and the latency of a single fetch on the other, will give you a really big number, but as a metaphor it will be more confusing than anything.
This seems like the classic 9 women making a baby in 1 month. Even if the CPU can execute 644 instructions per nanosecond, it can't execute 1 instruction in 1/644 nanoseconds. You can't mix throughput and latency like that.
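A toy illustration of the difference, with purely made-up workloads: issue width lets independent fetches overlap, but a chain of dependent fetches (a linked-list walk, say) pays the full latency every time:

    # Latency vs throughput, toy numbers only.
    FETCH_LATENCY_NS = 100
    N = 1_000

    # N independent fetches can overlap, so the total time is roughly one
    # latency (ignoring issue time and bandwidth limits).
    independent_ns = FETCH_LATENCY_NS            # ~100 ns

    # N dependent fetches must run back to back: each address comes from
    # the previous result, so no amount of issue width helps.
    dependent_ns = N * FETCH_LATENCY_NS          # 100,000 ns

    print(independent_ns, dependent_ns)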