Couldn’t you just treat the socketed RAM like another layer of memory? Effectively, L1-3 are on the CPU, “L4” would be soldered RAM, and then L5 would be extra socketed RAM. Alternatively, couldn’t you just treat it like really fast swap?
Wrote a longer reply to someone else, but briefly, yes, you are correct. Kinda.
Caches won’t help with bandwidth-bound compute (read: “AI”) if the streamed dataset is significantly larger than the cache. A cache will only speed up repeated access to a limited set of data.
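To put rough numbers on that, here is a toy simulation. It is only a sketch: it assumes a plain LRU replacement policy and made-up sizes (a 1000-block cache, 10 passes over the data), and the function names are my own. Real hardware caches and OS paging are more complicated, but the effect is the same in kind.

```python
from collections import OrderedDict


def lru_hit_rate(accesses, capacity):
    """Fraction of accesses served from an LRU cache holding `capacity` blocks."""
    cache = OrderedDict()
    hits = 0
    total = 0
    for block in accesses:
        total += 1
        if block in cache:
            hits += 1
            cache.move_to_end(block)  # mark as most recently used
        else:
            cache[block] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict least recently used
    return hits / total if total else 0.0


def repeated_sweeps(n_blocks, passes):
    """Stream over blocks 0..n_blocks-1 in order, `passes` times."""
    for _ in range(passes):
        for block in range(n_blocks):
            yield block


CAPACITY = 1000  # hypothetical cache size, in blocks
PASSES = 10      # hypothetical number of passes over the dataset

# Working set fits in the cache: only the first pass misses.
fits = lru_hit_rate(repeated_sweeps(500, PASSES), CAPACITY)

# Working set is 5x the cache: each block is evicted before it is reused.
streams = lru_hit_rate(repeated_sweeps(5000, PASSES), CAPACITY)

print(f"fits in cache:    {fits:.0%} hits")
print(f"larger than cache: {streams:.0%} hits")
```

When the data fits, everything after the first pass is a hit. When the streamed set is several times larger than the cache, every block gets evicted before the next pass reaches it again, so the fast tier does nothing and you run at the speed of the slow tier.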