And even better, “testing” it. Maybe I’m sloppy, but I end up with failed runs, errors, hacks, and hours of “tinkering” just trying to get something to launch, which feels like an utter waste of an A100 that mostly sits idle… Hence I often don’t do it at all.
One thing to keep in mind is that the compute power of this thing is not comparable to an A100/H100, especially if you take a big slowdown under ROCm, so what could take you 2-3 days could take a week. It’d be nice if Framework sold a cheap MI300A, but… shrug.
I don’t mind that it’s slower; I would rather wait than waste time on machines billed at several dollars per hour.
I’ve never locked up an A100 for that long. I’ve used them for full work days and was glad I wasn’t paying directly.