I want to host some LLMs locally and use more advanced models. Since new hardware is out of the question, I think I should be able to pull something off by buying some yesteryear equipment on eBay etc. Has anybody attempted such a project? Does it scale horizontally? (I.e., can I connect two boxes to overcome single-box slowness?)


The sad truth is that Apple Silicon, especially the Ultra chips, is the champion of local inference. Using oMLX instead of Ollama gets the most out of it.
In my region, older Mac Studios are hard to find, but maybe you will have better luck than I did.