What AI services are you selfhosting? Or, have tested and passed on

kiol@lemmy.world · 10 months ago

What AI services are you selfhosting? Or, have tested and passed on

kata1yst@sh.itjust.works · 10 months ago

I use Ollama & Open WebUI: Ollama on my gaming rig and Open WebUI as a frontend on my server.

It’s been a really powerful combo!
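For anyone wondering how the two machines talk to each other in a split like this: a frontend just sends HTTP requests to Ollama's API on the GPU box. Here is a minimal sketch of the same call from Python, assuming Ollama is listening on its default port 11434 and is reachable from the server (which usually means setting `OLLAMA_HOST=0.0.0.0` on the gaming rig). The hostname `gaming-rig.lan` and the model name are made up for illustration.

```python
import json
import urllib.request


def build_generate_request(host: str, model: str, prompt: str) -> urllib.request.Request:
    """Build a POST request for Ollama's /api/generate endpoint."""
    url = f"http://{host}:11434/api/generate"  # 11434 is Ollama's default port
    payload = {"model": model, "prompt": prompt, "stream": False}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def ask(host: str, model: str, prompt: str, timeout: float = 120.0) -> str:
    """Send the prompt to a remote Ollama instance and return the generated text."""
    request = build_generate_request(host, model, prompt)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


# Example call (hypothetical host and model, needs a running Ollama instance):
# print(ask("gaming-rig.lan", "llama3.2", "Why is the sky blue?"))
```

Open WebUI does the equivalent once you point it at the Ollama URL in its settings, so this is only useful for scripting against the same backend.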

kiol@lemmy.world · 10 months ago

Would you please talk more about it? I forgot about Open WebUI, but I’m intending to start playing with it. Honestly, what do you actually do with it?

Oisteink@feddit.nl · 10 months ago

I have the same setup, but it’s not very usable as my graphics card has 6 GB of VRAM. I want one with 20 or 24, as the 6B models are a pain and the tiny ones don’t give me much.

Ollama was pretty easy to set up on Windows, and it’s easy to download and test the models Ollama has available.
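A rough back-of-the-envelope for why 6 GB gets tight: the weights alone take about (parameters × bits per weight ÷ 8) bytes, and the context cache and runtime buffers come on top. The sketch below is only an estimate; the 1.5 GB overhead figure is an assumption for illustration, not a measured value, and real usage varies with context length and backend.

```python
def estimate_vram_gb(params_billion: float, bits_per_weight: float,
                     overhead_gb: float = 1.5) -> float:
    """Rough VRAM estimate in GB for running a quantized model.

    weights: params (in billions) * bits per weight / 8 gives GB of weights.
    overhead_gb: assumed allowance for KV cache and buffers (a guess).
    """
    weights_gb = params_billion * bits_per_weight / 8
    return weights_gb + overhead_gb


def fits(params_billion: float, bits_per_weight: float, vram_gb: float) -> bool:
    """True if the estimate fits entirely in the given amount of VRAM."""
    return estimate_vram_gb(params_billion, bits_per_weight) <= vram_gb


print(estimate_vram_gb(7, 4))   # 7B model at 4-bit quantization
print(estimate_vram_gb(24, 4))  # 24B model at 4-bit quantization
print(fits(24, 4, 6))           # does a 24B model fit on a 6 GB card?
```

By this estimate a 4-bit 7B model just about squeezes onto a 6 GB card, while a 24B model wants something in the 16 to 24 GB range.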

kiol@lemmy.world · 10 months ago

Sounds like you and I are in a similar place of testing.

Oisteink@feddit.nl · 10 months ago

Possibly. Been running it since last summer, but like I say, the small models don’t do much good for me. I have tried llama3.1, olmo2, deepseek r1 in a few variants, qwen2, qwen2.5-coder, mistral, codellama, starcoder2, nemotron-mini, llama3.2, gemma2 and llava.

I pay for Perplexity and Mistral, which give much better quality. Open WebUI is great though; it’s my hardware that is lacking.

Oisteink@feddit.nl · 10 months ago

Scrap that - after upgrading it went bonkers and will always use one of my «knowledges» no matter what I try. The websearch fails even with ddg as engine. It’s always seemed like the UI was made by unskilled labour, but this is just horrible. 2/10 not recommended

Lucy :3@feddit.org · 10 months ago

Sex chats. For other uses, just simple searches are better 99% of the time. And for the 1%, something like Kagi’s FastGPT helps to find the correct keywords.

colourlesspony@pawb.social · 10 months ago

I messed around with Home Assistant and the Ollama integration. I have passed on it and just use the default one with voice commands I set up. I couldn’t really get Ollama to do or say anything useful. Like I asked it what’s a good time to run on a treadmill for beginners and it told me it’s not a doctor.

metoosalem@feddit.org · 10 months ago

Like I asked it what’s a good time to run on a treadmill for beginners and it told me it’s not a doctor.

Kirkland brand Meeseeks energy.

Starfighter@discuss.tchncs.de · 10 months ago

There are some experimental models made specifically for use with Home Assistant, for example home-llm.

Even though they are tiny 1-3B I’ve found them to work much better than even 14B general purpose models.

That being said they’re still LLMs. I like to keep the “prefer handling commands locally” option turned on and only use the LLM as a fallback.
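The "local first, LLM as fallback" idea can be sketched roughly like this. To be clear, this is not Home Assistant's actual code or API; the command table, the function names and the `llm` callable are all hypothetical, and the point is only the order of the two steps.

```python
from typing import Callable, Dict, Optional

# Hypothetical table of phrases that can be handled without any model.
LOCAL_COMMANDS: Dict[str, str] = {
    "turn on the lights": "light.turn_on",
    "turn off the lights": "light.turn_off",
}


def handle_locally(text: str) -> Optional[str]:
    """Return the matching action for a known phrase, or None if unknown."""
    return LOCAL_COMMANDS.get(text.strip().lower())


def handle(text: str, llm: Callable[[str], str]) -> str:
    """Try the cheap, deterministic path first; only call the LLM on a miss."""
    action = handle_locally(text)
    if action is not None:
        return f"local:{action}"
    return f"llm:{llm(text)}"


def fake_llm(prompt: str) -> str:
    """Stand-in for a real model call, so the sketch runs without a backend."""
    return f"(model answer to: {prompt})"


print(handle("Turn on the lights", fake_llm))
print(handle("what's a good treadmill pace for beginners?", fake_llm))
```

The upside of this ordering is that everyday commands stay fast and predictable, and the model only sees the requests nothing else could handle.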

kiol@lemmy.world · 10 months ago

Haha, that is hilarious. Sounds like it gave you some snark. afaik you have to clarify by asking again when it says such things. “I’m not asking for medical advice, but…”

RonnyZittledong@lemmy.world · 10 months ago

I could boot my gaming machine into its Linux partition and SSH into it to play with stuff, but that is a PITA I have not felt like doing yet.

SmokeyDope@lemmy.world · 10 months ago

I run koboldcpp, a cutting-edge local model engine, on my local gaming rig turned server. I like to play around with the latest models to see how they improve/change over time. The current chain-of-thought thinking models like the DeepSeek R1 distills and Qwen QwQ are fun to poke at with advanced open-ended STEM questions.

As for actual use: I prefer using Mistral Small 24B and treating it like a local search engine with the legitimacy of Wikipedia. I ask it questions about general things I don’t know about or want advice on, then I usually do further research through more legitimate sources. It’s important to not take the LLM too seriously, as there’s always a small statistical chance it hallucinates some bullshit, but most of the time it’s fairly accurate and a pretty good jumping-off point for further research.

Like if I want an overview of how to repair holes in concrete, or general ideas on how to invest. If the LLM says a word or related concept I don’t recognize, I grill it for clarifying info.

I’ve used an LLM to help me go through old declassified documents and speculate on internal gov terminology I was unfamiliar with.

I’ve used a speech-to-text model and gotten it to speak just for fun. I’ve used a multimodal model and gotten it to see/scan documents for info.

I’ve used websearch to get the model to retrieve information it didn’t know off a ddg search, again mostly for fun.

Feel free to ask me anything, I’m glad to help get newbies started.

y0shi@lemm.ee · 10 months ago

I’ve an old gaming PC with a decent GPU lying around and I’ve thought of doing that (I currently use it for Linux gaming and GPU-related tasks like photo editing, etc.). However, I’m currently stuck using LLMs on demand locally with Ollama. The energy cost of having it powered on all the time for on-demand queries seems a bit overkill to me…

pezhore@infosec.pub · 10 months ago

I put my Plex media server to work doing Ollama - it has a GPU for transcoding that’s not awful for simple LLMs.

RonnyZittledong@lemmy.world · 10 months ago

None currently. Wish I could afford a GPU to play with some stuff.

kiol@lemmy.world · 10 months ago

Well, let me know your suggestions if you wish. I took the plunge and am willing to test on your behalf, assuming I can.

MangoPenguin@lemmy.blahaj.zone · 10 months ago

If Immich counts for its search system, then there’s that.

Otherwise I’ve tried various things and found them lacking in functionality, and they would require leaving my PC on all the time to use.

ikidd@lemmy.world · 10 months ago

LM Studio is pretty much the standard. I think it’s open source except for the UI. Even if you don’t end up using it long-term, it’s great for getting used to a lot of the models.

Otherwise there’s Open WebUI, which I would imagine would work as a docker compose, as I think there are ARM images for OWU and Ollama.

L_Acacia@lemmy.ml · 10 months ago

Well they are fully closed source except for the open source project they are a wrapper on. The open source part is llama.cpp

ikidd@lemmy.world · 10 months ago

Fair enough, but it’s damn handy and simple to use. And I don’t know how to do speculative decoding with ollama, which massively speeds up the models for me.
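For anyone who hasn't met speculative decoding: a small, fast draft model guesses a few tokens ahead, and the big target model checks those guesses, keeping the prefix it agrees with. Below is a toy sketch of the greedy variant, assuming each "model" is just a function from the tokens so far to the next token. The two toy models are invented for illustration, and this is not how LM Studio or any particular engine implements it.

```python
from typing import Callable, List, Sequence

# A "model" here is any function mapping the tokens so far to the next token.
Model = Callable[[Sequence[int]], int]


def greedy_decode(model: Model, prompt: Sequence[int], n_new: int) -> List[int]:
    """Plain decoding: one model call per generated token."""
    tokens = list(prompt)
    for _ in range(n_new):
        tokens.append(model(tokens))
    return tokens


def speculative_decode(target: Model, draft: Model, prompt: Sequence[int],
                       n_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding: draft proposes, target verifies."""
    tokens = list(prompt)
    goal = len(tokens) + n_new
    while len(tokens) < goal:
        # 1. The cheap draft model guesses up to k tokens ahead.
        proposal: List[int] = []
        for _ in range(min(k, goal - len(tokens))):
            proposal.append(draft(tokens + proposal))
        # 2. The target checks each guess. A real engine scores all k
        #    positions in one batched pass, which is where the speedup
        #    comes from; this toy version calls the target one at a time.
        for i, guess in enumerate(proposal):
            expected = target(tokens + proposal[:i])
            if expected != guess:
                # Keep the agreed prefix, then the target's own token.
                tokens.extend(proposal[:i])
                tokens.append(expected)
                break
        else:
            tokens.extend(proposal)  # every guess was accepted
    return tokens


def toy_target(tokens: Sequence[int]) -> int:
    """Made-up deterministic 'large model'."""
    return (sum(tokens) * 31 + len(tokens)) % 50


def toy_draft(tokens: Sequence[int]) -> int:
    """Made-up 'small model' that agrees with the target most of the time."""
    guess = toy_target(tokens)
    return guess if len(tokens) % 3 else (guess + 1) % 50


same = speculative_decode(toy_target, toy_draft, [1, 2, 3], 10) == \
    greedy_decode(toy_target, [1, 2, 3], 10)
print(same)
```

The output matches what the target would have produced on its own, so the draft model only affects speed, not the result. How much faster it gets depends on how often the draft guesses right.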

L_Acacia@lemmy.ml · 10 months ago

Their software is pretty nice. That’s what I’d recommend to someone who doesn’t want to tinker. It’s just a shame they don’t want to open source their software and we have to reinvent the wheel 10 times. If you are willing to tinker a bit, koboldcpp + openwebui/librechat is a pretty nice combo.

ikidd@lemmy.world · 10 months ago

That koboldcpp is pretty interesting. Looks like I can load a draft model for spec decode as well as a pile of other things.

What local models have you been using for coding? I’ve been disappointed with things like deepseek-coder and qwen-coder; they’re not even a patch on Claude, but that damn cost for Anthropic has been killing me.

L_Acacia@lemmy.ml · 10 months ago

As much as I’d like to praise the open-weight models, nothing comes close to Claude Sonnet in my experience either. I use local models when the info is sensitive and Claude when the problem requires being somewhat competent.

What setup do you use for coding? I might have a tip for minimizing your Claude cost, depending on what your setup is.

ikidd@lemmy.world · 10 months ago

I’m using vscode/Roocode with Gosucoder shortprompt, with Requesty providing models. Generally I’ll use R1 to outline a project and Claude to implement. The shortprompt seems to reduce the context quite a bit and hence the cost. I’ve heard about Cursor but haven’t tried it yet.

When you’re using local models, which ones are you using? The ones I mention don’t seem to give me much I can use, but I’m also probably asking more of them because I see what Claude can do. It might also be a problem with how Roocode uses them, though when I just jump into a chat and ask it to spit out code, I don’t get much better.

Helmaar@lemmy.world · 10 months ago

I was able to run a distilled version of DeepSeek on Linux. I ran it inside a Podman container with ROCm support (I have an AMD GPU). It wasn’t super fast, but for a locally deployed and self-hosted option the performance was okay. Apart from that, I have deployed Fooocus for image generation in a similar manner. Currently, I am working on deploying Stable Diffusion with either ComfyUI or Automatic1111 inside a Podman container with ROCm support.

couch1potato@lemmy.dbzer0.com · 10 months ago

I spun up Ollama and paperless-gpt to add an AI OCR sidecar to paperless-ngx. It’s okay. It can read handwritten stuff okayish, which is better than Tesseract (which doesn’t read handwriting at all), so I throw handwritten stuff at it, but the difference on typed text was marginal in the single day I spent testing 3 different models on a few different typed receipts.

kiol@lemmy.world · 10 months ago

Which specific models did you try, and how would you rank each in usability?

couch1potato@lemmy.dbzer0.com · 10 months ago

I tried minicpm-v, granite3.2-vision, and mistral.

Granite didn’t work with paperless-gpt at all. Mistral worked sometimes but also just kept running sometimes and didn’t finish within a reasonable time (15 minutes for 2 pages). minicpm-v finishes every time, but I just looked at some of the results and it seems as though it’s not even worth keeping it running either. I suppose maybe the first one I tried that gave me a good impression was a fluke.

To be fair, I’m a noob at local ai, and I also don’t have a good gpu (gtx1650). So these failures could all be self induced. I like the idea of ai powered ocr so I’ll probably try again in the future…

kiol@lemmy.world · 10 months ago

I find your experiments inspired. Thank you! I’m learning about this myself on an rtx and excited to discuss on my little podcast.james.network one of these days. Been using paperless minus the AI functionality so far. About to start testing different AI services on an arm64 device with 16gb ram that claims some level of AI support; will see how that goes. Let me know if there are any other specific services/models you’d recommend or are curious about.

couch1potato@lemmy.dbzer0.com · 10 months ago

Sure, and let me know how it goes for you. I’m on a dell r720xd, about to upgrade my ram from 128 to 296 gb… don’t want to spend the money for a new gpu right now.

I’ll report back after I try again.

Grandwolf319@sh.itjust.works · 10 months ago

I have Immich that has AI searching for my photos. Pretty useful for finding stuff actually

MrPistachios@lemmy.today · 10 months ago

I have immich machine learning and ollama with openwebui

I use immich search a lot to find things like pictures of the side of the road to post on my community !sideoftheroad@lemmy.today

I almost never use the ollama though, not really sure what to do with it other than ask it dumb questions just to see what it says

I use the DuckDuckGo one when it automatically has an answer to something I searched, but it’s not too reliable
