Checking....what's the status for FOSS agentic AI models with skills?

iturnedintoanewt@lemmy.world · 10 hours ago

Checking....what's the status for FOSS agentic AI models with skills?

hendrik@palaver.p3x.de · edit-2 6 hours ago

We got open-source agents like OpenCode. OpenClaw is weird, and not really recommended by any sane person, but to my knowledge it’s open source as well. We got a silly(?) “clean-room rewrite” of the Claude Agent, after that leaked…

Regarding the models, I don’t think there’s any strictly speaking “FLOSS” models out there with modern tool-calling etc. You’d be looking at “open-weights” models, though. Where they release the weights under some permissive license. The training dataset and all the tuning remain a trade secret with pretty much all models. So there is no real FLOSS as in the 4 freedoms.

Google dropped a set of Gemma models a few days ago and they seem pretty good. You could have a look at Qwen 3.5, or GLM, DeepSeek… There’s a plethora of open-weights models out there. The newer ones pretty much all do tool-calling and can be used for agentic tasks.

9tr6gyp3@lemmy.world · 10 hours ago

https://wiki.archlinux.org/title/Ollama

Ollama is an application which lets you run offline large language models locally.

PetteriPano@lemmy.world · 3 hours ago

I’ve had better luck with llama.cpp for opencode. I’m guessing it does formatting better for tool use.

Auster@thebrainbin.org · 10 hours ago

There’s also a community for it here on the fediverse, to those interested: !Ollama@lemmy.world

Also, from my tests, it works decent enough even on Android’s Termux, though a powerful phone seems needed.

Auster@thebrainbin.org · 10 hours ago

deleted by creator

ThePowerOfGeek@lemmy.world · 10 hours ago

I’m curious about this too. I know that on the latest version of Ollama it’s possible to install OpenClaw. But I assumed you needed to point it to a paid API (Claude, ChatGPT, Grok, etc.) for it to really work. But yeah, maybe it works with Qwen 3 or similar models?

I guess a major factor to this is what your system resources look like, especially howmuch RAM you have. And therefore which model you are hosting locally.

cecilkorik@lemmy.ca · 9 hours ago

Absolutely. There are tons of open-licenced, open-weight (the equivalent of open-source for AI models) models capable of what is called “tool usage”. The key thing to understand is that they’re never quite perfect, and they don’t all “use tools” quite as effectively or in the same way as each other. This is common to LLMs and it is critical to understand that at the end of the day they are just text generators, they do not “use tools” themselves. They create specific structured text that triggers some other software, typically called a harness but could also be called a client or frontend, to call those tools on your system. Openclaw is an example of such a harness (and not a great or particularly safe one in my opinion but if you want to be a lunatic and give an AI model free reign it seems to be the best choice) You can use commercial harnesses too by configuring or tricking them into connecting to a local model instead of their commercial one, although I don’t recommend this for a variety of reasons if you really want to use claude code itself people have done it but I don’t find it works very well since all its prompts and tool calling is optimized for Claude models. Besides OpenClaw, Other popular harnesses for local models include OpenCode (as close as you’re going to get to claude for local models) or Cursor, even Ollama has their own CLI harness now. Personally I use OpenCode a lot but I am starting to lean towards pi-mono (it’s just called pi but that’s ungoogleable) it is very minimal and modular, making it intentionally easy to customize with plugins and skills you can automatically install to make it exactly as safe or capable or visual as you wish it to be.

As a minor diversion we should also discuss what a “tool” is, in this context there are some common basic tools that some or most tool-use models will have or understand some variation of, out of the box. Things like editing files, running command-line tools, opening documents, searching the web, are common built-in skills that pretty much any model advertising itself capable of “tool use” or “tool calling” will support, although some agents will be able to use these skills more capably and effectively than others. Just like some people know the Linux commandline fluently and can completely operate their system with it, while others only know basic commands like ls or cat and need a GUI or guidance for anything more complex, AI models are similar, some (and the latest models in particular) are incredibly capable with even just their basic built-in tools. However they’re not limited by what’s built in, as like I said, they can accept guidance on what to use and how to use it. You can guide them explicitly if you happen to be fluent in their tools, but there are kind of two competing models for how to give them that guidance automatically. These are MCP (model context protocol) which is a separate server they can access that provides structured listings of different kinds of tools they can learn to use and how they work, basically allowing them to connect to a huge variety of APIs in almost any software or service. Some harnesses have an MCP built-in. The other approach is called “skills” and seems to be (to me) a more sensible and flexible approach to giving the AI model enough understanding to become more capable and expand the tools it can use. Again, providing skills is usually something handled by the harness you’re using.

To make this a little less abstract you can put it in perspective of Claude: Anthropic provides several different Claude models like Haiku, Sonnet, and Opus. These are the text-generation models and they have been trained to produce a particular tool usage format, but Opus tends to have more built-in capability than something like Haiku for example. Regardless of which model you choose though (and you can switch at any time) you’ll be using a harness, typically “claude code” which is typically the CLI tool most people use to interact with Claude in an agentic, tool calling capacity.

On the open and local side of the landscape, we don’t have anything quite as fast or capable as Claude code unfortunately, but we can do surprisingly okay considering we’re running small local models on consumer hardware, not massive data center farms being enticingly given away or rented for pennies on the dollar of what they’re actually costing these companies on the hopes of successful marketshare-capture and vendor-lock-in leading to future profits.

Here are some pretty capable tool-use models I would recommend (most should be available for download through ollama and other sources like huggingface)

gemma4 (the latest and greatest hotness, MIT licensed using TurboQuant to deliver pretty incredible capability, performance and results even with limited VRAM)
qwen3.5 (from Alibaba, a consistent and traditional leader in open models so far with good capability and modest performance)
qwen3-coder-next (a pretty huge coding-focused model you might struggle to run unless you have a very beefy system and GPU)
glm4.7-flash (a modestly capable and reasonably fast option)
devstral-small-2 (an older, not-so-small variant of mistral, the French open-weight AI model if you’re looking for a non-Chinese, non-US based model which are few and far between)

PetteriPano@lemmy.world · edit-2 3 hours ago

Gemma4 doesn’t Turboquant. But it is leaner on the KV cache.

edit: looks like there are forks that do turboquant already

Mike Wooskey@lemmy.thewooskeys.com · 9 hours ago

I’m my experience, running Ollama locally works great. I do have a beefy GPU, but even on affordable consumer grade GPUs you can get good results with smaller models.

So it technically works to run an AI agent locally, but my experience has been that coding agents don’t work well. I haven’t tried using general AI agents.

I think the amount of VRAM affordable/available to consumers is nowhere near enough to support a context length that’s necessary for a coding agent to remain coherent. There are tools like Get Shit Done which are supposed to help with this, but I didn’t have much luck.

So I’m using OpenCode via OpenRouter to use LLMs in the cloud. Sad that I can’t get local-only to work well enough to use for coding agents, but this arrangement works for me (for now).

HelloRoot@lemy.lol · 10 hours ago

If you are on linux, and want ai assisted stuff like you mentioned there has been this for a while: https://github.com/qwersyk/Newelle

( or the weeb version if you prefer: https://wiki.nyarchlinux.moe/nyarchassistant/ )

and it can use locally run models. But have realistic expectations. If you want it to work well, you need a beefy GPU, a lot of RAM and swap. The “intelligence” is kind of limited if you run low spec models, to the point of it maybe being utterly useless.

artyom@piefed.social · 9 hours ago

They all suck and should be avoided.