Ask HN: Local models to support home network infrastructure?

I have had a blast getting Claude Code to manage my home infrastructure. I have been against the cloud forever, so I have had to build a home setup that does a lot of cloud stuff. Like, I run Resillio Sync for all my family iOS photo backups, and a local NAS to host my legally downloaded and owned movies and tv shows, I also use a bunch of raspberry pis, doing things like running local Home Assistant z-wave and zigbee sensors. The router, switches, and APs are all UniFI, same with all the cameras, door bells, and VoIP. Again, all local first (except Talk, obvs).

As you can imagine, maintaining entropy for all these disparate systems takes time, of which I have less now that I have young kids. So when Claude Code was released, I took to it like a fish to water. We mapped my entire network, I created accounts on all the devices so it can SSH into everything and configure everything (including the Ubiquiti Dream Machine Pro!). I have been blow away at how well it troubleshoots and fixes everything.

I have a DGX Spark AI workstation (128gb of memory), and I really want to now hand off the work to a local model, either using Opencode or Claude Code harnesses and simply pointing at a vLLM instantiated model accessable by API (just point Opencode or Claude Code at the local IP and API endpoint).

It works, except I tried Qwen3-coder just now and it's refusing to help due to security concerns. Ugh. I then tried GLM-4.7-Flash, but vLLM doesn't support yet and so before I rebuild (ask Claude Code to rebuild and deploy) to try GLM.4-7-Flash with some other inference provider, does anyone have a model they use for infrastructure maintenance that isn't a little bitch? I will probably eventually go to an abliterated model if none of the open source ones will help.

There was something on HN recently about how to "trick" the open ones to help.

OK, I'll look around. Thanks!

What had Qwen3-coder rejected? Right now that seems like the strongest recommendation. GLM-4.7-Flash seems very promising but is so new.

Gemma3 is also very good. Nanbeige-4 is supposedly incredibly capable. Both are very small. https://huggingface.co/google/gemma-3-4b-it https://huggingface.co/Nanbeige/Nanbeige4-3B-Thinking-2511

Ideally IMO, you should probably build little tools or a multi-tool for doing the work you want done. Rather than having LLMs having to figure out what needs to be done, doing a more code mode style of development and giving the LLM's the ability to call your tool will be far faster and far more consistent with far lower resources. Tiny models like FunctionGemma will be able to take simple commands and get the work done, very fast, with very little resources. Anthropic wrote this up, citing also CloudFlare calling it Code Mode. https://blog.google/innovation-and-ai/technology/developers-... https://www.anthropic.com/engineering/code-execution-with-mc...

(Note that while Anthropic is suggesting MCP for their "code mode" direction, and while writing MCP's is super easy: writing a cli tool can have just as good as a results! And is often easier for humans to work with!)