Nevermind. And forget ROCm.

March 26, 2026
1:04 am

So that was several posts in a row where I was Eurekaing about figuring out how to finally get ROCm stable, shortly after which… it would break again without warning.

The thing about ROCm that I have now fully internalized is that I don’t understand it, and it may be built in a way, or rely on things, that I simply can’t understand. It has periods of meta-stability.

You’ll hunt for new things, new SSDT files, new memory mappings, new kernel parameters, build options, versions, firmware files, etc. And right after a change, it will work!

“AHA!” you say, “I have found the magic thing I was missing. It’s smooth sailing from here on out!”

And it will work all day. Or maybe all week. You might push a couple million tokens through it. And then suddenly, it crashes. And once it has crashed, thereafter it will immediately crash again every time you start it. You’ll swear and wonder why. You’ll be frantic.

“This was just working. I used it all week! What’s changed?! If it was just random crashes that would be one thing, but we went from rock solid all week to it crashes every single start before first turn and I didn’t change a thing!”

You’ll reboot, clear CMOS, do card resets with rocm-smi… and eventually, though it doesn’t make any sense, start hunting for that next “tweak” that must actually be the one that you need.

And hours of searching and experimenting later, you’ll suddenly “find” one that enables you to start again. And not only does it start, but it’s stable! “I can ask it to do a task of 50 tool calls and it just rolls! Huzzah!”

Only no. In a day, or a week, it’ll suddenly stop and be unstartable again.

No more, I’m done.

— § —

Previously Vulkan had been rock solid every time, completely drama-free, but I’d hesitated to use it because it was about 50% slower than ROCm, which seemed like too much to trade away… better to keep trying until I “figured ROCm out.”

But then I found someone who suggested removing the display card from the devices stack for Vulkan, and when I did… Boom. Not a 50% difference, but about a 5% difference. For no drama. And about 80% less power usage.

I’m sold. I’m Vulkan now on the two V620s and Qwen 3.5 35B A3B q8 (Bartowski). On the RX 6700XT, I’ll maybe throw a text and a visual embedding model. Simple tools that will be of use.

— § —

Meanwhile, also worth keeping track of: every single Qwen 3.5 template out there is broken for llama.cpp. It turns out that with Qwen, the expectation is that the template does a bunch of fancy things to clean up, but llama.cpp doesn’t support full Jinja, only a subset called Minja, because until 5 minutes ago that sort of “eh whatevs we’ll do it in the template” logic wasn’t really considered viable.

That creates a problem: all the templates are for Jinja, but I needed a template for Minja (in llama.cpp) so that we could get tool use working properly.

So I had Claude Opus review a bunch of the existing community Jinja templates and then pick one to refactor, to the extent possible, for Minja.

Here you go: template.jinja

Key and somewhat vexing note: for this to work properly, you also need to go into src/llama-grammar.cpp and change the value of MAX_REPETITION_THRESHOLD from 2,000 to:

#define MAX_REPETITION_THRESHOLD 200000

Because OpenClaw uses like six million parameters for some of its tools, and that explodes the grammar enforcer (I still don’t fully understand whether this is a part of Minja or not by convention). So once you have a template that actually works instead of breaks, it will seem to be a template that breaks instead of works until you recompile with that constant raised.
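For anyone wondering what a constant like this even does, here is a toy sketch of the general idea: a compile-time cap that a parser checks before accepting a bounded repetition. To be loud about it, this is not llama.cpp's actual code. The helper name `repetition_allowed`, the `kMaxRepetitions` constant, and the `#ifndef` wrapper are all my own assumptions for illustration, and I haven't confirmed how the real grammar code uses the threshold.

```cpp
#include <cstdint>

// Hypothetical stand-in for the cap discussed above. Wrapping it in #ifndef
// is an assumption of this sketch: it would let a build flag override the
// value instead of editing the source file.
#ifndef MAX_REPETITION_THRESHOLD
#define MAX_REPETITION_THRESHOLD 200000
#endif

// Typed copy of the macro so comparisons are done in one unsigned type.
constexpr std::uint64_t kMaxRepetitions = MAX_REPETITION_THRESHOLD;

// Hypothetical helper: returns true when a {min,max} repetition fits
// under the cap, false when either bound exceeds it.
constexpr bool repetition_allowed(std::uint64_t min_times, std::uint64_t max_times) {
    return min_times <= kMaxRepetitions && max_times <= kMaxRepetitions;
}
```

With the cap at 200000, a repetition bound of 150000 passes where it would have been rejected at 2,000, and 500000 is still refused. The trade-off is presumably that a higher cap lets bigger grammars through at the cost of whatever the cap was protecting against in the first place.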

— § —

I started all of this nearly two months ago. It’s taken a long time, but we’re finally feeling like we’re getting to the point where I can start to build soon, rather than just be playing with toys.