It started as one of those weekends that you want back. Everyone’s been sick since Friday, O— worst of all. Sounds terrible, feels terrible, has hardly come out of his room, isn’t eating. M— started to feel rotten, too. And the same for dad.
And I have been trying to wrap up the first stage of the technology “rebuild” and return to the “user” role. There was a time when I wanted to hack all the time just to say I was hacking but that ended, like, 20 years ago. Now I sort of want to get things done.
The first hurdle in all of this was just getting three graphics cards into a case (two of them exceptionally long) and powered and cooled (two of them are designed for directional-airflow server racks and don’t have active cooling). Then the next hurdle was getting reliable, performant inference going with a model I liked. That turned out to be Qwen 3.5, though not without hiccups.
Then I got stuck on the next two things:
- Plug OpenClaw into the local inference
- Get Mac OS booting again (that’s right, dual-booting a Hackintosh with wild hardware)
Both of these things needed to be done, and both were non-trivial tasks. So to make my weekend worse, on top of everyone feeling various degrees of sick, Friday night and yesterday I spent most of my time feeling lousy and alternating between sleep and banging on these two problems.
— § —
I must have spent six hours yesterday on the Mac problem alone, and I have no idea how many times I was like:
```
<press reset>
<select Linux>
<log in>
$ mkdir /tmp/EFI
$ sudo mount -t auto /dev/sdc1 /tmp/EFI
<enter password>
$ nano /tmp/EFI/EFI/OC/config.plist
```
Eventually I got frustrated enough to enlist the help of Claude (my own agent being down, more on this momentarily) to scour the web looking for answers. All kinds of wild stuff was tried in the OpenCore plist, with ACPI tables, and so on. No matter what, there seemed to be a hard hang around PCI enumeration, and I presumed the two v620s, with their massive footprints in address space, were the cause. Into late last night, Claude and I just couldn’t get past PCI enumeration and mapping.
Claude was telling me that it was time to give up: it just wasn’t on the cards, the Mac OS IOPCI machinery couldn’t cope with what I was trying to do, and this was hardware Mac OS had never been intended to work with. We had spent hours trying to turn off PCIe slots, build SSDTs, directly patch memory with OpenCore shell scripts at boot, and build .efi modules to rearrange PCI windows, but nothing seemed to work, and in fact nothing ever really seemed to change.
— § —
In the wee hours I pivoted over to OpenClaw, where I’d been having enough trouble with Qwen that I’d started downloading and trying other models instead, always to disappointment. The thing is, Qwen 3.5 35B A3B is just really good in certain ways. It’s blazing fast. It’s really smart. And it was perfect on my 76GB of VRAM (more on this in a bit) in that I could fit something like an Unsloth dynamic q6_XL and still have room for a massive context window at q8. It’s local inference that doesn’t feel like local inference.
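For the curious, the VRAM math works out roughly like this. This is back-of-envelope only: the ~6.5 bits/weight figure is my estimate for a q6-class quant, not a measured number for any specific GGUF file, and the KV-cache budget depends on model dimensions I’m not computing here.

```python
# Rough VRAM budget for a ~35B-parameter model on 76GB of VRAM.
# bits_per_weight is an assumed average for a q6-class quant.
total_vram_gb = 76
params = 35e9
bits_per_weight = 6.5

weights_gb = params * bits_per_weight / 8 / 1e9   # roughly 28 GB of weights
context_budget_gb = total_vram_gb - weights_gb    # roughly 48 GB left for q8 KV cache

print(round(weights_gb, 1), round(context_budget_gb, 1))
```

Which is why a big MoE at q6 with a q8 KV cache feels so roomy on this box: well over half the VRAM is left for context.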
Only, the other thing is, turns and tool calling were completely broken. Like, completely. It had been hard enough to get the model up and running—all kinds of errors, apparent incompatibilities, updates… and then we ended up in a state in which it would often lose the ability to call tools altogether, and would just pitifully announce it was going to do so, then do nothing, then apologize profusely when asked, then announce it was going to succeed this time, then do nothing, then say it couldn’t understand what was happening.
Of course, being new to running my own LLM, I was leaning heavily on LLMs to help me through, and they were coming up with all kinds of ideas and all kinds of opinions about OpenClaw. We must have tried 10 different versions of OpenClaw and 100 different (often hallucinated) configuration options for openclaw.json. We also tried vLLM (don’t bother on RDNA2: tensor parallelism really wants better card-to-card links than PCIe, something that’s just not in the architecture, plus it fights with just about every version of ROCm), Aphrodite, Ollama, LM Studio, a bunch of everyone’s hand-written proxies, and about six different chat front-ends.
Nothing had worked, and I was very tired of complaining at LLMs while they complained at me as we tried to get my own LLM to behave with my own agent.
Then, as is so often the case, a less direct search somehow opened up the channels of serendipity enough to make a massive difference. I searched Google for “local models similar to Qwen 3.5 35b but with more stable tool calling” and then, without warning, Gemini opened up with all of the answers that it had previously hidden from me in our hours of chatting and banging on openclaw.json and template.jinja files. It suddenly just rattled off a list:
- Qwen 3.5 MoE models mix tool-calling output into `<think>` phases, which nothing on earth can handle right now
- Qwen 3.5 is also a bit of a wild child as you increase temperature and response length (which I have liked), but that wildness includes its approach to JSON and XML
- Llama.cpp and OpenClaw both have aggressive timeouts
- All together, this means turns that platforms like OpenClaw misinterpret, or even miss entirely, leaving turns or whole sessions in undefined states
- Solution: set 20-minute timeouts everywhere, turn off Qwen’s thinking mode at the inference level, and reduce temperature from 1+ to 0.7 or less
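The fixes above boil down to a few request-level knobs. Here’s a sketch of what they look like aimed at an OpenAI-compatible chat endpoint — the model id is illustrative, and the `chat_template_kwargs`/`enable_thinking` key is an assumption (servers expose the thinking toggle under different names), but the temperature and timeout values are exactly the ones that fixed things for me:

```python
import json

# Sketch of the request-level settings behind the fix. Only temperature
# and the timeout are confirmed knobs; the thinking toggle's key name is
# an assumption and varies by inference server.
payload = {
    "model": "qwen3.5-35b-a3b",          # illustrative model id
    "messages": [{"role": "user", "content": "ping"}],
    "temperature": 0.7,                   # down from 1+, tames JSON/XML drift
    "chat_template_kwargs": {"enable_thinking": False},  # assumed server knob
}
timeout_s = 20 * 60                       # 20-minute timeout everywhere

print(json.dumps(payload)[:40], timeout_s)
```

The same three settings then need to be mirrored on the agent side so neither end gives up on a slow turn.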
I implemented these fixes in about five minutes in the middle of the night, and voila, Qwen 3.5 35B A3B was suddenly a rock solid tool user in Openclaw. Like WTF. But also, in a good way.
Well, obviously, what I did was put it to work on the full summaries of my chats with Claude and GPT about the Hackintosh problem, and ask it to build me a library of findings about each area of research, as well as a final THEORY.md outlining its own theory, based on all the research, of just what was breaking in Mac OS, so that we could try to fix it, since Claude and GPT were out of ideas.
— § —
So today, I came back in the morning, and boom. Nestled in the guts of THEORY.md was the point that just because the hang happened around where Mac OS was spitting out that it was initializing PCI, that didn’t mean PCI initialization was causing the hang. Since my BIOS version was good (already checked), my BIOS settings were right, the config.plist had been tested a million times, and all the effort spent trying to make the card disappear from Mac OS one way or another had not worked, it seemed likely that a kext, not the kernel, was causing the hang, and the proximity of that hang to output about PCI enumeration had fooled us all.
And, at that moment, it all fell into place. I had opted for a display card with the same architecture as the two Radeon Pro v620 units, with the idea that we could get another 12GB for inference that way. But it also meant that NootRX (the kext that enables certain Navi-based hardware) might be choking on the cards in kextland, rather than this being about BARs and address space, etc.
Pulled the NootRX source, and that was the answer. NootRX grabs one Navi card, whichever is the first it finds on the bus, and tries to use it for display.
It was likely grabbing a v620 first, and we might not even have been hung at all: there just aren’t any display outputs on a v620. And since we had the source, we had the power to fix the kext…
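From the outside, this is the classic first-match device-selection bug. A minimal sketch of the pattern and the fix — this is my paraphrase of the failure mode, not NootRX’s actual code, and the device records are made up for illustration:

```python
# First-match device selection: take the first Navi GPU on the bus and
# assume it can drive a display. On this box, the first match is a
# headless v620, so the screen just stays dark.

def pick_display_gpu_buggy(gpus):
    # Grabs whichever Navi card enumerates first, outputs or not.
    for gpu in gpus:
        if gpu["family"] == "navi":
            return gpu
    return None

def pick_display_gpu_fixed(gpus):
    # Same walk, but skip cards with no physical display connectors.
    for gpu in gpus:
        if gpu["family"] == "navi" and gpu["display_outputs"] > 0:
            return gpu
    return None

bus = [
    {"name": "Radeon Pro V620", "family": "navi", "display_outputs": 0},
    {"name": "Radeon Pro V620", "family": "navi", "display_outputs": 0},
    {"name": "display card",    "family": "navi", "display_outputs": 3},
]

print(pick_display_gpu_buggy(bus)["name"])  # a headless v620 wins
print(pick_display_gpu_fixed(bus)["name"])  # the actual display card
```

The fixed version is the shape of the change the kext needed: keep walking the bus until you hit a card that can actually light up a monitor.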
— § —
So anyway, tonight is here, and:
- Yes, inference is up and running
- Yes, OpenClaw is up and running on it, including solid, stable tool use and long projects
- Yes, Mac OS and Linux are dual-booting again on this now wilder-ass PC with 76GB VRAM, 128GB DRAM, 3 RDNA2 GPUs, and 16 hardware CPU threads
- I can move on
Point being, the weekend ended on a high note, and I’m that much closer to re-architecting my computing and data life.
— § —
Next up:
- OwnCloud and a pile of non-HFS+ storage on a dedicated host
- Plus migrate a bunch of data off of the seemingly infinite universe of HFS+ I have here
It’s happening. Local inference and a future trajectory that is on Linux once again. Meanwhile, it’s late at night (or even wee hours) with the work week ahead. Not even sure this is coherent. So… end.
