One day after posting that I thought I had the problem solved, more was needed. Turns out that Linux was not honoring the mainboard BIOS’s aperture setting was was leading to stability on runs, crashes on others.
So we also added:
amdgpu.gartsize=2048
I have the nagging feeling that even this may not kill it once and for all, and there is every chance that in the end, we will have to drop back to Vulkan.
But for now, we seem to be golden, with Q6 weights and 128k Q8 context cache doing 42-46 tokens/second.
