Have you benchmarked against other 3-bit dynamic quants, like Unsloth's? I'm sorry, but framing this against a full-precision, newer, smaller MoE just seems misleading. Also, Gemma-4-26B-A4B is not the SOTA for edge. Even at launch, that would be the 31B.
I can't find it. Can you state your performance versus comparable 3-bit quantizations from Unsloth/Bartowski? Edit: I appreciate that you seem to have open-sourced the quantization pipeline. This is not to question your work, but to understand where the outputs stand relative to the SOTA for quantization.