# Decoding Creality's locked Klipper fork
I bought a Creality K2 Plus because I wanted a big enclosed printer that runs Klipper out of the box. What I did not bargain for was that the Klipper it runs is a fork with the interesting parts compiled into a 1.88 MB Cython binary and a handful of smaller .so modules around it, and that some of those interesting parts disagree with what the touchscreen says.
This is the short version. The long version lives in a repo that does not have a public face yet, because I would like the printer to keep working through the process.
## The bait
What pushed me to look closer was a PETG clog. Stock part-fan profile, stock filament, stock everything, and the extruder backed up mid-print on a tall straight wall. The touchscreen suggested wet filament. The filament was not wet.
After a few cold pulls and re-tunes I traced the actual cause: the fan profile cooled the hotend block below setpoint during overhangs. The heater could not catch up, back-pressure spiked, the extruder gave up. Capping the part fan at 45% and the overhang fan at 80% made it disappear.
Stock max fans are 80% and 100%. The hardware can do those numbers. The firmware accepts them. There is no scenario on this hotend where those values are correct for PETG. The menus and the slicer cheerfully agree on a setting that will reliably destroy a print.
That is the moment I went looking at the firmware.
## What is actually inside
The K2 Plus runs an Allwinner T113-S3. Two Cortex-A7 cores at 1 GHz. 512 MB of RAM. eMMC with an A/B rootfs setup. The user-facing software stack is eight binaries that talk via POSIX shared memory and Unix sockets:
- `master-server`: conductor; AI detection, scheduling, shared-memory writer
- `display-server`: LVGL on /dev/fb0 plus Goodix touch
- `web-server`: HTTP API on :80, :443, :9998, :9999 (libhv)
- `app-server`: MQTT to Creality Cloud
- `audio-server`: buzzer / aplay
- `wifi-server`: wpa_supplicant wrapper
- `upgrade-server`: OTA orchestration
- `Monitor`: a watchdog that supervises the seven above
Underneath sits Python Klipper, pinned to CPU1, OOM score -500. Underneath that sits the Cython.
- `box_wrapper.cpython-39.so` (1.88 MB): the multi-filament system; 251 methods, 9 classes
- `motor_control_wrapper.cpython-39.so` (1.0 MB): closed-loop stepper control over RS485
- `prtouch_v3_wrapper.cpython-39.so` (1.3 MB): the bed pressure probe
- `serial_485_wrapper.cpython-39.so` (140 KB): the RS485 driver itself
Cython compiles Python down to C and then to a shared object. The names and method tables survive. The bodies are machine code. None of it is upstream Klipper.
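The asymmetry is easy to demonstrate on any compiled extension module. Here the stdlib's `math` stands in for `box_wrapper`, which only imports on the printer:

```python
import inspect
import math  # any compiled extension behaves like a Cython .so here

# Method names survive compilation and are visible to introspection...
public = [n for n in dir(math) if not n.startswith("_")]
assert "sqrt" in public

# ...but there is no Python source to read: the body is machine code,
# so inspect.getsource() refuses with a TypeError.
try:
    inspect.getsource(math.sqrt)
except TypeError:
    print("no Python source for math.sqrt")
```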
There is also a HiFi4 DSP and a RISC-V E906 core on the SoC, both with their own A/B partitions, both unused by the print firmware. Two unused cores in a 4 EUR chip is a small reminder of how much of modern silicon is just there in case anyone wants it.
## How to read it without the source
The standard advice for an .so you cannot read is "use Ghidra". For Cython that mostly does not work, because the body is one giant state machine routed through `__Pyx_*` helpers, and the interesting structure is in the Python type slots, not the function bodies.
Four techniques carried most of the weight.
MockConfig injection. Build a fake Klipper config object that logs every `config.get*(name, default)` call. Hand it to the module. You get back every config field the module reads, plus the default the binary expects for it. Zero disassembly, full config map.
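A minimal sketch of the idea. The accessor names mirror Klipper's ConfigWrapper API; the `load_config` entry point mentioned at the bottom is hypothetical:

```python
# MockConfig: stands in for Klipper's ConfigWrapper and records every
# config.get*() call, including the default the caller passes in.
class MockConfig:
    def __init__(self, section="box"):
        self.section = section
        self.reads = []  # (accessor, option, default) tuples

    def _log(self, accessor, option, default):
        self.reads.append((accessor, option, default))
        return default  # hand the module its own default straight back

    def get(self, option, default=None, **kw):
        return self._log("get", option, default)

    def getint(self, option, default=0, **kw):
        return self._log("getint", option, default)

    def getfloat(self, option, default=0.0, **kw):
        return self._log("getfloat", option, default)

    def getboolean(self, option, default=False, **kw):
        return self._log("getboolean", option, default)

    def getsection(self, name):
        return self  # modules that walk into sub-sections keep logging here

cfg = MockConfig()
# box_wrapper.load_config(cfg)  # hypothetical: hand the probe to the module
cfg.getint("max_extrude_speed", 35)  # simulate one read for the demo
print(cfg.reads)
```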
SmartSelf proxy. A `__getattribute__` proxy that returns real values for known attributes and tracks every other access. Pass it to a method that runs purely in Python (`cal_flush_list`, for example). You see exactly which attributes get touched in what order. That tells you the call graph from the outside.
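A sketch of such a proxy; the attribute names in the demo are invented:

```python
# SmartSelf-style tracing proxy: attributes you have modelled resolve to
# real values, everything else is recorded in access order and swallowed.
class SmartSelf:
    def __init__(self, known):
        object.__setattr__(self, "_known", dict(known))
        object.__setattr__(self, "_accesses", [])

    def __getattribute__(self, name):
        if name.startswith("_"):  # let internals through untracked
            return object.__getattribute__(self, name)
        object.__getattribute__(self, "_accesses").append(name)
        known = object.__getattribute__(self, "_known")
        if name in known:
            return known[name]
        return lambda *a, **kw: None  # unknown method calls become no-ops

probe = SmartSelf({"flush_count": 3})
# module.cal_flush_list(probe)  # run a pure-Python method against the probe
probe.flush_count               # simulated accesses for the demo
probe.update_flush_state()
print(probe._accesses)          # access order = the call graph from outside
```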
Live introspection. `import box_wrapper; obj = BoxAction(...); inspect.signature(obj.motor_send_data)`. Static decode said the signature was `(self, addr, cmd, timeout, state=0, data=b'', retries_en=False)`. Live introspection said `state=b'\x00'` and `retries_en=True`. Both defaults were wrong in the static decode. A thirty-second import was correct. Always do the cheap check before the expensive one.
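The thirty-second check, demonstrated on a local stand-in function (since the real module only loads on the printer): ask the live object for its defaults instead of trusting the decode.

```python
import inspect

# Stand-in with the defaults live introspection actually reported.
def motor_send_data(self, addr, cmd, timeout,
                    state=b"\x00", data=b"", retries_en=True):
    ...

sig = inspect.signature(motor_send_data)
defaults = {name: p.default for name, p in sig.parameters.items()
            if p.default is not inspect.Parameter.empty}
print(defaults)  # the defaults the binary actually uses
```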
ARM disassembly only where it pays. Cython compiles `getattr` as a load from a slot table. Every call to `box_action.motor_send_data` in the consumer binary compiles to an `ldr r1, [Rn, #-3020]`, because slot 3020 is where the attribute name is cached. Grep the disassembly for that exact instruction encoding, cross-reference with the `PyMethodDef` table, and you have every call site of every method in the consuming binary with one grep and one awk. The technique generalises to any Cython module.
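The grep can just as well be a few lines of Python over the text disassembly. This sketch assumes objdump-style formatting and leaves out the offset-to-name mapping through the `PyMethodDef` table:

```python
import collections
import re

# Matches slot-table loads like:  1a2b4: e51b1bcc  ldr r1, [r11, #-3020]
SLOT_LOAD = re.compile(
    r"^\s*(?P<addr>[0-9a-f]+):\s+\S+\s+ldr\s+r\d+,\s*\[r\d+,\s*#(?P<off>-\d+)\]")

def call_sites(disasm_text):
    """Group instruction addresses by negative slot offset."""
    sites = collections.defaultdict(list)
    for line in disasm_text.splitlines():
        m = SLOT_LOAD.match(line)
        if m:
            sites[int(m.group("off"))].append(m.group("addr"))
    return dict(sites)

demo = """
   1a2b4:\te51b1bcc\tldr\tr1, [r11, #-3020]
   1a2c0:\te51b1bcc\tldr\tr1, [r11, #-3020]
   1a2d8:\te5932004\tldr\tr2, [r3, #4]
"""
print(call_sites(demo))  # only the negative-offset slot loads survive
```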
## The first thing it taught me
24 distinct call sites of `motor_send_data` in motor_control_wrapper.so. 17 of them expected a readback: write a parameter, read the same register back, raise an error if the value did not match.
I had tried, earlier and embarrassingly, to synthesise `motor_send_data` for motor addresses to dodge a homing failure. The synthetic ACK returned a fixed payload of zero bytes. That answered the 7 fire-and-forget call sites correctly. The 17 readback sites all raised key803 "axis step init fails", because zero is not what they wrote.
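In miniature, with invented names, the two call-site patterns look like this. `FakeBus` plays the role of the synthetic ACK layer:

```python
class FakeBus:
    """Synthetic ACK layer: answers every read with a zero payload."""
    def send(self, addr, reg, value):
        pass          # the write never lands anywhere

    def read(self, addr, reg):
        return 0      # fixed zero payload, whatever was written

def write_param(bus, addr, reg, value):
    bus.send(addr, reg, value)           # fire-and-forget site: satisfied

def write_param_verified(bus, addr, reg, value):
    bus.send(addr, reg, value)
    if bus.read(addr, reg) != value:     # readback site: zero != value
        raise RuntimeError("key803: axis step init fails")

bus = FakeBus()
write_param(bus, 1, 0x10, 250)           # the 7 sites like this pass
try:
    write_param_verified(bus, 1, 0x10, 250)  # the 17 like this do not
except RuntimeError as e:
    print(e)
```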
Fast paths through a hot-path API only work if you enumerate every consumer of that API first. Counting the call sites in one method and assuming that was the full set is the kind of mistake that, in hindsight, looks obvious. In foresight it just looked like a clever shortcut.
## The lies the menus tell
A handful of places where the screen and the firmware disagree about reality.
Belt tension displays a number from a strain gauge whose threshold is hard-coded in belt_mdl.py. Replace the binary with an open-source equivalent and forget the threshold table, and the belt-tension menu will show two healthy bars while one belt is slack.
Bed temperature on the touch UI is low-pass filtered. The raw value is honest. For PID tuning, read klippy.log, not the screen.
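The lag is just ordinary low-pass behaviour. A toy exponential moving average shows it; the actual filter and its coefficient are unknown, and the alpha here is a guess for demonstration:

```python
def ema(samples, alpha=0.1):
    """One-pole low-pass: each output leans 90% on history at alpha=0.1."""
    filtered, y = [], samples[0]
    for x in samples:
        y = alpha * x + (1 - alpha) * y
        filtered.append(round(y, 1))
    return filtered

# Bed steps from 25 C to 60 C: the raw value is honest immediately,
# the filtered display trails behind for many samples.
raw = [25] + [60] * 10
print(ema(raw))
```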
Auto-addressing on the toolhead RS485 bus uses a Creality-specific handshake. Strip it out without replacing it and the toolhead silently falls back to a non-functional pseudo-address. The firmware does not surface this as an error. It just stops working.
CFS extrude state machines look like single commands from the outside (`extrude_process(slot, stage, param)`). Stock implements them as a multi-stage state machine via the `communication_*` wrapper layer. Collapse the layers into the gcode handler and you skip the retries, the timeout backoff, the unsolicited-frame routing, and the dirty-flag updates. The happy path works. Anything that depends on a retry does not.
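What the collapse throws away can be sketched as a per-stage retry loop with timeout backoff. Stage names, counts, and timings below are invented; the real machinery lives in the wrapper layer:

```python
import time

def run_stage(send, stage, timeout=0.05, retries=3):
    """Send one stage of an extrude sequence, retrying with backoff."""
    delay = timeout
    for attempt in range(retries):
        reply = send(stage)
        if reply is not None:       # a direct reply ends the stage;
            return reply            # unsolicited frames route elsewhere
        time.sleep(delay)
        delay *= 2                  # timeout backoff between retries
    raise TimeoutError(f"stage {stage!r} got no reply after {retries} tries")

def extrude_process(send, slot):
    # The happy path a collapsed gcode handler sees is just this sequence;
    # the robustness all lives inside run_stage.
    for stage in ("select", "feed", "purge"):
        run_stage(send, stage)
```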
This is not malicious. It is the cost of a vertically integrated product where the same team writes the firmware, the screen, and the slicer profile, and the assumptions leak between layers.
## Where I actually am
The belt-tension module is replaced and running in production. 280 lines of pure Python where the stock was 621 lines plus a closed binary. The CFS replacement runs first-cycle homing, all the queries, all the buffer state, and the full extrude state machine, verified against captured stock traces.
A Z-homing failure on the second cycle still blocks deployment. Current best guess is GIL latency in the main reactor thread: Python callbacks holding the thread longer than the stock Cython equivalents, and the nozzle MCU's response-timeout window missing. The test plan exists. The proof does not.
The interesting part, more than the modules themselves, is that the techniques transfer. The Cython slot-load trick gives you every call site of every attribute in any binary. MockConfig gives you the config surface of any module that uses Klipper's config object. Live introspection beats static decode every time you can run the code. Together they give you a way to read closed Python machinery without ever opening Ghidra.
The story continues. The repo lives at ~/k2-open-klipper and the document at the centre of it is called KNOWN_MISTAKES.md, currently at twenty-one entries. That is the part of the project most likely to be useful to someone else.