Detailed walkthrough of a compression algorithm for G-code.
This post is the second in a series on binary G-code.
When compressing arbitrary data, you need a lossless compression algorithm like Deflate or LZMA. Both are general purpose and achieve good compression ratios on pretty much everything. Without any prior knowledge about the data, they need significant CPU and memory resources to figure out how to shrink it down. This is usually not an issue for modern commodity hardware.
However, it does matter in the G-code scenario where we are streaming data to a microcontroller for real time processing. Our resources are so limited that using even a few kilobytes of memory is hard to justify. Baking our own content-aware compression is a way to work around this; we know that we are sending G-code and only G-code. Take a look at the frequency distribution of the 16 most common characters in some example G-code files:
Generated with Python using Matplotlib, hist.py
They account for ~95.42% of the characters in a G-code file, so it’s possible to come up with some encoding that takes advantage of this.
MeatPack compresses G-code by leveraging these statistics. Common G-code characters are replaced with a shorter 4-bit representation using a static lookup table:
character | 4-bit code |
---|---|
0-9 | 0-9 |
. | 10 |
E | 11 |
\n | 12 |
G | 13 |
X | 14 |
other | N/A |
Before going into specifics, let me drop some bullets on why this approach is a great fit for our microcontroller scenario:
Characters are encoded in pairs of 2. The encoded representation can be 1 to 3 bytes depending on whether or not they are in the lookup table:
first_code | (second_code << 4)
15
is used as a signal code to indicate that a character could not be packed and is represented by the next byte15
, indicating that they are represented by the two bytes that followIf the input has an odd number of characters, it is padded to an even number with a space so we always have pairs. If the input has lowercase E, G, or X, it is converted to uppercase so it will have a 4-bit code. G-code is case insensitive so this is harmless1.
ignoring G-code strings
Decompression is quite straightforward. The lookup table is inverted and we check for signal codes as we go:
MeatPack has some command sequences to control the decompression process. A sequence of [255, 255]
indicates that the next byte will be a command byte. The defined command bytes are:
Byte | Command | Explanation |
---|---|---|
251 | Enable packing | Turns on MeatPack |
250 | Disable packing | Turns off MeatPack |
249 | Reset all | Resets internal state. Usually sent at the end of MeatPack data |
248 | Query config (?) | Unknown, can be ignored |
247 | Enable no space mode | Replaces E with in the lookup table |
246 | Disable no space mode | Replaces with E in the lookup table |
MeatPack is disabled by default so there will always be an “enable packing” command sequence before compressed data begins.
This compression scheme is great when data fits the expected histogram. However, it doesn’t perform well on comments or quoted strings and can actually increase size by 50%. The histogram is completely different for these types so packing becomes unlikely. With that in mind, the encoder should disable MeatPack whenever a comment or string begins and send it literally.
My goal in understanding MeatPack was to port it to Rust so that svg2gcode can eventually read/write binary G-code. The port is available in the g-code crate. There are a few more pieces needed before binary G-code can be fully supported. The next post in this series will talk about another compression algorithm called HeatShrink.