MeatPack Algorithm

5 minute read Published 29 May 2024

Detailed walkthrough of a compression algorithm for G-code.

This post is the second in a series on binary G-code.

Content-aware Compressionđź”—

When compressing arbitrary data, you need a lossless compression algorithm like Deflate or LZMA. Both are general purpose and achieve good compression ratios on pretty much everything. Without any prior knowledge about the data, they need significant CPU and memory resources to figure out how to shrink it down. This is usually not an issue for modern commodity hardware.

However, it does matter in the G-code scenario where we are streaming data to a microcontroller for real time processing. Our resources are so limited that using even a few kilobytes of memory is hard to justify. Baking our own content-aware compression is a way to work around this; we know that we are sending G-code and only G-code. Take a look at the frequency distribution of the 16 most common characters in some example G-code files:

Histogram of the 16 most common G-code characters showing that digits, whitespace, period, and some characters like G are very common Generated with Python using Matplotlib, hist.py

They account for ~95.42% of the characters in a G-code file, so it’s possible to come up with some encoding that takes advantage of this.

MeatPackđź”—

MeatPack compresses G-code by leveraging these statistics. Common G-code characters are replaced with a shorter 4-bit representation using a static lookup table:

character4-bit code
0-90-9
.10
E11
\n12
G13
X14
otherN/A

Before going into specifics, let me drop some bullets on why this approach is a great fit for our microcontroller scenario:

Compressionđź”—

Characters are encoded in pairs of 2. The encoded representation can be 1 to 3 bytes depending on whether or not they are in the lookup table:

If the input has an odd number of characters, it is padded to an even number with a space so we always have pairs. If the input has lowercase E, G, or X, it is converted to uppercase so it will have a 4-bit code. G-code is case insensitive so this is harmless1.

1

ignoring G-code strings

Decompressionđź”—

Decompression is quite straightforward. The lookup table is inverted and we check for signal codes as we go:

Controlđź”—

MeatPack has some command sequences to control the decompression process. A sequence of [255, 255] indicates that the next byte will be a command byte. The defined command bytes are:

ByteCommandExplanation
251Enable packingTurns on MeatPack
250Disable packingTurns off MeatPack
249Reset allResets internal state. Usually sent at the end of MeatPack data
248Query config (?)Unknown, can be ignored
247Enable no space modeReplaces E with in the lookup table
246Disable no space modeReplaces with E in the lookup table

MeatPack is disabled by default so there will always be an “enable packing” command sequence before compressed data begins.

Comments and Stringsđź”—

This compression scheme is great when data fits the expected histogram. However, it doesn’t perform well on comments or quoted strings and can actually increase size by 50%. The histogram is completely different for these types so packing becomes unlikely. With that in mind, the encoder should disable MeatPack whenever a comment or string begins and send it literally.

What’s next🔗

My goal in understanding MeatPack was to port it to Rust so that svg2gcode can eventually read/write binary G-code. The port is available in the g-code crate. There are a few more pieces needed before binary G-code can be fully supported. The next post in this series will talk about another compression algorithm called HeatShrink.

Comment Icon Comments🔗