ZIP File Quine

This is the ZIP file that never ends.

Droste.zip, 28KB

I did a proper writeup in 2005 or so, but the server died before archive.org found it, so it's gone. I might do a proper one at some point.

The LZ77 quine

DEFLATE (the standard zip compression algorithm) consists of two parts: LZ77(W) compression, followed by Huffman encoding(W).

LZ77 basically defines a simple language with just two operations: "print the following N bytes verbatim" and "repeat M bytes you have already written, starting N bytes from the end"

e.g. "BANANA" could be compressed as

Print the next 3 line(s)
B
A
N
Repeat 3 lines from the output, starting 2 lines from the end (i.e. from the A)
Note in particular that the copy overlaps with itself:
  B            B
  A            A
  N  \         N
 (A) \  A  --> A
     \  N      N
        A      A
Here's a proper quine, using the "banana" trick twice. The italic parts are "data" the first time around.
Print the next 0 lines (i.e. NOP)
Print the next 2 lines
Print the next 0 lines (i.e. NOP)
Print the next 2 lines
Print the next 0 lines (i.e. NOP)
Repeat 3 lines from the output, starting 2 lines from the end
Print the next 2 lines
Repeat 3 lines from the output, starting 2 lines from the end
Print the next 2 lines
Repeat 3 lines from the output, starting 2 lines from the end
(this is one of very few programming problems I know where you can get shorter code by allowing NOPs)

This quine only prints itself. To get a valid ZIP file, it has to print a valid zip header, itself, and a valid footer.

Surviving the Huffman step

The "copy" and "print" commands are encoded as a bit stream. This makes it a bit harder - you can only quote entire bytes at a time, so you need to synchornize with the bit stream every now and then. The "quote" command ("block type 00") forces byte alignment, so it's not a big problem.

Getting the CRC right

One last hurdle is the fact that the ZIP file format uses a CRC-32(W) check to guard against accidental damage to the file. The CRC applies to the uncompressed data, which means the file needs to contain its own CRC.

Each bit of the CRC is a linear function of the bits of the file, so you can reduce the problem to a set of 32 linear equations in GF2 (hence, easily solvable).

Posted March 29 2006
How to enumerate bits quickly using DeBruijn sequences.
GCC has some clever tricks up its sleeve for compiling the humble division operator.
Some simple bite-sized browser enhancements. Drag them to your bookmarks toolbar.
These little hacks were originally posted at the Medallia blog; since I no longer work there, I thought it would be more appropriate to keep them here.
More...
Nov 27 2008
15:18 ui\
15:18 wef
Nov 28 2008
22:08 cool, random art, right?
22:08 seen better, though
Nov 29 2008
19:35 wtf :)
Dec 1 2008
14:12 Kom til Trondheim da...
15:10 hi
Dec 3 2008
14:52 пп
Dec 4 2008
03:56 blabb
04:58 bleep
Dec 5 2008
06:10 yah!
Dec 11 2008
02:38 irc+code
Dec 12 2008
15:22 Not a search box, sorry. And the WAP IRC is gone forever.
Dec 28 2008
20:02 shit
Dec 31 2008
23:09 test - bb
Jan 3 2009
17:11 nice site

My CSS-fu is weak; please use a recent browser.

Some rights reserved.

Random, semi-related background image by Stampit.