How to Create HTML/ZIP/PNG Polyglot Files

by gildas on 12/27/2024, 11:10 PM with 37 comments

by Retr0id on 12/28/2024, 12:28 AM

> a bug in “Archive Utility” on macOS prevents it from decompressing the resulting file

I looked into this in the past: it's because they check for a "PK" header at the start of the file, which is of course not actually required. I assumed it was deliberate because it does exclude most "weird" ZIPs.
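
To illustrate why a leading "PK" is not required: ZIP readers normally locate the archive from the end-of-central-directory record at the end of the file, not from the first bytes. Below is a minimal Python sketch of that lookup. The helper names `find_eocd` and `make_prefixed_zip` are made up for this example, and the backwards search is simplified (it does not guard against a false signature inside an archive comment).

```python
import io
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"  # end-of-central-directory record signature
EOCD_MIN_SIZE = 22              # fixed part of the record, without a comment


def find_eocd(data: bytes) -> int:
    """Return the offset of the end-of-central-directory record, or -1."""
    # The record sits at the end of the file, followed only by an optional
    # comment of at most 65535 bytes, so search backwards from the end.
    start = max(0, len(data) - EOCD_MIN_SIZE - 0xFFFF)
    return data.rfind(EOCD_SIGNATURE, start)


def make_prefixed_zip(prefix: bytes) -> bytes:
    """Build a tiny ZIP in memory and prepend arbitrary bytes to it."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("hello.txt", "hello")
    return prefix + buf.getvalue()


data = make_prefixed_zip(b"<!doctype html><!-- not a ZIP header -->\n")
print("starts with PK:", data.startswith(b"PK"))
print("EOCD offset:", find_eocd(data))

# Python's zipfile also works from the end of the file, so the prefix
# does not stop it from reading the entry.
with zipfile.ZipFile(io.BytesIO(data)) as zf:
    print(zf.read("hello.txt"))
```

A reader that insists on "PK" at offset 0 would reject this file even though the archive itself is intact.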

By the way, if you're interested in this sort of file format wrangling, check out Ange Albertini's talk tomorrow at 38c3: https://fahrplan.events.ccc.de/congress/2024/fahrplan/talk/Q...

by gildas on 12/28/2024, 1:13 AM

Note that you can also take advantage of the fact that a ZIP can be password-protected and make your web page secret! For example https://gildas-lormeau.github.io/private/ (password: "thisisapage").

by zzo38computer on 12/28/2024, 10:47 PM

I would probably prefer to use text other than "Please wait..." since it won't work if JavaScript is disabled. This can be fixed by changing the text to something such as "This is an HTML/ZIP/PNG polyglot file". And then, omit the <title> to save space.

The URL jar:https://raw.githubusercontent.com/gildas-lormeau/Polyglot-HT... can be used to display the HTML file in some web browsers, although it cannot display the PNG file in this way since it uses # as the URL of the picture.

by OkGoDoIt on 12/28/2024, 1:18 AM

I was hoping for an example PNG on the webpage to showcase that it actually works. I’m on my phone so I can’t do much with a downloaded zip file. But it would be cool to see that the PNG renders like a normal image on Safari mobile.

by Dwedit on 12/28/2024, 1:52 AM

I think there's probably a much more efficient way to pack the correction data than JSON. For example, if you wanted to embed a 10MB video file in there, the correction data would be huge.

In the project, the correction data is used to recover bytes that were changed into LF when they were originally CR or CRLF.

One idea is to store the correction data as binary, then read two bits every time you see a LF byte. It's either an actual LF, a CR, or a CRLF. The downside is that binary data itself could need correction as well, and encoding nearly 1-bit data in 2 bits is still wasteful (but simple). Packing five 3-state values into a byte is less wasteful and would eliminate forbidden symbols, but is still not optimal.
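
A rough Python sketch of the "five 3-state values per byte" idea: since 3^5 = 243 fits in a byte, each group of five LF/CR/CRLF markers maps to one of 243 byte values, leaving 13 values that never need to be emitted. Everything here is an assumption for illustration and not the project's format: the names `pack_trits` and `unpack_trits`, the padding scheme, and in particular the `FORBIDDEN` set (NUL and CR), which stands in for whatever bytes do not survive the round trip through the DOM.

```python
from typing import Iterable, List

LF, CR, CRLF = 0, 1, 2  # what a normalized LF originally was

# Assumption: bytes that must not appear in the packed output.
FORBIDDEN = {0x00, 0x0D}
# 243 usable byte values, one per possible group of five 3-state values.
ALPHABET = bytes(b for b in range(256) if b not in FORBIDDEN)[:243]
INDEX = {b: i for i, b in enumerate(ALPHABET)}


def pack_trits(trits: Iterable[int]) -> bytes:
    """Pack 3-state values, five per output byte."""
    values = list(trits)
    out = bytearray()
    for i in range(0, len(values), 5):
        group = values[i:i + 5]
        group += [LF] * (5 - len(group))  # pad the last group
        number = 0
        for t in reversed(group):         # first value ends up least significant
            number = number * 3 + t
        out.append(ALPHABET[number])
    return bytes(out)


def unpack_trits(packed: bytes, count: int) -> List[int]:
    """Inverse of pack_trits; `count` drops the padding."""
    out: List[int] = []
    for byte in packed:
        number = INDEX[byte]
        for _ in range(5):
            out.append(number % 3)
            number //= 3
    return out[:count]


markers = [LF, CR, CRLF, LF, LF, CR, CR, CRLF, LF, CRLF, CR, LF]
packed = pack_trits(markers)
print(len(markers), "markers packed into", len(packed), "bytes")
print(unpack_trits(packed, len(markers)) == markers)
```

This costs 1.6 bits per marker instead of 2. As the comment says, it is still not optimal when almost every marker is a plain LF; an entropy coder or a list of offsets of the rare CR/CRLF cases would do better.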

by porridgeraisin on 12/28/2024, 6:32 AM

> However, there’s a problem: due to the same-origin policy, retrieving ZIP data directly with fetch("") fails when the page is opened from the filesystem (except in Firefox).

  chromium --allow-file-access-from-files

by lifthrasiir on 12/28/2024, 6:47 AM

> The bootstrap page is now encoded in windows-1252, which allows data to be read from the DOM with minimum degradation.

This is not always the case if the encoded content happens to have `-->`, for example. A better approach would be the `<plaintext>` element which can never be closed.

by nhinck3 on 12/28/2024, 6:42 AM

I don't think you need any external libraries to do this anymore, thanks to DecompressionStream.
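
The reason a built-in inflater is enough: a compressed ZIP entry is a raw DEFLATE stream sitting behind a 30-byte local file header plus the file name and extra field. The sketch below shows that in Python, with `zlib` standing in for what `new DecompressionStream("deflate-raw")` would do in the browser. `read_first_entry` is a hypothetical helper; it only handles stored or deflated entries whose sizes are in the local header, and it skips the CRC check.

```python
import io
import struct
import zipfile
import zlib

# Local file header: signature, version, flags, method, time, date,
# CRC-32, compressed size, uncompressed size, name length, extra length.
LOCAL_HEADER = struct.Struct("<4sHHHHHIIIHH")


def read_first_entry(data: bytes, offset: int = 0) -> bytes:
    """Decode the entry whose local file header starts at `offset`."""
    (sig, _version, flags, method, _time, _date, _crc,
     comp_size, _size, name_len, extra_len) = LOCAL_HEADER.unpack_from(data, offset)
    if sig != b"PK\x03\x04":
        raise ValueError("no local file header at this offset")
    if flags & 0x08:
        raise ValueError("sizes are in a data descriptor; not handled here")
    start = offset + LOCAL_HEADER.size + name_len + extra_len
    payload = data[start:start + comp_size]
    if method == 0:   # stored, no compression
        return payload
    if method == 8:   # DEFLATE without zlib/gzip framing
        return zlib.decompressobj(-15).decompress(payload)
    raise ValueError(f"unsupported compression method {method}")


# Build a small deflated archive in memory and decode it by hand.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w", compression=zipfile.ZIP_DEFLATED) as zf:
    zf.writestr("page.html", "<p>hello</p>" * 20)
print(read_first_entry(buf.getvalue()))
```

A real reader would walk the central directory to find each entry's offset rather than assume the first entry starts at offset 0.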

by EmileSonneveld on 12/28/2024, 10:20 PM

Could they embed “zip.min.js” too? It is not a single file otherwise