APK, the strangest format

I use Alpine Linux for a number of things at my employer. One question I needed an answer to recently was how exactly the APK format works. There are plenty of good guides from the Alpine team explaining how to package software using the APKBUILD format, which uses the abuild tool to produce an APK. This tool is quite nice, and its source code is publicly available. The only section you need to read to understand the APK format is this:

msg "Create $apk"
mkdir -p "$REPODEST/$repo/$(arch2dir "$subpkgarch")"
cat control.tar.gz data.tar.gz > "$REPODEST/$repo/$(arch2dir "$subpkgarch")/$apk"

An APK is the concatenation of two GZIP'd tarballs. The file data.tar.gz contains the actual files to be installed on the file system, and control.tar.gz contains metadata about the package. You can download the GRUB 2.12 APK for Alpine 3.21 directly and see this:

$ wget https://mirrors.gigenet.com/alpinelinux/v3.21/main/x86_64/grub-2.12-r7.apk
--2025-01-04 08:48:38--  https://mirrors.gigenet.com/alpinelinux/v3.21/main/x86_64/grub-2.12-r7.apk
Resolving mirrors.gigenet.com (mirrors.gigenet.com)... 2001:1850:f000:f000:f000:f000::, 69.65.16.171
Connecting to mirrors.gigenet.com (mirrors.gigenet.com)|2001:1850:f000:f000:f000:f000::|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 5623552 (5.4M) [application/octet-stream]
Saving to: ‘grub-2.12-r7.apk’

grub-2.12-r7.apk                                                    100%[==================================================================================================================================================================>]   5.36M  12.1MB/s    in 0.4s

2025-01-04 08:48:39 (12.1 MB/s) - ‘grub-2.12-r7.apk’ saved [5623552/5623552]

$ file grub-2.12-r7.apk
grub-2.12-r7.apk: gzip compressed data, max compression, from Unix, original size modulo 2^32 12021760
$ gzip -c -d grub-2.12-r7.apk | tar tvf - | head -n 10
-rw-r--r-- 0/0             512 2025-01-01 04:54 .SIGN.RSA.alpine-devel@lists.alpinelinux.org-6165ee59.rsa.pub
-rw-r--r-- root/root      1687 2025-01-01 04:54 .PKGINFO
-rwxr-xr-x root/root       442 2025-01-01 04:54 .post-upgrade
-rwxr-xr-x root/root       122 2025-01-01 04:54 .trigger
drwxr-xr-x root/root         0 2025-01-01 04:54 etc/
drwxr-xr-x root/root         0 2025-01-01 04:54 etc/default/
tar: Ignoring unknown extended header keyword 'APK-TOOLS.checksum.SHA1'
-rw-r--r-- root/root        94 2025-01-01 04:54 etc/default/grub
tar: drwxr-xr-x root/root         0 2025-01-01 04:54 etc/grub.d/
Ignoring unknown extended header keyword 'APK-TOOLS.checksum.SHA1'
-rwxr-xr-x root/root      8718 2025-01-01 04:54 etc/grub.d/00_header
tar: Ignoring unknown extended header keyword 'APK-TOOLS.checksum.SHA1'
tar: -rwxr-xr-x root/root     11644 2025-01-01 04:54 etc/grub.d/10_linux
Ignoring unknown extended header keyword 'APK-TOOLS.checksum.SHA1'

By asking gzip to decompress the file to standard output and piping the result into tar, the contents of both tar files are visible. One interesting thing is that a GZIP member does not record its own compressed length. The Wikipedia article explains the format in detail. A member consists of:

| Section | Length | Description |
|---|---|---|
| header | 10 bytes | starts with 0x1f8b08 |
| extra headers | variable | as indicated by the header flags |
| body | variable | the DEFLATE compressed payload |
| trailer | 8 bytes | CRC-32 and the length of the uncompressed data (modulo 2^32) |
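To make the table concrete, here is a minimal sketch using only Python's standard library. It builds a single member with gzip.compress, which sets no optional header fields. It then unpacks the fixed 10-byte header and the 8-byte trailer, and shows that a raw DEFLATE decoder finds the end of the body on its own. The payload is arbitrary sample data.

```python
import gzip
import struct
import zlib

payload = b"example payload\n" * 64  # arbitrary sample data
member = gzip.compress(payload, mtime=0)  # one gzip member, no optional fields

# Fixed 10-byte header: magic (2), method (1), flags (1), mtime (4), xfl (1), os (1)
magic, method, flags, mtime, xfl, os_id = struct.unpack("<2sBBIBB", member[:10])
print(magic.hex(), method, flags)  # 1f8b 8 0 -> the 0x1f8b08 prefix, no flags set

# 8-byte trailer: CRC-32 and uncompressed length modulo 2**32, little-endian
crc, isize = struct.unpack("<II", member[-8:])
print(crc == zlib.crc32(payload), isize == (len(payload) & 0xFFFFFFFF))  # True True

# There is no compressed-length field, but a raw DEFLATE decoder detects the
# end of the body itself; whatever follows it is left in unused_data.
d = zlib.decompressobj(wbits=-zlib.MAX_WBITS)
body = d.decompress(member[10:])
print(d.eof, len(d.unused_data), body == payload)  # True 8 True
```

The boundary between members can only be found this way, by running the decoder until the stream ends.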

Looking at this, there is no way to simply skip over a section, since nothing in the format records the length of the compressed data. This means that in order to split the sections apart, you have to actually decompress the data and let the DEFLATE decoder find where each stream ends. So to split an APK file, I need to identify each GZIP section as it is decompressed. I was able to modify the existing gzip.decompress function from Python's standard library to do this:

split_apk.py

import gzip
import zlib
import io
import struct
import sys

def decompress(data):
    """Decompress a gzip compressed string in one shot.
    Return the list of decompressed sections.
    """
    decompressed_members = []
    while True:
        fp = io.BytesIO(data)
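        # gzip._read_gzip_header is a private helper in CPython's gzip module,
        # so it may change between Python versions; it returns None at end of data
        if gzip._read_gzip_header(fp) is None:
            return decompressed_members
        # Use a zlib raw DEFLATE decompressor
        do = zlib.decompressobj(wbits=-zlib.MAX_WBITS)
        # Decompress everything after the header; the DEFLATE stream ends on
        # its own and whatever follows it is left in do.unused_data
        decompressed = do.decompress(data[fp.tell():])
        if not do.eof or len(do.unused_data) < 8:
            raise EOFError("Compressed file ended before the end-of-stream "
                           "marker was reached")
        # The 8-byte trailer holds the CRC-32 and the length modulo 2**32
        crc, length = struct.unpack("<II", do.unused_data[:8])
        if crc != zlib.crc32(decompressed):
            raise gzip.BadGzipFile("CRC check failed")
        if length != (len(decompressed) & 0xffffffff):
            raise gzip.BadGzipFile("Incorrect length of data produced")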
        decompressed_members.append(decompressed)
        data = do.unused_data[8:].lstrip(b"\x00")

with open(sys.argv[1], 'rb') as fin:
    for i, section in enumerate(decompress(fin.read())):
        sys.stdout.write("%d\n" % (len(section),))
        with open("part_%d" % (i,), 'wb') as fout:
            fout.write(section)

Running this produces the following output

$ python3 split_apk.py grub-2.12-r7.apk
1024
7680
12021760

Surprisingly, this file consists of three GZIP'd sections. This is because control.tar.gz is passed through the abuild-sign command before the APK is assembled, and that script prepends a separately compressed signature tarball to it:

sig=".SIGN.$sigtype.$keyname"
$openssl dgst $dgstargs -sign "$privkey" -out "$sig" "$i"

if [ -n "$SOURCE_DATE_EPOCH" ]; then
    touch -h -d "@$SOURCE_DATE_EPOCH" "$sig"
fi

tmptargz=$(mktemp)
tar --owner=0 --group=0 --numeric-owner -f - -c "$sig" | abuild-tar --cut | $gzip -n -9 > "$tmptargz"
tmpsigned=$(mktemp)
cat "$tmptargz" "$i" > "$tmpsigned"
rm -f "$tmptargz" "$sig"
chmod 644 "$tmpsigned"
mv "$tmpsigned" "$i"
msg "Signed $i"

So the final APK file is actually three concatenated GZIP'd sections, each one a TAR file, which makes the APK format even stranger. The section I am actually interested in is always the second one.

$ tar tvf part_1
-rw-r--r-- root/root      1687 2025-01-01 04:54 .PKGINFO
-rwxr-xr-x root/root       442 2025-01-01 04:54 .post-upgrade
-rwxr-xr-x root/root       122 2025-01-01 04:54 .trigger

The .PKGINFO file contains the information about this package. We can extract that file and view it, since it is just text.

$ tar xf part_1 .PKGINFO
$ cat .PKGINFO
# Generated by abuild 3.14.1-r3
# using fakeroot version 1.36
pkgname = grub
pkgver = 2.12-r7
pkgdesc = Bootloader with support for Linux, Multiboot and more
url = https://www.gnu.org/software/grub/
builddate = 1735728888
packager = Buildozer <alpine-devel@lists.alpinelinux.org>
size = 11934332
arch = x86_64
origin = grub
commit = 5e1b96f77ac1a263d90609bf4166dd918815ad5d
maintainer = Timo Teräs <timo.teras@iki.fi>
license = GPL-3.0-or-later
depend = initramfs-generator
depend = /bin/sh
triggers = /boot
# automatically detected:
provides = cmd:grub-bios-setup=2.12-r7
provides = cmd:grub-editenv=2.12-r7
provides = cmd:grub-file=2.12-r7
provides = cmd:grub-fstest=2.12-r7
provides = cmd:grub-glue-efi=2.12-r7
provides = cmd:grub-install=2.12-r7
provides = cmd:grub-kbdcomp=2.12-r7
provides = cmd:grub-macbless=2.12-r7
provides = cmd:grub-menulst2cfg=2.12-r7
provides = cmd:grub-mkconfig=2.12-r7
provides = cmd:grub-mkimage=2.12-r7
provides = cmd:grub-mklayout=2.12-r7
provides = cmd:grub-mknetdir=2.12-r7
provides = cmd:grub-mkpasswd-pbkdf2=2.12-r7
provides = cmd:grub-mkrelpath=2.12-r7
provides = cmd:grub-mkrescue=2.12-r7
provides = cmd:grub-mkstandalone=2.12-r7
provides = cmd:grub-ofpathname=2.12-r7
provides = cmd:grub-probe=2.12-r7
provides = cmd:grub-reboot=2.12-r7
provides = cmd:grub-render-label=2.12-r7
provides = cmd:grub-script-check=2.12-r7
provides = cmd:grub-set-default=2.12-r7
provides = cmd:grub-sparc64-setup=2.12-r7
provides = cmd:grub-syslinux2cfg=2.12-r7
provides = cmd:update-grub=2.12-r7
depend = so:libc.musl-x86_64.so.1
depend = so:libdevmapper.so.1.02
depend = so:liblzma.so.5
datahash = 3149f0c0e2c7629ebee4c26ab7e4eb06ebc840aac307a8bdf71416d4619f3881

The format is a simple list of key = value lines with # comments, and keys such as depend and provides can repeat. So we can modify the prior script to extract this data in a structured form:

apk_pkginfo.py

import gzip
import zlib
import io
import struct
import sys
import tarfile
import json

def decompress(data):
    """Decompress concatenated gzip members one at a time.
    Yield each decompressed section in turn.
    """
    while True:
        fp = io.BytesIO(data)
        # gzip._read_gzip_header is a private helper in CPython's gzip module,
        # so it may change between Python versions; it returns None at end of data
        if gzip._read_gzip_header(fp) is None:
            return
        # Use a zlib raw DEFLATE decompressor
        do = zlib.decompressobj(wbits=-zlib.MAX_WBITS)
        # Read all the data except the header
        decompressed = do.decompress(data[fp.tell():])
        if not do.eof or len(do.unused_data) < 8:
            raise EOFError("Compressed file ended before the end-of-stream "
                           "marker was reached")
        crc, length = struct.unpack("<II", do.unused_data[:8])
        if crc != zlib.crc32(decompressed):
            raise gzip.BadGzipFile("CRC check failed")
        if length != (len(decompressed) & 0xffffffff):
            raise gzip.BadGzipFile("Incorrect length of data produced")
        yield decompressed
        data = do.unused_data[8:].lstrip(b"\x00")

metadata = {}
with open(sys.argv[1], 'rb') as fin:
    for i, section in enumerate(decompress(fin.read())):
        if i != 1: # first section is signature, second is package metadata, third is package data
            continue
        sio = io.BytesIO(section)
        star = tarfile.TarFile(name=None, mode='r', fileobj=sio)
        for tarinfo in star:
            if tarinfo.name != '.PKGINFO':
                continue
            v = star.extractfile(tarinfo)
            v = v.read().decode('utf-8')
            for line in v.split("\n"):
                line = line.strip()
                if len(line) == 0:
                    continue
                if line[0] == '#':
                    continue
                parts = [part.strip() for part in line.split('=',1)]
                k = parts[0]
                entry = metadata.get(k)
                if entry is None:
                    entry = []
                entry.append(parts[1])
                metadata[k] = entry

json.dump(metadata, sys.stdout, indent=1)

Running this produces JSON output

$ python3 apk_pkginfo.py grub-2.12-r7.apk
{
 "pkgname": [
  "grub"
 ],
 "pkgver": [
  "2.12-r7"
 ],
 "pkgdesc": [
  "Bootloader with support for Linux, Multiboot and more"
 ],
 "url": [
  "https://www.gnu.org/software/grub/"
 ],
 "builddate": [
  "1735728888"
 ],
 "packager": [
  "Buildozer <alpine-devel@lists.alpinelinux.org>"
 ],
 "size": [
  "11934332"
 ],
 "arch": [
  "x86_64"
 ],
 "origin": [
  "grub"
 ],
 "commit": [
  "5e1b96f77ac1a263d90609bf4166dd918815ad5d"
 ],
 "maintainer": [
  "Timo Ter\u00e4s <timo.teras@iki.fi>"
 ],
 "license": [
  "GPL-3.0-or-later"
 ],
 "depend": [
  "initramfs-generator",
  "/bin/sh",
  "so:libc.musl-x86_64.so.1",
  "so:libdevmapper.so.1.02",
  "so:liblzma.so.5"
 ],
 "triggers": [
  "/boot"
 ],
 "provides": [
  "cmd:grub-bios-setup=2.12-r7",
  "cmd:grub-editenv=2.12-r7",
  "cmd:grub-file=2.12-r7",
  "cmd:grub-fstest=2.12-r7",
  "cmd:grub-glue-efi=2.12-r7",
  "cmd:grub-install=2.12-r7",
  "cmd:grub-kbdcomp=2.12-r7",
  "cmd:grub-macbless=2.12-r7",
  "cmd:grub-menulst2cfg=2.12-r7",
  "cmd:grub-mkconfig=2.12-r7",
  "cmd:grub-mkimage=2.12-r7",
  "cmd:grub-mklayout=2.12-r7",
  "cmd:grub-mknetdir=2.12-r7",
  "cmd:grub-mkpasswd-pbkdf2=2.12-r7",
  "cmd:grub-mkrelpath=2.12-r7",
  "cmd:grub-mkrescue=2.12-r7",
  "cmd:grub-mkstandalone=2.12-r7",
  "cmd:grub-ofpathname=2.12-r7",
  "cmd:grub-probe=2.12-r7",
  "cmd:grub-reboot=2.12-r7",
  "cmd:grub-render-label=2.12-r7",
  "cmd:grub-script-check=2.12-r7",
  "cmd:grub-set-default=2.12-r7",
  "cmd:grub-sparc64-setup=2.12-r7",
  "cmd:grub-syslinux2cfg=2.12-r7",
  "cmd:update-grub=2.12-r7"
 ],
 "datahash": [
  "3149f0c0e2c7629ebee4c26ab7e4eb06ebc840aac307a8bdf71416d4619f3881"
 ]
}

Can we do better than GZIP compression for APKs?

The abuild tool always produces APK files compressed with gzip -9 or similar. This compression method is relatively fast to compress and very fast to decompress. However, the compression ratio is not always great. For example, let's look at the Linux kernel package from Alpine 3.21:

$ wget https://mirrors.gigenet.com/alpinelinux/v3.21/main/x86_64/linux-lts-6.12.8-r0.apk
--2025-01-04 10:21:37--  https://mirrors.gigenet.com/alpinelinux/v3.21/main/x86_64/linux-lts-6.12.8-r0.apk
Resolving mirrors.gigenet.com (mirrors.gigenet.com)... 2001:1850:f000:f000:f000:f000::, 69.65.16.171
Connecting to mirrors.gigenet.com (mirrors.gigenet.com)|2001:1850:f000:f000:f000:f000::|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 103609629 (99M) [application/octet-stream]
Saving to: ‘linux-lts-6.12.8-r0.apk’

linux-lts-6.12.8-r0.apk                                             100%[==================================================================================================================================================================>]  98.81M  27.7MB/s    in 5.8s

2025-01-04 10:21:43 (17.0 MB/s) - ‘linux-lts-6.12.8-r0.apk’ saved [103609629/103609629]

$ python3 split_apk.py linux-lts-6.12.8-r0.apk
1024
2560
123873280

The APK file is 103609629 bytes, around 99 megabytes. The individual sections total up to 123876864 bytes, around 118 megabytes. So

$$ \frac{103609629}{1024 + 2560 + 123873280} = \frac{103609629}{123876864} \approx 0.836 $$

So the compressed output is around 83.6% of the original size. One simple idea is to use the pigz tool with its zopfli compression option, which is enabled by running pigz -11. This produces GZIP compatible output, but often achieves a better compression ratio. The first two sections are unlikely to compress much better, given their small size and the simple nature of the data. After compressing each section this way, I wound up with the following:

| part | original size (bytes) | zopfli size (bytes) | percentage of original |
|---|---|---|---|
| signature | 1024 | 649 | 63.3% |
| metadata | 2560 | 520 | 20.3% |
| data | 123873280 | 102762981 | 82.9% |

The total compressed size is 102764150 bytes, or 82.9% of the original. This is not even a 1% gain over just using gzip -9. The only compression format supported by the APK tooling is GZIP. But what if we could do an end run around the compression entirely? We can "compress" each section with pigz -0, which produces a technically valid GZIP section that is not compressed in any meaningful way. We can then concatenate the sections and compress the result with another compression tool. I tried this with several common tools:

| method | command | output size (bytes) | percentage of original |
|---|---|---|---|
| bzip2 | bzip2 | 104636526 | 84.4% |
| zstd | zstd --ultra --threads=0 | 105630081 | 85.2% |
| xz | xz --extreme -9 | 101137676 | 81.6% |

None of these methods do particularly well, and xz only barely achieves a noticeable improvement. As it turns out, we can look inside the original Linux kernel APK to figure out why:

$ tar tvf linux-lts-6.12.8-r0.apk 2> /dev/null | grep '\.gz' | head -n 10
-rw-r--r-- root/root      7235 2025-01-02 06:14 lib/modules/6.12.8-0-lts/kernel/arch/x86/crypto/aegis128-aesni.ko.gz
-rw-r--r-- root/root     36120 2025-01-02 06:14 lib/modules/6.12.8-0-lts/kernel/arch/x86/crypto/aesni-intel.ko.gz
-rw-r--r-- root/root      5828 2025-01-02 06:14 lib/modules/6.12.8-0-lts/kernel/arch/x86/crypto/blowfish-x86_64.ko.gz
-rw-r--r-- root/root      9320 2025-01-02 06:14 lib/modules/6.12.8-0-lts/kernel/arch/x86/crypto/camellia-aesni-avx-x86_64.ko.gz
-rw-r--r-- root/root      9868 2025-01-02 06:14 lib/modules/6.12.8-0-lts/kernel/arch/x86/crypto/camellia-aesni-avx2.ko.gz
-rw-r--r-- root/root     22711 2025-01-02 06:14 lib/modules/6.12.8-0-lts/kernel/arch/x86/crypto/camellia-x86_64.ko.gz
-rw-r--r-- root/root     13039 2025-01-02 06:14 lib/modules/6.12.8-0-lts/kernel/arch/x86/crypto/cast5-avx-x86_64.ko.gz
-rw-r--r-- root/root     16245 2025-01-02 06:14 lib/modules/6.12.8-0-lts/kernel/arch/x86/crypto/cast6-avx-x86_64.ko.gz
-rw-r--r-- root/root     11300 2025-01-02 06:14 lib/modules/6.12.8-0-lts/kernel/arch/x86/crypto/chacha-x86_64.ko.gz
-rw-r--r-- root/root      4380 2025-01-02 06:14 lib/modules/6.12.8-0-lts/kernel/arch/x86/crypto/crc32-pclmul.ko.gz
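Before trying a recompression, one rough check is to measure how much of a data section's file content already starts with the gzip magic bytes 0x1f8b. The helper below is a hypothetical sketch (precompressed_fraction is not part of any Alpine tool). It takes an uncompressed TAR such as the part_2 file written by split_apk.py, and it only detects gzip, not other formats such as xz or zstd.

```python
import io
import tarfile

GZIP_MAGIC = b"\x1f\x8b"


def precompressed_fraction(tar_bytes):
    """Share of regular-file bytes in an uncompressed TAR whose contents start
    with the gzip magic number. Other formats (xz, zstd, ...) are not detected."""
    total = already_gzipped = 0
    with tarfile.open(fileobj=io.BytesIO(tar_bytes), mode="r:") as tf:
        for info in tf:
            if not info.isfile():
                continue
            total += info.size
            # Peek at the first two bytes of each regular file
            if tf.extractfile(info).read(2) == GZIP_MAGIC:
                already_gzipped += info.size
    return already_gzipped / total if total else 0.0
```

Calling precompressed_fraction(open("part_2", "rb").read()) on a data section gives a quick indication of whether the pigz -0 trick is worth trying.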

So as it turns out, the compressed tarball contains GZIP compressed kernel modules. Trying to compress these again won't do much, since compressing already-compressed data rarely gains anything. So let's look at another APK, rust-wasm-1.83.0-r0.apk, which is 105255109 bytes or about 101 megabytes. Its decompressed size is over 277 megabytes. Let's use the same pigz -0 trick to try creating a compressed version of this package that outperforms the original:

| method | command | output size (bytes) | percentage of original |
|---|---|---|---|
| bzip2 | bzip2 | 107134047 | 36.9% |
| zstd | zstd --ultra --threads=0 | 101546430 | 35.0% |
| xz | xz --extreme -9 | 71145120 | 24.5% |

Since the rust-wasm package is not internally compressed, it compresses very well using this technique. xz clearly wins in terms of smallest output size, which is all I care about here. These steps can be combined into a single shell command:

 python3 split_apk.py rust-wasm-1.83.0-r0.apk && pigz -0 part_*  && cat part_0.gz part_1.gz part_2.gz > nocomp.apk && xz --extreme -9 nocomp.apk
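For reference, the same pipeline can also be written in Python. This is a minimal sketch, not the author's script. It uses the standard gzip and lzma modules in place of pigz and xz, so its output bytes will differ from those of the command-line tools. It expects the list of decompressed sections returned by the decompress function in split_apk.py, and the name restore_and_xz is hypothetical.

```python
import gzip
import lzma


def restore_and_xz(sections, preset=9 | lzma.PRESET_EXTREME):
    """Re-wrap each decompressed APK section as a level-0 gzip member,
    concatenate them, and xz-compress the result.

    Returns (stored_apk, xz_bytes). The default preset corresponds to
    `xz -9 --extreme`, which needs several hundred MiB of memory.
    """
    # compresslevel=0 emits stored (uncompressed) DEFLATE blocks, so each member
    # is a valid gzip stream whose bytes are essentially the raw TAR data
    stored_apk = b"".join(gzip.compress(s, compresslevel=0, mtime=0) for s in sections)
    # lzma.compress writes the .xz container by default, like the xz tool
    return stored_apk, lzma.compress(stored_apk, preset=preset)
```

As with the shell version, the signature inside stored_apk no longer matches the rewritten sections.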

The problem is that the signature in the APK is no longer valid. The signature covers the compressed bytes of the control section, and this process has changed those bytes. This can be demonstrated by trying to install the APK in a Docker container:

$ docker run --mount type=bind,source=$PWD,target=/opt/apk,readonly --rm -it alpine:3.21 /bin/sh
/ # apk update && apk add xz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.21/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.21/community/x86_64/APKINDEX.tar.gz
v3.21.0-260-g5f0cd789034 [https://dl-cdn.alpinelinux.org/alpine/v3.21/main]
v3.21.0-259-g82f13e6dd79 [https://dl-cdn.alpinelinux.org/alpine/v3.21/community]
OK: 25397 distinct packages available
(1/2) Installing xz-libs (5.6.3-r0)
(2/2) Installing xz (5.6.3-r0)
Executing busybox-1.37.0-r8.trigger
OK: 7 MiB in 17 packages
/ # cp /opt/apk/nocomp.apk.xz .
/ # xz -d nocomp.apk.xz
/ # apk add --allow-untrusted nocomp.apk
ERROR: nocomp.apk: BAD signature

You can convince apk to accept an untrusted signature, but not to ignore the signature entirely. So it is entirely possible to produce a recompressed APK, but the full process takes these steps:

  1. read the original APK file and decompress it into 3 sections
  2. the first section is the signature, discard it
  3. the third section is the data TAR file; use GZIP to compress it at level zero
  4. compute the SHA256 of the compressed data section
  5. the second section is the control section, a TAR file. Unpack the control TAR file
  6. in the control section, find the .PKGINFO file
  7. scan the .PKGINFO file until a line is found starting with datahash =. Modify this line to read datahash = <SHA256>, where <SHA256> is the hash of the compressed data section from earlier, encoded in hexadecimal.
  8. repack the control TAR file in the original file order, with the modified .PKGINFO file
  9. compress the control TAR file with GZIP at level zero
  10. create a signature of the compressed control TAR file using openssl dgst -sha1 -sign private-key.pem -out foo control.tar.gz or similar
  11. place this signature file in a TAR file with the name formatted like .SIGN.RSA.<USER>@<DOMAIN>-<FILENAME>. This needs to be structurally correct but doesn't have to refer to an exact user
  12. compress the signature TAR file
  13. write out the signature TAR file, the control TAR file, and the data TAR file as a single file
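Steps 3 through 9 can be sketched in Python with only the standard library. This is a hypothetical helper: rebuild_sections is not taken from the author's recomp_apk.py. It takes the decompressed control and data TAR sections as input. It assumes the rewritten control TAR can be written in PAX format. It also assumes the control TAR should be emitted without the end-of-archive zero blocks so the data section can follow it, which is what the abuild-tar --cut call in the abuild-sign snippet is assumed to do. Signing is left out.

```python
import gzip
import hashlib
import io
import tarfile

BLOCK = 512  # TAR block size


def cut_tar(members):
    """Serialize (TarInfo, bytes) pairs as a TAR stream without the trailing
    end-of-archive zero blocks (assumed to match `abuild-tar --cut`)."""
    out = io.BytesIO()
    for info, data in members:
        info.size = len(data)
        out.write(info.tobuf(tarfile.PAX_FORMAT))
        out.write(data)
        out.write(b"\0" * (-len(data) % BLOCK))  # pad data to a whole block
    return out.getvalue()


def rebuild_sections(control_tar, data_tar):
    """Steps 3-9: store the data TAR at gzip level 0, hash it, patch the
    datahash line in .PKGINFO, and store the control TAR at level 0.
    Returns (control_gz, data_gz)."""
    # Steps 3-4: level-0 gzip of the data section, then SHA256 of those bytes
    data_gz = gzip.compress(data_tar, compresslevel=0, mtime=0)
    datahash = hashlib.sha256(data_gz).hexdigest()

    # Steps 5-8: unpack the control TAR, keeping the original member order
    members = []
    with tarfile.open(fileobj=io.BytesIO(control_tar), mode="r:") as tf:
        for info in tf:
            data = tf.extractfile(info).read() if info.isfile() else b""
            if info.name == ".PKGINFO":
                lines = data.decode("utf-8").split("\n")
                lines = [f"datahash = {datahash}" if line.startswith("datahash =")
                         else line for line in lines]
                data = "\n".join(lines).encode("utf-8")
            members.append((info, data))

    # Step 9: level-0 gzip of the repacked control section
    control_gz = gzip.compress(cut_tar(members), compresslevel=0, mtime=0)
    return control_gz, data_gz
```

The returned control_gz is what step 10 signs, and data_gz is the final section of the rebuilt file.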

The outcome is a valid APK with exactly the same contents as the original, but effectively uncompressed, so it can be compressed with whatever tool you like. At this point I combined all of this into a Python script:

$ ls -lh rust-wasm-1.83.0-r0.apk
-rw-rw-r-- 1 ericu ericu 101M Nov 28 18:35 rust-wasm-1.83.0-r0.apk
$ python3 recomp_apk.py private-key.pem rust-wasm-1.83.0-r0.apk
$ ls -lh nocomp.apk
-rw-rw-r-- 1 ericu ericu 277M Jan  5 09:12 nocomp.apk
$ xz -9 --extreme nocomp.apk
$ ls -lh nocomp.apk.xz
-rw-rw-r-- 1 ericu ericu 68M Jan  5 09:12 nocomp.apk.xz
$ docker run --mount type=bind,source=$PWD,target=/opt/apk,readonly --rm -it alpine:3.21 /bin/sh
/ # apk add xz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.21/main/x86_64/APKINDEX.tar.gz
fetch https://dl-cdn.alpinelinux.org/alpine/v3.21/community/x86_64/APKINDEX.tar.gz
(1/2) Installing xz-libs (5.6.3-r0)
(2/2) Installing xz (5.6.3-r0)
Executing busybox-1.37.0-r8.trigger
OK: 7 MiB in 17 packages
/ # cp /opt/apk/nocomp.apk.xz .
/ # xz -d nocomp.apk.xz
/ # apk add --allow-untrusted ./nocomp.apk
(1/22) Installing libgcc (14.2.0-r4)
(2/22) Installing jansson (2.14-r4)
(3/22) Installing libstdc++ (14.2.0-r4)
(4/22) Installing zstd-libs (1.5.6-r2)
(5/22) Installing binutils (2.43.1-r1)
(6/22) Installing libgomp (14.2.0-r4)
(7/22) Installing libatomic (14.2.0-r4)
(8/22) Installing gmp (6.3.0-r2)
(9/22) Installing isl26 (0.26-r1)
(10/22) Installing mpfr4 (4.2.1-r0)
(11/22) Installing mpc1 (1.3.1-r1)
(12/22) Installing gcc (14.2.0-r4)
(13/22) Installing musl-dev (1.2.5-r8)
(14/22) Installing libffi (3.4.6-r0)
(15/22) Installing libxml2 (2.13.4-r3)
(16/22) Installing llvm19-libs (19.1.4-r0)
(17/22) Installing scudo-malloc (19.1.4-r0)
(18/22) Installing rust (1.83.0-r0)
(19/22) Installing lld-libs (19.1.4-r0)
(20/22) Installing lld (19.1.4-r0)
(21/22) Installing wasi-libc (0.20240926-r0)
(22/22) Installing rust-wasm (1.83.0-r0)
Executing busybox-1.37.0-r8.trigger
OK: 806 MiB in 39 packages

The file nocomp.apk.xz needs to be decompressed before use, but after that it can be installed with apk add --allow-untrusted inside a Docker container. This proves my initial theory that I can distribute APK files that take up less disk space than the default ones. What I'd really like is a single, maximally compressed file that I can install APKs from on any Alpine machine. Rather than sending compressed APKs one at a time, I can combine all of them into one file. The easiest way to do this is to use SquashFS, a compressed, read-only filesystem. I tested this by making a SquashFS image containing just the one APK:

$ mksquashfs nocomp.apk test.squashfs -no-xattrs -comp xz -Xdict-size 1M -b 1M -not-reproducible
$ ls -lh test.squashfs
-rw-r--r-- 1 ericu ericu 74M Jan  5 09:34 test.squashfs

The output is compressed, although not quite as well as with plain xz (74M versus 68M). Almost any computer running Linux can mount this image. My laptop running Ubuntu can mount it just by clicking on the file.

Logically, my next step was to download the entire main and community repositories from an Alpine 3.21 mirror and then compress them all into a SquashFS image. I only downloaded the packages for the x86_64 architecture. I then put everything I had learned into a script that creates a single SquashFS image containing every Alpine package. I thought this step would take forever, but it only took 22 minutes, since I was able to use all 36 cores of a dual CPU machine I have. The entire image is only 40 gigabytes, so you can easily carry it around on a thumb drive. The same packages normally take 49 gigabytes of storage.

But as I quickly discovered, there was still another step I needed. Each repository has an index file called APKINDEX.tar.gz, a compressed tar file listing all the packages in the repository. The listing itself is the file named APKINDEX inside the archive. The set of packages listed doesn't change, but each entry looks something like this:

C:Q1KFPY0UY6wRYT1adax/rVLH/lzWE=
P:7zip
V:24.08-r0
A:x86_64
S:1775308
I:1763704
T:File archiver with a high compression ratio
U:https://7-zip.org/
L:LGPL-2.0-only
o:7zip
m:Alex Xu (Hello71) <alex_y_xu@yahoo.ca>
t:1732266688
c:f34d70b4e86dd0e938070f67db46aaa4cf68db11
k:100
D:so:libc.musl-x86_64.so.1 so:libgcc_s.so.1 so:libstdc++.so.6
p:7zip-virtual p7zip=24.08-r0 cmd:7z=24.08-r0 cmd:7zz=24.08-r0

The values prefixed with S: and C: are the compressed size of the APK and a hash of the control segment defined by control.tar.gz. The hash is a SHA-1 digest (20 bytes), base64 encoded and prefixed with Q1. What I had to do was process the entire original APKINDEX, updating each of these values, and then create a new APKINDEX.tar.gz with the updated file. Only then could I convince the Alpine LiveCD to accept my "repository".
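Recomputing these fields requires the byte boundaries of each compressed section, not just its decompressed contents. The sketch below finds those boundaries by parsing each gzip header (RFC 1952) and letting zlib locate the end of each DEFLATE stream. It rests on an assumption drawn from the description above: that C: is "Q1" plus the base64 of a SHA-1 over the compressed control section (the second member), taken exactly as it appears in the APK file. member_spans, apkindex_fields, and update_apkindex are hypothetical helpers, not part of apk-tools or abuild.

```python
import base64
import hashlib
import zlib


def _header_len(data, pos):
    """Length of the gzip member header starting at data[pos] (RFC 1952)."""
    if data[pos:pos + 3] != b"\x1f\x8b\x08":
        raise ValueError("no gzip member at offset %d" % pos)
    flags, p = data[pos + 3], pos + 10
    if flags & 0x04:  # FEXTRA: 2-byte little-endian length, then the field
        p += 2 + int.from_bytes(data[p:p + 2], "little")
    if flags & 0x08:  # FNAME: zero-terminated string
        p = data.index(b"\0", p) + 1
    if flags & 0x10:  # FCOMMENT: zero-terminated string
        p = data.index(b"\0", p) + 1
    if flags & 0x02:  # FHCRC: 2-byte header CRC
        p += 2
    return p - pos


def member_spans(data):
    """Return (start, end) byte offsets of each concatenated gzip member."""
    spans, pos = [], 0
    while pos < len(data):
        start = pos
        pos += _header_len(data, pos)
        d = zlib.decompressobj(wbits=-zlib.MAX_WBITS)
        d.decompress(data[pos:])  # raw DEFLATE stops at its own final block
        if not d.eof or len(d.unused_data) < 8:
            raise EOFError("truncated gzip member at offset %d" % start)
        pos = len(data) - len(d.unused_data) + 8  # skip CRC-32 and ISIZE
        spans.append((start, pos))
        while pos < len(data) and data[pos] == 0:  # tolerate zero padding
            pos += 1
    return spans


def apkindex_fields(apk):
    """(C, S) for an APK, assuming C: = "Q1" + base64(SHA-1 of control member)."""
    start, end = member_spans(apk)[1]
    digest = hashlib.sha1(apk[start:end]).digest()
    return "Q1" + base64.b64encode(digest).decode("ascii"), len(apk)


def update_apkindex(text, new_fields):
    """Rewrite C: and S: lines in APKINDEX text. new_fields maps
    (P value, V value) -> (C value, S value); everything else is kept."""
    entries = []
    for entry in text.split("\n\n"):  # entries are separated by blank lines
        lines = entry.split("\n")
        fields = dict(line.split(":", 1) for line in lines if ":" in line)
        key = (fields.get("P"), fields.get("V"))
        if key in new_fields:
            c, s = new_fields[key]
            lines = ["C:" + c if line.startswith("C:") else
                     "S:%d" % s if line.startswith("S:") else line
                     for line in lines]
        entries.append("\n".join(lines))
    return "\n\n".join(entries)
```

The updated text would then be written back as APKINDEX inside a new APKINDEX.tar.gz.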

To test this, I started a virtual machine with the image attached as a read-only disk, using this command:

virt-install -n blogtest --os-variant=alpinelinux3.10 --ram=1024 --vcpus=2 --disk path=/opt/data/apk_x86_64.squashfs,driver.io=io_uring,readonly=on,target.bus=virtio  --serial pty --network default --virt-type kvm --cdrom /opt/iso/alpine-standard-3.21.2-x86_64.iso --sound none --noautoconsole --boot uefi

The actual OS being booted here is just the regular ISO from the Alpine project. Once it is up and running, it only takes a few commands to mount the repository and install software from it:

localhost:~# mount -o ro /dev/vda /mnt
localhost:~# echo -e '/mnt/main\n/mnt/community' > /etc/apk/repositories
localhost:~# apk update --allow-untrusted
v3.21.0-265-g8f76abadc6e [/mnt/main]
v3.21.0-264-g6f8774869cf [/mnt/community]
OK: 25416 distinct packages available
localhost:~# apk add --allow-untrusted bsd-games

The game 'robots' installed from the repository

This isn't practical as a method of distribution. Every time an Alpine package is updated, the entire SquashFS image would have to be rebuilt and redownloaded. My real goal here was just to see how small I could make all of Alpine. I also got to learn some more about SquashFS.

Running it yourself

If you want to try this yourself, you can run the script. You first need to download all the packages for at least one Alpine architecture.

Example usage

Running the command below creates a SquashFS in the current directory from whatever repositories you mirror into /opt/alpinemirror.

$ export TMPDIR=/tmp # this path needs to have a huge amount of free space
$ python3 recomp_apk.py squashfs private-key.pem x86_64 /opt/alpinemirror

Creating private keys

You need to generate a private key to sign files with. It does not need to be a trusted key, since packages can be installed with --allow-untrusted. You can generate one with openssl:

# openssl genrsa -out private-key.pem 4096
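With a key in place, steps 10 to 13 from the list above can be sketched as follows. The openssl dgst -sha1 -sign invocation comes from step 10. Everything else is an assumption. The helper names are hypothetical, and the signature TAR is written in PAX format without end-of-archive blocks, mirroring the abuild-tar --cut call in the abuild-sign snippet. The sign parameter lets you substitute another signer, for example when testing without openssl.

```python
import gzip
import os
import subprocess
import tarfile
import tempfile


def openssl_sign(data, key_path):
    """Step 10: run `openssl dgst -sha1 -sign KEY -out SIG FILE` on the bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        src, sig = os.path.join(tmp, "control.tar.gz"), os.path.join(tmp, "sig")
        with open(src, "wb") as f:
            f.write(data)
        subprocess.run(["openssl", "dgst", "-sha1", "-sign", key_path,
                        "-out", sig, src], check=True)
        with open(sig, "rb") as f:
            return f.read()


def signature_section(control_gz, key_name, sign):
    """Steps 11-12: put the signature in a one-member TAR named
    .SIGN.RSA.<key_name> (no end-of-archive blocks) and gzip it."""
    signature = sign(control_gz)
    info = tarfile.TarInfo(".SIGN.RSA." + key_name)  # uid/gid default to 0
    info.size = len(signature)
    info.mode = 0o644
    tar = info.tobuf(tarfile.PAX_FORMAT) + signature + b"\0" * (-len(signature) % 512)
    return gzip.compress(tar, mtime=0)


def assemble_apk(control_gz, data_gz, key_name, sign):
    """Step 13: signature, control, and data sections concatenated."""
    return signature_section(control_gz, key_name, sign) + control_gz + data_gz
```

With openssl available, sign could be lambda d: openssl_sign(d, "private-key.pem"), and key_name a hypothetical value such as "me@example.com-1234.rsa.pub".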

Copyright Eric Urban 2025, or the respective entity where indicated