The public benchmark surface reports measured, SHA-exact results only. Research targets stay internal until they produce verified artifacts.
All three Cubbit validation datasets are listed explicitly. DS1 is protected by the raw-floor path; DS2 and DS3 improve; the aggregate improves.
| Dataset | Raw / reference | Algoland measured | Delta | Verdict |
|---|---|---|---|---|
| DS1 | 52,428,960 B | 52,429,120 B | +160 B | Protected raw-floor path; no fake compression claimed. |
| DS2 | 21,618,427 B reference / 37,548,410 B raw | 20,674,714 B | -943,713 B vs reference | Strongly improved; about 44.94% smaller than raw and about 6.01% smaller than the solid xz comparison used in the validation notes. |
| DS3 | 5,571,150 B reference / 7,673,566 B raw | 5,566,429 B | -4,721 B vs reference | Improved; about 27.46% smaller than raw. |
| Aggregate | 97,650,936 B raw | 78,670,263 B | -18,980,673 B vs raw | 19.44% measured raw reduction, SHA-exact restore. |
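
The deltas and percentages in the table can be rechecked from the byte counts alone. The snippet below is a small arithmetic sketch, not part of the Algoland tooling; the dictionary layout and the `percent_smaller` helper are names chosen here for illustration.

```python
# Byte counts copied from the validation table above.
# DS1 lists a single raw/reference value, so its reference is left as None.
ROWS = {
    "DS1": {"raw": 52_428_960, "reference": None, "algoland": 52_429_120},
    "DS2": {"raw": 37_548_410, "reference": 21_618_427, "algoland": 20_674_714},
    "DS3": {"raw": 7_673_566, "reference": 5_571_150, "algoland": 5_566_429},
}


def percent_smaller(measured: int, baseline: int) -> float:
    """Size reduction of `measured` relative to `baseline`, in percent."""
    return 100.0 * (baseline - measured) / baseline


raw_total = sum(row["raw"] for row in ROWS.values())
algoland_total = sum(row["algoland"] for row in ROWS.values())

for name, row in ROWS.items():
    # Use the reference size when one is listed, otherwise fall back to raw.
    baseline = row["reference"] if row["reference"] is not None else row["raw"]
    delta = row["algoland"] - baseline
    vs_raw = percent_smaller(row["algoland"], row["raw"])
    print(f"{name}: delta {delta:+,} B, {vs_raw:.2f}% smaller than raw")

print(f"Aggregate: {raw_total:,} B raw, {algoland_total:,} B measured, "
      f"{percent_smaller(algoland_total, raw_total):.2f}% reduction")
```

DS1 comes out slightly negative against raw, which is the +160 B raw-floor overhead rather than a compression claim.
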
Algoland Compressor routes files, folders and archive bundles through measured compression paths. Public wording distinguishes the delivered best-of path from core-only measurement rows.
| Surface | Measured coverage | Bytes / counts | Public-safe interpretation |
|---|---|---|---|
| Representative public sweep | 104 real files, 45 extensions, 1 B to about 2 MB | Raw 40,583,771 B to delivered best-of 21,114,386 B; SHA-exact 104/104 | The best-of Algoland/xz/raw delivery path is never worse than xz on any measured row. |
| Mode winners inside the 104-file sweep | Core / xz child / raw-floor accounting | Core wins 80 rows; xz child wins 6 rows; raw-floor protects 18 rows | Do not say core alone beats xz everywhere yet. The research target is to reduce the xz-child rows to zero. |
| Expanded engineering sweep | 193 measured rows across broader filetypes and sizes, plus the latest Stage 5H target set | Starting losses: xz 15 rows, brotli 80 rows. Banked and measured stages moved the active target set to xz 6 open and brotli 24 open. Stage 5H improved three ZIP-family container rows without regressions. | Public victory waits until xz losses = 0 and brotli losses = 0 with SHA-exact restore. |
| Generated speed-accurate sanity matrix | 47 generated files across code, text, logs, JSON, XML, CSV, SRT, binary patterns, random data, WAV and mixed ZIP | R1 container runtime: 47/47 Algoland byte counts captured, 0 missing Algoland rows, 0 xz open, 20 brotli open, 7 zstd open. | Includes Algoland runtime timing per row. R0 banked the random/no-gain fast path; R1 then improved archive_mixed.zip from 131,458 B to 131,366 B and made that row about 3.17x faster without any byte regression across the matrix. |
| Active row ledger | Current measured row CSV | Published one by one below on this page. | Rows are listed even when xz or brotli still wins. That is the transparent engineering queue until the gate reaches zero losses. |
| Archive and folder payloads | Folders, .tar.gz, .tgz, .zip, .rar, .7z, .tar and mixed archive bundles | Algoland Compressor packages folders and archives into .algd artifacts through the live API | Runtime support is active; already-compressed payloads are protected by measured no-gain handling instead of fake compression. |
| Chunked large-object API | Session init, indexed chunk upload, status, finalize and manifest generation | Public HTTPS lifecycle tested with multi-chunk payloads; whole-object buffering avoided | Designed for very large payloads. The current VPS pilot is not a 500 TB storage backend; production requires object storage and horizontal workers. |
| API load burst | 10,000 HTTPS status requests, 500 concurrency, single VPS | 9,993 OK / 7 failed; 304.52 req/s; p50 727 ms; p95 3,261 ms; p99 8,544 ms | Pilot-node burst test passed with a small failure rate (0.07%). Formal 10,000 concurrent heavy-upload certification requires a dedicated load-test cluster. |
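
The best-of delivery rule in the table (core, xz child, or raw floor, whichever is smallest, with a SHA-exact restore check) can be sketched roughly as follows. This is an illustration under stated assumptions only: the Algoland core is not public, so `core_compress` is a hypothetical stand-in built on zlib, and the one-byte mode tag is an invented container format, not the `.algd` layout.

```python
import hashlib
import lzma
import zlib

# One-byte mode tags for the hypothetical container used in this sketch.
MODE_RAW, MODE_XZ, MODE_CORE = 0, 1, 2


def core_compress(data: bytes) -> bytes:
    """Stand-in for the non-public core path; zlib is used only so this runs."""
    return zlib.compress(data, 9)


def core_decompress(blob: bytes) -> bytes:
    return zlib.decompress(blob)


def pack_best_of(data: bytes) -> bytes:
    """Try every path and keep the smallest payload; raw wins ties."""
    candidates = [
        (MODE_RAW, data),  # raw floor: listed first so min() prefers it on ties
        (MODE_XZ, lzma.compress(data)),
        (MODE_CORE, core_compress(data)),
    ]
    mode, payload = min(candidates, key=lambda c: len(c[1]))
    return bytes([mode]) + payload


def unpack(artifact: bytes) -> bytes:
    mode, payload = artifact[0], artifact[1:]
    if mode == MODE_RAW:
        return payload
    if mode == MODE_XZ:
        return lzma.decompress(payload)
    if mode == MODE_CORE:
        return core_decompress(payload)
    raise ValueError(f"unknown mode tag {mode}")


def sha_exact(original: bytes, artifact: bytes) -> bool:
    """Restore the artifact and compare SHA-256 digests with the original."""
    restored = unpack(artifact)
    return hashlib.sha256(restored).digest() == hashlib.sha256(original).digest()
```

In this sketch the artifact is at most one byte larger than the smallest candidate, so incompressible input costs `len(data) + 1` bytes instead of growing further. The mode tag plays the same role as the mode-winner accounting in the 104-file sweep.
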
Canterbury, Silesia and enwik/Hutter-style rows are listed with the measured boundary. No Hutter Prize victory or PAQ/cmix victory is claimed.
| Corpus | Measured state | Boundary |
|---|---|---|
| Canterbury | 443,009 B release measurement on the per-file public surface; SHA-exact restore. | Included as a small mixed-file corpus gate. All-counted packaging rules must be stated when the accounting method changes. |
| Silesia | All-counted historical Algoland row: 47,850,663 B vs xz 48,456,004 B; SHA-exact restore. | Included as a larger generality gate. Current public release keeps claims tied to the measured report row. |
| enwik prefixes | enwik1/2/8/16/64 MB release row: 266,157 / 525,682 / 2,015,925 / 3,937,347 / 15,062,161 B. | Hutter-style prefix surface, not an official Hutter Prize claim. |
| 64 MB scale pass | Engineering stack: 15,062,161 B to 14,548,395 B, SHA-exact, -513,766 B. | Scale-confirmed measured win. Not inserted as the packaged release baseline until the release gate is rebuilt. |
| Full enwik9 / Hutter | Public release status: pending official package/accounting gate. | No Hutter Prize win is claimed. |
Old cross-domain rows are not reused as final claims. Each domain gets a fresh remeasurement track with current code, current prompts/data, explicit verifier, and dated artifacts.
| Domain | What will be measured | Current public status |
|---|---|---|
| RAG | Retrieval accuracy, citation grounding, latency, corpus update behavior, and reproducible answer traces. | Refresh required before public scoring. |
| Ollama / local models | Local inference workflow, memory footprint, latency, task quality, and private deployment behavior. | Refresh required before public scoring. |
| Mistral adapter | Prompt/runtime integration, output quality, latency, and controlled benchmark prompts. | Refresh required before public scoring. |
| SAT / constraint solving | Satisfiability instances, proof/witness checks, solver timing, and reproducibility artifacts. | Refresh required before public scoring. |
| Simon / mining | Exploratory algorithmic experiments with explicit verifier boundaries. | Internal track only; no public performance claim until remeasured. |
The benchmark surface is intentionally expanding: broad public sweep, generated size bands, standard corpora, active absorber rows, and API/runtime gates. This is a monotonic ledger: every new row is either conquered by measurement or kept visible as an absorber target.
1 B, 10 B, 64 KiB, 256 KiB, 1 MiB, 2 MiB, 4 MiB, 8 MiB, 16 MiB, 64 MiB and chunked large-object paths are tracked as separate gates.
Single files, folders, TAR, TAR.GZ, TGZ, ZIP, RAR, 7Z, GZ, XZ, mixed archives and already-compressed objects are part of the runtime surface.
Rows are not relabeled as victories until the Algoland artifact is smaller or tied, restore is SHA-exact, and all metadata is counted.
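
The relabeling rule above can be read as a small predicate. The sketch below is one possible encoding, assuming a row records its payload and metadata bytes separately; the `Row` fields and the numbers in the example are hypothetical and are not taken from the ledger.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Row:
    payload_bytes: int      # compressed body
    metadata_bytes: int     # headers, tags, manifest: everything else shipped
    competitor_bytes: dict  # e.g. {"xz": 100, "brotli": 105}
    sha_exact: bool         # restore matched the original digest


def counted_size(row: Row) -> int:
    """All-counted artifact size: payload plus every byte of metadata."""
    return row.payload_bytes + row.metadata_bytes


def is_victory(row: Row) -> bool:
    """Smaller or tied against every competitor, and SHA-exact on restore."""
    if not row.sha_exact:
        return False
    size = counted_size(row)
    return all(size <= other for other in row.competitor_bytes.values())


# Illustrative numbers only: the payload alone beats xz, the counted size does not.
example = Row(payload_bytes=90, metadata_bytes=12,
              competitor_bytes={"xz": 100, "brotli": 105}, sha_exact=True)
print(counted_size(example), is_victory(example))
```

The example shows why metadata counting matters: a 90 B payload looks like a win over a 100 B xz file until the 12 B of metadata are added.
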
The public page now shows the breadth directly. Each family below is either already measured in the public sweep, the generated speed-accurate matrix or the Stage 5 absorption set, or queued for the next measured absorber.
| Family | Examples | Current measured status | Next absorber direction |
|---|---|---|---|
| Text and logs | txt, md, log, srt, csv, tsv | Strong measured rows. Stage 5C closed the large CSV row and SQL dump row; Stage 5D closed the app log row against brotli. | Template/timestamp fields, repeated-phrase count coding, SRT block absorber, deeper CSV/log column-token contour. |
| Structured text | json, ndjson, xml, html, svg, yaml, sql | Stage 5D token contour closed SVG, HTML, JSON-record and XML-catalog rows against xz/brotli where measured. | Key/value context maps, tag-depth, delimiter and schema orbit indexing. |
| Source and executable-adjacent | js, ts, py, c, cpp, h, o, wasm, exe/dll class | Object-code row remains an active absorber target. | Section-aware contours, relocation/call target normalization, BCJ-like native transform. |
| Office and documents | pdf, docx, xlsx, pptx, odt, epub | Stage 5H improved XLSX/ODT ZIP-family rows with SHA-exact restore and no selector regressions. | Deeper container-page modeling and metadata-safe recompression where reversible. |
| Images | png, jpg, webp, gif, tiff, bmp, svg | SVG and TIFF/image-plane rows are active measured targets. | Image-plane predictors, row delta, palette/channel contours, raw-floor for already-compressed media. |
| Audio and media | wav, mp3, mp4, mov, flac, ogg | Stage 5A closed the WAV tone row against xz and brotli: 54,669 B, SHA-exact. | PCM predictor lanes; media containers protected unless reversible structure is found. |
| ML and numeric arrays | npy, safetensors, parquet, tensor weights | Stage 5A closed safetensors: 11,998 B, SHA-exact. Parquet remains open. | Column/page contours, endian lanes, delta/XOR residuals, sparse numeric maps. |
| Archives and compressed data | zip, rar, 7z, tar.gz, gz, xz, zstd, brotli | Runtime support is public; Stage 5H reduced h_archive.zip by 441 B, and Speed R1 improved archive_mixed.zip from 131,458 B to 131,366 B with a 3.17x row speedup. | Member-aware container contours, safe metadata handling, reversible recompression only when measured. |
| Random/encrypted data | random, encrypted, high-entropy object chunks | Raw/no-gain protection is verified. Raw-orbit speed intrinsic preserves artifact size and accelerates random_1048576.bin from 15.157 s to 0.059 s. | Canonical raw-orbit preflight, chunk index, skip heavy modeling when the bit-orbit is already raw. |
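
The random/encrypted row describes a preflight that skips heavy modeling when the input already looks raw. How Algoland implements this is not stated, so the following is only a generic sketch of the idea using an order-0 byte-entropy estimate; the function names, the 64 KiB sample size and the 7.9 bits/byte threshold are all assumptions made here.

```python
import math
import os
from collections import Counter


def byte_entropy(sample: bytes) -> float:
    """Order-0 Shannon entropy of the sample, in bits per byte (0.0 to 8.0)."""
    if not sample:
        return 0.0
    n = len(sample)
    counts = Counter(sample)
    return sum((c / n) * math.log2(n / c) for c in counts.values())


def looks_incompressible(data: bytes, sample_size: int = 65536,
                         threshold: float = 7.9) -> bool:
    """Heuristic preflight: a near-8-bit head sample suggests a raw/no-gain path."""
    return byte_entropy(data[:sample_size]) >= threshold


if __name__ == "__main__":
    print(looks_incompressible(os.urandom(1 << 20)))  # random bytes
    print(looks_incompressible(b"\x00" * (1 << 20)))   # all zeros
```

An order-0 estimate is cheap but imperfect: data with a flat byte histogram can still contain long repeats, and very small samples read lower than 8 bits/byte even when random. A production preflight would therefore need to be confirmed by measured artifact sizes, in line with the no-regression rule on this page.
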
This is the speed-accurate generated matrix. It is included because it catches harness bugs, micro-losses and speed pathologies. It is not a universal victory claim.
| File | Class | Raw | Algoland | Runtime (s) | xz | brotli | zstd | Best | Gap | Status |
|---|---|---|---|---|---|---|---|---|---|---|
| code_c_1048576.c | code | 1048576 | 214 | 0.867 | 348 | 62 | 159 | brotli | 152 | absorber open |
| code_py_1048576.py | code | 1048576 | 177 | 0.849 | 332 | 59 | 142 | brotli | 118 | absorber open |
| csv_1048576.csv | structured_text | 1048576 | 183 | 0.805 | 336 | 67 | 146 | brotli | 116 | absorber open |
| json_1048576.json | structured_text | 1048576 | 58 | 0.809 | 340 | 66 | 148 | Algoland | 0 | best or tied |
| log_1048576.log | log | 1048576 | 262 | 0.87 | 364 | 84 | 172 | brotli | 178 | absorber open |
| pattern_1048576.bin | binary | 1048576 | 363 | 0.854 | 828 | 707 | 684 | Algoland | 0 | best or tied |
| random_1048576.bin | binary | 1048576 | 1048584 | 0.068 | 1048696 | 1048584 | 1048613 | raw | 8 | absorber open |
| srt_1048576.srt | subtitle_text | 1048576 | 197 | 0.829 | 340 | 64 | 151 | brotli | 133 | absorber open |
| text_repeated_1048576.txt | text | 1048576 | 175 | 0.795 | 332 | 53 | 143 | brotli | 122 | absorber open |
| xml_1048576.xml | structured_text | 1048576 | 64 | 0.833 | 348 | 60 | 161 | brotli | 4 | absorber open |
| zero_1048576.bin | binary | 1048576 | 17 | 0.874 | 292 | 14 | 53 | brotli | 3 | absorber open |
| wav_tone_3s.wav | raw_audio | 264644 | 3282 | 1.079 | 4464 | 3953 | 4357 | Algoland | 0 | best or tied |
| code_c_262144.c | code | 262144 | 66 | 0.755 | 236 | 62 | 93 | brotli | 4 | absorber open |
| code_py_262144.py | code | 262144 | 50 | 0.778 | 224 | 59 | 76 | Algoland | 0 | best or tied |
| csv_262144.csv | structured_text | 262144 | 55 | 0.75 | 224 | 67 | 80 | Algoland | 0 | best or tied |
| json_262144.json | structured_text | 262144 | 58 | 0.779 | 228 | 66 | 82 | Algoland | 0 | best or tied |
| log_262144.log | log | 262144 | 80 | 0.789 | 252 | 84 | 106 | Algoland | 0 | best or tied |
| pattern_262144.bin | binary | 262144 | 362 | 0.824 | 716 | 707 | 618 | Algoland | 0 | best or tied |
| random_262144.bin | binary | 262144 | 262152 | 0.055 | 262224 | 262149 | 262163 | raw | 8 | absorber open |
| srt_262144.srt | subtitle_text | 262144 | 59 | 0.759 | 228 | 64 | 85 | Algoland | 0 | best or tied |
| text_repeated_262144.txt | text | 262144 | 52 | 0.802 | 220 | 53 | 77 | Algoland | 0 | best or tied |
| xml_262144.xml | structured_text | 262144 | 64 | 0.928 | 236 | 58 | 95 | brotli | 6 | absorber open |
| zero_262144.bin | binary | 262144 | 17 | 0.769 | 180 | 14 | 29 | brotli | 3 | absorber open |
| archive_mixed.zip | archive_or_compressed | 131710 | 131366 | 1.507 | 131488 | 131715 | 131307 | zstd | 59 | absorber open |
| wav_tone_1s.wav | raw_audio | 88244 | 3279 | 0.955 | 4436 | 3947 | 4333 | Algoland | 0 | best or tied |
| code_c_65536.c | code | 65536 | 66 | 0.826 | 204 | 62 | 80 | brotli | 4 | absorber open |
| code_py_65536.py | code | 65536 | 49 | 0.749 | 192 | 59 | 63 | Algoland | 0 | best or tied |
| csv_65536.csv | structured_text | 65536 | 55 | 0.776 | 196 | 67 | 67 | Algoland | 0 | best or tied |
| json_65536.json | structured_text | 65536 | 58 | 0.751 | 196 | 66 | 69 | Algoland | 0 | best or tied |
| log_65536.log | log | 65536 | 80 | 0.858 | 224 | 84 | 93 | Algoland | 0 | best or tied |
| pattern_65536.bin | binary | 65536 | 361 | 0.923 | 644 | 705 | 602 | Algoland | 0 | best or tied |
| random_65536.bin | binary | 65536 | 65544 | 0.05 | 65608 | 65541 | 65550 | raw | 8 | absorber open |
| srt_65536.srt | subtitle_text | 65536 | 58 | 0.765 | 196 | 64 | 72 | Algoland | 0 | best or tied |
| text_repeated_65536.txt | text | 65536 | 51 | 0.779 | 192 | 53 | 64 | Algoland | 0 | best or tied |
| xml_65536.xml | structured_text | 65536 | 64 | 0.74 | 208 | 59 | 77 | brotli | 5 | absorber open |
| zero_65536.bin | binary | 65536 | 16 | 0.795 | 148 | 13 | 23 | brotli | 3 | absorber open |
| code_c_16384.c | code | 16384 | 65 | 0.791 | 184 | 62 | 79 | brotli | 3 | absorber open |
| code_py_16384.py | code | 16384 | 49 | 0.734 | 168 | 59 | 62 | Algoland | 0 | best or tied |
| csv_16384.csv | structured_text | 16384 | 55 | 0.738 | 172 | 67 | 66 | Algoland | 0 | best or tied |
| json_16384.json | structured_text | 16384 | 57 | 0.744 | 172 | 65 | 68 | Algoland | 0 | best or tied |
| log_16384.log | log | 16384 | 79 | 0.764 | 200 | 84 | 93 | Algoland | 0 | best or tied |
| pattern_16384.bin | binary | 16384 | 338 | 0.795 | 432 | 361 | 381 | Algoland | 0 | best or tied |
| random_16384.bin | binary | 16384 | 16392 | 0.66 | 16452 | 16389 | 16398 | raw | 8 | absorber open |
| srt_16384.srt | subtitle_text | 16384 | 58 | 0.762 | 176 | 64 | 72 | Algoland | 0 | best or tied |
| text_repeated_16384.txt | text | 16384 | 51 | 0.739 | 168 | 53 | 63 | Algoland | 0 | best or tied |
| xml_16384.xml | structured_text | 16384 | 63 | 0.816 | 184 | 59 | 77 | brotli | 4 | absorber open |
| zero_16384.bin | binary | 16384 | 16 | 0.754 | 128 | 13 | 22 | brotli | 3 | absorber open |
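
The Best, Gap and Status columns follow mechanically from the byte counts, which is part of why the matrix is useful for catching harness bugs. Below is a minimal sketch of that derivation, assuming raw storage counts as a candidate (as the random rows suggest); `classify` and `open_counts` are names invented here, and only four rows from the matrix are included.

```python
def classify(raw: int, algoland: int, competitors: dict) -> tuple:
    """Return (best, gap, status) for one matrix row."""
    sizes = {"raw": raw, **competitors}
    best_name, best_size = min(sizes.items(), key=lambda kv: kv[1])
    if algoland <= best_size:
        return ("Algoland", 0, "best or tied")
    return (best_name, algoland - best_size, "absorber open")


def open_counts(rows: list) -> dict:
    """Count, per competitor, the rows where it is strictly smaller than Algoland."""
    counts = {}
    for _name, _raw, algoland, competitors in rows:
        for tool, size in competitors.items():
            counts[tool] = counts.get(tool, 0) + (size < algoland)
    return counts


# (file, raw, Algoland, competitor sizes) copied from four rows of the matrix.
ROWS = [
    ("code_c_1048576.c", 1048576, 214, {"xz": 348, "brotli": 62, "zstd": 159}),
    ("json_1048576.json", 1048576, 58, {"xz": 340, "brotli": 66, "zstd": 148}),
    ("random_1048576.bin", 1048576, 1048584,
     {"xz": 1048696, "brotli": 1048584, "zstd": 1048613}),
    ("archive_mixed.zip", 131710, 131366,
     {"xz": 131488, "brotli": 131715, "zstd": 131307}),
]

for name, raw, algoland, competitors in ROWS:
    print(name, classify(raw, algoland, competitors))
print(open_counts(ROWS))
```

Counting "competitor strictly smaller than Algoland" over all 47 rows is consistent with the 0 xz, 20 brotli and 7 zstd open counts reported for the R1 runtime, since the random rows count against brotli even though raw is the listed best.
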
This table publishes the current measured row ledger for Algoland Compressor: counted artifact size, xz size, brotli size, selected compression path, gap and SHA status.
Rows marked open are not hidden. They are active engineering targets. The rule is maximum margin: once a competitor is beaten on a filetype, the Algoland byte count keeps being pushed lower.
| File | Algoland | xz | brotli | Compression path | Gap vs xz | Gap vs brotli | SHA | Status |
|---|---|---|---|---|---|---|---|---|
| h_data.json | 56944 | 49048 | 53252 | sealed path | 7896 | 3692 | YES | open |
| g_vector_256k.svg | 28277 | 29996 | 28998 | sealed path | -1719 | -721 | YES | beats both |
| g_weights_small.safetensors | 11998 | 13480 | 14281 | sealed path | -1482 | -2283 | YES | beats both |
| h_object.o | 4272 | 4236 | 4393 | sealed path | 36 | -121 | YES | beats brotli |
| g_scene_256.tiff | 12909 | 11824 | 13219 | sealed path | 1085 | -310 | YES | beats brotli |
| g_scene_512.tiff | 46661 | 41208 | 47241 | sealed path | 5453 | -580 | YES | beats brotli |
| h_archive.zip | 1768276 | 1768172 | 1775036 | sealed path | 104 | -6760 | YES | beats brotli |
| g_table.parquet | 102871 | 102312 | 115962 | sealed path | 559 | -13091 | YES | beats brotli |
| g_tone.wav | 54669 | 73680 | 157111 | sealed path | -19011 | -102442 | YES | beats both |
| g_sales_1m.csv | 11441 | 99024 | 58648 | sealed path | -87583 | -47207 | YES | beats both |
| h_code.h | 125891 | 135196 | 122210 | sealed path | -9305 | 3681 | YES | beats xz |
| h_image.gif | 1709127 | 1731108 | 1702303 | sealed path | -21981 | 6824 | YES | beats xz |
| h_image.webp | 434715 | 437604 | 431801 | sealed path | -2889 | 2914 | YES | beats xz |
| h_archive.gz | 1444121 | 1447148 | 1441423 | sealed path | -3027 | 2698 | YES | beats xz |
| h_video.mp4 | 1672012 | 1687080 | 1669561 | sealed path | -15068 | 2451 | YES | beats xz |
| g_page_500k.html | 3303 | 16592 | 10305 | sealed path | -13289 | -7002 | YES | beats both |
| g_dump.sql | 859 | 5472 | 3592 | sealed path | -4613 | -2733 | YES | beats both |
| h_video_small.mp4 | 408768 | 413568 | 408451 | sealed path | -4800 | 317 | YES | beats xz |
| g_page_100k.html | 1719 | 4484 | 2823 | sealed path | -2765 | -1104 | YES | beats both |
| g_syn_text_1048576.txt | 184 | 404 | 124 | sealed path | -220 | 60 | YES | beats xz |
| g_enwik_10240.txt | 3247 | 3760 | 3024 | sealed path | -513 | 223 | YES | beats xz |
| h_office.xlsx | 19773 | 19888 | 19596 | sealed path | -115 | 177 | YES | beats xz |
| h_pdf_mid.pdf | 101242 | 102412 | 101074 | sealed path | -1170 | 168 | YES | beats xz |
| h_pdf_small.pdf | 12869 | 13100 | 12791 | sealed path | -231 | 78 | YES | beats xz |
| g_arrays.npz | 240281 | 241084 | 240228 | sealed path | -803 | 53 | YES | beats xz |
| g_page_5k.html | 445 | 608 | 396 | sealed path | -163 | 49 | YES | beats xz |
| g_records_10k.json | 819 | 1052 | 825 | sealed path | -233 | -6 | YES | beats both |
| g_catalog_10k.xml | 500 | 744 | 571 | sealed path | -244 | -71 | YES | beats both |
| g_sheet.ods | 4967 | 5068 | 4935 | sealed path | -101 | 32 | YES | beats xz |
| g_service_small.rb | 851 | 1084 | 825 | sealed path | -233 | 26 | YES | beats xz |
| g_document.odt | 4949 | 5048 | 4928 | sealed path | -99 | 21 | YES | beats xz |
| h_package.swift | 470 | 644 | 452 | sealed path | -174 | 18 | YES | beats xz |
| g_book.epub | 9602 | 9736 | 9585 | sealed path | -134 | 17 | YES | beats xz |
| g_app_10k.log | 660 | 920 | 766 | sealed path | -260 | -106 | YES | beats both |
| g_service_small.pl | 788 | 1024 | 777 | sealed path | -236 | 11 | YES | beats xz |
| h_data.xml | 527 | 676 | 516 | sealed path | -149 | 11 | YES | beats xz |
| h_build.lua | 457 | 604 | 449 | sealed path | -147 | 8 | YES | beats xz |
| h_db.sqlite | 512 | 636 | 504 | sealed path | -124 | 8 | YES | beats xz |
| h_code.js | 1235 | 1560 | 1236 | sealed path | -325 | -1 | YES | beats both |
| g_zero_8m.bin | 16 | 1364 | 14 | sealed path | -1348 | 2 | YES | beats xz |
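
Each ledger status can be rederived from three byte counts, where a gap is the Algoland size minus the competitor size and a negative gap means Algoland is smaller. The helper below is a sketch of that reading; `ledger_status` is a name chosen here, and treating an exact tie as "not beaten" is an assumption, since no tied row appears in the ledger.

```python
def ledger_status(algoland: int, xz: int, brotli: int) -> tuple:
    """Return (gap_vs_xz, gap_vs_brotli, status) for one ledger row."""
    gap_xz = algoland - xz
    gap_brotli = algoland - brotli
    beats_xz = gap_xz < 0          # assumption: a tie does not count as a win
    beats_brotli = gap_brotli < 0
    if beats_xz and beats_brotli:
        status = "beats both"
    elif beats_xz:
        status = "beats xz"
    elif beats_brotli:
        status = "beats brotli"
    else:
        status = "open"
    return gap_xz, gap_brotli, status


# Byte counts copied from four ledger rows.
print(ledger_status(56944, 49048, 53252))    # h_data.json
print(ledger_status(4272, 4236, 4393))       # h_object.o
print(ledger_status(125891, 135196, 122210)) # h_code.h
print(ledger_status(54669, 73680, 157111))   # g_tone.wav
```

Applied to the whole ledger, positive gaps occur on 6 rows against xz and 24 rows against brotli, which matches the open counts quoted for the active target set.
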