Parametric Matching for Improved Data Compression

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Modern general-purpose compressors handle a wide variety of files, but they do not achieve high compression ratios on files that contain short sequences of delimiters interleaved with numeric data or, more generally, on interleaved data in which each sequence is poorly correlated with the preceding bytes. We demonstrate Parametric Matching (PM), which vastly improves the compression of various structured languages, including PDF, SVG, and G-code files. By de-interleaving and coalescing delimiters and storing data as delta-encoded, discretized binary, compression by a factor of 10 or more is possible. A Python prototype converts files to a binary representation, which is then compressed with the Lempel-Ziv-Markov chain algorithm (LZMA) to store the binary tokens efficiently in a minimal number of bits. Table 1 shows a ratio of 6 for PDF files containing only text, which are first parsed and then recompressed using PM. For SVG, we demonstrate a factor of 8 to 10 for files including a randomized spiral and a US county map. For G-code, we compressed the Statue of Liberty, demonstrating that a high degree of compression can be achieved even when the layers differ. All run times are under 250 ms, even in our Python prototype.
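The pipeline described in the abstract (de-interleave delimiters from numbers, discretize, delta-encode, pack as binary, then apply LZMA) can be illustrated with a minimal Python sketch. This is not the authors' prototype: the function names `split_streams`, `pm_encode`, and `pm_decode`, the number regex, the NUL separator for the delimiter stream, the 32-bit packing, and the `stride` parameter (delta against the value two positions back, so that x is compared with x and y with y) are all assumptions made for illustration. The sketch leaves delimiter coalescing to LZMA, and it round-trips exactly only when every number in the input has exactly `decimals` fractional digits.

```python
import lzma
import re
import struct

# Assumed token pattern: plain decimal numbers only (no exponents, no ".5").
NUMBER = re.compile(r"-?\d+(?:\.\d+)?")


def split_streams(text):
    """De-interleave text into a delimiter stream and a number stream."""
    delimiters = NUMBER.split(text)   # text before, between, and after numbers
    numbers = NUMBER.findall(text)    # the numeric tokens, in order
    return delimiters, numbers


def pm_encode(text, decimals=2, stride=2):
    """Illustrative encoder: fixed-point, strided delta, binary pack, LZMA."""
    delimiters, numbers = split_streams(text)
    scale = 10 ** decimals
    # Discretize each number to a fixed-point integer.
    ints = [round(float(n) * scale) for n in numbers]
    # Delta against the value `stride` positions earlier (0 for the first few).
    deltas = [v - (ints[i - stride] if i >= stride else 0)
              for i, v in enumerate(ints)]
    header = struct.pack("<I", len(deltas))
    # Signed 32-bit integers; struct.error is raised if a delta does not fit.
    body = struct.pack(f"<{len(deltas)}i", *deltas)
    # Assumes the input text contains no NUL characters.
    tail = "\0".join(delimiters).encode("utf-8")
    return lzma.compress(header + body + tail)


def pm_decode(blob, decimals=2, stride=2):
    """Inverse of pm_encode for inputs with exactly `decimals` fraction digits."""
    raw = lzma.decompress(blob)
    (count,) = struct.unpack_from("<I", raw, 0)
    deltas = struct.unpack_from(f"<{count}i", raw, 4)
    delimiters = raw[4 + 4 * count:].decode("utf-8").split("\0")
    scale = 10 ** decimals
    ints = []
    for i, d in enumerate(deltas):
        ints.append(d + (ints[i - stride] if i >= stride else 0))
    # Re-interleave: delimiter, number, delimiter, number, ..., delimiter.
    parts = [delimiters[0]]
    for v, delim in zip(ints, delimiters[1:]):
        parts.append(f"{v / scale:.{decimals}f}")
        parts.append(delim)
    return "".join(parts)


if __name__ == "__main__":
    import math

    # Hypothetical test input: an SVG-like path tracing a smooth spiral.
    pts = [(500 + 0.2 * i * math.cos(0.05 * i),
            500 + 0.2 * i * math.sin(0.05 * i)) for i in range(2000)]
    text = "M " + " L ".join(f"{x:.2f} {y:.2f}" for x, y in pts)
    blob = pm_encode(text)
    assert pm_decode(blob) == text
    # Sizes: raw text, LZMA on the raw text, and the sketch's output.
    print(len(text), len(lzma.compress(text.encode("utf-8"))), len(blob))
```

Running the script prints three sizes for comparison on a synthetic spiral. The numbers it produces say nothing about the results reported in the paper, which come from the authors' own implementation and test files.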

Original language: English
Title of host publication: Proceedings - DCC 2025
Subtitle of host publication: 2025 Data Compression Conference
Editors: Ali Bilgin, James E. Fowler, Joan Serra-Sagrista, Yan Ye, James A. Storer
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 383
Number of pages: 1
ISBN (Electronic): 9798331534714
DOIs
State: Published - 2025
Event: 2025 Data Compression Conference, DCC 2025 - Snowbird, United States
Duration: 18 Mar 2025 - 21 Mar 2025

Publication series

Name: Data Compression Conference Proceedings
ISSN (Print): 1068-0314

Conference

Conference: 2025 Data Compression Conference, DCC 2025
Country/Territory: United States
City: Snowbird
Period: 18/03/25 - 21/03/25

Keywords

  • compression
  • lzma
  • parametric matching
