Increasing the Huffman generation code algorithm to equalize compression ratio and time in lossless 16-bit data archiving

Tonny Hidayat, Mohd Hafiz Zakaria, Ahmad Naim Che Pee
DOI: https://doi.org/10.1007/s11042-022-14130-1
IF: 2.577
2022-12-04
Multimedia Tools and Applications
Abstract: Compression is routinely carried out when digitizing data and is considered very important, especially with the development and growth of the Big Data era. Lossless compression reduces the size of the data on the condition that the original can be restored exactly during decompression. One purpose of lossless compression is file archiving; the files are usually RAW and large, holding at least 16-bit data (65,536 possible values). Huffman's algorithm, whose extensions can be grouped into Static, Dynamic, and Adaptive variants, is still very effective at compressing 8-bit data, but its performance is uncertain when it is applied to data with complex variables and probabilities, such as WAV-format audio. Based on a literature review, compression performance for file archiving is measured with the Compression Ratio (CR) and Compression Time (CT) indicators. This research produced a new scheme, which we named 4-ary/MQ, whose architecture is based on entropy coding rooted in the static, dynamic, and adaptive variants of the Huffman scheme. For variable code lengths, it follows Quad Tree dynamic branching (FGK rule); the node symbol setting adopts an adaptive method, namely adding at most 2 variables with a value of '0' so that every branch from the root onward always has 4 branches. Based on descriptive analysis of compression results, deviation, average, ANOVA, and DMRT, 4-ary/MQ produces optimal CR with fast CT when compared with various variants of the Huffman algorithm and with other lossless compression applications such as PKZIP, WinZip, 7-Zip, and Monkey's Audio.
The trial analysis, based on manual mathematical and statistical calculations, shows that 4-ary/MQ provides high compression with a very fast process, which makes it beneficial for compressing data held on local storage media or hosting/cloud services and for saving bandwidth.
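To make the padding rule concrete, the following is a minimal sketch of a plain static 4-ary Huffman code over 16-bit sample values. It is not the authors' 4-ary/MQ scheme: the function names are hypothetical, the dynamic (FGK) and adaptive parts are left out, and CR is assumed here to mean original bits divided by coded bits, ignoring the cost of storing the code table. It only illustrates why at most 2 zero-weight entries are enough to keep every branch node at exactly 4 children.

```python
import heapq
from collections import Counter
from itertools import count


def quaternary_huffman_lengths(freqs):
    """Return {symbol: code length in base-4 digits} for a static 4-ary Huffman code.

    Hypothetical helper for illustration; not taken from the paper.
    """
    if len(freqs) == 1:
        # A single symbol still needs one digit to be representable.
        return {next(iter(freqs)): 1}
    tie = count()  # unique tie-breaker so the heap never compares symbol lists
    heap = [(weight, next(tie), [symbol]) for symbol, weight in freqs.items()]
    # Pad with zero-weight dummies until (n - 1) is a multiple of 3, so that
    # every merge consumes exactly 4 nodes. At most 2 dummies are ever needed.
    while (len(heap) - 1) % 3 != 0:
        heap.append((0, next(tie), []))
    heapq.heapify(heap)
    lengths = {symbol: 0 for symbol in freqs}
    while len(heap) > 1:
        merged_weight, merged_symbols = 0, []
        for _ in range(4):
            weight, _tie, symbols = heapq.heappop(heap)
            merged_weight += weight
            merged_symbols.extend(symbols)
        # Every real symbol under the new node moves one level deeper.
        for symbol in merged_symbols:
            lengths[symbol] += 1
        heapq.heappush(heap, (merged_weight, next(tie), merged_symbols))
    return lengths


def estimate_compression_ratio(samples, bits_per_sample=16):
    """Assumed CR definition: original bits / coded bits (code table ignored)."""
    freqs = Counter(samples)
    lengths = quaternary_huffman_lengths(freqs)
    # Each base-4 digit costs 2 bits.
    coded_bits = sum(2 * lengths[s] * n for s, n in freqs.items())
    return (bits_per_sample * len(samples)) / coded_bits
```

Calling `estimate_compression_ratio` on a list of integer sample values (for example, 16-bit PCM samples read from a WAV file) gives a rough upper bound on what a static 4-ary code could achieve on that data; a real archiver would also have to store the code table and would measure CT by timing the encode step.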
computer science, information systems, theory & methods, engineering, electrical & electronic, software engineering