Adaptive Huffman Algorithm for Data Compression Using Text Clustering and Multiple Character Modification


  • Babita Kumari, Research Scholar, P.G. Department of Mathematics & Computer Science, Magadh University, Bodh Gaya, Bihar, India
  • Neeraj Kumar Kamal, Assistant Professor, Department of Physics, Anugrah Memorial College, Gaya, Bihar, India
  • Arif Mohammad Sattar, Assistant Professor, Department of Computer Science & Information Technology, Anugrah Memorial College, Gaya, Bihar, India
  • Mritunjay Kr. Ranjan, Assistant Professor, School of Computer Sciences and Engineering, Sandip University, Nashik, Maharashtra, India



Keywords: Adaptive Huffman Algorithm, Data Compression, Text Clustering, Multiple Character Modification.


The Adaptive Huffman algorithm is a popular data compression technique that assigns a variable-length binary code to each symbol in a message and updates those codes as the message is processed. However, the original algorithm can be inefficient on text data, particularly on long runs of repeated characters. In this study, we propose a novel approach that improves the compression ratio of the Adaptive Huffman algorithm by combining text clustering with multiple character modification. The proposed method first clusters the text into groups of similar words or phrases. It then modifies multiple characters in each group to reduce redundancy and increase the frequency of the most common characters. This modification lets the Adaptive Huffman algorithm assign shorter codes to the modified characters and compress the clustered text effectively. Experimental results on a benchmark dataset show that the proposed method achieves better compression ratios than the traditional Adaptive Huffman algorithm and other state-of-the-art compression methods. The proposed method can be applied to various kinds of text data, such as documents, emails, and chat messages, and can significantly reduce storage and transmission costs.
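The abstract builds on adaptive Huffman coding, in which the encoder and decoder start from the same model and update their symbol counts after every symbol, so no code table has to be transmitted. The sketch below is a minimal illustration under stated assumptions. It is not the authors' implementation, and it does not cover the clustering or character-modification steps. It is a deliberately naive variant that rebuilds a Huffman code from the running counts before each byte, rather than the incremental FGK or Vitter tree update usually meant by "adaptive Huffman". The escape symbol `NYT`, its fixed weight of 1, the 8-bit literal sent on a first occurrence, and the function names `huffman_codes`, `encode`, and `decode` are all hypothetical choices made for this example.

```python
import heapq

NYT = None  # hypothetical escape symbol ("not yet transmitted"), fixed weight of 1


def huffman_codes(counts):
    """Build a prefix code {symbol: bitstring} from a {symbol: weight} dict.

    Symbols are visited in a fixed order and ties are broken by a unique
    index, so the encoder and decoder always derive identical codes.
    """
    order = sorted(counts, key=lambda s: -1 if s is None else s)
    heap = [(counts[s], i, ("leaf", s)) for i, s in enumerate(order)]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (w1 + w2, next_id, ("node", left, right)))
        next_id += 1

    codes = {}

    def walk(tree, prefix):
        if tree[0] == "leaf":
            codes[tree[1]] = prefix or "0"  # a lone symbol still needs one bit
        else:
            walk(tree[1], prefix + "0")
            walk(tree[2], prefix + "1")

    walk(heap[0][2], "")
    return codes


def encode(data: bytes) -> str:
    """Encode bytes into a '0'/'1' string, adapting the code after every symbol."""
    counts = {NYT: 1}
    out = []
    for b in data:
        codes = huffman_codes(counts)  # naive: rebuild from the running counts
        if b in counts:
            out.append(codes[b])
            counts[b] += 1
        else:
            # First occurrence: send the escape code, then the raw 8-bit byte.
            out.append(codes[NYT] + format(b, "08b"))
            counts[b] = 1
    return "".join(out)


def decode(bits: str) -> bytes:
    """Invert encode() by mirroring the encoder's count updates."""
    counts = {NYT: 1}
    out = bytearray()
    pos = 0
    while pos < len(bits):
        lookup = {code: sym for sym, code in huffman_codes(counts).items()}
        code = ""
        while code not in lookup:  # the code is prefix-free, so the first match is correct
            code += bits[pos]
            pos += 1
        sym = lookup[code]
        if sym is NYT:
            sym = int(bits[pos:pos + 8], 2)
            pos += 8
            counts[sym] = 1
        else:
            counts[sym] += 1
        out.append(sym)
    return bytes(out)


if __name__ == "__main__":
    sample = b"abracadabra " * 20
    bits = encode(sample)
    assert decode(bits) == sample
    print(f"{len(sample) * 8} bits -> {len(bits)} bits")
```

As a usage example, `encode(b"a" * 1000)` produces 1,008 bits: 9 bits for the first `a` (a 1-bit escape plus its 8-bit literal) and then exactly one bit for each of the remaining 999. Because a Huffman code never spends less than one bit per symbol, no matter how skewed the counts become, long runs of a single character remain comparatively expensive, which is the weakness the abstract points to. Rebuilding the code at every step costs O(k log k) per symbol for k distinct symbols. The FGK and Vitter algorithms avoid this by updating the tree incrementally.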

