Java 中的霍夫曼代码

霍夫曼编码是一种创建节点二叉树的数据压缩算法。该节点可以是内部节点或叶节点。

本教程详细描述并演示了使用 Java 的 Huffman 代码。

在 Java 中演示使用 Huffman 编码算法

霍夫曼编码算法的思想是根据相应字符的频率为输入字符分配可变长度代码。

这些代码被称为前缀代码,因为赋予每个字符的代码是唯一的,这有助于霍夫曼编码的解码没有任何歧义。

我们可以使用 Java 中的优先级队列构建 Huffman 树,其中优先级最高的节点频率最低。我们将按照下面给出的步骤进行。

  • 首先,为给定文本的每个字符创建一个叶节点,并将节点添加到优先级队列中。
  • 如果队列中有多个节点,则从该队列中删除频率最低且优先级最高的两个节点。
  • 现在,创建一个新节点,其中包含之前删除的两个子节点,新节点的频率将等于两个节点的频率之和。然后将该节点添加到优先级队列中。
  • 最后剩下的节点就是根节点,树就完成了。

让我们看一个将文本转换为霍夫曼编码的 Java 示例。

主类 Huffman.java

import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
class Huffman
{
    // Huffman Tree Traversing and storing the Huffman Codes in a dictionary.
    public static void encode_huffman(Huffman_Node root_node, String str,
                        Map<Character, String> huffman_Code)
    {
        if (root_node == null) {
            return;
        }
        // if the root node is a leaf node
        if (is_Leaf(root_node)) {
            huffman_Code.put(root_node.charac, str.length() > 0 ? str : "1");
        }
        encode_huffman(root_node.left, str + '0', huffman_Code);
        encode_huffman(root_node.right, str + '1', huffman_Code);
    }
    // Huffman Tree Traversing and decoding the encoded string
    public static int decode_huffman(Huffman_Node root_node, int index, StringBuilder sb)
    {
        if (root_node == null) {
            return index;
        }
        // if the root node is a leaf node
        if (is_Leaf(root_node))
        {
            System.out.print(root_node.charac);
            return index;
        }
        index++;
        root_node = (sb.charAt(index) == '0') ? root_node.left : root_node.right;
        index = decode_huffman(root_node, index, sb);
        return index;
    }
    // This function checks if Huffman Tree contains only one single node
    public static boolean is_Leaf(Huffman_Node root_node) {
        return root_node.left == null && root_node.right == null;
    }
    // Main Huffman tree build function
    public static void Main_Build_HuffmanTree(String text)
    {
        // Base case: empty string
        if (text == null || text.length() == 0) {
            return;
        }
        // Calculate the frequency of each character and store it in a map of dict
        Map<Character, Integer> frequency = new HashMap<>();
        for (char c: text.toCharArray()) {
            frequency.put(c, frequency.getOrDefault(c, 0) + 1);
        }
        // priority queue to store nodes of the Huffman tree
        // the highest priority item has the lowest frequency
        PriorityQueue<Huffman_Node> prio_queue;
        prio_queue = new PriorityQueue<>(Comparator.comparingInt(l -> l.frequency));
        // leaf node for each character, adding it to the priority queue.
        for (var entry: frequency.entrySet()) {
            prio_queue.add(new Huffman_Node(entry.getKey(), entry.getValue()));
        }
        //repeat the process till there is more than one node in the queue
        while (prio_queue.size() != 1)
        {
            // Then remove the two nodes with the highest priority and lowest frequency
            Huffman_Node left = prio_queue.poll();
            Huffman_Node right = prio_queue.poll();
            // Now create a new internal node with two children nodes, and the frequency will be the some of both nodes; add the new node to the priority queue.
            int sum = left.frequency + right.frequency;
            prio_queue.add(new Huffman_Node(null, sum, left, right));
        }
        Huffman_Node root_node = prio_queue.peek();
        // Huffman tree Traversing and storing the Huffman codes in a dict or map
        Map<Character, String> huffmanCode = new HashMap<>();
        encode_huffman(root_node, "", huffmanCode);
        // Display the Huffman codes
        System.out.println("The Huffman Codes for the given text are: " + huffmanCode);
        System.out.println("The original text is: " + text);
        // display the encoded string
        StringBuilder sb = new StringBuilder();
        for (char c: text.toCharArray()) {
            sb.append(huffmanCode.get(c));
        }
        System.out.println("The encoded text is: " + sb);
        System.out.print("The decoded text is: ");
        if (is_Leaf(root_node))
        {
            // For input like a, aa, aaa, etc.
            while (root_node.frequency-- > 0) {
                System.out.print(root_node.charac);
            }
        }
        else {
            // Huffman Tree traversing with decoding the encoded string
            int index = -1;
            while (index < sb.length() - 1) {
                index = decode_huffman(root_node, index, sb);
            }
        }
    }
    // Call the Huffman code
    public static void main(String[] args)
    {
        String text = "This is delftstack";
        Main_Build_HuffmanTree(text);
    }
}

节点类 Huffman_Node.java

import java.util.Comparator;
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;
// A Tree node
class Huffman_Node
{
    Character charac;
    Integer frequency;
    Huffman_Node left = null, right = null;
    Huffman_Node(Character charac, Integer frequency)
    {
        this.charac = charac;
        this.frequency = frequency;
    }
    public Huffman_Node(Character charac, Integer frequency, Huffman_Node left, Huffman_Node right)
    {
        this.charac = charac;
        this.frequency = frequency;
        this.left = left;
        this.right = right;
    }
}

第一类是执行霍夫曼编码算法操作的主要类,第二类是用于创建节点的类。该代码将为给定文本、编码文本生成霍夫曼代码,并对其进行解码。

输出:

The Huffman Codes for the given text are: { =010, a=11100, c=1010, d=11101, e=1000, f=11011, H=0110, h=10010, i=1111, k=11010, l=000, m=01110, .=01111, o=1100, s=001, T=10011, t=1011}
The original text is: Hello This is delftstack.com
The encoded text is: 011010000000001100010100111001011110010101111001010111011000000110111011001101111100101011010011111010110001110
The decoded text is: Hello This is delftstack.com

我们可以看到,给定的文本包含 25 个字符,即 25×8 = 200 位,而编码的字符串只有 111 位,几乎 45% 的数据压缩。这种数据压缩是霍夫曼编码的主要目的。