Huffman codes are a very effective and widely used technique for compressing data. The Huffman encoding problem is that of finding the minimum-length bit string that can be used to encode a string of symbols. Given a table of how often each character occurs, the algorithm assigns each character an optimal binary code: frequent characters get short codes and rare characters get longer ones. The codes are read off a binary tree that is built with a simple heap-based priority queue. Each leaf of the tree is labeled with a character and its frequency of occurrence, and each internal node is labeled with the sum of the weights (frequencies) of the leaves in its subtree. The Huffman encoding scheme is an example of a greedy algorithm: at every step it merges the two subtrees with the smallest weights.
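As a worked illustration with a made-up frequency table, suppose six characters a, b, c, d, e and f occur 5, 9, 12, 13, 16 and 45 times. Repeatedly merging the two lightest subtrees gives the merges below, and each character's code length is its depth in the final tree (f: 1 bit; c, d, e: 3 bits; a, b: 4 bits):

$$
\begin{aligned}
\text{merges: } & 5+9=14,\quad 12+13=25,\quad 14+16=30,\quad 25+30=55,\quad 45+55=100\\
\text{Huffman cost} &= 45\cdot 1 + (12+13+16)\cdot 3 + (5+9)\cdot 4 = 45 + 123 + 56 = 224 \text{ bits}\\
&= 14 + 25 + 30 + 55 + 100 \quad \text{(sum of the internal node weights)}\\
\text{fixed 3-bit code} &= (5+9+12+13+16+45)\cdot 3 = 300 \text{ bits}
\end{aligned}
$$

The second form of the total holds because every internal node adds one bit to the code of each leaf beneath it.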
Rough notes about the algorithm and how it is implemented in the code above: The heap is declared globally so that it does not need to be passed as an argument to every function. It is a min-heap ordered by frequency. The heap's node structure, Node, has four fields: ch of type char, freq of type int, *left (a pointer to a Node, the root of the node's left subtree) and *right (a pointer to a Node, the root of its right subtree). Given a list of characters along with their frequencies, the goal is to choose an encoding of the characters such that the total length of the encoded message is as small as possible. First the heap is initialized: heapSize = 0, heap[0] = (Node *)malloc(sizeof(Node)), and heap[0] -> freq = -INT_MAX (the negative of INT_MAX, the largest signed int). heap[0] is a sentinel rather than a real element: because its frequency is smaller than any real frequency, it stops Insert from moving an element above the root.
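A minimal sketch of the Node structure and the initialization step described above, assuming the heap is a global array of Node pointers with entries at indices 1..heapSize and the sentinel in heap[0]. The fixed capacity HEAP_CAPACITY and the int return value of Init are assumptions of this sketch, not details taken from the original code.

```c
#include <limits.h>
#include <stdlib.h>

/* Huffman tree node; the heap holds pointers to these. */
typedef struct Node {
    char ch;             /* the symbol (0 for internal nodes) */
    int freq;            /* frequency, or sum of the leaf frequencies below */
    struct Node *left;   /* left subtree (edge labelled 0) */
    struct Node *right;  /* right subtree (edge labelled 1) */
} Node;

#define HEAP_CAPACITY 100          /* assumed upper bound on heap entries */
Node *heap[HEAP_CAPACITY + 1];     /* min-heap by freq, entries at 1..heapSize */
int heapSize;

/* Reset the heap and install the sentinel in heap[0].
   Returns 0 on success, -1 if the allocation fails. */
int Init(void) {
    heapSize = 0;
    heap[0] = (Node *)malloc(sizeof(Node));
    if (heap[0] == NULL) return -1;
    heap[0]->ch = 0;
    heap[0]->freq = -INT_MAX;      /* below every real frequency */
    heap[0]->left = heap[0]->right = NULL;
    return 0;
}
```

Keeping the sentinel at index 0 means the parent of index 1 (1/2 == 0 in integer division) always compares as smaller, so the percolate-up loop in Insert needs no separate bounds check.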
Read each character and its frequency. Store the values in a temporary Node, temp, and set its left and right subtrees to NULL:

    temp -> ch = ch
    temp -> freq = freq
    temp -> left = temp -> right = NULL

Insert temp into the heap using the Insert function. In the special case where there is only one distinct character, print "Character code of the character is 0", since a tree with a single node has no edges from which to read a code.
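The input step can be sketched as below. MakeLeaf and ReadLeaves are hypothetical helper names, and the input format (one "character frequency" pair per entry, such as "a 5") is an assumption; the original program may read its input differently.

```c
#include <stdio.h>
#include <stdlib.h>

typedef struct Node {
    char ch;
    int freq;
    struct Node *left, *right;
} Node;

/* Build a leaf: store the symbol and its frequency, with no subtrees. */
Node *MakeLeaf(char ch, int freq) {
    Node *temp = (Node *)malloc(sizeof(Node));
    if (temp == NULL) return NULL;
    temp->ch = ch;
    temp->freq = freq;
    temp->left = temp->right = NULL;
    return temp;
}

/* Read up to max "character frequency" pairs from in, creating one leaf
   per pair. Returns the number of leaves created. */
int ReadLeaves(FILE *in, Node **leaves, int max) {
    int count = 0;
    char ch;
    int freq;
    /* " %c" skips any whitespace (including newlines) before the symbol. */
    while (count < max && fscanf(in, " %c %d", &ch, &freq) == 2) {
        Node *leaf = MakeLeaf(ch, freq);
        if (leaf == NULL) break;
        leaves[count++] = leaf;
    }
    return count;
}
```

In the full program each leaf would then be passed to Insert; if ReadLeaves returns 1, the single-character special case applies and that character simply gets the code 0.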
Then, distinct_char - 1 times, remove the two nodes with the smallest frequencies from the heap with DeleteMin, make them the left and right subtrees of a new temporary node, set the temporary node's frequency to the sum of the frequencies of its left and right subtrees, and insert the temporary node back into the heap:

    for iter = 0 to distinct_char - 2
        Node * left = DeleteMin()
        Node * right = DeleteMin()
        Node * temp = (Node *) malloc(sizeof(Node))
        temp -> ch = 0
        temp -> left = left
        temp -> right = right
        temp -> freq = left -> freq + right -> freq
        Insert(temp)

After the loop exactly one node is left in the heap: the root of the Huffman tree. Remove it with Node *tree = DeleteMin(). Declare a character array code and initialize it to the empty string (code[0] = '\0', the NUL terminator), then print the final tree using the print function.
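The merging loop can be sketched in C as follows. To keep the block self-contained, the heap is replaced here by a linear scan over an array of live nodes: TakeMin is a hypothetical helper standing in for DeleteMin, and appending to the array stands in for Insert. This costs O(n) per removal instead of the heap's O(log n), but the greedy merging itself is the same. BuildTree is also a hypothetical name.

```c
#include <stdlib.h>

typedef struct Node {
    char ch;
    int freq;
    struct Node *left, *right;
} Node;

/* Remove and return the node with the smallest freq from nodes[0..*count-1].
   A linear scan stands in for the heap's DeleteMin in this sketch. */
static Node *TakeMin(Node **nodes, int *count) {
    int best = 0;
    for (int i = 1; i < *count; i++)
        if (nodes[i]->freq < nodes[best]->freq) best = i;
    Node *min = nodes[best];
    nodes[best] = nodes[*count - 1];   /* fill the hole with the last entry */
    (*count)--;
    return min;
}

/* Merge the two lightest nodes distinct_char - 1 times; the survivor is the
   root. `nodes` holds the leaves on entry and is reused as scratch space. */
Node *BuildTree(Node **nodes, int distinct_char) {
    int count = distinct_char;
    for (int iter = 0; iter <= distinct_char - 2; iter++) {
        Node *left = TakeMin(nodes, &count);
        Node *right = TakeMin(nodes, &count);
        Node *temp = (Node *)malloc(sizeof(Node));
        if (temp == NULL) return NULL;
        temp->ch = 0;                   /* internal nodes carry no symbol */
        temp->left = left;
        temp->right = right;
        temp->freq = left->freq + right->freq;
        nodes[count++] = temp;          /* stands in for Insert(temp) */
    }
    return count == 1 ? nodes[0] : NULL;
}
```

For leaves with frequencies 5, 9, 12, 13, 16 and 45, the returned root has frequency 100, and the leaf with frequency 45 becomes the root's left child (its right child has frequency 55), so that character gets a 1-bit code.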
Insert function: it takes the element to be inserted into the heap as its argument.
• heapSize is increased by 1 and the element is placed at the last position: heapSize++, heap[heapSize] = element.
• The element's position is then adjusted so that the heap property holds: it moves up past every parent whose frequency is larger than its own. Store heapSize in a temporary variable now (the index currently being filled). While heap[now/2] -> freq > element -> freq:
 o heap[now] = heap[now/2], i.e. move the parent (at index now/2) down into index now;
 o now = now/2, to move one level up the tree.
• Once the correct index has been found, store the element there: heap[now] = element. The sentinel in heap[0] guarantees that the loop stops at the root.
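A minimal sketch of Insert as described above, assuming a fixed-capacity global array with a statically allocated sentinel in heap[0] (the original allocates it with malloc). The capacity check and the int return value are additions of this sketch.

```c
#include <limits.h>
#include <stddef.h>

typedef struct Node {
    char ch;
    int freq;
    struct Node *left, *right;
} Node;

#define HEAP_CAPACITY 100                      /* assumed fixed capacity */
Node sentinel = {0, -INT_MAX, NULL, NULL};     /* smaller than any real freq */
Node *heap[HEAP_CAPACITY + 1] = {&sentinel};   /* heap[0] is the sentinel */
int heapSize = 0;

/* Add element to the min-heap, keeping every parent's freq <= its children's.
   Returns 0 on success, -1 if the heap is full. */
int Insert(Node *element) {
    if (heapSize == HEAP_CAPACITY) return -1;
    heapSize++;
    heap[heapSize] = element;                  /* place at the last position */
    int now = heapSize;
    /* Move larger parents down one level; the sentinel stops the loop at now == 1. */
    while (heap[now / 2]->freq > element->freq) {
        heap[now] = heap[now / 2];
        now /= 2;
    }
    heap[now] = element;
    return 0;
}
```

Shifting parents down and writing the element once at the end does the same work as repeated swaps but with fewer assignments.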
DeleteMin function: heap[1] holds the minimum element, so it is removed and returned, and the size of the heap is decreased by 1. The last element of the heap is then tentatively placed at heap[1] and checked to see whether it fits there. It fits if its frequency is no greater than that of the smaller of its two children; if it does not fit, the smaller child is moved up into its place and the check repeats one level further down. In other words, the last element percolates down, swapping with the smaller child as necessary.

print function: it takes a pointer to a tree Node, temp, and a pointer to the code array.
• If temp -> left and temp -> right are both NULL, temp is a leaf of the tree, so print its character and its code, and return from the function.
• Otherwise, initialize an integer variable length to the length of the string in code.
• Declare two arrays, leftcode and rightcode, to hold the codes for the left and right subtrees, and copy code into both.
• Append '0' followed by the terminating '\0' to leftcode, and '1' followed by '\0' to rightcode.
• Recursively call print on the left subtree of temp with leftcode and on the right subtree with rightcode.
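Both functions can be sketched as below, assuming the same 1-indexed global heap as in the notes. The name PrintCodes, the FILE *out parameter (so the output can go to stdout or elsewhere) and the 64-byte code buffers are assumptions of this sketch; codes that would not fit are skipped.

```c
#include <stdio.h>
#include <string.h>

typedef struct Node {
    char ch;
    int freq;
    struct Node *left, *right;
} Node;

#define HEAP_CAPACITY 100
Node *heap[HEAP_CAPACITY + 1];   /* entries at 1..heapSize; heap[0] unused here */
int heapSize = 0;

/* Remove and return heap[1], the node with the smallest freq (NULL if empty). */
Node *DeleteMin(void) {
    if (heapSize == 0) return NULL;
    Node *minElement = heap[1];
    Node *lastElement = heap[heapSize--];
    int now, child;
    /* Percolate down: move the smaller child up until lastElement fits. */
    for (now = 1; now * 2 <= heapSize; now = child) {
        child = now * 2;
        if (child != heapSize && heap[child + 1]->freq < heap[child]->freq)
            child++;                          /* pick the smaller child */
        if (lastElement->freq > heap[child]->freq)
            heap[now] = heap[child];
        else
            break;                            /* lastElement fits at now */
    }
    heap[now] = lastElement;
    return minElement;
}

/* Print "symbol code" for every leaf under temp: a left edge appends '0',
   a right edge appends '1'. */
void PrintCodes(FILE *out, Node *temp, const char *code) {
    if (temp->left == NULL && temp->right == NULL) {
        fprintf(out, "%c %s\n", temp->ch, code);
        return;
    }
    size_t length = strlen(code);
    char leftcode[64], rightcode[64];
    if (length + 2 > sizeof leftcode) return; /* code too long for buffers */
    strcpy(leftcode, code);
    strcpy(rightcode, code);
    leftcode[length] = '0';  leftcode[length + 1] = '\0';
    rightcode[length] = '1'; rightcode[length + 1] = '\0';
    PrintCodes(out, temp->left, leftcode);
    PrintCodes(out, temp->right, rightcode);
}
```

In the full program the root returned by the final DeleteMin would be printed with PrintCodes(stdout, tree, ""), starting from the empty code string.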