
### Huffman Encoding

Huffman coding is a very effective and widely used technique for compressing data. The Huffman encoding problem is that of finding a minimum-length bit string that encodes a given string of symbols. The algorithm uses a table of how often each character occurs to assign each character a binary string, so that frequent characters get short codes and rare characters get long ones. The result is optimal among prefix codes that assign one code word to each character. The algorithm builds a binary tree bottom-up with a simple heap-based priority queue. Each leaf of the tree is labeled with a character and its frequency of occurrence. Each internal node is labeled with the sum of the weights of the leaves in its subtree. The Huffman encoding scheme is an example of a greedy algorithm.
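To make "minimum length" concrete, the sketch below computes how many bits a message needs, given each symbol's frequency and the length of its code word. The helper name `encoded_bits` is an assumption introduced for illustration and is not part of the program on this page.

```c
#include <assert.h>
#include <stddef.h>

/* Total bits needed to encode a message: the sum over all symbols of
   (occurrences of the symbol) * (length of its code word).
   encoded_bits is a hypothetical helper, introduced for illustration. */
long encoded_bits(const int *freq, const int *code_len, size_t n)
{
    long total = 0;
    for (size_t i = 0; i < n; i++) {
        assert(freq[i] >= 0 && code_len[i] > 0);
        total += (long)freq[i] * code_len[i];
    }
    return total;
}
```

For six symbols with frequencies 45, 13, 12, 16, 9 and 5, a fixed 3-bit code gives 300 bits, while code lengths of 1, 3, 3, 3, 4 and 4 give 224 bits.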

### Analysis

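The running time of Huffman's algorithm on a set of n characters is O(n log n). Building the heap takes n insertions, and each of the n - 1 merge steps performs two DeleteMin operations and one Insert, each of which costs O(log n).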

## Huffman Encoding - C Program Source code for generating Huffman Codes

```c
#include <limits.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Sizes chosen for this listing: a char has at most 256 distinct values,
   so the heap never holds more than 256 nodes and no code exceeds 255 bits. */
#define MAX_SYMBOLS 256
#define MAX_CODE    256

typedef struct node
{
        char ch;
        int freq;
        struct node *left;
        struct node *right;
} node;

/* Declaring heap globally so that we do not need to pass it as an argument every time */
/* Heap implemented here is a Min Heap of node pointers, ordered by freq */
node *heap[MAX_SYMBOLS + 1];
int heapSize;

/* Initialize Heap */
void Init(void)
{
        heapSize = 0;
        /* heap[0] is a sentinel whose frequency is smaller than any real one */
        heap[0] = (node *)malloc(sizeof(node));
        heap[0]->freq = -INT_MAX;
}

/* Insert an element into the heap */
void Insert(node *element)
{
        heapSize++;
        heap[heapSize] = element; /* Insert in the last place */
        /* Adjust its position */
        int now = heapSize;
        while (heap[now / 2]->freq > element->freq)
        {
                heap[now] = heap[now / 2];
                now /= 2;
        }
        heap[now] = element;
}

node *DeleteMin(void)
{
        /* heap[1] is the minimum element. So we remove heap[1]. Size of the heap is decreased.
           Now heap[1] has to be filled. We put the last element in its place and see if it fits.
           If it does not fit, take the minimum element among both its children and replace the
           parent with it. Again see if the last element fits in that place. */
        node *minElement, *lastElement;
        int child, now;
        minElement = heap[1];
        lastElement = heap[heapSize--];
        /* now refers to the index at which we are now */
        for (now = 1; now * 2 <= heapSize; now = child)
        {
                /* child is the index of the element which is minimum among both the children */
                /* Indexes of children are now*2 and now*2 + 1 */
                child = now * 2;
                /* child != heapSize because heap[heapSize+1] does not exist, which means
                   the node has only one child */
                if (child != heapSize && heap[child + 1]->freq < heap[child]->freq)
                {
                        child++;
                }
                /* To check if the last element fits or not it suffices to check if the last
                   element is less than the minimum element among both the children */
                if (lastElement->freq > heap[child]->freq)
                {
                        heap[now] = heap[child];
                }
                else /* It fits there */
                {
                        break;
                }
        }
        heap[now] = lastElement;
        return minElement;
}

/* Print the code of every leaf below temp; code holds the path from the root so far */
void print(node *temp, char *code)
{
        if (temp->left == NULL && temp->right == NULL)
        {
                printf("char %c code %s\n", temp->ch, code);
                return;
        }
        int length = (int)strlen(code);
        char leftcode[MAX_CODE], rightcode[MAX_CODE];
        strcpy(leftcode, code);
        strcpy(rightcode, code);
        leftcode[length] = '0';
        leftcode[length + 1] = '\0';
        rightcode[length] = '1';
        rightcode[length + 1] = '\0';
        print(temp->left, leftcode);
        print(temp->right, rightcode);
}

/* Given the list of characters along with their frequencies, our goal is to find the encoding of
   the characters such that the total length of the message when encoded becomes minimum */
int main(void)
{
        Init();
        int distinct_char;
        if (scanf("%d", &distinct_char) != 1 || distinct_char < 1 || distinct_char > MAX_SYMBOLS)
        {
                fprintf(stderr, "expected a symbol count between 1 and %d\n", MAX_SYMBOLS);
                return 1;
        }
        char ch = 0;
        int freq;
        int iter;
        for (iter = 0; iter < distinct_char; iter++)
        {
                char t[16];
                /* Scanning the character as a string to avoid formatting issues of input */
                if (scanf("%15s %d", t, &freq) != 2)
                {
                        fprintf(stderr, "expected a character followed by its frequency\n");
                        return 1;
                }
                ch = t[0];
                node *temp = (node *)malloc(sizeof(node));
                temp->ch = ch;
                temp->freq = freq;
                temp->left = temp->right = NULL;
                Insert(temp);
        }
        /* Special Case */
        if (distinct_char == 1)
        {
                printf("char %c code 0\n", ch);
                return 0;
        }
        for (iter = 0; iter < distinct_char - 1; iter++)
        {
                node *left = DeleteMin();
                node *right = DeleteMin();
                node *temp = (node *)malloc(sizeof(node));
                temp->ch = 0;
                temp->left = left;
                temp->right = right;
                temp->freq = left->freq + right->freq;
                Insert(temp);
        }
        node *tree = DeleteMin();
        char code[MAX_CODE];
        code[0] = '\0';
        print(tree, code);
        return 0;
}
```

Rough notes about the algorithm and how it is implemented in the code above:

The heap is declared globally so that it does not need to be passed as an argument every time. It is a min-heap. Each heap entry points to a `node` structure with four fields: `ch` (the character), `freq` (an integer frequency), `left` (pointer to the left subtree) and `right` (pointer to the right subtree).

Given the list of characters along with their frequencies, the goal is to find an encoding of the characters such that the total length of the encoded message is minimum.

First, `Init` sets `heapSize = 0` and allocates a sentinel node at index 0 of the heap whose frequency is `-INT_MAX`, the negation of the largest `int` value. Because no real frequency is smaller, the upward scan in `Insert` always stops at the root without a separate bounds check.
Read each character (scanned as a string) and its frequency. Store the values in a temporary node and initialize its left and right subtrees to NULL:

```
temp -> ch = ch
temp -> freq = freq
temp -> left = temp -> right = NULL
```

Insert the temporary node into the heap using the `Insert` function. In the special case where there is only one character, print that the code of that character is 0.
Repeat the following `distinct_char - 1` times: remove the two minimum elements from the heap with `DeleteMin`, make them the left and right subtrees of a new temporary node, set the frequency of that node to the sum of their frequencies, and insert it back into the heap. The removed elements are leaves at first and merged subtrees later on.

```
For iter = 0 to distinct_char - 2
    node * left = DeleteMin()
    node * right = DeleteMin()
    node * temp = (node *) malloc(sizeof(node))
    temp -> ch = 0
    temp -> left = left
    temp -> right = right
    temp -> freq = left -> freq + right -> freq
    Insert(temp)
```

After the loop a single node remains in the heap: the root of the Huffman tree. Remove it with `node *tree = DeleteMin()`. Declare a character array `code`, initialize it to the empty string (its first element is the null character `'\0'`), and print the final tree with the `print` function.
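The greedy merge step can also be looked at without the heap. The following is a minimal sketch, assuming plain weights in an array and a linear scan for the two smallest values, so it runs in O(n^2) rather than O(n log n). The name `huffman_cost` is hypothetical. It relies on the fact that the sum of all merged weights equals the total length of the encoded message in bits.

```c
#include <assert.h>
#include <stddef.h>

/* Greedy Huffman merge without a heap: repeatedly remove the two smallest
   weights and put their sum back. The array w is modified in place.
   huffman_cost is a hypothetical name used only in this sketch. */
long huffman_cost(long *w, size_t n)
{
    long cost = 0;
    assert(n >= 1);
    while (n > 1) {
        /* Find the indices of the smallest (a) and second smallest (b) weights. */
        size_t a = 0, b = 1;
        if (w[b] < w[a]) { a = 1; b = 0; }
        for (size_t i = 2; i < n; i++) {
            if (w[i] < w[a]) { b = a; a = i; }
            else if (w[i] < w[b]) { b = i; }
        }
        long merged = w[a] + w[b];
        cost += merged;
        /* Keep the merged weight in slot a and fill slot b with the last
           element, then shrink the array by one. */
        w[a] = merged;
        w[b] = w[n - 1];
        n--;
    }
    return cost;
}
```

For the weights 45, 13, 12, 16, 9 and 5 the merges produce 14, 25, 30, 55 and 100, which add up to 224 bits. A single weight needs no merge, so the function returns 0 in that case, whereas the program above handles one character separately by assigning it the code 0.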
Insert function: takes the element to be inserted into the heap as an argument.

- `heapSize` is increased by 1 and the element is placed in the last position: `heapSize++`, then `heap[heapSize] = element`.
- The position of the element is then adjusted so that the heap property is maintained. This is done by comparing it with its parent and moving the parent down until the element is no smaller than its parent. Store `heapSize` in a temporary variable `now` (the index currently being examined). While the frequency of `heap[now/2]` is greater than that of `element`:
  - `heap[now] = heap[now/2]`, i.e. replace the entry at index `now` with its parent (index `now/2`).
  - Divide `now` by 2 to move up one level.
- When the right index has been found, store the element there: `heap[now] = element`.
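The sentinel trick is easier to see on plain integers. The sketch below is not the program's heap: it assumes `int` keys, a fixed `CAPACITY`, and the hypothetical names `heap_init`, `heap_insert` and `heap_delete_min`. It also includes the matching percolate-down from `DeleteMin`, so that the heap can be drained and the order checked.

```c
#include <assert.h>
#include <limits.h>

#define CAPACITY 64          /* assumed fixed capacity for this sketch */

static int h[CAPACITY + 1];  /* h[0] is the sentinel, keys live in h[1..size] */
static int size;

void heap_init(void)
{
    size = 0;
    h[0] = INT_MIN;          /* no key is smaller, so sift-up stops at the root */
}

void heap_insert(int key)
{
    assert(size < CAPACITY);
    int now = ++size;
    /* Move parents down while they are larger than the new key. */
    while (h[now / 2] > key) {
        h[now] = h[now / 2];
        now /= 2;
    }
    h[now] = key;
}

int heap_delete_min(void)
{
    assert(size > 0);
    int min = h[1];
    int last = h[size--];
    int now, child;
    /* Percolate the hole at the root down until the last key fits. */
    for (now = 1; now * 2 <= size; now = child) {
        child = now * 2;
        if (child != size && h[child + 1] < h[child])
            child++;             /* pick the smaller of the two children */
        if (last > h[child])
            h[now] = h[child];
        else
            break;
    }
    h[now] = last;
    return min;
}
```

Inserting keys in any order and then calling `heap_delete_min` repeatedly returns them in ascending order. Without the sentinel, the loop in `heap_insert` would need an extra `now > 1` test.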
DeleteMin function: the root of the heap (index 1) is the minimum element, so it is removed and the size of the heap is decreased. The hole at the root is then filled with the last element of the heap, if it fits there. If it does not fit, the smaller of the two children is moved up into the hole, and the check is repeated at that child's old position. In other words, the hole percolates down, swapping with the minimum child as necessary. To check whether the last element fits, it suffices to compare its frequency with that of the smaller child: if the last element is not larger, we are done.

print function: takes a pointer to a tree node as `temp` and a pointer to the `code` array.

- If `temp->left` and `temp->right` are both NULL, this is a leaf of the tree. Print the character and its code, and return from the function.
- Otherwise, set the integer variable `length` to the length of the string in `code`.
- Declare two arrays, `leftcode` and `rightcode`, to store the codes of the left and right subtrees, and copy `code` into both.
- Append `0` and a terminating null character to `leftcode`, and `1` and a terminating null character to `rightcode`.
- Recurse into the left subtree with `leftcode` and into the right subtree with `rightcode`.
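As a variation on the traversal described above, the codes can be stored in a lookup table instead of being printed, using one shared buffer rather than two copies per call. This is a sketch under assumptions: the reduced `tnode` type, the `collect_codes` name and the `MAX_CODE` limit are all introduced here for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_CODE 256   /* assumed upper bound on code length plus terminator */

/* Reduced tree node for this sketch: only the fields the traversal needs. */
typedef struct tnode {
    char ch;
    struct tnode *left, *right;
} tnode;

/* Walk the tree, writing '0' for a left edge and '1' for a right edge.
   buf[0..depth-1] always holds the path from the root to the current node;
   position depth is overwritten before each recursive call. */
void collect_codes(const tnode *t, char *buf, size_t depth,
                   char table[][MAX_CODE])
{
    if (t->left == NULL && t->right == NULL) {
        buf[depth] = '\0';
        strcpy(table[(unsigned char)t->ch], buf);  /* store the finished code */
        return;
    }
    assert(depth + 1 < MAX_CODE);
    buf[depth] = '0';
    collect_codes(t->left, buf, depth + 1, table);
    buf[depth] = '1';
    collect_codes(t->right, buf, depth + 1, table);
}
```

Called with a `char table[256][MAX_CODE]`, a scratch buffer of `MAX_CODE` bytes and `depth` 0, it fills `table[c]` with the code of each character `c` in the tree. For a tree whose root has leaf `a` on the left and an internal node with leaves `b` and `c` on the right, the stored codes are `0`, `10` and `11`.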
