Data Structure Review

Data Structure Review for Final Exam#

Author:itsbrqs

Preface#

This note is a review summary specifically for the content of the 2023 Data Structure final exam. The timeliness and accuracy of the exam points are not guaranteed, compiled and organized by itsbrqs. This note is created by itsbrqs to prepare for the final exam.

Knowledge Points and Exam Points#

Summary of Exam Points#

Chapter 1 Introduction:
- Abstract Data Types
- Time Complexity
Chapter 2 Linear Lists:
- Sequential List
- Backward Singly Linked List
Chapter 3 Stacks and Queues
- Structural Characteristics of Stacks and Queues
- Basic Operations of Stacks and Queues
Chapter 5 Arrays and Generalized Lists
- Address Calculation Formula
- Storage Structure of Triples
Chapter 6 Trees and Binary Trees
- Properties of Binary Trees
- Traversal Algorithms of Binary Trees and Their Applications
- Traversing Binary Trees and Threaded Binary Trees
- Conversion Between Trees and Forests
- Traversal Algorithms of Trees and Forests (how to write sequences)
Chapter 7 Graphs
- Properties of Graphs
- Storage Structures of Graphs
  - Adjacency Matrix
  - Adjacency List
- Graph Traversal
  - Writing Traversal Sequences
  - Depth-First Search and Breadth-First Search
- Minimum Spanning Tree
- Applications of Graphs
  - Topological Sorting
  - Critical Path
  - Shortest Path

Question Types#

5 Multiple Choice Questions + 5 Fill-in-the-Blank Questions + 4 Application Questions + 2 Algorithm Writing Questions

Chapter 1 Introduction#

Knowledge Point 1 Definition of Data Structure#

Definition of the Discipline: Data structure is a discipline that studies the computational objects of computer programming problems related to non-numeric calculations and their relationships and operations.
Definition of Data Structure: Data elements and their interrelations.
Simple Classification of Data Structures: Logical Structure and Storage Structure.

Knowledge Point 2 Four Basic Logical Structures#

Set: There are no other relationships between data elements except for the relationship of "belonging to the same set."
Linear Structure: There is a one-to-one relationship between data elements.
Tree Structure: There is a one-to-many relationship between data elements.
Graph Structure: There is a many-to-many relationship between data elements.

Among them, set structure, tree structure, and graph structure are all non-linear structures.

Non-linear Structures: Tree, Binary Tree, Directed Graph, Undirected Graph, Set

Linear Structures: Linear List, Stack and Queue, Array, Generalized List

Knowledge Point 3 Data#

Definition: In the field of computer science, data refers to the totality of symbols that can be input into a computer and processed by it, which are the "raw materials" for computer programs. It includes numbers, characters, images, sounds, etc.
Operations: Input, Output, Sorting, Searching, Inserting, Deleting.

Knowledge Point 4 Classification of Data Structures#

Logical Structure: The logical relationships and interconnections between data elements.
Physical Structure (Storage Structure): The representation of data structures in a computer. Storage structures are further divided into Sequential Storage Structures and Linked Storage Structures.

Knowledge Point 5 Data Storage Structure#

Definition: The storage representation of data objects in a computer is called the data storage structure, also known as the physical structure.
Data elements in a computer have two basic storage structures: sequential storage structure and linked storage structure.

Sequential Storage Structure

Definition: The sequential storage structure represents the logical relationships between data elements by the relative positions of the elements in memory, usually described using the array type of programming languages.
Adjacent storage, using the relative positions of elements in memory to represent the logical relationships between data elements.
Advantages: Occupies less storage space, high storage density. Disadvantages: Fragmentation occurs, and moving elements is required during insertion or deletion.

Linked Storage Structure#

The sequential storage structure requires all elements to be stored in a contiguous block of memory, while the linked storage structure does not need to occupy a whole block of memory. However, to represent the relationships between nodes, each node must have additional pointer fields to store the addresses of successor elements, so linked storage structures are usually described using pointer types in programming languages.
Non-adjacent storage, using pointers that store the addresses of elements to represent the logical relationships between data elements.
Advantages: High space utilization, simple operations. Disadvantages: Occupies more storage space, low data density.

The relationship between the two structures: The logical structure and physical structure of data are two inseparable aspects; the design of an algorithm depends on the selected logical structure, while the implementation of the algorithm relies on the adopted storage structure.

Knowledge Point 6 Abstract Data Types#

Abstract Data Types generally refer to user-defined mathematical models that represent application problems, along with a set of operations defined on this model, specifically including three parts: data objects, a set of relationships on data objects, and a set of basic operations on data objects.

Knowledge Point 7 Algorithms#

Algorithms are a finite sequence of operations specified to solve a certain type of problem.

Five characteristics that an algorithm must satisfy:

Finiteness
Definiteness: For every situation, the operations to be performed are clearly specified in the algorithm, avoiding ambiguity, allowing the executor or reader of the algorithm to understand its meaning and how to execute it.
Effectiveness
Input
Output

Knowledge Point 8 Evaluation of Algorithms#

Correctness
Readability
Robustness
Efficiency: Efficiency includes both time and space aspects; time efficiency refers to the rational design of the algorithm and high execution efficiency, which can be measured by time complexity; space efficiency refers to the rational occupation of storage capacity by the algorithm, which can be measured by space complexity. Time and space complexity are the two main indicators for measuring algorithms.

Knowledge Point 9 Time Complexity#

The main methods for measuring algorithm efficiency are divided into two categories: post-statistical methods and pre-estimation methods. The post-statistical method requires the algorithm to be implemented first, and then the time and space overhead are measured. The shortcomings of this method are obvious: first, the algorithm must be converted into an executable program, and second, the measurement results of time and space overhead depend on environmental factors such as the software and hardware of the computer, which can easily obscure the advantages and disadvantages of the algorithm itself. Therefore, the commonly adopted method is pre-analysis estimation, which measures the efficiency of the algorithm by calculating its asymptotic complexity.

Calculation of Algorithm Time Complexity#

To find the deepest loop count, only the highest order needs to be calculated.

Example of Algorithm Time Complexity Analysis#

Constant Order O(1)
```
{x ++ ;}
```

Linear Order O(n)

for( int i = 0 ; i < n ; i ++ ) 
    s ++ ;

Square Order O(n^2)

# Bubble sort is also square order
for (i = 0; i < size-1;i ++)//size-1 is because it does not need to compare with itself, so there is one less number to compare
	{
		int count = 0;
		for (j = 0; j < size-1 - i; j++)	//size-1-i is because each pass will have one less number to compare
		{
			if (arr[j] > arr[j+1])// This is the ascending order method, comparing the previous number with the next number, if the previous number is larger, swap it with the next number
			{
				tem = arr[j];
				arr[j] = arr[j+1];
				arr[j+1] = tem;
				count = 1;
			}
		}
		if (count == 0)			// If no positions were swapped in a certain pass, it means it is already sorted, exit the loop directly
				break;	
	}

Chapter 2 Linear Lists#

Linear List: A finite sequence composed of n (n≥0) elements with the same data characteristics.

Storage structures of linear lists: sequential storage structure and linked storage structure.

Basic operations of data structures: modification, insertion, deletion, searching, sorting.

Knowledge Point 10 Sequential Storage Dynamic Allocation Linear List#

Advantages and disadvantages of sequential list storage structure:

Advantages: Logical adjacency, physical adjacency; can randomly access any element; compact use of storage space;

Disadvantages: Insertion and deletion operations require moving a large number of elements;

Storage Structure#

typedef struct{
	ElemType *elem;// First address
     int length;// Current length
     int listsize;// Current occupied storage capacity
}

Initialization#

/* Algorithm Idea
*1. Space allocation
*2. Check if allocation is successful
*3. Initialize other parameters
*/
status Initlist_sq(Sqlist &L){
    // Allocate space
    L.elem=(ElemType)* malloc(L * sizeof(ElemType));
   // Check if allocation is successful 
    if(!L.elem) retrun -2;
   // Initialize other parameters
    L.length=0;
    L.listszie=0;
    return 0;
}

Insertion#

/* Algorithm Idea
*1. Check the validity of i
*2. Check for overflow and reallocate space if necessary
*3. Check if reallocation is successful
*4. Take the element at the insertion position and shift the whole array back
*5. Increase the length of the list by 1
*/
status ListInsert_sq(Sqlist* L,int i,int e){
    int* newbase,q,p;
    // Check if i is valid and will not go out of bounds
    if(i<1||i>(L->length+1))
    	return ERROR;
	// Allocate new memory for storage when length exceeds limit
	if(L->length>=L->listsize){
		newbase=(int*)realloc(L->elem,(L->listsize+LISTINCREMENT)*
		sizeof(int));
     // Check if storage allocation failed
     if(!newbase)
		return (OVERFLOW);
	L->elem=newbase;
	L->listsize+=LISTINCREMENT;
	}
     q=&(L->elem[i-1]);// q is the insertion position
    if(L->length>0){
		for(p=&(L->elem[(L->length)-1]); p>=q; --p){
		*(p+1)=*p;
        }
    }
	*q=e;
	++(L->length);
	return OK;
};

Insertion

q=&(L->elem[i-1]);// q is the insertion position
    if(L->length>0){// Check if it is empty
		for(p=&(L->elem[(L->length)-1]); p>=q; --p){
		*(p+1)=*p;// Shift elements back
        }
    }
*q=e;// Insert element

Deletion#

/* Algorithm Idea
*1. Check the validity of i
*2. Take the position of the element to be deleted
*3. Assign the deleted element to e
*4. Shift the elements forward from the deleted element's position
*5. Decrease the length of the list by 1
*/
// Core code: Delete and shift
q=&(L->elem[i-1]);// Take the position of the deleted element
*e=*p;// Assign the deleted element to e
q=L->elem+L->length-1;// End of the list
for(++p;p<=q;++p)
	*(p-1)=*p;// Shift forward

Knowledge Point 11 Linked Storage Backward Singly Linked Linear List#

Characteristics of Singly Linked List:

It is a dynamic structure, the entire storage space is shared by multiple linked lists; does not require pre-allocation of space; pointers occupy additional storage space; cannot be accessed randomly, search speed is slow.

Storage Structure#

typedef struct LNode{
	Elemtype data;// Data field
	struct LNode *next;// Pointer field
}

Initialization#

Status InitList_L(LNode *L){
    L->next=NULL;// It is very important to set the next field of the head node to NULL
    return OK;
};

Since C++ reference parameters are not used and the structure pointer is passed in, no space allocation is needed.

Insertion#

/* Algorithm Idea
*1. Take the previous node of the insertion position
*2. Check the value of i
*3. Generate the insertion node
*4. Disconnect the old node and insert the new node
*/
Status ListInsert_L(LNode *L,int i,Elemtype e){
    int j=0;
    LNode *p;
    p=L;
    while(p&&j<i-1){
    p=p->next;
    j++;
    }
    if(!p||j>i-1){
    return ERROR;// Check i value, negative or greater than list length + 1
    }
    
    LNode *s;
    s=(LinkLIst) malloc(sizeof(LNode));// Space allocation!!! Generate insertion node
    s->data=e;
    // Start insertion
    s->next=p->next;// Must be done for middle insertion
    p->next=s;
    return OK;
};

The diagram is as follows:

Deletion#

/* Algorithm Idea
*1. Take the previous node of the deletion position
*2. Check the value of i
*3. Generate the deletion node
*4. Disconnect the old node and link to the next node
*5. Return the deleted value and free the deletion node
*/
Status ListDelete_L(LNode *L,int i,Elemtype *e){
    int j=0;
    LNode *p;
    p=L;
    
    while(p&&j<i-1){
        p=p->next;
        j++;
    }
    if(!(p->next)||j>i-1){
        return ERROR;// Check i value, negative or greater than list length + 1
    }
    
    LNode *q;
    q=(LinkLIst) malloc(sizeof(LNode));// Space allocation!!!
    
    q=p->next;// Get the deleted node
    p->next=q->next;// Bypass the deleted node to achieve deletion
    
    *e=q->data;
    free(q);
    return OK;
};

The diagram is as follows:

Chapter 3 Stacks and Queues#

From the perspective of data structures, stacks and queues are also linear lists, and their operations are a subset of linear list operations, belonging to restricted linear lists. However, from the perspective of data types, they are important abstract data types that are quite different from linear lists.

Knowledge Point 12 Structure, Definition, and Operations of Stacks#

Definition: A linear list that restricts insertion or deletion operations only at the tail.

Structure:

typedef struct stack{ 
    int stacksize; // Capacity of the stack 
    struct stack *head; // Pointer to the top of the stack 
    struct stack *base; // Pointer to the bottom of the stack
}

Operations:

Push push mnemonic: Stack pointer top "push first then add": S[top++]=an+1

Pop pop mnemonic: Stack pointer top "subtract first then pop": e=S[--top]

// Insertion Push
Status Push(SqStack *S,SElemType e){
    if(S->top-S->base>S->stacksize){
        S->base=(SElemType *)realloc(S->base,(S->stacksize+STACKINCREMENT)*sizeof(SElemType));// Stack full, allocate additional space
        if(!S->base)return -2;// Allocation failed
        S->top=S->base+S->stacksize;// Move the top pointer of the stack
        S->stacksize+=STACKINCREMENT;
    }
    *(S->top)++=e;// After insertion, the top pointer of the stack moves up, this is actually two operations
}
// Deletion Pop
Status Pop(SqStack *S,SElemType *e){
    if(S->base==S->top){
        printf("error: SqStack NULL");
        return 0;// Stack empty
    }
    *e=*(S->top-1);
    S->top--;
    return 1;
}

Characteristics: Last In First Out (LIFO), when there are no elements in the list, it is called an empty stack.

Knowledge Point 13: Structure Definition and Operations of Queues#

Definition: Insertion is only allowed at one end (the front of the queue), while deletion is performed at the other end (the rear of the queue). First In First Out (FIFO).

Structure:

typedef struct QNode{
    QElemType data;
    struct QNode *next;   
}QNode,*QueuePtr;

typedef struct 
{
    QueuePtr front;// Pointer to the front of the queue
    QueuePtr rear;// Pointer to the rear of the queue
    int len;
}LinkQueue;

Operations:

Enqueue EnQueue (insert at the rear): rear->next=S; rear=S;

Dequeue DEQueue (delete from the front): front->next=p->next;

// Insertion EnQueue Deletion DEQueue
Status EnQueue(LinkQueue *Q,QElemType e){
    QNode *p=(QueuePtr)malloc (sizeof(QNode));
    p->data=e;
    p->next=NULL;
    if(Q->front==NULL){
        Q->front=Q->rear=p;
    }else{
        Q->rear->next=p;
        Q->rear=p;
    }
    Q->len++;
    return 1;
    }
Status DeQueue(LinkQueue *Q,QElemType *e){
    int isEmpty=QueueEmpty(Q);
    if(!isEmpty){
    QNode *p=Q->front;
    Q->front=p->next;
    printf("info: Queue delete head elem %d success\n",p->data);
    free(p);
    return 1;
    }{
    printf("error:Queue Empty \n");
    return 0;
    }
    Q->len--;
}

Characteristics: First In First Out (FIFO), the characteristic of an empty linked queue: front=rear. The condition for an empty linked queue is that the front and rear pointers are equal, while the condition for a full circular queue can be determined by two methods: the rear plus one equals the front and setting a flag.

Knowledge Point 14: Similarities and Differences Between Linear Lists, Stacks, and Queues:#

Similarities:

The logical structure is the same; they are all linear;
They can be stored using sequential storage or linked storage;
Stacks and queues are two special linear lists, i.e., restricted linear lists (only restrictions on insertion and deletion operations).

Differences:

Different operational rules:
- Linear lists allow random access; stacks only allow insertion and deletion operations at one end, thus they are Last In First Out (LIFO);
- Queues only allow insertion at one end and deletion at the other end, thus they are First In First Out (FIFO).
Different uses: linear lists are more general; stacks are used for function calls, recursion, and simplifying designs; queues are used for discrete event simulation, OS job scheduling, and simplifying designs.

Chapter 5 Arrays and Generalized Lists#

Knowledge Point 15 Array Address Calculation Formula#

20190201170604387

ji represents the coordinate of the element in the i-th dimension of the n-dimensional array
ai represents the starting coordinate of the i-th dimension of the n-dimensional array
bi represents the length of the i-th dimension
L represents the number of bytes occupied by an element

Example: The address of the first element of a sequential list is 100, and each element has a length of 2, then the address of the fifth element is 108.

In one dimension, the above formula is loc(j1)=loc(a1)+L(j1-a1)*

loc=100+2*(0+(5-1))=108

Example: In a two-dimensional array, if each element has a length of 4 bytes, the row index i ranges from 1 to 8, and the column index ranges from 1 to 10, starting from the base address SA and stored continuously in row order, then the starting address of the element A[8][6] is SA+300.

Formula? No need for a dog, just draw it out and count which one, A[8][6] is the 76th element.

loc=SA+4*(76-1)=SA+300

Starting from the base address SA and stored continuously in column order, then the starting address of the element A[8][6] is SA+168.

By column count, A[8][6] is 48 elements.

In summary: For a two-dimensional array A[X][Y], if the number of row elements is a and the number of column elements is b, and the address of the first element is A, and the length of the element is s, the address in row order is A+s*((X-1)*b+Y-1), and in column order it is A+s*((Y-1)*a+X-1).

Knowledge Point 16 Storage Structure of Triples#

typedef struct
{
    int i,j;
    ElemType e;
}Triple
# C language description of the storage structure of triples, sparse matrix
#define MAXSIZE 125000
typedef struct
{
Triple data[MAXSIZE+1];// The triple table contained in the matrix, data[0] is unused
int mu,nu,t;// mu is the number of rows, nu is the number of columns, t is the number of non-zero elements
}TSMatrix

Chapter 6 Trees and Binary Trees#

Knowledge Point 17 Trees#

Concept and Terminology of Trees
- Tree: A finite set of n (n≥0) nodes. When n=0, it is called an empty tree; any non-empty tree satisfies the following conditions:
  - There is exactly one specific node called the root;
  - When n>1, the remaining nodes, except for the root node, are divided into m (m>0) mutually disjoint finite sets T1, T2, …, Tm, where each set is also a tree and is called a subtree of this root node.
- Degree of a node: The number of subtrees owned by a node.
  
  Degree of a tree: The maximum degree of all nodes in the tree.
- Leaf node: A node with a degree of 0, also called a terminal node.
  
  Branch node: A node with a degree not equal to 0, also called a non-terminal node.
- Children, parents: The root node of a subtree of a certain node in the tree is called the child node of that node, and that node is called the parent node of its child nodes;
  
  Siblings: Child nodes that have the same parent are called siblings.
- Path: If the sequence of nodes in the tree is n1, n2, …, nk, and the relationship is such that node ni is the parent of ni+1 (1<=i<k), then n1, n2, …, nk is called a path from n1 to nk; the number of edges traversed on the path is called the path length.
- Ancestors and descendants: In a tree, if there is a path from node x to node y, then x is called an ancestor of y, and y is called a descendant of x.
- The layer number of a node: The layer number of the root node is 1; for any other node, if a certain node is at layer k, then its child nodes are at layer k+1. The depth of a tree: The maximum layer number of all nodes in the tree, also called height.
- Level-order numbering: The nodes in the tree are numbered sequentially from 1, starting from the upper layer to the lower layer, and from left to right within the same layer.
- Ordered tree, unordered tree: If the subtrees of each node in a tree are ordered from left to right, the tree is called an ordered tree; otherwise, it is called an unordered tree. Generally, the trees discussed in data structures are ordered trees.
- Trees usually have three traversal methods: pre-order (root) traversal, post-order (root) traversal, and level-order (sequence) traversal (trees, not binary trees, do not have in-order traversal).

About the height and depth of trees

The depth of a tree is the number of edges traversed from the root node to that node, so the height equals depth plus 1.

Knowledge Point 18 Definition and Properties of Binary Trees#

Definition of Binary Trees#

A binary tree is a finite set of n (n≥0) nodes, which is either an empty set (called an empty binary tree) or consists of a root node and two mutually disjoint binary trees called the left subtree and right subtree of the root node.

Full Binary Tree: In a binary tree, if all branch nodes have both left and right subtrees, and all leaves are on the same level.

(Characteristics of a full binary tree: Leaves can only appear at the lowest level; there are only nodes of degree 0 and degree 2.)

Complete Binary Tree: For a binary tree with n nodes, if the nodes are numbered in level order, the position of the node with index i (1≤i≤n) is exactly the same as that of the node with index i in a full binary tree of the same depth.

Characteristics of a complete binary tree:

In a full binary tree, starting from the last node, any number of nodes can be continuously removed, resulting in a complete binary tree.
Leaf nodes can only appear in the last two layers, and the leaf nodes in the last layer are all concentrated on the left side of the binary tree.
In a complete binary tree, if there is a node of degree 1, there can only be one, and that node can only have a left child.
A complete binary tree of depth k must be a full binary tree at level k-1.
A full binary tree must be a complete binary tree, but a complete binary tree is not necessarily a full binary tree; in other words, a full binary tree is stricter.

Properties of Binary Trees#

Property 1: The maximum number of nodes at the i-th level of a binary tree is 2^(i-1) (i≥1).
Property 2: In a binary tree of depth k, the maximum number of nodes is 2^k-1, and the minimum number of nodes is k. A binary tree of depth k with 2^k-1 nodes must be a full binary tree.
Property 3: In a binary tree, if the number of leaf nodes is n0 and the number of nodes of degree 2 is n2, then: n0 = n2 + 1. (The degree of a node refers to the number of edges it emits.)
Property 4: The depth of a complete binary tree with n nodes is log2n + 1.
Property 5: For a complete binary tree with n nodes, numbered in level order starting from 1, for any node with index i (1≤i≤n), the following holds:
- (1) If i > 1, the index of the parent node of node i is i/2; if i = 1, node i is the root node and has no parent node.
- (2) If 2i ≤ n, the index of the left child of node i is 2i; if 2i > n, node i has no left child.
- (3) If 2i + 1 ≤ n, the index of the right child of node i is 2i + 1; if 2i + 1 > n, node i has no right child.

Knowledge Point 19 Traversal of Binary Trees#

Pre-order traversal means that for any node in the tree, the node is printed first, then its left subtree is printed, and finally its right subtree is printed.
In-order traversal means that for any node in the tree, its left subtree is printed first, then the node itself, and finally its right subtree is printed.
Post-order traversal means that for any node in the tree, its left subtree is printed first, then its right subtree, and finally the node itself.

Level-order traversal is performed by traversing each level from left to right.

Knowledge Point 20 Storage of Binary Trees#

/*------ Binary Tree's Binary Linked List Storage Structure ------*/
typedef struct BiTNode
{
    TElemType data;
    struct BiTNode* left;
    struct BiTNode* right;
}BTNode,*BiTree;

Knowledge Point 21 Threaded Binary Trees and Storage#

When creating a binary tree, we find that there are two pointer fields, some of which point to empty structures, which wastes a lot of space. Therefore, to avoid wasting this space, we take a measure: use those empty addresses to store the addresses of the predecessor and successor nodes in a certain traversal order. The pointers that point to the predecessors and successors are called threads, and the corresponding binary linked list is called a threaded linked list, and the corresponding binary tree is called a threaded binary tree.

The following diagram shows a threaded binary tree in in-order traversal, where the in-order traversal is 1, 3, 6, 8, 10.

Threading is an operation on binary trees, meaning to thread the binary tree, which aims to make the threaded binary tree have the characteristic of being easy to traverse, that is, to perform in-order traversal of the threaded tree without using recursion and stacks.

The process of traversing a binary tree in a certain order to make it a threaded binary tree is called threading.

Storage of Threaded Binary Trees

struct Tree
{
    char  data;        // Data to be stored 
    int num;
    int LisThread;// Left thread flag, value is 0 or 1
    int RisThread; 
    struct Tree *lchild,*rchild;    // Pointers to left and right children
};

Knowledge Point 22 Applications of Binary Trees - Huffman Trees and Huffman Coding#

Basic Concept of Huffman Trees:#

Huffman Tree: Given a set of leaf nodes with determined weights, it is the binary tree with the minimum weighted path length.

Characteristics of Huffman Trees:#

Leaf nodes with larger weights are closer to the root node, while leaf nodes with smaller weights are farther from the root node.
There are only nodes of degree 0 (leaf nodes) and degree 2 (branch nodes), and there are no nodes of degree 1.

Traversal of Huffman Trees:#

Steps to Construct a Huffman Tree (i.e., Huffman Algorithm):#

(1) From the given n weights { w1, w2, …, wn }, construct a set of n binary trees F = { T1, T2, …, Tn } (i.e., a forest), where each binary tree Ti has only one root node with weight wi, and both left and right subtrees are empty.

(2) Select two trees with the smallest root weights from F as the left and right subtrees to construct a new binary tree, and let the weight of the new binary tree's root node equal the sum of the weights of the roots of its left and right subtrees.

(3) Remove these two trees from F and add the newly obtained binary tree to F.

(4) Repeat (2) and (3) until F contains only one tree. This tree is the Huffman tree.

Calculation of Weighted Path Length (WPL)#

Weighted path length = weight of leaf nodes * path length, for the above Huffman tree, the leaf nodes are 7, 8, 5, 3, 11, 23, WPL=7*4+8*4+3*4+5*4+11*3+23*2=167.

Calculation of Huffman Coding#

Definition of Huffman Coding

Huffman coding specifies that the left branch of the Huffman tree is 0 and the right branch is 1, and the sequence of 0s and 1s corresponding to the branches traversed from the root node to each leaf node forms the code for the character corresponding to that node. Such coding is called Huffman coding. Huffman coding is a type of prefix-free coding. During decoding, there is no confusion. Its main applications are in data compression, encryption, and decryption.

Knowledge Point 23 Conversion Between Trees and Forests#

Tree to Binary Tree#

Add lines: Add a line between all adjacent siblings in the tree.
Remove lines: For each node in the tree, keep only the line between it and its first child, and remove the lines to other child nodes.
Rotate: Rotate the entire tree clockwise around the root node as the axis, making the structure clear, thus obtaining a binary tree.

Conversion Between Forests and Binary Trees#

Converting a Binary Tree to a Forest

Add lines: Connect each node in the binary tree to the right child of its left child.
Remove lines: Remove the lines between each node and its right child.
Rotate: Rotate the entire tree counterclockwise around the root node as the axis, making the structure clear, thus obtaining a forest.

Conversion of Forests to Binary Trees

Add lines: Add a line between all adjacent siblings in the forest.
Remove lines: For each node in the forest, keep only the line between it and its first child, and remove the lines to other child nodes.
Rotate: Rotate the entire tree clockwise around the root node as the axis, making the structure clear, thus obtaining a binary tree.

Knowledge Point 24 Traversal of Trees and Forests#

Traversal of Trees#

Trees only have pre-order, post-order, and level-order traversals; there is no in-order traversal.

Pre-order Traversal

Like binary trees, it visits the root first, then left, then right, with decreasing priority. If the tree is non-empty, it first visits the root node, then in order from left to right, it pre-order traverses each subtree of the root node. (For example, the pre-order traversal of the tree in the above diagram is A B E F C D G I H.)

Post-order Traversal

If the tree is non-empty, it post-order traverses each subtree of the root node in order from left to right, then visits the root node. (For example, the post-order traversal of the tree in the above diagram is E F B C I G H D A.)

Traversal of Forests#

It can be directly traversed in pre-order as a binary tree, or it can be converted to a binary tree and then traversed.

For example, the pre-order traversal of the forest in the above diagram is A B C D E F G H J I.

Chapter 7 Graphs#

Knowledge Point 25 Definition and Properties of Graphs#

Definition of Graph: A graph (Graph) G is composed of two sets V(G) and E(G), denoted as: G=(V,E).

Where: V(G) is a non-empty finite set of vertices; E(G) is a finite set of edges, which are unordered or ordered pairs of vertices.

Directed Graph, Undirected Graph

Degree of a vertex:

In an undirected graph, the degree of a vertex is the number of edges connected to each vertex;

In a directed graph, the degree of a vertex is divided into in-degree and out-degree.

In-degree: The number of arcs with that vertex as the head.

Out-degree: The number of arcs with that vertex as the tail.

Path: A sequence of vertices.

Path Length: The number of edges along the path or the sum of the weights of the edges along the path.

Cycle: A path where the first vertex and the last vertex are the same.

Simple Path: A path where the vertices do not repeat.

Simple Cycle: A cycle where all vertices except the first and last do not repeat.

Connected: If there is a path from vertex V to vertex W, then V and W are said to be connected.

Connected Graph: Any two vertices in the graph are connected.

Connected Component: Each connected part of a disconnected graph.

Strongly Connected Graph: In a directed graph, if there exists a path from Vi to Vj and from Vj to Vi for every pair Vi, Vj ≠ V, Vi ≠ Vj, then G is called a strongly connected graph.

Knowledge Point 26 Storage Structure of Graphs#

Adjacency Matrix#

The adjacency matrix representation of a graph, also known as array representation. It uses two arrays to represent the graph:

A one-dimensional array used to store vertex information;
A two-dimensional array used to store the relationships between vertices in the graph, called the adjacency matrix.

Adjacency Matrix of Unweighted Graphs

Adjacency Matrix of Weighted Graphs

# Graph structure adjacency matrix storage structure code implementation
#define MAXVEX 20					// Maximum number of vertices
#define INFINITY 65535 				// Represents infinity 
typedef struct
{
	datatype data[MAXVEX];			// Vertex table 
	edgetype arc[MAXVEX][MAXVEX];	// Adjacency matrix storing edge weights 
	int numVertexes,numEdges;		// Number of vertices, number of edges or arcs 
}MGraph;// Graph structure adjacency matrix
int visited[MAXVEX]; // Access array

Adjacency List#

The adjacency list representation is a linked storage structure of graphs. The basic idea is to only store the information of the edges that exist in the graph.

#define MAX_VERTEX_NUM 20 /* Maximum number of vertices */
typedef enum{DG, DN, UDG, UDN} GraphKind; /* Types of graphs */
 
/* Define the type of the header node */
typedef struct VertexNode {
    VertexData data;    // Data field, stores vertex information
    ArcNode * firstarc; // Pointer field, points to the first node in the edge list
} VertexNode;
 
/* Define the type of edge nodes */
typedef struct ArcNode {
    AdjType adjvex;           // Adjacent point field, the index of the edge's endpoint in the vertex table
    OtherInfo info;           // Data field, stores other related information
    struct ArcNode * nextarc; // Pointer field, points to the next node in the edge list
} ArcNode;

Similarities and Differences Between Adjacency Lists and Adjacency Matrices#

Connection: Each linked list in the adjacency list corresponds to a row in the adjacency matrix, and the number of nodes in the linked list equals the number of non-zero elements in that row.
Difference: For any given undirected graph, the adjacency matrix is unique (the row and column numbers correspond to the vertex numbers), but the adjacency list is not unique (the linking order is unrelated to the vertex numbers).
Usage: Adjacency matrices are more commonly used for storing dense graphs, while adjacency lists are more commonly used for storing sparse graphs.

Knowledge Point 27 Graph Traversal#

Depth-First Search (DFS) of Graphs#

Method: Starting from a certain vertex V0, visit this vertex; then, sequentially start from the unvisited adjacent vertices of V0 to perform a depth-first traversal of the graph until all vertices connected to V0 have been visited; if there are still unvisited vertices in the graph, select another unvisited vertex in the graph as the starting point and repeat the above process until all vertices in the graph have been visited.

DFS means starting from the starting point, finding a direction until it cannot go further, then returning along the same path, and continuing to go until a walkable place is found.

The result of depth-first traversal is: A B E F C D G H I.

It must be noted that if no storage structure is specified, the result of depth-first traversal is not unique. Because which vertex is the first adjacent vertex is not determined. After specifying the storage structure, the result of depth-first traversal is unique.

Breadth-First Search (BFS) of Graphs#

Breadth-first search can be defined as follows: First, visit the starting point v, then sequentially visit all adjacent vertices w1, w2, ..., wt of v, and then sequentially visit all unvisited vertices adjacent to w1, w2, ..., wt. This continues until all vertices that are path-connected to the source point v have been visited. At this point, the search process starting from v ends.

The result of breadth-first traversal is: A B C D E F G H I.

Breadth-first traversal searches for the nearest nodes layer by layer, one layer at a time.

Knowledge Point 28 Applications of Graphs - Minimum Spanning Tree#

The properties of the Minimum Spanning Tree (MST) are as follows: If U is a non-empty subset of V, and (u0, v0) is an edge with the minimum weight, where u0 is in U and v0 is in V-U; then: (u0, v0) must be on the minimum spanning tree.

The most commonly used algorithms to find the MST are Kruskal's algorithm and Prim's algorithm.

Kruskal's algorithm characteristics: Merging edges, suitable for finding the minimum spanning tree of sparse networks.

Prim's algorithm characteristics: Merging vertices, independent of the number of edges, suitable for dense networks.

Prim's Algorithm (Adding Vertices One by One)#

Prim's algorithm is more suitable for dense graphs (fewer vertices, more edges).

Process diagram

Kruskal's Algorithm (Adding Edges One by One)#

The minimum spanning tree of a connected graph is not unique, but the sum of the weights on the minimum spanning tree is unique. You should be proficient in Prim's and Kruskal's algorithms, especially in manually simulating the generation process of the spanning tree step by step.

Knowledge Point 29 Applications of Graphs - Topological Sorting#

Topological sorting is a linear sequence of all vertices in a directed acyclic graph (directed graph, arcs do not form loops). In this linear sequence, each vertex in the graph appears only once, and if there exists a directed arc <v1, v2> from vertex A to vertex B, then vertex A must be before vertex B.

When accessing the vertices of the graph, ensure that there are no other vertices before the vertex being accessed (in-degree is 0), meaning the vertex being accessed is only the head of the arc. The prerequisite for satisfying topological sorting is that the graph has no cycles.

Select the node with in-degree 0, output A. Delete the edges AB and AC (B's in-degree becomes 1-1=0, C's in-degree becomes 2-1=1).
Select the node with in-degree 0, output B. Delete the edge BC and BE (C's in-degree becomes 1-1=0, E's in-degree becomes 1-1=0).
Select the node with in-degree 0, output C. Delete the edges starting from C (decreasing the in-degrees of the corresponding nodes).

Continue repeating to obtain the topological sorting ABCDEF (it can also be ABCEDF).

Note: The result of topological sorting is not unique (there can be multiple nodes with in-degree 0), so the question may require you to write out all topological sorts.

Knowledge Point 30 Applications of Graphs - Critical Path#

The critical path: If the vertices in a directed graph represent events and the directed edges represent the duration of activities, then the graph is an activity edge network, abbreviated as AOE network. The critical path in the AOE network is the longest path required to complete the entire network, which is the longest path from the starting vertex to the last vertex.

No further explanation, just list the earliest occurrence time of events ve (and maximum), and the latest occurrence time vl (minimum difference).

First write the topological sorting.

Positive topological sorting: v1 v2 v3 v5 v4 v6

Reverse topological sorting: V6 V5 V4 V2 V3 V1

Then calculate the vertex ve and vl.

ve is calculated using the positive topological sorting to sum the routes and maximum.

vl is calculated using the reverse topological sorting to sum the routes with minimum differences.

Event	v1	v2	v3	v4	v5	v6
ve(i)	0	3	2	6	6	8
vl(i)	0	4	2	6	7	8

Finally, calculate the edge (event) earliest start time of the tail vertex, which is the earliest start time of the activity e, and the latest start time of the head vertex of the edge (event) minus the weight, which is the latest start time of the activity (as long as it is the latest, it is calculated backwards), l time slack l-e.

Activity	a1	a3	a4	a5	a6	a7	a8
e(i)	0	3	3	2	2	6	6
l(i)	1	4	4	2	5	6	8-1
d(i)	1	1	1	0	3	0	1

The critical path is v1 v3 v4 v6.

A time slack of 0 indicates critical activities a2 a5 a7.

Knowledge Point 31 Applications of Graphs - Shortest Path#

Dijkstra's algorithm is a typical single-source shortest path algorithm used to calculate the shortest path from one node to all other nodes. The main characteristic is to expand layer by layer from the starting point until it reaches the endpoint. Dijkstra's algorithm is a representative shortest path algorithm.

Algorithm Idea:#

Let G=(V,E) be a weighted directed graph, dividing the vertex set V into two groups: the first group is the set of vertices for which the shortest path has been found (denoted as S, initially containing only the source point, and after each shortest path is found, it is added to the set S until all vertices are included in S, at which point the algorithm ends), and the second group is the remaining vertices for which the shortest path has not yet been determined (denoted as U). The vertices in the second group are added to S in increasing order of the shortest path length. During the addition process, the shortest path length from the source point v to each vertex in S must not exceed the shortest path length from the source point v to any vertex in U. In addition, each vertex corresponds to a distance; the distance of vertices in S is the shortest path length from v to that vertex, and the distance of vertices in U is the current shortest path length from v to that vertex, which only includes vertices in S as intermediate vertices.

Algorithm Steps:#

a. Initially, S contains only the source point, i.e., S={v}, and the distance of v is 0. U contains all other vertices except v. If there is an edge <u,v> in the graph, it normally has a weight; if u is not an adjacent vertex of v, then the weight of <u,v> is ∞.

b. Select a vertex k from U with the minimum distance to v, and add k to S (the selected distance is the shortest path length from v to k).

c. Using k as the new intermediate point, modify the distances of the vertices in U; if the distance from the source point v to vertex u (via vertex k) is shorter than the original distance (not via vertex k), then modify the distance value of vertex u, which is the modified distance value of vertex k plus the weight on the edge.

d. Repeat steps b and c until all vertices are included in S.

Solution Process:#

The single-source shortest path solving algorithm Dijkstra, with non-negative weights, see the video process link.

Starting Point	End Point	Path	Distance
u	a	u	2
	b	a	5
	c	u	1
	d	a	3
	v	d	5

According to the above table, the shortest path from u to v is u-a-d-v with the shortest distance of 5.

Selected Exercises from Super Star#

Stack and Queue Section#

When the maximum length of the stack is difficult to estimate, the stack is best implemented using ( linked ) storage structure.
The two storage structures commonly used for stack structures are sequential storage structure and linked storage structure.
Stack can be used as a data structure to implement recursive function calls.
The postfix expression of the expression a*(b+c)-d is (abc+*d-)

Note: The idea of converting expressions to prefix and postfix expressions:

Taking a+b-a*((c+d)/e-f)+g as an example

Postfix:

Parenthesis method (recommended):

① Add parentheses to all operation units according to the priority of operators.

Thus it becomes: (((a+b)-(a*(((c+d)/e)-f)))+g)

② Starting from the innermost parentheses, sequentially move the operators to the corresponding parentheses' back.

Thus it becomes: (((ab)+(a(((cd)+e)/f)-)*)-g)+

③ Finally, remove all parentheses.

Thus it becomes: ab+acd+e/f-*-g+

Algorithm based on stack:

Traverse from left to right.

If it is an operand, output directly.

If it is a left parenthesis '(', push it onto the stack directly (parentheses have the highest priority, no need to compare; after being pushed onto the stack, the priority drops to the lowest to ensure that other symbols are pushed onto the stack normally).

If it is a right parenthesis ')', it means the parentheses have ended. Continuously pop the top operator from the stack and output it until a left parenthesis is encountered; this left parenthesis is popped but not output.

If it is an operator ('+', '-', '*', '/'), there are three cases:

① If the stack is empty, push it directly onto the stack.

② If the top element of the stack is a left parenthesis '(', push it directly onto the stack.

③ If the top element of the stack is an operator, comparison is needed:

1- If the priority is greater than the top operator, push it onto the stack;

2- If the priority is less than or equal to the top operator, pop the top operator and output it, then compare the new top operator until the priority is greater than the top operator or the stack is empty, then push the operator onto the stack.

If the object is processed, pop and output all operators in the stack in order.

Prefix:

Parenthesis method (recommended):

① Add parentheses to all operation units according to the priority of operators.

Thus it becomes: (((a+b)-(a*(((c+d)/e)-f)))+g)

② Starting from the innermost parentheses, sequentially move the operators to the corresponding parentheses' front.

Thus it becomes: +(-(+(ab)*(a-(/(+(cd)e)f)))g)

③ Finally, remove all parentheses.

Thus it becomes: +-+ab*a-/+cdefg

Algorithm based on stack

Traverse from right to left.

If it is an operand, output directly.

If it is a right parenthesis ')', push it onto the stack directly (parentheses have the highest priority, no need to compare; after being pushed onto the stack, the priority drops to the lowest to ensure that other symbols are pushed onto the stack normally).

If it is a left parenthesis '(', it means the parentheses have ended. Continuously pop the top operator from the stack and output it until a right parenthesis is encountered; this right parenthesis is popped but not output.

If it is an operator ('+', '-', '*', '/'), there are three cases:

① If the stack is empty, push it directly onto the stack.

② If the top element of the stack is a right parenthesis ')', push it directly onto the stack.

③ If the top element of the stack is an operator, comparison is needed:

1- If the priority is greater than or equal to the top operator, push it onto the stack;

2- If the priority is less than the top operator, pop the top operator and output it, then compare the new top operator until the priority is greater than or equal to the top operator or the stack is empty, then push the operator onto the stack.

If the object is processed, pop and output all operators in the stack in order.

In a sequential storage circular queue sq, assuming front and rear are the front pointer and rear pointer respectively, the enqueue operation is ( )

A、sq.rear=sq.rear+1;sq.data[sq.rear]=x;
B、sq.data[sq.rear]=x;sq.rear=sq.rear+1;
C、sq.rear=(sq.rear+1)%maxsize; sq.data[sq.rear]=x;
D、sq.data[sq.rear]=x;sq.rear=(sq.rear+1)%maxsize;

In a sequential storage circular queue with m units, assuming front and rear are the front pointer and rear pointer respectively, the condition for judging the queue is full is ( )

A、rear % m==front
B、(front+1) % m==rear
C、(rear－1)%m==front
D、(rear+1) % m==front

The condition for judging a circular queue is that the difference between rear and front is one element, rear+1=front. This is to avoid ambiguity caused by the circular queue structure, where the conditions for judging empty and full are the same (rear = front), intentionally leaving one element's space to change the condition for judging fullness.

In a sequential storage circular queue with m units, assuming front and rear are the front pointer and rear pointer respectively, the condition for judging the queue is empty is ( )

A、rear ==front
B、(front+2) % m==rear
C、(rear－2)%m==front
D、(rear+2) % m==front

Compared with sequential stacks, linked stacks have a significant advantage, which is ( )

A、Insertion operation is more convenient
B, Typically does not encounter stack full situations
C、Will not encounter stack empty situations
D、Deletion operation is more convenient

In a queue represented by a linked list, when performing a deletion operation ( )

A、Only modify the head pointer
B、Only modify the tail pointer
C、Both head and tail pointers need to be modified
D、Both head and tail pointers may need to be modified

When there is only one element, deletion requires modifying both.

The head of a queue represented by a singly linked list is at the ( ) position of the list.

A、Head of the list
B、Tail of the list
C、Middle of the list
D、Any position

Other Selected Exercises#

Stack S can hold a maximum of 4 elements, and now there are six elements ABCDEF in order. Which of the following sequences could be the output sequence (C)?

A) EDCBAF B) BCEFAD C) CBEDAF D) ADFEBC

A full binary tree of height h has at least 16 nodes.
If the pre-order and post-order traversals of a non-empty binary tree are opposite, then (C)

A) Any node of the binary tree has no left child

B) Any node of the binary tree has no right child

C) The binary tree has only one leaf node

D) The binary tree is any binary tree

In a two-dimensional array A[20][10] using row order as the main order for storage, with each element occupying two bytes, if the address of A[10][5] is 1000, then the address of A[18][9] is 1168.

According to the two-dimensional array formula mentioned earlier, loc=1000+(8*10+9-5)*2=1168.

In a singly linked list, to insert the node pointed to by p after the node pointed to by s, you should execute, (s->next=p->next) (p->next=s).
Given the pre-order and in-order sequences of a binary tree are ABCDEFGHIJ and BDCEAHGIJF respectively, the post-order traversal of this binary tree is (DECBHIJGFA).

Directly using clever techniques, we can use lists or drawing methods to vertically arrange the pre-order sequence and horizontally arrange the in-order sequence. The positions where the two sequences overlap indicate the positions of the nodes in the tree, with the most central node being the root node. Then, the nodes closer to the root are connected first, and the remaining nodes are connected. The actual binary tree is shown in the diagram below.

A					A
B	B
C			C
D		D
E				E
F										F
G							G
H						H
I								I
J									J
	B	D	C	E	A	H	G	I	J	F

For post-order traversal from the in-order traversal, we need to write the left side of the above table's post-order sequence in reverse (to ensure the root node is at the top), and then similarly obtain the result.

Given leaf node values 2, 3, 5, 6, 9, 11, construct a Huffman tree and its weighted path length.

Solution: Construct the Huffman tree by taking values from small to large.

The weighted path length is wpl=(2+3)*4+5*3+(11+6+9)*2=87.

Note: When constructing the Huffman tree, it is important to ensure that the weights of the previous level are always greater than those of the lower level, and to use all node values. Additionally, Huffman trees are not unique, but the WPL obtained from different Huffman trees is equal.

Convert the arithmetic expression ((a+b)+c*(d+e)+f)*(g+h) into a binary tree and write its postfix expression.

Solution: Converting to a binary tree means treating operators as root nodes and combining values as leaf nodes according to the order of operations.

The postfix expression using the previously mentioned parenthesis method is ab+cde+*+f+gh+*.

Draw the graph corresponding to the following adjacency matrix and write the corresponding adjacency list.

A=\begin{Bmatrix} 0&1 &0 & 0 & 1\\ 0 & 0& 4& 0 & 0\\ 0&0 & 0&6 &0 \\ 0 & 3& 0&0 & 8\\ 5& 0 & 0 &7 &0 \end{Bmatrix}

Convert the following forest into a binary tree.

According to the previously written add lines, remove lines, and rotate, we can obtain the binary tree as follows:

Using Prim's algorithm, the minimum spanning tree of the following graph will be obtained (write the steps U to represent the selected vertex set, U-V to represent the unselected vertex set).

U	U-V	1	2	3	4	5	6
{1}	{2，3，4，5，6}	/	(1,2) 16	inf	(1,4) 21	(1,5) 19	inf
{1,2}	{3,4,5,6}	/	/	(2,3) 5	(2,4) 11	(1,5) 19	(2,6) 6
{1,2,3}	{4,5,6}	/	/	/	(2,4) 11	(1,5) 19	(2,6) 6
{1,2,3,6}	{4,5}	/	/	/	(2,4) 11	(5,6) 18	/
{1,2,3,4,6}	{5}	/	/	/	/	(5,6) 18	/

The resulting minimum spanning tree is as follows:

The logical structure of data is divided into set structure, linear structure, tree structure, and network structure.
A complete binary tree of height 5 has at least 16 nodes.
The number of elements in a circular queue is (rear - front + MaxQSize) % MaxQSize.
In a two-dimensional array A[1…5,1…6], each element occupies four storage units. If the address of A[1,1] is 1024, and if stored in row order as the main order, the storage address of element A[3,2] is (1076); if stored in column order as the main order, the storage address of A[3,2] is (1052).
Two methods to determine whether a directed graph has a cycle are (Depth-First Search Algorithm) and (Topological Sorting).
A connected undirected graph with n vertices has at least n-1 edges and at most n(n-1)/2 edges. A connected directed graph with n vertices has at least n edges and at most n(n-1) edges.
Starting from vertex A, the sequences of vertices visited in breadth-first search and depth-first search are as follows:

clip_image002

The depth-first traversal sequence is ABDHECFG.

The breadth-first traversal sequence is ABCDEFGH.

For the AOV network of professional courses, please arrange the order in which students should learn the professional courses, i.e., write any topological order sequence of this directed graph.

clip_image002-16761057320082

A possible topological sequence is A, B, C, D, E, G, I, J, K, F, L, H (there are many possible answers, not unique).

Appendix#

This section contains various code implementations of data structures from the textbook, all written in C language using pointers, without using C++ reference parameters, and are written and debugged by me during the experimental class, for reference only.

Sequential List with Dynamic Space Allocation#

#include <stdio.h>
#include <malloc.h>
#define LIST_INIT_SIZE 100
#define LISTINCREMENT 10
#define OVERFLOW -2  
#define OK 1
#define ERROR 0
#define TRUE 1
#define FALSE 0
/*
Dynamic linear list storage structure
*/
typedef struct{
	int* elem;
	int length;
	int listsize;
}Sqlist;
/*
Initialization function
*/
int InitList_Sq(Sqlist *L){
	L->elem=(int*) malloc(LIST_INIT_SIZE*sizeof(int));
	if (!(*L).elem)
		return (OVERFLOW);
	L->length=0;
	L->listsize=LIST_INIT_SIZE;
	printf("L.length=%d\tL.listsize=%d\n",L->length,L->listsize);
	return OK;
};
/*
Deletion function
*/
int ListDElete_Sq(Sqlist *L,int i,int* e){
	int* p;
	int* q;
	// Check if i is valid and will