This article is aimed at people who understand the meaning of the terms “cycle”, “dataset” and other basic aspects of any popular programming language.
So, what the hell are the data structures?
A data structure is a storage format that simplifies data processing. To put it simply, if you have a certain data set, to simplify its processing it can be structured so that searching, editing, increasing or deleting it was effective in terms of memory and time consumption. We’ll consider the most common types of data structures and will study popular variants of them.
Now, let’s take a look at “the simpliest” of data structures — list.
List (list) — is a set of same-type data (not always) that follow each other in a chain. Let’s look at one-way and two-way linked lists.
Each element of the list consists of data, next element pointer, and, in case of a two-way list, previous element pointer.
If we are at the start of a list, the first element gets a null link to the previous one. Vice versa with the last element – the next element pointer is marked as null.
Thus, regardless of element count in the list, they can all be crawled one-by-one.
If the last element gets the first element pointer as the next one (or the first gets the last, depending in the direction, or both in a two-way list), this type of list is called the ring. As an example, we can create a ring with days of the week and cycle it to generate a calendar.
A stack is a type of list in which data access operations can be described as FILO (First-In-Last-Out).
It is the list we can insert elements in or extract them. To understand how the stack works, just imagine yourself picking up books at the library: you find book 1 on the shelf, take it, then you find book 2 and put it on the first one, then repeat for book 3, etc.
After that, when you put a stack of books on the table, you pick up the book on top (which is book 3), then the second-to-last, then the one under it and so on, until you reach book 1.
A queue is a type of list in which data access operations can be described as FIFO (First-In-First-Out). It’s even easier to imagine because it’s exactly like a shop line. You come to the cashier, and three people are already waiting there. You take the 4th position and can pay for your goods only when the previous 3 elements are “processed”.
Now, let’s make a quick dive into the graph theory.
A graph is a data structure consisting of vertices and edges (links between vertices). Graphs are used not only for data storage but also for indicating dependence between them.
Graph vertices are usually marked as v1, v2, v3…, and edges are named as couples of indexes they connect (1,2), (2,3). и т. д. The edges connecting the same vertices are called loops. Graphs can be oriented and not oriented. The edges of oriented graphs point at a certain direction.
To store the graph one can use a list to for vertices’ data storage, and two-dimensional array for storing links between them. There are two types of such arrays: adjacency matrix and incidence matrix.
An adjacency matrix is a square matrix used to represent a finite graph. The elements of the matrix indicate whether pairs of vertices are adjacent or not in the graph.
In the special case of a finite simple graph, the adjacency matrix is a (0,1)-matrix with zeros on its diagonal. If the graph is undirected, the adjacency matrix is symmetric.
The relationship between a graph and the eigenvalues and eigenvectors of its adjacency matrix is studied in spectral graph theory. Incidence matrix is a matrix that shows the relationship between two classes of objects. If the first class is X and the second is Y, the matrix has one row for each element of X and one column for each element of Y.
The entry in row x and column y is 1 if x and y are related (called incident in this context) and 0 if they are not. It’s worth noting that the adjacency matrix is more memory efficient.
Adjacency matrix “list of lists”, where for each vertice in the list a separate list of adjacent vertices is created. The length of the list is accordingly predetermined by the number of vertices and edges. Even more optimized is a list of edges, where each element referring to the graph edge receives data which vertices are connected by this edge.
A graph is called connected when there is a path between every pair of vertices.
Path in a graph is a finite or infinite sequence of edges which connect a sequence of vertices which, by most definitions, are all distinct from one another. In a directed graph, a directed path is again a sequence of edges (or arcs) which connect a sequence of vertices, but with the added restriction that the edges all be directed in the same direction.
Eulerian trail (or Eulerian path) is a trail in a finite graph which visits every edge exactly once.
Hamiltonian path (or traceable path) is a path in an undirected or directed graph that visits each vertex exactly once. If a graph is connected and has no cycles, it’s called a tree. A tree is one of the most common types of graphs. Using this structure, we can present, for instance, catalogue categories of e-shop: “Categories” element, its subcategories “clothes”, “shoes”, “accessories”, which have their own sub-elements: for “clothes” it’s going to be “men’s clothes”, “women’s clothes”, “clothes for kids”. Visually it can be presented as:
Using JSON format helps implement the “descendants” as a list, the link to which is preserved in the “ancestor”. If a tree vertice has no “descendants”, it’s called a leaf.
One of the most interesting tree variants is a binary search tree.
To implement it 3 conditions must be met:
- Both sub-trees have to be binary trees
- All the left sub-tree keys have to be less than the root key
- All the right sub-tree keys have to be greater than or equal to the root key
Let’s assume we need to find an element with key 4. Starting from the root we crawl the vertices and check whether the key value is greater, less or equal to the value we’re looking for. In this particular example to find 4 we’ll have to take the route 8 → 3 → 6 → 4.
In conclusion, we have learned 2 basic data structures, which also happen to be the most popular ones — lists and trees. Also, as a bonus, we looked at the basics of graph theory to understand the general concept of using graphs.
Lists can be used in many situations where linear data storage is used, while many list variants provide a need for them. Trees are a more complicated and specific structure that can vary extremely from project to project. Their use is, accordingly, extremely variable.
Of course, that’s not everything you need know about data structures, but describing all at once is fairly impossible. So go ahead and keep studying!