COMP324 (Complex Information Networks)
Preferential Attachment
Preferential Attachment Method
- Start with any type of network.
-
Additinonal nodes get added one at a time.
Each node in the network (except the existing nodes) have $c$ directed links out of them.
-
Each link is directed to a randomly chosen, previously generated, node.
The chance that a node with in-degree $d$ is the end-point of a directed link is:
\[\frac{d+a}{\sum(d+a)}\]
This produces the result that old nodes become more popular as they have a greater preferential attachment.
Preferential Attachment Example
Consider we start with a network with a single node:
graph LR
1
We can then continue the example by calculating the preferential attachment in a table, where $a=0.5$:
Iteration |
1 |
2 |
3 |
4 |
1 |
$\frac11$ |
|
|
|
2 |
$\frac34$ |
$\frac{1}{4}$ |
|
|
3 |
$\frac{1.5}{3.5}$ |
$\frac{1.5}{3.5}$ |
$\frac{.5}{3.5}$ |
|
4 |
$\frac{1.5}{5}$ |
$\frac{1.5}{5}$ |
$\frac{1.5}{5}$ |
$\frac{.5}{5}$ |
flowchart LR
2 --> 1
3 --> 2
4 --> 2
Additional nodes are chosen by random number, weighted by the distribution of preferential attachment in the table.
Given a collection of $n$ points in space, its Voronoi diagram is a way of diving space in to $n$ convex regions, or Voronoi cells.
We generally have the following distances to work with:
-
Euclidean:
\[\text{dist}_2(X,Y)=\sqrt{(x_1-y_1)^2+(x_2-y_2)^2}\]
-
Manhattan:
\[\text{dist}_1(X,Y)=\mid x_1-y_1\mid +\mid x_2-y_2\mid\]
-
Maximum:
\[\text{dist}_{\infty}(X,Y)=\max(\mid x_1-y_1\mid,\mid x_2-y_2\mid)\]
Pen & Paper Method
PaperAndPencilVoronoi($P_1,\ldots,P_n$)
for each i in {1, ..., n} do
define cell(P_i) by intersecting the n - 1 bisector
half-places h_i(P_j)
end for
This method is tedious to draw by hand. There are alternative methods that can be computed.
Delaunay Triangulations
- Connect the points on the convex hell in a cycle.
-
Iteratively remove any cycle of length at least four inside the convex hull by adding arbitrary lines connecting pairs of vertices.
Leave the convex hull untouched and keep it planar.
-
A pair of triangles sharing an edge is bad if the sum of the angles $\alpha$ and $\gamma$ is larger than 180 degrees.
Remove all the bad pairs of triangles by replacing the diagonal $BD$ by $AC$.
Approximate
- Given the $n$ sites $S_1,\ldots,S_n$, in a rectangular region $R$, repeatedly sample points $P$ in $R$.
- Compute the distances $\text{dist}(P,S_i)$ and assign $P$ to the cell of minimum distance.
We can also complete the same thing using the following simpler algorithm:
- Superimpose a grid on $R$.
- Let $P$ denotes the centre of an arbitrary grid square.
- Assign $P$ to the cell of minimum distance.
Non-Network Measures
Convex Hull
Calculate the total area of the region delimited by the outermost nodes.
This can be calculated by triangulating and computing the sum of the area of all the triangles in the network.
Given three side lengths $a,b$ and $c$ the area of a triangle can be calculated by:
\[A_\Delta = \sqrt{\frac p2\left(\frac p2-a\right)\left(\frac p2-b\right)\left(\frac p2-c\right)}\]
Therefore the area of the convex hull is the sum of all these triangles:
\[A_{\text{CH}}=\sum_\Delta A_\Delta\]
Distance
We can use distance measures (using either Manhattan distance or euclidean distance) to measure on a grid:
Network Measures
Occupancy
- Larger occupancy (more area taken up) = more numerous.
- Smaller occupancy (less area taken up) = less numerous.
Setting Up a Graph Geometrically
Given a set of points in an array, you can convert this array into a network by creating links between nodes based on a measure. Generally we can say, create a link between node $u$ and $v$ if:
\[\text{dist}(u,v)\leq s\]
Sparse Random Networks
The binomial model can be tweaked to generate sparse networks $\mathcal G_{n,\frac cn}$:
- By changing the linking probability we bias the seelction process.
-
In $\mathcal G_{n,p}$ the probability of picking a network with $m$ lines is:
\[{N\choose m}p^m(1-p)^{N-m}\]
where $N={n\choose2}$.
When $p$ is small, this expression gives a higher change to graph with few links.
Random Networks with Arbitrary Degrees
Take the following method:
- Assign each node the desired number of degree stubs.
- For each node’s stub, assign an arc to another node’s stub using a random number.
Binomial Model
A random binomial model on $n$ nodes, denoted by the expression $\mathcal G_{n,\frac12}$, is a network on $n$ nodes in which each line exists, independently of all other lines with probability of $\frac12$.
To generate one of these networks, we randomly fill an adjacency matrix with 1’s and 0’s with equal distribution.
Counting Binomial Networks
In a binomial network there are $n\choose 2$ values in the adjacency matrix (for an undirected graph). Therefore there are:
- $2^{n\choose 2}$ possible network combinations for this network type.
Properties of $\mathcal G_{n,p}$
We can generalise $\mathcal G_{n,\frac12}$ to have any probability $p$ of a particular arc being filled.
The generation of a uniform random network on $n$ nodes is equivalent to a random experiment in which we independently toss exactly ${n\choose2}=\frac{n(n-1)}{2}$ fair coins. Each coin toss corresponds to the decision about one of the $n\choose 2$ possible links in the network.
Each coin toss is called a Bernoulli trial and counting heads (or tails) in a finite sequence of Bernoulli trials gives raise to the well studies binomial probability distribution.
Vertex Degrees
The degree of any vertex in a random binomial model on $n$ nodes is equivalent to a random variable counting the number heads in $n-1$ independent coin tosses:
- The vertex degree has binomial distribution and its expected value is $\frac{n-1}2$.
- The expected value in the general $\mathcal G_{n,p}$ is $p(n-1)$.
Expected Degree Distribution
In $\mathcal G_{n,p}$, the expected value of $f_d$ is:
\[\text{Pr}[\deg(v)=d]=\binom{n-1}{d}p^d(1-p)^{n-1-d}\]
as $\deg(v)$ is a binomial random variable.
Number of Links
The number of links in a random binomial model on $n$ nodes is a random variable counting the number of heads in $n\choose 2$ independent coin tosses:
- The number of links has binomial distribution and its expected value is $\frac12{n\choose2}$.
- The expected value in the general $\mathcal G_{n,p}$ is $p{n\choose2}$.
Diameter
The diameter is generally 2 for large values of $n$.
Acquaintances
It is possible to derive acquaintances by analysis of the relation between people and events.
If two people attend the same event it is likely that they know each-other.
We can use this information to create a new graph where individuals have connections when they attend a shared event.
Small Words
It is claimed that society can be described by a network characterised by small shortest path lengths.
We can conduct an experiment where we send a message. Each person involved must:
- Direct the message to the target if they know them personally.
- Direct the message to a friend they perceive to be closer to the target.
From experiments like this we know society is highly connected (with average $k=5.5$), however, social searching is slow, expensive and prone to failure.
To avoid failure:
- Motivation matters more than the access of an individual.
- Ease of access is also very important as the motivation required is reduced.
Geweke Diagnostic
The Deweke diagnostic can quantify how representative of a graph a random walk is:
\[z=\frac{\bar X_a-\bar X_b}{\sqrt{\text{Var}X_a+\text{Var}X_b}}\]
where:
- $X_a$ is the first 10% of samples.
- $X_b$ is the last 50% of samples.
An ideal random walk would have the difference of averages be 0 (hence $z=0$) and variance of 1.
Degree Distribution
Let $f_d$ denote the fraction of vertices of degree $d$ in a given network (where the total number of vertices are $n\times f_d$).
- Degree Distribution
- The sequence of $(f_0, f_1, \ldots f_{n-1})$ of a given network.
This is an array of values, not a scalar.
Therefore, given a network with $n=20$, $f_{12}$ means that there is one vertex with degree 12:
\[n\times f_{12}=20\times0.05=1\]
Power Laws
We may believe that there is a linear relationship between the log of $f_d$ and the value of $d$ for a given network:
\[\log f_d=-\alpha\times\log d+c\]
this implies:
\[10^{\log f_d}=10^{-\alpha\times\log d}\times 10^c\]
and therefore:
\[f_d=\frac{10^c}{d^\alpha}\]
Mathematical laws stating that a function $f(x0$ is proportional to $x^{-\alpha}$ (for some positive constant $\alpha$) are called power laws.
Model Fitting
We should be careful when noticing linear trends in data we should say:
The data we have fits (with a certain confidence) a power law.
Power Law Detection
We may want to check if a set of data fits a power-law distribution.
Power Law Detection General Method
- Find the degree distribution.
- Plot the points $(\log d+1, \log f_d)$ having $f_d\neq 0$.
- Use statistical methods to asses the statement: “the data at hand shows a degree distribution that obeys a power-law”.
Power Law Detection with Linear Regression
In general data is given as array of values $X$ and $Y$:
- This is usually points $(X_i, Y_i)$ which equals $(\log(i+1),f_i)$ for all $i$ where the given network has at least one vertex of degree $i$.
Find the line of best fit in the form:
\[y = c_1x+c_0\]
using:
\[\begin{aligned}
c_1&=\frac{\sum X_iY_i-(\sum X_i\sum Y_i)/N}{\sum X_i^2-(\sum X_i)^2/N}\\
c_0&=\frac1N\left(\sum Y_i-c_1\sum X_i\right)
\end{aligned}\]
This line minimises error which is calculated by:
\[E=\sum(Y_i-y_i)^2\]
Small Worlds
It is often the case than you can get from one node to another in a small number of steps.
We can use the diameter as an indicator of how many steps it will take to move information from one node to another.
Average Distances
The closeness centrality of a node $v$ is:
\[\ell_v=\frac1n\sum_{u\in V}\text{dist}(v,u)\]
where $\text{dist}(v,u)$ is the length of a shortest path from $v$ to $u$.
Then we can consider the average value of this measure to be:
\[\ell=\frac1n\sum_{v\in V}\ell_v\]
In real complex networks $\ell$ is typically small.
Average Distance Examples
Consider the following examples:
Network |
$n$ |
$m$ |
$\ell$ |
Film Actors |
449,913 |
25,516,482 |
3.48 |
Company Directors |
7,673 |
55,392 |
4.6 |
WWW |
269,504 |
1,497,135 |
11.27 |
Many of these networks are not connected. Distances between nodes in different components are set to zero. Averages are computed over existing links only.
Path Algorithms
A-Star
I couldn’t quite figure this out but I believe we only need the Floyd-Warshall algorithm for the exam.
Floyd-Warshall Algorithm
To compute all values of $\ell_v$ (and hence $\ell$) we can use the following method:
let dist be a |V| x |V| array of minimum distances
for each vertex v
dist[v][v] = 0
for each (u, v)
if there is an edge (u, v)
dist[u][v] = w(u, v) // the weight of the edge (u, v)
else
dist[u][v] = infty
for k from 1 to |V|
for u from 1 to |V|
for v from 1 to |V|
if dist[u][v] > dist[u][k] + dist[k][v]
dist[u][v] = dist[u][k] + dist[k][v]
end if
Floyd-Warshall Algorithm Example
Consider that we have the following graph:
graph LR
1 --- 2
2 --- 3
3 --- 4
4 --- 1
-
We create an initial table:
$v$ |
1 |
2 |
3 |
4 |
1 |
0 |
1 |
$\infty$ |
1 |
2 |
1 |
0 |
1 |
$\infty$ |
3 |
$\infty$ |
1 |
0 |
1 |
4 |
1 |
$\infty$ |
1 |
0 |
-
Then complete the loop for $k=1$:
$v$ |
1 |
2 |
3 |
4 |
1 |
0 |
1 |
$\infty$ |
1 |
2 |
1 |
0 |
1 |
2 |
3 |
$\infty$ |
1 |
0 |
1 |
4 |
1 |
2 |
1 |
0 |
-
And finally, in this case, for $k=2$:
$v$ |
1 |
2 |
3 |
4 |
1 |
0 |
1 |
2 |
1 |
2 |
1 |
0 |
1 |
2 |
3 |
2 |
1 |
0 |
1 |
4 |
1 |
2 |
1 |
0 |
The maximum value in this table is 2, this is the diameter.
-
To calculate $\ell_v$ and $\ell$ we sum and average:
$v$ |
1 |
2 |
3 |
4 |
$\ell_v$ |
1 |
0 |
1 |
2 |
1 |
1 |
2 |
1 |
0 |
1 |
2 |
1 |
3 |
2 |
1 |
0 |
1 |
1 |
4 |
1 |
2 |
1 |
0 |
1 |
\[\ell = 1\]
Components in Undirected Networks
Most networks are formed by a single giant (a component connecting most of the nodes) and a number of dwarves.
The internet has a single giant encompassing the whole network. This is required for the internet to function as the pages should be interconnected.
Generally you can’t have two giants in a network as adding a new node tends to join components rather than keep them apart.
Components in Directed Networks
graph LR
subgraph 1
a
b
e
end
subgraph 2
f
g
end
subgraph 3
c
d
h
end
a --> b
b --> e
b --> c
b --> f
e --> f
e --> a
f --> g
g --> f
c --> g
c --> d
d --> c
d --> h
h --> d
h --> g
A directed network is called strongly connected if there is a directed path from each node to every other node.
Therefore, for every $u$ and $v$ there is a path from $u$ to $v$ and another from $v$ to $u$.
The strongly connected components of a directed network $G$ are its largest strongly connected sub-networks in $G$.
According to Harary (1969) a directed network $G$ can be:
- Disconnected
- If there is at least two nodes in $G$ that are not connected by any path ever disregarding directions.
- Weakly Connected
- If every two nodes in $G$ are connected by a path, disregarding directions.
- Unilaterally Connected
- If every two nodes $u$ and $v$ in $G$ are connected by ate least one directed path, starting at either $u$ or $v$.
- Strongly Connected
- A directed network that has a directed path from each node to every other node.
Kosaraju’s Algorithm
The strongly connected components of a directed network can be found using Kosaraju’s algorithm:
set the stack S to be empty
while S does not contain all nodes do
choose an arbitrary node v not in S
perform a DFS starting at v
each time the search finishes expanding u, push u onto S
end while
reverse the direction of all links
while S is not empty do
pop v from S
perform a DFS starting at v
/* all visited vertices are in the same strong connected component as v */
remove all these from the network and the stack
end while
These are methods for finding important groups of nodes rather than individual nodes.
Dense Subgraphs
It may be useful to obtain a dense part of the network rather than a set of nodes with high degree.
$k$-kernel
The $k$-kernel of a network is the largest induced sub-network with no node of degree less than $k$.
For example, given the following network:
graph TD
1 --- 12
1 --- 14
1 --- 15
1 --- 17
1 --- 23
2 --- 10
3 --- 6
3 --- 14
3 --- 23
4 --- 12
5 --- 10
6 --- 10
6 --- 11
6 --- 23
7 --- 10
9 --- 10
11 --- 14
12 --- 15
12 --- 17
15 --- 17
17 --- 23
subgraph k=3
1
12
15
17
end
We can find this by iterativley completing the following:
- Search the graph for nodes that have less than $n$ neighbours.
- Remove them from a graph and treat the result as a new graph.
- Repeat again with the new graph until only the induce sub-network remains.
Cliques
- Clique
- A network in which every pair of nodes is joined by a link.
- Maximal Clique
- In a network $G$, a subnetwork of $G$ that is a clique and that is not part of a larger clique in $G$.
If a clique has more than two members, like ${5,6,7,8}$ we view this as a fully connected graph between all nodes.
Clique Listing
Using the Bron-Kerbosch algorithm we can get the size distribution of the largest cliques:
BK(R, P, X)
if P is empty and X is empty then
return R
else
// assume that P={u_1, ..., u_k}
for i = 1 to k do
remove u_i from P
call BK recursively on R*, P* and X* as defined:
R* = R ∪ {u_i}
P* = P ∩ N(u_i)
X* = X ∩ N(u_i)
add u_i to X
end for
end if
where the inputs are:
- R - Current clique we are investigating
- P - Vertices that can be added
- X - Vertices that cannot be added
A reminder that the function $N(u_i)$ gives all the neighbours of $u_i$.
To start we give then input:
\[\text{BK}(\emptyset, \{\text{all nodes}\},\emptyset )\]
Homophily
- Homophily
- The tenancy to associate with others whom you perceive to be similar to yourself.
We are able to classify nodes in a graph given a non-trivial classification:
\[\mathcal C = \{C_i:1\leq i \leq h\}\]
Measuring Homophily
The modularity of a network (with respect to the given partition $\mathcal C$ of its vertices) is defined as:
\[Q=\frac1{2m}\sum_{i,j}\left(A_{i,j}-\frac{\deg i\cdot\deg j}{2m}\right)\delta\left(c(i),c(j)\right)\]
additionally let:
\[Q_\max=1-\frac1{2m}\sum_{i,j}\frac{\deg i\cdot\deg j}{2m}\delta\left(c(i),c(j)\right)\]
The assortativity coefficient is defined as the ratio:
\[\frac Q{Q_\max}\]
where:
- $c(i)$ - the class of the node $i$
- $\delta(x, y)$ returns one if $x=y$ and zero otherwise.
We can calculate the assortivity coefficient using only the network links and node degrees by using the following formula:
\[\frac Q{Q_\max}=\frac{2m\cdot2\cdot\#(\text{edges inside classes})-\sum_C\left(\sum_{i\in C}\deg i\right)^2}{2m\cdot2m-\sum_C\left(\sum_{i\in C}\deg i\right)^2}\]
where:
- $i\in C$ - the node $i$ is in the class $C$ ($c(i)=C$).
- \(\#(\text{edges inside classes})\) - The number of edges connecting nodes of the same class.
- $m$ - the total number of arcs.
For the summing components, we sum for all classes. In each class, we calculate the squared sum of the degrees of each vertex.
One task with a network is to find which nodes are important. We can do this in several ways.
Degree Centrality
The simplest way to measure the importance of a node $v$ is to measure its degree $\deg v$.
Nodes with higher degree are more important. An example would be author’s citing papers:
- Generally the more people cite a paper, the more important the paper is.
We can also look at the example of friendship:
- A high in-degree denotes a very popular person.
- A high out-degree donates a very outgoing person.
This method is prone to spamming, like with initial versions of Google’s PageRank.
Local Clustering
Instead of looking for highly connected nodes we can look for nodes that control the flow of information around them. Consider the following measure:
\[\text{lc}_v=\frac{e(N(v))}{\deg v\choose 2}\]
Note that $\deg v \choose 2$ is $\deg v$ choose $2$.
where:
- $\text{lc}_v$ - is the local clustering coefficient of the node $v$.
- $N(v)$ - the set of neighbours of the node $v$.
- $e(N(v))$ - the number of connections between pairs of elements in $N(v)$.
The definition can be extended to the case of nodes of degree less than two by setting $\text{lc}_v=0$.
It is defined as the ratio between the number of pairs of neighbours of $v$ that are connected and the total possible number of such connections.
For this to work we assume that there are no loops or parallel links between nodes.
Local Clustering Example
Given the following graph:
graph TD
1 --- 6
1 --- 7
1 --- 9
2 --- 6
2 --- 10
3 --- 5
3 --- 7
4 --- 5
4 --- 6
4 --- 8
4 --- 9
4 --- 10
5 --- 6
5 --- 7
5 --- 10
6 --- 7
6 --- 9
7 --- 8
7 --- 10
8 --- 10
subgraph neighbours
9
7
6
end
To calculate $\text{lc}_1$ we look at the neighbours of 1 which are ${9, 6, 7}$:
- There are three neighbours $\deg v=3$.
- There are two internal connections between neighbours $e(N(v))=2$.
therefore:
\[\text{lc}_1=\frac23\]
Control
Global Clustering
You can calculate the clustering property of a whole network using the following measure:
\[\text{lc}=\frac1n\sum_v\text{lc}_v\]
Closeness Centrality
A vertex might be considered important if it’s got a small mean distance to any other vertex in the network.
The closeness centrality of a node $v$ is defined in a network $G$ having a node set $V$ and link set $E$ in terms of the following quantity:
\[\ell_v=\frac1n\sum_{u\in V}\text{dist}(v,u)\]
where:
- $\text{dist}(u,v)$ - is the length of a shortest path between $v$ and $u$.
This measures the average distance from $v$ as a global property.
For nodes within the same graph the values of $\ell_v$ are very close to each-other. You can scale the output by excluding the distance of a vertex from itself from the average:
\[\ell'_v=\frac1{n-1}\sum_{u\in V}\text{dist}(v,u)\]
From this we can calculate the closeness centrality:
\[C_v\frac1{\ell_v}\]
If two nodes $u$ and $v$ are not connected (as they lay in different components) then:
\[\frac1{\text{dist}(v,u)}=0\]
Degrees in a Network
The degree of a node $v$ in a network $G$ is the number of links attaches to that node.
This is denoted by:
\[\deg_Gv\]
We can omit the subscript $G$ when the graph is clear from the context.
In general the sum of degrees of the nodes of a network equals twice the number of links $m$ in that network:
\[\sum_v\deg v=2m\]
Average Degree
The average degree of a network $G$ is calculated as expected:
\[\bar d_G=\frac{\sum_v\deg_Gv}n\]
The average degree depends only on the number of links and nodes in the network:
\[\bar d_G=\frac{2m}n\]
The following networks have the same average degree:
graph LR
subgraph A
a --> b
a --> c
a --> d
a --> e
a --> f
a --> g
a --> h
end
subgraph B
direction LR
1 --> 2 --> 3 --> 4 --> 5 --> 6 --> 7 --> 8
end
Network Density
The density of a network is a number between zero and one.
The density of a network $G$ is the quantity:
\[\bar e=\frac m{\frac n2}\]
A normalised average degree is:
\[\bar e =\frac{\bar d_g}{n-1}\]
Sparse & Dense Networks
Informally we say:
- A network is sparse if it contains a number of links not much larger than its number of nodes.
- A network is dense if it contains a number of links close to the square of its nodes.
Given we have a network with many nodes:
A network $G$ on $n$ nodes is $c$-sparse if there exists a positive constant $c$ smaller than:
\[\frac{n-1}4\]
such that $G$ has at most $cn$ links.
A network is $c$-dense if there exists a positive $c$ such that $G$ has at least $c\times {n\choose 2}$.
This is $n$ choose 2.
Directed Graph Degree
For directed graphs you can also consider:
Connected Networks
A network is connected if either of the following are true:
- Every pair of nodes are connected by a path.
- You can walk along the links from any node to any other node.
If both of these are false then:
The network is split into a number of connected components.
Diameter
The diameter of a network is the longest distance between two nodes of the network. The diameter of the following network is five:
graph LR
a --- b
a --- c
a --- d
c --- b
d --- b
c --- e
d --- e
e --- f
f --- g
g --- h
Random Walks
Random walks provide an alternative to BFS and DFS. The expected exploration is:
\[O(n^3)\]
It can be proved that, in the long run, the sample tends to favour nodes with many links. The frequency of a node in the walk of degree $d$ tends towards:
\[\frac d{2m}\]
The methods is as follows:
- Enumerate the neighbours of a node and sort them in ascending order.
- Generate a random number.
- Scale the random number to choose a random node from the set of neighbours.
Metropolis Random Walk
A metropolis random walk is a variation of a random walk. It has the following method:
- Select a random candidate neighbour.
- Compute the chance of moving by calculating $\frac{\deg n}{\deg m}$.
- Where $n$ is the current node and $m$ is the candidate neighbour.
- If the probability is $\geq 1$ move to the candidate neighbour.
- Else compare the probability to another random number.
- If the probability is greater than the new random number, move to the candidate neighbour.
- Else try again with a new candidate.
This networks page from Barabási will be a useful external resource for the course.
Existing Definitions
A network is a structure formed by a collection of nodes (vertices) connected by links (edges).
- Path
- A sequence of distinct nodes, such that nay two successive elements are joined by a link.
The length of a finite path is the number of links in it.
- Cycle
- A finite path $x_1, x_2, \ldots, x_n$ such that $x_1=x_n$.
- Distance
- The length of the shortest path between two nodes.
Acyclic Networks
Networks without cycles are called acyclic:
- Trees are an example of acyclic networks.
Bipartite Networks
A network is bipartite if its nodes can be subdivided in two groups with no link joining two nodes in the same group.
Planar Networks
A network is planar if it can be drawn in the plane without lines crossing.
A region in a planar network is an area surrounded by links which do not contain smaller regions. For the following graph:
graph LR
1 --- 2
2 --- 3
1 --- 3
2 --- 5
2 --- 4
3 --- 4
3 --- 4
4 --- 5
5 --- 7
5 --- 6
7 --- 6
subgraph region2
5
6
7
end
subgraph not region
4
subgraph region1
1
2
3
end
end
The girth of a network is the length of the smallest cycle in the network.
For the network above, its girth is 3.
Every connected planar network with $n$ nodes, $m$ links and $r$ regions satisfies:
\[n - m + r = 2\]
You can visualise regions in a planar graph by counting the number of enclosed regions in the network when projected around a sphere:
This means that there is one additional region when counting the enclosed regions on a 2D projection due to the back of the sphere.
Maximal Networks
Assume that $G$ is a connected planar network with $n$ nodes and $m$ links, and all of its regions have at least $g$ sides. Each link belongs to two regions only.
Hence if $F$ is the collection of all regions:
\[\sum_F\left\vert f\right\vert=2m\]
Where:
- $\left\vert f\right\vert$ is the size of $f$.
As we know that the size of $f\geq g$:
\[gr\geq 2m\]
In this we can substitute the Euler formula:
\[\begin{aligned}
g(2-n+m)&\leq2m\\
m&\leq\frac g{g-2}(n-2)\\
m&\leq\frac{gn-2n}{g-2}
\end{aligned}\]
where:
- $n$ is the number of nodes.
- $m$ if the number of links.
Non-planar Networks
Consider the following two networks:
graph TD
subgraph K5
a --- b
a --- c
b --- c
a --- d
b --- d
c --- d
a --- e
b --- e
c --- e
d --- e
end
subgraph B33
1 --- 4
1 --- 5
1 --- 6
2 --- 4
2 --- 5
2 --- 6
3 --- 4
3 --- 5
3 --- 6
end
Where:
- $B_{3,3}$ is a fully connected bipartite graph with 3 nodes on each side.
- $K_5$ is a fully connected graph with 5 nodes.
To check for $K5$, select any 5 vertices in a graph and see if there are paths between them that match the $K5$ pattern. Paths are required as there may be additional nodes on the arcs.
Kuratowski’s Principle states:
Any network that is non-planar must contain $K_5$ or $K_{3,3}$ or a network obtained from one of these by replacing a link with a simple path.
Therefore we can identify these shapes in networks to avoid having to use Euler’s formula.