Cliques & Network Associativity
These are methods for finding important groups of nodes rather than individual nodes.
Dense Subgraphs
It may be useful to obtain a dense part of the network rather than a set of nodes with high degree.
$k$-kernel
The $k$-kernel of a network is the largest induced sub-network with no node of degree less than $k$.
For example, given the following network:
graph TD
1 --- 12
1 --- 14
1 --- 15
1 --- 17
1 --- 23
2 --- 10
3 --- 6
3 --- 14
3 --- 23
4 --- 12
5 --- 10
6 --- 10
6 --- 11
6 --- 23
7 --- 10
9 --- 10
11 --- 14
12 --- 15
12 --- 17
15 --- 17
17 --- 23
subgraph k=3
1
12
15
17
end
We can find this by iterativley completing the following:
- Search the graph for nodes that have less than $n$ neighbours.
- Remove them from a graph and treat the result as a new graph.
- Repeat again with the new graph until only the induce sub-network remains.
Cliques
- Clique
- A network in which every pair of nodes is joined by a link.
- Maximal Clique
- In a network $G$, a subnetwork of $G$ that is a clique and that is not part of a larger clique in $G$.
If a clique has more than two members, like ${5,6,7,8}$ we view this as a fully connected graph between all nodes.
Clique Listing
Using the Bron-Kerbosch algorithm we can get the size distribution of the largest cliques:
BK(R, P, X)
if P is empty and X is empty then
return R
else
// assume that P={u_1, ..., u_k}
for i = 1 to k do
remove u_i from P
call BK recursively on R*, P* and X* as defined:
R* = R ∪ {u_i}
P* = P ∩ N(u_i)
X* = X ∩ N(u_i)
add u_i to X
end for
end if
where the inputs are:
- R - Current clique we are investigating
- P - Vertices that can be added
- X - Vertices that cannot be added
A reminder that the function $N(u_i)$ gives all the neighbours of $u_i$.
To start we give then input:
\[\text{BK}(\emptyset, \{\text{all nodes}\},\emptyset )\]Homophily
- Homophily
- The tenancy to associate with others whom you perceive to be similar to yourself.
We are able to classify nodes in a graph given a non-trivial classification:
\[\mathcal C = \{C_i:1\leq i \leq h\}\]Measuring Homophily
The modularity of a network (with respect to the given partition $\mathcal C$ of its vertices) is defined as:
\[Q=\frac1{2m}\sum_{i,j}\left(A_{i,j}-\frac{\deg i\cdot\deg j}{2m}\right)\delta\left(c(i),c(j)\right)\]additionally let:
\[Q_\max=1-\frac1{2m}\sum_{i,j}\frac{\deg i\cdot\deg j}{2m}\delta\left(c(i),c(j)\right)\]The assortativity coefficient is defined as the ratio:
\[\frac Q{Q_\max}\]where:
- $c(i)$ - the class of the node $i$
- $\delta(x, y)$ returns one if $x=y$ and zero otherwise.
We can calculate the assortivity coefficient using only the network links and node degrees by using the following formula:
\[\frac Q{Q_\max}=\frac{2m\cdot2\cdot\#(\text{edges inside classes})-\sum_C\left(\sum_{i\in C}\deg i\right)^2}{2m\cdot2m-\sum_C\left(\sum_{i\in C}\deg i\right)^2}\]where:
- $i\in C$ - the node $i$ is in the class $C$ ($c(i)=C$).
- \(\#(\text{edges inside classes})\) - The number of edges connecting nodes of the same class.
- $m$ - the total number of arcs.
For the summing components, we sum for all classes. In each class, we calculate the squared sum of the degrees of each vertex.