Cliques & Network Associativity

COMP324 Lectures 2023-02-16

These are methods for finding important groups of nodes rather than individual nodes.

Dense Subgraphs

It may be useful to obtain a dense part of the network rather than a set of nodes with high degree.

$k$-kernel

The $k$-kernel of a network is the largest induced sub-network with no node of degree less than $k$.

For example, given the following network:

graph TD
1 --- 12
1 --- 14
1 --- 15
1 --- 17
1 --- 23
2 --- 10
3 --- 6
3 --- 14
3 --- 23
4 --- 12
5 --- 10
6 --- 10
6 --- 11
6 --- 23
7 --- 10
9 --- 10
11 --- 14
12 --- 15
12 --- 17
15 --- 17
17 --- 23
subgraph k=3
1
12
15
17
end

We can find this by iterativley completing the following:

Search the graph for nodes that have less than $n$ neighbours.
Remove them from a graph and treat the result as a new graph.
Repeat again with the new graph until only the induce sub-network remains.

Cliques

Clique: A network in which every pair of nodes is joined by a link.
Maximal Clique: In a network $G$, a subnetwork of $G$ that is a clique and that is not part of a larger clique in $G$.

If a clique has more than two members, like ${5,6,7,8}$ we view this as a fully connected graph between all nodes.

Clique Listing

Using the Bron-Kerbosch algorithm we can get the size distribution of the largest cliques:

BK(R, P, X)
	if P is empty and X is empty then
		return R
	else
		// assume that P={u_1, ..., u_k}
		for i = 1 to k do
			remove u_i from P
			call BK recursively on R*, P* and X* as defined:
				R* = R ∪ {u_i}
				P* = P ∩ N(u_i)
				X* = X ∩ N(u_i)
			add u_i to X
		end for
	end if

where the inputs are:

R - Current clique we are investigating
P - Vertices that can be added
X - Vertices that cannot be added

A reminder that the function $N(u_i)$ gives all the neighbours of $u_i$.

To start we give then input:

\[\text{BK}(\emptyset, \{\text{all nodes}\},\emptyset )\]

Homophily

Homophily: The tenancy to associate with others whom you perceive to be similar to yourself.

We are able to classify nodes in a graph given a non-trivial classification:

\[\mathcal C = \{C_i:1\leq i \leq h\}\]

Measuring Homophily

The modularity of a network (with respect to the given partition $\mathcal C$ of its vertices) is defined as:

\[Q=\frac1{2m}\sum_{i,j}\left(A_{i,j}-\frac{\deg i\cdot\deg j}{2m}\right)\delta\left(c(i),c(j)\right)\]

additionally let:

\[Q_\max=1-\frac1{2m}\sum_{i,j}\frac{\deg i\cdot\deg j}{2m}\delta\left(c(i),c(j)\right)\]

The assortativity coefficient is defined as the ratio:

\[\frac Q{Q_\max}\]

where:

$c(i)$ - the class of the node $i$
$\delta(x, y)$ returns one if $x=y$ and zero otherwise.

We can calculate the assortivity coefficient using only the network links and node degrees by using the following formula:

\[\frac Q{Q_\max}=\frac{2m\cdot2\cdot\#(\text{edges inside classes})-\sum_C\left(\sum_{i\in C}\deg i\right)^2}{2m\cdot2m-\sum_C\left(\sum_{i\in C}\deg i\right)^2}\]

where:

$i\in C$ - the node $i$ is in the class $C$ ($c(i)=C$).
$\#(\text{edges inside classes})$ - The number of edges connecting nodes of the same class.
$m$ - the total number of arcs.

For the summing components, we sum for all classes. In each class, we calculate the squared sum of the degrees of each vertex.