Math Modeling Blog – Spring 2010



Detecting Communities in Networks

Community may stand for class, group, cluster, etc. and is defined as a subset of vertices within the graph such that connections between the vertices are denser than connections within the rest of the network. The purpose of locating communities is usually to map the network into a tree where the “leaves” are vertices and “branches” join vertices or groups of vertices.

Hierarchical Clustering

For every pair i,j of vertices in the network, a weight Wij is computed that measures how closely connected the vertices are. Starting from the set of all vertices and no edges, edges are iteratively added between pairs of vertices in order of decreasing weight. Vertices are grouped into larger and larger communities, and the tree is built up to the root which represents the whole network.

Hierarchical clustering is often used together with other network measures to detect communities. Measures such as:

the number of vertex-independent graphs which are two paths that connect the same pair of vertices but have no common vertices other than the start and end points

the number of edge-independent graphs which share no common edges

total number of paths that run between i and j.

The number of paths between vertices i and j in a graph is equal to the minimum number of vertices/edges that must be removed from the graph to disconnect i and j and are therefore a measure of the robustness of the netowrk to deletion of vertices/edges. Problems with the hierarchical clustering method exist in single-edge-connected vertices where both the numbers of independent paths and the weighted path counts are small and thus the vertices often remain isolated from the network when communities are constructed.

Betweenness – Vertices that occur on many shortest paths between other vertices have a higher betweenness.

Edge betweenness is the number of shortest paths between pairs of vertices that run along a particular edge. Edges connecting communities will have high edge betweenness, and their removal would separate groups and reveal the underlying community structure of the graph.

Girven-Newman algorithm (2002)

1. Calculate betweenness for all m edges in a graph of n vertices (can be done in O(mn) time).

2. Remove the edge with the highest betweenness.

3. Recalculate betweenness for all edges affected by the removal.

4. Repeat from step 2 until no edges remain.

Because step 3 has to be done for all edges, the algorithm requires in the worst case O(m^2) operations.

Advertisement

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Connecting to %s

Follow

Get every new post delivered to your Inbox.