ClueNet: Clustering a temporal network using dynamic graphlets

CoNe lab (contact: tmilenko@nd.edu)

Network clustering is a very popular topic in the network science field. Its goal is to divide (partition) the network into groups (clusters or communities) of "topologically related" nodes, where the resulting topology-based clusters are expected to "correlate" well with node label information, i.e., metadata, such as cellular functions of genes/proteins in biological networks, or age or gender of people in social networks. Even for static data, the problem of network clustering is complex. For dynamic data, which is what we focus on, the problem is even more complex, due to an additional dimension of the data - their temporal (evolving) nature. Since the problem is computationally intractable, heuristic approaches need to be sought. Existing approaches for dynamic network clustering (DNC) have drawbacks. First, they ignore temporal information in their early steps, and when they do consider this information later on, they do so implicitly. We hypothesize that capturing temporal information earlier in the clustering process and doing so explicitly will improve results. Second, the existing approaches assume that nodes should be in the same cluster if they are densely interconnected within the network. We hypothesize that in some applications, it might be of interest to cluster) nodes that are topologically similar to each other instead of or in addition to requiring the nodes to be densely interconnected. We test these two hypotheses via our new approach called ClueNet. We evaluate ClueNet against six existing DNC methods on both social networks capturing evolving interactions between individuals (such as interactions between students in a high school) and biological networks capturing interactions between biomolecules in the cell at different ages. We find that ClueNet is superior in 75% of all evaluation tests. As more real-world dynamic data are becoming available, DNC and thus ClueNet will only continue to gain importance.

Reference: Joseph Crawford and Tijana Milenković (2017), ClueNet: Clustering a temporal network using dynamic, under review.


The above image is the high level overview of how ClueNet clusters nodes in a temporal network based on their topological similarities (where topological similarities are captured through the notion of dynamic graphlet degree vectors (D-GDVs)).

Download code and data
ClueNet code and the data used in our study can be downloaded here, which also includes a readme on how to use the code and the required format of the data.
To compute the dynamic graphlet degree vectors (D-GDVs), please refer to the web site of the original dynamic graphlets publication located here.
The Enron time-series data can be accessed here.
The hospital and high school time series data can be accessed here.
More details regarding the biological data used in this study can be found here.