Data-driven network alignment

Shawn Gu and Tijana Milenkovic

Network alignment (NA) aims to find a node mapping between compared networks that uncovers regions of high topological similarity, thus allowing for the transfer of functional knowledge between aligned nodes. For example, one can align protein-protein interaction networks of yeast and human and infer functions of human proteins based on functions of their yeast counterparts. However, many current NA methods do not end up aligning functionally related nodes. A likely reason is that current NA methods assume that topologically similar nodes have high functional relatedness. In this study, we provide evidence that this assumption does not hold well. As such, a paradigm shift is needed in how the NA problem is approached. We therefore redefine NA as a data-driven framework, called TARA (data-driven NA), which attempts to learn the relationship between topological relatedness and functional relatedness without assuming that topological relatedness corresponds to topological similarity. TARA makes no assumptions about which nodes should be aligned, distinguishing it from existing NA methods. Specifically, TARA trains a classifier to predict whether two nodes from different networks are functionally related based on their network topological patterns (features). We find that TARA is able to make accurate predictions. TARA then takes each pair of nodes that is predicted as related to be part of an alignment. Like traditional NA methods, TARA uses this alignment for the across-species transfer of functional knowledge. As currently implemented, TARA uses topological but not protein sequence information for functional knowledge transfer. In this context, we find that TARA outperforms existing state-of-the-art NA methods that also use only topological information, WAVE and SANA, and even outperforms or complements a state-of-the-art NA method that uses both topological and sequence information, PrimAlign. Hence, adding sequence information to TARA, which is our future work, is likely to further improve its performance.
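The pipeline described above (compute topological features for cross-network node pairs, train a classifier on pairs with known functional relatedness, then keep every pair predicted as related) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the TARA implementation: the per-node features (degree and mean neighbor degree), the pair feature (element-wise absolute difference), the small logistic regression, and the toy networks and labels are all hypothetical stand-ins chosen to keep the example self-contained.

```python
import math
from itertools import product

def node_features(adj, node):
    """Toy topological features (an assumption): degree and mean neighbor degree."""
    deg = len(adj[node])
    mean_nbr_deg = sum(len(adj[n]) for n in adj[node]) / deg if deg else 0.0
    return [float(deg), mean_nbr_deg]

def pair_features(adj_a, u, adj_b, v):
    """Feature vector for a cross-network node pair (illustrative choice)."""
    fu, fv = node_features(adj_a, u), node_features(adj_b, v)
    return [abs(a - b) for a, b in zip(fu, fv)]

def predict(model, x):
    """Probability that the pair described by x is functionally related."""
    w, b = model
    z = b + sum(wi * xi for wi, xi in zip(w, x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well behaved
    return 1.0 / (1.0 + math.exp(-z))

def train(X, y, lr=0.1, epochs=500):
    """Logistic regression fitted with plain stochastic gradient descent."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            g = predict((w, b), x) - t  # gradient of the log loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def align(adj_a, adj_b, model, threshold=0.5):
    """Alignment = all cross-network pairs predicted as related (many-to-many)."""
    return [(u, v) for u, v in product(adj_a, adj_b)
            if predict(model, pair_features(adj_a, u, adj_b, v)) >= threshold]

# Hypothetical toy networks as adjacency lists.
net_a = {"a1": ["a2", "a3"], "a2": ["a1"], "a3": ["a1"]}
net_b = {"b1": ["b2", "b3"], "b2": ["b1"], "b3": ["b1"]}

# Hypothetical training labels: 1 = functionally related, 0 = not related.
labeled = [("a1", "b1", 1), ("a2", "b2", 1), ("a1", "b2", 0), ("a2", "b1", 0)]

X = [pair_features(net_a, u, net_b, v) for u, v, _ in labeled]
y = [label for _, _, label in labeled]
model = train(X, y)

alignment = align(net_a, net_b, model)
print(alignment)
```

Note that the resulting alignment is simply the set of node pairs the classifier accepts, so a node may appear in several pairs. In a real setting, the labels would come from functional annotations, and the classifier would be evaluated on held-out pairs before its predictions are used for knowledge transfer.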

Contact: tmilenko [at] nd [dot] edu

Software: The source code and data are available for download, along with detailed usage instructions.