Networks are used to represent processes in social, biological, and information systems. Researchers routinely collect data and represent it as a network. Doing so involves many implicit and explicit choices, both in collecting the underlying data and in transforming it into a network representation.
The choice of representation affects downstream analysis such as link prediction, label prediction, and hypothesis testing. Furthermore, researchers often do not control the underlying data or its collection, and want to learn the best representation for their question on a given network. This tutorial examines the challenges in transforming data into a network representation and in extracting latent representations from networks. We compare the advantages of global models learned directly from data, inferred network models, and inferred latent space models.
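As a minimal sketch of how a representation choice can change downstream results, the example below builds two networks from the same interaction log by varying only the minimum number of observations required to keep an edge. The interaction data, the `build_network` and `degrees` helpers, and the `min_count` parameter are all hypothetical and introduced here purely for illustration; they are not taken from the tutorial.

```python
from collections import Counter

# Hypothetical interaction log: each tuple is one observed contact between two people.
interactions = [
    ("ann", "bob"), ("ann", "bob"), ("ann", "bob"),
    ("bob", "cat"), ("bob", "cat"),
    ("ann", "cat"),
    ("cat", "dan"),
]


def build_network(events, min_count):
    """Return the undirected edges observed at least `min_count` times."""
    # Sort each pair so that (a, b) and (b, a) count as the same undirected edge.
    counts = Counter(tuple(sorted(pair)) for pair in events)
    return {pair for pair, n in counts.items() if n >= min_count}


def degrees(edges):
    """Return a node -> degree mapping for an undirected edge set."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return dict(deg)


# Same data, two representation choices (assumed thresholds).
loose = build_network(interactions, min_count=1)
strict = build_network(interactions, min_count=2)

print("min_count=1:", sorted(loose), degrees(loose))
print("min_count=2:", sorted(strict), degrees(strict))
```

With the threshold at 1, `cat` has the highest degree and `dan` appears in the network; with the threshold at 2, `bob` has the highest degree and `dan` drops out entirely. Any analysis that depends on degree or on which nodes are present would therefore give different answers for the two representations.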
Increasingly large and sophisticated networks are used in a growing number of applications. Network data is widely recognized as complex and challenging to work with. To process graph data effectively, the first critical challenge is network data representation, that is, how to represent networks properly so that advanced analytic tasks, such as pattern discovery, analysis, and prediction, can be conducted efficiently in both time and space.
In this tutorial, we will review recent ideas and advances in network embedding. More specifically, we will discuss a series of fundamental questions: why we need to revisit network representation, what the fundamental problems of network embedding are, how network embeddings can be learned, and what the latest progress and trends in network embedding are.