About | Research | Events | People | Reports | Alumni | Contact | Home
January 22-23-24, 2014 Workshop on
Networks with community structure
SUMMARY This workshop brings together researchers working on
networks with community structure. This includes
ORGANISERS
WEDNESDAY JAN 22
THURSDAY JAN 23
FRIDAY JAN 24
|
09.00 - 09.45 | Ludo Waltman |
Two challenges in community detection: Large networks and overlapping communities |
|
09.50 - 10.35 | Nees Jan van Eck |
Applications of community detection in bibliometric network analysis |
|
10.35 - 11.00 | BREAK | ||
11.00 - 11.45 | Alexandre Proutiere/Yun Seyoung | Community Detection via Random and Adaptive Sampling | |
11.50 - 12.00 | CLOSING | ||
12.00 - 13.00 | LUNCH |
Konstantin Avratchenkov
Semi-supervised Learning Methods for Community Detection
Semi-supervised learning methods constitute a category of machine learning methods which use labelled points together with the similarity graph for classification of data points into predefined classes. For each class a semi-supervised method provides a classification function. The main idea of the semi-supervised methods is based on the assumption that the classification function should change smoothly over the similarity graph. Different semi-supervised learning methods have different kernels which reflect how the underlying similarity graph influences the values of the classification functions. In the present talk, we analyse a general family of semi-supervised methods, explain the differences between the methods and provide recommendations for the choice of the kernel parameters and labelled points. In particular, it appears that it is preferable to choose a method and a kernel based on the properties of the labelled points. We illustrate our general theoretical conclusions with real and synthetic data. We describe application to community detection problem. Finally, we discuss unsupervised learning approach for choosing the labelled nodes.
Nina Balcan
Finding Endogenously Formed Communities
An important unsupervised learning task which has received significant recent interest is identifying overlapping clusters, or communities, in networks ranging from professional contact networks to citation networks to product purchasing networks. While many heuristics and optimization criteria have been proposed, a lot of the previous work has disallowed natural communities such as those containing highly popular nodes, or have not given general guarantees on the computation time needed to find all overlapping communities meeting certain criteria. In this work, we develop effective methods for identifying natural self-determined communities in social networks and in more general affinity systems. These communities have the property that their members collectively prefer each other to anyone else outside the community. By contrast to previous work, our new formalization leads to discovering natural types of communities and enabled us to design efficient algorithms for identifying all such communities. Furthermore, for interesting settings of the parameters, we also provide a local algorithm with a strong stochastic performance guarantees that can find a community in time nearly linear in the of size the community (as opposed to the size of the network).
Sharmodeep Battacharyya
Community Detection in Networks Using Graph Distance
The study of networks has received increased attention recently not only from
the social sciences and statistics but also from physicists, computer scientists
and mathematicians. One of the principal problem in networks is community
detection. Many algorithms have been proposed for community finding [6] [7] but
most of them do not have have theoretical guarantee for sparse networks and
networks close to phase transition boundary proposed by physicists [3]. There
are some exceptions but all have incomplete theo- retical basis [2] [1] [5].
Here we propose an algorithm based on the graph distance of vertices in the
network. We give theoretical guarantees that our method works in identifying
communities for block models and some exten- sions like degree-corrected block
models [4] and block models with number of communities growing with number of
vertices. Despite favorable simula- tion results, we are not yet able to
conclude that our method is satisfactory for worst possible case. We illustrate
on a network of political blogs, Face- book networks and some other networks.
(joint work with Peter J. Bickel)
Emilie Coupechoux
Diffusion of innovations in random clustered networks with overlapping communities
We consider a threshold epidemic model on a clustered
random graph with overlapping communities. In other words, our epidemic model is
such that an individual becomes infected as soon as the proportion of her
infected neighbors exceeds the threshold q of the epidemic. In our random graph
model, each individual can belong to several communities. The distributions for
the community sizes and the number of communities an individual belongs to are
arbitrary.
We consider the case where the epidemic starts from a single individual, and we
prove a phase transition (when the parameter q of the model varies) for the
appearance of a cascade, i.e. when the epidemic can be
propagated to an infinite part of the population. More precisely, we show that
our epidemic is entirely described by a multi-type (and alternating) branching
process, and then we apply Sevastyanov's theorem
about the phase transition of multi-type Galton-Watson branching processes.
(joint work with Marc Lelarge)
Janos Kertesz
Communities in large communication networks: Static and dynamic aspects
Millions of users leave their "digital footprints" behind when using
information communication technology, which enables unprecedented investigations
of human interactions at the societal scale. We analyse time stamped records of
mobile phone calls and consider the network mapped out as a proxy of the network
of social interactions. By identifying the intensity of the relationships with
the duration or frequency of calls we have empirically proved a long standing,
basic conjecture of social network theory. This Granovetter hypothesis about the
"strength of weak ties" states that the society consists of communities, where
individuals are strongly connected to each other and these communities are
linked by weak ties, which hold the whole society together. These structure has
severe consequences for the dynamics as demonstrated by empirical investigations
and model
calculations: The spreading on such a weighted network with communities is
considerably slower than in a reference system without these stuctures and
proceeds in a remarkable, stepwise fashion.
Marc Lelarge
Optimal clustering for the edge-labeled stochastic block model
The labeled stochastic block model is a random graph model representing networks with community structure and interactions of multiple types. In its simplest form, it consists of two communities of approximately equal size, and the edges are drawn and labeled at random with probability depending on whether their two endpoints belong to the same community or not. It has been conjectured that this model exhibits a phase transition: reconstruction (i.e. identification of a partition positively correlated with the true partition into the underlying communities) would be feasible if and only if a model parameter exceeds a threshold. We prove this conjecture for ’not too sparse’ graphs.
Nelly Litvak
Detecting trends and popular entities in social media
In the first part of this talk I will discuss the problem of finding top k nodes with the largest degrees in very large on-line networks. If the adjacency list of the network is known (which is not the case, for example, in the Twitter follower graph), a deterministic algorithm to find the top k list of nodes with the largest degrees requires an average complexity of O(n), where n is the number of nodes in the network. Even this modest complexity can be very high for large complex networks such as Web or Twitter. Instead, we propose randomized Monte Carlo methods, and show theoretically and by numerical experiments that this algorithm find good quality top lists in the time that scales sublinearly with the network size. Next, I will discuss indicative properties of user behavior in Twitter to predict trending topics. For this purpose, we first extensively analyze tweet data sets and retweet graphs of several different events. We find that that the retweet graph for an `interactive' trending topic, for which off-line and on-line events influence each other (such as the Project X calamity in Haren, the Netherlands), has a relatively dense largest connected component (LCC). Based on the insights obtained from the datasets, we design a mathematical model that describes the evolution of a retweet graph by three main parameters and investigate the potential predictive value of these parameters for detecting an `interactive’ trend.
Giacomo Livan
Social networks: A perspective from Statistical Physics
Individuals in a society do not interact randomly, but rather preferentially
interact with a small local subset of the whole population, i.e. their friends
and neighbors. Such a picture naturally translates into a network description.
Social networks, i.e. the set of relationships in a social group, influence and
constrain, often in rather non-trivial ways, the choices of individuals:
interaction can change behavior, which in turn can promote proximity, allowing
for further interaction. Statistical Physics can prove to be a rather effective
tool to describe the interplay between interactions and topology: once the
details of the social interactions taking place on the network have been
specified, one can often categorize the different states of a given stylized
society as the different phases of the corresponding statistical model.
In this talk, I will present two examples of this line of reasoning, one
pertaining to the spectacular reduction in social mobility emerging in societies
whose individuals highly value social status, the other one related to social
learning, i.e. a society’s ability to properly aggregate the noisy information
spread across its individuals in order to make the right decisions on relevant
matters.
Konstantin Makarychev
Constant Factor Approximation for Balanced Cut in the PIE Model
We propose and study a new semi-random semi-adversarial model for Balanced
Cut, a planted model with permutation-invariant random edges (PIE). Our model is
much more general than planted models considered previously. Consider a set of
vertices V partitioned into two clusters L and R of equal size. Let G be an
arbitrary graph on V with no edges between L and R. Let E_random be a set of
edges sampled from an arbitrary permutation-invariant distribution (a
distribution that is invariant under permutation of vertices in L and in R).
Then we say that G + E_random is a graph with permutation-invariant random
edges.
We present an approximation algorithm for the Balanced Cut problem that finds a
balanced cut of cost O(|E_random|) + n polylog(n) in this model. In the most
interesting regime, this is a constant factor approximation with respect to the
cost of the planted cut.
(joint work with Yury Makarychev and Aravindan Vijayaraghavan)
Gergely Palla
Finding overlapping communities with k-clique percolation
Communities in real networks are often overlapping,
e.g., people can belong to several social groups in parallel, (family,
friendship circle, etc.), and also proteins can be members in different
functional modules at the same time in a protein interaction network.
An intuitive approach for uncovering overlapping communities in networks is
provided by k-clique percolation, where the communities are built up from
k-cliques, corresponding to complete sub-graphs with k nodes. In the talk we
discuss both the theoretical aspects and a few practical applications of
k-clique percolation, starting from the study of the k-clique percolation
transition in the Erdos-Renyi graph through the introduction of directed and
weighted k-cliques to the study of community evolution in real social networks.
Alessandro Panconesi
Random walks and sybil attacks
Sybil attacks, in which an
adversary forges a potentially unbounded number of fake identities, are a danger
to distributed systems and online social networks. The goal of sybil defense is
to accurately identify sybil identities. There is a deep connection between
sybil defense and the theory of random walks that we will survey in this talk.
The discussion will lead us to identify a community detection algorithm that,
for the first time, offers provable guarantees in the context of sybil defense.
(joint work with L.Alvisi, A.Clement, A.Epasto and S.Lattanzi)
Alexandre Proutiere
Community Detection in Networks via Random and Adaptive Sampling
We consider the problem of identifying communities in networks. This problem has been extensively studied in statistics, physics, and computer science, and has many important applications in diverse contexts such as biology, social networks, and distributed computing. A central question for such a problem is to characterize conditions under which communities can be efficiently detected using low complexity algorithms. We address this question (i) when the graph is generated through the so-called stochastic block model (also known as the planted partition model), and (ii) when the graph can be only partially observed, or sampled adaptively. We provide fundamental performance limits satisfied by any community detection algorithm (irrespective of its complexity), and design a spectral-based algorithm whose performance approaches these limits in some relevant cases.
Nees Jan van Eck
Applications of community detection in bibliometric network analysis
In this talk, we focus on the analysis of bibliometric networks, and in particular on the detection of communities in these networks. We start by demonstrating VOSviewer, a popular software tool for visualizing bibliometric networks. We discuss the techniques used by VOSviewer for visualizing bibliometric networks and for detecting communities in these networks. We pay special attention to the close relationship between visualization and community detection, and we discuss the unified approach to visualization and community detection that is implemented in VOSviewer. We then shift our attention to community detection in very large citation networks, including millions of publications and hundreds of millions of citation relations. We show how community detection techniques can be used to construct highly detailed classification systems of science. We also discuss applications of such classification systems to science policy questions. Finally, we demonstrate CitNetExplorer, a new software tool in which community detection techniques are used to support the large-scale analysis of citation networks. We use CitNetExplorer to analyze the citation network of publications on network science and in particular on community detection.
Nicolas Verzelen
Detecting a community in Random Networks
We formalize the problem of detecting a community in a network into testing whether in a given (random) graph there is a subgraph that is unusually dense. We observe an undirected and unweighted graph on N nodes. Under the null hypothesis, the graph is a realization of an Erdös-Rényi graph with probability p0. Under the (composite) alternative, there is a subgraph of n nodes where the probability of connection is p1 > p0. We derive a detection lower bound for detecting such a subgraph in terms of N, n, p0, p1 and exhibit a test that achieves that lower bound. We do this both when p0 is known and unknown. We also consider the problem of testing in polynomial-time. As an aside, we consider the problem of detecting a clique, which is intimately related to the planted clique problem.
Ludo Waltman
Two challenges in community detection: Large networks and overlapping communities
Community detection in networks has received a lot of attention during the past decade. Most work has focused on small and medium-sizes networks, but there is also an increasing interest in community detection in large networks, consisting of many millions of nodes and edges. In this talk, we discuss community detection based on the optimization of modularity functions. We first define a general class of modularity functions, and we discuss the strengths and limitations of these functions. We then present an algorithm for modularity optimization in large networks. The algorithm is referred to as a smart local moving algorithm. We demonstrate that the algorithm outperforms the popular ‘Louvain algorithm’. Next, we discuss the problem of generalizing modularity functions to detect overlapping communities. We argue that so far no principled solution has been proposed to this problem. We then present a proposal for such a solution. We also present an algorithm for optimizing our proposed generalized modularity functions. Finally, we demonstrate the performance of our approach to overlapping community detection.
Eurandom, Mathematics and Computer Science Dept, TU Eindhoven,
Den Dolech 2, 5612 AZ EINDHOVEN, The Netherlands
Eurandom is located on the campus of
Eindhoven University of
Technology, in the
Metaforum building
(4th floor) (about
the building). The university is
located at 10 minutes walking distance from Eindhoven main railway station (take
the exit north side and walk towards the tall building on the right with the
sign TU/e).
Accessibility TU/e campus and map.
Mrs. Patty Koorn, Workshop Officer, Eurandom/TU Eindhoven, koorn@eurandom.tue.nl
The organisers acknowledge the financial support/sponsorship of:
P.O. Box 513, 5600 MB Eindhoven, The Netherlands
tel. +31 40 2478100
e-mail: info@eurandom.tue.nl