European Institute for Statistics, Probability, Stochastic Operations Research
and their Applications

About | Research | Events | People | Reports | Alumni | ContactHome


January 22-23-24, 2014

Workshop on


Networks with community structure







This workshop brings together researchers working on networks with community structure. This includes
(1) the generation of random graph models of networks with community structure and their topology
(2) the behavior of processes on networks with community structure and how they are affected by the community structure
(3) the algorithmic and statistical aspects of community detection (or graph partitioning).

This workshop is the closing workshop of a thematic month on Combinatorics and Probability.




Nikhil Bansal TU Eindhoven
Rui Castro TU Eindhoven
Remco van der Hofstad TU Eindhoven / Eurandom
Ross Kang Utrecht University / Radboud University Nijmegen
Julia Komjathy TU Eindhoven



Konstantin Avratchenkov INRIA Sophia Antipolis
Nina Balcan Georgia Tech
Sharmodeep Bhattacharyya UC Berkeley
Emilie Coupecoux Université Nice Sophia Antipolis
Janos Kertesz CEU
Marc Lelarge INRIA-ENS
Nelly Litvak TU Twente
Giacomo Livan Abdus Salam International Centre for Theoretical Physics
Konstantin Makarychev Microsoft Research
Gergely Palla MTA-ELTE Statistical and Biological Physics Research Group
Alessandro Panconesi Sapienza, University of Rome
Alexandre Proutiere KTH
Yun Seyoung KTH
Nees Jan van Eck Centre for Science and Technology Studies , Leiden University
Ludo Waltman Centre for Science and Technology Studies , Leiden University




08.30 - 09.00 Registration    
09.00 - 09.10 Opening    
09.10 - 10.00   Marc Lelarge Optimal clustering for the edge-labeled stochastic block model
10.00 - 10.30 BREAK    
10.30 - 11.15   Giacomo Livan Social networks: A perspective from Statistical Physics
11.15 - 12.00   Nina Balcan Finding Endogenously Formed Communities
12.00 - 14.00 LUNCH    
14.00 - 14.45   Nelly Litvak Detecting trends and popular entities in social media
14.50 - 15.35   Konstantin Avratchenkov Semi-supervised Learning Methods for Community Detection
15.35 - 16.00 BREAK    
16.00 - 16.45   Konstantin Makarychev Constant Factor Approximation for Balanced Cut in the PIE Model


09-00 - 09.45   Janos Kertesz Communities in large communication networks: Static and dynamic aspects
09.50 - 10.35   Nicolas Verzelen Detecting a community in Random Networks
10.35 - 11.00 BREAK    
11.00 - 11.45   Gergely Palla Finding overlapping communities with k-clique percolation
12.00 - 14.00 LUNCH    
14.00 - 14.45   Alessandro Panconesi Random walks and sybil attacks
14.50 - 15.35   Emilie Coupechoux Diffusion of innovations in random clustered networks with overlapping communities
15.35 - 16.00 BREAK    
16.00 - 16.45   Sharmodeep Battacharyya Community Detection in Networks Using Graph Distance


09.00 - 09.45   Ludo Waltman

Two challenges in community detection: Large networks and overlapping communities

09.50 - 10.35   Nees Jan van Eck

Applications of community detection in bibliometric network analysis

10.35 - 11.00 BREAK    
11.00 - 11.45   Alexandre Proutiere/Yun Seyoung Community Detection via Random and Adaptive Sampling
11.50 - 12.00 CLOSING    
12.00 - 13.00 LUNCH    



Konstantin Avratchenkov

Semi-supervised Learning Methods for Community Detection

Semi-supervised learning methods constitute a category of machine learning methods which use labelled points together with the similarity graph for classification of data points into predefined classes. For each class a semi-supervised method provides a classification function. The main idea of the semi-supervised methods is based on the assumption that the classification function should change smoothly over the similarity graph. Different semi-supervised learning methods have different kernels which reflect how the underlying similarity graph influences the values of the classification functions. In the present talk, we analyse a general family of semi-supervised methods, explain the differences between the methods and provide recommendations for the choice of the kernel parameters and labelled points. In particular, it appears that it is preferable to choose a method and a kernel based on the properties of the labelled points. We illustrate our general theoretical conclusions with real and synthetic data. We describe application to community detection problem. Finally, we discuss unsupervised learning approach for choosing the labelled nodes.

Nina Balcan

Finding Endogenously Formed Communities

An important unsupervised learning task which has received significant recent interest is identifying overlapping clusters, or communities, in networks ranging from professional contact networks to citation networks to product purchasing networks. While many heuristics and optimization criteria have been proposed, a lot of the previous work has disallowed natural communities such as those containing highly popular nodes, or have not given general guarantees on the computation time needed to find all overlapping communities meeting certain criteria. In this work, we develop effective methods for identifying natural self-determined communities in social networks and in more general affinity systems. These communities have the property that their members collectively prefer each other to anyone else outside the community. By contrast to previous work, our new formalization leads to discovering natural types of communities and enabled us to design efficient algorithms for identifying all such communities. Furthermore,  for interesting settings of the parameters, we also provide a local algorithm with a strong stochastic performance  guarantees that can find a community in time nearly linear in the of size the community (as opposed to the size of the network).

Sharmodeep Battacharyya

Community Detection in Networks Using Graph Distance

The study of networks has received increased attention recently not only from the social sciences and statistics but also from physicists, computer scientists and mathematicians. One of the principal problem in networks is community detection. Many algorithms have been proposed for community finding [6] [7] but most of them do not have have theoretical guarantee for sparse networks and networks close to phase transition boundary proposed by physicists [3]. There are some exceptions but all have incomplete theo- retical basis [2] [1] [5]. Here we propose an algorithm based on the graph distance of vertices in the network. We give theoretical guarantees that our method works in identifying communities for block models and some exten- sions like degree-corrected block models [4] and block models with number of communities growing with number of vertices. Despite favorable simula- tion results, we are not yet able to conclude that our method is satisfactory for worst possible case. We illustrate on a network of political blogs, Face- book networks and some other networks.
(joint work with Peter J. Bickel)

Emilie Coupechoux

Diffusion of innovations in random clustered networks with overlapping communities

We consider a threshold epidemic model on a clustered random graph with overlapping communities. In other words, our epidemic model is such that an individual becomes infected as soon as the proportion of her infected neighbors exceeds the threshold q of the epidemic. In our random graph model, each individual can belong to several communities. The distributions for the community sizes and the number of communities an individual belongs to are arbitrary.
We consider the case where the epidemic starts from a single individual, and we prove a phase transition (when the parameter q of the model varies) for the appearance of a cascade, i.e. when the epidemic can be
propagated to an infinite part of the population. More precisely, we show that our epidemic is entirely described by a multi-type (and alternating) branching process, and then we apply Sevastyanov's theorem
about the phase transition of multi-type Galton-Watson branching processes.
(joint work with Marc Lelarge)

Janos Kertesz

Communities in large communication networks: Static and dynamic aspects

Millions of users leave their "digital footprints" behind when using information communication technology, which enables unprecedented investigations of human interactions at the societal scale. We analyse time stamped records of mobile phone calls and consider the network mapped out as a proxy of the network of social interactions. By identifying the intensity of the relationships with the duration or frequency of calls we have empirically proved a long standing, basic conjecture of social network theory. This Granovetter hypothesis about the "strength of weak ties" states that the society consists of communities, where individuals are strongly connected to each other and these communities are linked by weak ties, which hold the whole society together. These structure has severe consequences for the dynamics as demonstrated by empirical investigations and model
calculations: The spreading on such a weighted network with communities is considerably slower than in a reference system without these stuctures and proceeds in a remarkable, stepwise fashion.

Marc Lelarge

Optimal clustering for the edge-labeled stochastic block model

The labeled stochastic block model is a random graph model representing networks with community structure and interactions of multiple types. In its simplest form, it consists of two communities of approximately equal size, and the edges are drawn and labeled at random with probability depending on whether their two endpoints belong to the same community or not. It has been conjectured that this model exhibits a phase transition: reconstruction (i.e. identification of a partition positively correlated with the true partition into the underlying communities) would be feasible if and only if a model parameter exceeds a threshold. We prove this conjecture for ’not too sparse’ graphs.

Nelly Litvak

Detecting trends and popular entities in social media

In the first part of this talk I will discuss the problem of finding top k nodes with the largest degrees in very large on-line networks. If the adjacency list of the network is known (which is not the case, for example, in the Twitter follower graph), a deterministic algorithm to find the top k list of nodes with the largest degrees requires an average complexity of O(n), where n is the number of nodes in the network. Even this modest complexity can be very high for large complex networks such as Web or Twitter. Instead, we propose randomized Monte Carlo methods, and show theoretically and by numerical experiments that this algorithm find good quality top lists in the time that scales sublinearly with the network size. Next, I will discuss indicative properties of user behavior in Twitter to predict trending topics. For this purpose, we first extensively analyze tweet data sets and retweet graphs of several different events. We find that that the retweet graph for an `interactive' trending topic, for which off-line and on-line events influence each other (such as the Project X calamity in Haren, the Netherlands), has a relatively dense largest connected component (LCC). Based on the insights obtained from the datasets, we design a mathematical model that describes the evolution of a retweet graph by three main parameters and investigate the potential predictive value of these parameters for detecting an `interactive’ trend.

Giacomo Livan

Social networks: A perspective from Statistical Physics

Individuals in a society do not interact randomly, but rather preferentially interact with a small local subset of the whole population, i.e. their friends and neighbors. Such a picture naturally translates into a network description. Social networks, i.e. the set of relationships in a social group, influence and constrain, often in rather non-trivial ways, the choices of individuals: interaction can change behavior, which in turn can promote proximity, allowing for further interaction. Statistical Physics can prove to be a rather effective tool to describe the interplay between interactions and topology: once the details of the social interactions taking place on the network have been specified, one can often categorize the different states of a given stylized society as the different phases of the corresponding statistical model.
In this talk, I will present two examples of this line of reasoning, one pertaining to the spectacular reduction in social mobility emerging in societies whose individuals highly value social status, the other one related to social learning, i.e. a society’s ability to properly aggregate the noisy information spread across its individuals in order to make the right decisions on relevant matters.

Konstantin Makarychev

Constant Factor Approximation for Balanced Cut in the PIE Model

We propose and study a new semi-random semi-adversarial model for Balanced Cut, a planted model with permutation-invariant random edges (PIE). Our model is much more general than planted models considered previously. Consider a set of vertices V partitioned into two clusters L and R of equal size. Let G be an arbitrary graph on V with no edges between L and R. Let E_random be a set of edges sampled from an arbitrary permutation-invariant distribution (a distribution that is invariant under permutation of vertices in L and in R). Then we say that G + E_random is a graph with permutation-invariant random edges.
We present an approximation algorithm for the Balanced Cut problem that finds a balanced cut of cost O(|E_random|) + n polylog(n) in this model. In the most interesting regime, this is a constant factor approximation with respect to the cost of the planted cut.
(joint work with Yury Makarychev and Aravindan Vijayaraghavan)

Gergely Palla

Finding overlapping communities with k-clique percolation

Communities in real networks are often overlapping, e.g., people can belong to several social groups in parallel, (family, friendship circle, etc.), and also proteins can be members in different functional modules at the same time in a protein interaction network.
An intuitive approach for uncovering overlapping communities in networks is provided by k-clique percolation, where the communities are built up from k-cliques, corresponding to complete sub-graphs with k nodes. In the talk we discuss both the theoretical aspects and a few practical applications of k-clique percolation, starting from the study of the k-clique percolation transition in the Erdos-Renyi graph through the introduction of directed and weighted k-cliques to the study of community evolution in real social networks.

Alessandro Panconesi

Random walks and sybil attacks

Sybil attacks, in which an adversary forges a potentially unbounded number of fake identities, are a danger to distributed systems and online social networks.  The goal of sybil defense is to accurately identify sybil identities. There is a deep connection between sybil defense and the theory of random walks that we will survey in this talk.  The discussion will lead us to identify a community detection algorithm that, for the first time, offers provable guarantees in the context of sybil defense.
(joint work with L.Alvisi, A.Clement, A.Epasto and S.Lattanzi)

Alexandre Proutiere

Community Detection in Networks via Random and Adaptive Sampling

We consider the problem of identifying communities in networks. This problem has been extensively studied in statistics, physics, and computer science, and has many important applications in diverse contexts such as biology, social networks, and distributed computing. A central question for such a problem is to characterize conditions under which communities can be efficiently detected using low complexity algorithms. We address this question (i) when the graph is generated through the so-called stochastic block model (also known as the planted partition model), and (ii) when the graph can be only partially observed, or sampled adaptively. We provide fundamental performance limits satisfied by any community detection algorithm (irrespective of its complexity), and design a spectral-based algorithm whose performance approaches these limits in some relevant cases.

Nees Jan van Eck

Applications of community detection in bibliometric network analysis

In this talk, we focus on the analysis of bibliometric networks, and in particular on the detection of communities in these networks. We start by demonstrating VOSviewer, a popular software tool for visualizing bibliometric networks. We discuss the techniques used by VOSviewer for visualizing bibliometric networks and for detecting communities in these networks. We pay special attention to the close relationship between visualization and community detection, and we discuss the unified approach to visualization and community detection that is implemented in VOSviewer. We then shift our attention to community detection in very large citation networks, including millions of publications and hundreds of millions of citation relations. We show how community detection techniques can be used to construct highly detailed classification systems of science. We also discuss applications of such classification systems to science policy questions. Finally, we demonstrate CitNetExplorer, a new software tool in which community detection techniques are used to support the large-scale analysis of citation networks. We use CitNetExplorer to analyze the citation network of publications on network science and in particular on community detection.

Nicolas Verzelen

Detecting a community in Random Networks

We formalize the problem of detecting a community in a network into testing whether in a given (random) graph there is a subgraph that is unusually dense. We observe an undirected and unweighted graph on N nodes. Under the null hypothesis, the graph is a realization of an Erdös-Rényi graph with probability p0. Under the (composite) alternative, there is a subgraph of n nodes where the probability of connection is p1 > p0. We derive a detection lower bound for detecting such a subgraph in terms of N, n, p0, p1 and exhibit a test that achieves that lower bound. We do this both when p0 is known and unknown. We also consider the problem of testing in polynomial-time. As an aside, we consider the problem of detecting a clique, which is intimately related to the planted clique problem.

Ludo Waltman

Two challenges in community detection: Large networks and overlapping communities

Community detection in networks has received a lot of attention during the past decade. Most work has focused on small and medium-sizes networks, but there is also an increasing interest in community detection in large networks, consisting of many millions of nodes and edges. In this talk, we discuss community detection based on the optimization of modularity functions. We first define a general class of modularity functions, and we discuss the strengths and limitations of these functions. We then present an algorithm for modularity optimization in large networks. The algorithm is referred to as a smart local moving algorithm. We demonstrate that the algorithm outperforms the popular ‘Louvain algorithm’. Next, we discuss the problem of generalizing modularity functions to detect overlapping communities. We argue that so far no principled solution has been proposed to this problem. We then present a proposal for such a solution. We also present an algorithm for optimizing our proposed generalized modularity functions. Finally, we demonstrate the performance of our approach to overlapping community detection.



Eurandom, Mathematics and Computer Science Dept, TU Eindhoven,

Den Dolech 2, 5612 AZ  EINDHOVEN,  The Netherlands

Eurandom is located on the campus of Eindhoven University of Technology, in the Metaforum building (4th floor) (about the building). The university is located at 10 minutes walking distance from Eindhoven main railway station (take the exit north side and walk towards the tall building on the right with the sign TU/e).
Accessibility TU/e campus and map.




Mrs. Patty Koorn, Workshop Officer, Eurandom/TU Eindhoven, koorn@eurandom.tue.nl



The organisers acknowledge the financial support/sponsorship of:








Last updated 09-04-14,
by PK

 P.O. Box 513, 5600 MB Eindhoven, The Netherlands
tel. +31 40 2478100  
  e-mail: info@eurandom.tue.nl