European Institute for Statistics, Probability, Stochastic Operations Research
and their Applications

January 22-23-24, 2014

Workshop on

Networks with community structure

SUMMARY

REGISTRATION

SPEAKERS

PROGRAMME

ABSTRACTS

SUMMARY

This workshop brings together researchers working on networks with community structure. This includes
(1) the generation of random graph models of networks with community structure and their topology
(2) the behavior of processes on networks with community structure and how they are affected by the community structure
(3) the algorithmic and statistical aspects of community detection (or graph partitioning).

This workshop is the closing workshop of a thematic month on Combinatorics and Probability.

ORGANISERS

Nikhil Bansal	TU Eindhoven
Rui Castro	TU Eindhoven
Remco van der Hofstad	TU Eindhoven / Eurandom
Ross Kang	Utrecht University / Radboud University Nijmegen
Julia Komjathy	TU Eindhoven

LIST OF INVITED PARTICIPANTS

Konstantin Avratchenkov	INRIA Sophia Antipolis
Nina Balcan	Georgia Tech
Sharmodeep Bhattacharyya	UC Berkeley
Emilie Coupecoux	Université Nice Sophia Antipolis
Janos Kertesz	CEU
Marc Lelarge	INRIA-ENS
Nelly Litvak	TU Twente
Giacomo Livan	Abdus Salam International Centre for Theoretical Physics
Konstantin Makarychev	Microsoft Research
Gergely Palla	MTA-ELTE Statistical and Biological Physics Research Group
Alessandro Panconesi	Sapienza, University of Rome
Alexandre Proutiere	KTH
Yun Seyoung	KTH
Nees Jan van Eck	Centre for Science and Technology Studies , Leiden University
Ludo Waltman	Centre for Science and Technology Studies , Leiden University

PROGRAMME

WEDNESDAY JAN 22

08.30 - 09.00	Registration
09.00 - 09.10	Opening
09.10 - 10.00		Marc Lelarge	Optimal clustering for the edge-labeled stochastic block model
10.00 - 10.30	BREAK
10.30 - 11.15		Giacomo Livan	Social networks: A perspective from Statistical Physics
11.15 - 12.00		Nina Balcan	Finding Endogenously Formed Communities
12.00 - 14.00	LUNCH
14.00 - 14.45		Nelly Litvak	Detecting trends and popular entities in social media
14.50 - 15.35		Konstantin Avratchenkov	Semi-supervised Learning Methods for Community Detection
15.35 - 16.00	BREAK
16.00 - 16.45		Konstantin Makarychev	Constant Factor Approximation for Balanced Cut in the PIE Model
18.30 -	CONFERENCE DINNER

THURSDAY JAN 23

09-00 - 09.45		Janos Kertesz	Communities in large communication networks: Static and dynamic aspects
09.50 - 10.35		Nicolas Verzelen	Detecting a community in Random Networks
10.35 - 11.00	BREAK
11.00 - 11.45		Gergely Palla	Finding overlapping communities with k-clique percolation
12.00 - 14.00	LUNCH
14.00 - 14.45		Alessandro Panconesi	Random walks and sybil attacks
14.50 - 15.35		Emilie Coupechoux	Diffusion of innovations in random clustered networks with overlapping communities
15.35 - 16.00	BREAK
16.00 - 16.45		Sharmodeep Battacharyya	Community Detection in Networks Using Graph Distance

FRIDAY JAN 24

09.00 - 09.45		Ludo Waltman	Two challenges in community detection: Large networks and overlapping communities
09.50 - 10.35		Nees Jan van Eck	Applications of community detection in bibliometric network analysis
10.35 - 11.00	BREAK
11.00 - 11.45		Alexandre Proutiere/Yun Seyoung	Community Detection via Random and Adaptive Sampling
11.50 - 12.00	CLOSING
12.00 - 13.00	LUNCH

ABSTRACTS

Konstantin Avratchenkov

Semi-supervised Learning Methods for Community Detection

Semi-supervised learning methods constitute a category of machine learning methods which use labelled points together with the similarity graph for classification of data points into predefined classes. For each class a semi-supervised method provides a classification function. The main idea of the semi-supervised methods is based on the assumption that the classification function should change smoothly over the similarity graph. Different semi-supervised learning methods have different kernels which reflect how the underlying similarity graph influences the values of the classification functions. In the present talk, we analyse a general family of semi-supervised methods, explain the differences between the methods and provide recommendations for the choice of the kernel parameters and labelled points. In particular, it appears that it is preferable to choose a method and a kernel based on the properties of the labelled points. We illustrate our general theoretical conclusions with real and synthetic data. We describe application to community detection problem. Finally, we discuss unsupervised learning approach for choosing the labelled nodes.

Nina Balcan

Finding Endogenously Formed Communities

An important unsupervised learning task which has received significant recent interest is identifying overlapping clusters, or communities, in networks ranging from professional contact networks to citation networks to product purchasing networks. While many heuristics and optimization criteria have been proposed, a lot of the previous work has disallowed natural communities such as those containing highly popular nodes, or have not given general guarantees on the computation time needed to find all overlapping communities meeting certain criteria. In this work, we develop effective methods for identifying natural self-determined communities in social networks and in more general affinity systems. These communities have the property that their members collectively prefer each other to anyone else outside the community. By contrast to previous work, our new formalization leads to discovering natural types of communities and enabled us to design efficient algorithms for identifying all such communities. Furthermore, for interesting settings of the parameters, we also provide a local algorithm with a strong stochastic performance guarantees that can find a community in time nearly linear in the of size the community (as opposed to the size of the network).

Sharmodeep Battacharyya

Community Detection in Networks Using Graph Distance

The study of networks has received increased attention recently not only from the social sciences and statistics but also from physicists, computer scientists and mathematicians. One of the principal problem in networks is community detection. Many algorithms have been proposed for community finding [6] [7] but most of them do not have have theoretical guarantee for sparse networks and networks close to phase transition boundary proposed by physicists [3]. There are some exceptions but all have incomplete theo- retical basis [2] [1] [5]. Here we propose an algorithm based on the graph distance of vertices in the network. We give theoretical guarantees that our method works in identifying communities for block models and some exten- sions like degree-corrected block models [4] and block models with number of communities growing with number of vertices. Despite favorable simula- tion results, we are not yet able to conclude that our method is satisfactory for worst possible case. We illustrate on a network of political blogs, Face- book networks and some other networks.
(joint work with Peter J. Bickel)

Emilie Coupechoux

Diffusion of innovations in random clustered networks with overlapping communities

We consider a threshold epidemic model on a clustered random graph with overlapping communities. In other words, our epidemic model is such that an individual becomes infected as soon as the proportion of her infected neighbors exceeds the threshold q of the epidemic. In our random graph model, each individual can belong to several communities. The distributions for the community sizes and the number of communities an individual belongs to are arbitrary.
We consider the case where the epidemic starts from a single individual, and we prove a phase transition (when the parameter q of the model varies) for the appearance of a cascade, i.e. when the epidemic can be
propagated to an infinite part of the population. More precisely, we show that our epidemic is entirely described by a multi-type (and alternating) branching process, and then we apply Sevastyanov's theorem
about the phase transition of multi-type Galton-Watson branching processes.
(joint work with Marc Lelarge)

Janos Kertesz

Communities in large communication networks: Static and dynamic aspects

Millions of users leave their "digital footprints" behind when using information communication technology, which enables unprecedented investigations of human interactions at the societal scale. We analyse time stamped records of mobile phone calls and consider the network mapped out as a proxy of the network of social interactions. By identifying the intensity of the relationships with the duration or frequency of calls we have empirically proved a long standing, basic conjecture of social network theory. This Granovetter hypothesis about the "strength of weak ties" states that the society consists of communities, where individuals are strongly connected to each other and these communities are linked by weak ties, which hold the whole society together. These structure has severe consequences for the dynamics as demonstrated by empirical investigations and model
calculations: The spreading on such a weighted network with communities is considerably slower than in a reference system without these stuctures and proceeds in a remarkable, stepwise fashion.

Marc Lelarge

Optimal clustering for the edge-labeled stochastic block model

The labeled stochastic block model is a random graph model representing networks with community structure and interactions of multiple types. In its simplest form, it consists of two communities of approximately equal size, and the edges are drawn and labeled at random with probability depending on whether their two endpoints belong to the same community or not. It has been conjectured that this model exhibits a phase transition: reconstruction (i.e. identification of a partition positively correlated with the true partition into the underlying communities) would be feasible if and only if a model parameter exceeds a threshold. We prove this conjecture for ’not too sparse’ graphs.

Nelly Litvak

Detecting trends and popular entities in social media

In the first part of this talk I will discuss the problem of finding top k nodes with the largest degrees in very large on-line networks. If the adjacency list of the network is known (which is not the case, for example, in the Twitter follower graph), a deterministic algorithm to find the top k list of nodes with the largest degrees requires an average complexity of O(n), where n is the number of nodes in the network. Even this modest complexity can be very high for large complex networks such as Web or Twitter. Instead, we propose randomized Monte Carlo methods, and show theoretically and by numerical experiments that this algorithm find good quality top lists in the time that scales sublinearly with the network size. Next, I will discuss indicative properties of user behavior in Twitter to predict trending topics. For this purpose, we first extensively analyze tweet data sets and retweet graphs of several different events. We find that that the retweet graph for an `interactive' trending topic, for which off-line and on-line events influence each other (such as the Project X calamity in Haren, the Netherlands), has a relatively dense largest connected component (LCC). Based on the insights obtained from the datasets, we design a mathematical model that describes the evolution of a retweet graph by three main parameters and investigate the potential predictive value of these parameters for detecting an `interactive’ trend.

Giacomo Livan

Social networks: A perspective from Statistical Physics

Individuals in a society do not interact randomly, but rather preferentially interact with a small local subset of the whole population, i.e. their friends and neighbors. Such a picture naturally translates into a network description. Social networks, i.e. the set of relationships in a social group, influence and constrain, often in rather non-trivial ways, the choices of individuals: interaction can change behavior, which in turn can promote proximity, allowing for further interaction. Statistical Physics can prove to be a rather effective tool to describe the interplay between interactions and topology: once the details of the social interactions taking place on the network have been specified, one can often categorize the different states of a given stylized society as the different phases of the corresponding statistical model.
In this talk, I will present two examples of this line of reasoning, one pertaining to the spectacular reduction in social mobility emerging in societies whose individuals highly value social status, the other one related to social learning, i.e. a society’s ability to properly aggregate the noisy information spread across its individuals in order to make the right decisions on relevant matters.

Konstantin Makarychev

Constant Factor Approximation for Balanced Cut in the PIE Model

We propose and study a new semi-random semi-adversarial model for Balanced Cut, a planted model with permutation-invariant random edges (PIE). Our model is much more general than planted models considered previously. Consider a set of vertices V partitioned into two clusters L and R of equal size. Let G be an arbitrary graph on V with no edges between L and R. Let E_random be a set of edges sampled from an arbitrary permutation-invariant distribution (a distribution that is invariant under permutation of vertices in L and in R). Then we say that G + E_random is a graph with permutation-invariant random edges.
We present an approximation algorithm for the Balanced Cut problem that finds a balanced cut of cost O(|E_random|) + n polylog(n) in this model. In the most interesting regime, this is a constant factor approximation with respect to the cost of the planted cut.
(joint work with Yury Makarychev and Aravindan Vijayaraghavan)

Gergely Palla

Finding overlapping communities with k-clique percolation

Communities in real networks are often overlapping, e.g., people can belong to several social groups in parallel, (family, friendship circle, etc.), and also proteins can be members in different functional modules at the same time in a protein interaction network.
An intuitive approach for uncovering overlapping communities in networks is provided by k-clique percolation, where the communities are built up from k-cliques, corresponding to complete sub-graphs with k nodes. In the talk we discuss both the theoretical aspects and a few practical applications of k-clique percolation, starting from the study of the k-clique percolation transition in the Erdos-Renyi graph through the introduction of directed and weighted k-cliques to the study of community evolution in real social networks.

Alessandro Panconesi

Random walks and sybil attacks

Sybil attacks, in which an adversary forges a potentially unbounded number of fake identities, are a danger to distributed systems and online social networks. The goal of sybil defense is to accurately identify sybil identities. There is a deep connection between sybil defense and the theory of random walks that we will survey in this talk. The discussion will lead us to identify a community detection algorithm that, for the first time, offers provable guarantees in the context of sybil defense.
(joint work with L.Alvisi, A.Clement, A.Epasto and S.Lattanzi)

Alexandre Proutiere

Community Detection in Networks via Random and Adaptive Sampling

We consider the problem of identifying communities in networks. This problem has been extensively studied in statistics, physics, and computer science, and has many important applications in diverse contexts such as biology, social networks, and distributed computing. A central question for such a problem is to characterize conditions under which communities can be efficiently detected using low complexity algorithms. We address this question (i) when the graph is generated through the so-called stochastic block model (also known as the planted partition model), and (ii) when the graph can be only partially observed, or sampled adaptively. We provide fundamental performance limits satisfied by any community detection algorithm (irrespective of its complexity), and design a spectral-based algorithm whose performance approaches these limits in some relevant cases.

Nees Jan van Eck

Applications of community detection in bibliometric network analysis

In this talk, we focus on the analysis of bibliometric networks, and in particular on the detection of communities in these networks. We start by demonstrating VOSviewer, a popular software tool for visualizing bibliometric networks. We discuss the techniques used by VOSviewer for visualizing bibliometric networks and for detecting communities in these networks. We pay special attention to the close relationship between visualization and community detection, and we discuss the unified approach to visualization and community detection that is implemented in VOSviewer. We then shift our attention to community detection in very large citation networks, including millions of publications and hundreds of millions of citation relations. We show how community detection techniques can be used to construct highly detailed classification systems of science. We also discuss applications of such classification systems to science policy questions. Finally, we demonstrate CitNetExplorer, a new software tool in which community detection techniques are used to support the large-scale analysis of citation networks. We use CitNetExplorer to analyze the citation network of publications on network science and in particular on community detection.

Nicolas Verzelen

Detecting a community in Random Networks

We formalize the problem of detecting a community in a network into testing whether in a given (random) graph there is a subgraph that is unusually dense. We observe an undirected and unweighted graph on N nodes. Under the null hypothesis, the graph is a realization of an Erdös-Rényi graph with probability p0. Under the (composite) alternative, there is a subgraph of n nodes where the probability of connection is p1 > p0. We derive a detection lower bound for detecting such a subgraph in terms of N, n, p0, p1 and exhibit a test that achieves that lower bound. We do this both when p0 is known and unknown. We also consider the problem of testing in polynomial-time. As an aside, we consider the problem of detecting a clique, which is intimately related to the planted clique problem.

Ludo Waltman

Two challenges in community detection: Large networks and overlapping communities

Community detection in networks has received a lot of attention during the past decade. Most work has focused on small and medium-sizes networks, but there is also an increasing interest in community detection in large networks, consisting of many millions of nodes and edges. In this talk, we discuss community detection based on the optimization of modularity functions. We first define a general class of modularity functions, and we discuss the strengths and limitations of these functions. We then present an algorithm for modularity optimization in large networks. The algorithm is referred to as a smart local moving algorithm. We demonstrate that the algorithm outperforms the popular ‘Louvain algorithm’. Next, we discuss the problem of generalizing modularity functions to detect overlapping communities. We argue that so far no principled solution has been proposed to this problem. We then present a proposal for such a solution. We also present an algorithm for optimizing our proposed generalized modularity functions. Finally, we demonstrate the performance of our approach to overlapping community detection.

PRACTICAL INFORMATION

● Venue

Eurandom, Mathematics and Computer Science Dept, TU Eindhoven,

Den Dolech 2, 5612 AZ EINDHOVEN, The Netherlands

Eurandom is located on the campus of Eindhoven University of Technology, in the Metaforum building (4th floor) (about the building). The university is located at 10 minutes walking distance from Eindhoven main railway station (take the exit north side and walk towards the tall building on the right with the sign TU/e).
Accessibility TU/e campus and map.

● Contact

Mrs. Patty Koorn, Workshop Officer, Eurandom/TU Eindhoven, koorn@eurandom.tue.nl

European Institute for Statistics, Probability, Stochastic Operations Research
and their Applications

ORGANISERS

FRIDAY JAN 24

ABSTRACTS

PRACTICAL INFORMATION

● Venue

● Contact

SPONSORS

Last updated 09-04-14,
by PK

European Institute for Statistics, Probability, Stochastic Operations Research and their Applications

ORGANISERS

FRIDAY JAN 24

ABSTRACTS

PRACTICAL INFORMATION

● Venue

● Contact

SPONSORS

Last updated 09-04-14, by PK

European Institute for Statistics, Probability, Stochastic Operations Research
and their Applications

Last updated 09-04-14,
by PK