Statistics for Complex Networks: Theory and Applications
Jan 30, 2013 - Feb 1, 2013
Summary
The dramatic improvement in data collection and acquisition technologies in recent decades has made it possible to monitor and study complex systems such as biological, social, and computer networks. The extremely complex, high-dimensional nature of these systems and the growth in dataset sizes give rise to important research questions: How can one perform meaningful statistical inference from very large and potentially corrupted complex data sets? Which properties of these complex systems can be inferred from such data? Can sound inference methodologies also be made computationally feasible? These and other questions have recently attracted the attention of a large number of researchers worldwide and, although important progress has been made in recent years, many problems in the general area of statistical inference for complex, high-dimensional systems remain open or are only now emerging.
The workshop will include a sequence of talks that illustrate advances in the theory and application of stochastic network models and statistical tools motivated by the study of complex networked systems. To promote strong and lasting collaboration between the workshop participants there will be ample time in the program for informal discussions.
This workshop is immediately preceded by the YES workshop on Statistics for Complex Networks and High Dimensional Systems. The YES workshop consists of tutorial courses and provides a solid introduction to this workshop on Statistics for Complex Networks.
Organizers
Fetsje Bijma (VU Amsterdam)
Rui Castro (TU Eindhoven)
Mathisca de Gunst (VU Amsterdam)
Speakers
Peter Bühlmann (ETH Zurich)
Venkat Chandrasekaran (California Institute of Technology)
Geert Geeven (Hubrecht Institute)
Alfred Hero (University of Michigan)
Eric Kolaczyk (Boston University)
Marloes Maathuis (ETH Zurich)
Karl Rohe (University of Wisconsin-Madison)
Kees Stam (Neuroscience Campus Amsterdam)
Tom Thorne (Imperial College London)
Martin Wainwright (University of California, Berkeley)
Lourens Waldorp (University of Amsterdam)
Ernst Wit (University of Groningen)
Program

Abstracts
Invited Speakers
Peter Bühlmann and Marloes Maathuis (ETH Zurich)
High-dimensional Graphical Models and Causal Inference
Understanding cause-effect relationships between variables is of great interest in many fields of science. An ambitious but highly desirable goal is to infer causal effects from observational data, obtained by observing a system of interest without subjecting it to interventions. This would make it possible to circumvent severe experimental constraints or to substantially lower experimental costs. Our main motivation for studying this goal comes from applications in biology.
We present recent progress for prediction of causal effects with direct implications for designing new intervention experiments, particularly for high-dimensional, sparse settings with thousands of variables but based on only a few dozen observations. In particular, we present the IDA algorithm to estimate bounds on causal effects. An important step in the IDA algorithm consists of learning the underlying causal graph, and we pay special attention to this step. First, we discuss the PC algorithm and its assumptions, as well as modifications of this algorithm that are more stable in high-dimensional settings and/or allow for unmeasured variables. Second, we discuss penalized likelihood approaches. We point out that the latter framework requires different assumptions than the PC algorithm, and can be used to learn graphs from a combination of observational and interventional data.
Our talks will highlight exciting possibilities and fundamental limitations. In view of the latter, statistical modeling should be complemented with experimental validations: we discuss this in the context of molecular biology for yeast (Saccharomyces cerevisiae) and the model plant Arabidopsis thaliana.
Venkat Chandrasekaran (California Institute of Technology)
Latent Variable Graphical Model Selection via Convex Optimization
Suppose we have a Gaussian graphical model with sample observations of only a subset of the variables. Can we separate the extra correlations induced by marginalization over the unobserved, hidden variables from the structure among the observed variables? In other words, is it still possible to consistently perform model selection despite the unobserved, latent variables? As we shall see, the key problem that arises is one of decomposing the concentration matrix of the observed variables into a sparse matrix (representing graphical model structure among the observed variables) and a low-rank matrix (representing the effects of marginalization over the hidden variables). Such a decomposition can be accomplished by an estimator given by a tractable convex program. This estimator performs consistent model selection in the high-dimensional scaling regime in which the number of observed/hidden variables grows with the number of samples of the observed variables. The geometric aspects of our approach are highlighted, with the algebraic varieties of sparse matrices and of low-rank matrices playing an important role.
(joint work with Pablo Parrilo and Alan Willsky)
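The two matrix penalties behind such a convex program can be sketched through their proximal operators. The snippet below is my own illustration, not the authors' estimator: entrywise soft-thresholding (the proximal map of the l1 norm) promotes sparsity, and singular-value thresholding (the proximal map of the nuclear norm) promotes low rank.

```python
import numpy as np

def soft_threshold(A, t):
    """Entrywise soft-thresholding: proximal map of the l1 norm,
    which zeroes out small entries (the sparse, graphical part)."""
    return np.sign(A) * np.maximum(np.abs(A) - t, 0.0)

def svd_threshold(A, t):
    """Singular-value soft-thresholding: proximal map of the nuclear
    norm, which zeroes out small singular values (the low-rank part)."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U @ np.diag(np.maximum(s - t, 0.0)) @ Vt

# A toy symmetric matrix standing in for a concentration matrix.
A = np.array([[2.0, 0.1, 0.0],
              [0.1, 2.0, 1.5],
              [0.0, 1.5, 2.0]])

S = soft_threshold(A, 0.5)   # small entries are zeroed out
L = svd_threshold(A, 2.1)    # small singular values are zeroed out

print(np.count_nonzero(S), np.linalg.matrix_rank(L))  # 5 1
```

The actual estimator of the talk combines both penalties in a single convex objective; the thresholds used here are arbitrary.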
Geert Geeven (Hubrecht Institute)
Nuclear organization in 3D space: systematic mapping of networks of intrachromosomal and interchromosomal interactions
The spatial organization of chromosomes in the nucleus is considered to play an important role in nuclear processes. Over the past 10 years, the development of chromosome conformation capture (3C) technology and the subsequent genomic variants thereof have enabled the analysis of nuclear organization at an unprecedented resolution and throughput. We routinely use 3C-based techniques in our lab to systematically map long-range intrachromosomal and interchromosomal interactions, which allows us to identify functional events like promoter-enhancer interactions. I will explain some of the data analysis problems involved and try to make a link between 3D nuclear organization and transcriptional gene networks.
Alfred Hero (University of Michigan)
High-dimensional dependency network analysis with limited data
Dependency networks arise in many areas of engineering, social sciences, and natural sciences. For example, when the rows of a random matrix are successive samples of a multivariate response, the sample correlation between the columns can reveal important dependency structure in the multivariate response, e.g., stars, hubs and triangles of co-dependency. However, when the number of samples is finite and the number p of columns increases, such exploration becomes futile due to a phase transition phenomenon: spurious discoveries will eventually dominate. In this presentation I will develop theory for predicting these phase transitions and present Poisson limit theorems that can be used to predict the finite sample behavior of such correlation and partial correlation structure. I will also present extensions of this theory to multistage regression over a sparse network of dependencies.
(joint work with Bala Rajaratnam)
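The phenomenon described above is easy to reproduce numerically. The sketch below is my own toy illustration (sample size and correlation threshold are arbitrary choices): with all-noise data, the number of column pairs whose sample correlation looks "large" explodes as p grows while n stays fixed.

```python
import numpy as np

def spurious_pairs(n, p, thresh, rng):
    """Count column pairs of an all-noise n-by-p data matrix whose
    sample correlation exceeds `thresh` in absolute value."""
    X = rng.normal(size=(n, p))
    R = np.corrcoef(X, rowvar=False)
    iu = np.triu_indices(p, k=1)      # each unordered pair once
    return int(np.sum(np.abs(R[iu]) > thresh))

rng = np.random.default_rng(42)
n = 20                                # few samples, held fixed
n_small = spurious_pairs(n, 50, 0.7, rng)
n_big = spurious_pairs(n, 1000, 0.7, rng)
print(n_small, n_big)                 # spurious discoveries dominate as p grows
```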
Eric Kolaczyk (Boston University)
Network-based Statistical Models and Methods for Identification of Cellular Mechanisms of Action
Identifying biological mechanisms of action (e.g. biological pathways) that control disease states, drug response, and altered cellular function is a multifaceted problem involving a dynamic system of biological variables that culminate in an altered cellular state. The challenge is in deciphering the factors that play key roles in determining the cell's fate. In this talk I will describe some of the efforts by our group to develop statistical models and methods for identification of cellular mechanisms of action. More specifically, we assume gene expression data and treat the problem of determining mechanisms of action under perturbation (e.g., drug treatment, gene knockout, etc.) as a type of inverse problem. I will describe three approaches to solving this inverse problem. The first attempts to use only the gene expression data and to 'filter' that data by an inferred network of gene regulatory interactions. The other two, one testing-based and the other regression-based, use gene expression data in conjunction with information from biological databases. More specifically, gene expression is modeled as deriving from a perturbed latent network of pathways, where the interconnections among pathways are informed by shared biological function. Illustrations are given in the context of yeast experiments and human cancer.
Karl Rohe (University of Wisconsin)
Highest dimensional models; for transitivity in the asymptote
The Stochastic Blockmodel provides a statistical framework for studying network clustering algorithms as statistical estimators. The extant literature has studied several estimators under an asymptotic setting where the number of clusters (k) is fixed, or grows much more slowly than the number of nodes (n). This talk will 1) discuss why this asymptotic setting fails to align with most empirical networks, 2) propose a "highest dimensional Stochastic Blockmodel," where the number of clusters grows proportionally to the number of nodes, and 3) propose a regularized version of the MLE that is weakly consistent. Time permitting, algorithmic considerations will be addressed.
(joint work with Tai Qin, Haoyang Fan)
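A minimal Stochastic Blockmodel sampler (my own sketch, with arbitrary block connection probabilities) illustrates the object under study: nodes carry cluster labels, and edges appear independently with a probability depending only on the labels of their endpoints.

```python
import numpy as np

def sample_sbm(z, P, rng):
    """Sample a symmetric, hollow adjacency matrix from a Stochastic
    Blockmodel with node labels z and block connection probabilities P."""
    n = len(z)
    probs = P[np.ix_(z, z)]                  # per-pair edge probabilities
    upper = rng.random((n, n)) < probs
    A = np.triu(upper, k=1)                  # sample each pair once
    return (A + A.T).astype(int)             # symmetrize, no self-loops

rng = np.random.default_rng(1)
z = np.repeat([0, 1], 50)                    # two equal clusters
P = np.array([[0.5, 0.05],
              [0.05, 0.5]])                  # dense within, sparse between
A = sample_sbm(z, P, rng)

within = A[:50, :50].sum() / (50 * 49)       # empirical within-block density
between = A[:50, 50:].mean()                 # empirical between-block density
print(round(within, 2), round(between, 2))
```

In the regime of the talk, the number of blocks would grow with n rather than staying fixed at two as in this toy example.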
Kees Stam (VU Amsterdam)
The brain: a complex networks perspective
The human brain is one of the most complex objects known to man. One of the most challenging scientific problems is how its 10^10 neurons interact in such a way that they can bring about normal brain function, and how this process becomes disrupted in neurological and psychiatric disease. With the introduction of models of small-world and scale-free networks, graph theory has become a very promising tool for the study of structural and functional networks in the brain (Stam and van Straaten, 2012). In less than a decade of research it has become clear that normal brain networks are characterized by small-world and scale-free structure, the presence of hubs and hierarchical modularity, rich-club backbones and assortative mixing. These topological features of brain networks arise during development under strong genetic control and are strongly correlated with cognitive measures such as IQ. In neurological and psychiatric disorders a loss of the optimal balance between local connectivity and global integration has been demonstrated. For instance, in Alzheimer's disease network topology becomes more random and hub structures are preferentially damaged. Some of these phenomena can now be explained at the level of exact models of large-scale structural and functional brain networks (De Haan et al., 2012). Very likely, similar explanations will also be found for other conditions such as epilepsy, brain tumours, neuropsychiatric conditions and Parkinson's disease. These results underscore the value of a complex networks perspective for neuroscience.
References:
Stam CJ, van Straaten EC. The organization of physiological brain networks. Clin Neurophysiol. 2012;123(6):1067-87.
de Haan W, Mott K, van Straaten EC, Scheltens P, Stam CJ. Activity dependent degeneration explains hub vulnerability in Alzheimer's disease. PLoS Comput Biol. 2012;8(8):e1002582.
Tom Thorne (Imperial College London)
Going beyond static networks
In the field of Systems Biology we are often faced with the task of performing inference that involves network structures, one particular example being the inference of gene regulatory network structures. We might be interested in changes to the network structure in a single time series, between differing conditions or even on an evolutionary time scale. To this end we employ Bayesian nonparametrics to provide the flexibility to allow for such changes in the network structure. By doing so we can infer these changes from the data, whilst also combining information from multiple sources to derive more accurate predictions. We use graphical models to represent the regulatory network structure, and consider different graphical modeling schemes, comparing their relative merits from a computational and modeling point of view.
Martin Wainwright (UC Berkeley)
Structure estimation for discrete graphical models: Generalized covariance matrices and their inverses
We investigate a curious relationship between the structure of a discrete graphical model and the support of the inverse of a generalized covariance matrix. We show that for certain graph structures, the support of the inverse covariance matrix of indicator variables on the vertices of a graph reflects the conditional independence structure of the graph. Our work extends results that have previously been established only in the context of multivariate Gaussian graphical models, thereby addressing an open question about the significance of the inverse covariance matrix of a non-Gaussian distribution. The proof exploits a combination of ideas from the geometry of exponential families, junction tree theory, and convex analysis. These population-level results have various consequences for graph selection methods, both known and novel, including a novel method for structure estimation with missing or corrupted observations.
(joint work with PoLing Loh: http://arxiv.org/abs/1212.0478)
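The population-level claim can be checked numerically on a small tree-structured discrete model. The sketch below is my own illustration: for a binary Markov chain (a tree), the exact covariance matrix of the variables has a tridiagonal inverse, so the support of the inverse covariance mirrors the chain graph.

```python
import itertools
import numpy as np

eps = 0.2      # flip probability along the chain X1 - X2 - ... - X5
m = 5

# Exact joint distribution of a binary Markov chain (tree-structured model).
configs = list(itertools.product([0, 1], repeat=m))
probs = []
for x in configs:
    p = 0.5
    for a, b in zip(x, x[1:]):
        p *= (1 - eps) if a == b else eps
    probs.append(p)
probs = np.array(probs)
X = np.array(configs, dtype=float)

mean = probs @ X
Sigma = (X - mean).T @ np.diag(probs) @ (X - mean)   # exact covariance
Theta = np.linalg.inv(Sigma)

# Inverse-covariance entries for non-adjacent pairs vanish:
gap = np.abs(np.subtract.outer(np.arange(m), np.arange(m)))
offtri = np.abs(Theta[gap > 1])
print(offtri.max())    # numerically zero
```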
Lourens Waldorp (University of Amsterdam)
Networks in psychology and neuroscience
Regarding networks as fundamental units of analysis has only recently found its way into psychology and neuroscience. This has proved successful in explaining, for instance, why the disorders of depression and anxiety often coincide, and why efficient brains can lead to higher intelligence scores. However, many of these results are based on large-scale networks obtained from simple correlation measures to determine connectivity between symptoms or brain regions. Especially in neuroscience the use of simple or pairwise correlations is pervasive. Using pairwise correlations to estimate a large-scale network (> 1000 nodes) from functional magnetic resonance imaging data generally results in a poor representation of the true underlying network. The reason is that pairwise correlations cannot distinguish between direct and indirect connectivity. As a result, pairwise correlation networks can lead to fallacious conclusions; for example, one may conclude that a network is a small world when it is not. We have compared the performance of pairwise correlations in large-scale networks (2000 nodes) against three other methods that are designed to filter out indirect connections, in a simulation study and in resting-state data. Recovery methods were evaluated in four simulated network structures (small-world or not, scale-free or not) in scenarios where the number of observations is very small compared to the number of parameters. Simulations clearly show that pairwise correlation networks are fragmented into separate unconnected components with excessive connectedness within components. This often leads to erroneous estimates of network metrics, like small-world structures or low betweenness, and produces too many low-degree nodes. We conclude that using partial correlations, informed by a sparseness penalty, results in more accurate networks and corresponding metrics than pairwise correlation networks.
Additionally, we show for resting-state fMRI that partial correlations are more robust to different parcellation sets and to different lengths of time series.
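The difference between pairwise and partial correlation can be seen exactly on a small chain graph. The sketch below is my own illustration: for non-adjacent nodes in a chain, the pairwise correlation is clearly nonzero (the indirect path leaks through), while the partial correlation, read off the precision matrix, is exactly zero.

```python
import numpy as np

# Precision matrix of a 4-node chain graph 1 - 2 - 3 - 4:
# zeros off the tridiagonal encode the absent direct connections.
Theta = np.array([[ 1.0, -0.4,  0.0,  0.0],
                  [-0.4,  1.0, -0.4,  0.0],
                  [ 0.0, -0.4,  1.0, -0.4],
                  [ 0.0,  0.0, -0.4,  1.0]])
Sigma = np.linalg.inv(Theta)

# Pairwise correlation between the non-adjacent nodes 1 and 3 ...
d = np.sqrt(np.diag(Sigma))
pairwise_13 = Sigma[0, 2] / (d[0] * d[2])

# ... versus their partial correlation, taken from the precision matrix.
partial_13 = -Theta[0, 2] / np.sqrt(Theta[0, 0] * Theta[2, 2])

print(round(pairwise_13, 3), partial_13)
```

A pairwise-correlation network would therefore connect nodes 1 and 3 even though they have no direct link, which is exactly the failure mode described above.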
Ernst Wit (University of Groningen)
Sparse Graphs (from noisy data, obviously)
From genetics to finance, and from homeland security to epidemiology, the objective of many data-intensive studies is to infer the relationships between the various actors under scrutiny. A graph is one possible way to describe these relationships. In many cases, the data comes from large monitoring systems with no prior screening. The actual set of relationships, therefore, tends to be sparse.
When data is obtained from noisy measurements of (some of) the nodes in the graph, graphical models present an appealing and insightful way to describe graph-based dependencies between the random variables. Although the parameters themselves may still be of interest, the main aim of inference is not their precise estimation but the recovery of the underlying structure of the graph.
The graphical lasso and related methods opened up the field of sparse graphical model inference in high dimensions. We show how extensions of such methods to more structured settings can improve interpretation. Moreover, in this presentation we show how novel model selection criteria can determine the underlying graph in an efficient way.
Contributed Speakers
J.B. Leger, J.J. Daudin, C. Vacher
Comparison of clustering methods for graph analysis
Unsupervised classification of graph nodes is a growing topic with applications in many areas. Two definitions of clusters are possible.
The first definition, called community, is the most commonly used: it defines a cluster as a set of nodes strongly linked inside the set and poorly linked outside it.
The second definition, named structurally homogeneous subsets, defines a cluster as a set of nodes which have a similar role in the graph structure. With this definition, two unconnected nodes may belong to the same cluster if their connectivity behavior with the other nodes is similar. The structurally homogeneous definition includes the community definition as a particular case and makes it possible to recover other graph structures such as hubs or bipartite organization.
The methods for community detection have recently been reviewed by Fortunato (Physics Reports, 2010). We extend his work in this communication with a review of methods for clustering graphs using the definition of structurally homogeneous subsets. The review includes the Markov Cluster Algorithm, different spectral clustering algorithms, modularity maximization, edge-betweenness, and statistical models such as the stochastic block model. This review underlines the difference between communities and structurally homogeneous subsets. A benchmark has been made to compare the clustering methods, based on an ecological simulation model generating binary graphs. The results of the benchmark are presented. The efficiency of clustering is measured by the Rand index between the true and the estimated clusters, and some remarks are made about the computing efficiency of the algorithms, which is important for dealing with large graphs.
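The Rand index used in the benchmark can be sketched in a few lines (my own illustration of the standard definition): it is the fraction of node pairs on which two clusterings agree, i.e. pairs that both partitions put together or both keep apart, so it is invariant to relabelling of the clusters.

```python
from itertools import combinations

def rand_index(labels_a, labels_b):
    """Rand index: fraction of item pairs on which two clusterings agree
    (either both put the pair in one cluster, or both keep it apart)."""
    pairs = list(combinations(range(len(labels_a)), 2))
    agree = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agree / len(pairs)

truth = [0, 0, 0, 1, 1, 1]
est_good = [1, 1, 1, 0, 0, 0]   # same partition, relabelled
est_bad = [0, 1, 0, 1, 0, 1]    # unrelated partition

print(rand_index(truth, est_good), rand_index(truth, est_bad))  # 1.0 0.466...
```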
Antoine Channarond, Jean-Jacques Daudin, Stéphane Robin
Clustering in a random graph model with a latent space
One way to add heterogeneity to networks consists in allocating positions $Z$ to the nodes in an unobserved latent space. In our approach, the edge between nodes $i$ and $j$ appears with probability $f_n(d(Z_i,Z_j))$, where $n$ is the graph size, $d$ is a distance on the latent space and $f_n$ is a decreasing probability function. Thus an edge is all the more likely when the nodes are close in the latent space.
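Such a latent-space graph is easy to simulate. The sketch below is my own illustration, with uniform positions in the unit square and an exponential choice of $f_n$; both choices are arbitrary, not the authors'. Close pairs connect far more often than distant ones, as the model intends.

```python
import numpy as np

rng = np.random.default_rng(7)

n = 200
Z = rng.uniform(size=(n, 2))                            # latent positions
D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=-1)

def f(d, scale=0.1):
    """A decreasing edge-probability function of latent distance."""
    return np.exp(-d / scale)

probs = f(D)
upper = np.triu(rng.random((n, n)) < probs, k=1)        # sample each pair once
A = (upper | upper.T).astype(int)                       # symmetric, hollow

iu = np.triu_indices(n, k=1)
close_rate = A[iu][D[iu] < 0.05].mean()                 # nearby pairs
far_rate = A[iu][D[iu] > 0.5].mean()                    # distant pairs
print(round(close_rate, 2), round(far_rate, 3))
```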
Hoff, Raftery and Handcock (2002) provide methods of unsupervised node classification and parameter estimation. While they assume the density of the positions to be distributed according to a finite Gaussian mixture, we are interested in a nonparametric framework, so that there is no predefined constraint on the density shape or clustering structure. In this setting, the clusters are defined as connected components of a level set of the density (Hartigan, 1975). The question is to test whether the density is clustered, and more generally, what can be inferred from the observed graph about the clustering structure of the position density in the latent space.
We propose a simple algorithm for the estimation of the number of such clusters, which generalizes the procedure of Biau, Cadre and Pelletier (2007). It extracts a graph induced by the observed graph, which is proved to have as many connected components as the density has clusters, with probability tending to one as n goes to infinity. The algorithm is linear in the number of edges and hence can process large graphs. A simulation study is carried out to illustrate the behavior of the algorithm as a function of n. In addition, theoretical arguments are provided to prove consistency.
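Counting connected components in time (near-)linear in the number of edges, as required for large graphs, can be sketched with union-find. This is my own illustration of the standard technique, not the authors' algorithm.

```python
def count_components(n, edges):
    """Count connected components of a graph on n nodes with union-find;
    runs in near-linear time in the number of edges."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    components = n
    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:                        # merging two components
            parent[ru] = rv
            components -= 1
    return components

# Two triangles plus an isolated node: three components.
edges = [(0, 1), (1, 2), (2, 0), (3, 4), (4, 5), (5, 3)]
print(count_components(7, edges))  # 3
```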
Wessel van Wieringen, Mark van de Wiel
Penalized differential pathway analysis of integrative oncogenomics studies
Through integration of genomic data from multiple sources, we may obtain a more accurate and complete picture of the molecular mechanisms underlying tumorigenesis. We discuss the integration of DNA copy number and mRNA gene expression data from an observational integrative genomics study involving cancer patients. The two molecular levels involved are linked through the central dogma of molecular biology.
DNA copy number aberrations abound in the cancer cell. Here we investigate how these aberrations affect gene expression levels within a pathway using observational integrative genomics data of cancer patients. In particular, we aim to identify differential edges between regulatory networks of two groups involving these molecular levels.
Motivated by the rate equations, the interplay between DNA copy number aberrations and gene expression levels within a pathway is modeled by a simultaneous-equations model. This is introduced for the one-sample case. The model is fitted by means of penalized least squares using the lasso to achieve a sparse network topology. The simultaneous-equations model is extended to the two-sample situation. In the two-sample case the topology of the regulatory network is allowed to differ between the two groups, thus facilitating the discovery of differential interactions. In the estimation of this extended model, the fused lasso penalty is included to regulate the number of differential interactions.
In the application to real data from integrative oncogenomic studies we show that inclusion of prior information on the regulatory network architecture benefits the reproducibility of all edges. Furthermore, analysis of the TP53 signalling pathway between ER+ and ER- samples from an integrative genomics breast cancer study identifies, among others, differential regulation of the IGF complex. This corroborates existing literature: the IGF complex is known to crosstalk with ER.
Eugen Pircalabelu, Gerda Claeskens, Lourens Waldorp
Model selection for Graphs using the Focused Information Criterion
A new method for model selection for Gaussian directed acyclic graphs (DAGs) and Markov networks, with extensions towards ancestral graphs, is constructed to have good mean squared error properties.
The method is based on the focused information criterion (FIC) and unlike the traditional AIC or BIC, the FIC allows for selecting individual models, tailored to a specific purpose (the focus), as opposed to attempting an identification of a single model that should be used for all purposes.
For any particular node in a graph, we start by defining a focus parameter μ as a function of the parameters of the model density, and potentially of a user-specified vector of covariate values. The focus is intended to be estimated as precisely (in the MSE sense) as possible. The FIC estimates MSE(μ̂_S) for each model S in a collection, and selects the model with the smallest such value.
The focus of the research, that is, the purpose of the model, directs the selection and different focuses may thus lead to different selected models. In this way, one can obtain better selected models in terms of MSE than obtained from a global model search. For this particular application of FIC for structure learning in graphs, one example of focus is the expected value of a variable, reflecting interest in discovering a topology of the graph that produces a low MSE for this expected value.
An analysis on real and simulated data suggests that the FIC scoring approach is able to identify graphical models with better MSE and mean squared prediction values than obtained from existing methods, such as hill-climbing using AIC and BIC, PC, SIN and the GLasso algorithm.
Alain Celisse, Jean-Jacques Daudin, Laurent Pierre
Consistency of maximumlikelihood and variational estimators in the stochastic block model
Networks and graphs have become a more and more popular tool in many scientific communities to describe objects and their relationships. For instance in biology, networks with hundreds or thousands of nodes are commonly used to describe regulatory pathways of genes. Moreover network complexity as well as large scale data have strengthened the need for (i) flexible and meaningful models, and for (ii) computationally efficient algorithms for making inference.
The stochastic block model (SBM) is a probabilistic model on graphs designed to allow heterogeneity in the between-edge connectivity, unlike the famous Erdős-Rényi graph. Although the SBM turns out to be powerful in describing the complex structure of real networks and providing insight, inferring its parameters requires the use of variational approximation. However, few theoretical results quantify the true performance of the resulting variational estimators.
In the present work, we address the asymptotic inference in the SBM by use of maximum-likelihood and variational approaches. The identifiability of the SBM parameters is proved, while asymptotic properties of maximum-likelihood and variational estimators are derived. In particular, the consistency of these estimators is established for the probability of an edge between two vertices (and for the group proportions at the price of an additional assumption).
Jonas Peters
Recovering the Graph Structure of Restricted Structural Equation Models
Causal inference tackles the following problem: given iid observational data from a joint distribution, one tries to infer the underlying causal graph. This graph contains a directed arrow from each variable to its direct effects and is assumed to be acyclic.
Independencebased methods like the PC algorithm assume the Markov condition and faithfulness. These two conditions relate conditional independences and the graph structure; they allow us to infer properties of the graph by testing for conditional independences in the joint distribution. Those methods, however, can discover causal structures only up to Markov equivalence classes. Furthermore, conditional independence testing is very difficult in practice.
In structural equation models (SEMs) each variable is assumed to be a deterministic function of its direct causes and some noise variable (e.g. Z=f(X,Y,N)), and the noise variables are assumed to be jointly independent. SEMs display their full strength when constraints are imposed on the function class. In additive noise models the structural equations are of the form Z=f(X,Y)+N. The subclass of linear functions and additive Gaussian noise does not lead to identifiability. This, however, constitutes an exceptional setting. If one assumes either (i) non-Gaussian noise, (ii) nonlinear functions in the SEM or (iii) all noise variables to have the same variance, one can show that additive noise models are identifiable, i.e., given an observational distribution P, we can recover the underlying causal graph.
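The identifiability of nonlinear additive noise models can be illustrated numerically. The sketch below is my own crude illustration, not the authors' procedure: it uses polynomial regression and a naive dependence score in place of a proper independence test. In the causal direction the residuals look independent of the input; in the anticausal direction they do not.

```python
import numpy as np

rng = np.random.default_rng(3)

# Nonlinear additive noise model: X -> Y with Y = X**3 + N.
n = 3000
X = rng.normal(size=n)
Y = X ** 3 + 0.2 * rng.normal(size=n)

def dependence_score(cause, effect):
    """Fit a cubic polynomial regression of effect on cause and measure
    (crudely) how dependent the residuals still are on the cause."""
    coef = np.polyfit(cause, effect, deg=3)
    res = effect - np.polyval(coef, cause)
    return abs(np.corrcoef(np.abs(res), np.abs(cause))[0, 1])

forward = dependence_score(X, Y)    # residuals are (nearly) independent noise
backward = dependence_score(Y, X)   # residuals stay dependent on Y
print(round(forward, 3), round(backward, 3))
```

A real implementation would use a nonparametric regression and a kernel independence test such as HSIC; the point here is only that the asymmetry between the two directions is visible.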
Based on these theoretical findings, we develop a causal inference method that does not require faithfulness and can identify causal relationships even within an equivalence class. One can further reason that this procedure allows for a model check. Therefore, the method may remain undecided. We present results on both synthetic and real data sets.
Alain Hauser
Learning causal models from interventions: inference and active learning
Causal relationships between random variables are commonly modeled using DAGs, directed acyclic graphs. Causal models based on DAGs can provide statements about
(1) conditional independence relations among the involved random variables; and
(2) effects of interventions, that is, perturbations of the system in which one or several random variables are forced to specific values.
DAGs can only be inferred up to Markov equivalence from observational data, that is, data produced by the undisturbed system. On the other hand, intervention effects in general differ between different DAGs, even if they are (observationally) Markov equivalent. This fact is the motivation for estimating causal models from interventional data, that is, data produced under one or several interventions. We show to what extent interventional data arising from different interventions improve the identifiability of causal models. To this end, we extend the notion of Markov equivalence of DAGs to the interventional case and present a graph-theoretic characterization of the corresponding equivalence classes.
Furthermore, we consider Gaussian causal models and address the problem of calculating the maximum likelihood estimator from interventional data. We present a generalization of Chickering's Greedy Equivalence Search algorithm to interventional data that makes regularized maximum likelihood estimation computationally feasible. We demonstrate the performance of this algorithm in a simulation study. Finally, we address the question of active learning, that is, the question of finding interventions that lead to "optimal" identifiability of causal models.
Both the greedy search algorithm and the active learning approaches are based on the graph-theoretic characterization of interventional Markov equivalence classes.
A. Lotsi
High-dimensional Sparse Gaussian Graphical Mixture Model
We consider the problem of variable selection for the Gaussian Graphical Mixture Model (GGMM) in a high-dimensional clustering setting. Parameter estimation in this setting can be challenging due to the large number of parameters that need to be estimated. We propose a penalized maximum likelihood technique that imposes an $l_{1}$ penalty on the precision matrix. Our approach shrinks the covariance matrices, thereby resulting in better clustering and variable selection. We adopt an Expectation-Maximization (EM) algorithm built on the graphical least absolute shrinkage and selection operator (GLASSO) to estimate the mixing coefficients and the precision matrices. Models are compared with the Extended Bayesian Information Criterion (EBIC). We demonstrate the performance of our method on simulated data and show its utility for high-dimensional data analysis.
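A bare-bones version of such an EM scheme can be sketched as follows. This is my own illustration under stated simplifications: the off-diagonal soft-threshold in the M-step is a crude stand-in for a proper GLASSO solve, and all tuning constants are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(5)

# Two-component, 3-dimensional Gaussian mixture as toy data.
n = 400
X = np.vstack([
    rng.normal(loc=-2.0, size=(n // 2, 3)),
    rng.normal(loc=+2.0, size=(n // 2, 3)),
])

K, d = 2, X.shape[1]
pi = np.full(K, 1.0 / K)                        # mixing coefficients
mu = X[[0, -1]].copy()                          # anchor init at two data points
Theta = np.array([np.eye(d) for _ in range(K)]) # precision matrices

def log_gauss(X, mu, Theta):
    """Log-density of N(mu, Theta^{-1}) evaluated at the rows of X."""
    diff = X - mu
    quad = np.einsum("ni,ij,nj->n", diff, Theta, diff)
    _, logdet = np.linalg.slogdet(Theta)
    return 0.5 * (logdet - d * np.log(2 * np.pi) - quad)

for _ in range(30):
    # E-step: posterior responsibilities of each component.
    logp = np.stack([np.log(pi[k]) + log_gauss(X, mu[k], Theta[k])
                     for k in range(K)], axis=1)
    logp -= logp.max(axis=1, keepdims=True)     # stabilize the exponentials
    R = np.exp(logp)
    R /= R.sum(axis=1, keepdims=True)

    # M-step: update weights, means and precisions; the off-diagonal
    # soft-threshold crudely mimics an l1 penalty on the precision.
    for k in range(K):
        w = R[:, k]
        pi[k] = w.mean()
        mu[k] = (w @ X) / w.sum()
        diff = X - mu[k]
        cov = (diff * w[:, None]).T @ diff / w.sum() + 1e-6 * np.eye(d)
        P = np.linalg.inv(cov)
        off = np.sign(P) * np.maximum(np.abs(P) - 0.05, 0.0)
        Theta[k] = np.diag(np.diag(P)) + off - np.diag(np.diag(off))
```

A faithful implementation would solve the l1-penalized precision estimation with GLASSO inside the M-step and compare the fitted models via EBIC, as the abstract describes.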
We consider the problem of variable selection in Gaussian Graphical Mixture Model (GGMM) in high dimensional clustering set up. However, parameter estimation for this set up can be challenging due to large number of parameters that need to be estimated. We propose a penalized maximum likelihood technique by imposing an $l_{1}$ penalty on the precision matrix. Our approach shrinks the covariance matrices thereby resulting in better clustering and variable selection. We adopt an Expectation Maximization (EM) algorithm through the graphical least absolute shrinkage and selection operator (GLASSO) to estimate the mixing coefficients and the precision matrices. Models are compared with Extended Bayesian Information Criteria (EBIC). We demonstrate the performance of our work on a simulated data and we show the utility of our method for high dimensional data analysis.