Graphical models and inference algorithms

Cumulative distribution networks

Traditional graphical models express a decomposition of a joint distribution as a product of functions of subsets of variables. However, in some situations, such as learning to rank documents in information retrieval or ranking regulatory elements in computational systems biology, it is more natural to think of each network function as providing evidence about possible orderings of its variables. We formalize this by introducing a new type of graphical model called a `cumulative distribution network' (CDN), which uses a product of functions of subsets of variables to describe the joint cumulative distribution function (CDF). The conditional independence properties of CDNs are quite different from those of other graphical models. Unlike in Bayesian networks, Markov networks and factor graphs, in a CDN the marginal distribution of any variable can be computed in constant time while still allowing for complex conditional dependencies.
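For the three-variable example below, the factorization and the constant-time marginalization can be written as follows (the generic local functions \(\phi_k\) are placeholders, not a particular parametrization from the papers):

```latex
% Joint CDF as a product of local CDN functions, and the marginal CDF of X
% obtained by sending the remaining arguments to infinity (a constant-time step).
\[
F(x, y, z) = \phi_1(x, y)\,\phi_2(y, z)\,\phi_3(z, x),
\qquad
F_X(x) = \lim_{y, z \to \infty} F(x, y, z).
\]
```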


An example of a CDN defined over 3 variables X,Y,Z. The product of the local CDN functions yields the joint CDF over X,Y,Z.
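As a concrete numerical sketch of this example, the snippet below builds a joint CDF from three bivariate local functions and reads off the marginal of X in constant time. The Gumbel-copula-with-logistic-marginals form of the local functions is a hypothetical choice made here for illustration only:

```python
import numpy as np

def phi(a, b, theta=1.5):
    # A bivariate CDF used as a CDN local function (hypothetical choice):
    # a Gumbel copula composed with logistic marginal CDFs. A function that
    # is itself a CDF satisfies the sufficient conditions for a CDN factor.
    Fa = 1.0 / (1.0 + np.exp(-a))
    Fb = 1.0 / (1.0 + np.exp(-b))
    return np.exp(-(((-np.log(Fa)) ** theta + (-np.log(Fb)) ** theta) ** (1.0 / theta)))

def joint_cdf(x, y, z):
    # Product of local CDN functions over the subsets {X,Y}, {Y,Z}, {Z,X}.
    return phi(x, y) * phi(y, z) * phi(z, x)

def marginal_cdf_x(x):
    # Marginalization in a CDN: evaluate the joint CDF with the other
    # arguments at +inf -- constant time, no summation or integration.
    return joint_cdf(x, np.inf, np.inf)

print(joint_cdf(0.5, 1.0, -0.2))
print(marginal_cdf_x(0.5))
```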

To compute conditional cumulative distributions in the network, we have developed the derivative-sum-product algorithm, in which messages correspond to local conditional CDFs obtained by computing derivatives of the joint cumulative distribution. Due to the unique statistical independence properties of the CDN, these messages do not in general have a one-to-one correspondence with messages exchanged in standard algorithms such as belief propagation.
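For example, with two variables, conditioning on an observation follows directly from differentiating the joint CDF (a standard identity, shown here to make the role of derivatives concrete):

```latex
% Conditional CDF of Z given X = x, obtained from derivatives of the joint CDF:
\[
F_{Z \mid X}(z \mid x)
  = \frac{\partial F_{X,Z}(x, z) / \partial x}{\partial F_X(x) / \partial x}.
\]
```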
Messages passed in the derivative-sum-product (DSP) algorithm for inference in CDNs. The DSP messages correspond to derivatives of the joint CDF with respect to local sets of variables, just as messages in standard belief propagation correspond to summations/integrals of the joint probability function of sets of variables.
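The following sketch verifies the derivative relationship numerically on the toy CDN above (it is not the DSP algorithm itself, which organizes the differentiation as local message passing; definitions are repeated so the snippet runs on its own):

```python
import numpy as np

def phi(a, b, theta=1.5):
    # Hypothetical local function: Gumbel copula with logistic marginals.
    Fa, Fb = 1.0 / (1.0 + np.exp(-a)), 1.0 / (1.0 + np.exp(-b))
    return np.exp(-(((-np.log(Fa)) ** theta + (-np.log(Fb)) ** theta) ** (1.0 / theta)))

def joint_cdf(x, y, z):
    return phi(x, y) * phi(y, z) * phi(z, x)

def density(x, y, z, h=1e-3):
    # Central finite-difference approximation of the mixed third derivative
    # d^3 F / (dx dy dz), which is the joint density of a valid CDF.
    total = 0.0
    for sx in (1, -1):
        for sy in (1, -1):
            for sz in (1, -1):
                total += sx * sy * sz * joint_cdf(x + sx * h, y + sy * h, z + sz * h)
    return total / (2 * h) ** 3

print(density(0.2, -0.1, 0.4))   # nonnegative if the local functions form a valid CDN
```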

 

Structured ranking learning using cumulative distribution networks

The joint CDF of many variables arises in problems of learning to rank objects, such as in document retrieval and in discovering regulatory sequences in computational systems biology. The problem of learning to rank from examples has recently gained significant interest in the machine learning community, with important similarities to and differences from the canonical problems of regression and classification. Unlike standard regression or classification, in which we predict outputs independently, in ranking we must predict structured outputs, so that misranking one object affects whether we correctly rank the others. In practice, ranking involves a large number of objects, and either approximate structured prediction methods are required or independence between object scores must be assumed to keep the problem tractable. CDNs provide a natural and flexible class of loss functionals for learning to rank which allow for a compact representation of the dependencies between the pairwise object preferences involved in ranking. By viewing loss functionals as CDNs, we can unify many previous methods for learning to rank that were developed in different fields for different applications: their underlying probabilistic models can be viewed as particular instances of CDNs, so that CDNs allow one to explore a broad class of flexible structured loss functionals for learning to rank.

The structured ranking learning framework using cumulative distribution networks. Given a set of observations consisting of order graphs over nodes to be ranked, we wish to learn a ranking function which assigns a score to any given node in an order graph. Preferences between nodes can be represented as random variables in a CDN, so that the joint CDF over the preferences is the probability of generating the observed orderings in the observations. Learning a ranking function under a structured loss functional specified by the CDN is then equivalent to maximizing the probability of generating observed orderings.
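A minimal sketch of this learning setup, under assumptions of our own for illustration: a linear scoring function, one preference variable per order-graph edge, and a product of logistic factors on score differences as the joint CDF (real applications would use richer CDN connectivity and stochastic optimization):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 5))                        # feature vectors for nodes V1..V4
edges = [(0, 1), (0, 2), (0, 3), (1, 3), (2, 3)]   # V1>V2, V1>V3, V1>V4, V2>V4, V3>V4

def loss(w):
    # -log of the joint CDF: a product of logistic factors on score differences.
    s = X @ w
    return sum(np.log1p(np.exp(-(s[i] - s[j]))) for i, j in edges)

def grad(w):
    s = X @ w
    g = np.zeros_like(w)
    for i, j in edges:
        g -= (1.0 / (1.0 + np.exp(s[i] - s[j]))) * (X[i] - X[j])
    return g

w = np.zeros(5)
for _ in range(200):                               # plain gradient descent
    w -= 0.1 * grad(w)

print(loss(w), np.argsort(-(X @ w)))               # learned scores should rank V1 first
```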

Applications of this novel framework include document retrieval for web search [2] and ranking regulatory sequences in computational biology [1]. In the latter case, the formulation of the problem of discovering regulatory sequences as one of learning to rank allows us to integrate multiple types of datasets and diverse computational predictions under a single model without making explicit assumptions about relationships between different biological features and their impacts on orderings over sequences. The use of the CDN framework provides a unified probabilistic view of learning to rank and a tractable representation which can be learned using stochastic optimization methods.

Converting an order graph to a cumulative distribution network. The order graph shown over nodes V1,V2,V3,V4 encodes the set of preferences V1 > V2, V1 > V3, V1 > V4, V2 > V4, V3 > V4. Each of these preferences is then assigned to a random variable in a CDN. The connectivity of the CDN then allows one to express statistical dependencies between preference variables which arise when learning to rank many objects.
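The conversion itself is straightforward bookkeeping. Following the order graph in the caption, each edge becomes a preference variable, and, as one possible design choice, a CDN function node connects every pair of preference variables that share an object:

```python
from itertools import combinations

# Edges of the order graph over V1..V4: Vi > Vj for each (i, j).
order_edges = [(1, 2), (1, 3), (1, 4), (2, 4), (3, 4)]

# One preference random variable per order-graph edge.
pref_vars = {e: f"R_{e[0]}{e[1]}" for e in order_edges}

# Connect preference variables that involve a common object (one possible
# choice of CDN connectivity; the papers may wire the functions differently).
cdn_functions = [(pref_vars[e1], pref_vars[e2])
                 for e1, e2 in combinations(order_edges, 2)
                 if set(e1) & set(e2)]

print(pref_vars)
print(cdn_functions)
```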

 


References:

[1] J.C. Huang and B.J. Frey (2009). STORMSeq: A method for ranking regulatory sequences by integrating experimental datasets with diverse computational predictions. Submitted.

[2] J.C. Huang and B.J. Frey (2009). Structured ranking learning using cumulative distribution networks. Advances in Neural Information Processing Systems (NIPS) 21.

[3] J.C. Huang and B.J. Frey (2008). Cumulative distribution networks and the derivative-sum-product algorithm. Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI).

Factor graphs and the sum-product algorithm

A factor graph (Kschischang, Frey, Loeliger, IEEE Trans. Info. Theory 2001) is a graphical model that decomposes interactions between variables more explicitly than Bayesian networks (BNs) and Markov random fields (MRFs) do. While functional relationships between variables in BNs and MRFs must be inferred by identifying parent-child clusters or maximal cliques, factor graphs identify functional relationships explicitly. Consequently, factor graphs are more useful for describing models that involve a large number of overlapping relationships between variables. Areas where factor graphs are now being used include gene interaction networks, low-level vision, signal processing, error-correcting codes and n-SAT (satisfiability) problems. In each of these applications, a large number of sets of interacting variables are subsets of other sets of interacting variables, so that many of the relationships are hidden in the conditional probability tables and clique potentials used in BNs and MRFs. In contrast, every relationship is identified by a function node in a factor graph.

The following figure shows a simple example of a factor graph (a) that describes a function on x, y and z of the form f(x,y)g(y,z)h(z,x). To describe these relationships, the MRF on the same 3 variables must have the form shown in (b), but this MRF indicates a functional form m(x,y,z), which does not convey information about the decomposition f(x,y)g(y,z)h(z,x). The BN on the same 3 variables must have the form shown in (c) (in fact, there are 3 forms that are the same up to an exchange of variables). However, this BN indicates a functional form b(x)q(y)s(x,y,z), which does not identify the correct decomposition.

The factor graph (a), MRF (b) and BN (c) for the decomposition f(x,y)g(y,z)h(z,x).
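To make the example concrete, the following sketch encodes f(x,y)g(y,z)h(z,x) over binary variables with made-up factor tables and computes a marginal by brute-force enumeration; the sum-product algorithm (illustrated further below) obtains the same marginals by message passing:

```python
import itertools
import numpy as np

# Hypothetical factor tables over binary variables (rows index the first argument).
f = np.array([[0.9, 0.1], [0.4, 0.6]])   # f(x, y)
g = np.array([[0.7, 0.3], [0.2, 0.8]])   # g(y, z)
h = np.array([[0.5, 0.5], [0.1, 0.9]])   # h(z, x)

def joint(x, y, z):
    # The decomposition the factor graph makes explicit.
    return f[x, y] * g[y, z] * h[z, x]

# Normalizing constant and the marginal p(x) by exhaustive summation.
Z = sum(joint(x, y, z) for x, y, z in itertools.product((0, 1), repeat=3))
p_x = [sum(joint(x, y, z) for y, z in itertools.product((0, 1), repeat=2)) / Z
       for x in (0, 1)]
print(p_x)
```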

While this example is quite trivial, it illustrates the kind of problem that arises in larger-scale models. For example, in a gene interaction network or an error-correcting code, the overall complexity of the interactions may be so large that the BN or MRF is fully connected. In contrast, the factor graph may be sparsely connected. Additional dummy variables may be added to MRFs and BNs so that they can emulate the variable-function decomposition of factor graphs, but since there is no common notation for doing this, the resulting graphical models are not easily interpreted.

Another advantage of factor graphs is that optimization algorithms for inferring the values of hidden variables can take advantage of the extensive decomposition of relationships between variables. In particular, the sum-product algorithm (a.k.a. belief propagation, loopy belief propagation, etc.) and its max-product or min-sum variant are naturally described as message-passing algorithms between the variable nodes and function nodes in the factor graph. The message update rules for BNs and MRFs are more complex than for factor graphs. More importantly, if a BN or MRF is not extended by including dummy variables, optimization in the BN or MRF can (and in many applications does) produce significantly worse results than in the factor graph.
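As a small illustration of the message-passing view: the triangle-shaped example above is loopy, so exact sum-product is easiest to show on a chain-structured factor graph x — f(x,y) — y — g(y,z) — z, where two messages suffice to obtain the marginal of z (factor tables reused from the sketch above; a hypothetical example, not a general implementation):

```python
import numpy as np

f = np.array([[0.9, 0.1], [0.4, 0.6]])    # f(x, y)
g = np.array([[0.7, 0.3], [0.2, 0.8]])    # g(y, z)

# Leaf variable x sends an all-ones message, so the factor-to-variable
# message from f to y sums f(x, y) over x.
m_f_to_y = f.sum(axis=0)

# Variable y forwards the product of its other incoming messages to g.
m_y_to_g = m_f_to_y

# Factor-to-variable message from g to z: sum over y of g(y, z) * incoming.
m_g_to_z = (m_y_to_g[:, None] * g).sum(axis=0)

p_z = m_g_to_z / m_g_to_z.sum()           # marginal of z after normalization
print(p_z)
```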

Until recently, the main advantage of BNs and MRFs over factor graphs was their ability to succinctly describe conditional- and marginal-independence relationships. However, an extension of the factor graph notation to include directed edges has enabled factor graphs to express all conditional- and marginal-independence relationships that can be expressed by BNs and MRFs. Interestingly, the rules for determining independence in a directed factor graph can be viewed as the intersection of the rules for BNs and MRFs, so the resulting set of rules is simpler than the rules for BNs or MRFs on their own. The following figure shows an example of a BN (a), its factor graph (b) and its directed factor graph (c). In (d), this directed factor graph is modified to capture an undirected relationship between u and v, a relationship that cannot be expressed by a BN. The MRF corresponding to this modified directed factor graph is shown in (e). The MRF captures the undirected relationship between u and v, but does not capture the directed relationships in the other part of the model. The MRF in (e) is converted to a factor graph in (f), which further illustrates that the MRF has lost information.

A BN (a), its factor graph (b) and its directed factor graph (c); the directed factor graph modified to capture an undirected relationship between u and v (d); the corresponding MRF (e) and its factor graph (f).

Interestingly, there turn out to be useful probability models whose independencies can be exactly expressed by directed factor graphs, but which cannot be expressed by any BN or MRF. Factor graphs are thus a strict superset of BNs and MRFs, both in terms of expressing factorization and in terms of expressing independencies.

References:

  • F. R. Kschischang, B. J. Frey and H. A. Loeliger (2001). Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory 47(2):498-519, February.

  • B. J. Frey (2003). Extending factor graphs so as to unify directed and undirected graphical models. Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI 2003), Acapulco, Mexico, August. Morgan Kaufmann.