Graphical models and inference algorithms
Cumulative distribution networks
Traditional graphical models express a decomposition of a joint distribution as a product of functions of subsets of variables. However, in some situations, such as when learning to rank documents in information retrieval or ranking regulatory elements in computational systems biology, it is more natural to think of each network function as providing evidence about possible orderings of its variables. We formalize this by introducing a new type of graphical model called a 'cumulative distribution network' (CDN), which uses a product of functions of subsets of variables to describe the joint cumulative distribution function (CDF). The conditional independence properties of CDNs are quite different from those of other graphical models. Unlike in Bayesian networks, Markov networks and factor graphs, in a CDN the marginal distribution of any variable can be computed in constant time while still allowing for complex conditional dependencies.
Figure: an example of a CDN defined over three variables X, Y and Z. The product of the local CDN functions yields the joint CDF over X, Y and Z.
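As a concrete illustration of constant-time marginalization, here is a minimal sketch of a CDN over three variables. The choice of local function (Gumbel's bivariate logistic CDF) and the two-function graph structure are hypothetical choices made for this example, not the specific functions from the papers; the key point is that a marginal CDF is obtained simply by evaluating the joint CDF with all other variables sent to +infinity.

```python
import math

def phi(u, v):
    # Gumbel's bivariate logistic CDF: a simple valid two-variable CDF
    # (monotone in each argument, tending to 1 as both arguments grow).
    return 1.0 / (1.0 + math.exp(-u) + math.exp(-v))

def joint_cdf(x, y, z):
    # CDN over (X, Y, Z): the joint CDF is a product of local CDF
    # functions, here one on (X, Y) and one on (Y, Z).
    return phi(x, y) * phi(y, z)

INF = 50.0  # large finite stand-in for +infinity

def marginal_cdf_x(x):
    # Marginalization in a CDN is constant-time: no sums or integrals
    # over Y and Z are needed, just one evaluation of the joint CDF.
    return joint_cdf(x, INF, INF)
```

Because no summation or integration over the remaining variables is required, the cost of evaluating a marginal does not grow with the number of variables in the network.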
The joint CDF of many variables arises in problems of learning to rank objects, such as in document retrieval and in discovering regulatory sequences in computational systems biology. The problem of learning to rank from examples has recently gained significant interest in the machine learning community, with important similarities to and differences from the canonical problems of regression and classification. Unlike standard regression or classification, in which we predict outputs independently, in ranking we must predict structured outputs, so that misranking one object affects whether we correctly rank the other objects. In practice, the problem of ranking involves a large number of objects to be ranked, and either approximate structured prediction methods are required, or assumptions of independence between object scores must be made in order to make the problem tractable. CDNs provide a natural and flexible class of loss functionals for learning to rank which allow for a compact representation of the dependencies between pairwise object preferences involved in ranking. By viewing loss functionals as CDNs, we can unify many previous methods for learning to rank, which were all developed in different fields for different applications. The underlying probabilistic models can be viewed as particular instances of CDNs, so that CDNs allow one to explore a broad class of flexible structured loss functionals for learning to rank.
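To make the idea of a structured ranking loss concrete, the following is a hypothetical sketch (not the specific loss from the papers) in which each observed pairwise preference contributes one local function, a logistic CDF of the score difference, and the product over pairs plays the role of a joint CDF over preference events.

```python
import math

def sigmoid(t):
    # The logistic CDF, used as the local function for one pairwise preference.
    return 1.0 / (1.0 + math.exp(-t))

def rank_likelihood(scores, preferences):
    # Product of local functions, one per observed preference (i ranked
    # above j): each factor is a CDF of the score difference s_i - s_j.
    p = 1.0
    for i, j in preferences:
        p *= sigmoid(scores[i] - scores[j])
    return p

def rank_loss(scores, preferences):
    # Negative log-likelihood: a structured loss that decomposes over
    # the pairwise preferences rather than over independent objects.
    return -math.log(rank_likelihood(scores, preferences))
```

With scores {"a": 2.0, "b": 1.0, "c": 0.0}, the loss for the consistent ordering a > b > c is much lower than for the reversed ordering, since every factor in the product rewards a correctly ordered pair.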
Applications of this novel framework include document retrieval for web search [2] and ranking regulatory sequences in computational biology [1]. In the latter case, the formulation of the problem of discovering regulatory sequences as one of learning to rank allows us to integrate multiple types of datasets and diverse computational predictions under a single model without making explicit assumptions about relationships between different biological features and their impacts on orderings over sequences. The use of the CDN framework provides a unified probabilistic view of learning to rank and a tractable representation which can be learned using stochastic optimization methods.
References:
[1] J.C. Huang and B.J. Frey (2009). STORMSeq: a method for ranking regulatory sequences by integrating experimental datasets with diverse computational predictions. Submitted.
[2] J.C. Huang and B.J. Frey (2009). Structured ranking learning using cumulative distribution networks. Advances in Neural Information Processing Systems (NIPS) 21.
[3] J.C. Huang and B.J. Frey (2008). Cumulative distribution networks and the derivative-sum-product algorithm. Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI).
Factor graphs and the sum-product algorithm
A factor graph (Kschischang, Frey and Loeliger, IEEE Trans. Info. Theory, 2001) is a graphical model that more clearly decomposes interactions between variables, compared to Bayesian networks (BNs) and Markov random fields (MRFs). While functional relationships between variables in BNs and MRFs must be determined by identifying parent-child clusters or maximal cliques, factor graphs explicitly identify functional relationships. Consequently, factor graphs are more useful for describing models that involve a large number of overlapping relationships between variables. Areas where factor graphs are now being used include gene interaction networks, low-level vision, signal processing, error-correcting codes and n-SAT (satisfiability) problems. In each of these applications, a large number of sets of interacting variables are subsets of other sets of interacting variables, so that many of the relationships are hidden in the conditional probability tables and the clique potentials used in BNs and MRFs. In contrast, every relationship is identified by a function node in a factor graph. The following figure shows a simple example of a factor graph (a) that describes a function on x, y and z of the form f(x,y)g(y,z)h(z,x). To describe these relationships, the MRF on the same 3 variables must have the form shown in (b), but this MRF indicates a functional form m(x,y,z), which does not convey information about the decomposition f(x,y)g(y,z)h(z,x). The BN on the same 3 variables must have the form shown in (c) (in fact, there are 3 forms that are the same up to an exchange of variables). However, this BN indicates a functional form b(x)q(y)s(x,y,z), which does not identify the correct decomposition.
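The distinction between the factorized form f(x,y)g(y,z)h(z,x) and the monolithic potential m(x,y,z) can be made concrete with a small sketch over binary variables; the factor values below are made up purely for illustration.

```python
# Hypothetical binary-valued factors for the decomposition f(x,y)g(y,z)h(z,x).
f = {(0, 0): 0.9, (0, 1): 0.1, (1, 0): 0.4, (1, 1): 0.6}
g = {(0, 0): 0.7, (0, 1): 0.3, (1, 0): 0.2, (1, 1): 0.8}
h = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.1, (1, 1): 0.9}

def m(x, y, z):
    # The monolithic MRF potential m(x,y,z) hides this decomposition;
    # the factor graph names f, g and h as separate function nodes.
    return f[(x, y)] * g[(y, z)] * h[(z, x)]

def marg_x(y, z):
    # Summing x out of the factorized form: g(y,z) is constant in x,
    # so only f and h participate in the sum.
    return g[(y, z)] * sum(f[(x, y)] * h[(z, x)] for x in (0, 1))
```

Exploiting the decomposition this way is exactly what lets inference algorithms restrict each summation to the factors that actually mention the variable being eliminated.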
While this example is quite trivial, it illustrates the kind of problem that arises in larger-scale models. For example, in a gene interaction network or an error-correcting code, the overall complexity of the interactions may be so large that the BN or MRF is fully connected. In contrast, the factor graph may be sparsely connected. Additional dummy variables may be added to MRFs and BNs so that they can emulate the variable-function decomposition of factor graphs, but since there is no common notation for doing this, the resulting graphical models are not easily interpreted.
Another advantage of factor graphs is that optimization algorithms for inferring the values of hidden variables can take advantage of the extensive decomposition of relationships between variables. In particular, the sum-product algorithm (a.k.a. belief propagation, loopy belief propagation, etc.) and its max-product or min-sum variant are naturally described as message-passing algorithms between the variable nodes and function nodes in the factor graph. The message update rules for BNs and MRFs are more complex than for factor graphs. More importantly, if a BN or MRF is not extended by including dummy variables, optimization in the BN or MRF can (and in many applications does) produce significantly worse results than in the factor graph.
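To illustrate the message-passing view, here is a minimal sum-product sketch on a chain with two factors, a(X,Y) and b(Y,Z), using made-up factor tables; the marginal of Y computed from messages matches brute-force enumeration.

```python
VALS = (0, 1)

# Hypothetical factor tables for a chain factor graph with factors
# a(X, Y) and b(Y, Z) connecting variables X, Y and Z.
a = {(0, 0): 0.6, (0, 1): 0.4, (1, 0): 0.3, (1, 1): 0.7}
b = {(0, 0): 0.5, (0, 1): 0.5, (1, 0): 0.9, (1, 1): 0.1}

def sum_product_marginal_y():
    # Message from factor a to variable Y: sum out X.
    msg_a_to_y = {y: sum(a[(x, y)] for x in VALS) for y in VALS}
    # Message from factor b to variable Y: sum out Z.
    msg_b_to_y = {y: sum(b[(y, z)] for z in VALS) for y in VALS}
    # Y's marginal is the normalized product of its incoming messages.
    unnorm = {y: msg_a_to_y[y] * msg_b_to_y[y] for y in VALS}
    total = sum(unnorm.values())
    return {y: unnorm[y] / total for y in VALS}

def brute_force_marginal_y():
    # Direct enumeration of the joint a(x,y)b(y,z), for comparison.
    unnorm = {y: sum(a[(x, y)] * b[(y, z)] for x in VALS for z in VALS)
              for y in VALS}
    total = sum(unnorm.values())
    return {y: unnorm[y] / total for y in VALS}
```

On a tree-structured graph like this chain, sum-product is exact; on graphs with cycles the same local updates give the loopy belief propagation approximation mentioned above.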
Until recently, the main advantage of BNs and MRFs over factor graphs was their ability to succinctly describe conditional- and marginal-independence relationships. However, an extension of the factor graph notation to include directed edges has enabled factor graphs to express all conditional- and marginal-independence relationships that can be expressed by BNs and MRFs. Interestingly, the rules for determining independence in a directed factor graph can be viewed as the intersection of the rules for BNs and MRFs, so the resulting set of rules is simpler than for BNs or MRFs on their own. The following figure shows an example of a BN (a), its factor graph (b) and its directed factor graph (c). In (d), this directed factor graph is modified to capture an undirected relationship between u and v, a relationship that cannot be expressed by a BN. The MRF corresponding to this modified directed factor graph is shown in (e). The MRF captures the undirected relationship between u and v, but does not capture the directed relationships in the other part of the model. The MRF in (e) is converted to a factor graph in (f), which further illustrates that the MRF has lost information.
Interestingly, it turns out that there are useful probability models whose independencies can be exactly expressed by directed factor graphs, but which cannot be expressed by any BN or MRF. Factor graphs therefore form a strict superset of BNs and MRFs, both in terms of expressing factorization and expressing independencies.
References:

F.R. Kschischang, B.J. Frey and H.-A. Loeliger (2001). Factor graphs and the sum-product algorithm. IEEE Transactions on Information Theory 47(2):498-519.

B.J. Frey (2003). Extending factor graphs so as to unify directed and undirected graphical models. Proceedings of the Nineteenth Conference on Uncertainty in Artificial Intelligence (UAI 2003), Morgan Kaufmann, Acapulco, Mexico.