Traditional graphical models express a decomposition of a joint distribution as a product of functions of subsets of variables. However, in some situations such as when learning to rank documents in information retrieval or ranking regulatory elements in computational systems biology, it is more natural to think of each network function as providing evidence about possible orderings of its variables. We formalize this by introducing a new type of graphical model called a `cumulative distribution network' (CDN), which uses a product of functions of subsets of variables to describe the joint cumulative distribution (CDF). The conditional independence properties of CDNs are quite different from other graphical models. Unlike Bayesian networks, Markov networks and factor graphs, in a CDN the marginal distribution of any variable can be computed in constant time whilst allowing for complex conditional dependencies.

The joint CDF of many variables arises in problems of learning to rank objects, such as in document retrieval and in discovering regulatory sequences in computational systems biology. The problem of learning to rank from examples has recently gained significant interest in the machine learning community, with importance similarities and differences with the canonical problems of regression and classification. Unlike standard regression or classification in which we predict outputs independently, in ranking we must predict structured outputs so that misranking one object affects whether we correctly rank the other objects. In practice, the problem of ranking involves a large number of objects to be ranked and either approximate structured prediction methods are required, or assumptions of independence between object scores must be made in order to make the problem tractable. CDNs provide a natural and flexible class of loss functionals for learning to rank which allow for a compact representation of the dependencies between pairwise object preferences involved in ranking. By viewing loss functionals as CDNs, we can unify many previous methods for learning to rank, which were all developed in different fields for different applications. The underlying probabilstic models can be viewed as particular instances of CDNs, so that CDNs allow one to explore of a broad class of flexible structured loss functionals for learning to rank.

Applications of this novel framework include document retrieval for web search [2] and ranking regulatory sequences in computational biology [1]. In the latter case, the formulation of the problem of discovering regulatory sequences as one of learning to rank allows us to integrate multiple types of datasets and diverse computational predictions under a single model without making explicit assumptions about relationships between different biological features and their impacts on orderings over sequences. The use of the CDN framework provides a unified probabilistic view of learning to rank and a tractable representation which can be learned using stochastic optimization methods.
References:
J.C. Huang and B.J. Frey. (2009) STORMSeq: A method for ranking regulatory sequences by integrating experimental datasets with diverse computational predictions
Submitted.
J.C. Huang and B.J. Frey. (2009) Structured ranking learning using cumulative distribution networks.
Advances in Neural Information Processing Systems (NIPS) 21. Supplementary File
J.C. Huang and B.J. Frey. (2008) Cumulative distribution networks and the derivative-sum-product algorithm.
Proceedings of the Twenty-Fourth Conference on Uncertainty in Artificial Intelligence (UAI).