The contact information here is out of date — for current
info see www.cs.columbia.edu/~delbert.

Delbert Dueck
Ph.D. candidate with:
Probabilistic and Statistical Inference
group
(supervisor: Prof.
Brendan J. Frey)
Office:
BA 4165 (Bahen
Centre for Information Technology)
e-mail:
d
e l
b e
r t
@ p
s i
. t
o r
o n
t o
. e
d u
tel: +1 416 946 8809
Affiliation:
Electrical and Computer Engineering
University of Toronto
10 King's College Road
Toronto, ON M5S 3G4
Canada
Latest Research: Affinity Propagation
Here is my Ph.D. thesis, Affinity Propagation: Clustering Data by Passing Messages (smaller viewable PDF, large printable ZIP)
How would you identify a small number of face images that together accurately represent a data set of face images? How would you
identify a small number of sentences that accurately reflect the content of a document? How would you identify a small number of cities that are
most easily accessible from all other cities by commercial airline? How would you identify segments of DNA that reflect the expression properties
of genes? Data centers, or exemplars, are traditionally found by randomly choosing an initial subset of data points and then iteratively refining
it, but this only works well if that initial choice is close to a good solution. Affinity propagation is a new algorithm that takes as input measures
of similarity between pairs of data points and simultaneously considers all data points as potential exemplars. Real-valued messages are exchanged
between data points until a high-quality set of exemplars and corresponding clusters gradually emerges. We have used affinity propagation to
solve a variety of clustering problems and we found that it uniformly found clusters with much lower error than those found by other methods,
and it did so in less than one-hundredth the amount of time. Because of its simplicity, general applicability, and performance, we believe
affinity propagation will prove to be of broad value in science and engineering.
The following animated visualization shows affinity propagation gradually identifying three exemplars among a small set of two-dimensional data points:
For more information, see http://www.psi.toronto.edu/affinitypropagation.
References:
Frey, B.J. and Dueck, D. (2008) Affinity propagation and the vertex substitution heuristic. Science 319, 726d.
Dueck, D., Frey, B.J. (2008) Which algorithm finds the best exemplars quickly? Submitted to International Conference on Machine Learning (ICML), 05/07/2008 09/07/2008, Helsinki, Finland.
Dueck, D., Frey, B.J., Jojic, N. and Jojic, V. (2008) Constructing Treatment Portfolios Using Affinity Propagation. To appear at International Conference on Research in Computational Molecular Biology (RECOMB), 30/03/2008 02/04/2008, Singapore.
Dueck, D. and Frey, B.J. (2007) Non-metric affinity propagation for unsupervised image categorization. IEEE International Conference on Computer Vision (ICCV).
Frey, B.J. and Dueck, D. (2007) Clustering by Passing Messages Between Data Points. Science 315: 972976.
Frey, B.J. and Dueck, D. (2006) Mixture modeling by Affinity Propagation. Neural Information Processing Systems. 18: 379386.
Earlier Research: Probabilistic Sparse Matrix Factorization
We propose a new method of finding sparse matrix factorizations employing probabilistic inference techniques, and apply them to discover structures underlying gene expression data.
Given a data matrix XÎÂG×T, containing G T-dimensional data points, we perform matrix factorization (X»Y·Z). Specifically, we find a matrix of hidden factors (or causes), ZÎÂC×T, and a factor loading matrix, YÎÂG×C, under a structural sparseness constraint that each row of Y contains at most N (of C possible) non-zero entries. Intuitively, this corresponds to explaining each row-vector of X as a linear combination (weighted by the corresponding row in Y) of a small subset of factors (causes) given by rows of Z. Note that this framework includes simple clustering (N=1) and ordinary low-rank approximation (N=C) as special cases.
We construct a probability model presuming Gaussian sensor noise in X (i.e. X=Y·Z+noise) and uniformly distributed factor assignments. We utilize a factorized variational method to perform tractable inference on the latent variables, and also account for noise in the factor matrix (Z) and uncertainty in assigning factors (Y). We explore several different factorization depths to allow a tradeoff between increased model accuracy and decreased computational complexity.
Figure: Probabilistic Sparse Matrix Factorization of a
22709×55 gene expression array
Click for animation (1.2 MB)
We apply our algorithm to mouse gene expression data from the Hughes Lab, as shown in the figure above. We show a data set of G=22709 gene expression profiles (expressed across T=55 tissues) approximated by linear combinations of at most N=3 (of a possible C=50) hidden transcription factor profiles. Experimental results demonstrate that our method can identify genetic functional categories with higher statistical significance than the method currently in greatest use, hierarchical agglomerative clustering.
References:
Dueck, D., Morris, Q., and Frey, B. (2005) Multi-way clustering of microarray data using probabilistic sparse matrix factorization. Presented at the 13th Annual International conference on Intelligent Systems for Molecular Biology (ISMB 2005).
Dueck, D., Huang, J., Morris, Q.D., and Frey, B. (2004) Iterative Analysis of Microarray Data. 42nd Annual Allerton Conference on Communication, Control, and Computing. (oral presentation 30/09/2004)
Dueck, D. (2004) Probabilistic Sparse Matrix Factorization. Third Annual Information Processing Workshop, Toronto, Canada (orally presented 11/08/2004)
Dueck, D. and Frey, B. (2004) Probabilistic Sparse Matrix Factorization. University of Toronto Technical Report PSI-2004-23.
Dueck, D. (2004) MATLAB PSMF code
(updated 11/03/2009)