**Click here to analyze data using the affinity propagation web application**

**Affinity Propagation FAQ**

Interested in a **commercial license** for the extended software toolkit?

Send email to email to Brendan Frey

**Data sets and software are provided at the bottom of this page**

Related Publications

Clustering by Passing Messages Between Data Points.

**[PDF]**[BibTeX]

Brendan J. Frey and Delbert Dueck, University of Toronto

*Science*

**315**, 972–976, February 2007

A Binary Variable Model for Affinity Propagation [PubMed]

Inmar E. Givoni and Brendan J. Frey

*Neural Computation*,Vol 21, issue 6, pp 1589-1600, June 2009

**Applications and Extensions**

Constructing Treatment Portfolios Using Affinity Propagation.

Delbert Dueck, Brendan J. Frey, Nebojsa Jojic, Vladimir Jojic, Guri Giaever, Andrew Emili, Gabe Musso, Robert Hegele

International Conference on Research in Computational Molecular Biology (RECOMB), March 2008, Singapore

Non-metric affinity propagation for unsupervised image categorization.

Delbert Dueck and Brendan J. Frey

International Conference on Computer Vision (ICCV), October 2007, Rio de Janeiro, Brazil

FLoSS: Facility Location for Subspace Segmentation

Nevena Lazic, Inmar E. Givoni, Parham Aarabi and Brendan J. Frey

12th International Conference on Computer Vision (ICCV), October 2009, Kyoto, Japan.

Semi-supervised Affinity Propagation With Instance-Level Constraints

Inmar E. Givoni and Brendan J. Frey

12th International Conference on Artificial Intelligence and Statistics (AISTATS), April 2009, Clearwater, FL.

Solving the Uncapacitated Facility Location Problem Using Message Passing Algorithms.

Nevena Lazic, Brendan J. Frey, Parham Aarabi

International Conference on Artificial Intelligence and Statistics (AISTATS), May 2010, Sardinia

Hierarchical Affinity Propagation

Inmar E. Givoni, Clement Chung, Brendan J. Frey

27th Conference on Uncertainty in Artificial Intelligence (UAI), July 2011, Barcelona, Spain.

What is Affinity Propagation?

How would you identify a small number of face images that together accurately represent a data set of face images? How would you identify a small number of sentences that accurately reflect the content of a document? How would you identify a small number of cities that are most easily accessible from all other cities by commercial airline? How would you identify segments of DNA that reflect the expression properties of genes? Data centers, or exemplars, are traditionally found by randomly choosing an initial subset of data points and then iteratively refining it, but this only works well if that initial choice is close to a good solution.

Affinity propagation is a new algorithm
that takes as input measures of similarity between pairs of data points and
simultaneously considers *all* data points as potential exemplars.
Real-valued messages are exchanged between data points until a high-quality
set of exemplars and corresponding clusters gradually emerges. We have used
affinity propagation to solve a variety of clustering problems and we found
that it uniformly found clusters with much lower error than those found by
other methods, and it did so in less than one-hundredth the amount of time.
Because of its simplicity, general applicability, and performance, we
believe affinity propagation will prove to be of broad value in science and
engineering.

The following video shows affinity propagation gradually identifying three exemplars among a small set of two-dimensional data points.

Software

**OUR BEST SOFTWARE TO DATE**

MATLAB functions |
DOWNLOAD apcluster.m(easy to use/modify, but no sparse capability) apclusterSparse.m (includes sparse capability) |
||

R package |
APCluster is available through CRAN(ported from the Matlab code listed above) |
||

Compiled MEX-file for MATLAB(faster + sparse, but more complicated) |
Windows (32-bit) 2009 7.6 (R2008a) 7.5 (R2007b) 7.4 (R2007a) 7.2 (R2006a) 7.0 (R14) 6.5 (R13) |
Linux (32-bit) 7.5 (R2007b) 7.4 (R2007a) 7.2 (R2006a) |
Linux (64-bit) 2009 7.6 (R2008a) 7.5 (R2007b) 7.4 (R2007a) 7.2 (R2006a) |

Executables and Libraries(sparse + no MATLAB, but complicated) |
Windows (32-bit).EXE + .DLL |
Linux (32-bit).bin + .so |
Linux (64-bit).bin + .so |

**OTHER MATLAB FUNCTIONS**

Click here for MATLAB function

**preferenceRange.m**, which tells you what range of preferences is appropriate for your data set

Click here for MATLAB function

**apclusterK.m**, which finds a user-specified number of clusters (apcluster.m is called by this function).

**Warning**: This function calls apcluster.m many times, while searching for the preference value that gives the requested number of clusters. If you really want to find exactly

*K*clusters, this function may work for your problem, but if you're willing to play with the preference value yourself (recommended), we recommend you use the basic apcluster.m script.

Click here for an older (Feb. 2007) version of the MATLAB function or unsupported C-code (warning: does not handle sparse or infinite cases reliably).

Click here for an archive of other methods (e.g.

*k*-centers, quantized

*k*-means, etc.)

Click here for the Leveraged Affinity Propagation script (see FAQ for details)

Comparison between Affinity Propagation and the Vertex Substitution Heuristic (VSH).

**ULTRA-SHORT MATLAB SCRIPT**

Click here for MATLAB script

The only input to this 24-line version is the MATLAB matrix of similarities,

*S*, where

*S*(

*i*,

*k*) is the similarity that data instance

*k*is the exemplar for data instance

*i*(e.g.,

*S*(

*i*,

*k*) = -(

*x*-

_{i}*x*)

_{k}^{2}is the similarity based on squared error). The preferences should be placed on the diagonal of this matrix; if appropriate preferences are not known, usually, a good choice for the preferences is to set them all equal to the median of the other similarities. The MATLAB code executes 100 iterations of affinity propagation. After execution, the combined evidence

*r*(

*i*,

*k*)+

*a*(

*i*,

*k*) is stored in the

*N*×

*N*matrix

`E`

, the number of exemplars is stored in
`K`

, the data instance indices of the exemplars
are stored in the *K*-vector

`I`

, and the exemplar indices of the data
instances are stored in the *N*-vector

`idx`

.
(Note, instance *i*is assigned to the data instance with index

`idx(i)`

.)
Data Sets

**CLUSTERING TWO-DIMENSIONAL DATA POINTS**

The similarity between every pair of 2D data points was set to the negative squared distance between the points. To prevent degenerate solutions, where affinity propagation tries to place two points in one cluster, but both data points are equally good as cluster centers, Gaussian noise with σ=10

^{-12}was added to the similarities, before affinity propagation was applied.

- Text files containing 2-D data points, similarities, and preferences (median of
*S*) - MATLAB file containing data, similarities and preferences

**CLUSTERING IMAGES DERIVED FROM OLIVETTI FACE DATABASE**

Each 64×64 face image from the first 100 images in the Olivetti database was smoothed using a Gaussian kernel with σ=0.5 and then rotated by -10°, 0° and 10° and scaled by a factor of 0.9, 1.0 and 1.1 (using nearest-neighbor interpolation), to produce a total of 900 images. To avoid including the background behind each face, a central window of size 50×50 pixels was extracted. Finally, the pixels in each 50×50 image were normalized to have mean 0 and variance 0.1. The similarity between two images was set to the negative sum of squared pixel differences.

- Image showing all the data instances
- Text files containing similarities (5MB) and preferences
- MATLAB file containing similiarities based on squared error (19MB)

**FINDING GENES AND EXONS USING PUTATIVE EXON EXPRESSION DATA**

The microarray data reported in Frey et al. (

*Nature Genetics*, 2005) and available at http://www.psi.toronto.edu/genrate was used to construct the following set of pair-wise similarities for 75,066 putative exons from mouse Chromosome 1. By clustering similar putative exons, it is possible to detect bona fide exons and genes. 'Only' 13 million pair-wise similarities were computed, because it is biologically implausible that putative exons that are far apart in the genome are part of the same gene and thus the similarities between distant putative exons need not be computed. To vary the number of detected exons (and genes) increase or decrease all preferences by the same common value, before running affinity propagation. Make sure to use the 'sparse' version, or the software will try to create several 75,067×75,067 full matrices! Also, MATLAB requires 1.2 GB of available RAM for this data, so a clean workspace is important.

- MATLAB file containing similiarities based on proximity of putative exon probes in the genome and similarity in expression of putative exon probes across 12 tissue samples (174MB!)
- Click here for a script that computes the true positive and false positive exon detection rates

**DETECTING REPRESENTATIVE SENTENCES IN A MANUSCRIPT**

Sentences from the main text of the paper 'Clustering by passing messages between data points' (

*Science*

**315**, 972–976) were used to construct a similarity matrix, based on how useful one sentence was for encoding the words in another sentence.

- Text file containing sentences
- Text file containing
similarities based on information coding costs (see supporting online
material in
*Science*paper) - Text file containing the preference that each sentence is a summary sentence, based on information coding costs (longer summary sentences have lower preferences, because they take more bits to encode)
- MATLAB file containing similarities and summary sentence preferences, based on information coding costs

**DETECTING CITIES THAT ARE EASILY ACCESSED FROM OTHER CITIES BY COMMERCIAL AIRLINES**

A web site providing estimated airline travel times (including stop-overs, headwinds, etc) between pairs of Canadian and American cities was queried to obtain a pair-wise similarity matrix, where the similarity of city

*i*to city

*k*was set to the negative estimated travel time.

**Note: Results reported in**

*Science*were obtained using a damping factor of 0.8.- Text files containing city names, similarities, and preferences.
- MATLAB file containing similarities and preferences, based on commercial airline transit time between cities

**ADDITIONAL DATA SETS CAN BE FOUND HERE**