
Department of Chemistry


The analysis of large databases aims to obtain a synthetic description of a system that reveals its salient features.
We will describe an approach for charting complex and heterogeneous data spaces, which provides a topography of the high-dimensional probability distribution from which the data are harvested. This topography includes the number and height of the probability peaks, the depth of the "valleys" separating them, the relative location of the peaks, and their hierarchical organization. It is reconstructed with an unsupervised variant of Density Peak clustering [Science 344, 1492 (2014)] that exploits a non-parametric density estimator [J. Chem. Theory Comput. 14, 1206 (2018)], which automatically measures the density on the manifold containing the data [Sci. Rep. 7, 12140 (2017)]. Importantly, the density estimator also provides an estimate of its error. This key feature makes it possible to distinguish genuine probability peaks from density fluctuations due to finite sampling.
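For readers unfamiliar with Density Peak clustering, the following is a minimal sketch in Python of the basic idea, not the speaker's implementation. For each point it computes a density rho and a distance delta to the nearest point of higher density. It picks centres with large rho * delta and assigns every other point the label of its nearest denser neighbour. Several assumptions are made here. The density is a crude k-nearest-neighbour proxy (hypothetical helper `knn_density`) rather than the non-parametric estimator cited above, and it carries no error bar. The number of centres is fixed by hand. The helper `is_genuine_peak` and its threshold `z` are an assumed illustration of how an error estimate could separate real peaks from sampling noise: a peak is kept only if its log-density exceeds that of the saddle by more than z combined error bars.

```python
import math


def knn_density(dist, k):
    """Hypothetical k-NN density proxy: rho_i = 1 / r_k(i).

    Stand-in for a proper non-parametric estimator; it ignores the
    intrinsic dimension and provides no error estimate.
    """
    rho = []
    for row in dist:
        r_k = sorted(row)[k]  # row contains 0.0 for the point itself, so index k is the k-th neighbour
        rho.append(1.0 / (r_k + 1e-12))
    return rho


def density_peaks(points, k=3, n_centers=2):
    """Basic density-peak clustering: rho, delta, centres, assignment."""
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]
    rho = knn_density(dist, k)

    # Visit points from highest to lowest density.
    order = sorted(range(n), key=lambda i: -rho[i])
    delta = [0.0] * n
    parent = [-1] * n  # index of the nearest point with higher density
    for pos, i in enumerate(order):
        if pos == 0:
            delta[i] = max(dist[i])  # usual convention for the global maximum
            continue
        j = min(order[:pos], key=lambda m: dist[i][m])
        delta[i] = dist[i][j]
        parent[i] = j

    # Centres: the global density maximum plus the largest rho * delta values.
    top = order[0]
    ranked = sorted((i for i in range(n) if i != top),
                    key=lambda i: -rho[i] * delta[i])
    centers = [top] + ranked[:n_centers - 1]

    labels = [-1] * n
    for c, i in enumerate(centers):
        labels[i] = c
    # Each remaining point inherits the label of its nearest denser neighbour,
    # which has already been labelled because we move down in density.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, centers, rho, delta


def is_genuine_peak(log_rho_peak, err_peak, log_rho_saddle, err_saddle, z=1.65):
    """Assumed significance test: keep the peak only if it rises above the
    saddle by more than z combined error bars (z is an arbitrary knob)."""
    return (log_rho_peak - log_rho_saddle) > z * (err_peak + err_saddle)


if __name__ == "__main__":
    pts = [(0, 0), (0.1, 0), (0, 0.1), (0.1, 0.1), (0.05, 0.05), (0.2, 0.1),
           (10, 10), (10.3, 10), (10, 10.3), (10.3, 10.3), (10.15, 10.15), (10.5, 10.2)]
    labels, centers, _, _ = density_peaks(pts, k=3, n_centers=2)
    print(labels)  # expected: [0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1]
```

In the unsupervised variant described in the abstract, the number of centres would not be fixed in advance. Instead, a test in the spirit of `is_genuine_peak`, fed by the estimator's error bars, would decide which peaks survive and how they merge into a hierarchy.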

Further information


Dec 4th 2019
14:15 to 15:15


Department of Chemistry, Cambridge, Unilever lecture theatre


Theory - Chemistry Research Interest Group