
Department of Chemistry

 

The analysis of large databases aims to obtain a synthetic description of a system that reveals its salient features.
We will describe an approach for charting complex and heterogeneous data spaces, providing a topography of the high-dimensional probability distribution from which the data are drawn. This topography includes information on the number and height of the probability peaks, the depth of the "valleys" separating them, the relative location of the peaks, and their hierarchical organization. The topography is reconstructed using an unsupervised variant of Density Peak clustering [Science 344, 1492 (2014)] that exploits a non-parametric density estimator [J. Chem. Theory Comput. 14, 1206 (2018)], which automatically measures the density in the manifold containing the data [Sci. Rep. 7, 12140 (2017)]. Importantly, the density estimator also provides an estimate of its own error. This is a key feature, because it makes it possible to distinguish genuine probability peaks from density fluctuations due to finite sampling.
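
As a rough illustration of the idea behind Density Peak clustering, the sketch below implements the classic scheme in Python with NumPy: estimate a density for each point, find for each point the distance to its nearest neighbour of higher density, and treat points where both quantities are large as peaks. It is not the unsupervised variant presented in the talk. The k-nearest-neighbour density proxy, the function name `density_peaks`, and the parameters `k` and `n_peaks` are all assumptions made for this example, and the number of peaks is supplied by hand rather than inferred from an error estimate on the density.

```python
import numpy as np

def density_peaks(X, k=10, n_peaks=2):
    """Toy Density Peak clustering (hypothetical helper, illustration only)."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    # Pairwise Euclidean distances between all points.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Assumed density proxy: inverse distance to the k-th nearest neighbour
    # (column 0 of each sorted row is the point itself).
    r_k = np.sort(dist, axis=1)[:, k]
    rho = 1.0 / r_k
    order = np.argsort(-rho)  # indices from densest to sparsest
    delta = np.empty(n)       # distance to nearest point of higher density
    parent = np.full(n, -1)   # index of that nearest higher-density point
    # By convention the densest point gets its largest distance to any point.
    delta[order[0]] = dist[order[0]].max()
    for rank in range(1, n):
        i = order[rank]
        higher = order[:rank]
        j = higher[np.argmin(dist[i, higher])]
        delta[i] = dist[i, j]
        parent[i] = j
    # Peaks: points with the largest product of density and delta.
    peaks = np.argsort(-(rho * delta))[:n_peaks]
    labels = np.full(n, -1)
    labels[peaks] = np.arange(n_peaks)
    # Every other point inherits the label of its nearest denser neighbour,
    # visiting points in order of decreasing density.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[parent[i]]
    return labels, rho, delta
```

Calling `density_peaks(X, k=10, n_peaks=2)` on an array `X` of shape `(n_points, n_features)` returns one cluster label per point together with the density and distance values used to pick the peaks. In this toy version a density fluctuation caused by finite sampling cannot be told apart from a genuine peak, which is the gap the error estimate described above is meant to close.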

Further information

Time:

Dec 4th 2019
14:15 to 15:15

Venue:

Department of Chemistry, Cambridge, Unilever lecture theatre

Series:

Theory - Chemistry Research Interest Group