What do I do?
I am a Lecturer in Molecular Informatics (also known as Chem(o)informatics) in the Unilever Centre for Molecular Sciences Informatics. My work focuses on predicting the properties of molecules, mainly small molecules that could be of therapeutic interest, primarily in the life sciences.
Why is this relevant?
We are currently witnessing a rapid growth of available data in both the biological and the chemical fields. This ranges from biological assays that can be performed on hundreds of thousands or even millions of molecules to very precise data on a much smaller scale, such as solubility measurements on only dozens or hundreds of molecules. This raises the questions of how to store and analyze these data, and how to use them to predict properties in the future. Such predictions can reduce the need for further experimentation, which saves time and money in very expensive areas such as pharmaceutical research, where bringing a new drug to market costs roughly 1 billion USD and takes about 10 years.
So which kinds of properties do we focus on?
One important property that needs to be assessed at a very early stage of drug discovery is the (potential) toxicity of compounds. Not long ago, several patients died from drugs already on the market because those drugs blocked an ion channel in the heart (the hERG channel), which led to arrhythmia and eventually death. Based on large datasets containing thousands of molecules, we can predict which compounds show this side effect with reasonable reliability (within some limitations, such as the chemical space covered by the dataset). Some of the molecules correctly predicted to be hERG blockers are shown below, and our models are also able to highlight the parts of each molecule predicted to cause this side effect (marked in red). More details are given in a very recent publication listed below.
Molecules predicted to be cardiotoxic due to blocking the hERG channel, with the part of the molecule assumed to be important for this effect marked in red
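To illustrate how a fingerprint-based classifier can both score a molecule and highlight the atoms responsible, here is a minimal Python sketch of a Laplacian-corrected naive Bayes model, one common choice for this kind of task; it is not necessarily the model used in our publication. Molecules are assumed to be pre-computed sets of substructure features, and the feature names, training labels and feature-to-atom mapping are hypothetical placeholders rather than real hERG data.

```python
import math
from collections import Counter

# Hypothetical training data: each molecule is a set of substructure features
# (invented names) plus a label (1 = hERG blocker, 0 = non-blocker).
TRAINING = [
    ({"basic_N", "aromatic_ring", "linker_C3"}, 1),
    ({"basic_N", "aromatic_ring", "halogen"}, 1),
    ({"basic_N", "linker_C3"}, 1),
    ({"carboxylic_acid", "aromatic_ring"}, 0),
    ({"carboxylic_acid", "hydroxyl"}, 0),
    ({"hydroxyl", "aromatic_ring"}, 0),
]


def laplacian_weights(data):
    """Laplacian-corrected log-likelihood weight per feature.

    weight(F) = log((A_F + 1) / (T_F * p + 1)), where A_F is the number of
    actives containing F, T_F the number of molecules containing F, and
    p the overall fraction of actives in the training set.
    """
    p = sum(label for _, label in data) / len(data)
    total, active = Counter(), Counter()
    for features, label in data:
        for f in features:
            total[f] += 1
            active[f] += label
    return {f: math.log((active[f] + 1) / (total[f] * p + 1)) for f in total}


def score(features, weights):
    """Sum of the weights of known features; unseen features contribute 0."""
    return sum(weights.get(f, 0.0) for f in features)


def atom_contributions(feature_atoms, weights):
    """Distribute each feature's weight onto the atoms that feature covers."""
    contrib = Counter()
    for f, atoms in feature_atoms.items():
        for a in atoms:
            contrib[a] += weights.get(f, 0.0)
    return dict(contrib)


weights = laplacian_weights(TRAINING)
# Hypothetical query molecule: feature -> indices of the atoms it covers.
query = {"basic_N": {0}, "linker_C3": {1, 2, 3}, "aromatic_ring": {4, 5, 6, 7, 8, 9}}
s = score(set(query), weights)
contrib = atom_contributions(query, weights)
# Atoms with a positive contribution are the ones that would be drawn in red.
highlight = sorted(a for a, w in contrib.items() if w > 0)
print(f"score={s:.3f}, highlighted atoms={highlight}")
```

Summing feature weights onto atoms is what makes the highlighting possible: atoms covered by features that are enriched among blockers receive positive scores, while features that occur equally often in both classes contribute nothing.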
As an extension of this work, while I was a postdoctoral fellow at Novartis, we considered a much larger number of protein targets in parallel and linked the chemical features of drugs to particular side effects on a large scale. Pathway databases form an important extension of this work, since they arrange proteins in pathways and hence relate targets that would otherwise appear completely unrelated. These are only two examples of work in which existing knowledge is used to predict the toxic properties of novel molecules; such predictions can serve as 'early warning signs' in a drug discovery project.
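The way a pathway database relates otherwise unconnected targets can be sketched in a few lines of Python. The pathway names, protein identifiers and the set of predicted targets below are hypothetical; a real analysis would draw them from a curated pathway resource and from target predictions for the drugs of interest.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical pathway database: pathway name -> member proteins.
PATHWAYS = {
    "pathway_A": {"T1", "T2", "T3"},
    "pathway_B": {"T3", "T4"},
    "pathway_C": {"T5"},
}


def target_to_pathways(pathways):
    """Invert pathway -> proteins into protein -> pathways."""
    index = defaultdict(set)
    for name, members in pathways.items():
        for protein in members:
            index[protein].add(name)
    return index


def related_target_pairs(targets, pathways):
    """Map each pair of targets sharing at least one pathway to those pathways."""
    index = target_to_pathways(pathways)
    pairs = {}
    for a, b in combinations(sorted(targets), 2):
        shared = index[a] & index[b]
        if shared:
            pairs[(a, b)] = sorted(shared)
    return pairs


# Hypothetical set of targets predicted for a compound.
predicted_targets = {"T2", "T3", "T4", "T5"}
pairs = related_target_pairs(predicted_targets, PATHWAYS)
print(pairs)
```

Here T2 and T4 share no pathway directly, but both are connected to T3, which is the kind of indirect relationship that a target list alone would not reveal.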
High-Throughput Screening Data
Since large pharmaceutical companies run a huge number of large-scale screens every year (amounting to several hundred million data points, each linking a molecule to a potential activity against a protein target!), it is crucial to identify the right experimental method for a high-throughput screen (HTS) against a particular protein target of interest (such as a kinase or a GPCR). By analyzing dozens of HTS campaigns and distinguishing between different readouts (fluorescence assays, high-content screens, reporter gene assays, etc.), we were able to show that some assays perform much better with certain target classes than others.
This is also shown in the figure below, which displays the likelihood of progressing to a given phase of drug discovery (here, lead optimization) for each combination of target type and readout. Although these data are based on years of results (and cost millions of dollars to generate!), one should keep in mind that the total number of screens is still quite low: while some conclusions can certainly be drawn from these results, there will also be exceptions to this analysis.
Analysis of the performance of high-throughput screens, depending on the target class screened and the readout used. Some combinations perform better than others, and this kind of analysis can inform future decisions about how to set up screening experiments.
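Because each target/readout combination is represented by relatively few screens, it can help to report an uncertainty range alongside each progression rate. The sketch below assumes hypothetical counts (not the data in the figure) and uses the Wilson score interval for a binomial proportion, one standard choice among several.

```python
import math

# Hypothetical counts, invented for illustration:
# (target class, readout) -> (screens reaching lead optimization, total screens)
COUNTS = {
    ("kinase", "fluorescence"): (6, 10),
    ("GPCR", "reporter gene"): (2, 8),
    ("GPCR", "high content"): (1, 3),
}


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (about 95% for z=1.96)."""
    if n == 0:
        return (0.0, 1.0)  # no screens: nothing is known about the rate
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (max(0.0, centre - half), min(1.0, centre + half))


def summarize(counts):
    """Return (target, readout, rate, low, high) for each combination."""
    rows = []
    for (target, readout), (k, n) in counts.items():
        low, high = wilson_interval(k, n)
        rows.append((target, readout, k / n, low, high))
    return rows


rows = summarize(COUNTS)
for target, readout, rate, low, high in rows:
    print(f"{target:8s} {readout:14s} rate={rate:.2f} 95% CI=[{low:.2f}, {high:.2f}]")
```

Combinations backed by only a handful of screens produce wide intervals, which is a quantitative way of saying that exceptions to the overall trend should be expected.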
Beyond analyzing existing screens that are based on single protein targets, one can also combine phenotypic (cell-based) readouts with the predicted targets of small molecules. We did this for a microscopy-based high-content screen of more than 6,000 compounds: in addition to the (often cryptic) response from the microscopy-based screen, we were able to generate a mode-of-action hypothesis from our in silico target predictions. This integration of data, merging phenotypic readouts with chemical structure similarity and predicted targets, is shown below. It combines the best of both worlds: relevant readouts from biological systems, together with an informed target hypothesis that explains what is happening in the system.
Integration of phenotypic screening data with an analysis of chemical structure similarity and predicted protein targets. By merging phenotypic screening data and predicted targets, the best of both worlds is combined: a relevant readout is merged with an informed mode-of-action hypothesis.
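One simple way to turn predicted targets into a mode-of-action hypothesis is to ask which targets are over-represented among the compounds that show a given phenotype. The sketch below uses a one-sided hypergeometric test for this purpose; the compounds, targets and phenotype assignment are hypothetical, and this illustrates the idea rather than reproducing the exact procedure used in our study.

```python
from math import comb

# Hypothetical in silico target predictions: compound -> predicted targets.
PREDICTED = {
    "c1": {"tubulin", "kinase_X"},
    "c2": {"tubulin"},
    "c3": {"tubulin", "GPCR_Y"},
    "c4": {"GPCR_Y"},
    "c5": {"kinase_X"},
    "c6": {"GPCR_Y"},
    "c7": set(),
    "c8": {"kinase_X"},
}
# Hypothetical compounds that show a particular microscopy phenotype.
PHENOTYPE_HITS = {"c1", "c2", "c3"}


def hypergeom_sf(k, population, successes, draws):
    """P(X >= k) for X ~ Hypergeometric(population, successes, draws)."""
    total = comb(population, draws)
    upper = min(successes, draws)
    return sum(comb(successes, i) * comb(population - successes, draws - i)
               for i in range(k, upper + 1)) / total


def enriched_targets(predicted, hits):
    """Rank predicted targets by over-representation among phenotypic hits.

    Returns (target, hits with target, compounds with target, p-value),
    sorted by p-value and then by target name.
    """
    population, draws = len(predicted), len(hits)
    targets = set().union(*predicted.values())
    results = []
    for t in targets:
        with_t = {c for c, ts in predicted.items() if t in ts}
        k = len(with_t & hits)
        if k:
            p = hypergeom_sf(k, population, len(with_t), draws)
            results.append((t, k, len(with_t), p))
    return sorted(results, key=lambda r: (r[3], r[0]))


results = enriched_targets(PREDICTED, PHENOTYPE_HITS)
for target, k, n_t, p in results:
    print(f"{target:10s} {k}/{n_t} compounds among hits, p={p:.3f}")
```

In this toy example all three phenotypic hits are predicted to act on the same target, so that target heads the ranking and becomes the leading mode-of-action hypothesis.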
Similarity Searching of Molecules
Finally, one important area of our research is the description of molecules and the calculation of similarities between them. (This is far from trivial: which properties of a molecule should one look at? Shape? Or rather surface properties? Furthermore, we are of course dealing with flexible systems.) In this context I previously devised and implemented an algorithm termed MOLPRINT 2D, a free tool to fingerprint molecules and to compare them using the Tanimoto (similarity) coefficient. This method can be downloaded freely from http://www.molprint.com and is used in several pharmaceutical companies worldwide. As an extension of this work, we also compared how similarly different molecular descriptions actually behave, that is, how similar they judge pairs of molecules to be. This work is particularly relevant given the large number of molecular descriptors that exist: users need to know how differently certain fingerprints behave and, of course, whether they are able to identify molecules that behave similarly in a biological system, which is very often the purpose of similarity searching tools.
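The Tanimoto coefficient on a pair of fingerprints, and one way of comparing how two descriptors judge the same pairs of molecules, can be sketched as follows. This is not the MOLPRINT 2D implementation: fingerprints are assumed to be Python sets of hypothetical feature identifiers, and descriptor agreement is measured, as an illustrative choice, by the Spearman rank correlation between the two lists of pairwise similarities.

```python
from itertools import combinations


def tanimoto(a, b):
    """Tanimoto coefficient of two feature sets: |A & B| / |A | B|."""
    if not a and not b:
        return 0.0  # convention chosen here for two empty fingerprints
    return len(a & b) / len(a | b)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation, computed as the Pearson correlation of ranks."""
    rx, ry = ranks(x), ranks(y)
    n = len(x)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


def pairwise_similarities(fingerprints):
    """Tanimoto similarity for every unordered pair, in sorted-name order."""
    names = sorted(fingerprints)
    return [tanimoto(fingerprints[a], fingerprints[b])
            for a, b in combinations(names, 2)]


# Hypothetical fingerprints of the same four molecules under two descriptors.
DESCRIPTOR_A = {"m1": {1, 2, 3, 4}, "m2": {1, 2, 3, 5},
                "m3": {1, 6, 7, 8}, "m4": {9, 10, 11, 12}}
DESCRIPTOR_B = {"m1": {"a", "b", "c"}, "m2": {"a", "d", "e"},
                "m3": {"a", "b", "f"}, "m4": {"f", "g", "h"}}

sims_a = pairwise_similarities(DESCRIPTOR_A)
sims_b = pairwise_similarities(DESCRIPTOR_B)
rho = spearman(sims_a, sims_b)
print(f"Spearman correlation between descriptors: {rho:.3f}")
```

A correlation close to 1 means the two descriptors rank molecule pairs in much the same order; lower values indicate that the choice of fingerprint would change which molecules are retrieved as 'similar'.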
(For an up-to-date list of publications, see http://www.andreasbender.de/)