Manas Mahale

Analogue Split

What is a flower?
As the officer Lu Hsuan was talking with Nan Ch'uan, he said, “Master of the Teachings Chao said, 'Heaven, earth, and I have the same root; myriad things and I are one body.' This is quite marvelous.” Nan Ch'uan pointed to a flower in the garden. He called to the officer and said, “People these days see this flower as a dream.”

Analogue Split is a Python package designed to test the robustness of machine learning models in cheminformatics by focusing on activity cliffs.

Analogue Split

In machine learning for cheminformatics, random data splits are the standard for training and validation. While this approach appears objective, it lacks control over the biases inherent in chemical datasets. These biases—such as structural similarity leading to similar bioactivity or the presence of activity cliffs (small structural changes causing large bioactivity differences)—significantly influence model training and evaluation. A random split often obscures these critical factors, limiting our understanding of model performance under real-world conditions.

Analogue Split addresses this issue by introducing a controlled, biased splitting mechanism. It uses a parameter, γ (gamma), to adjust the composition of the test set: γ sets the proportion of test molecules that are structurally similar to training molecules yet carry a different activity label, i.e., activity cliffs that straddle the split. Varying γ yields a series of test sets explicitly designed to stress-test the model's ability to handle within-series and between-series predictions.
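To make the mechanism concrete, here is a minimal sketch of how a γ-controlled split could be assembled with RDKit. This is not Analogue Split's actual API; the function name `analogue_split`, the Morgan-fingerprint/Tanimoto similarity criterion, and parameters such as `sim_threshold` and `test_fraction` are illustrative assumptions.

```python
import random

from rdkit import Chem, DataStructs
from rdkit.Chem import AllChem


def analogue_split(smiles, labels, gamma=0.5, test_fraction=0.2,
                   sim_threshold=0.7, seed=0):
    """Split indices into (train, test) so that roughly a fraction `gamma`
    of the test set has a close structural analogue in the training set
    with a different (binary) activity label, i.e. an activity-cliff
    partner sitting across the split. Assumes all SMILES are valid."""
    rng = random.Random(seed)
    mols = [Chem.MolFromSmiles(s) for s in smiles]
    fps = [AllChem.GetMorganFingerprintAsBitVect(m, 2, nBits=2048) for m in mols]

    # Start from an ordinary random split.
    idx = list(range(len(smiles)))
    rng.shuffle(idx)
    n_test = int(test_fraction * len(idx))
    test_idx, train_idx = idx[:n_test], idx[n_test:]

    def has_cliff_partner(i):
        # A candidate test molecule is a "cliff" molecule if some training
        # molecule is structurally similar but carries a different label.
        return any(
            DataStructs.TanimotoSimilarity(fps[i], fps[j]) >= sim_threshold
            and labels[i] != labels[j]
            for j in train_idx
        )

    cliff = [i for i in test_idx if has_cliff_partner(i)]
    cliff_set = set(cliff)
    plain = [i for i in test_idx if i not in cliff_set]

    # Bias the test set towards gamma * n_test cliff molecules; if the data
    # cannot supply enough of either kind, the test set simply shrinks.
    n_cliff = min(len(cliff), round(gamma * n_test))
    n_plain = min(len(plain), n_test - n_cliff)
    test = rng.sample(cliff, n_cliff) + rng.sample(plain, n_plain)
    chosen = set(test)
    train = [i for i in idx if i not in chosen]
    return train, test
```

In this sketch, γ = 0 gives a test set with no activity-cliff analogues of the training data, while γ = 1 makes every test molecule a cliff partner of some training molecule; evaluating a model across several values of γ then shows how quickly its performance degrades as the split becomes more adversarial.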

Why does this matter? Because aggregate performance metrics like accuracy and F1 score can be deceptive when the validation set doesn't challenge the model meaningfully. To truly assess utility, we need a validation set that balances novelty, diversity, and bias, one that mimics real-world challenges. Nobody would claim that constructing such a set is easy in practice. Analogue Split therefore deliberately tests only one of these conditions. This allows us to craft a benchmark from a given dataset that focuses on what matters: robustness against a key bias, in this case activity cliffs.

I advocate for biased validation because I believe that models are only as good as the data they're validated against. With Analogue Split, we take a step closer to selecting models that don't just perform well but perform meaningfully. After all, in science, it's not about being right; it's about being useful.

Special thanks to Andreas Bender, Daniel Svozil, Martin Sicho, Wim Dehaen, Ivan Čmelo, Srijit Seal, and Yoann Gloaguen for inspiring and shaping these ideas.