Mario Boley - Trustworthy and Informative Machine Learning for Scientific Discovery

Machine learning promises to accelerate scientific theory development and discovery through a data-driven approach. However, to fulfil this promise, methods have to a) provide an explicit human-readable form of the modelled relations and b) extrapolate well to unseen cases from only a few expensive data points. Modern deep learning systems, while producing impressive results in some areas, are fundamentally unsuited to meet these two requirements, as they rely on vast numbers of parameters that interact in complicated ways and that need to be fitted using equally vast amounts of training data. In contrast, additive models of simple basis functions not only provide very accurate predictions for important scientific questions but are also readily understandable and testable, akin to traditional empirical laws. I will demonstrate this using three examples of my applied work in chemistry and materials science: modelling propagation rates in radical polymerisation, morphological outcomes of polymer-induced self-assemblies, and crystal structure affinity of octet binary semiconductors. Motivated by these successes, I will then discuss my recent methodological work on statistical and algorithmic challenges in producing such trustworthy and informative models. In particular, I will show how a Bayesian treatment of linear regression leads to parameter estimates that are both statistically more robust and typically faster to compute than the usual cross-validation-based approach. Moreover, I will show how a novel objective function and optimisation approach lead to a better accuracy/interpretability trade-off when iteratively assembling additive models within the commonly used framework of “gradient boosting”.

Date and Time: 
Thursday, February 15, 2024 - 13:30 to 14:30
Speaker: 
Mario Boley
Location: 
A208
Speaker Bio: 

Mario Boley is a Senior Lecturer and the Deputy Director of Research at the Department of Data Science and AI at the Faculty of IT of Monash University in Melbourne, Australia. He is interested in trustworthy machine learning with a focus on efficient learning algorithms for interpretable models and their application to accelerate scientific discovery, in particular in materials science and polymer chemistry. Mario obtained his PhD in computer science in 2011 from the University of Bonn, Germany, for work in algorithmic order theory and the branch-and-bound algorithm. Subsequently, he held post-doctoral positions at the Fraunhofer Institute for Intelligent Analysis and Information Systems, the Max Planck Institute for Informatics, and the Fritz Haber Institute of the Max Planck Society for Materials Science. He joined the permanent academic staff of the Faculty of IT at Monash University in 2018.