Data Science

Data-driven models, inverse problems, uncertainty quantification, graphs and networks

Gaussian surrogate models for Bayesian inverse problems

Supervisor(s): Konstantinos Zygalakis, Aretha Teckentrup

Inverse problems are ubiquitous across scientific fields, including geophysics, image processing and machine learning. They arise in situations where certain quantities of interest cannot be directly measured and must instead be inferred from observable effects. Solving inverse problems using Bayesian methods requires numerous simulations, which are often computationally expensive. To address this, one can use computationally cheap surrogate models, such as Gaussian process emulators, to approximate the behaviour of complex mathematical models.

A key challenge in using Gaussian process emulators is maintaining accuracy when, due to computational constraints, only a limited number of simulations is available. We introduce a method that integrates the physics model into the Gaussian process emulator, thus enhancing approximation accuracy. This method is then extended to Bayesian inverse problems, where it approximates the probability distribution of the unknown parameters at a greatly reduced computational cost.
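As a rough illustration of the general idea (not the physics-informed construction developed in the thesis), the sketch below fits a plain Gaussian process emulator to a handful of runs of a toy "expensive" model and then uses the cheap emulator inside an approximate posterior; the model, kernel settings and observation are all hypothetical choices.

```python
import numpy as np

# Toy "expensive" forward model mapping a parameter theta to an observable.
# In practice this would be a costly simulation (e.g. a PDE solve).
def expensive_model(theta):
    return np.sin(2.0 * theta) + 0.5 * theta

# Squared-exponential kernel (length-scale and variance fixed by hand here;
# in practice they would be estimated from the training runs).
def kernel(a, b, ell=0.5, sigma2=1.0):
    return sigma2 * np.exp(-0.5 * (a[:, None] - b[None, :]) ** 2 / ell**2)

# A small budget of expensive simulations used to train the emulator.
theta_train = np.linspace(-2.0, 2.0, 8)
y_train = expensive_model(theta_train)

# GP regression: posterior mean of the emulator at new parameter values.
K = kernel(theta_train, theta_train) + 1e-8 * np.eye(len(theta_train))
alpha = np.linalg.solve(K, y_train)

def emulator_mean(theta_new):
    return kernel(theta_new, theta_train) @ alpha

# Cheap approximate (unnormalised) log-posterior for a Bayesian inverse
# problem: one noisy observation of the model output, with a Gaussian prior.
y_obs, noise_var = 0.8, 0.05

def approx_log_posterior(theta_new):
    misfit = (y_obs - emulator_mean(theta_new)) ** 2 / (2 * noise_var)
    log_prior = -0.5 * theta_new**2
    return -misfit + log_prior

grid = np.linspace(-2.0, 2.0, 200)
print(grid[np.argmax(approx_log_posterior(grid))])  # approximate MAP estimate
```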

Link to thesis online

Kinetic Langevin Monte Carlo methods

Supervisor(s): Ben Leimkuhler, Daniel Paulin, Neil Chada

This thesis studies the properties of, and introduces methodology for, efficient methods that generate samples from a probability distribution and estimate expected quantities under that distribution. This is particularly important in molecular dynamics, where one wants to calculate expected configurations, for example of proteins, and in statistical computations such as Bayesian inference, where one wants to estimate expected parameters under the posterior distribution.

Typically, the computational cost of these methods scales with the number of parameters in the model and with other properties of the model. In this thesis, we study how the cost scales with these quantities and introduce more efficient methods for estimating the expected quantities of interest.
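To give a flavour of the kind of sampler studied, here is a minimal sketch of kinetic (underdamped) Langevin dynamics discretised with a standard BAOAB-type splitting, applied to a toy two-dimensional Gaussian target; the target, step size and friction are illustrative choices, not the specific schemes analysed in the thesis.

```python
import numpy as np

# Target: unnormalised log-density of a 2D Gaussian with different variances
# per coordinate (a toy stand-in for a posterior or molecular potential).
scales = np.array([1.0, 0.1])

def grad_log_density(x):
    return -x / scales  # gradient of log pi(x) = -sum_i x_i^2 / (2 scales_i)

rng = np.random.default_rng(0)
h, gamma = 0.05, 1.0             # step size and friction coefficient
x, v = np.zeros(2), np.zeros(2)  # position and velocity (unit mass)
samples = []

# BAOAB-type splitting of kinetic Langevin dynamics:
#   B: half step in velocity, A: half step in position,
#   O: exact Ornstein-Uhlenbeck update of the velocity.
c1 = np.exp(-gamma * h)
c2 = np.sqrt(1.0 - c1**2)
for k in range(20000):
    v += 0.5 * h * grad_log_density(x)        # B
    x += 0.5 * h * v                          # A
    v = c1 * v + c2 * rng.standard_normal(2)  # O
    x += 0.5 * h * v                          # A
    v += 0.5 * h * grad_log_density(x)        # B
    samples.append(x.copy())

samples = np.array(samples)
print(samples.var(axis=0))  # should be close to `scales`
```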

Link to thesis online

Laplacians for structure recovery on directed and higher-order graphs

Supervisor(s): Des Higham, Konstantinos Zygalakis

In this thesis, we explore graph analysis using Laplacian matrices. These matrices provide valuable insights into how components are connected and grouped together. While there has been extensive research on undirected graphs, we identified a gap when it comes to analyzing more complex graphs, where connections are directional or where a relationship involves more than two components. To address this, we investigated mathematical frameworks for analyzing directed graphs, hypergraphs, and directed simplicial complexes using graph Laplacians. Specifically, we examined two existing Laplacian matrices for directed graphs, which take the direction of edges into account, and developed associated generative models. We then extended this framework to hypergraphs, where an edge may connect more than two nodes. In addition, we defined and analyzed a new Magnetic Hodge Laplacian, which captures directional flows on triangles. Through case studies on triangles and tori, we demonstrated its ability to identify flow direction and discover direction-related patterns.
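For concreteness, the sketch below builds the standard magnetic Laplacian of a small directed cycle, one of the existing Laplacians of this type for directed graphs; it is not the new Magnetic Hodge Laplacian introduced in the thesis, and the graph and parameter value are illustrative.

```python
import numpy as np

# Adjacency matrix of a small directed cycle: 0 -> 1 -> 2 -> 3 -> 0.
A = np.zeros((4, 4))
for j in range(4):
    A[j, (j + 1) % 4] = 1.0

# Magnetic Laplacian with parameter g: symmetrise the weights and encode
# edge direction as a complex phase, giving a Hermitian matrix.
g = 0.25
W = 0.5 * (A + A.T)                  # symmetrised weights
theta = 2.0 * np.pi * g * (A - A.T)  # antisymmetric phase matrix
H = W * np.exp(1j * theta)           # Hermitian "magnetic" adjacency
D = np.diag(W.sum(axis=1))
L = D - H                            # magnetic Laplacian (Hermitian, PSD)

eigvals, eigvecs = np.linalg.eigh(L)
print(np.round(eigvals, 3))
# The relative phases of the lowest eigenvector step by a constant angle
# around the cycle, reflecting the direction of the edges.
print(np.round(np.angle(eigvecs[:, 0]), 3))
```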

Link to thesis online

Change detection in spatiotemporal SAR data for deforestation monitoring

Supervisor(s): Stuart King

Forests play a vital role in the wellbeing of our planet. Large- and small-scale deforestation across the globe is threatening the stability of our climate, forest biodiversity, and therefore the preservation of fragile ecosystems and our natural habitat as a whole. With increasing public interest in climate change issues and forest preservation, a large demand for carbon offsetting, carbon footprint ratings, and environmental impact assessments is emerging. Satellite remote sensing is the only approach that can provide global coverage at frequent intervals and is therefore the standard method for global forest monitoring. Most often, deforestation maps are created from optical images. Although such maps are of generally good quality, they cannot directly measure the amount of biomass and are not typically available at less than annual intervals due to persistent cloud cover in many parts of the world, especially the tropics where most of the world’s forest biomass is concentrated.

Radar images can fill this gap as they are unaffected by clouds. Radar is reflected off the ground in a way that depends on the shape of the objects on it. Careful interpretation of these radar images therefore allows us to draw conclusions about the volume occupied by vegetation and in turn its biomass. Different radar satellites differ mainly in their wavelength, which dictates what size of objects they can observe. In general, forest biomass estimation works best with longer wavelengths. However, one of the most readily available radar data sources is the European Sentinel-1 satellite, which has a shorter wavelength radar instrument. In this thesis, the theory behind radar imaging for forest monitoring is discussed. The potential for Sentinel-1 data to distinguish forest from non-forest is then assessed for different regions of the world and existing methods for deforestation detection are reviewed. One of the biggest challenges in deforestation detection is often the lack of reliable reference data. For this reason, a robust method for deforestation detection in the absence of high-quality reference data is proposed. This method achieves high detection sensitivity, although false positives lead to a low specificity. While further work is required to validate this method in different biomes and improve the deforestation detection result, including faster detection, the results show that Sentinel-1 has the potential to advance global deforestation monitoring.
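As a toy illustration of the general idea of detecting change in a backscatter time series (this is not the robust method proposed in the thesis), the sketch below flags a pixel when its simulated Sentinel-1-like backscatter drops well below the recent running mean; all values and thresholds are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated backscatter time series (dB) for one pixel: forest-like values,
# with a drop after clearing at time step 60, plus speckle-like noise.
n, change_at = 100, 60
series = np.where(np.arange(n) < change_at, -7.0, -11.0)
series += rng.normal(0.0, 1.0, size=n)

# Simple online change detection: compare each new observation with the
# running mean and standard deviation of a trailing window.
window, k_sigma, flagged = 30, 3.0, None
for t in range(window, n):
    ref = series[t - window:t]
    if series[t] < ref.mean() - k_sigma * ref.std():
        flagged = t
        break

print(f"change simulated at t={change_at}, first flagged at t={flagged}")
```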

Link to thesis online

Accelerating Bayesian computation in imaging

Supervisor(s): Konstantinos Zygalakis, Marcelo Pereyra

In domains as important as medicine and astronomy, decision-making processes require robust tools to support and deliver reliable judgments in critical situations. For instance, if an atypical object were to appear in the CT image of a patient, it would be beneficial if the doctor also had tools that provided a degree of assurance that the object was indeed inside the patient’s body. This would further help the doctor make accurate diagnostic and treatment decisions. In this context, Bayesian computation provides efficient tools to quantify the uncertainty in these situations. These methods are usually affordable from a computational point of view in low-dimensional applications, i.e., results can be obtained in a few minutes, for example in applications concerned with the time evolution of a few quantities, such as forecasting stock prices or the spread rate of COVID-19. However, this is not the case for imaging applications, where one wants to analyse images with hundreds of thousands of pixels, i.e., dimensions.
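To illustrate the kind of computation involved (this is not one of the methods developed in the thesis), the sketch below samples the posterior of a tiny one-dimensional deblurring problem with the unadjusted Langevin algorithm; each iteration needs a forward and an adjoint application of the blur operator, which is what becomes expensive when the unknown has millions of pixels. The problem size and parameters are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Tiny 1D "imaging" inverse problem: blur a signal, add noise, and sample
# the resulting Gaussian posterior with the unadjusted Langevin algorithm.
n = 64
x_true = np.zeros(n); x_true[20:40] = 1.0            # piecewise-constant scene
A = np.array([[np.exp(-0.5 * ((i - j) / 2.0) ** 2) for j in range(n)]
              for i in range(n)])                    # Gaussian blur operator
A /= A.sum(axis=1, keepdims=True)
sigma2, tau2 = 0.01, 1.0                             # noise and prior variances
y = A @ x_true + rng.normal(0.0, np.sqrt(sigma2), n)

def grad_log_post(x):
    return A.T @ (y - A @ x) / sigma2 - x / tau2     # Gaussian likelihood + prior

# ULA: one gradient step plus injected noise per iteration.
delta = 5e-4
x = np.zeros(n)
samples = []
for k in range(10000):
    x = x + delta * grad_log_post(x) + np.sqrt(2 * delta) * rng.standard_normal(n)
    if k > 2000:
        samples.append(x.copy())

post_mean = np.mean(samples, axis=0)
post_std = np.std(samples, axis=0)                   # pixelwise uncertainty
print(post_mean[30].round(2), post_std[30].round(2))
```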

To perform the Bayesian analysis required in large imaging applications, it is necessary to have methods that provide accurate results in an acceptable amount of time. The development of efficient Bayesian approaches for extremely high-dimensional applications, such as imaging, has been one of the main focuses of the Bayesian imaging community. Despite the significant efforts of the scientific community in recent decades, the amount of data handled by new applications is expanding quickly, and methods developed just a few years ago are starting to become obsolete given the sheer volume of information produced by next-generation applications. In February 2022, for instance, as part of its calibration and alignment procedure, the James Webb Space Telescope generated an image mosaic of over 2 billion pixels.

In light of these current challenges, we present in this thesis three novel Bayesian methods, which significantly outperform existing state-of-the-art approaches in speed and/or accuracy, as demonstrated by the theory and numerical experiments developed in this work.

Link to thesis online

Stochastic dynamics and partitioned algorithms for model parameterization in deep learning

Supervisor(s): Ben Leimkuhler

This thesis is about mathematics for deep learning. Deep learning and neural networks have shown promising applicability to a variety of pressing real-world problems. They are enabling advances in self-driving cars, protein folding, and nuclear fusion. However, the neural networks underpinning these breakthroughs often remain hard to interpret and understand. This thesis takes a step towards providing a stronger foundation for the mathematics behind neural networks, and in the process develops algorithms that improve the way in which they can be trained.

A neural network can be thought of as a machine that performs tasks by taking in an input, and returning an output. These inputs and outputs can be diverse and varied. For example, the input could be an image, and the output could be a classification of that image (“this image contains a cat”). The input could be a piece of text, and the output could be a continuation of that piece of text. The input could be a DNA sequence (“ACGTGTACGT”), and the output could be a 3D representation of a folded protein. The field of deep learning is therefore inherently interdisciplinary, due to its widespread applications. However, as we extend the scope of deep learning and neural networks it is crucial to also understand their limitations. For example, neural networks can easily be fooled and can misclassify images with high confidence (“this image of a cat is definitely a dog”). Therefore, taking a mathematical perspective is very important to allow us to obtain better guarantees when using neural networks for real-world applications.

The network’s behaviour when performing tasks is determined by its parameters. An important aspect of deep learning involves finding ways to change the parameters to make the network better at the task of interest. Unfortunately, a lot is still unknown about why certain parameterization (training) schemes perform better than others. In this thesis we study and tease out properties of neural network training, and use those findings to improve how neural networks perform on new (unseen) tasks.
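As a generic illustration of what "changing the parameters" looks like in practice (not the algorithms developed in the thesis), the sketch below trains a tiny one-hidden-layer network with mini-batch gradient descent, splitting the parameters into two groups that receive different step sizes to hint at the idea of a partitioned scheme; the task, architecture and step sizes are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(3)

# Toy task: learn y = sin(x) from noisy samples with a one-hidden-layer network.
x = rng.uniform(-3.0, 3.0, size=(256, 1))
y = np.sin(x) + 0.1 * rng.standard_normal(x.shape)

# Parameters, split into two groups that will be updated differently.
W1, b1 = 0.5 * rng.standard_normal((1, 32)), np.zeros(32)  # group A: first layer
W2, b2 = 0.5 * rng.standard_normal((32, 1)), np.zeros(1)   # group B: second layer

def forward(xb):
    h = np.tanh(xb @ W1 + b1)
    return h, h @ W2 + b2

lr_A, lr_B = 0.05, 0.01  # a simple "partitioned" choice: per-group step sizes
for step in range(2000):
    idx = rng.choice(len(x), size=32, replace=False)  # mini-batch
    xb, yb = x[idx], y[idx]
    h, pred = forward(xb)
    err = pred - yb  # prediction residuals

    # Backpropagation for the (halved) mean-squared-error loss.
    gW2 = h.T @ err / len(xb)
    gb2 = err.mean(axis=0)
    dh = (err @ W2.T) * (1.0 - h**2)
    gW1 = xb.T @ dh / len(xb)
    gb1 = dh.mean(axis=0)

    # Partitioned update: each parameter group gets its own step size.
    W1 -= lr_A * gW1; b1 -= lr_A * gb1
    W2 -= lr_B * gW2; b2 -= lr_B * gb2

_, pred_all = forward(x)
print("final training MSE:", float(np.mean((pred_all - y) ** 2)))
```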

Link to thesis online

Uncertainty quantification in seismic interferometry

Supervisor(s): Michal Branicki

Just as doctors use X-rays to look inside living tissue, geophysicists can use seismic waves to scan geological formations beneath the Earth’s surface that are too inaccessible to be studied directly. The speed of these waves changes when traversing different kinds of materials in the Earth’s crust, the same way that X-rays change speed when travelling through different tissues inside the human body. These changes help create images of our body in the case of X-rays, and images of the Earth’s subsurface in the case of seismic waves. There are many kinds of sources for waves travelling across the Earth, ranging from earthquakes to ocean noise to urban noise, and these waves are usually recorded on the surface using sensors known as geophones. Unfortunately, unlike an X-ray machine that emits X-rays on demand, the Earth does not produce seismic energy whenever geoscientists need it, posing significant challenges for creating images of the Earth’s subsurface. However, a recent technique known as wavefield interferometry allows geoscientists to create virtual sources at times and locations where no sources were originally physically recorded. Wavefield interferometry is in fact so powerful that it can even take advantage of ambient noise, previously considered a nuisance and discarded, to create these virtual sources. This technique has greatly advanced our ability to image the Earth’s subsurface and has many industrial and environmental applications, at scales ranging from local regions to entire continents.

Wavefield interferometry works by taking seismic recordings from pairs of sensors and processing them using certain mathematical operations that effectively turn one of the sensors into a “virtual” source. However, the mathematical theory that makes wavefield interferometry work requires very idealised conditions that are rarely found in practice. Therefore, practical studies of wavefield interferometry often need to make a series of assumptions and simplifications in order to apply this mathematical machinery. Whenever these assumptions are not satisfied, or approximations are made, there is potential for error and uncertainty in the final result, which can ultimately affect our understanding of subsurface Earth structures. In this thesis, we study and quantify the uncertainty in interferometric estimates that arises from violating some important assumptions of the mathematical theory and from approximations commonly used in wavefield interferometry. We derive bounds for these errors, which can be a useful tool for geoscientists to estimate the uncertainty in their practical applications, and propose some strategies to mitigate this uncertainty.
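The core cross-correlation step can be illustrated in a highly idealised one-dimensional setting (a sketch of the textbook construction, not the error analysis of the thesis): two receivers record the same noise source with different delays, and the peak of their cross-correlation recovers the inter-receiver travel time, as if one receiver had acted as a “virtual” source. All signals and delays below are synthetic.

```python
import numpy as np

rng = np.random.default_rng(4)

# Idealised 1D setting: a noise source emits a random signal that reaches
# receiver A after 100 samples and receiver B after 130 samples.
n, delay_A, delay_B = 5000, 100, 130
noise = rng.standard_normal(n)

rec_A = np.zeros(n); rec_A[delay_A:] = noise[:n - delay_A]
rec_B = np.zeros(n); rec_B[delay_B:] = noise[:n - delay_B]
rec_A += 0.1 * rng.standard_normal(n)  # independent sensor noise
rec_B += 0.1 * rng.standard_normal(n)

# Cross-correlate the two recordings. Under idealised assumptions the peak
# lag approximates the travel time between the receivers, i.e. the response
# of a "virtual source" at receiver A recorded at receiver B.
xcorr = np.correlate(rec_B, rec_A, mode="full")
lags = np.arange(-(n - 1), n)
print("estimated inter-receiver travel time (samples):", lags[np.argmax(xcorr)])
# Expected: delay_B - delay_A = 30
```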

Link to thesis online