Understanding what AI doesn’t know: Probability theory can help AI make trustworthy decisions

EURECOM Communication
6 min readJun 1, 2023

The development of Artificial Intelligence (AI) has brought numerous advances in different fields, including healthcare and transportation. However, as AI continues to evolve and become more pervasive, concerns about its reliability and accuracy have arisen. The recent incidents involving self-driving cars and their inability to recognize certain objects and situations have raised important questions about the ability of AI models to handle uncertainty. In order to address these concerns, researchers are working on developing mechanisms that can accurately quantify the level of uncertainty in AI predictions. This is important because AI will likely be used as a support tool for domain experts for a long time to come, and it is crucial that they are able to rely on the system to provide accurate and trustworthy information.

To achieve this, probabilistic approaches are being explored to train AI models to understand when there is not enough evidence to make a confident prediction. This approach allows for a family of possible configurations of the parameters of the model that are all compatible with what is observed, rather than just one optimal solution. As AI becomes more complex and is applied to increasingly complex data sets, it is important to develop models that can accurately recognize and respond to uncertainty, in order to ensure the reliability and accuracy of AI in decision making.

EURECOM’s professor Maurizio Filippone, expert in Bayesian Machine Learning, explains in detail how probabilistic and statistical approaches can be used to address the critical issue of reliable decision-making in AI.

Q. Tell us a bit more on the approaches followed within your research group on AI in decision making.

MF. Our research group is focused on developing machine learning models that can understand when there is insufficient evidence to make a confident prediction. To achieve this, we adopt a probabilistic approach in order to train these models. In contrast to traditional machine learning approaches that seek to obtain a single optimal solution, we consider a family of possible configurations of model parameters that are all compatible with the observed data. This probabilistic treatment involves expressing everything in terms of probabilities over parameters and data, following the principles of Bayes that have been used in statistics for over 250 years. However, applying these principles to popular neural networks, which are commonly used to analyse images and complex structured objects, is challenging due to the large number of parameters leading to the need to operate in high-dimensional spaces. Our ultimate goal is to develop robust and accurate machine learning models that can make reliable predictions, even in situations where there is limited evidence available.

[4] Decoding post-stroke motor function from structural brain imaging, 2016

Q. What are the challenges you have to overcome?

MF. Let’s take the example of ChatGPT - which is a large language model, parametrised by billions of parameters - and imagine that we want to express a distribution over these parameters by means of a probabilistic formulation. Obtaining a meaningful distribution over these billions of parameters which is capable of producing a sensible modeling of natural language is extremely difficult computationally. Therefore, we explore several approximation and optimisation techniques that are designed to accelerate computations. We are also looking into ways in which we could deploy our theoretical techniques to sophisticated hardware architecture, such as those involving optical computing devices. One other challenge is how to find the best architecture and configuration of a neural network in order to obtain the best performance and confidence estimate.

For this, there are two aspects to consider:

a) the computational complexity of the implementation of existing statistical tools e.g. Bayesian methods and

b) the demonstration that certain approximation techniques are valid in terms of enabling accurate modeling of complex realistic problems.

Q. Could you explain the mathematical tools you are developing?

MF. My work is focused on developing a deep understanding of Bayes theorem and how it can be applied in the context of machine learning and statistical inference. Bayes theorem is a powerful tool for understanding the relationship between random variables. By using this theorem, we can, for example, calculate the probability of a cause given the effect. For instance, in the context of medical diagnosis, we can use Bayesian techniques to determine the probability that a patient has a particular disease based on observed symptoms. By using this approach, we can make accurate predictions and optimize treatment options for patients. My research involves using Bayesian techniques to understand the relationship between model parameters and data in the context of artificial intelligence. By treating both model parameters and data as random variables, we can calculate the probability of data given the parameters. By optimizing this probability, we can refine our model parameters to more accurately predicted outcomes. Additionally, we can also use Bayesian techniques to obtain the posterior distribution, which is the probability distribution over all the parameters of our model that fit the observed data. This is an incredibly powerful tool for understanding the relationship between model parameters and data and can lead to significant improvements in the accuracy and performance of artificial intelligence models.

[2] Model Selection for Bayesian Autoencoders, 2021

Q. What are the real — life applications for your ML statistical methods?

MF. The diverse range of applications that our solutions can be applied to and make significant contributions is fascinating. One such field is neuroscience, where we collaborate with neuroscientists to develop disease models and accurate predictions for diseases like Parkinson’s and Alzheimer’s. Our work also involves understanding which areas of the brain are most important for identifying the onset of these diseases and analyzing their progression. To address the challenge in determining the onset of such diseases, we look for a distribution over this quantity, which can ultimately help clinicians in understanding what kind of treatment or medication can be given to the patient. We have collaborated with labs in Switzerland and the Netherlands to advance this research.

Another area of interest for me is the integration of Physics and machine learning in problems such as climate science, earthquake modeling, and tsunami prediction. In such applications, there is a lot of known Physics, but there are still many unknowns that can be modeled using data-driven approaches. However, this can be challenging, as data is often noisy and complex, and we may not have the ability to describe the complexity of the processes accurately. This is where machine learning can be very useful, allowing us to capture the complexity of these phenomena and develop more accurate models. Our work in this area has led to significant improvements over previous methods and we believe that our methods have enormous potential to make a significant impact in a wide range of fields, from neuroscience to climate science.

Q. What are your future research goals?

MF. I have spent the past year exploring new ideas and directions for my work. This has involved traveling to various places across the world this year, including Stanford and UCI (Irvine, CA) in the US, as well as Alto and Finland where I interacted with machine learning teams. I have also visited Australia to meet with climate scientists and colleagues working in spatial Statistics. While I dedicated most of my efforts in the development of statistical methods for Machine Learning in the past few years, I believe it is time to shift focus towards applying these methods to high-impact applications. Along with one of my PhD students, I have already started exploring the use of functional MRI imaging and EEG data for mice in collaboration with UCI. However, the main challenge I currently face is securing funding to support these projects and hire additional staff. While I have a few new starting contracts with industry, I am in the process of applying for long-term sources of funding.

In our field, we are constantly faced with the challenge of developing models that can make accurate predictions quickly, as AI is used every day to carry out millions of predictions and decisions in various fields. However, as we move towards using AI for critical applications, it is important that we can trust the confidence in the predictions made by these models. Explainability is also crucial, as we need to understand why models make certain predictions and why they might be uncertain about these. While there have been advancements in this area, there is still a lot of work to be done, especially when dealing with models with billions of parameters. Hardware limitations also pose a challenge, as research institutes cannot match the resources available to big companies. Nevertheless, we must continue to attract good talents and develop smart methods to tackle these challenges. Through research and talent, we can make progress in developing machine learning models that are both statistically sound and computationally attractive.

References

  1. https://jmlr.org/papers/v23/20-1340.html
  2. https://proceedings.neurips.cc/paper/2021/hash/a41db61e2728ef963614a8c8755b9b9a-Abstract.html
  3. https://hal.science/hal-01617750
  4. https://pubmed.ncbi.nlm.nih.gov/27595065/

--

--

EURECOM Communication

Graduate school & Research Center in digital science with a strong international perspective, located in the Sophia Antipolis technology park.