The ability to measure uncertainty in an AI-powered system is a key factor in how safely the model or system behaves. Compared with the frequentist approach, the Bayesian approach offers more value for estimating the uncertainty that is inherent in the model or introduced by the system's environment.

A prior in Bayesian modeling is essential: it encodes information about past experience over our sample space. This explicit degree of belief is an advantage of Bayesian statistics over frequentist statistics. The Bayesian approach demands more computing resources than the frequentist approach, but the computing power of hardware developed for machine learning applications now makes it practical.
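
As a concrete illustration of how a prior encodes past experience, here is a minimal sketch of a conjugate Beta-Binomial update in NumPy/SciPy; the prior parameters and the observed counts are made-up values for illustration, not real data.

```python
# Minimal sketch of a Bayesian update with a conjugate prior (illustrative numbers only).
from scipy import stats

# Prior belief from past experience: a Beta(8, 2) prior says we expect
# roughly an 80% success rate before seeing any new data.
prior_alpha, prior_beta = 8, 2

# New evidence: 3 successes out of 10 trials.
successes, trials = 3, 10

# Conjugacy: Beta prior + Binomial likelihood -> Beta posterior.
post_alpha = prior_alpha + successes
post_beta = prior_beta + (trials - successes)
posterior = stats.beta(post_alpha, post_beta)

print(f"Prior mean:     {prior_alpha / (prior_alpha + prior_beta):.2f}")
print(f"Posterior mean: {posterior.mean():.2f}")          # pulled toward the data
print(f"95% credible interval: {posterior.interval(0.95)}")
```

The posterior mean lands between the prior belief and the observed frequency, which is exactly how past experience tempers new evidence.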

Uncertainty

How can we benefit from measuring uncertainty? Measuring uncertainty helps us filter out noisy inputs that do not belong to the training data distribution. There are three types of uncertainty: epistemic, aleatoric, and distributional. Aleatoric uncertainty originates in the input data; a high aleatoric value means the data does not carry enough information to determine the neural network's output. Epistemic uncertainty refers to uncertainty in the model parameters. Distributional uncertainty is caused by a mismatch between the training and test data distributions.
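
One common way to separate these sources in practice is to decompose the predictive uncertainty of an ensemble (or of Monte Carlo samples from a Bayesian neural network) into an aleatoric and an epistemic term. The sketch below is a minimal NumPy version of that decomposition; the "predictions" are random placeholders rather than outputs of a real model.

```python
import numpy as np

def decompose_uncertainty(probs):
    """probs: array of shape (n_samples, n_classes) with softmax outputs
    from several stochastic forward passes (ensemble members or MC samples)."""
    eps = 1e-12
    mean_probs = probs.mean(axis=0)

    # Total uncertainty: entropy of the averaged predictive distribution.
    total = -np.sum(mean_probs * np.log(mean_probs + eps))

    # Aleatoric uncertainty: average entropy of the individual predictions.
    aleatoric = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))

    # Epistemic uncertainty: what remains (the mutual-information term).
    epistemic = total - aleatoric
    return total, aleatoric, epistemic

# Placeholder "predictions" from 10 stochastic forward passes over 3 classes.
rng = np.random.default_rng(0)
logits = rng.normal(size=(10, 3))
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

total, aleatoric, epistemic = decompose_uncertainty(probs)
print(f"total={total:.3f} aleatoric={aleatoric:.3f} epistemic={epistemic:.3f}")
```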

Applying more training data helps reduce epistemic uncertainty, although it can never be driven to zero because we never have access to all possible data. More training data cannot reduce aleatoric uncertainty, because it stems from the natural complexity and noise of the data itself rather than from a lack of knowledge. Distributional uncertainty, caused by out-of-distribution inputs, is arguably the most natural problem in machine learning and needs more attention and new measures; one simple measure is sketched below.
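
Building on the decomposition above (and reusing its decompose_uncertainty helper), a simple and admittedly crude measure is to flag any input whose epistemic term exceeds a threshold; the threshold here is an arbitrary illustrative value, not a recommended setting.

```python
# Crude out-of-distribution flag: high epistemic uncertainty suggests the
# model's parameter samples disagree about an input, which is typical for
# inputs far from the training distribution.
EPISTEMIC_THRESHOLD = 0.5  # arbitrary illustrative value, in nats

def is_out_of_distribution(probs, threshold=EPISTEMIC_THRESHOLD):
    _, _, epistemic = decompose_uncertainty(probs)
    return epistemic > threshold
```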

Predictive uncertainty

A Bayesian neural network produces a predictive distribution, whereas a standard deep neural network produces only a point estimate and therefore cannot filter out noisy input data. To estimate uncertainty, we must first be able to distinguish between data (aleatoric), model (epistemic), and distributional uncertainty. We can only reduce uncertainty when we know its source, since the corrective measure must match the type of uncertainty encountered.
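
A lightweight way to approximate such a predictive distribution without a full Bayesian neural network is Monte Carlo dropout: keep dropout active at inference time and aggregate several stochastic forward passes. The PyTorch sketch below is a hedged illustration with a placeholder model and input, not a production recipe.

```python
import torch
import torch.nn as nn

# Toy regression model; the architecture is an arbitrary placeholder.
model = nn.Sequential(
    nn.Linear(4, 64), nn.ReLU(), nn.Dropout(p=0.2),
    nn.Linear(64, 1),
)

def mc_dropout_predict(model, x, n_samples=100):
    """Approximate a predictive distribution by sampling forward passes
    with dropout left active (Monte Carlo dropout)."""
    model.train()  # keeps dropout layers stochastic at inference time
    with torch.no_grad():
        samples = torch.stack([model(x) for _ in range(n_samples)])
    return samples.mean(dim=0), samples.std(dim=0)

x = torch.randn(1, 4)  # placeholder input
mean, std = mc_dropout_predict(model, x)
print(f"prediction {mean.item():.3f} ± {std.item():.3f}")
```

The spread of the sampled predictions gives a rough stand-in for the predictive distribution that a point-estimate network cannot provide.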

Data uncertainty is the known-unknown part: the model knows that its output is limited by the natural noise in the training data. Distributional uncertainty is the unknown-unknown part: the model is confronted with unknown test data and cannot produce a correct output for it.

Probabilistic reasoning

Probabilistic reasoning is applied to make better decisions in a system under uncertainty. Probabilistic programming languages combine traditional programming with probabilistic modeling to take advantage of both. These languages use Bayesian computation and inference for many statistical models.
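
As a small illustration, the sketch below expresses a simple Beta-Binomial model in a probabilistic programming language; it assumes the PyMC library is installed, and the observed counts are made-up numbers.

```python
# Minimal probabilistic-programming sketch (assumes PyMC >= 5 is installed).
import pymc as pm

heads, flips = 7, 10  # made-up observations

with pm.Model() as coin_model:
    # Prior degree of belief about the coin's bias.
    p = pm.Beta("p", alpha=2, beta=2)
    # Likelihood of the observed flips given that bias.
    pm.Binomial("obs", n=flips, p=p, observed=heads)
    # Bayesian inference: sample from the posterior with MCMC.
    idata = pm.sample(1000, tune=1000, chains=2, progressbar=False)

print(float(idata.posterior["p"].mean()))
```

The model declaration reads almost like the statistical notation, and the language handles the inference machinery behind pm.sample.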

Future

Applying Bayesian inference still requires many of the assumptions made in the frequentist approach, such as assumptions about the likelihood function, and these assumptions sometimes make the Bayesian approach inefficient. All approaches aim to separate the noise from the signal in the input data. Another approach, fiducial inference, does not require as many assumptions as the Bayesian one; we can dive deeper into fiducial inference in another article.

Ultimately, data is the most significant part of an AI system, and its quality is crucial. High-quality data from 5% of the population is worth far more than low-quality data from 95% of it. A newer idea, data minding instead of data mining, is about understanding the data: analyzing where it came from and how accurate it is.

Summary

New out-of-distribution detection methods help detect distributional uncertainty, and Bayesian neural networks make them possible. Bayesian neural networks use domain knowledge as a prior distribution to improve the model's performance. Their architecture is closer to the human brain, where we use our past experiences to predict the risk of success or the uncertainty of future events.

The frequentist approach does not make use of prior knowledge, whereas the Bayesian approach builds on what is already known. Probabilistic thinking helps create a model that uses the available quantitative domain knowledge as a prior. All the statistical data from our past experiences can be used as knowledge to improve our decisions and reduce uncertainty.