Do we only trust what we understand? This question points to a fundamental assumption behind the explainability methods we construct for artificial intelligence-based models. That assumption drives a predominant line of argument in which explainability is seen as necessary for us to trust the results a model generates. However, explainability may not be a decisive factor in increasing trust.
The statistician George Box famously observed that “all models are wrong, but some are useful”. Of the explanations we have for AI-based models, we can likewise say that “many explanations are wrong, but some are useful”.
An emblematic case came with the launch of Google's Gemini model. Its image generation feature aimed to solve a canonical problem of models in this category: gender and racial bias, which generative AI models have repeatedly been shown to exhibit. The usual remedy for this kind of problem is to build more diverse datasets that reduce harmful biases. Gemini, however, drew strong backlash for generating historically and socially inaccurate images, and the feature was taken offline. For example, a request for an image of a German soldier in 1940 would produce men of various ethnicities (Black, Indigenous, Asian, etc.) wearing Nazi army uniforms. Similar inaccuracies occurred with requests for images of Vikings, the Pope, and the US founding fathers. The model had run into what is called the bias-accuracy trade-off: it is not possible to maximize accuracy and bias mitigation at the same time, so a balance between these two components must be found.
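One common way to formalize this balance is to optimize an accuracy term plus a bias penalty, with a weight that sets where on the trade-off a model sits. The sketch below is a minimal illustration of that idea, not anything from Gemini; the function name, the demographic-parity penalty, and the `lam` weight are my own choices:

```python
import numpy as np

def traded_off_loss(y_true, y_prob, group, lam=1.0):
    """Illustrative objective: accuracy term plus a weighted bias penalty."""
    eps = 1e-12  # guard against log(0)
    # Accuracy term: binary cross-entropy of the predicted probabilities
    ce = -np.mean(y_true * np.log(y_prob + eps)
                  + (1 - y_true) * np.log(1 - y_prob + eps))
    # Bias term: demographic-parity gap, i.e. the difference between the
    # two groups' average predicted positive rates
    gap = abs(y_prob[group == 0].mean() - y_prob[group == 1].mean())
    # lam = 0 optimizes accuracy alone; a large lam prioritizes bias
    # mitigation at accuracy's expense
    return ce + lam * gap
```

Sweeping `lam` from zero upward traces out the trade-off curve: each value selects a different balance, and no single value removes the tension between the two terms.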
Incidents of this kind make us question whether transparency about the results a model generates is really important, and indeed necessary, to ensure user trust in a system. I would like to examine this question from four perspectives: (i) high-risk tasks, (ii) low-risk tasks, (iii) tasks where accuracy is more relevant than explainability, and (iv) tasks where accuracy is less relevant than explainability.
High-risk tasks
Examples: Medical diagnosis, loan approvals, self-driving cars, criminal sentencing.
Transparency: Critically important. Understanding why a model made a decision is essential for assessing potential negative consequences and building trust.
Accuracy vs. Explainability: While accuracy is vital, explainability often takes precedence. An incorrect decision needs to be traceable to its root cause to prevent future harm.
Low-risk tasks
Examples: Movie recommendations, online product suggestions, and image categorization.
Transparency: Less crucial. Users are likely more forgiving of minor errors as long as the overall experience is positive.
Accuracy vs. Explainability: Accuracy usually takes the lead. Users primarily desire a correct or pleasing outcome.
Accuracy > Explainability
Scenarios: Tasks where the output or decision is the primary focus, and the reasoning can safely remain opaque.
Examples:
Image recognition: Identifying a species of bird in a photo.
Financial trading: Models that maximize profit may rely on complex internal logic.
Real-time decision-making: A fraud detection system might not have time to explain itself in the moment, but its accuracy is paramount.
Explainability > Accuracy
Scenarios: When understanding the model's reasoning is critical for trust, fairness, and future improvement.
Examples:
Denying a loan application: Explaining the factors involved helps the applicant understand how to improve their creditworthiness and helps verify that the model isn't biased (see the sketch after this list).
Medical diagnosis: A doctor needs to understand the reasoning behind an AI's suggested treatment to combine it with their judgment.
Debugging and auditing: When an AI model produces unexpected errors, explainability helps uncover the flaws and improve the model.
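To make the loan example above concrete, here is a minimal sketch with an invented dataset and invented feature names. It trains a linear model and decomposes one applicant's denial into per-feature contributions, the kind of actionable explanation described above:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Invented toy data: columns are [income, debt_ratio, late_payments]
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))
# Synthetic approvals: income helps, debt and late payments hurt
y = (X @ np.array([1.5, -1.0, -2.0]) + rng.normal(size=500) > 0).astype(int)

model = LogisticRegression().fit(X, y)

# A deliberately risky applicant: low income, high debt, many late payments
applicant = np.array([-2.0, 2.0, 2.0])
decision = "approved" if model.predict(applicant.reshape(1, -1))[0] else "denied"
print(f"Decision: {decision}")

# For a linear model, coefficient * feature value is that feature's
# additive contribution to the log-odds of approval
for name, contrib in zip(["income", "debt_ratio", "late_payments"],
                         model.coef_[0] * applicant):
    print(f"{name}: {contrib:+.2f} toward approval")
```

This additive reading is what model-agnostic tools such as SHAP generalize to non-linear models, which is also what makes such tools useful for the debugging and auditing case above.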