AI Machine Learning: In Bias We Trust?

MIT researchers found that explanation methods designed to help users determine whether to trust a machine-learning model's predictions can perpetuate biases and lead to worse outcomes for people from disadvantaged groups. Credit: Jose-Luis Olivares, MIT, with images from iStockphoto.

According to a new study, the explanation methods that help users determine whether to trust a machine-learning model's predictions can be less accurate for disadvantaged subgroups.

Machine-learning algorithms are sometimes used to assist human decision-makers when the stakes are high. For example, a model could predict which law school applicants are most likely to pass the bar exam, helping admissions officers decide which students to admit.

Because these models are so complex, often with millions of parameters, it is nearly impossible even for AI researchers to fully understand how they make predictions. An admissions officer with no machine-learning experience may have no idea what is happening under the hood. Scientists sometimes turn to explanation methods that mimic a larger model by creating simple approximations of its predictions. These approximations, which are far easier to understand, help users decide whether to trust the model's predictions.
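To make this concrete, here is a minimal sketch (in Python, not the specific methods evaluated in the study) of one common style of explanation model: a shallow decision tree trained as a global surrogate to mimic a black-box model's predictions. The models and synthetic data are stand-ins for illustration only.

```python
# Illustrative sketch only: a global surrogate explanation model,
# not the specific explanation methods evaluated in the study.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Complex "black-box" model whose predictions we want to explain.
black_box = RandomForestClassifier(n_estimators=200, random_state=0)
black_box.fit(X_train, y_train)

# Simple, interpretable surrogate trained to reproduce the black box's
# predictions (not the ground-truth labels).
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_train, black_box.predict(X_train))
```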

But are these explanation methods fair? If an explanation method provides better approximations for men than for women, or for white people than for Black people, users may be more inclined to trust the model's predictions for some people but not for others.

MIT scientists carefully examined the fairness of some widely used explanation methods. They discovered that the approximation quality of these explanations can vary drastically between subgroups and that the quality is often significantly lower for minoritized subgroups.

In practice, this means that if the approximation quality is lower for female applicants, there is a mismatch between the explanations and the model’s predictions, which could lead the admissions officer to wrongly reject more women than men.

Once the MIT researchers saw how pervasive these fairness gaps are, they tried several techniques to level the playing field. They were able to shrink some gaps, but couldn’t eradicate them.

“What this means in the real world is that people might incorrectly trust predictions more for some subgroups than for others. So, improving explanation models is important, but communicating the details of these models to end users is equally important. These gaps exist, so users may want to adjust their expectations as to what they are getting when they use these explanations,” says lead author Aparna Balagopalan, a graduate student in the Healthy ML group of the MIT Computer Science and Artificial Intelligence Laboratory (CSAIL).

Balagopalan wrote the paper with CSAIL graduate students Haoran Zhang and Kimia Hamidieh; CSAIL postdoc Thomas Hartvigsen; Frank Rudzicz, associate professor of computer science at the University of Toronto; and senior author Marzyeh Ghassemi, an assistant professor and head of the Healthy ML Group. The research will be presented at the ACM Conference on Fairness, Accountability, and Transparency.

High fidelity

Simplified explanation models can approximate predictions of a more complex machine-learning model in a way that humans can grasp. An effective explanation model maximizes a property known as fidelity, which measures how well it matches the larger model’s predictions.
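Continuing the hypothetical surrogate sketch above, fidelity can be computed as the fraction of inputs on which the explanation model and the black-box model make the same prediction. The function below is an illustrative implementation, not the paper's code.

```python
import numpy as np

def fidelity(explainer, black_box, X):
    """Fraction of inputs on which the explanation model agrees
    with the black-box model's prediction."""
    return np.mean(explainer.predict(X) == black_box.predict(X))

overall_fidelity = fidelity(surrogate, black_box, X_test)
print(f"Overall fidelity: {overall_fidelity:.3f}")
```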

Rather than focusing on average fidelity for the overall explanation model, the MIT researchers studied fidelity for subgroups of people in the model’s dataset. In a dataset with men and women, the fidelity should be very similar for each group, and both groups should have fidelity close to that of the overall explanation model.

“When you are just looking at the average fidelity across all instances, you might be missing out on artifacts that could exist in the explanation model,” Balagopalan says.

They developed two metrics to measure fidelity gaps, or disparities in fidelity between subgroups. One is the difference between the average fidelity across the entire explanation model and the fidelity for the worst-performing subgroup. The second calculates the absolute difference in fidelity between all possible pairs of subgroups and then computes the average.
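Here is a sketch of how these two gap metrics could be computed, assuming an array of subgroup labels aligned with the data is available. The function and variable names are illustrative, not taken from the paper.

```python
from itertools import combinations
import numpy as np

def fidelity_gaps(explainer, black_box, X, groups):
    """Return (1) the gap between overall fidelity and the worst subgroup's
    fidelity, and (2) the mean absolute fidelity difference across all
    pairs of subgroups. `groups` is an array of subgroup labels aligned
    with the rows of X."""
    agree = explainer.predict(X) == black_box.predict(X)
    overall = agree.mean()
    per_group = {g: agree[groups == g].mean() for g in np.unique(groups)}

    worst_group_gap = overall - min(per_group.values())
    pair_gaps = [abs(per_group[a] - per_group[b])
                 for a, b in combinations(per_group, 2)]
    mean_pairwise_gap = float(np.mean(pair_gaps)) if pair_gaps else 0.0
    return worst_group_gap, mean_pairwise_gap
```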

With these metrics, they searched for fidelity gaps using two types of explanation models that were trained on four real-world datasets for high-stakes situations, such as predicting whether a patient dies in the ICU, whether a defendant reoffends, or whether a law school applicant will pass the bar exam. Each dataset contained protected attributes, like the sex and race of individual people. Protected attributes are features that may not be used for decisions, often due to laws or organizational policies; exactly which attributes count as protected can vary from one decision-making setting to another.

The researchers found clear fidelity gaps for all datasets and explanation models. The fidelity for disadvantaged groups was often much lower, up to 21 percent lower in some instances. The law school dataset had a fidelity gap of 7 percent between race subgroups, meaning the approximations for some subgroups were wrong 7 percent more often on average. If there are 10,000 applicants from these subgroups in the dataset, for example, a significant portion could be wrongly rejected, Balagopalan explains.

“I was surprised by how pervasive these fidelity gaps are in all the datasets we evaluated. It is hard to overemphasize how commonly explanations are used as a ‘fix’ for black-box machine-learning models. In this paper, we are showing that the explanation methods themselves are imperfect approximations that may be worse for some subgroups,” says Ghassemi.

Narrowing the gaps

After identifying fidelity gaps, the researchers tried some machine-learning approaches to fix them. They trained the explanation models to identify regions of a dataset that could be prone to low fidelity and then focus more on those samples. They also tried using balanced datasets with an equal number of samples from all subgroups.
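As an illustration of the second idea, the sketch below retrains the hypothetical surrogate on a group-balanced subsample of the training data. The group labels here are synthetic stand-ins, and this is not the paper's exact procedure.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def balanced_indices(groups, seed=0):
    """Subsample indices so every subgroup contributes equally many examples."""
    rng = np.random.default_rng(seed)
    unique, counts = np.unique(groups, return_counts=True)
    n = counts.min()
    picks = [rng.choice(np.where(groups == g)[0], size=n, replace=False)
             for g in unique]
    return np.concatenate(picks)

# Synthetic stand-in group labels; in practice these would be the protected
# attributes recorded in the dataset.
train_groups = np.random.default_rng(1).integers(0, 2, size=len(X_train))

# Retrain the surrogate on the black box's predictions over a balanced subset.
idx = balanced_indices(train_groups)
surrogate_balanced = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate_balanced.fit(X_train[idx], black_box.predict(X_train[idx]))
```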

These robust training strategies did reduce some fidelity gaps, but they didn’t eliminate them.

The researchers then modified the explanation models to explore why fidelity gaps occur in the first place. Their analysis revealed that an explanation model might indirectly use protected group information, like sex or race, that it could learn from the dataset, even if group labels are hidden.

They want to explore this conundrum more in future work. They also plan to further study the implications of fidelity gaps in the context of real-world decision-making.

Balagopalan is excited to see that concurrent work on explanation fairness from an independent lab has arrived at similar conclusions, highlighting the importance of understanding this problem well.

As she looks to the next phase in this research, she has some words of warning for machine-learning users.

“Choose the explanation model carefully. But even more importantly, think carefully about the goals of using an explanation model and who it eventually affects,” she says.

“I think this paper is a very valuable addition to the discourse about fairness in ML,” says Krzysztof Gajos, Gordon McKay Professor of Computer Science at the Harvard John A. Paulson School of Engineering and Applied Sciences, who was not involved with this work. “What I found particularly interesting and impactful was the initial evidence that the disparities in the explanation fidelity can have measurable impacts on the quality of the decisions made by people assisted by machine learning models. While the estimated difference in the decision quality may seem small (around 1 percentage point), we know that the cumulative effects of such seemingly small differences can be life changing.”

Reference: Aparna Balagopalan, Haoran Zhang, Kimia Hamidieh, Thomas Hartvigsen, Frank Rudzicz, and Marzyeh Ghassemi, "The Road to Explainability is Paved with Bias: Measuring the Fairness of Explanations," 2 June 2022, arXiv:2205.03295 [cs.LG].

This work was funded, in part, by the MIT-IBM Watson AI Lab, the Quanta Research Institute, a Canadian Institute for Advanced Research AI Chair, and Microsoft Research.
