Design and Development of an Ethical and Moral Values Audit Toolkit for Detecting Bias and Fairness in Machine Learning-based Text Classifiers

José Luis Benítez Santana. (2024). Design and Development of an Ethical and Moral Values Audit Toolkit for Detecting Bias and Fairness in Machine Learning-based Text Classifiers. Master's Thesis (TFM). Universidad Politécnica de Madrid, ETSI Telecomunicación.

Abstract:
Human language is one of the most powerful tools at our disposal. When we use it, issues are rarely exposed objectively. Instead, the message is embedded in a frame that shapes how the story is presented, including the moral and emotional perspective of the speaker. This frame relates directly to how people organize their beliefs and label their ideas. Language use can therefore reveal emotions and moral values, leading to biased information. In this context, bias is defined as the tendency of models and algorithms to reflect, amplify, or perpetuate prejudices and inequalities. Training text classifiers on biased corpora can propagate these biases into the classification task. Likewise, bias can also appear when models are developed following poor practices, such as training on an imbalanced corpus. This work focuses on developing a framework and toolkit to audit text classifiers with regard to ethics and moral values. It starts with a thorough study of the presence and types of existing bias. We then examine ways to categorize and quantify moral values, ultimately selecting Moral Foundations Theory, which posits that human morality, regardless of culture and social context, is built on five universal moral foundations: care, fairness, loyalty, authority, and purity. We go on to implement several techniques to detect and mitigate bias, adapting state-of-the-art methods to our particular objectives or designing new ones. Finally, a microservice-oriented web system is proposed that exposes a web application through which auditors can easily interact with the implemented techniques. The evaluation of this work demonstrates that bias can be effectively detected and mitigated in most cases by combining the proposed techniques. However, we also conclude that moral bias in human language is a complex concept, whose presence, detection, and mitigation depend heavily on the specific context. This implies that the auditor must carefully select the appropriate techniques to achieve successful outcomes.
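As a rough illustration of how moral values might be quantified under Moral Foundations Theory, the following Python sketch scores a text against the five foundations using a tiny cue-word lexicon. The lexicon, the tokenizer, and the normalization are all assumptions made for illustration; the thesis does not specify its implementation, and a real audit would rely on a full resource such as the Moral Foundations Dictionary.

from collections import Counter
import re

# Hypothetical miniature lexicon mapping each of the five moral
# foundations to a few cue words (assumption, for illustration only).
FOUNDATION_LEXICON = {
    "care":      {"harm", "hurt", "protect", "suffer", "compassion"},
    "fairness":  {"fair", "unfair", "equal", "justice", "rights"},
    "loyalty":   {"loyal", "betray", "nation", "family", "solidarity"},
    "authority": {"obey", "order", "law", "tradition", "respect"},
    "purity":    {"pure", "sacred", "disgust", "clean", "sin"},
}

def moral_foundation_scores(text: str) -> dict:
    """Return the relative frequency of cue words per foundation."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = max(len(tokens), 1)  # avoid division by zero on empty input
    return {
        foundation: sum(counts[w] for w in cues) / total
        for foundation, cues in FOUNDATION_LEXICON.items()
    }

if __name__ == "__main__":
    sample = "The new law is unfair and will hurt families who obey tradition."
    for foundation, score in moral_foundation_scores(sample).items():
        print(f"{foundation}: {score:.3f}")

Lexicon matching is only one plausible operationalization; supervised classifiers trained on morally annotated corpora are another common choice, and the appropriate technique depends, as the abstract notes, on the specific audit context.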