In our last blog post, we established what explainability is and why it matters. Now comes the harder part: figuring out how to make it work.
As with most questions in AI, there is no simple, one-size-fits-all answer. Instead, a broad range of methods, processes, and design strategies is used in combination by many different actors to open the “black box”.
At the technical level, researchers have developed a range of methods for interpreting how models function and make decisions. “Feature importance” techniques such as LIME and SHAP identify which input variables most strongly influenced a given output (Holzinger et al., 2022). For example, a person who was denied a loan could ask which inputs, such as credit score, age, or employment status, weighed most heavily in the decision. “Example-based methods” explain decisions by comparing them to similar training examples (McDermid et al., 2021). Other approaches reduce complexity by approximating a black-box system with a more interpretable one, such as a decision tree or a linear model (Guidotti et al., 2018). Some researchers go further and advocate prioritising inherently interpretable models, such as rule-based systems or decision trees, over more complex ones (Ali et al., 2023; McGrath & Jonker, 2024). Finally, a range of visualization techniques can highlight which parts of an image or sentence a model was “focused on” during prediction (Holzinger et al., 2022; Miller, 2018).
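To make two of these ideas concrete, here is a minimal, purely illustrative sketch in Python with scikit-learn. It does not use the LIME or SHAP libraries themselves; instead it shows a simple feature-importance measure (permutation importance) and a global surrogate, in which a shallow decision tree approximates a more complex “black-box” model. The loan-style feature names and the synthetic data are hypothetical.

```python
# Illustrative sketch: feature importance and a global surrogate model
# for a hypothetical loan-approval classifier (synthetic data only).
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.tree import DecisionTreeClassifier, export_text

rng = np.random.default_rng(0)
feature_names = ["credit_score", "age", "income", "employment_years"]

# Synthetic stand-in for a loan dataset (purely illustrative).
X = rng.normal(size=(1000, 4))
y = (X[:, 0] + 0.5 * X[:, 2] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

# The "black box": an ensemble whose internal logic is hard to read directly.
black_box = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# 1) Feature importance: how much does shuffling each input hurt performance?
result = permutation_importance(black_box, X, y, n_repeats=10, random_state=0)
for name, score in zip(feature_names, result.importances_mean):
    print(f"{name:>18}: {score:.3f}")

# 2) Global surrogate: fit an interpretable tree to the black box's own
#    predictions, then print its human-readable rules.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X, black_box.predict(X))
print(export_text(surrogate, feature_names=feature_names))
```

In a real application, dedicated tools such as LIME or SHAP would typically replace the permutation step, and a surrogate would be checked for how faithfully it reproduces the black box’s predictions before its rules are shown to anyone.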
In practice, these techniques are often used in combination. Their use, however, raises a deeper question about the level of understanding they actually provide. Since not every recipient is an expert in the field, a good explanation is not just technically accurate; it is intelligible, context-aware, and tailored to its audience. An explanation that clarifies a model’s logic for an engineer could leave a policymaker, doctor, or patient completely in the dark.
Beyond these technical efforts, lawmakers have increasingly recognized the importance of explainability, particularly in high-stakes domains. Legal frameworks like the GDPR grant individuals subject to automated decision-making what is often described as a right to explanation, and the EU AI Act introduces transparency requirements for high-risk AI systems, demanding that their outputs be interpretable and justifiable. While these frameworks provide crucial normative pressure, however, they often stop short of addressing the how. As Chung et al. (2024) point out, they frequently lack the technical specificity needed to meaningfully operationalize explainability in practice. Without concrete guidance on what counts as an explanation or how to generate one across different model types, legal compliance risks becoming a box-ticking exercise that satisfies regulators without guaranteeing real transparency. If legal standards are to avoid ambiguity and support genuine explainability, they must be tightly linked to the technical realities of AI development.
In the end, explainability isn’t just about exposing what a model is doing; it’s about enabling us to ask the questions we care about and receive answers we can genuinely understand.
At alignAI, explainability is one of our core working principles. We view it as “a key enabler for all aspects of trustworthiness, accelerating development, promoting usability, and facilitating human oversight and auditing of LLMs”. Across our use cases, from education and mental health to media consumption, we explore how explainability can help accomplish exactly that.
References:
Ali, S., Abuhmed, T., El-Sappagh, S., Muhammad, K., Alonso-Moral, J. M., Confalonieri, R., Guidotti, R., Del Ser, J., Díaz-Rodríguez, N. & Herrera, F. (2023). Explainable Artificial Intelligence (XAI): What we know and what is left to attain Trustworthy Artificial Intelligence. Information Fusion, 99, 101805. https://doi.org/10.1016/j.inffus.2023.101805
Chung, N. C., Chung, H., Lee, H., Chung, H., Brocki, L. & Dyer, G. (2024). False Sense of Security in Explainable Artificial Intelligence (XAI). arXiv (Cornell University). https://doi.org/10.48550/arxiv.2405.03820
Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Pedreschi, D. & Giannotti, F. (2018). A survey of methods for explaining black box models. arXiv (Cornell University). https://doi.org/10.48550/arxiv.1802.01933
Holzinger, A., Saranti, A., Molnar, C., Biecek, P. & Samek, W. (2022). Explainable AI Methods – A brief overview. In Lecture Notes in Computer Science (pp. 13–38). https://doi.org/10.1007/978-3-031-04083-2_2
McDermid, J. A., Jia, Y., Porter, Z. & Habli, I. (2021). Artificial intelligence explainability: the technical and ethical dimensions. Philosophical Transactions Of The Royal Society A Mathematical Physical And Engineering Sciences, 379(2207), 20200363. https://doi.org/10.1098/rsta.2020.0363
McGrath, A. & Jonker, A. (2024, October 8). What is AI interpretability? IBM. https://www.ibm.com/think/topics/interpretability
Miller, T. (2018). Explanation in artificial intelligence: Insights from the social sciences. Artificial Intelligence, 267, 1–38. https://doi.org/10.1016/j.artint.2018.07.007
Further reading/watching/listening:
Books & Articles:
Molnar, C. (2019). Interpretable Machine Learning: A Guide for Making Black Box Models Explainable. Leanpub. https://originalstatic.aminer.cn/misc/pdf/Molnar-interpretable-machine-learning_compressed.pdf
Ribeiro, M. T., Singh, S. & Guestrin, C. (2016, February 16). “Why Should I Trust You?”: Explaining the Predictions of Any Classifier. arXiv. https://arxiv.org/abs/1602.04938
Videos & Podcasts:
“Human-Centered Explainable AI: From Algorithms to User Experiences”, a talk by Q. Vera Liao at Stanford.