“Fair Enough?” – Who Wins, Who Loses, and Why AI Needs to Do Better Than Just Working for Most

When people think about AI, they often imagine objectivity. They imagine algorithms that soberly follow data and numbers, unaffected by personal opinions, emotions, or prejudice. But here’s the problem: AI systems don’t fall out of the sky. They are developed by humans, trained on human-generated data, shaped by human choices, and deployed in human contexts – all of which are far from neutral.

Opening the Black Box – How AI Explainability Is Being Approached

At the technical level, researchers have developed a range of methods for interpreting how models function and reach their decisions. “Feature importance” techniques such as LIME and SHAP identify which input variables most strongly influence a model’s output (Holzinger et al., 2022). In practice, this could mean that a person denied a loan can ask which input features – credit score, age, employment, and so on – weighed most heavily in the decision. “Example-based methods” explain decisions by comparing them to similar training examples (McDermid et al., 2021). Other approaches reduce complexity by approximating a black-box system with a more interpretable one, such as a decision tree or a linear model (Guidotti et al., 2018). Some researchers go further and advocate prioritising inherently interpretable models, such as rule-based systems or decision trees, over more complex ones (Ali et al., 2023; McGrath & Jonker, 2024). Finally, a range of visualization techniques can highlight which parts of an image or sentence a model was “focused on” during prediction (Holzinger et al., 2022; Miller, 2018).
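
To make two of these ideas concrete, here is a minimal sketch in Python, assuming scikit-learn is available: it computes a simple feature-importance measure (scikit-learn’s permutation importance, standing in here for LIME- or SHAP-style attribution) and then fits a global surrogate in the spirit of Guidotti et al. (2018), i.e. a shallow decision tree trained to mimic a black box’s predictions. The loan-style feature names and the synthetic data are invented for illustration.

```python
# A minimal, hypothetical sketch of two explainability ideas: feature
# importance and a global surrogate model. Feature names and data are
# synthetic stand-ins for a real loan-decision dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

feature_names = ["credit_score", "age", "income", "employment_years"]
X, y = make_classification(n_samples=2000, n_features=4, n_informative=3,
                           n_redundant=1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# The "black box": an ensemble whose internal logic is hard to read directly.
black_box = RandomForestClassifier(n_estimators=200, random_state=0)
black_box.fit(X_train, y_train)

# (1) Feature importance: how much does shuffling each input hurt accuracy?
# (Permutation importance stands in here for LIME/SHAP-style attribution.)
result = permutation_importance(black_box, X_test, y_test,
                                n_repeats=10, random_state=0)
for name, score in sorted(zip(feature_names, result.importances_mean),
                          key=lambda pair: -pair[1]):
    print(f"{name:>16}: {score:.3f}")

# (2) Global surrogate: fit a shallow, readable decision tree to the black
# box's *predictions* (not the true labels) to approximate its behaviour.
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_train, black_box.predict(X_train))
print(export_text(surrogate, feature_names=feature_names))
```

In the loan example, the importance scores would tell an applicant which features drove the model’s decisions overall, while the printed surrogate tree offers a human-readable approximation of its decision rules.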

Can You Trust What You Don’t Understand? Why AI Needs to Explain Itself!

Most of the time, we don’t question the systems around us until they fail. When planes crash, treatments go wrong, or loans are denied, we ask, “What happened, why, and who’s responsible?”

As AI development progresses rapidly and systems take on ever more power in deciding what we see, what we get, and what we do, a genuine understanding of how they reach those decisions is crucial if we are not to lose control. Today’s systems deliver outputs with a suspicious amount of confidence. Yet when asked why, they often can’t or won’t tell us, remaining a complex, data-driven “black box” that is incomprehensible even to their creators (Maclure, 2021; Guidotti et al., 2018; Kosinski, 2024).

Making LLM Alignment Work – The Need for Collaborative Research

Ensuring that LLMs align with human values is no easy task. Alignment is particularly challenging because human values are not static, universal, or easily quantified and codified. What counts as ethical, fair, or appropriate varies significantly across cultures, political ideologies, and social contexts, making it difficult to establish a one-size-fits-all alignment approach (Liu et al., 2023; Shen et al., 2023). An output considered neutral or factual in one country might be seen as biased or controversial in another, especially where political values such as democracy are concerned. Similarly, ethical priorities, such as whether AI should prioritise free speech or harm prevention, often stand in direct conflict with one another.

Why Do LLMs Need Ethical Alignment? – The Risks of Misaligned AI

“As machine-learning systems grow not just increasingly pervasive but increasingly powerful, we will find ourselves more and more often in the position of the ‘sorcerer’s apprentice’: we conjure a force, autonomous but totally compliant, give it a set of instructions, then scramble like mad to stop it once we realize our instructions are imprecise or incomplete—lest we get, in some clever, horrible way, precisely what we asked for. How to prevent such a catastrophic divergence—how to ensure that these models capture our norms and values, understand what we mean or intend, and, above all, do what we want—has emerged as one of the most central and most urgent scientific questions in the field of computer science. It has a name: the alignment problem.” – Brian Christian, The Alignment Problem (2020, pp. 19–20).
