“AI tools are cultural artifacts, not neutral software”

How to keep information trustworthy when content is increasingly machine-made? EPFL professor Andrea Cavallaro shares his view.

“Hate speech is a moving target that erodes trust.” © iStock

Few forces have the power to shape our lives as profoundly as information. In our societies, publicly shared facts determine how citizens vote, how patients are treated, how communities respond to crises… But where reliable information reinforces trust and cooperation, corrupted information can also erode these bonds.

This has always been true. What has changed dramatically in recent years is the scale and speed at which information can be produced, manipulated, and spread. Artificial intelligence has supercharged the process: today, AI tools are generating and circulating text, images, and video that are increasingly indistinguishable from human-made content. And if the system will not fix itself, who will?

Andrea Cavallaro, professor at EPFL and head of the Laboratory of Multimodal Intelligent Systems, develops the multimodal systems needed to detect hate speech across text, image, audio, and video. He is also one of the instigators of AlignAI, an ambitious EU-funded doctoral network training seventeen PhD candidates across six universities to embed human values within large language models.

Andrea Cavallaro © DR

The AlignAI project aims to convey human values to AI systems. What does that mean in practice?

The idea is to provide both an intellectual framework and a practical platform to transfer individual and societal values into learning systems. How can we characterize the values and norms that people consider important? How can we define a process to transfer those values into systems? These are the questions we are tackling. At the moment, AI systems are primarily being designed by people with engineering backgrounds. But we are no longer dealing with systems that measure physical properties: large language models (LLMs) interact with humans, and humans are inherently difficult to characterize. That’s why AlignAI is primarily populated by non-technical PhD students: social scientists, cognitive psychologists, philosophers…

Values vary enormously across cultures, individuals, contexts… How do you begin to map them?

AlignAI tests its approach across three use cases: education, mental health, and online news consumption. These are domains where the impact of LLMs is already significant, and where the stakes of getting alignment right are very high. One of our PhD students is working across all three fields, building a conceptualization of values. We are starting with Europe, which is already a very diverse territory, and using existing legislation as a starting point, because legislation embodies the values that a society considered important enough to codify. We are collaborating with a judge to get the right angle.

People tend to trust software by default. With LLMs, how warranted is that confidence?

Automation bias is a well-known phenomenon: because something is software, we may assume it deserves our trust. But we have to remember that these tools are authored. Someone decided which data to use, how to train the model, and then how to fine-tune it to limit unsafe behaviors. And what counts as safe or unsafe is generally decided by a team of engineers, who impose their choices and, by extension, their underlying biases. All of this is passed on, from the properties of the dataset to the properties of the learned model. I call this distributed authorship. The important thing is to engage users in becoming not just passive customers, but active auditors, probing edge cases and questioning value biases. AI tools are not the neutral technical tools of the previous century, which we could calibrate within known operating conditions. They interact with us, and we shape their behavior with our prompts. By design, they please us in order to increase engagement. That dynamic is entirely new.

What are some of the challenges in detecting hate speech?

Hateful content can be concealed across different modalities: in video frames, on-screen text, audio, or spoken words. Sometimes the meaning only becomes clear when you combine these modalities. We have developed systems that cross-reference all of these simultaneously. But hate speech also evolves: it uses coded language, sarcasm, implicit references. It is a moving target, and it erodes trust in systems, in institutions, in democracies.

Aren’t these themes a bit political for a technical school?

Another way to say this is that they are cultural. The AI tools we use every day are cultural artifacts: they have been trained on the cultural productions of human beings, primarily of certain cultures, with a large imbalance. They are a compression of digital content produced over decades. Not a dry, neutral piece of software, but a container that absorbs what humanity has created and gives us answers with a fluency that, only a few years ago, we attributed exclusively to highly educated human beings. Recognizing this changes how you design them, how you evaluate them, and how you teach about them. Many of these students will go on to build tools that interface with humans. They need to understand what it means to co-design with the people who will actually use the technology, rather than imposing a techno-solutionist view from above.

How can your work reach the tech giants that produce the world’s popular LLMs?

In research, we practice open science. Our findings are readily accessible as open source for any developer to verify and, if they deem it useful, to adopt. Hopefully they will. But what gives me particular hope is something I didn’t expect. I’ve been invited to give talks at very large multinational companies on what it means to embed rights and values in AI tools. I was struck by how much this project resonates, even with very technical audiences I assumed wouldn’t necessarily be intrigued. Beyond AlignAI, there are research groups around the world who care about designing tools that support the flourishing of humans, not just tools that maximize engagement.

References
Read the full interview, as well as the interview with Philippe Stoll, former ICRC delegate, in C4DT Focus #11.

Author: Gregory Wicky

Source: EPFL

You might also be interested in

Mice actively seek better views to make visual decisions

A study led by EPFL shows that when objects are difficult to see, mice don’t simply look harder. They move to find better viewpoints, adjusting their behavior according to how much visual information is available.

Nine ERC Advanced Grants awarded to EPFL researchers

The European Research Council (ERC) awarded nine “ERC Advanced Grants” to EPFL researchers. This prestigious funding scheme gives senior researchers the opportunity to pursue ambitious, curiosity-driven projects that could lead to major scientific breakthroughs.

EPFL researchers create an AI model that thinks like we do

An EPFL team has created a new Large Language Model that is structured similarly to a human brain, allowing users more control and moving away from “black box” AI.

When a standard Large Language Model (LLM) is confronted with a problem, it tries to solve it by matching it to similar information it has seen before, and then gives an answer based on those past patterns. But how it decides which information to use, and how much weight it gives to different pieces of information, can be somewhat inscrutable from the outside.

The LLM MiCRo (Mixture of Cognitive Reasoners) is architecturally divided into four specialized areas that act like different parts of the human brain, allowing users to have more control over how it approaches a question, and to better understand how it comes to its answers. The model, which was presented at the International Conference on Learning Representations, comes from the NLP Lab, part of the School of Computer and Communication Sciences (IC), and the NeuroAI Lab, part of IC and the School of Life Sciences at EPFL.

The four experts

To create MiCRo, researchers identified four regions of the brain specializing in different functions, which they call ‘experts’: language, logic, social reasoning, and world knowledge.

“The brain is organized into specialized regions, each tuned to handle a specific function. So far, we don’t see this division of labor as clearly in current language models,” says Badr AlKhamissi, a PhD candidate leading this research. “We picked four brain regions that neuroscientists know well and gave the model its own specialized modules, each one trained to be analogous to one of those brain regions.”

An LLM usually functions as a stack of layers through which a problem or question is processed. In the case of MiCRo, each layer is divided into the four different experts. You give a sentence to the model starting at layer one, for example “The cat is asleep”. Then within this layer, the router can choose one expert for the first word “the”, but a different expert for the second word “cat” and so on, making it modular and highly adaptable.

“Each word of a sentence can go to different experts,” AlKhamissi explains. “So one sentence can actually be processed by multiple experts at each layer.”
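The per-word routing described above can be illustrated with a toy sketch. This is not MiCRo's code: the score table `TOY_SCORES`, the functions `route_token` and `route_sentence`, and the hand-picked numbers are all hypothetical and stand in for a learned router. The sketch only shows the general idea that a router scores each word against the four experts and sends it to the highest-scoring one, so a single sentence can be spread across several experts within one layer.

```python
# Toy illustration of per-token top-1 routing within a single layer.
# Assumption: a real router is a learned network; the hand-written
# scores below merely stand in for its outputs.

EXPERTS = ("language", "logic", "social", "world")

# Hypothetical router scores per word, one score per expert
# (same order as EXPERTS; higher means a better match).
TOY_SCORES = {
    "the":    (0.9, 0.0, 0.0, 0.1),
    "cat":    (0.3, 0.0, 0.1, 0.6),
    "is":     (0.8, 0.1, 0.0, 0.1),
    "asleep": (0.4, 0.0, 0.1, 0.5),
}


def route_token(scores):
    """Return the name of the expert with the highest score (top-1 routing)."""
    best_index = max(range(len(EXPERTS)), key=lambda i: scores[i])
    return EXPERTS[best_index]


def route_sentence(sentence):
    """Route every word of a sentence independently to one expert."""
    words = sentence.lower().split()
    return [(word, route_token(TOY_SCORES[word])) for word in words]


if __name__ == "__main__":
    for word, expert in route_sentence("The cat is asleep"):
        print(f"{word:>7} -> {expert}")
```

With these made-up scores, the function words go to the language expert and the content words go to the world-knowledge expert. In a stacked model the same decision would be repeated at every layer, which is why one sentence can touch several experts per layer.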

Consider a prompt like: “Emma wants to split a CHF 60 dinner bill among three friends, but she knows that Jake lost his job last week and is too proud to say he’s struggling.” A purely mathematical module handles the arithmetic: CHF 60 divided by three is CHF 20 each. But the social reasoning module picks up on something subtler: Emma’s awareness of Jake’s situation, his unspoken pride, and the implicit suggestion that she might quietly cover his share. Both kinds of reasoning are needed to fully understand what’s going on, and in MiCRo, each aspect of the prompt is routed to the expert best equipped to handle it.

“When we see how the model works, we can see that it routes the words that relate to the social aspects to the social expert, and when it does the mathematical part, it routes those numbers to the logic expert.”

This separation makes it easier to see how the model is ‘thinking’ and why it makes certain decisions. It also means decisions can be steered – for example, you can decide to increase the impact of the social expert, or suppress the logic expert, depending on what kind of model you want to use in a certain situation.

“In traditional LLMs, you can do this via prompting by telling the model to make the output more social or make it more related to emotions,” AlKhamissi says. “But here, this is done by intervening in the architecture itself without doing any prompting.”
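One way to picture this kind of steering is as an adjustment applied to the router's scores before an expert is chosen. The article does not say how MiCRo implements the intervention, so the following is a sketch under that assumption only; `steer`, `softmax`, the offsets, and the example scores are hypothetical names and values chosen for illustration.

```python
import math

EXPERTS = ("language", "logic", "social", "world")


def softmax(logits):
    """Convert raw scores into routing weights that sum to 1."""
    peak = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def steer(logits, adjustments):
    """Add a per-expert offset to the router scores, then normalize.

    A positive offset boosts an expert; a large negative offset
    effectively suppresses it. Experts not named are left unchanged.
    """
    shifted = [
        score + adjustments.get(name, 0.0)
        for name, score in zip(EXPERTS, logits)
    ]
    return dict(zip(EXPERTS, softmax(shifted)))


# Hypothetical router scores for one token (same order as EXPERTS).
token_logits = [0.2, 1.5, 1.0, 0.1]

baseline = steer(token_logits, {})                   # no intervention
more_social = steer(token_logits, {"social": 2.0})   # boost the social expert
no_logic = steer(token_logits, {"logic": -100.0})    # suppress the logic expert

if __name__ == "__main__":
    for label, weights in [("baseline", baseline),
                           ("more social", more_social),
                           ("no logic", no_logic)]:
        top = max(weights, key=weights.get)
        print(f"{label:>12}: top expert = {top}")
```

The point of the sketch is that the prompt stays the same in all three cases; only the routing weights change.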

“A virtuous circle”

To create MiCRo, the EPFL team worked with Greta Tuckute, a neuroscientist from Harvard and MIT, to understand which parts of the human brain are activated by different problems, and then applied that learning to the model.

To identify the region analogous to the ‘logic’ expert in the brain, neuroscientists give humans demanding tasks, such as hard mathematical equations, and less demanding tasks, like easy mathematical equations, and then record their brain activity to find which brain regions are most active for the demanding tasks compared with the non-demanding ones. AlKhamissi’s team then did the same for the model, giving it demanding mathematical equations to see which experts would be most activated.

“The cool thing is we just used exactly what they do in neuroscience, but in the model. And the model was able to identify those experts on its own.”
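The hard-versus-easy comparison can be sketched as a simple contrast: average how active each expert is on demanding tasks, subtract its average on easy tasks, and pick the expert with the largest increase. The activation numbers and the names `contrast` and `localize` below are invented for illustration; they are not measurements from MiCRo or from any brain-imaging study, and the team's actual analysis may differ.

```python
from statistics import mean

EXPERTS = ("language", "logic", "social", "world")

# Hypothetical per-expert activation levels, one dict per task.
hard_tasks = [
    {"language": 0.30, "logic": 0.90, "social": 0.10, "world": 0.20},
    {"language": 0.35, "logic": 0.80, "social": 0.15, "world": 0.25},
]
easy_tasks = [
    {"language": 0.30, "logic": 0.30, "social": 0.10, "world": 0.20},
    {"language": 0.25, "logic": 0.40, "social": 0.10, "world": 0.30},
]


def contrast(hard, easy):
    """Per expert: mean activation on hard tasks minus mean on easy tasks."""
    return {
        name: mean(row[name] for row in hard) - mean(row[name] for row in easy)
        for name in EXPERTS
    }


def localize(hard, easy):
    """Return the expert whose activation rises most with task difficulty."""
    differences = contrast(hard, easy)
    return max(differences, key=differences.get)


if __name__ == "__main__":
    for name, diff in contrast(hard_tasks, easy_tasks).items():
        print(f"{name:>8}: {diff:+.3f}")
    print("most difficulty-sensitive expert:", localize(hard_tasks, easy_tasks))
```

With these invented numbers the logic expert shows by far the largest increase, mirroring how a region is singled out when it responds more strongly to demanding tasks than to easy ones.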

While neuroscience informs the model, the model also informs the understanding of the brain, potentially allowing neuroscientists to discover the contributions of different areas for a given problem or question; for example that a certain sentence activates the language areas 20%, the mathematical areas 50%, and the social reasoning areas 40%.

“For my PhD work, I have been interested in this virtuous circle between neuroscience and AI. In one direction, we use findings and insights from neuroscience about the brain and integrate them into language models,” AlKhamissi says, “and now, with models like MiCRo, we can explore the other direction and ask how we can use AI models to help us understand the brain in a better way.”

Author: Stephanie Parker
Source: EPFL