Unifying behavioral analysis through animal foundation models


Behavioral analysis can provide a lot of information about the health status or motivations of a living being. A new technology developed at EPFL makes it possible for a single deep learning model to detect animal motion across many species and environments. This “foundational model”, called SuperAnimal, can be used for animal conservation, biomedicine, and neuroscience research.

Although there is the saying, “straight from the horse’s mouth”, it’s impossible to get a horse to tell you if it’s in pain or experiencing joy. Yet its body will express the answer in its movements. To a trained eye, pain will manifest as a change in gait, while joy may show in the animal’s facial expressions. But what if we could automate this with AI? And what about AI models for cows, dogs, cats, or even mice? Automating the analysis of animal behavior not only removes observer bias, but also helps humans get to the right answer more efficiently.

Strides of a horse detected with SuperAnimal; from Ye et al. 2024 Nature Communications.

Today marks the beginning of a new chapter in posture analysis for behavioral phenotyping. Mackenzie Mathis’ laboratory at EPFL publishes a Nature Communications article describing a particularly effective new open-source tool that requires no human annotations to get the model to track animals. Named “SuperAnimal”, it can automatically recognize, without human supervision, the location of “keypoints” (typically joints) in a whole range of animals – over 45 animal species – and even in mythical ones!

Image of a mythical Wolpertinger, generated with GPT-4 by Mackenzie Mathis, with SuperAnimal keypoints.

“The current pipeline allows users to tailor deep learning models, but this then relies on human effort to identify keypoints on each animal to create a training set,” explains Mackenzie Mathis. “This leads to duplicated labeling efforts across researchers and can lead to different semantic labels for the same keypoints, making merging data to train large foundation models very challenging. Our new method provides a new approach to standardize this process and train large-scale datasets. It also makes labeling 10 to 100 times more effective than current tools.”

The “SuperAnimal method” is an evolution of a pose estimation technique that Mackenzie Mathis’ laboratory had already distributed under the name “DeepLabCut™️.” You can read more about this game-changing tool and its origin in this new Nature technology feature.

“Here, we have developed an algorithm capable of compiling a large set of annotations across databases and train the model to learn a harmonized language – we call this pre-training the foundation model,” explains Shaokai Ye, a PhD student researcher and first author of the study. “Then users can simply deploy our base model or fine-tune it on their own data, allowing for further customization if needed.”
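The idea of a “harmonized language” across databases can be illustrated with a minimal sketch. It is not the SuperAnimal implementation: the dataset names, label names, and the `harmonize` helper below are hypothetical, and the sketch only shows how differently named keypoints could be mapped onto one shared vocabulary, with a mask marking the keypoints a given dataset never labeled.

```python
# Hypothetical sketch: merge pose annotations from datasets that name the
# same body parts differently. All names below are illustrative assumptions,
# not identifiers from SuperAnimal or DeepLabCut.

# Shared ("harmonized") keypoint vocabulary, in a fixed order.
CANONICAL_KEYPOINTS = ["nose", "left_ear", "right_ear", "tail_base"]

# Per-dataset mapping from local label names to the shared vocabulary.
LABEL_MAPS = {
    "lab_a": {"snout": "nose", "ear_L": "left_ear", "ear_R": "right_ear"},
    "lab_b": {"nose_tip": "nose", "tailbase": "tail_base"},
}


def harmonize(dataset, annotation):
    """Return (coords, mask) aligned to CANONICAL_KEYPOINTS.

    coords[i] is an (x, y) tuple or None; mask[i] is True only where the
    source dataset actually labeled that keypoint, so that a training loss
    could skip the missing ones.
    """
    mapping = LABEL_MAPS[dataset]
    by_canonical = {
        mapping[name]: xy for name, xy in annotation.items() if name in mapping
    }
    coords = [by_canonical.get(k) for k in CANONICAL_KEYPOINTS]
    mask = [c is not None for c in coords]
    return coords, mask


coords_a, mask_a = harmonize("lab_a", {"snout": (12.0, 30.5), "ear_L": (8.0, 22.0)})
coords_b, mask_b = harmonize("lab_b", {"nose_tip": (40.0, 41.0), "tailbase": (90.0, 60.0)})
print(coords_a, mask_a)
print(coords_b, mask_b)
```

Once every dataset is expressed in the same ordered vocabulary, annotations from different labs can be pooled into one training set even though no single dataset covers all keypoints.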

These advances will make motion analysis much more accessible. “Veterinarians could be particularly interested, as well as those in biomedical research – especially when it comes to observing the behavior of laboratory mice. But it can go further,” says Mackenzie Mathis, mentioning neuroscience and… athletes (canine or otherwise)! Other species – birds, fish, and insects – are also within the scope of the model’s next evolution. “We also will leverage these models in natural language interfaces to build even more accessible and next-generation tools. For example, Shaokai and I, along with our co-authors at EPFL, recently developed AmadeusGPT, published recently at NeurIPS, that allows for querying video data with written or spoken text. Expanding this for complex behavioral analysis will be very exciting.” SuperAnimal is now available to researchers worldwide through its open-source distribution (github.com/DeepLabCut).

References

SuperAnimal pretrained pose estimation models for behavioral analysis, by Ye et al., Nature Communications, 12 June 2024, DOI: 10.1038/s41467-024-48792-2

Author: Emmanuel Barraud

Source: EPFL


You might also be interested in

EPFL researchers create an AI model that thinks like we do

An EPFL team has created a new Large Language Model that is structured similarly to a human brain, allowing users more control and moving away from “black box” AI.

When a standard Large Language Model (LLM) is confronted with a problem, it tries to solve it by matching it to similar information it has seen before, and then gives an answer based on those past patterns. But how it decides which information to use and what value it gives to different pieces of information can be somewhat inscrutable from the outside.

The LLM MiCRo (Mixture of Cognitive Reasoners) is architecturally divided into four specialized areas that act like different parts of the human brain, allowing users to have more control over how it approaches a question, and to better understand how it comes to its answers. The model, which was presented at the International Conference on Learning Representations, comes from the NLP Lab, part of the School of Computer and Communication Sciences (IC), and the NeuroAI Lab, part of IC and the School of Life Sciences at EPFL.

The four experts

To create MiCRo, researchers identified four regions of the brain that specialize in different functions, which they call ‘experts’: language, logic, social reasoning, and world knowledge.

“The brain is organized into specialized regions, each tuned to handle a specific function. So far, we don’t see this division of labor as clearly in current language models,” says Badr AlKhamissi, a PhD candidate leading this research. “We picked four brain regions that neuroscientists know well and gave the model its own specialized modules, each one trained to be analogous to one of those brain regions.”

An LLM usually functions as a stack of layers that a problem or question can be processed through. In the case of MiCRo, each layer is divided into the four different experts. You give a sentence to the model starting at layer one, for example “The cat is asleep”. Then, within this layer, the router can choose one expert for the first word, “the”, but a different expert for the second word, “cat”, and so on, making the model modular and highly adaptable.

“Each word of a sentence can go to different experts,” AlKhamissi explains. “So one sentence can actually be processed by multiple experts at each layer.”
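The per-word routing described here can be sketched in a few lines. This is only an illustration under stated assumptions: the scores are hand-written stand-ins for what a learned router would output, the `route` function is hypothetical, and picking the single highest-scoring expert per word is a simplification rather than MiCRo’s actual code.

```python
# Illustrative routing of words to four experts within one layer.
# The scores are made-up stand-ins for a learned router's outputs.

EXPERTS = ["language", "logic", "social", "world"]


def route(router_scores):
    """Pick, for each token, the expert with the highest router score."""
    assignments = []
    for token, scores in router_scores:
        best = max(range(len(EXPERTS)), key=lambda i: scores[i])
        assignments.append((token, EXPERTS[best]))
    return assignments


# Hypothetical per-token scores, ordered as in EXPERTS.
layer_scores = [
    ("The",    [0.9, 0.1, 0.0, 0.2]),
    ("cat",    [0.3, 0.1, 0.1, 0.8]),
    ("is",     [0.7, 0.2, 0.0, 0.1]),
    ("asleep", [0.4, 0.1, 0.2, 0.6]),
]

assignments = route(layer_scores)
for token, expert in assignments:
    print(f"{token!r} -> {expert}")
```

With these made-up scores, the function words go to the language expert and the content words to the world-knowledge expert, so a single sentence is handled by more than one expert in the same layer.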

Consider a prompt like: “Emma wants to split a CHF 60 dinner bill among three friends, but she knows that Jake lost his job last week and is too proud to say he’s struggling.” A purely mathematical module handles the arithmetic: CHF 60 divided by three is CHF 20 each. But the social reasoning module picks up on something subtler: Emma’s awareness of Jake’s situation, his unspoken pride, and the implicit suggestion that she might quietly cover his share. Both kinds of reasoning are needed to fully understand what’s going on, and in MiCRo, each aspect of the prompt is routed to the expert best equipped to handle it.

“When we see how the model works, we can see that it routes the words that relate to the social aspects to the social expert, and when it does the mathematical part, it routes those numbers to the logic expert.”

This separation makes it easier to see how the model is ‘thinking’ and why it makes certain decisions. It also means decisions can be steered – for example, you can decide to increase the impact of the social expert, or suppress the logic expert, depending on what kind of model you want to use in a certain situation.

“In traditional LLMs, you can do this via prompting by telling the model to make the output more social or make it more related to emotions,” AlKhamissi says. “But here, this is done by intervening in the architecture itself without doing any prompting.”
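One way to picture this kind of intervention is as an adjustment to the router’s scores before an expert is chosen. The sketch below is an assumption about how such steering could look, not the method used in MiCRo: the `steer` helper, the `strength` parameter, and the scores are all hypothetical.

```python
import math

# Hypothetical sketch of steering by editing router scores directly,
# with no change to the prompt. Not MiCRo's actual mechanism.

EXPERTS = ["language", "logic", "social", "world"]


def steer(scores, boost=None, suppress=None, strength=5.0):
    """Return a copy of the scores with one expert boosted and/or one removed."""
    adjusted = list(scores)
    if boost is not None:
        adjusted[EXPERTS.index(boost)] += strength
    if suppress is not None:
        # A score of -inf can never win the selection below.
        adjusted[EXPERTS.index(suppress)] = -math.inf
    return adjusted


def pick_expert(scores):
    """Select the expert with the highest (possibly adjusted) score."""
    return EXPERTS[max(range(len(EXPERTS)), key=lambda i: scores[i])]


# Made-up router scores for one token, ordered as in EXPERTS.
scores = [0.2, 1.5, 1.0, 0.1]

default = pick_expert(scores)
boosted = pick_expert(steer(scores, boost="social"))
suppressed = pick_expert(steer(scores, suppress="logic"))
print(default, boosted, suppressed)
```

In this toy setting the token goes to the logic expert by default, and either boosting the social expert or suppressing the logic expert redirects it to the social expert.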

“A virtuous circle”

To create MiCRo, the EPFL team worked with Greta Tuckute, a neuroscientist from Harvard and MIT, to understand which parts of the human brain are activated by different problems, and then applied that learning to the model.

To identify the region analogous to the ‘logic’ expert in the brain, neuroscientists give humans demanding tasks, such as hard mathematical equations, and less demanding tasks, like easy mathematical equations, and then record their brain activity to find which brain regions are the most active for the demanding tasks versus the non-demanding tasks. AlKhamissi’s team then did the same for the model, giving it demanding mathematical equations to see which experts would be most activated.

“The cool thing is we just used exactly what they do in neuroscience, but in the model. And the model was able to identify those experts on its own.”
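The demanding-versus-easy contrast can be sketched as a simple comparison of how much each expert is used under the two conditions. The numbers and the `localize` helper are invented for illustration, and measuring “activation” as the fraction of tokens routed to an expert is an assumption, not a detail given in the article.

```python
from statistics import mean

# Hypothetical localizer-style contrast: which expert is used more for
# demanding prompts than for easy ones? All values below are made up.

EXPERTS = ["language", "logic", "social", "world"]


def localize(hard_usage, easy_usage):
    """Return the expert with the largest hard-minus-easy usage, plus all contrasts."""
    contrast = {e: mean(hard_usage[e]) - mean(easy_usage[e]) for e in EXPERTS}
    return max(contrast, key=contrast.get), contrast


# Fraction of tokens routed to each expert, one value per prompt (assumed data).
hard = {
    "language": [0.30, 0.25],
    "logic": [0.50, 0.60],
    "social": [0.05, 0.05],
    "world": [0.15, 0.10],
}
easy = {
    "language": [0.45, 0.40],
    "logic": [0.20, 0.30],
    "social": [0.10, 0.05],
    "world": [0.25, 0.25],
}

top_expert, contrast = localize(hard, easy)
print(top_expert, contrast)
```

The expert with the largest positive contrast is the candidate counterpart of the brain region that responds most strongly to demanding tasks.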

While neuroscience informs the model, the model also informs the understanding of the brain, potentially allowing neuroscientists to discover the contributions of different areas for a given problem or question; for example that a certain sentence activates the language areas 20%, the mathematical areas 50%, and the social reasoning areas 40%.

“For my PhD work, I have been interested in this virtuous circle between neuroscience and AI. In one direction, we use findings and insights from neuroscience about the brain and integrate them into language models,” AlKhamissi says, “and now, with models like MiCRo, we can explore the other direction and ask how we can use AI models to help us understand the brain in a better way.”

Author: Stephanie Parker
Source: EPFL

Smarter waste sorting with AI

EPFL startup WasteFlow has developed an AI-powered copilot that identifies and measures waste streams, helping sorting facilities work more efficiently. Support from several EPFL entrepreneurship programs helped the company accelerate the development of its technology.


EPFL launches the world’s first fully open medical LLMs

MeditronFO is the first fully open framework for building medical large language models, to make AI in healthcare more transparent and accountable.
