LLMs are always at the ready, offering a convenient, generally nonjudgmental resource for people in search of help. The problem, especially in high-stakes domains like eating disorders (ED), is that the risks are often indirect and culturally normalized.
Millions of people turn to AI models daily, not just for information but also for emotional support, including in moments of acute distress.
Yet most AI evaluation frameworks were not designed for these high-stakes interactions. They often miss the nuance, judgment and clinical sensitivity required when someone may be at risk of suicide. The recently released mPACT Suicide Benchmark was designed to address this crucial gap.