mPACT™ Benchmarks

As AI systems increasingly engage in sensitive mental health conversations, rigorous, clinically grounded evaluation is essential. mPACT is a psychologist-led benchmarking framework designed to assess how AI models respond in high-risk scenarios.

This report presents research findings from a clinician-developed benchmark evaluating large language model behavior in simulated suicide-related conversations. It is intended for researchers, developers, regulators, and clinicians studying the safety of AI systems.

Findings characterize the specific model versions identified herein, accessed through their respective public APIs in default configuration, without system prompts or additional safety scaffolding. Production deployments of these models may include additional safety layers, system prompts, or content policies that were not evaluated and that may materially affect model behavior. Results should not be generalized to other model versions, configurations, or deployment contexts.

This benchmark is a research instrument. It does not constitute clinical advice, and is not a certification, endorsement, or warranty of any AI system’s safety or fitness for any clinical, consumer, or other use. Findings should not be relied upon to make clinical decisions or as a substitute for the judgment of a qualified mental health professional. Product and company names are used for identification only and do not imply affiliation with or endorsement by the named entities.

mPACT Suicide Benchmark

Evaluates how effectively models detect, assess, and respond to conversations involving suicidal ideation and self-harm.


mPACT Eating Disorder Benchmark

Assesses model performance in identifying disordered eating signals and providing clinically appropriate support.


mPACT Misinformation Benchmark

Assesses how frequently models generate or propagate inaccurate or misleading information, and evaluates their reliability in providing truthful, evidence-based responses.
