Know Whether It Is Safe
Evaluation beyond benchmark accuracy, hallucination and failure characterization, and robustness under temporal and institutional shift.
COLM 2026 · October 9 · San Francisco
A full-day workshop on what it actually takes to responsibly deploy language and vision-language models in clinical settings.
Contact: colm.daih2026@gmail.com
Benchmark performance is a poor proxy for real-world readiness. DAIH brings together machine learning researchers, physicians, health-system leaders, and policy researchers to tackle safety, equity, privacy, regulation, workflow fit, and post-deployment monitoring.
We invite non-archival submissions on the responsible deployment of large language and vision-language models in clinical settings.
All deadlines are in Anywhere on Earth (AoE) time.
We welcome novel work relating to the responsible deployment of LLMs and VLMs in healthcare, including (but not limited to):
We solicit non-archival submissions in two categories.
Up to 8 pages excluding references. Technical work on evaluation, safety, robustness, multimodal modeling, fairness, privacy, governance, monitoring, or workflow integration. Accepted papers will be presented as spotlight talks or posters.
Up to 4 pages excluding references. Practical accounts from clinical, operational, or industry deployments, including work in progress, negative results, and lessons learned. Accepted case reports will be presented as posters.
A full-day program balancing invited talks, contributed spotlights, poster sessions, a structured roundtable lunch, and a deployment-focused panel.








