COLM 2026 · October 9 · San Francisco

DAIH: LLM/VLM Deployment Opportunities and Risks in Healthcare

A full-day workshop on what it actually takes to responsibly deploy language and vision-language models in clinical settings.

Contact: colm.daih2026@gmail.com

Why DAIH

Clinical AI Needs a Deployment Science

Benchmark performance is a poor proxy for real-world readiness. DAIH brings together machine learning researchers, physicians, health-system leaders, and policy researchers to tackle safety, equity, privacy, regulation, workflow fit, and post-deployment monitoring.

01

Know Whether It Is Safe

Evaluation beyond benchmark accuracy, hallucination and failure characterization, and robustness under temporal and institutional shift.

02

Know Whether It Is Equitable

Fairness across patient populations, privacy-preserving deployment, legal soundness, and emerging regulatory constraints.

03

Know Whether It Works

Multimodal clinical applications, human-AI collaboration, operational lessons, and the practical reality of production deployment.

DAIH @ COLM 2026

Call for Papers

We invite non-archival submissions on the responsible deployment of large language and vision-language models in clinical settings.

Submissions open
Submission deadline
Reviews due
Notification

All deadlines are Anywhere on Earth (AoE).

Submit on OpenReview

Topics

We welcome novel work relating to the responsible deployment of LLMs and VLMs in healthcare, including (but not limited to):

Real-world deployment and clinical integration

  • Multimodal clinical reasoning and decision support
  • Human–AI collaboration in clinical workflows
  • Operational integration and workflow design
  • Clinical documentation, summarization, and chart review
  • Patient-facing communication and health literacy
  • Negative results and failure case analyses from real-world deployments
  • Lessons learned from industry and health system implementations

Safety and reliability

  • Clinical evaluation beyond benchmark accuracy
  • Hallucination detection, characterization, and mitigation in medical contexts
  • Robustness under temporal, institutional, and demographic distribution shift
  • Failure-mode characterization and safety monitoring
  • Post-deployment surveillance and error analysis

Equity, privacy, and governance

  • Fairness and bias auditing across patient populations
  • Privacy-preserving methods for clinical LLM/VLM deployment
  • Regulatory compliance and emerging governance frameworks
  • Accountability and liability in clinical AI systems

Submission Tracks

We solicit non-archival submissions in two categories.

Research Papers

Up to 8 pages excluding references. Technical work on evaluation, safety, robustness, multimodal modeling, fairness, privacy, governance, monitoring, or workflow integration. Accepted papers will be presented as spotlight talks or posters.

Deployment Case Reports

Up to 4 pages excluding references. Practical accounts from clinical, operational, or industry deployments — including work-in-progress, negative results, and lessons learned. Accepted case reports will be presented as posters.

Submission Guidelines

Platform: OpenReview

Format: Papers must use the COLM 2026 LaTeX template.

Review process: Double-blind. Authors should anonymize their submissions accordingly; papers that are not properly anonymized will be desk rejected. Related arXiv papers by the same authors do not break anonymity; if cited, they should be cited in third person.

Conflict of interest: Conflicts will be managed per COLM policy. Conflicted organizers will be recused from assignment, discussion, and decisions for affected papers.

Dual-submission policy: We accept submissions of ongoing unpublished work as well as work currently in submission elsewhere. We also welcome substantial extensions of works previously presented at non-archival venues. However, we do not accept work that has already been published in a journal or included in conference proceedings (including the COLM main conference).

LLM usage policy: Authors must follow COLM 2026 policies on Large Language Model usage.

Program

Workshop Schedule

A full-day program balancing invited talks, contributed spotlights, poster sessions, a structured roundtable lunch, and a deployment-focused panel.

Morning

Opening Remarks: Organizers
Keynote: Sanmi Koyejo
Invited Talk: Danielle Bitterman
Contributed Spotlights and Poster Session I: Selected research and deployment reports
Invited Talk: Majid Afshar

Midday and Afternoon

Roundtable Lunch: Stakeholder needs, deployment risks, industry constraints, and research needs
Invited Talk: Emma Pierson
Contributed Spotlights and Poster Session II: Selected research and deployment reports
Panel Discussion: Wei Liu, Corinna Loeckenhoff, Chufan Gao, Tymor Hamamsy, Martin Seneviratne, and Tristan Naumann
Closing Remarks: Organizers

Organizing Team

Organizers

Yuexing Hao

MIT · Microsoft

Shan Chen

Phare Health · R37 Lab

Stella Li

University of Washington

Tom Hartvigsen

University of Virginia

Jason Fries

Stanford Medicine

Jack Gallifant

Phare Health · R37 Lab

Angelina Wang

Cornell University

Yulia Tsvetkov

University of Washington
