On October 23, 2025, the Whiting School of Engineering’s Department of Computer Science hosted a talk by Aaron Roth, a professor of computer and cognitive science at the University of Pennsylvania. His presentation, titled “Agreement and Alignment for Human-AI Collaboration,” addressed the challenges of integrating artificial intelligence (AI) into decision-making across various fields, particularly healthcare.
Understanding Human-AI Interaction
Roth’s discussion revolved around findings from three important papers: “Tractable Agreement Protocols,” presented at the 2025 ACM Symposium on Theory of Computing; “Collaborative Prediction: Tractable Information Aggregation via Agreement,” from the ACM-SIAM Symposium on Discrete Algorithms; and “Emergent Alignment from Competition.” Each of these works explores how AI can assist humans in making critical decisions, using the healthcare sector as a primary example.
In a clinical setting, Roth illustrated how AI could support doctors in diagnosing patients. The AI generates predictions from patient data such as previous diagnoses, blood type, and presenting symptoms. These predictions are then reviewed by the physician, who can either concur or disagree based on their clinical knowledge and observations. If a disagreement arises, Roth explained that the doctor and the AI can enter a dialogue, sharing perspectives iteratively until they reach a consensus.
This process relies on the concept of a “common prior,” where both the doctor and the AI operate under shared initial assumptions about the situation, even if they possess different information. Roth termed this dynamic “Perfect Bayesian Rationality,” wherein each party acknowledges the other’s knowledge limitations. Despite its potential, Roth noted the challenges inherent in establishing a common prior, especially in complex scenarios such as hospital diagnostic codes.
Calibration and Agreement in Decision-Making
Roth introduced the concept of calibration, essential for achieving accurate agreements between humans and AI. He likened it to testing a weather forecaster’s accuracy. “You can design tests that would show whether forecasts reflect true probabilities,” Roth stated. “Calibration enables us to evaluate these forecasts effectively.”
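Roth’s weather-forecaster analogy can be made concrete: a forecaster is calibrated if, among all the days it predicts (say) a 30% chance of rain, it actually rains about 30% of the time. The sketch below, using synthetic data (the function and variable names are illustrative, not from the talk), groups forecasts into probability bins and compares each bin’s average forecast with the observed frequency of the event.

```python
# Sketch: testing whether a forecaster is calibrated, in the spirit of
# Roth's weather-forecaster analogy. All data here are synthetic.
import random

def calibration_table(forecasts, outcomes, n_bins=10):
    """Group forecasts into probability bins and compare each bin's
    average forecast with the observed frequency of the event."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(forecasts, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((p, y))
    table = []
    for b in bins:
        if b:
            avg_p = sum(p for p, _ in b) / len(b)   # mean forecast in bin
            freq = sum(y for _, y in b) / len(b)     # empirical frequency
            table.append((avg_p, freq, len(b)))
    return table

random.seed(0)
# A calibrated forecaster: the event truly occurs with the forecast probability.
forecasts = [random.random() for _ in range(10_000)]
outcomes = [1 if random.random() < p else 0 for p in forecasts]
for avg_p, freq, n in calibration_table(forecasts, outcomes):
    print(f"forecast ~ {avg_p:.2f}, observed frequency = {freq:.2f} (n={n})")
```

For a calibrated forecaster the two columns track each other closely; a systematic gap in any bin is evidence the forecasts do not reflect true probabilities.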
In the context of doctor-AI interactions, Roth explained that conversation calibration allows each party to adjust its claims based on the other’s inputs. For instance, if an AI assesses a treatment risk at 40% and the doctor estimates it at 35%, the AI will modify its subsequent claims to fall between these two figures. This iterative process continues until a mutual agreement is achieved, ultimately making the negotiation quicker and more efficient.
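The back-and-forth Roth described can be sketched as a simple iterative protocol in which each party revises its estimate toward the other’s until the gap falls below a tolerance. This is a deliberate simplification: the papers’ actual protocols use calibrated Bayesian updates, not plain averaging, but the sketch shows why the gap shrinks quickly.

```python
# Minimal sketch of the iterative agreement dynamic: each round, both
# parties move their estimates toward the shared midpoint. This
# simplifies Roth's protocol, which relies on calibrated updates
# rather than naive averaging.

def agree(ai_estimate, doctor_estimate, tolerance=0.001, max_rounds=100):
    rounds = 0
    while abs(ai_estimate - doctor_estimate) > tolerance and rounds < max_rounds:
        midpoint = (ai_estimate + doctor_estimate) / 2
        # Each party revises partway toward the shared midpoint,
        # halving the disagreement every round.
        ai_estimate = (ai_estimate + midpoint) / 2
        doctor_estimate = (doctor_estimate + midpoint) / 2
        rounds += 1
    return (ai_estimate + doctor_estimate) / 2, rounds

# The 40% vs. 35% disagreement from the example above:
consensus, rounds = agree(0.40, 0.35)
print(f"consensus risk ~ {consensus:.3f} after {rounds} rounds")
```

Because each round halves the gap, the parties converge geometrically: here the 5-point disagreement shrinks below the tolerance in just six rounds, settling between the two original figures.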
However, Roth highlighted that this ideal scenario presumes both parties share the same objectives. In cases where an AI is developed by a pharmaceutical company, underlying motivations could skew the treatment recommendations. To mitigate this risk, Roth suggested that doctors consult multiple large language models (LLMs) from different companies. By doing so, physicians can compare recommendations and select the best course of action. This competitive dynamic among LLM providers may drive improvements in model accuracy and reduce biases.
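One way to see why consulting several providers helps: a robust aggregate such as the median cannot be moved past the other providers’ estimates by any single skewed model. The sketch below uses hypothetical risk numbers and model names (not from the talk) to illustrate the idea.

```python
# Sketch: hedging against a single biased model by consulting several
# independently developed models and taking the median recommendation.
# The model names and risk estimates are hypothetical.
import statistics

def aggregate_risk(estimates):
    """Median aggregation: one provider skewing its estimate up or
    down cannot drag the result past the others' estimates."""
    return statistics.median(estimates)

independent = {"model_a": 0.38, "model_b": 0.41, "model_c": 0.40}
print(aggregate_risk(independent.values()))  # -> 0.4

# Even if one provider inflates its recommendation, the median holds.
biased = {"model_a": 0.38, "model_b": 0.41, "model_c": 0.90}
print(aggregate_risk(biased.values()))  # -> 0.41
```

This mirrors the competitive dynamic Roth described: as long as most providers report honestly, an outlier driven by commercial motives has limited influence on the final recommendation.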
Roth concluded his talk by addressing the concept of “real probabilities,” which represent the underlying truths of how the world operates. While these probabilities provide the most accurate insights, he emphasized that achieving such precision is often unnecessary. Instead, unbiased estimates under specific conditions can suffice. This pragmatic approach enables doctors and AI to collaborate effectively, leading to more accurate diagnoses and treatment recommendations.
The insights shared by Aaron Roth at the Whiting School of Engineering underscore the potential of AI to enhance human decision-making, particularly in critical fields like healthcare. As researchers continue to explore these relationships, the future holds promise for more effective human-AI collaboration.
