Plenary Session II
HOW WE TALK AND TEACH IN THE OPERATING ROOM: USING LIVE OPERATIVE RECORDINGS TO EVALUATE RESIDENT AUTONOMY
Katharine E Caldwell, MD, MSCI, Blake T Beneville, MD, Jenna Bennett, Mohamed A Jama, Cory Fox, Mike Ferzoco, Lauren Lewis, Jonathan Tong, Michael A Awad, MD, PhD, MHPE; Washington University in Saint Louis
Introduction
Autonomy and feedback are essential to surgical training, yet programs must rely on brief end-of-case ratings rather than direct analysis of what faculty and trainees actually say in the operating room. Trainees report wanting more specific, real-time feedback, and faculty lack objective measures of how much guidance they provide. We asked whether patterns of live operative dialogue map to resident autonomy on the Zwisch scale and could inform entrustability and coaching.
Methods
We audio-recorded attending-trainee interactions during 25 general surgery operations (cholecystectomy, hernia repair, colectomy; open, laparoscopic, robotic). Recordings were de-identified, transcribed, and corrected. A codebook labeled five categories: technical instruction, instrument requests, shared mental modeling (verbalizing anatomy/next step/strategy), feedback (positive or constructive), and off-target talk (non-procedural conversation). After each case, attendings rated resident autonomy on the Zwisch scale (Show and Tell, Active Help, Passive Help, Supervision Only). We compared lower-autonomy cases (Show and Tell/Active Help) with higher-autonomy cases (Passive Help/Supervision Only). Coding was supported by automated tools and verified by investigators.
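The per-category comparison described above can be illustrated with a minimal sketch. It is not the authors' pipeline: it assumes each coded utterance is a hypothetical (speaker, category) pair and that a speaker's share is counted by utterance. The abstract does not state whether shares were computed by utterance, word count, or speaking time, and the category labels below are illustrative names only.

```python
from collections import Counter

def resident_share(utterances):
    """Return the resident's fraction of utterances in each coded category.

    `utterances` is an iterable of (speaker, category) pairs, where speaker
    is "resident" or "attending". This input shape is an assumption made
    for illustration.
    """
    totals = Counter()
    resident = Counter()
    for speaker, category in utterances:
        totals[category] += 1
        if speaker == "resident":
            resident[category] += 1
    # Only categories that occur at least once appear in the result.
    return {category: resident[category] / totals[category] for category in totals}


# Hypothetical coded transcript for a single case.
case = [
    ("attending", "technical_instruction"),
    ("resident", "instrument_request"),
    ("attending", "shared_mental_modeling"),
    ("resident", "shared_mental_modeling"),
    ("attending", "feedback"),
]
shares = resident_share(case)
```

Computed per case, shares like these could then be compared between the lower-autonomy and higher-autonomy groups.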
Results
More autonomous learners contributed more intraoperative speech (33.4% v 13%, p=0.03), approached attending levels in shared mental modeling (47.4% v 16.9%, p=0.01), initiated more instrument requests (49.3% v 16.4%, p=0.03), and required fewer explicit technical instruction events per case (p=0.03). Total feedback statements per case were similar (p=0.14), but attendings in higher-autonomy cases delivered more positive feedback (70.0% v 35.9%, p=0.05). Dyads in higher-autonomy cases also showed more off-target talk (45.1% v 11.5%, p=0.04), consistent with a decreased need for continuous coaching. After training, the AI model identified resident autonomy levels by Zwisch rating with 90.9% accuracy relative to attending ratings.
Conclusions
Distinct, quantifiable speech signatures mark higher Zwisch autonomy: less step-by-step technical direction, more resident-initiated control, more forward-looking shared mental modeling, and proportionally more positive feedback. Capturing OR dialogue provides a scalable, behavior-based supplement to Zwisch and EPA ratings, supports targeted faculty coaching, and enables structured post-case resident feedback, linking what is said in the OR to decisions about entrustment, graduated responsibility, and practice readiness.
ARTIFICIAL INTELLIGENCE-ENABLED EVALUATION OF LAPAROSCOPIC PEG TRANSFER PERFORMANCE
Terrance Peng, MD, MPH, Armin Alipour, MS, Derek Chen, Grace Huang, Raul J Rosenthal, MD, Yijun Chen, MD, Peyman Benharash, MD; UCLA
INTRODUCTION: Successful completion of the Fundamentals of Laparoscopic Surgery (FLS) is a prerequisite for the American Board of Surgery Qualifying Exam and remains a cornerstone of general surgery residency training in the United States. Current evaluation for FLS relies heavily on faculty observation, which can limit the frequency and objectivity of feedback. We aimed to develop and validate a computer vision-based AI model to autonomously evaluate performance and score the laparoscopic peg transfer task.
METHODS: General surgery residents and medical students at an academic medical center were recorded performing the FLS peg transfer task. Using a rubric accounting for instrument handling, economy of motion, and efficiency, two adjudicators independently scored each video into one of three skill levels: beginner, intermediate, or expert. A computer vision pipeline was constructed using the You Only Look Once (YOLO) algorithm for object detection and ByteTrack for instrument tracking. Raw trajectory coordinates were processed through feature engineering to generate motion-based metrics reflecting technical precision and fluidity, including total path length, jitter, dwell time, and efficiency. These features were used to train machine learning classifiers for automated skill classification.
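The feature-engineering step can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract names the metrics but does not define them, so the definitions here (jitter as the root-mean-square second difference of position, dwell time as time spent below a speed threshold, efficiency as net displacement divided by path length) are assumed, as are the function name `motion_metrics` and the parameters `fps` and `still_speed`.

```python
import math

def motion_metrics(points, fps=30.0, still_speed=1.0):
    """Compute simple motion features from one tracked instrument trajectory.

    `points` is a list of (x, y) positions, one per video frame (pixels).
    All metric definitions are assumptions made for illustration.
    """
    if len(points) < 3:
        raise ValueError("need at least three tracked points")

    # Per-frame step lengths; their sum is the total path length.
    steps = [math.dist(a, b) for a, b in zip(points, points[1:])]
    path_length = sum(steps)

    # Efficiency (assumed): straight-line displacement over distance travelled.
    displacement = math.dist(points[0], points[-1])
    efficiency = displacement / path_length if path_length > 0 else 0.0

    # Jitter (assumed): RMS magnitude of the second difference of position.
    second_diffs = [
        math.hypot(c[0] - 2 * b[0] + a[0], c[1] - 2 * b[1] + a[1])
        for a, b, c in zip(points, points[1:], points[2:])
    ]
    jitter = math.sqrt(sum(d * d for d in second_diffs) / len(second_diffs))

    # Dwell time (assumed): seconds spent moving slower than `still_speed`
    # pixels per frame.
    dwell_time = sum(1 for s in steps if s < still_speed) / fps

    return {
        "path_length": path_length,
        "efficiency": efficiency,
        "jitter": jitter,
        "dwell_time": dwell_time,
    }


# Hypothetical trajectory: move, pause for one frame, move again.
metrics = motion_metrics([(0, 0), (3, 4), (3, 4), (6, 8)], fps=30.0)
```

A feature dictionary of this kind, computed per video, is the sort of input a skill classifier could be trained on.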
RESULTS: A total of 157 videos of the laparoscopic peg transfer task were recorded, of which 110 were used to develop the model. The computer vision pipeline achieved 85-95% detection confidence and 87-95% tracking uptime in high-quality recordings. All classifiers demonstrated strong discriminative performance, accurately distinguishing skill levels. Random Forest achieved the highest area under the curve (AUC = 0.951), followed by Support Vector Machines (0.936), Gradient Boosting (0.926), and Logistic Regression (0.912), for an overall mean AUC of 0.930 (Figure 1). The superior performance of ensemble methods suggested that non-linear feature interactions were critical for accurate skill classification.
CONCLUSION: Our AI-driven computer vision model accurately tracked instrument motion and reliably distinguished skill levels during the FLS peg transfer task. This model may offer a scalable alternative to traditional expert-based evaluation, with the potential to provide instant feedback without additional faculty burden, thereby promoting more efficient and equitable technical skill development across training programs.

FACULTY PERSPECTIVES ON PERSONAL AND INSTITUTIONAL BARRIERS TO EDUCATION:
“I APPARENTLY STRESS PEOPLE OUT IN A VERY UNINTENTIONAL WAY”
Jonathan D'Angelo, PhD, MAEd¹, Oviya Giri, MBBS¹, Aashna Mehta, MD¹, Mohamed Baloul, MD¹, Mariela Rivera, MD¹, Rebecca Busch, MD², Anne-Lise D'Angelo, MD, MSEd¹; ¹Mayo Clinic - Rochester, ²University of Wisconsin
Introduction
Significant research has focused on how surgical faculty can enhance their teaching skills, but less on identifying barriers to achieving quality instruction. In fact, no research to our knowledge has examined the degree to which faculty are aware of how their personal behaviors may cause trainees to struggle. The aim of this research was to identify personal and institutional barriers to surgical education.
Methods
A survey was distributed to surgical faculty at three institutions focusing on the learning environment. This analysis considered a series of questions on personal behaviors or institutional barriers that may impede the learning environment (open-ended questions) and demographics. A thematic analysis was conducted on the qualitative responses.
Results
Fifty-six surgeons responded to the survey (52% female; M=9.48, SD=8.07 years in practice).
Fifty surgeons (89%) identified at least one institutional barrier to education (M=1.53±1.33). The most frequently cited barriers were time constraints and workload (63%), followed by trainee continuity and preparedness (16%), institutional culture/pressure (12%), case complexity/acuity (8%), and faculty role strain (2%).
Thirty-two surgeons (57%) identified at least one way in which they may intentionally or unintentionally impede resident learning (M=0.78±0.61). The most frequently cited item was communication failure (38%) (“I do use shame sometimes to get trainees to understand what I expect and that does not always read well”). This was followed by the need to be efficient (22%) (“I focus on efficiency and timeliness, sometimes limiting their experience with uncertainty”), personal characteristics (22%) (“I am relatively quiet and introverted”), one’s own training level (16%) (“I am junior and still like to have a lot of control”), being in a state of exhaustion or high stress (13%) (“By the end of the day I am tired, so may be less engaged”), and setting resident expectations too high (13%) (“High expectations”).
Notably, 46% of surgeons identified resident behaviors that impede learning even though this topic was not prompted.
Conclusion
This research adds to the limited body of work examining institutional barriers to surgical education while newly identifying surgeon self-identified behaviors that may impede resident learning. Future research should consider interventions and systematic changes to reduce these barriers to enhance resident education.
