The Association for Surgical Education

Impacting Surgical Education Globally

ASE 2024 Abstracts

 

Podium Session I C - Assessment

Wednesday, April 24, 2024  |  7:00 AM - 8:30 AM  |  Room: Plaza F

 

(S026) Characterizing Written Feedback and Predictors of Feedback Content on an Entrustable Professional Activity (EPA) Assessment Tool
Alyssa D Murillo1, Aileen Gozali1, Riley Brian1, Alexandra Highet1, Olle ten Cate2, Adnan A Alseidi1, Patricia O'Sullivan1, Lan Vu1; 1University of California San Francisco, 2UMC Utrecht

Background: Our general surgery residency piloted a new intraoperative Entrustable Professional Activities (EPA) assessment tool in 2022, in anticipation of the American Board of Surgery (ABS) rollout of EPAs and to improve residents’ satisfaction with faculty feedback. There is an opportunity for narrative feedback to be included in EPA assessment tools; however, limited data exist defining types and predictors of narrative feedback on EPAs. We explored the quality of written feedback on our EPA assessment tool by characterizing feedback types and their associations with entrustment, case-specific variables, and faculty/resident characteristics.

Methods: The assessment tool requires faculty to assign an entrustment score and four sub-scores (knowledge of anatomy, steps of operation, recognition of potential errors, and surgical technique) and to provide narrative feedback. Given strong intercorrelations (r=0.45-0.69) and high reliability (α=0.84) among sub-scores, the four sub-scores were summed into a composite score. We dichotomously coded narrative feedback for valence (reinforcing vs constructive), specificity (specific vs general), appreciation (recognizing or rewarding trainee), coaching (offering a better way to do something), and evaluation (assessing resident against set of standards). Inter-rater reliability was 91% (κ=0.84). Multivariate logistic regression assessed associations between feedback characteristics and entrustment score, composite score, PGY level (1-5), case difficulty (1-3), resident/faculty gender (0=male), gender matching (1=alignment), faculty years in practice, and resident’s under-represented in medicine (URiM) status (1=URiM).
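The inter-rater figures reported above (a percentage and a kappa) can be illustrated with a short sketch. It assumes the percentage is raw agreement between two coders and that kappa is Cohen's kappa; the abstract does not say which variant was used. The function name `cohen_kappa` and the example codes are hypothetical, not the study's data.

```python
def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters assigning categorical codes to the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement: share of items coded identically by both raters.
    p_observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: product of the raters' marginal rates, summed over codes.
    labels = set(rater_a) | set(rater_b)
    p_chance = sum(
        (rater_a.count(label) / n) * (rater_b.count(label) / n) for label in labels
    )
    if p_chance == 1.0:
        return 1.0  # both raters used one identical code throughout
    return (p_observed - p_chance) / (1 - p_chance)


# Hypothetical codes (1 = comment contains coaching, 0 = it does not).
coder_1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
coder_2 = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]

agreement = sum(a == b for a, b in zip(coder_1, coder_2)) / len(coder_1)
kappa = cohen_kappa(coder_1, coder_2)
print(f"agreement={agreement:.2f}, kappa={kappa:.2f}")
```

Kappa is lower than raw agreement because it discounts the agreement two coders would reach by chance given how often each uses every code.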

Results: 398 intraoperative EPA assessments were completed on 44 residents (PGY1-PGY5), by 46 faculty, on 10 surgical services. Written feedback had high valence (83%) and high specificity (84%). Comments frequently contained appreciation (91%); coaching (59%) and evaluation (35%) were less common. Key findings include that female faculty were associated with coaching, but not evaluative feedback. Generally, entrustment level, composite score, and PGY level correlated with different feedback types. Case difficulty, faculty/resident gender, and gender matching were also significant for different types of feedback. URiM and faculty years in practice were not statistically significant (Table 1).

Conclusion: Entrustment and performance can be related to the type of feedback received. Gender and gender match resulted in different types of feedback. Evaluative feedback was the least prevalent and warrants further exploration since evaluation is critical for learning.


(S027) Patient perspectives on optimal surgical resident care: a qualitative analysis of general surgery inpatient experiences
Arian Mansur, BA, Rebecca Tang, MD, Emil Petrusa, PhD, John T Mullen, MD, Roy Phitayakorn, MD, MHPE, Sophia K McKinley, MD, EdM; Massachusetts General Hospital

Background: Surgical trainees are often the primary caregivers to patients in the post-operative setting, yet patient feedback is not routinely incorporated into surgical resident assessment despite the increasing use of patient satisfaction metrics in the evaluation of practicing surgeons. The aim of this exploratory study is to identify the behaviors and characteristics associated with optimal surgical resident care from the perspective of general surgery inpatients to inform the future development of patient-derived assessments of surgical housestaff.

Methods: English-speaking general surgery inpatients recovering from elective general and oncologic surgery were purposively recruited to undergo semi-structured interviews regarding their experience and satisfaction with surgical resident care. Patients were recruited on post-operative day 2 or later to ensure multiple interactions with the surgical resident team, which consists of a senior resident and an intern. Interview transcripts were inductively analyzed by a multidisciplinary team to identify prominent themes regarding the actions and qualities that patients perceive to be most desirable and important in their surgical resident care providers.

Results: Eighteen patients participated (56% men, mean age = 58.6 years). We identified five domains that patients indicated as essential and valued in surgical resident care: (1) listening to and addressing patient concerns through shared decision-making, (2) establishing a bi-directional, personal connection between patient and resident, (3) portraying competence and confidence, (4) communicating clearly and consistently as an individual and in concert with other providers, and (5) putting the patient at ease with a personable bedside manner. A resident’s demonstration of these five domains was interpreted by patients as evidence of receiving the best possible resident care.

Conclusions: Patients report a set of characteristics and behaviors that they associate with optimal surgical resident care. These results can guide the development of patient-based evaluations of residents such that they reflect the values and perspectives of patients themselves. Future work should aim to understand how to best incorporate patient evaluation into trainee assessment to support development and prepare trainees for independent practice. 


(S028) Longitudinal Comparison of Multi-Institutional Standardized Mock Oral Examination Performance is a Novel Method to Identify Curriculum Improvement Opportunities in General Surgery Training Programs
Jerome Andres, Justin Wagner, MD, James Wu, MD, Tara Russell, MD, PhD, Catherine Lewis, MD, MEd, Timothy Donahue, MD, Areti Tillou, MD, Formosa Chen, MD, MPH; David Geffen School of Medicine at UCLA, Department of Surgery

Objective: Mock Oral Examinations (MOE) are widely utilized to prepare trainees for board certification. We propose that longitudinal comparison of multi-institutional MOE performance by program and clinical topic can demonstrate recurrent deficits to help identify opportunities for curricular improvement.

Methods: Between 2021 and 2023, multi-institutional MOEs were conducted annually. Examinees were assessed in three 30-minute rooms, each with two examiners, using 12 standardized cases. Overall and topical performance by institution, based on the mean percentage of questions answered correctly, were compared using one-sided t-tests (p<0.05), and topics in which programs had patterns of lower performance were highlighted.

Results: During the consecutive MOEs, up to seven residency programs participated, and 43 to 49 senior residents (R4 & R5) were examined each year. Fifteen clinical topics were covered, of which five recurred in all exams. Three institutions had topics where they consistently trended below the group mean, with at least one year being statistically significant.

Conclusions: This MOE program is characterized by longitudinal data collection, multi-institutional participation and standardized case content. These features allowed us to conduct an in-depth comparison of MOE performance over time and identify clinical topics with patterns of lower performance in participating programs. These patterns may point to curricular gaps that can be targeted for improvement.


(S029) Complex Laparoscopy in General Surgery Training: Keeping Up with the Times
Susan You, BA1, Michael Kell, BA1, Elizabeth Dauer, MD2; 1Temple University Lewis Katz School of Medicine, 2Department of Surgery, Temple University Lewis Katz School of Medicine

Introduction

The Accreditation Council for Graduate Medical Education (ACGME) currently requires graduating chief residents to have participated in 75 complex laparoscopic cases; however, it does not define subcategories. We aim to compare trends in the use of minimally invasive approaches for common general surgery procedures nationally and in training to determine if subcategories in training are necessary to mirror national practice patterns.

Methods

We analyzed ACGME national case log data from 2015-2021 to determine trends in the proportion of common procedures being performed with a minimally invasive approach during general surgery training. Minimally invasive procedures were included if the average number of cases completed during training was 10 or greater. We then compared these trends to national trends, using the NSQIP and MBSAQIP databases as representative samples of national practice, to determine if the trends differed between the two groups.

Results

Six procedures were identified for analysis. We found statistically significant differences in the trends for gastric resection for morbid obesity (ANCOVA: F=5.8, df=1, p=0.04), hernia repair (ANCOVA: F=122.1, df=1, p<0.001), and enterectomy (ANCOVA: F=37.0, df=1, p<0.001). Colectomy procedures trended toward, but did not reach, statistical significance (F=4.5, df=1, p=0.06). There was no statistically significant difference for antireflux (ANCOVA: F=0.4, df=1, p=0.53) and partial gastrectomy procedures (ANCOVA: F=0.3, df=1, p=0.60).

Conclusion

The proportion of cases performed with a minimally invasive approach continues to rise every year. Consideration should be given to adapting general surgery training requirements to match this growing practice, particularly for procedures that do not appear to be keeping pace with national trends. 


(S030) Disparities in Resident Feedback for General Surgery Residents Underrepresented in Medicine
Omowunmi E Oluwo, MD, Clementine F Young, MD, Adam Nelson, MD, Iman Ghaderi, MD, Michael Ditillo, DO, Stephanie Worrell, MD; University of Arizona - Tucson

Introduction

Studies in medical education have shown inequalities in the assessments of trainees and faculty advancement for those that are underrepresented in medicine (URiM). URiM groups include racial and ethnic populations that are underrepresented in the medical profession relative to the general population.

With limited research on competency-based assessments relating to a resident’s URiM status, there is a need for studies to evaluate these assessments as they become central to resident advancement. This study aims to evaluate the nature of feedback provided to residents who are URiM.

Methods

Written feedback in a general surgery program was collected from 8/2022-6/2023. The feedback was collected from residents' individual, private assessment portals. Residents requested feedback from attendings following cases or interactions through their individual online portal. These assessments were then de-identified and reviewed by two independent reviewers (CFY and OEO). The reviewers categorized the narrative feedback by whether it was personality-based or clinically based, whether it was actionable, and by theme (positive, negative or mixed). The themes were further divided into six categories (preparedness, efficiency, independence, teaching/direction to assistants, technical skills and professionalism). The feedback was compared between URiM and non-URiM residents.

Results

Of the assessments submitted, 38% (141/376) were for URiM residents and 62% (235/376) were for non-URiM residents. URiM residents received similar amounts of mixed, positive and negative feedback to non-URiM residents (p=0.568). The themes of feedback for URiM residents were significantly different from those for non-URiM residents, with URiM residents receiving significantly less feedback regarding independence (14% vs 26%, p=0.003). URiM residents received significantly more feedback on professionalism (24% vs 14%, p=0.013).

Conclusions

As surgical training shifts towards a competency-based advancement program, program directors and clinical competency committees must be aware of the inherent biases already at play for URiM residents. These residents receive similar amounts of positive and negative feedback, but with a significant lack in feedback related to leadership, independence and professionalism. These areas are critical to obtain fellowships, jobs and faculty advancement. Addressing the shortcomings in assessments or lack of mentorship amongst these residents in these key areas could be one step in achieving parity in URiM faculty retention and the diverse workplaces that healthcare needs.


(S031) Comparing three scoring approaches to evaluate feedback quality across general surgery training
Rebecca Moreci, MD, MS1, Kayla M Marcotte, MS1, Alyssa Pradarelli, MD1, Gurjit Sandhu, PhD1, Julie Evans1, Tanvi Gupta, MEng1, Andrew E Krumm, PhD1, Chia Chye Yee, PhD1, Brian C George, MD, MAEd1, Stefanie S Sebok-Syer2; 1University of Michigan, 2Stanford University

Introduction

Trainees require high-quality feedback in order to progress through surgical training. Despite this need, the quality of feedback received is variable [1–5]. Machine learning and natural language processing (NLP) models have been used to automate the assessment of feedback quality [6,7]. However, it is unknown whether these different models produce similar evaluations of quality. Therefore, we compared three different approaches to evaluate feedback quality across surgical training.

Methods

Dictated feedback comments from the Society for Improving Medical and Professional Learning (SIMPL) were collected from general surgery residents in the United States between 2015 and 2023. To control for task-related variation and ensure adequate data across all training years, only comments from open inguinal hernia procedures were included. Dictated feedback comments were evaluated using three different approaches: (1) by human raters (calibrated by SSS, RM and coded by RM) using the previously validated QuAL Score rubric; (2) by a machine learning model trained using the QuAL Score rubric (see commentquality.com); and (3) by a natural language processing (NLP) model [5] previously trained to predict the probability of a comment being high quality. For each scoring approach, a Kruskal-Wallis test was performed to identify any differences in composite scores across PGY years. Dunn's post-hoc test was used to determine which years of training had significantly different scores.
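As an illustration of the omnibus test named above, the following is a minimal sketch of the Kruskal-Wallis H statistic with the standard tie correction, written in plain Python. The function name `kruskal_wallis_h` and the scores are hypothetical, not SIMPL data. The sketch stops at the statistic: in practice H is compared against a chi-square distribution with (number of groups - 1) degrees of freedom, and Dunn's post-hoc comparisons are not shown.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic (tie-corrected) for a list of samples."""
    pooled = sorted(value for group in groups for value in group)
    n_total = len(pooled)

    # Assign average ranks so that tied values share one rank.
    rank_of = {}
    tie_term = 0
    i = 0
    while i < n_total:
        j = i
        while j < n_total and pooled[j] == pooled[i]:
            j += 1
        rank_of[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        tie_size = j - i
        tie_term += tie_size**3 - tie_size
        i = j

    # Sum of (rank sum squared / group size) over groups.
    weighted = sum(
        sum(rank_of[value] for value in group) ** 2 / len(group) for group in groups
    )
    h = 12 / (n_total * (n_total + 1)) * weighted - 3 * (n_total + 1)

    correction = 1 - tie_term / (n_total**3 - n_total)
    if correction == 0:
        raise ValueError("all observations are identical; H is undefined")
    return h / correction


# Hypothetical quality scores grouped by training year (illustrative only).
scores_by_pgy = {
    "PGY1": [4, 5, 3, 4],
    "PGY3": [3, 3, 2, 4],
    "PGY5": [2, 1, 3, 2],
}
h_statistic = kruskal_wallis_h(list(scores_by_pgy.values()))
print(f"H = {h_statistic:.3f}")
```

A rank-based test suits ordinal rubric scores because it makes no assumption that the scores are normally distributed within each training year.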

Results

A total of 1,270 dictated feedback comments were analyzed across all postgraduate training years. Significant differences were identified across years of training for all three scoring approaches: (1) human rater p<0.01; (2) machine rater p<0.01; and (3) NLP quality predictor p<0.01. Post-hoc differences were found between PGY1 and PGY4 and between PGY1 and PGY5, with higher quality scores given to the PGY1s and lower quality scores given to the PGY4s and PGY5s. These findings were consistent amongst all three approaches.

Conclusions

All three approaches found significant differences in overall feedback quality across training years, with PGY1 residents receiving the highest quality feedback. These findings provide evidence that contributes to the validity argument for using NLP models to evaluate feedback.


(S032) Machine Learning Based Automated Assessment of Intracorporeal Suturing in Laparoscopic Fundoplication
Shekhar Madhav Khairnar, MS, Alexis D Desir, MD, Carla Holcomb, MD, Daniel J Scott, MD, Ganesh Sankaranarayanan, PhD; UT Southwestern Medical Center

Introduction 

Automated assessment of surgical skill using artificial intelligence (AI) is a valuable way for trainees to obtain instantaneous feedback. Kinematic metrics derived from captured bimanual tool motions have been shown to be reliable predictors of performance in laparoscopic tasks. Implementing automated tool tracking assessment requires time-intensive human annotation. We have implemented an AI-based tracking tool using the Segment Anything Model (SAM) to eliminate the need for human annotators. The goal of this work is to evaluate the usefulness of our tool tracking model in automated assessment of performance in a laparoscopic suturing task of the fundoplication procedure.

Methods

An automated tool tracking model was applied to recorded videos from our study of Nissen fundoplication on a porcine bowel model. The participants were grouped into novices (PGY1-2) and experts (PGY3-5, attending). The beginning and ending of each of the suturing steps were segmented and the motions of the left and right tools were extracted. A low-pass filter with a cutoff frequency of 24 Hz was then applied to remove noise. Kinematic features, including displacement, root mean square (RMS) velocity, acceleration and jerk in pixel coordinates (x,y), were extracted from the filtered data. Data were split into training, testing, and validation sets. An ablation study was then conducted with machine learning models, including Logistic Regression, Random Forest Classifier, Support Vector Classifier (SVC), and XGBoost, to find the best model for differentiating the skill level of our participants.
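The kinematic features listed above can be sketched with finite differences over a tool-tip track. This is an illustration under stated assumptions, not the authors' pipeline: the input is assumed to be already filtered, "displacement" is taken as total path length, RMS is applied to the magnitude of velocity, acceleration and jerk alike, and the function names, frame rate and coordinates are hypothetical.

```python
import math


def derivative(samples, dt):
    """Forward finite difference of a sequence of (x, y) samples."""
    return [
        ((x1 - x0) / dt, (y1 - y0) / dt)
        for (x0, y0), (x1, y1) in zip(samples, samples[1:])
    ]


def rms_magnitude(vectors):
    """Root mean square of the Euclidean magnitudes of 2-D vectors."""
    return math.sqrt(sum(vx * vx + vy * vy for vx, vy in vectors) / len(vectors))


def kinematic_features(positions, fps):
    """Path length and RMS velocity, acceleration and jerk from (x, y) pixels."""
    if len(positions) < 4:
        raise ValueError("need at least 4 samples to estimate jerk")
    dt = 1.0 / fps
    velocity = derivative(positions, dt)
    acceleration = derivative(velocity, dt)
    jerk = derivative(acceleration, dt)
    path_length = sum(
        math.hypot(x1 - x0, y1 - y0)
        for (x0, y0), (x1, y1) in zip(positions, positions[1:])
    )
    return {
        "path_length": path_length,            # pixels
        "rms_velocity": rms_magnitude(velocity),          # pixels / s
        "rms_acceleration": rms_magnitude(acceleration),  # pixels / s^2
        "rms_jerk": rms_magnitude(jerk),                  # pixels / s^3
    }


# Hypothetical tool-tip track in pixel coordinates at an assumed 30 frames/s.
track = [(100, 200), (103, 204), (107, 207), (112, 209), (118, 210), (121, 214)]
features = kinematic_features(track, fps=30)
for name, value in features.items():
    print(f"{name}: {value:.2f}")
```

Each differentiation step amplifies high-frequency noise, which is why filtering before feature extraction matters most for jerk.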

Results 

Of 38 participants, we extracted usable data from 28 (novice = 9, expert = 19). Videos were excluded for incomplete tasks or excessive camera movement. Figure 1a shows an example of tool segmentation and tracking. The ablation study showed that the Logistic Regression model outperformed the others in classification, with an accuracy of 82% and an F1 score of 0.8. Figure 1b shows the confusion matrix of the Logistic Regression model.
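For readers less familiar with the two metrics reported above, this sketch shows how accuracy and F1 follow from a binary confusion matrix. The counts are hypothetical and chosen only for easy arithmetic; they are not the study's confusion matrix (Figure 1b), and treating "expert" as the positive class is an assumption.

```python
def accuracy_and_f1(tp, fp, fn, tn):
    """Accuracy and F1 score from binary confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total
    # Precision: of predicted positives, the share that are truly positive.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: of true positives, the share the model found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1 is the harmonic mean of precision and recall.
    if precision + recall == 0:
        return accuracy, 0.0
    return accuracy, 2 * precision * recall / (precision + recall)


# Hypothetical counts with "expert" as the positive class (illustrative only).
accuracy, f1 = accuracy_and_f1(tp=8, fp=2, fn=2, tn=8)
print(f"accuracy={accuracy:.2f}, F1={f1:.2f}")
```

With imbalanced groups such as 9 novices and 19 experts, F1 is a useful companion to accuracy because a classifier that always predicts the majority class can still score a high accuracy.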

Conclusion 

We successfully demonstrated an AI model for automated classification of performance independent of human annotation of surgical videos.  


(S033) "Addition by subtraction" of evaluation tools in the context of Entrustable Professional Activities
Nancy Ly, MD, Nicole Roberts, PhD, Margaret Boehler, MS, RN, Cathy Schwind, MS, RN, Frances Lee, MD, Prasad Poola, MD; Southern Illinois University School of Medicine

Background:

The implementation of Entrustable Professional Activities (EPAs) in general surgery by The American Board of Surgery offers numerous benefits for resident training and assessment. However, it also represents an increased administrative workload for faculty and possibly for residents. We used this implementation as an opportunity to engage in a quality assurance project to enhance the quality and quantity of all evaluations within our surgery program.

Methods:

We first conducted a comprehensive review of all the evaluations faculty and residents were required to complete in our residency program. We next organized separate focus groups for residents and faculty, utilizing a semi-structured questionnaire.

The discussions primarily centered on the effectiveness of the evaluations for residents, with a specific emphasis on whether the feedback provided assists them in their journey toward becoming proficient surgeons. The discussions among the faculty revolved around their ability to gather information that aids in their understanding of the progress made by the residents they supervise. All discussions were recorded and subsequently transcribed. We performed qualitative analysis of these transcripts using Atlas.ti software.

Results:

Our program was using 24 evaluations to meet requirements from ACGME, ABS, our institutional GME office, and other regulatory bodies. The predominant theme from the focus groups was the imperative to clarify the purpose of evaluations, avoid redundancy, and reorient them to be more actionable for both faculty and residents.

Faculty and residents pinpointed four evaluation tools (CAMEO, GAGES colonoscopy, GAGES EGD, OPRS) from the American Board of Surgery that would duplicate information to be gathered through EPAs. In addition, we recognized an opportunity to restructure other evaluations, such as end-of-rotation assessments and bi-annual evaluations, to enhance their effectiveness.

Discussion:

In general, both faculty and residents expressed that they possess an ample array of evaluation tools to offer comprehensive feedback. In addition, they pinpointed opportunities to streamline and reduce redundancy in the data collected. This underscores the importance of periodically reviewing these evaluation instruments to align them with the evolving requirements and expectations of resident education.


(S034) Grit and well-being in general surgery residents
Kristen M Quinn, MD, Kevin Huang, MD, Vineeth Sama, Colleen Donahue, MD, Andrea Abbott, MD; Medical University of South Carolina

Background: With burnout affecting over 50% of surgical residents, efforts to focus on well-being are needed in training. Grit, defined as perseverance toward achieving long-term goals, is a measure of mental toughness and may be protective against burnout, and positively contribute to resident well-being. This study seeks to better characterize general surgery resident grit and its association with well-being. 

Methods: An anonymous electronic survey was distributed to all general surgery categorical residents at one program. Grit was measured using a 12-question validated Grit Scale and well-being was measured using a 5-question validated WHO-5 Well-Being Index. Linear regression was used to identify associations between PGY class, grit and well-being. Student’s t-test was used to compare grit and well-being scores among male and female residents. 

Results: 23/38 residents completed the survey. Grit scores ranged from 3 to 4.67 (scale of 1-5) and well-being scores from 24 to 84 (scale of 0-100). There was no correlation between grit and well-being (R²=0.01, p<0.63) or between PGY class and grit (R²=0.06, p<0.27). However, there was a weakly positive correlation between PGY class and well-being (R²=0.18, p<0.05). Each one-year increase in PGY class was associated with a 5% increase in the well-being index. There was a decrease in well-being from PGY-1 to PGY-2, increases from PGY-2 to PGY-4, and a decrease from PGY-4 to PGY-5. There was no significant difference between male and female residents for grit scores (3.76 vs 4.12, p<0.07) or well-being scores (60 vs 53, p<0.18).

Conclusion: The lack of association between grit and well-being or PGY class and grit suggests that well-being may be more related to external factors than personal characteristics. Although there was no significant difference between genders, female residents had a higher mean grit score, yet a lower mean well-being score. Further investigation is necessary to elucidate factors contributing to these findings.


(S035) Evaluating the Evaluator: Comparing Faculty Tendencies in Narrative Feedback Using Natural Language Processing Analysis
Priya A Rajdev, MD, Jeremi Smith, EdD, Lisa Yañez-Fox, EdD; University of Arizona

Purpose: This study evaluates the utility of narrative feedback in medical education by analyzing teaching faculty's written feedback patterns and evaluating their alignment with a subjective numerical grade and the objective shelf exam score data. We employ Natural Language Processing (NLP) via ChatGPT, a large language model-based artificial intelligence system, to examine qualitative data, revealing thematic trends, sentiment, and feedback patterns over multiple years.

Methods: A retrospective analysis was conducted on end-of-rotation (EOR) assessments of medical students who completed third-year clinical rotations at general and specialty surgery sites managed by a single medical school between 2018 and 2022. NLP was employed to extract content, themes, and variability within the narrative feedback portion of the EORs by each faculty. These data (subjective) were compared with students' numerical EOR grades, obtained from a five-point Likert scale covering 14 core performance measures (1 = "fails to meet expectations," 2 = "meets expectations," and 3 = "exceeds expectations," in 0.5 increments). Additionally, these subjective measures were compared to the average shelf score (objective) for each academic year to assess congruity.

Results: Preliminary findings revealed numerical grade variability, ranging from 2 to 3. Despite these numerical variations, the written component of the assessments consistently conveyed a positive tone and content from all faculty. Actionable feedback was thematically limited. Thematic and sentiment consistency in narrative assessment content persisted year-to-year, even as average shelf scores fluctuated by class.

Conclusions: Our study reveals potential limitations of narrative feedback in EOR assessments and raises questions about their current value in providing actionable feedback. Generally, NLP analysis can identify such deficiencies to help improve assessment practices and potentially broadly advance medical education by conducting similar analyses across all curricula. As we move towards entrustable professional activities in medical education, the results of this analysis underscore the need for a shift in focus away from summative EOR assessments alone. These insights should inform faculty development, emphasizing direct, actionable, and frequent incremental feedback throughout each rotation, thus improving learning outcomes of medical students.
