Poster Session I - Assessment
Introduction: Entrustable Professional Activities (EPAs) provide a framework for competency-based assessment in surgery. EPA descriptions include specific behaviors to be demonstrated by trainees at progressive levels of autonomy. EPA microassessments offer an opportunity for self-reflection by the resident, with immediate feedback and evaluation by assessors. The required free-text self-reflection gives residents the opportunity to reflect on the interaction that led to the EPA submission. Our aim is to understand the content of the self-reflective comments provided by residents within EPA microassessments.
Methods: A total of 738 General Surgery EPA assessments for inguinal hernia, gallbladder disease, right lower quadrant pain, trauma, and surgical consultation from 6/2021 to 6/2023 were evaluated. We assessed the self-reflective free-text comments to understand the content that residents focused on when reflecting on the EPA microassessments. These comments were further evaluated for positive or negative statements about the interaction.
Results: Qualitative analysis demonstrated that a majority of residents recounted the interaction that led to the submission of the EPA microassessment. Language within self-assessments reflected doubt and uncertainty within operative cases until the final year of training. Junior residents focused their self-reflections on topics including gaps in knowledge or operative ability, or the need for assistance from attendings or senior residents. Senior residents focused on topics pertaining to the ability to independently perform cases or the ability to supervise case or consult completion.
Conclusions: Microassessment self-reflection gives residents an opportunity to introspectively evaluate the interaction that led to the EPA microassessment. A majority of self-assessments focus on “negative” aspects of a microassessment rather than what may have gone well. Self-reflection is an essential skill to develop as a resident because it allows residents to guide their own education and direct interventions toward their own self-identified skill deficits. Because many residents have very busy clinical schedules, they lack the time and opportunity to deeply contemplate the EPA microassessment. Providing a structure for self-reflection, such as discussing one positive aspect and one opportunity area within the EPA microassessment, allows the resident to efficiently complete the self-assessment with a meaningful reflection that could be revisited to guide future learning.

INTRODUCTION
Practice-based learning is an ACGME competency but may be limited by the frequent need to reorient to subspecialty-specific workflows, practice protocols, and expectations. Learners, teachers, and patients may suffer when there is not a clear understanding of the goals and objectives of each clinical rotation. This could be mitigated through a formalized monthly onboarding curriculum. While annual onboarding and introductory bootcamps have become common, we aimed to assess rotational onboarding practices and needs. We hypothesized that general surgery residents lack formal onboarding for their rotation-specific learning experience.
METHODS
This was a survey-based study across two university hospital General Surgery residency programs. Survey data on trainee onboarding experiences and preferences were collected using a Likert scale. Statistical analysis was performed, and p<0.05 was considered significant.
RESULTS
Eighty-four trainees responded. Mean (±SD) age was 31 (±4) years; most were interns (35%) and PGY2 (20%), while PGY3, PGY4, and PGY5 residents represented 13%, 12%, and 14% of participants, respectively. Overall, 56% were male, 44% Caucasian, 88% non-Hispanic, and most were categorical (71%). Forty-four residents (52.4%) reported never receiving any rotational onboarding. Across individual general surgery rotations, subspecialty-specific onboarding was reported at rates of only 0% to 20.2%. Of the 40 residents (47.6%) who reported receiving onboarding, the most common formats of materials were survival guides (77.5%), handbooks (42.5%), and/or in-person lectures (32.5%). Onboarding was provided mostly by resident peers (77.5%), faculty (42.5%), and/or administration (25%). Accessibility of onboarding materials was rated 3.7 on a 1-5 Likert scale (5=highly accessible); however, 32.5% of residents referred to the materials only once or never, and only 42.5% reported access to clinical practice management guidelines. Video, virtual, and app-based materials were rarely utilized (n=6, 15%), but residents identified these as a preferred medium for delivery (n=17, 42.5%; p=0.013). The majority of residents (52.5%) had expectations provided mid-rotation rather than beforehand, with expectations provided mostly by resident peers (87.5%), faculty (57.5%), and/or administration (10%).
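The headline proportions above can be re-derived from the reported counts. The short Python sketch below does only that arithmetic; the helper name `pct` and the variable names are illustrative and are not part of the study.

```python
def pct(count: int, total: int) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


respondents = 84      # trainees who responded to the survey
never_onboarded = 44  # reported never receiving rotational onboarding
onboarded = 40        # reported receiving some form of onboarding

print(pct(never_onboarded, respondents))  # share with no onboarding
print(pct(onboarded, respondents))        # share with onboarding
print(pct(6, onboarded))                  # used video, virtual, or app-based materials
print(pct(17, onboarded))                 # preferred video, virtual, or app-based materials
```

Note that the last two percentages use the 40 onboarded residents as the denominator, which is how the reported 15% and 42.5% are recovered.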
CONCLUSION
Formal onboarding is inconsistent and underutilized across surgical subspecialties, with content and expectations often created and delivered by peer trainees. This study highlights the opportunity and need for a rotation-based onboarding curriculum with additional faculty input, delivered prior to the rotation in a more accessible modality (such as app-based) for modern learners.

Background: United States Medical Licensing Examination (USMLE) Step 2 scores have been shown to predict performance on American Board of Surgery In-Training examinations (ABSITE) as well as American Board of Surgery (ABS) qualifying exams (QE) and certifying exams (CE). Since USMLE Step 1 is now pass/fail, there may be more emphasis on USMLE Step 2 for residency match. How National Board of Medical Examiners (NBME) Subject Exams compare to other standardized exams in predicting Step 2 performance is not well understood.
Methods: This was a single-institution cross-sectional study. Scores for the Medical College Admission Test (MCAT), Comprehensive Basic Sciences Examination (CBSE), and NBME Subject Exams were collected for medical students matriculating 2016-2019. We used multivariate regression models to estimate the adjusted R-squared of each exam score as a predictor of the USMLE Step 2 CK score.
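For readers unfamiliar with the metric, adjusted R-squared penalizes the ordinary R-squared for the number of predictors in a model. The sketch below uses the standard textbook formula; the abstract does not state the exact model specification, so the function name and its arguments are assumptions for illustration only.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Standard adjusted R-squared.

    r2: ordinary coefficient of determination of the fitted model
    n:  number of observations (here, students with exam data)
    p:  number of predictors in the model, excluding the intercept
    """
    if n - p - 1 <= 0:
        raise ValueError("need more observations than predictors plus one")
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Toy values, not study data: R^2 = 0.5 with 11 observations and 2 predictors.
print(adjusted_r2(0.5, 11, 2))
```

With a sample in the hundreds and only a few predictors, the adjustment is small, so adjusted and unadjusted values would be expected to be close.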
Results: There were 600 medical students and exam data were available for 530 students. There were 6 NBME Subject exams (Medicine, Neurology, OBGYN, Pediatrics, Psychiatry, and Surgery). MCAT combined percentile, CBSE 1 and CBSE 2 were predictive of Step 2 score (Table 1). The best predictors were Neurology, Psychiatry, and Surgery NBME exams (Table 2).
Conclusions: Our study shows that Neurology, Psychiatry, and Surgery NBME scores are predictive of Step 2 performance. This finding supports earlier interventions to optimize standardized test scores, which, together with a well-rounded curriculum vitae, maximize the chances of a successful residency application.

As general surgery education shifts toward a competency basis, there is a need for correspondingly updated competency-based assessment scales. This study introduces a novel Competency-Based Assessment of Robotic Surgery Skills (CARS) Scale and reports on its initial potential to evaluate general surgery residents. The CARS scale incorporates ten robotic surgery competencies, each scored out of five points: tissue dissection, tissue handling and retraction, robotic stapler use, arm exchange, camera use, intracorporeal suturing and tying, wristed articulation, port placement, docking, and “intangibles” such as console ergonomics and control. Post-graduate year (PGY) 1-5 resident robotic surgery performance was directly observed by six in-person faculty reviewers and evaluated based on CARS metrics. Additionally, a sample of 20 varying robotic surgical procedure videos, including attending surgeon samples, was scored by two blinded expert reviewers using both the CARS and Global Evaluative Assessment of Robotic Skills (GEARS) scales.
When utilizing the CARS scale, blinded video reviewer mean scores reliably differentiated attending surgeon performance from beginner [PGY-1-2] (4.6 ± 0.1 vs. 3.9 ± 0.3; p<0.041), intermediate [PGY-3] (4.6 ± 0.1 vs. 4.3 ± 0.2; p<0.0009), and advanced [PGY-5] (4.6 ± 0.1 vs. 4.0 ± 0.3; p<0.0001) resident performance (Figure 1), as did the GEARS scale. The CARS scale demonstrated moderate inter-rater reliability when used to evaluate inguinal hernia repair video samples, similar to the GEARS scale.
In-person resident CARS scores showed the greatest statistically significant improvement in mean scores between PGY-1 and PGY-2 (2.8 ± 0.4 vs. 3.6 ± 0.4, respectively; p<0.0003) as well as between PGY-4 and PGY-5 resident levels (4 ± 0.5 vs. 4.5 ± 0.2, respectively; p<0.013). Robot docking exhibited the highest mean score across all PGY levels (4.36/5), while robotic stapler use exhibited the lowest mean score (3.06/5). Although in-person scoring using the CARS scale by six faculty reviewers differentiated beginner-, intermediate-, and advanced-level residents, blinded expert video scoring did not.
The novel CARS assessment scale shows initial promise as an adjunct tool for general surgery residency programs. Further studies are necessary to validate this scale across a broader reviewer audience and procedure base.

Introduction
Objective evaluations of medical student clinical reasoning are lacking. We implemented a summative Oral Objective Skills Clinical Examination at the end of 3rd year (MS3) clerkships to determine feasibility of oral exams in undergraduate medical education. Our hypothesis was that evaluations would differ based on faculty examiner specialty.
Methods
After completing their core clinical clerkships, all MS3s participated in a 15-minute oral exam on a cholangitis and septic shock case via Zoom. Medicine (IM) and Surgery (GS) faculty volunteers were trained to standardize evaluations of each student’s performance based on a validated assessment of reasoning rubric. There were four assessment checkpoints during the case: 1) Completion of an H&P, 2) Utilization of appropriate laboratory and imaging data, 3) Determination of a prioritized differential diagnosis, and 4) Initiation of a safe clinical management plan. At each checkpoint, students were scored as 1-Needs Improvement, 2-Competent, or 3-Excellent. The final score was a composite of checkpoint totals (Max=12). A passing score was set at a composite >4. Data are presented as medians (IQR) and percentages.
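The scoring rule described above can be sketched in a few lines of Python. This is a minimal illustration of the stated rubric (four checkpoints scored 1-3, composite maximum of 12, pass at a composite greater than 4); the function and constant names are hypothetical and do not come from the study.

```python
# Score labels as stated in the rubric.
LABELS = {1: "Needs Improvement", 2: "Competent", 3: "Excellent"}
NUM_CHECKPOINTS = 4   # H&P, data utilization, differential, management plan
PASS_THRESHOLD = 4    # a passing composite must be strictly greater than this


def composite_score(checkpoints: list[int]) -> int:
    """Sum the four checkpoint scores after validating them."""
    if len(checkpoints) != NUM_CHECKPOINTS:
        raise ValueError("expected exactly four checkpoint scores")
    if any(score not in LABELS for score in checkpoints):
        raise ValueError("each checkpoint score must be 1, 2, or 3")
    return sum(checkpoints)


def passed(checkpoints: list[int]) -> bool:
    """Return True when the composite exceeds the passing threshold."""
    return composite_score(checkpoints) > PASS_THRESHOLD


print(composite_score([2, 2, 1, 1]), passed([2, 2, 1, 1]))
print(composite_score([1, 1, 1, 1]), passed([1, 1, 1, 1]))
```

Under this rule, a student scored as Needs Improvement at every checkpoint has a composite of 4 and does not pass.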
Results
A total of 152 MS3s took the oral exam. The median final score was 6.5 (5,7) out of 12, with 22% of students evaluated as failing. Comparing IM (n=6) and GS (n=9) faculty examiners, there were no statistically significant differences based on gender, number of evaluations performed, or level of experience. While the final scores were similar between IM and GS faculty, GS faculty gave students a grade of Excellent on individual checkpoints more often (48% vs. 29%, p<0.05). Sub-analyses showed no differences in grading between IM and GS for checkpoints 1-3; however, there was a divergence for checkpoint 4. GS faculty gave students a significantly higher score on questions regarding safe management (2 (1,2) vs. 1 (1,1), p<0.001).
Conclusion
Objective evaluation of MS3 clinical reasoning is possible. GS faculty appeared to be more lenient with grading, specifically with respect to assessment of safety. Understanding differences between examiners will help optimize future assessments of clinical reasoning, especially in light of inherent biases from individual examiners’ specialty and scope of practice.

Introduction
Robotic inguinal hernia repair (RIH) is increasingly performed and incorporated into general surgery residency training. Developing benchmarks for resident active console time and efficiency is necessary for implementing and evaluating robotic surgical training curricula.
Methods
We extracted 86 RIH cases performed with a dual-console Da Vinci robotic system from the Intuitive Data Recorder (IDR), which segmented RIH into procedure-specific tasks with objective performance indicators (OPIs). OPIs are calculations derived from robotic data that encompass time, speed, path length, and events from the surgeon and robotic instruments. Six RIH steps were included: creation of the peritoneal incision, peritoneal flap exploration/exposure of the myopectineal orifice, reduction of the hernia sac, mesh placement, mesh fixation, and peritoneal closure. Resident active console time was measured in seconds, and efficiency was measured as median instrument tip speed (cm/second). Descriptive statistical analysis and ANOVA were performed.
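To make the efficiency metric concrete, the sketch below computes a median tip speed from sampled instrument positions. The abstract does not describe how the IDR derives this OPI, so the input format (3D positions in centimeters with timestamps in seconds) and the function name are assumptions made purely for illustration.

```python
import math
import statistics


def median_tip_speed(positions_cm, times_s):
    """Median speed (cm/second) across consecutive position samples.

    positions_cm: sequence of (x, y, z) tip coordinates in centimeters
    times_s:      matching sequence of strictly increasing timestamps in seconds
    """
    if len(positions_cm) != len(times_s) or len(times_s) < 2:
        raise ValueError("need at least two matched position/time samples")
    speeds = []
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        if dt <= 0:
            raise ValueError("timestamps must be strictly increasing")
        # Straight-line distance between consecutive samples divided by elapsed time.
        speeds.append(math.dist(positions_cm[i - 1], positions_cm[i]) / dt)
    return statistics.median(speeds)


# Toy trajectory, not study data: segment speeds are 1, 2, and 3 cm/second.
toy_positions = [(0, 0, 0), (1, 0, 0), (3, 0, 0), (6, 0, 0)]
toy_times = [0.0, 1.0, 2.0, 3.0]
print(median_tip_speed(toy_positions, toy_times))
```

A median is less sensitive than a mean to brief bursts of fast movement or to pauses, which may be one reason it suits a per-arm efficiency summary.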
Results
Most cases (n=79, 91.9%) were performed with resident (PGY2-PGY5) participation, 3.5% (n=3) with a fellow, and 4.7% (n=4) by the attending surgeon alone. Total operative time with a resident or fellow was on average 7 minutes shorter than for RIH without trainee participation. For step-specific tasks, junior residents (PGY2-3) were given more time to operate as active console surgeons than senior residents (PGY4-5), though the difference was explained primarily by time given to the final step, “closure of peritoneum” (Fig 1). Senior residents had higher operative efficiency than junior residents (Arm #1 1.02 vs 0.10, Arm #2 1.26 vs 0.68, Arm #3 1.37 vs 0.53, p<0.0001).
Conclusion
IDR analysis allows for assessment of trainee operative proficiency by step. Operative arm efficiency in RIH using OPIs reveals statistically significant improvement in efficiency between junior and senior residents. Development of robotic operative proficiency benchmarks using objective data should include the anticipated progression of efficiency and active console time.

Background:
In January of 2022, the USMLE score report for Step 1 changed from a 3-digit number to pass/fail. This was intended to reduce student anxiety related to the test score, allow undistracted participation in school curriculum, and prevent secondary use as a screening tool for residency program selection. Currently, there are few reports on the real-world impact of the change. We seek to understand medical student perception of the impact of the change on their anxiety, time and money spent on testing, and their application for residency.
Methods:
The perspective of third-year medical students (USMLE pass/fail) was compared with that of fourth-year medical students by an online survey at two medical schools during the 2021-2022 academic year. Students were asked to rate their anxiety levels on a scale of 1-10 for both Step 1 and Step 2, rank eight sections of the residency application from least important to most important, and report total hours and dollars spent outside of the school’s curriculum to prepare for both exams. We compared the two groups using the independent samples t-test.
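For reference, the test statistic behind an independent samples t-test can be computed with the Python standard library alone. The sketch below uses the pooled-variance (Student) form; the abstract does not say whether a pooled or Welch variant was used, so that choice, the function name, and the toy ratings are all assumptions for illustration.

```python
import math
import statistics


def pooled_t_statistic(group_a, group_b):
    """Return (t, degrees_of_freedom) for a pooled-variance two-sample t-test."""
    n1, n2 = len(group_a), len(group_b)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two observations")
    var1 = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var2 = statistics.variance(group_b)
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df
    std_err = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    if std_err == 0:
        raise ValueError("both groups are constant; t is undefined")
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / std_err
    return t, df


# Toy anxiety ratings on a 1-10 scale, not study data.
scored_group = [1, 2, 3]
pass_fail_group = [2, 3, 4]
t, df = pooled_t_statistic(scored_group, pass_fail_group)
print(round(t, 4), df)
```

Converting the statistic to a p-value requires the t distribution with the returned degrees of freedom, which is available in statistical packages or standard tables.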
Results:
Of the total 74 students, 33 received a 3-digit score report for Step 1, and 41 had a pass/fail report. Seventeen were students interested in surgical specialties (SISS).
Score reporting mechanism had no impact on anxiety about Step 1; however, students with pass/fail reports were significantly more anxious about Step 2.
In general, those with a scored Step 1 result ranked it as of primary importance in the residency application, while those with a pass/fail result ranked the Step 2 score most highly; there were no significant differences between the groups on the ranking of the other sections of the residency application. SISS had the same results.
Score reporting mechanism had no impact on total time and money spent outside the curriculum for either test. However, SISS with a pass/fail Step 1 showed a significant decrease in study time for that exam.
Conclusion:
Although our results are not definitive, they suggest that rather than alleviating stress, the change in reporting Step 1 scores simply moved student focus to Step 2.

Background: Simulation is a growing modality for nontechnical skills (NTS) curricula in surgical education, yet it is largely focused on communication. In addition to the previously well-studied skills of empathy and compassion, attributes of courage, composure, honesty, and humility are equally important aspects of individual character that form the foundation of the physician-patient relationship. Our objective was to review the literature describing how simulation has been used to develop and assess character attributes during NTS training for surgical residents.
Methods: PubMed, Embase, Scopus, and Education Full Text databases identified relevant articles published before July 14, 2023. Studies evaluating empathy, compassion, courage, composure, honesty, or humility in surgical trainees using simulation were included. Extracted data included details of simulation design, character attributes studied, NTS assessment methods, and key outcomes.
Results: Of 1921 articles identified, 227 underwent full-text review, with 12 included in our analysis. Modes of simulation included low- and high-fidelity scenarios with diverse topics relevant to surgical training. Most (7/12) included simulation as part of a larger multimodal NTS curriculum, and only 2/12 assessed technical skills and NTS simultaneously. All had the goal of improving demonstrated participant empathy; 2/12 additionally assessed participant composure. Most (9/12) used a validated psychometric tool, such as the Jefferson Scale of Empathy, to assess character attributes. Studies varied in who conducted NTS assessments: participants, peers, standardized patients, or independent faculty raters. Most (10/12) reported improved participant empathy post-curriculum or an association between simulation performance and participant empathy.
Conclusions: Good surgeons must integrate technical skills and NTS daily in clinical practice. This review identifies a need to incorporate technical skills and NTS simultaneously in simulation-based surgical training. Simulation as a modality for character development has largely focused on empathy and compassion. We suggest that future interventions include additional attributes such as courage, composure, honesty, and humility for more comprehensive character development. Finally, the demonstration of character attributes like empathy is a complex behavior involving multiple perspectives. Accordingly, a “360 degree” assessment by all participants in an observed interaction is recommended to thoroughly identify areas for improvement.

Introduction
Oral patient presentations are often delivered in a SOAP (subjective-objective-assessment-plan) or inductive structure, which provides a diagnostic anchor at the end of the presentation. Prior qualitative work in the surgical education literature has suggested that a deductive structure contributes to an effective oral case presentation by providing the diagnostic anchor early to define the relevance of subsequent information. This experimental study tests whether a deductive presentation structure improves listener comprehension.
Methods
Senior (n=25) and junior (n=32) general surgery residents at a single institution watched videos of two clinical case presentations. Videos differed in presentation structure (deductive or inductive) and anchor veracity (diagnosis given is true or misleading). After each video, participants provided an oral retelling of the case including only relevant details and their own diagnosis. Listener comprehension was measured by diagnostic accuracy and reported relevant elements.
Results
Scenario A: 67% of residents diagnosed the case correctly. While presentation structure did not influence residents’ diagnostic accuracy, anchor veracity had a significant effect. All participants were more likely to report the correct diagnosis if given a true diagnosis in the presentation (p=.001). Junior residents were also more likely to report an incorrect diagnosis if given a misleading diagnosis (p=.009).
Scenario B: 32% of residents diagnosed the case correctly. Neither presentation structure nor anchor veracity independently influenced diagnostic accuracy. Those given a deductive structure with a true diagnosis reported fewer relevant elements (p=.026) than those given an inductive structure with a true diagnosis. Those given a deductive structure were less likely to report elements specific to the correct diagnosis than those who received an inductive structure (p=.044). Those who reported an incorrect diagnosis were less likely to report elements specific to the correct diagnosis (p<.001). Junior residents who were given a deductive structure but reported an incorrect diagnosis were more confident that they had provided the correct diagnosis (p=.001).
Conclusion
Listener comprehension is not improved by a deductive structure of oral presentation, as previously proposed. For the more difficult scenario, a deductive structure was associated with worse listener comprehension and was affected by confirmation bias. Junior residents may be more susceptible to anchoring bias in certain conditions.
