Poster Session IV - Education Technology & Simulation
Introduction
There is increasing interest in the role of large language models (LLMs) in aiding medical education and clinical decision making. LLMs, like ChatGPT, have been shown to pass professional certification examinations required in medicine. Comparative studies of the accuracy and strength of reasoning of different LLMs have yet to be conducted in surgical specialties. This study aimed to compare the performance and reasoning capabilities of ChatGPT-4 vs Google Bard on nationally available subscription-based question banks designed for general surgery resident education in preparation for the American Board of Surgery In-Training Examination (ABSITE).
Methods
Individual practice questions were input into GPT-4 and Google Bard. Accuracy and strength of reasoning were graded on a scale of 0-3 by an individual grader for each question. Questions that required visual input were excluded. Comparison of mean scores for accuracy and strength of reasoning was performed.
Results
2500 questions were evaluated by GPT-4 and Google Bard. GPT-4 answered 77.56% of questions correctly (86.80% of easy, 75.65% of medium, and 64.79% of hard questions). Google Bard answered 52.95% of questions correctly (58.38% of easy, 52.32% of medium, and 35.85% of hard questions). Analysis revealed a statistically significant difference (P < 0.001) in the distribution of questions answered correctly between GPT-4 and Google Bard. GPT-4 demonstrated a higher average reasoning score than Google Bard (2.3634 vs 1.4922, P < 0.001). GPT-4 answered easy (2.62 vs 1.79), medium (2.31 vs 1.57), and hard (2.00 vs 1.06) questions with greater strength of reasoning (P < 0.001).
Conclusions
While neither AI program achieved perfect accuracy in responding to all questions, GPT-4 demonstrated superior performance compared to Google Bard, showcasing enhanced accuracy and reasoning skills when addressing questions demanding a deep comprehension of the intricate language prevalent in surgical training. These questions are crafted to prepare residents for the ABSITE and are closely associated with higher scores. This study augments the mounting evidence supporting ChatGPT's capacity to provide well-reasoned answers. Future research is essential to ascertain its potential as a viable question generator, offering a cost-effective alternative to expensive subscription-based question banks, thereby improving access to invaluable resources for surgical training.

Introduction
An electroencephalogram (EEG) measures the electrical activity of the brain and can be correlated with focus during activities. The rise of wearable EEG sensors has increased access to these metrics. In this study, we piloted the use of a wearable EEG sensor during a simulated task to assess cognitive load during the procedure.
Methods
Data were collected from 10 participants performing a simulated laparoscopic ventral hernia repair. EEG data were collected using a wearable headband-based sensor (Arctop, Los Angeles, CA). Median focus score (range 0 to 100) was calculated during the mesh preparation and mesh placement phases of the procedure (Figure 1). Mesh preparation was the period from when the participant picked up the piece of mesh to when it was placed in the abdomen, including trimming the mesh to size and extracorporeal placement of any anchoring sutures. Mesh placement was the period from when the mesh was placed into the abdomen to when the mesh was secured to the anterior abdominal wall.
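As an illustration of the per-phase summary described above, the following is a minimal sketch of computing a median focus score within each phase. It assumes the sensor output can be exported as timestamped focus scores and that phase boundaries are annotated in seconds; the function name, data layout, and example values are hypothetical and are not taken from the study or from the vendor's software.

```python
from statistics import median


def median_focus_by_phase(samples, phases):
    """Return the median focus score (0 to 100) within each phase.

    samples: list of (time_in_seconds, focus_score) tuples (assumed export format).
    phases:  dict mapping a phase name to a (start, end) window in seconds;
             a sample belongs to a phase when start <= time < end.
    """
    result = {}
    for name, (start, end) in phases.items():
        scores = [score for t, score in samples if start <= t < end]
        # A phase with no samples has no defined median.
        result[name] = median(scores) if scores else None
    return result


# Made-up example values, for illustration only.
example_samples = [(0, 40), (10, 50), (20, 45), (30, 20), (40, 30), (50, 25)]
example_phases = {"mesh_preparation": (0, 30), "mesh_placement": (30, 60)}
print(median_focus_by_phase(example_samples, example_phases))
```

The median is used here because it is the summary statistic named in the methods; how the study handled samples falling exactly on a phase boundary is not stated, so the half-open window is an assumption.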
Results
Six participants placed stay sutures during the mesh preparation phase and four participants secured their mesh using intracorporeal sutures during the mesh placement phase. Compared to participants who used intracorporeal sutures, participants who placed extracorporeal sutures had significantly lower focus scores during the mesh placement phase (23.7 vs 38.0, p<0.01). There was no significant difference in focus scores between the two groups during the mesh preparation phase (41.9 vs 33.8, p=0.56). Participants who placed extracorporeal sutures had higher, but not significantly different, focus scores during mesh preparation than during mesh placement (41.9 vs 23.7, p=0.059).
Discussion
This study found that placing stay sutures extracorporeally resulted in lower focus scores on EEG. Although the sample size is small, the ability to detect a difference in cognitive load based on operative strategy supports the continued use of EEG to evaluate operative performance. Understanding how cognitive load changes throughout a procedure and by operative approach can help shape intra-operative instruction strategies to maximize learning. Additionally, higher cognitive load interferes with the ability to address intraoperative errors and unexpected events.

Introduction: Postgraduate surgical education has undergone a transformative shift, accelerated by the COVID-19 pandemic and the growing emphasis on patient safety. Augmented reality (AR) simulations, like the LapAR™ simulator, have emerged as innovative tools, providing realistic haptic feedback and capturing performance data. The primary aim of this study is to assess the impact of AR simulations on junior surgical trainees' skill acquisition and refinement, including their experiences with this technology in home settings.
Methodology: Our study involved 15 junior surgical trainees benchmarked against two consultants across five distinct training sites. The participants performed ten laparoscopic appendectomies interspersed with "LapPass" tasks, including camera handling and technical skill exercises. Objective metrics, such as completion time and distance travelled, were collected, followed by in-depth interviews.
Results: The study unveiled notable improvements in completion time and reduced distance travelled during successive laparoscopic appendectomies and LapPass tasks. Additionally, enhancements in the smoothness, acceleration, and ambidexterity of movements were observed, although these aspects were not the primary focus. It is worth noting that the consultants also exhibited improvements, mainly in distance travelled and somewhat in acceleration. Qualitative analysis highlighted the early-stage efficacy of AR-based training, advocating its incorporation into standard surgical training. Participants expressed a desire for extended access to this technology, recognizing its utility for operation list preparation, skill acquisition, and knowledge enrichment. Home-based training added flexibility and convenience, complementing the high-pressure surgical setting. Furthermore, the flexibility of home-based training offers accommodation to individuals with unique circumstances, ultimately making the field of surgery more accessible to those who might face physical or geographical barriers. Despite occasional technical challenges related to setup and reliability, AR offered lifelike realism and tangible educational benefits.
Conclusion: AR simulation offers a scalable platform to enrich surgical skills training and fosters a more equitable and inclusive environment. The study provides evidence of construct validity through enhanced objective skill scores, supported by positive feedback on the educational content from qualitative interviews. The realism of the AR simulation further underpins face validity, bringing trainees closer to real-life scenarios and marking a breakthrough in surgical education.

Background: The ability to fire a circular EEA stapler is a necessary skill for all residents graduating from general surgery residency. A prior needs assessment demonstrated that while experience firing an EEA stapler increased with number of operative exposures to EEA anastomoses, there were still a significant number of residents who had never fired an EEA stapler. Furthermore, residents were uncomfortable and unfamiliar with the steps of an EEA anastomosis. Informed by our needs assessment, an EEA circular stapler task trainer was created. Our aim is to describe and validate that model.
Methods: A low-cost and easy-to-replicate EEA circular stapler task trainer was created by placing a synthetic silicone rectosigmoid colon (that can be cut, stapled, and sutured) within a simulated bony pelvis. To validate the model, 4 colorectal surgeons were asked to perform an EEA anastomosis. After they tested the model, a survey designed to elicit expert opinions on the fidelity of the model and on whether skills would be translatable to the operating room was distributed. Qualitative analysis of the survey results was then performed.
Results: All participants were able to perform each step of an EEA anastomosis on the trainer with no modifications necessary. While they appreciated the fidelity of the model when firing the TA stapler, cutting the rectum, firing the EEA circular stapler, and evaluating the donuts, they noted less fidelity with passing the stapler through the anal sphincter, performing the anvil pursestring, and advancing the stapler through the bony pelvis. All felt that the task trainer was a good model to simulate the steps of an EEA anastomosis and answered "definitely yes" or "probably yes" when asked if resident success on the simulator would translate to them being more likely to allow the resident to fire an EEA in the operating room.
Conclusion: The RELAX Model (Rectal Excision Low Anastomosis Task Trainer) has been validated as a translatable simulator with good fidelity to intra-operative experience. The model has been modified based on improvements suggested during the validation process and a curriculum for general surgery residents is being developed to accompany the task trainer.


Introduction: Since the introduction of the FAST exam, surgeon educators have debated how to teach novices this ultrasound technique. FAST training is typically done in a single massed session, but considerable hands-on skill must be maintained for the correct performance of a FAST and there is little literature regarding skill decay following initial FAST training. We hypothesized that knowledge and skill performance would decline throughout all time points following initial training.
Methods: This prospective observational study evaluated skill and knowledge retention following introductory FAST exam training (consisting of an in-person didactic session followed by hands-on education with standardized patients). First and second-year surgery residents were assessed using a written and hands-on test at pre/post-training, 1 month, 3 months, and 6 months after training. The Quality of Ultrasound Imaging and Competence (QUICk) score was used to grade the learner’s performance. Statistical analyses were performed using repeated measures ANOVA to assess mean scores across the study intervals.
Results: Thirty-nine surgical trainees were followed for 6 months with 92% retention. Twenty residents (51%) had previous FAST training and 61.5% were female, but neither variable was associated with a significant difference in pre- or post-training written assessment or QUICk score. A significant increase in written score was noted following the initial training (66% vs 87%, p<0.001), but knowledge decay was not significant until the 3-month test. A significant increase in QUICk score was noted following the initial training (21.8 vs 55.1, p<0.001). While QUICk score significantly decreased at the 1-month test (55.1 vs 43.1, p<0.001), this decay stabilized and there was no significant difference between 1-, 3-, or 6-month scores.
Conclusion: Traditional ultrasound education has focused on achieving short-term competency. Although massed training is associated with a significant decline in skill performance at 1 month and knowledge at 3 months, the skill/knowledge decay stabilizes and remains higher than baseline for at least 6 months.

Introduction
Excessive stress can negatively impact surgeons’ technical and non-technical skills in the operating room. The development of an intraoperative simulation scenario that reliably and significantly increases surgeon stress would be useful for the study of interventions that mitigate such stress. The purpose of this study was to develop an intraoperative simulated scenario and study its effectiveness in increasing surgeons’ acute stress.
Methods
Based on interviews with surgeons and an iterative process of pilot testing with study-team members and scenario refinement, an acute pneumothorax scenario was developed that included an acute worsening of the patient’s vital signs during laparotomy necessitating chest tube placement. Practicing surgeons and residents voluntarily participated in this scenario and wore a commercial heart rate monitor to objectively assess their stress response. Heart rate metrics (root mean square of successive differences (RMSSD) in the R-R interval, Mean difference in R-R interval (mean RR), and average heart rate) were captured at baseline and throughout the scenario. Changes in heart rate metrics were compared between baseline and various events during the scenario (preoperative time out, inspecting the patient’s abdomen during exploratory laparotomy, and acute vital sign deterioration) using a one-way ANOVA and paired t-tests.
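To make the heart rate metrics named above concrete, here is a minimal sketch of how they are conventionally computed from a series of R-R intervals in milliseconds. The function name is hypothetical, "mean RR" is taken here to be the mean R-R interval (an assumption about what the abstract means), and average heart rate is derived as 60000 divided by the mean R-R interval, which is one common convention; the study's monitor may have computed these values differently.

```python
import math


def hrv_metrics(rr_ms):
    """Return (RMSSD in ms, mean R-R interval in ms, average heart rate in bpm).

    rr_ms: sequence of consecutive R-R intervals in milliseconds.
    """
    if len(rr_ms) < 2:
        raise ValueError("need at least two R-R intervals")
    # Successive differences between adjacent R-R intervals.
    diffs = [later - earlier for earlier, later in zip(rr_ms, rr_ms[1:])]
    # RMSSD: root mean square of the successive differences.
    rmssd = math.sqrt(sum(d * d for d in diffs) / len(diffs))
    mean_rr = sum(rr_ms) / len(rr_ms)
    # Assumed convention: 60000 ms per minute divided by the mean interval.
    avg_hr = 60000.0 / mean_rr
    return rmssd, mean_rr, avg_hr


# Made-up intervals, for illustration only.
print(hrv_metrics([800, 810, 790, 800]))
```

Lower RMSSD and shorter mean R-R intervals, together with a higher average heart rate, are generally read as signs of greater acute stress, which is why these three metrics are compared between baseline and scenario events.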
Results
Seven surgeons (57% attendings) completed our study. Compared to baseline, all surgeons experienced significantly heightened stress during the scenario based on heart rate metrics with large effect sizes (Cohen’s d for differences from baseline to the scenario: RMSSD=0.67, Mean RR=1.15, Average HR=0.86). The significant increase in surgeons’ stress (p<0.05) was limited to the vital sign deterioration segment of the scenario; stress was not affected during the preoperative timeout or laparotomy inspection.
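For readers unfamiliar with the effect size reported above, the following is a minimal sketch of one common form of Cohen's d for paired measurements: the mean of the within-participant differences divided by their sample standard deviation. The abstract does not state which variant was used, so this choice, the function name, and the example values are all assumptions for illustration.

```python
from statistics import mean, stdev


def paired_cohens_d(baseline, scenario):
    """Cohen's d for paired data: mean difference / SD of the differences.

    baseline, scenario: equal-length sequences with one value per participant.
    """
    if len(baseline) != len(scenario):
        raise ValueError("baseline and scenario must have the same length")
    if len(baseline) < 2:
        raise ValueError("need at least two participants")
    diffs = [s - b for b, s in zip(baseline, scenario)]
    sd = stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0:
        raise ValueError("differences have zero variance; d is undefined")
    return mean(diffs) / sd


# Made-up average heart rates (bpm) for four participants, illustration only.
print(paired_cohens_d([70, 72, 68, 75], [80, 85, 75, 90]))
```

Other variants divide by a pooled standard deviation of the two conditions instead, which can give noticeably different values with small samples.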
Conclusions
We developed a surgical simulation scenario that reliably leads to increased stress in surgeons, including attendings, based on objective heart rate metrics. This scenario can be used for the study of stress in a controlled environment to evaluate the impact of mitigation strategies such as the use of mental skills.

Introduction:
The eFAST has quickly replaced more invasive exams such as the diagnostic peritoneal lavage and can be performed by many different providers. In the training of general surgery residents (GSRs), there is variation in eFAST competency, and the available curricula lack standardization, summative assessment, and the ability to demonstrate abnormal findings. Additionally, Tripu et al. found that surgical residents' perceived competency was higher than their observed competence during the procedure.
Methods:
This curriculum followed Kern’s 6-step approach to curriculum development. The curriculum was delivered in two-hour sessions on two different days to GSRs of all years in the residency program. Expert US instructors were defined as those who had completed a dedicated ultrasound certification and included emergency medicine physicians and acute care/trauma surgeons. The participants rotated through two stations: one with a normal anatomy standardized patient and one with abnormal anatomy using the CAE Vimedix US Simulator. The participants completed a pre- and post-test, skills assessment, and satisfaction survey.
Results:
Day 1 included 11 participants and Day 2 included 13 participants. The pre-test average score was 70.9% correct with a post-test average of 97.3%. After the course, participants reported more confidence in performing the exam, identifying a positive finding, and making clinical decisions based on their imaging findings. Written comments from trainees regarding the curriculum included enjoyment of the standardized patient and CAE Vimedix US Simulator.
Conclusion:
This curriculum produced an overall improvement in resident knowledge after participation, as well as an increase in confidence in performing eFAST and in the diagnosis and triage of trauma patients. The curriculum’s main strength is the use of a normal anatomy standardized patient and its ability to display abnormal anatomy with the CAE Vimedix US Simulator. This technology offers the ability to create clinical scenarios utilizing pathologic findings and correlate those findings clinically to determine a treatment plan for the patient. Future directions for this work are to expand the curriculum content to include more aspects of POCUS and to include a summative evaluation of the curriculum.

Background: The prevalence of robotic surgery has grown exponentially since its FDA approval in 2000. This shift in technology has raised questions about the impact on surgery resident training. There is concern that increased utilization of robotic surgery may result in deficiency in laparoscopic training. The aim of this study was to assess the recent impact of robotic surgery on general surgery resident training.
Methods: We performed a single-institution, mixed-methods study. A retrospective review of operative cases using the EPIC electronic health record from 2015-2022 was used to assess trends in common minimally invasive surgical procedures. Following this, a quantitative, anonymous survey was sent to current general surgery residents and to former residents who graduated after 2015.
Results: Since 2016, 1436 robotic and 5504 laparoscopic surgeries were performed. There was a nearly six-fold increase in robotic surgery between 2016 and 2021, with a 216% increase from 2020-2023. By 2022, the numbers of robotic cholecystectomies, ventral hernia repairs, and inguinal hernia repairs had surpassed the numbers of the corresponding laparoscopic procedures (Figure).
75 current and former residents were surveyed. 11/33 (33%) alumni responded. 73% of alumni reported having a formal robotic curriculum, with 90% receiving an equivalency certificate. All alumni felt they had a good or excellent laparoscopic experience and 80% had at least a good robotic experience. Almost all considered robotic surgery beneficial to training and felt it did not impede laparoscopic training. 90% of alumni completed a fellowship, 50% in minimally invasive surgery.
23/42 (55%) current residents responded: one PGY1, 11 junior (PGY2-3), and 13 senior (PGY4-5) residents. Respondents reported an average of 65% of cases performed robotically. 82.6% reported good or excellent robotic experience. Only 31% reported good or excellent laparoscopic experience. The majority (59%) felt robotic surgery impeded their ability to learn laparoscopy. Residents were more comfortable performing common surgical procedures robotically than laparoscopically.
Conclusion: Robotic surgery is increasing exponentially while laparoscopic surgical procedures have begun to decline. If this trend continues, residents may become deficient or less competent at laparoscopic procedures. Given the prolonged learning curve in laparoscopic surgery, it may be necessary to implement defined minimums in laparoscopy in surgical training.

