Highlights
• Provides evidence of validity for a virtual reality spine simulator’s L4–L5 pedicle screw insertion scenario.
• Applies a comprehensive validation approach based on traditional (face, content, construct and convergent validity) and contemporary validity frameworks.
• Suggests that combining simulator-derived metrics with OSATS ratings can enhance our understanding and assessment of surgical skills.
Introduction
Surgical training involves balancing skill acquisition with patient safety. Reference Leung, Luu, Regehr, Murnaghan, Gallinger and Moulton1–Reference Rattner, Apelgren and Eubanks3 This balance is particularly relevant in spine surgery because of its complexity and the variability in resident exposure. Reference Daniels, Ames and Garfin4–Reference McGaghie6 Pedicle screw insertion is a common but technically demanding spine procedure with a steep learning curve. Reference McGaghie6,Reference Manbachi, Cobbold and Ginsberg7 Reported screw malposition rates range from 4.2% to 7.8%, which makes acquiring proficiency under direct supervision essential. Reference Gang, Haibo, Fancai, Weishan and Qixin8–Reference Hicks, Singla, Shen and Arlet11
Virtual reality simulation is a promising way to provide a risk-free environment for procedural learning and skill refinement. Reference Manbachi, Cobbold and Ginsberg7,Reference Alotaibi, AlZhrani, Sabbagh, Azarnoush, Winkler-Schwartz and Del Maestro12,Reference Rogers, DeSantis, Janjua, Barry and Kuo13 However, current spine surgery simulators often lack high fidelity and comprehensive evidence of validity. Reference Jung, Muddaluru, Gandhi, Pahuta and Guha14–Reference Wang and Shen17 A recent review of augmented, virtual and mixed reality tools for learning in the health professions found that only a small fraction of training tools had evidence supporting face, content or construct validity. Reference Asoodar, Janesarvatan, Yu and de Jong18 These studies highlight the need to assess face, content and construct validity and to develop more relevant simulation training tools. Reference Jung, Muddaluru, Gandhi, Pahuta and Guha14–Reference Asoodar, Janesarvatan, Yu and de Jong18
The TSYM Symgery virtual reality platform aims to fill this gap by providing realistic pedicle screw insertion simulation and performance feedback. Reference Sewell, Morris and Blevins19–Reference Azarnoush, Alzhrani and Winkler-Schwartz21 This study evaluates its educational utility using established traditional and contemporary validation frameworks. Reference Huang, Cheng, Bureau, Ladak and Agrawal22–Reference Fried, Sadoughi and Weghorst24 Specifically, it assesses face and content validity through expert questionnaire responses, and construct validity by comparing simulator performance metrics and Objective Structured Assessment of Technical Skills (OSATS) scores between “less skilled” and “skilled” groups. Reference Huang, Cheng, Bureau, Ladak and Agrawal22,Reference Schout, Hendrikx, Scheele, Bemelmans and Scherpbier25,Reference Ledwos, Mirchi, Bissonnette, Winkler-Schwartz, Yilmaz and Del Maestro26 Convergent validity is explored by correlating simulator performance metrics with OSATS scores, the gold standard in surgical assessment. Reference Fried, Sadoughi and Weghorst24,Reference Faulkner, Regehr, Martin and Reznick27–Reference Orovec, Bishop and Scott29
This study seeks to answer the research question: What evidence of validity supports the educational utility of the TSYM simulator for spine surgery training? To answer it, the objectives of this case series study were 1) to evaluate face and content validity for an L4–L5 bilateral pedicle screw insertion simulation on the TSYM simulator platform, 2) to assess construct validity using simulation-derived metrics and OSATS ratings of simulated pedicle screw insertion performance, 3) to establish convergent validity by correlating simulation-derived metrics with these OSATS ratings and 4) to use the results to construct an argument supporting the TSYM simulator’s use for training residents and fellows in L4–L5 bilateral pedicle screw insertion.
Methods
Participants
Neurosurgical and orthopedic residents, spine fellows, nonspine neurosurgical fellows with experience in pedicle screw insertion, and neurosurgical and orthopedic spine surgeons participated in this case series study. Previous experience with the TSYM simulator was an exclusion criterion. Based on information from orthopedic and neurosurgical training programs at Quebec universities about residents’ experience with clinical pedicle screw insertion, participants were categorized a priori into two groups: skilled participants (postgraduate year (PGY) 5–6 residents, fellows and spine surgeons) and less skilled participants (PGY 1–4 residents). Participants signed an informed consent form approved by the Neurosciences-Psychiatry McGill University Health Center Research Ethics Board. They then completed a demographic questionnaire and received standardized written and verbal instructions on the steps and instruments needed to complete the simulated L4–L5 bilateral pedicle screw insertion on the TSYM simulator. To become acquainted with the simulator, participants first performed a dry lab task and a simulated L2 laminectomy (see Supplementary Information). They then performed the simulated L4–L5 bilateral pedicle screw insertion on the TSYM simulator. No time limit was imposed, but the steps were sequential: participants had to confirm completion of each step before proceeding to the next. This article follows the Strengthening the Reporting of Observational Studies in Epidemiology (STROBE) reporting guidelines. Reference von Elm, Altman and Egger30
Virtual reality simulator platform
The TSYM Symgery simulation platform, developed by Cedarome Canada Inc. dba Symgery (Montreal, Canada), was used in this study (Figure 1A). The simulator renders three-dimensional (3D) intraoperative spinal procedures using a voxel-based system Reference Ledwos, Mirchi, Bissonnette, Winkler-Schwartz, Yilmaz and Del Maestro26 (Figure 1B). It has a single haptic arm that provides continuous tactile, auditory and visual feedback while the simulator’s surgical instruments are in use (Figure 1C). The system is equipped with preprogrammed surgical tools and captures multiple performance metrics, enabling detailed analysis of surgical performance. The pedicle screw insertion task consists of one animated step and four deconstructed interactive steps, described in Table 1, which were repeated for each screw. For standardization, users performed the insertions at constant magnification and inserted 6.5 × 45 mm pedicle screws in a predetermined order: left L5, left L4, right L5, right L4 (see Supplementary Information). Participants had access to live X-ray imaging to verify the entry point and angles for pedicle cannulation and to confirm the accuracy of inserted screws. The Supplementary Video shows a skilled participant performing a pedicle screw insertion on the simulator.

Figure 1. TSYM virtual reality simulator platform developed by Cedarome Canada Inc. dba Symgery (Montreal, Canada). (A) The TSYM simulator setup, showing (1) the robotic arm that provides advanced haptic feedback, (2) the tool handles that can be used in the simulated scenario, (3) the 3D monitor, (4) pedals for activating fluoroscopy and (5) the secondary monitor. (B) A neurosurgical resident performing a task on the simulator, demonstrating its practical use in a training scenario. (C) The tool handles available to mimic an array of tools in the virtual environment.
Table 1. Steps and tools utilized for each pedicle screw insertion simulation employing the TSYM simulator platform

Face and content validity
The spine surgeons and fellows assessed the face and content validity of the pedicle screw insertion simulation using questionnaires rated on a 7-point Likert scale, with 1 being “completely unrealistic” and 7 being “completely realistic”. Reference Ledwos, Mirchi, Bissonnette, Winkler-Schwartz, Yilmaz and Del Maestro26,Reference Almansouri, Abou Hamdan and Yilmaz31 Although there is no universally accepted median cut-off for sufficient face and content validity, this study, consistent with prior studies, considered the overall simulated procedure and each of its deconstructed tasks to demonstrate such validity if the corresponding questionnaire items achieved a median rating ≥ 4.0 on the 7-point Likert scale. Reference Ledwos, Mirchi, Bissonnette, Winkler-Schwartz, Yilmaz and Del Maestro26,Reference Almansouri, Abou Hamdan and Yilmaz31
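The median-based decision rule above can be sketched in a few lines of Python. This is a minimal illustration only, assuming each questionnaire item’s ratings are stored as a list of integers from 1 to 7; the function name meets_validity_threshold and the example ratings are hypothetical and are not study data.

```python
from statistics import median
from typing import Sequence

LIKERT_MIN, LIKERT_MAX = 1, 7  # 1 = "completely unrealistic", 7 = "completely realistic"
THRESHOLD = 4.0                # median cut-off adopted in this study


def meets_validity_threshold(ratings: Sequence[int], threshold: float = THRESHOLD) -> bool:
    """Return True if the median 7-point Likert rating is at or above the threshold."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < LIKERT_MIN or r > LIKERT_MAX for r in ratings):
        raise ValueError("ratings must lie on the 1-7 Likert scale")
    return median(ratings) >= threshold


# Hypothetical ratings from six experts (illustrative only, not study data).
items = {
    "overall realism": [3, 5, 5, 5, 6, 6],
    "pre-threading with the tap": [1, 3, 3, 4, 5, 5],
}
for name, ratings in items.items():
    print(f"{name}: median={median(ratings)}, meets threshold={meets_validity_threshold(ratings)}")
```

With an even number of raters, statistics.median averages the two middle ratings, which is how a median such as 3.5 can arise on an integer scale; the second hypothetical item therefore falls just below the cut-off.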
Construct validity
To assess construct validity, each pedicle screw insertion was analyzed independently, using both performance metrics derived from the TSYM simulator and blinded expert OSATS ratings.
Simulation-derived tool metrics
The TSYM simulator continuously recorded several features of performance during pedicle screw insertion. For each screw, data on each tool’s 3D velocity, 3D force, maximum force, 3D acceleration and tool–tissue contact were collected. The 3D force and maximum force refer to the forces applied to the haptic arm while using the tool, whereas the 3D velocity and 3D acceleration of each tool are derived from the position of the tool’s tip in space. The tools assessed are listed in Table 1. Each participant’s screw insertions were treated as independent because each involved a different simulated vertebral entry point, orientation and angulation.
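The text states that velocity and acceleration are derived from the tool tip’s position but does not give the formulas. The sketch below shows one plausible way to compute such summaries with finite differences. The fixed sampling interval dt, the use of speed (the magnitude of the 3D velocity), the treatment of “acceleration” as the signed rate of change of that speed, the contact fraction and all function names are assumptions made for illustration; this is not the TSYM implementation.

```python
import math
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]

# Assumption: samples are taken at a fixed interval dt (seconds).


def tip_speeds(positions: Sequence[Vec3], dt: float) -> list:
    """Speed between consecutive tip samples, |p[i+1] - p[i]| / dt (forward difference)."""
    return [math.dist(p, q) / dt for p, q in zip(positions, positions[1:])]


def tip_accelerations(speeds: Sequence[float], dt: float) -> list:
    """Assumption: signed rate of change of speed between consecutive speed samples."""
    return [(b - a) / dt for a, b in zip(speeds, speeds[1:])]


def max_force(forces: Sequence[Vec3]) -> float:
    """Largest magnitude of the 3D force applied to the haptic arm."""
    return max(math.hypot(*f) for f in forces)


def contact_fraction(in_contact: Sequence[bool]) -> float:
    """Assumption: proportion of samples in which the tool touched tissue."""
    return sum(in_contact) / len(in_contact)


def mean(values: Sequence[float]) -> float:
    return sum(values) / len(values)


# Hypothetical tip trajectory sampled once per second (not study data).
positions = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
speeds = tip_speeds(positions, dt=1.0)      # [1.0, 2.0, 3.0]
accels = tip_accelerations(speeds, dt=1.0)  # [1.0, 1.0]
print(mean(speeds), mean(accels))           # 2.0 1.0
print(max_force([(3.0, 4.0, 0.0), (0.0, 0.0, 2.0)]))  # 5.0
print(contact_fraction([True, False, False, True]))   # 0.5
```

Treating acceleration as a signed quantity means its mean can be close to zero or negative, which would be consistent with confidence intervals that span zero; whether the simulator computes it this way is not stated in the article.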
Randomized, blinded OSATS assessment
Alongside the simulator-derived performance metrics, the study used OSATS ratings, a validated method that surgical educators use to assess learner operative performance in human operative settings, to determine construct validity. Reference van Hove, Tuijthof, Verdaasdonk, Stassen and Dankelman28,Reference Orovec, Bishop and Scott29 Each participant’s simulated L4–L5 bilateral pedicle screw insertion was screen-recorded, and each recording was later divided into four videos, one per pedicle screw insertion. The videos were randomized and rated in a blinded fashion by two experts with experience performing pedicle screw insertions in patients. The OSATS scale was adapted to the simulator’s capabilities, resulting in five items (respect for tissue, instrument handling, economy of movement, flow and knowledge of procedure) and an overall rating, each scored on a 7-point Likert scale. The adapted scale demonstrated excellent internal consistency (α = 0.97 [95% CI, 0.96, 0.98]) and excellent inter-rater reliability (α = 0.97 [95% CI, 0.97, 0.98]).
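The article reports internal consistency as α without naming the coefficient. Assuming it is Cronbach’s alpha, the usual statistic for this purpose, it can be computed from a table of scores (one row per rated video, one column per OSATS item) as α = k/(k − 1) × (1 − sum of item variances / variance of total scores), where k is the number of items. The sketch below also averages two raters’ item scores for one video; the function names and data are hypothetical, and the inter-rater coefficient is not reproduced here.

```python
from statistics import variance
from typing import Sequence


def average_raters(rater_a: Sequence[float], rater_b: Sequence[float]) -> list:
    """Item-by-item mean of two raters' OSATS scores for one video."""
    return [(a + b) / 2 for a, b in zip(rater_a, rater_b)]


def cronbach_alpha(rows: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for a score table (rows = videos, columns = items).

    Assumption: the article's internal-consistency alpha is Cronbach's alpha.
    Sample variances (n - 1 denominator) are used throughout.
    """
    if len(rows) < 2 or len(rows[0]) < 2:
        raise ValueError("need at least two rows and two items")
    k = len(rows[0])  # number of items
    item_vars = [variance([row[j] for row in rows]) for j in range(k)]
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


# Hypothetical scores (not study data).
print(average_raters([5, 4, 6], [5, 5, 6]))  # [5.0, 4.5, 6.0]
scores = [[2, 3, 3], [4, 4, 5], [5, 6, 6], [6, 6, 7]]
print(round(cronbach_alpha(scores), 3))  # 0.986
```

Because the hypothetical items rise and fall together across videos, the total-score variance dominates the summed item variances and α approaches 1, the same pattern that an α of 0.97 reflects.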
Convergent validity
The simulation-derived tool metrics were correlated with the average OSATS ratings to assess convergent validity. Two-tailed Spearman rank-order correlation coefficients were calculated, using data from all screws, between each tool metric that showed evidence of construct validity and each OSATS item.
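Spearman’s ρ is the Pearson correlation of the two variables’ ranks, with tied values receiving the average of the ranks they span. The sketch below computes ρ for one metric–item pair in plain Python. The function names and the example values are hypothetical, the study itself used SPSS, and the two-tailed P-values reported in the article are not computed here.

```python
import math
from typing import Sequence


def average_ranks(values: Sequence[float]) -> list:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the block of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / (sx * sy)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho: Pearson correlation of the average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical per-screw values (not study data): screwdriver maximum force
# and the mean OSATS overall rating of the same screws.
max_force = [12.0, 9.5, 7.0, 6.0, 11.0]
osats_overall = [3.0, 4.0, 5.5, 6.0, 3.5]
print(round(spearman_rho(max_force, osats_overall), 3))  # -1.0
```

These made-up values are perfectly inversely ranked, so ρ is −1; a negative ρ of this kind is the direction expected when higher force accompanies lower OSATS ratings.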
Statistical analysis
Collected data were imported into Python to compute the tool metrics. Outliers in the tool metrics were identified and imputed in MATLAB R2023b. All other statistical analyses were performed in SPSS (version 29.0; IBM, Armonk, New York). The data were not normally distributed as assessed by the Shapiro–Wilk test (P < 0.05). Mann–Whitney U-tests assessed differences between groups for each performance measure, and effect sizes (Cohen’s r) were reported for significant findings. Two-tailed Spearman rank-order correlation coefficients examined associations between performance metrics.
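The between-group comparison and its effect size can be sketched as follows. This minimal version computes the Mann–Whitney U statistic from rank sums, a normal-approximation Z with the usual tie correction (no continuity correction) and the effect size r = |Z|/√N that is conventionally reported as Cohen’s r for this test. The article does not give its exact formula, so this definition, the function names and the omission of P-values are assumptions; SPSS output may differ in detail.

```python
import math
from collections import Counter
from typing import Sequence, Tuple


def average_ranks(values: Sequence[float]) -> list:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_r(group_a: Sequence[float], group_b: Sequence[float]) -> Tuple[float, float, float]:
    """Return (U for group_a, Z, r = |Z| / sqrt(N)) using the normal approximation."""
    n1, n2 = len(group_a), len(group_b)
    n = n1 + n2
    combined = list(group_a) + list(group_b)
    ranks = average_ranks(combined)
    # U for group_a from its rank sum.
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    # Tie correction: sum of (t^3 - t) over each block of t tied values.
    ties = sum(t ** 3 - t for t in Counter(combined).values())
    variance_u = n1 * n2 / 12 * ((n + 1) - ties / (n * (n - 1)))
    if variance_u <= 0:
        raise ValueError("all observations are tied; Z is undefined")
    z = (u - n1 * n2 / 2) / math.sqrt(variance_u)
    return u, z, abs(z) / math.sqrt(n)


# Hypothetical screwdriver contact fractions (not study data).
less_skilled = [0.20, 0.25, 0.18, 0.22, 0.15]
skilled = [0.10, 0.12, 0.08, 0.11, 0.13]
u, z, r = mann_whitney_r(less_skilled, skilled)
print(f"U={u}, Z={z:.2f}, r={r:.2f}")
```

By the usual convention, r of about 0.1 is read as small, 0.3 as medium and 0.5 as large, which is the scale against which values such as 0.20 and 0.47 are interpreted.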
Results
Participants
Participants’ demographic data and relevant experience are presented in Table 2. A total of 27 participants from two Quebec universities were included. Although the participant pool is small, other studies have successfully assessed the face, content and construct validity of two different spine surgery virtual reality simulators with similar sample sizes. Reference Ledwos, Mirchi, Bissonnette, Winkler-Schwartz, Yilmaz and Del Maestro26,Reference Alkadri, Del Maestro and Driscoll32 The skilled group reported having independently inserted a mean of 452 pedicle screws (SD = 883.6), compared with a mean of 0.5 screws (SD = 1.4) for the less skilled group; this difference was statistically significant (P < 0.001). Each participant inserted four screws, for a total of 108 simulated screws. One screw was excluded because of a technical issue, leaving 107 screws for analysis and therefore 107 videos, one per pedicle screw insertion, for OSATS evaluation.
Table 2. Demographic data for the two groups performing the simulated pedicle screw insertion on the TSYM simulator platform

PGY = Postgraduate year; SD = standard deviation; **No significant difference was found between the two groups except for the mean number of reported pedicle screws inserted (P < 0.001).
Face and content validity
Median ratings and ranges for the face and content validity of the pedicle screw insertion simulation are outlined in Table 3. The four participating spine surgeons and two spine fellows assessed face and content validity. This group rated the overall realism of the simulated procedure at a median of 5.0 (range 3.0–6.0), consistent with face validity. All steps achieved evidence of content validity (median ≥ 4.0) except the pre-threading step using the tap, which received a median rating of 3.5 (range 1.0–5.0). The skilled group also rated the overall realism of the simulated procedure at a median of 5.0 (range 3.0–6.0).
Table 3. Face and content validity

Median scores on the 7-point Likert scale for face and content validity given by the spine fellows and surgeons after completing the pedicle screw insertion simulation.
Construct validity
Simulation-derived tool metrics
All simulation-derived tool metrics were compared between the groups (Table 4). The two groups differed significantly in 4 of 25 performance metrics. We had anticipated group differences in the 3D velocity and 3D acceleration of the tap at step 3A and in the tool contact and maximum force of the screwdriver at step 4. Reference Ebina, Abe and Higuchi33–Reference Mirchi, Bissonnette and Ledwos35 While pre-threading the channel with the tap, the skilled group showed a significantly higher 3D velocity than the less skilled group (0.0014, 95% CI [0.00119, 0.00153] vs 0.001, 95% CI [0.0012, 0.0013]; Cohen’s r = 0.20; P = 0.04). With the tap, the less skilled group showed a significantly higher 3D acceleration than the skilled group (4.36e-9, 95% CI [-7.26e-9, 16e-9] vs 5.43e-10, 95% CI [-5.19e-9, 6.28e-9]; Cohen’s r = 0.24; P = 0.01); although the 3D acceleration values were small in both groups, the difference was statistically significant. During screw insertion with the screwdriver, the less skilled group applied significantly more maximum force than the skilled group (10.14, 95% CI [7.34, 12.96] vs 7.52, 95% CI [5.07, 9.96]; Cohen’s r = 0.20; P = 0.04) and spent significantly more time in contact with surrounding tissue (0.22, 95% CI [0.18, 0.25] vs 0.11, 95% CI [0.09, 0.13]; Cohen’s r = 0.47; P < 0.001). These differences are depicted in Figure 2.

Figure 2. Simulation-derived performance metrics that differed significantly between groups. (A) Tap screw’s 3D velocity. (B) Tap screw’s 3D acceleration. (C) Screwdriver max force on the pedicle. (D) Screwdriver contact with pedicle. The central line indicates the mean value for each group. *Significant difference between groups (Mann–Whitney U-test, P < 0.05). **Significant difference between groups (Mann–Whitney U-test, P < 0.01).
Table 4. Simulation-derived metrics obtained from the L4-L5 bilateral pedicle screw insertion simulation on the TSYM simulator and corresponding Mann–Whitney U-test P-value

* Significant P-value for the Mann–Whitney U-test (P < 0.05).
Randomized, blinded OSATS ratings
For each screw video, an average rating for each OSATS item was calculated from the blinded ratings of the two experts. The skilled group achieved a significantly higher mean overall OSATS rating than the less skilled group (5.02, 95% CI [4.63, 5.41] vs 3.30, 95% CI [2.92, 3.69]; P < 0.001). The skilled group also significantly outperformed the less skilled group on each OSATS item (instrument handling, respect for tissue, economy of movement, flow and knowledge of procedure) (P < 0.001 for each item; respective Cohen’s r = 0.55, 0.43, 0.55, 0.54, 0.52, 0.52). Group differences are outlined in Figure 3.

Figure 3. Performance assessment of the pedicle screw insertion task using OSATS. *Significant difference between groups (Mann–Whitney U-test, P < 0.05). **Significant difference between groups (Mann–Whitney U-test, P < 0.01). OSATS = Objective Structured Assessment of Technical Skills.
Convergent validity
Two-tailed Spearman rank-order correlation coefficients were calculated between each OSATS item and the four tool metrics that differed significantly between groups (screwdriver maximum force, screwdriver tool contact, tap 3D velocity and tap 3D acceleration). As predicted, screwdriver maximum force correlated significantly and negatively with all OSATS items: respect for tissue, instrument handling, economy of movement, flow, knowledge of procedure and overall (Spearman’s ρ = −0.32, −0.39, −0.37, −0.38, −0.29 and −0.33, respectively; P < 0.01 for each). As predicted, screwdriver tool contact also correlated significantly and negatively with respect for tissue, instrument handling, economy of movement, flow, knowledge of procedure and overall (Spearman’s ρ = −0.25, −0.34, −0.42, −0.43, −0.31 and −0.31, respectively; P < 0.01 for each). Tap 3D velocity correlated significantly and positively with four of the six OSATS items, namely economy of movement, flow, knowledge of procedure and overall (Spearman’s ρ = 0.29, P < 0.01; ρ = 0.25, P = 0.01; ρ = 0.21, P = 0.03; ρ = 0.20, P = 0.04, respectively). No significant correlations were found between tap 3D acceleration and any OSATS item. Table 5 outlines the associations between these performance metrics.
Table 5. Convergent validity determination between simulation-derived performance metrics and OSATS scoring

* Significant P-value for Spearman’s rank correlation coefficient (P < 0.05). ** Significant P-value for Spearman’s rank correlation coefficient (P < 0.01). a Simulation-derived performance metrics that showed construct validity. OSATS = Objective Structured Assessment of Technical Skills.
Discussion
The present study offers insights for surgical educators and researchers interested in spine simulation. First, the pedicle screw insertion simulation showed varying degrees of validity evidence. Second, to our knowledge, this is the first study to correlate simulator-derived metrics with OSATS ratings to assess convergent validity on a virtual reality spine platform. Finally, the dual assessment approach, combining OSATS ratings with simulator-derived metrics, offers a more comprehensive understanding of learners’ operative performance.
Face, content and construct validity
This study used traditional (face, content and construct validity) and contemporary frameworks to construct a validity argument for the TSYM simulator’s use in surgical training. Reference Huang, Cheng, Bureau, Ladak and Agrawal22–Reference Fried, Sadoughi and Weghorst24 Face validity was included as subjective feedback but was not central to the validity argument. OSATS findings provided the strongest support, while evidence from the other measures was less robust, given their variability and small effect sizes.
Face and content validity were supported, with eight of nine statements rated at a median of 4.0 or greater by the six participating spine surgeons and fellows. Reference Ledwos, Mirchi, Bissonnette, Winkler-Schwartz, Yilmaz and Del Maestro26,Reference Almansouri, Abou Hamdan and Yilmaz31 However, ratings varied widely (range 1–7), and verbal feedback from the experts indicated that the torque feedback from the tap when pre-threading the inner pedicle canal could be improved. These results should therefore be interpreted with caution.
For construct validity, 4 of 25 simulation-derived tool metrics significantly distinguished the two groups, with small to moderate effect sizes: the 3D velocity and 3D acceleration of the simulated tap, and the maximum force and tool contact of the simulated screwdriver. With the tap, the skilled group exhibited higher 3D velocity and lower 3D acceleration than the less skilled group. These patterns are consistent with previous studies showing smoother, more controlled movements among surgical experts. Reference Gang, Haibo, Fancai, Weishan and Qixin8,Reference Gonzalvo, Fitt and Liew9,Reference Ebina, Abe and Higuchi33 Conversely, the less skilled group’s unfamiliarity with this instrument may have resulted in lower tap velocity. Meanwhile, the maximum force applied with the screwdriver was significantly higher for the less skilled group than for the skilled group. This is consistent with previous virtual reality studies, Reference Ebina, Abe and Higuchi33–Reference Reich, Mirchi and Yilmaz37 which show that more skilled participants tend to apply less instrument force, presumably recognizing that excessive force may compromise patient safety. Reference Mirchi, Bissonnette and Ledwos35 The less skilled group’s greater screwdriver contact is likely attributable to lower precision, causing unintended tissue contact. The skilled group significantly outperformed the less skilled group in each OSATS component (Figure 3). These findings provide evidence of construct validity for the TSYM simulator’s pedicle screw insertion simulation.
Correlating simulation-derived performance metrics and OSATS ratings for convergent validity
Three of the four simulation-derived performance metrics correlated significantly with OSATS items, with weak to moderate correlation coefficients (|ρ| = 0.20–0.43), providing evidence of convergent validity for the TSYM simulator and suggesting several important implications. Screwdriver maximum force and tool contact were negatively correlated with all OSATS items, while tap 3D velocity was positively correlated with four OSATS items: economy of movement, flow, knowledge of procedure and overall score. The less skilled group’s lower OSATS ratings were consistent with their poorer performance on these key simulation-derived metrics. Tap 3D velocity did not significantly correlate with instrument handling or respect for tissue, and tap 3D acceleration did not significantly correlate with any OSATS item. These findings suggest that OSATS may not fully capture key performance features, possibly because visual assessment is limited in evaluating instrument dynamics such as acceleration within the bone channel. Reference Mirchi, Bissonnette and Ledwos35,Reference Mirchi, Bissonnette, Yilmaz, Ledwos, Winkler-Schwartz and Del Maestro38 Although OSATS is a validated tool for assessing surgical performance, several studies have questioned its ability to reflect the full complexity of operative performance. Reference Bernard, Dattilo, Srikumaran, Zikria, Jain and LaPorte39,Reference Anderson, Long, Thomas, Putnam, Bechtold and Karam40 This study suggests that combining OSATS with simulator-derived metrics could provide a more formative and comprehensive approach to evaluating and improving surgical skills. It also supports further research into whether the TSYM simulator can predict future pedicle screw insertion performance in patients.
TSYM as an educational tool
The results suggest that the TSYM simulator’s pedicle screw insertion scenario may be useful for evaluating and training less skilled learners, specifically on the four metrics showing construct validity. Virtual reality simulators have been evaluated for pedicle screw placement training and have been shown to improve placement accuracy and skill acquisition. Reference Grantcharov and Reznick5,Reference McGaghie6,Reference Azarnoush, Alzhrani and Winkler-Schwartz21,Reference Yilmaz, Winkler-Schwartz and Mirchi41,Reference Hou, Lin, Shi, Chen and Yuan42 Incorporating virtual reality simulation into the spine surgery curriculum may also benefit less skilled trainees by providing a valuable platform for practicing complex spine procedures and supporting formative skill development. Reference AlOtaibi, Al Zhrani, Bajunaid, Winkler-Schwartz, Azarnoush and Mullah20,Reference Azarnoush, Alzhrani and Winkler-Schwartz21 However, the TSYM simulator’s pedicle screw insertion scenario may need modification to reach its full potential as a surgical education system.
This study’s findings are consistent with prior research on neurosurgical simulators. A systematic review found that while the visual appearance of neurosurgical virtual reality simulators is generally rated favorably, haptic feedback remains a limitation across simulation platforms. Reference Chawla, Devi, Calvachi, Gormley and Rueda-Esteban43 This matches the present face and content validity results, in which haptic-related features showed greater variability in expert responses. Regarding construct and convergent validity, Ledwos et al. demonstrated that skilled participants applied greater maximum force than less skilled participants on a virtual reality spine simulator (the reverse of the direction observed here), and a systematic review identified maximum force as a reliable indicator of surgical expertise. Reference Ledwos, Mirchi, Bissonnette, Winkler-Schwartz, Yilmaz and Del Maestro26,Reference Chan, Pangal and Cardinal44 Further, an umbrella review suggests that performance metrics related to force and kinematics effectively ascertain skill level. Reference Harley, Tawakol, Azher, Quaiattini and Del Maestro45 Another systematic review found that the performance metrics of neurosurgical virtual reality simulators correlate well with intraoperative skills. Reference Chawla, Devi, Calvachi, Gormley and Rueda-Esteban43 Together, these studies support the present convergent validity findings: key simulator-derived metrics, particularly those related to force and motion, align with OSATS ratings and can distinguish between levels of expertise.
Given the vast amount of data generated by virtual reality simulators such as the TSYM platform, artificial intelligence (AI) methodologies may allow surgical skills to be characterized with greater precision and granularity. Reference Mirchi, Bissonnette and Ledwos35,Reference Alkadri, Ledwos and Mirchi36,Reference Yilmaz, Winkler-Schwartz and Mirchi41,Reference Winkler-Schwartz, Yilmaz and Mirchi46 AI can also be used to create intelligent tutoring systems, such as the Intelligent Continuous Expertise Monitoring System. Reference Hou, Lin, Shi, Chen and Yuan42 However, input from human educators remains essential, as these systems have been linked to unintended outcomes. Reference Fazlollahi, Yilmaz and Winkler-Schwartz47,Reference Yilmaz, Bakhaidar and Alsayegh48 A recent randomized clinical trial demonstrated that AI-augmented personalized expert instruction improved simulated surgical performance, suggesting that spine simulation platforms may benefit from these technologies in future studies and curriculum design. Reference Giglio, Albeloushi and Alhaj49 Deep learning models that integrate simulator-derived metrics with the corresponding OSATS video ratings may enable future AI systems to predict OSATS scores from simulator data alone. Reference Yilmaz, Bakhaidar and Alsayegh48 Finally, integrating these data into intelligent tutoring systems could contribute to an “Intelligent Operating Room” that continually assesses and trains learners while minimizing surgical errors. Reference Almansouri, Abou Hamdan and Yilmaz31,Reference Mirchi, Bissonnette, Yilmaz, Ledwos, Winkler-Schwartz and Del Maestro38,Reference Yilmaz, Winkler-Schwartz and Mirchi41,Reference Fazlollahi, Bakhaidar and Alsayegh50
Limitations
The TSYM simulation platform has limitations. The pedicle screw insertion simulation does not capture the dynamic intraoperative learning environment, the flexible sequence of steps in human procedures or, given its single robotic arm, the bimanual psychomotor skills used during spinal procedures on patients. Reference AlOtaibi, Al Zhrani, Bajunaid, Winkler-Schwartz, Azarnoush and Mullah20,Reference Mirchi, Bissonnette, Yilmaz, Ledwos, Winkler-Schwartz and Del Maestro38,Reference Anderson, Long, Thomas, Putnam, Bechtold and Karam40 The sample size was limited by participants’ clinical commitments, which limits the generalizability of the results. Further, the analyses for construct and convergent validity may have been underpowered, and some significant findings may reflect Type I error, as suggested by the small to moderate effect sizes. Although common in surgical education research, this limitation underscores the need for larger, multi-institutional samples to improve robustness and generalizability. Reference Winkler-Schwartz, Yilmaz and Mirchi46 Additionally, because face and content validity were measured through self-report, the study may be subject to biases such as preconceived notions and social desirability bias. Reference Nickerson51,Reference Althubaiti52 Each pedicle screw insertion was evaluated individually because of variations in entry points, screw angulation and anatomy; larger studies are needed to evaluate how repeated insertions affect the learning curves of less skilled and skilled individuals. Finally, to standardize the procedure, participants used a single screw size, even though the TSYM platform offers various screw sizes and lengths that could be used to assess procedural skill.
Conclusion
Although the TSYM simulator platform’s pedicle screw insertion scenario has several limitations and challenges, some performance metrics, including screwdriver maximum force, screwdriver tool contact and tap 3D velocity, show potential to support surgical teaching. The information gathered in this study may guide improvements that optimize the TSYM simulator’s future performance.
Supplementary material
The supplementary material for this article can be found at https://doi.org/10.1017/cjn.2025.10404.
Acknowledgments
The authors would like to thank all the neurosurgeons and orthopedic spine surgeons, fellows, along with neurosurgical and orthopedic residents who participated in this study. Special thanks to Dr Ahmed Aoude for allowing the authors to use the Orthopedic Research Laboratory, Montreal General Hospital, for these studies, and Dr Greg Berry for helping organize the use of orthopedic facilities and recruiting orthopedic residents for the study. The authors also thank Dr Zhi Wang, Dr Sung-Joo Yuh, Dr Ahmed Aoude, Dr Lucy Luo, Dr Ahmad Alsayegh, Dr Mohamad Bakhaidan, Dr Carlo Santaguida and Dr Abdulrahman Almansouri for their help recruiting trial participants. Finally, a special thanks to Dr Jason Harley for their expert educational input and to Dr Jose Correa for providing statistical input. This study was supported by Mitacs Accelerate Grant, Brain Tumour Foundation of Canada-Brain Tumour Research Grant, a Medical Education Research Grant from the Royal College of Physicians, the Franco Di Giovanni Foundation and the Montreal Neurological Institute and Hospital, McGill University. Cedarome Canada Inc. dba Symgery supplied the TSYM Symgery virtual reality nonimmersive simulator platform utilized for these investigations.
Author contributions
Trisha Tee: Contributed to conceptualization, methodology, data collection, formal analysis, investigation and writing. Noel Abboud: Contributed to methodology, formal analysis and writing. Bilal Tarabay: Contributed to conceptualization and methodology, formal analysis, data collection, participant recruitment and writing – review and editing. Abdulmajeed Albeloushi: Contributed to conceptualization and methodology, data collection and participant recruitment. Puja Pachchigar: Contributed to conceptualization and methodology, formal analysis, data collection and processing and participant recruitment. Mohamed Alhantoobi: Contributed to conceptualization and methodology, and formal analysis. Nour Abou Hamdan: Contributed to conceptualization and methodology and formal analysis. Recai Yilmaz: Contributed to conceptualization and methodology, formal analysis, and writing – review and editing. Ali Fazlollahi: Contributed to conceptualization and methodology. Rolando F. Del Maestro: Contributed to project creation, conceptualization, methodology, resources, investigation, project funding, guidance, supervision of this research, interpreting results, writing – original draft and writing – review and editing.
Funding statement
Trisha Tee, Bilal Tarabay and Puja Pachchigar are supported by a Mitacs Accelerate Internship Grant. Trisha Tee also received support from a Masters-CIHR. Dr Recai Yilmaz was supported by a Brain Tumour Foundation of Canada-Brain Tumour Research Grant, a Medical Education Research Grant from the Royal College of Physicians, a Max Binz Fellowship from McGill University Internal Studentships and a grant from the Fonds de recherche du Quebec–Santé. Dr Rolando Del Maestro is affiliated with a CIHR-funded grant but did not receive direct financial support from this grant. The Franco Di Giovanni Foundation supported the lab computer technology, and the Montreal Neurological Institute and Hospital provided lab space. The authors have no personal, financial or institutional interest in any of the drugs, materials or devices described in this article.
Competing interests
Dr Rolando Del Maestro and Dr Recai Yilmaz are co-inventors of pending patents related to training platforms and intelligent monitoring systems, with patent numbers 05001770-843USPR and 05001770-883USPR, respectively. These patents are not associated with the study. Dr Rolando Del Maestro served as President of the American Osler Society from 2023 to 2024 and has been a Member of the Board of the American Osler Society since 2022, Chairperson of the Osler Library Standing Committee since 2015 and a Member of the Board of the Osler Library at McGill University since 2015. No payments were received for these roles.