Effectiveness of AI-powered Chatbots in responding to orthopaedic postgraduate exam questions – an observational study.

Authors
  • Vaishya, Raju1
  • Iyengar, Karthikeyan P2
  • Patralekh, Mohit Kumar3
  • Botchu, Rajesh4
  • Shirodkar, Kapil4
  • Jain, Vijay Kumar5
  • Vaish, Abhishek6
  • Scarlat, Marius M7
  • 1 Department of Orthopaedics, Indraprastha Apollo Hospitals, Sarita Vihar, New Delhi, 110076, India. [email protected]
  • 2 Department of Orthopaedics, Southport and Ormskirk Hospital, Mersey West Lancashire Teaching NHS Trust, Southport, UK.
  • 3 Department of Orthopaedics, Safdarjung Hospital, New Delhi, India.
  • 4 Department of Musculoskeletal Radiology, Royal Orthopedic Hospital, Birmingham, UK.
  • 5 Department of Orthopaedics, RML Hospital, New Delhi, India.
  • 6 Department of Orthopaedics, Indraprastha Apollo Hospitals, Sarita Vihar, New Delhi, 110076, India.
  • 7 Clinique Chirurgicale St Michel, Groupe ELSAN, Toulon, France.
Type
Published Article
Journal
International Orthopaedics
Publisher
Springer-Verlag
Publication Date
Aug 01, 2024
Volume
48
Issue
8
Pages
1963–1969
Identifiers
DOI: 10.1007/s00264-024-06182-9
PMID: 38619565
Source
Medline
Language
English
License
Unknown

Abstract

This study analyses the performance and proficiency of three Artificial Intelligence (AI) generative chatbots (ChatGPT-3.5, ChatGPT-4.0, and Bard Google AI®) in answering Multiple Choice Questions (MCQs) of postgraduate (PG) level orthopaedic qualifying examinations. A series of 120 mock 'Single Best Answer' (SBA) MCQs, each with four possible options labelled A, B, C and D, covering various musculoskeletal (MSK) conditions across the Trauma and Orthopaedic curricula, was compiled. A standardised text prompt was used to feed the questions to ChatGPT (versions 3.5 and 4.0) and Google Bard, and the responses were statistically analysed. Significant differences were found between the responses of ChatGPT 3.5 and ChatGPT 4.0 (Chi-square = 27.2, P < 0.001), and on comparing both ChatGPT 3.5 (Chi-square = 63.852, P < 0.001) and ChatGPT 4.0 (Chi-square = 44.246, P < 0.001) with Bard Google AI®. Bard Google AI® had 100% efficiency and was significantly more efficient than both ChatGPT 3.5 and ChatGPT 4.0 (p < 0.0001). The results demonstrate the variable ability of the different AI generative chatbots (ChatGPT 3.5, ChatGPT 4.0 and Google Bard) to answer the MCQs of PG-level orthopaedic qualifying examinations. Bard Google AI® showed superior performance to both ChatGPT versions, underlining the potential of such large language models in processing and applying orthopaedic subspecialty knowledge at a PG level. © 2024. The Author(s) under exclusive licence to SICOT aisbl.
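
As an illustration only: the kind of chi-square comparison reported in the abstract can be sketched in a few lines of Python with SciPy. The per-model correct/incorrect counts below are hypothetical placeholders, since the abstract reports only the test statistics; the sole count it fixes is Bard Google AI®'s 100% (120/120), so the printed values will not match the published ones exactly.

from scipy.stats import chi2_contingency

# Rows are [correct, incorrect] out of the 120 mock SBA MCQs.
# Counts for the two ChatGPT versions are hypothetical; Bard's 120/120
# follows from the 100% efficiency stated in the abstract.
results = {
    "ChatGPT-3.5": [80, 40],     # hypothetical
    "ChatGPT-4.0": [105, 15],    # hypothetical
    "Bard Google AI": [120, 0],  # 100% per the abstract
}

def compare(model_a, model_b):
    # 2x2 chi-square test of independence on the two models' answer tallies.
    chi2, p, dof, _ = chi2_contingency([results[model_a], results[model_b]])
    print(f"{model_a} vs {model_b}: Chi-square = {chi2:.3f}, P = {p:.4g}")

compare("ChatGPT-3.5", "ChatGPT-4.0")
compare("ChatGPT-3.5", "Bard Google AI")
compare("ChatGPT-4.0", "Bard Google AI")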
