AI in clinical practice: positive outcomes dominate but questions linger, study finds

In a recent study posted to the medRxiv* preprint server, researchers searched PubMed and the International Clinical Trials Registry Platform (ICTRP) for randomized controlled trials (RCTs) of artificial intelligence (AI) algorithms published between 2018 and 2023.

Specifically, this scoping review evaluated the trials' endpoints, intervention features, and outcomes to inform stakeholders about the clinical relevance of AI. Such evidence could help improve care management and medical decision-making while identifying areas that need further work in this rapidly evolving field.

Study: Randomized Controlled Trials Evaluating AI in Clinical Practice: A Scoping Evaluation.

*Important notice: medRxiv publishes preliminary scientific reports that are not peer-reviewed and, therefore, should not be regarded as conclusive, guide clinical practice/health-related behavior, or treated as established information.


The US Food and Drug Administration (FDA) has approved about 300 AI-enabled medical devices, and several research studies have reported that such models outperform clinicians; however, only a few AI-enabled medical devices have been evaluated in prospective RCTs.

For instance, a widely used AI-based sepsis prediction model performed worse in practice than its developer had reported, generating many incorrect alerts.

AI-based devices often perform worse when deployed prospectively than in earlier evaluations, which could erode the potential benefits of adopting AI in clinical practice.

About the study

In the present study, researchers searched PubMed and the ICTRP using keywords such as "artificial intelligence," "clinician," and "clinical trial," and identified English-language RCTs published between January 1, 2018, and August 18, 2023, that met the following criteria:

i) used a non-linear computational model based on AI as an intervention; 

ii) integrated the AI-based intervention into clinical practice such that it affected patient health; and 

iii) were published as full-text, peer-reviewed articles. 

Two investigators independently performed the initial screening and then the full-text screening using Covidence review software; a third reviewer resolved any discrepancies through discussion.

From each eligible RCT, the team extracted the study site, clinical task, results, and type of AI used.

They also categorized the studies by primary endpoint (e.g., care management), medical specialty, and the data modality used by the AI. Finally, they used simple descriptive statistics to summarize the eligible trials. 

The study followed the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines.


The final analysis included 84 RCTs, which revealed several notable trends with implications for deploying AI in real-world clinical settings.

Of these 84 studies, 71 were identified through the primary search and 13 through reference screening.

Gastroenterology accounted for the largest share of RCTs (35/84), followed by radiology (13), surgery (five), and cardiology (five).

Four research groups, affiliated with Wuhan University, Wision AI, Medtronic, and Fujifilm, conducted most of the gastroenterology RCTs (24/35). These trials were notably uniform, all testing video-based machine learning (ML) algorithms that assisted clinicians.

The United States (US) conducted the most RCTs, followed by China, and most trials were single-site studies. Multi-center international trials are needed to ensure that AI systems are validated across diverse populations and healthcare systems. 

Most RCTs conducted in China focused on gastroenterology (19/24), whereas US RCTs spanned multiple medical specialties. Multi-center RCTs mainly involved European countries, but single-site RCTs, which enrolled an average of 359 patients, predominated in the final study set (52/84).

Most RCTs evaluating AI-based medical devices in clinical practice (69/84) reported positive results for all of their primary endpoints, a higher success rate than those observed in historical reviews of RCTs for AI in healthcare.

Such a high success rate lends credibility to clinical AI; however, the nascency of the field and publication bias may have inflated it.

Furthermore, most RCTs evaluated interventions by their diagnostic accuracy. Although such endpoints offer convincing prospective evidence of clinical AI performance, they may not accurately reflect improvements in patient outcomes.

Future RCTs of AI algorithms in healthcare should therefore incorporate clinically meaningful endpoints, such as patient symptoms, survival, and treatment needs.
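The trial counts reported above are easier to compare as percentages. The short Python sketch below recomputes these descriptive shares using only the counts quoted in this article; the variable and function names (`TOTAL_RCTS`, `counts`, `share`) are illustrative and not taken from the study. It shows, for example, that gastroenterology's 35 trials form the largest group but fall short of half of all 84 RCTs.

```python
# Counts quoted in this article; the default denominator is the 84 included RCTs.
# Names here are illustrative only, not from the preprint.
TOTAL_RCTS = 84

counts = {
    "gastroenterology": 35,
    "radiology": 13,
    "surgery": 5,
    "cardiology": 5,
    "single-site trials": 52,
    "positive primary endpoints": 69,
}


def share(count: int, total: int = TOTAL_RCTS) -> float:
    """Return count/total as a percentage rounded to one decimal place."""
    return round(100 * count / total, 1)


for label, n in counts.items():
    print(f"{label}: {n}/{TOTAL_RCTS} = {share(n)}%")

# Sub-group shares use a different denominator than the full set of 84 trials.
print(f"gastroenterology RCTs by the four groups: 24/35 = {share(24, 35)}%")
print(f"RCTs in China focused on gastroenterology: 19/24 = {share(19, 24)}%")
```

Rounding to one decimal place keeps the output readable; for instance, 69/84 works out to roughly 82%.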


Overall, existing RCTs of AI in clinical practice reflect growing interest in AI applications across a wide range of medical specialties and countries.

However, given AI's limitations in healthcare, further multi-center RCTs incorporating diverse, clinically meaningful endpoints are needed. 

Journal reference:
  • Preliminary scientific report.

    Han, R. et al. (2023) "Randomized Controlled Trials Evaluating AI in Clinical Practice: A Scoping Evaluation". medRxiv. doi: 10.1101/2023.09.12.23295381.

Written by

Neha Mathur

Neha is a digital marketing professional based in Gurugram, India. She earned a Master’s degree from the University of Rajasthan in 2008, specializing in Biotechnology. She gained pre-clinical research experience through her research project in the Department of Toxicology at the prestigious Central Drug Research Institute (CDRI), Lucknow, India. She also holds a certification in C++ programming.