Gen AI: A Medical Co-Pilot

Peter Lee, co-author of the book “The AI Revolution in Medicine: GPT-4 and Beyond,” published in April 2023

ChatGPT can diagnose rare genetic diseases but also make life-threatening errors. This thin line between finesse and nonsense makes applying AI in medicine challenging, according to the authors of “The AI Revolution in Medicine: GPT-4 and Beyond.”

I reviewed the book and interviewed co-author Peter Lee.


At a glance:

  • Around 10% of doctors already use generative AI;
  • ChatGPT can delight with its accuracy but can also be manipulative;
  • Thus, the use of Large Language Models (LLMs) always needs a doctor in the loop;
  • GPT-4 correctly answered 90% of the US Medical Licensing Examination questions put to it by the book’s authors;
  • Giving up on using AI because it hallucinates is not a solution;
  • Humans should “trust, but verify” the outputs from GPT-4;
  • GPT-4’s most impressive attributes: the ability to assist in the diagnosis of rare diseases through an advanced understanding of genetics, and its remarkable empathy when communicating with both patients and doctors.

A book on the hyped ChatGPT in medicine, published just months after the premiere of OpenAI’s Large Language Model (LLM), may generate both interest and skepticism. But three experts—Peter Lee (Microsoft), Carey Goldberg (Massachusetts Institute of Technology), and Isaac Kohane (Harvard Medical School)—managed to write a comprehensive manual for ChatGPT in healthcare.

They carefully test and check GPT-4’s capabilities to diagnose, calculate medical indicators, transform medical notes into the FHIR data standard, write clinical letters, and answer patients’ complex questions.
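To make the note-to-FHIR idea concrete, here is a minimal sketch of how such a request could be phrased programmatically. It assumes the OpenAI Python client and an API key; the example note, prompt wording, and model name are illustrative assumptions rather than the prompts the authors used, and any generated resource would still need human validation.

```python
# Minimal sketch: asking a GPT-4-class model to turn a free-text clinical note
# into a FHIR resource. The note, prompt wording, and model name are
# illustrative assumptions, not the exact prompts used in the book.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

note = "BP 150/95 mmHg measured on 2023-04-02 during office visit."

response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": "You convert clinical notes into valid FHIR R4 JSON resources."},
        {"role": "user",
         "content": f"Convert this note into a FHIR Observation resource:\n{note}"},
    ],
)

# The output is a draft FHIR JSON document; a human should validate it
# before it goes anywhere near a medical record.
print(response.choices[0].message.content)
```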

The outcomes are both astonishing and sobering.

A non-medical tool with medical-grade capabilities

In the foreword, Sam Altman (OpenAI CEO) writes that GPT-4 can enable access to medical knowledge among billions of people who lack decent care, generate summaries of research papers, assist doctors or nurses with clinical decision-making or documentation, and create educational materials.

According to current surveys, around 10% of doctors already use generative AI. When applying ChatGPT’s guidance, they hope to improve their decision-making and help patients. But unintentionally, they enter unknown territory full of tricky traps.

AI can delight with its accuracy but can also be manipulative. It is like a sharp knife that helps a good chef prepare great meals while cutting a beginner’s fingers. In medicine, it can harm not only its user but also the patient.

The biggest problem is that we are all amateurs when it comes to GPT-4. Scientists created a super-advanced technology that developed its own rules of operation. Good news: it works! Bad news: nobody knows how and why.

Imagine billions of data points collected from across the internet, processed by an AI model built from roughly a trillion parameters; there is no way to trace its reasoning back.

ChatGPT can find a diagnosis that doctors wouldn’t come up with, but it can also miscalculate drug dosages and put the patient at risk of death. Therefore, the use of LLMs always needs a doctor in the loop. This gives rise to a new partnership between healthcare professionals and AI, transforming today’s medicine, based on human cognition, into a future “symbiotic medicine”: a collaboration in which humans know how to ask AI for precise answers and learn to trust, but also verify, its output.

“AI is a powerful tool that can reduce inequity and improve life for millions of people around the world,” writes Bill Gates in the foreword.

Prompt in, “Wow” out

GPT-4 knows a surprising amount.

It correctly answered 90 percent of questions from the US Medical Licensing Examination (USMLE) in the tests performed by the book’s authors. Gen AI demonstrates impressive reasoning capabilities and is a good conversationalist, which makes it seem trustworthy at first sight. It can summarize a conversation between a patient and a doctor and write a clinical encounter note to include in an electronic health record.

It even cleverly evades answering unethical questions and constructs mind-blowing sentences that sound empathetic, as if spoken by a compassionate doctor. It’s eloquent, creative, and multilingual. It can explain to a child the results of a complex research paper and construct engaging medical quizzes.

On the patient’s end, ChatGPT already helps people make more informed decisions about treatment options, insurers, testing, or simply wellness. Just ask “I would like to lose weight,” and you will get a list of advice less biased than what you would get from your best friend.

Subtle errors, severe consequences

Examples of GPT-4’s knowledge demonstrate that we are dealing with a tool that has mathematical, statistical, medical, and linguistic capabilities that will have a significant impact on medicine.

Nevertheless, all the enthusiasm around LLMs needs to be cooled. It’s too early to talk about a revolution. “GPT-4 is not an end in and of itself,” according to the research conducted by the authors. GPT-4 still gets confused and creates fictions that appear reasonable and convincing. It fabricates information and can stubbornly tell you that you are wrong when you are right. On the first try, AI can answer a question perfectly, only to hallucinate on the second, and when caught red-handed it behaves like a child, even explaining that a fatal mistake was “just a typo.”

The list of limitations is as long as the list of benefits: GPT-4 has no access to knowledge created after January 2022, when its training data was cut off. It has no long-term memory: unlike a human, it loses everything generated in a conversation as soon as the session ends. This means ChatGPT can’t follow patients over time, because it doesn’t remember them and must start every interaction from scratch. Context-window limits also make it impossible to analyze often lengthy electronic medical records.

Use, trust, but double-check

Throwing gen AI into the bin because it hallucinates is not a solution. Medicine urgently needs rescue amid healthcare professional shortages, administrative burdens, rising burnout among doctors and nurses, and limited and unequal access to healthcare. These troubles can no longer be solved using old methods; we have repeatedly tried and failed.

When discussing the limitations of AI, let’s be honest: humans also make mistakes. Medical error is the third-leading cause of death. GPT-4 has the potential to become a clinical copilot and patient well-being assistant, contributing to safer medicine, patient-oriented healthcare, and superhuman clinical performance.

“Medicine traditionally refers to a sacred relationship between a doctor and a patient—a twosome, a dyad. Now we move to a triad with AI as a new partner,” according to Peter Lee.

It’s also clear that we can’t carry on with a Wild West approach to AI. GPT-4 must be regulated, but not overregulated, so as not to put a massive brake on the development of LLMs for use in healthcare. We have too much to lose to nip this innovation in the bud. As with mobile health applications, GPT-4 applications can be divided into low-risk and high-risk. Writing insurance letters doesn’t require much control, while making diagnoses, which affects patients directly, already requires supervision.

Everything would be a lot easier if AI were never wrong; however, that’s not going to happen anytime soon. Doctors and nurses, nevertheless, will benefit from AI if only they learn how to handle it.


Peter Lee, Corporate Vice President, Microsoft Research & Incubations

How healthcare professionals should navigate ChatGPT. An interview with Peter Lee, co-author of the book “The AI Revolution in Medicine: GPT-4 and Beyond.”


Many studies suggest that physicians already use GPT-4, for example, as clinical decision support. Based on your experience, what would you advise them? How should they formulate questions and verify AI’s answers?

It is important that any new technology is used responsibly, and, as we explain in our book, we advise that humans should “trust, but verify” the outputs from GPT-4. In other words, humans should always be in the loop and think of the AI as a kind of “copilot” partner.

We also point out that, often, GPT-4 is better at reviewing and evaluating ideas and proposed decisions than it is at generating them (the term “generative AI” notwithstanding). So, for example, if a doctor has a proposed differential diagnosis, it can be an excellent idea to ask GPT-4 to check it for any possible omissions or errors. Similarly, if a nurse makes a calculation for the IV administration of a medication, it is a very good idea to ask GPT-4 to “double-check” the calculation.

Such uses are basically using GPT-4 as a “second set of eyes,” to help reduce medical errors. This “double-checking” concept also applies to decisions and calculations made by GPT-4 itself. While GPT-4 may sometimes hallucinate or miscalculate, a second instance of GPT-4 with a “clean” context is surprisingly effective in spotting those errors.
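As a rough illustration of this “second set of eyes” pattern, the sketch below sends a proposed answer to a fresh model instance with no shared context and asks it to flag omissions or errors. It assumes the OpenAI Python client; the prompts, the model name, and the IV drip-rate example are illustrative assumptions rather than the exact workflow described in the book.

```python
# Sketch of the "second set of eyes" pattern: a fresh GPT-4 session, with no
# shared context, reviews work produced earlier by a human or by another
# GPT-4 call. Prompts and model name are illustrative assumptions.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def second_opinion(task: str, proposed_answer: str) -> str:
    """Ask a clean-context model instance to check a proposed answer."""
    review = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You are reviewing clinical work for omissions or errors. "
                        "Do not assume the proposed answer is correct."},
            {"role": "user",
             "content": f"Task: {task}\nProposed answer: {proposed_answer}\n"
                        "List any errors or omissions, or state that none were found."},
        ],
    )
    return review.choices[0].message.content

# Example: double-checking an IV infusion-rate calculation done by a nurse
# or by an earlier GPT-4 call (1,000 mL over 8 h with a 15 gtt/mL set).
print(second_opinion(
    "Infuse 1,000 mL of normal saline over 8 hours with a 15 gtt/mL set.",
    "Rate: 31 gtt/min",
))
```

The point of the pattern is that the reviewing instance starts from a blank context, so it is not anchored to the reasoning that produced the original answer.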

Could you list the types of prompts/questions you would classify as “low risk” and “high risk”?

Using GPT-4 as a “second set of eyes” on work done by either humans or AI is likely to emerge as an excellent practice that may reduce errors in medicine.

This is an example of how AI can be an effective “copilot” for doctors and nurses. GPT-4 can also be an effective tool in medical education. For example, a medical student studying for board exams can ask GPT-4 to play-act the role of a patient described in a medical vignette, thereby allowing the student to practice. GPT-4 can give detailed feedback to the student on how well the patient was handled. This is another example of a low-risk use.

Medical practice today is full of mundane yet time-consuming administrative paperwork, including activities like writing justifications for prior authorization. To address this, another low-risk area for GPT-4 is in helping automate such “back office” medical administrative and clerical tasks. Similarly, patient-facing communications, such as after-visit summaries, prescription orders, and lab orders, are another promising area; however, the risk is higher because errors can be misleading.

As one gets into actual diagnosis and the development of treatment, these are the highest-risk areas, and the areas most in need of regulatory oversight. While we assess that GPT-4 performs exceptionally well on medical knowledge questions, the development of clinical decisions, such as differential diagnoses or therapeutic options, is something that, at the current state of AI technology, we strongly recommend be done by humans, possibly assisted by AI like GPT-4.

You spent much time chatting with GPT-4 about healthcare-related issues. What impressed you the most? 

The two things that impressed me the most were (1) the ability of GPT-4 to assist in the diagnosis of extremely rare diseases, making use of a sophisticated understanding of genetics, and (2) the ability of GPT-4 to communicate with both patients and clinicians with a great deal of empathy. GPT-4 can suggest empathetic points for a doctor or nurse to communicate to (sometimes desperate) patients. And beyond that, it can express a concern for the emotions and psychological well-being of doctors and nurses themselves, as they do their work.

The prologue features a fictional case of second-year medical resident Kristen Chan. GPT-4 not only helps her save a patient’s life but also generates a workout plan, writes a letter to an insurer, and reviews patient charts. What needs to happen for generative AI to enter hospitals and gain the trust of healthcare professionals?

Governance, evaluation of effectiveness and safety, and compliance are key.

The medical community needs to play the leading role in determining how AI should be used. We believe strongly that the medical community needs to “own” the decisions about whether, where, when, and how technologies like GPT-4 should or shouldn’t be used.

We are hoping to help the medical community do that by providing as much information as possible. Just as the Hippocratic oath and laws govern practice today, there will likely need to be guidelines and training for AI applications. To facilitate this, we are supporting major efforts at institutions, such as the National Academy of Medicine, to develop “AI codes of conduct” for the medical community.

You claim there’s never been technology like this. How will GPT-4 and subsequent models change healthcare?

The most important thing to understand is that this isn’t just a single-point disruption.

Yes, GPT-4 will likely have a major impact on healthcare. But in the near future, there will likely be the emergence of much more capable models and a greater diversity of models. In that sense, what GPT-4 really represents is a phase change.

In the early going, we see AI playing a key role in helping healthcare companies and workers address the administrative demands that come with providing high-quality care:

  • Providers must navigate complicated coding and billing requirements, and they have to manage the cognitive burden of accurately recording and recalling a lot of patient data;
  • Workforce burnout in the healthcare industry impacts clinicians, and it can affect patients’ access to care, quality of care, and ultimately outcomes.

In the longer run, I’m optimistic that AI tools will create new opportunities and make a positive difference in people’s lives. This includes:

(1) Reducing clinician workloads to relieve workforce burnout:

  • By taking on administrative tasks, AI can help reduce the amount of time clinicians spend on paperwork, easing cognitive burdens and reducing costs;
  • We are already seeing some real-world deployments of GPT-4 in products for this, from Nuance, Epic, and others.

(2) Analyzing data and delivering insights to improve patient outcomes:

  • AI can analyze vast amounts of patient data, deliver workflow automation, facilitate reporting and communication, and provide AI insights that support more informed decision-making, planning and treatment for surgeons, radiologists, and other clinicians;
  • By analyzing data to uncover findings, AI can help simplify patient and physician communication and provide comprehensive care plan tracking.

(3) Helping providers and frontline staff deliver personalized care and enhanced patient engagement:

  • Automated clinical documentation solutions can give back time to clinicians, allowing them to see more patients.
  • Our data shows that with the help of AI technologies, physicians add five appointments per average clinic day – enabling clinicians to provide their best care to more people.
  • With less time spent on administrative tasks, providers can spend more time with patients, strengthening the human interaction in medicine.

(4) Empowering consumers around the world with greater and much more equitable access to healthcare information, and assisting them in navigating the sometimes overly complex healthcare system.


Can I ask you a favour?

Please donate to aboutDigitalHealth.com (€1+) and support independent journalism. Thank you for your support!
