Free ChatGPT beats doctors on health questions: what changed

Last week, 260 physicians from 60 countries evaluated medical answers from the free ChatGPT model. Their verdict: the AI's answers were more accurate, more complete, and better communicated than those written by fellow doctors. That isn't a future prediction. It's what happened after OpenAI updated GPT-5.5 Instant, the model every free ChatGPT user gets by default, on June 18.

The update touches 230 million people who ask ChatGPT a health question every week.

What exactly changed?

GPT-5.5 Instant now performs on par with OpenAI's most expensive paid models on health questions. Factual errors dropped by 71% over two months, measured across billions of messages per week.

Four concrete improvements drove that number. The model now recognizes earlier when you need a doctor, for example when your symptoms might indicate a heart attack or stroke. It asks clarifying questions before suggesting anything, instead of jumping to conclusions from a partial description. It tells you what it doesn't know, so you understand when to consult a professional. And it explains complex medical information in plain language without oversimplifying it.

For context: that weekly figure of 230 million is more than the combined populations of France and Germany. And all of those users now get a free model that matches what paying subscribers get for medical questions. Worth noting: 230 million is roughly one in four of ChatGPT's weekly active users, who now number more than 900 million globally according to TheAIDaily's ChatGPT usage data.

How was it tested?

OpenAI maintains a network of more than 260 physicians across 60 countries. Together they evaluated over 700,000 model responses during development. For the final comparison, the team selected 3,500 representative health conversations.

Here's the thing about that comparison. A physician panel placed AI answers next to answers written by doctors, without knowing which was which. On three criteria, accuracy, communication, and completeness, they rated the AI higher.

GPT-5.5 Instant missed fewer local healthcare context cues, flagged more safety signals, and followed up more often before giving an answer. Think of it like a second-year resident who has read every case in the textbook: narrow on real-world experience, but very good at not skipping steps. Someone who mentions a headache gets asked about neck stiffness and light sensitivity before the model forms any response. That approach beat not only older models, but also the human physicians in the comparison.

OpenAI measured this using HealthBench Professional, a benchmark developed with their physician network. On that benchmark, GPT-5.5 Instant scores comparably to the company's most powerful reasoning models, models that require ten times the compute to run.

What are the limits of this test?

The test comes from OpenAI. HealthBench was developed by OpenAI, with their own physician network. There is no independent, peer-reviewed validation of the specific 71% figure.

Think of it like a restaurant review written by the chef. The food may genuinely be excellent, but you still want to hear from someone who didn't cook it. Independent research published in NEJM AI, the AI journal of the New England Journal of Medicine, confirms that AI models are improving measurably on medical tasks. But OpenAI's specific claims about the free version remain self-validated for now.

That said, the scale of the evaluation is real: 260 physicians, 700,000 rated responses, 60 countries. That isn't a quick internal check. It's a serious evaluation process, even if OpenAI sets the parameters.

There's also the practical gap. ChatGPT is not a medical device. OpenAI positions GPT-5.5 Instant explicitly as support for physicians, not a replacement. A model can outperform doctors in a controlled benchmark and still miss what your GP knows from memory: your medication history, your chronic conditions, how you looked when you walked in.

AI identified 18 rare diagnoses that doctors had missed

On the same day, OpenAI published research conducted with Boston Children's Hospital and Harvard. Researchers gave 376 anonymized pediatric genomes, cases where no diagnosis had previously been found, to OpenAI's o3 Deep Research model. Result: 18 new diagnoses of rare genetic conditions that human specialists had not identified.

The conditions ranged from rare neurological disorders to unexplained sudden death in children. Each finding was independently confirmed by at least two clinical geneticists and a CLIA-certified laboratory. The research appeared in NEJM AI, one of the most respected medical journals in the world.

Boston Children's Hospital has been working with OpenAI since early 2025. The program has produced more than 40 diagnoses of rare diseases previously considered unsolvable. The collaboration received $50 million in funding.

This was not the first time AI outperformed physicians in a direct comparison. A Harvard study published earlier this year found that an AI model in the emergency department reached the correct diagnosis in 67% of cases, while two physicians scored 55% and 50%. The pattern is difficult to dismiss: AI is measurably improving at medical diagnosis, not as a replacement for doctors, but as a second pair of eyes that catches patterns a human can miss.

What this means in a world with too few doctors

Across Europe, a shortfall of an estimated 1.8 million healthcare workers is projected by 2030, according to the European Commission. GP shortages are a documented problem in Germany, rural France, much of Eastern Europe, and the UK, where NHS waiting lists topped 7.5 million in early 2026.

In that context, it makes sense that people turn to ChatGPT for health questions. TheAIDaily's AI in healthcare statistics show that 93% of hospitals worldwide already deploy AI in some form, and adoption in healthcare is growing eight times faster than in other sectors. The question is no longer whether people use AI for health advice. The question is how good that advice actually is.

The EU AI Act treats AI systems that are, or are part of, regulated medical devices as high-risk, which means tools marketed as medical devices face strict conformity assessments and mandatory human oversight before deployment. ChatGPT is not marketed as a medical device. That's precisely why it sits outside that regulatory perimeter, and why understanding its limits is your responsibility as a user, not OpenAI's as a regulated provider.

The gap between free and paid is narrowing

The broader implication for teams and organizations is straightforward. ChatGPT Plus costs $20 per month per user, while the free version now runs on GPT-5.5 Instant and, for health-related questions, delivers results comparable to the most powerful paid models.

That raises a practical question. If the free version keeps improving on specific tasks, when does the paid subscription still justify the cost? The honest answer depends on what you use it for. Coding, long documents, and advanced analysis still favor paid models. For everyday questions, including health, the free version is now a serious option.

A team of ten people paying for ChatGPT Plus spends $2,400 per year. If your team primarily uses ChatGPT for routine questions rather than advanced analysis, you may be paying for capacity you don't actually need. One caution before downgrading: research into shadow AI use among SMBs found that nearly half of businesses share company data with free AI tools without realizing it.

Health is not the only area where the free version is gaining ground. OpenAI already improved GPT-5.5 Instant for math, coding, and creative writing earlier this year. The pattern is consistent: the company is investing structurally in free-tier quality, almost certainly to grow total user numbers. For users, that's straightforwardly good news. For organizations paying per license, it's a reason to review regularly whether that investment still makes sense.

The approach that works for most teams: paid licenses for power users who handle complex, high-volume tasks daily, the free version for everyone else. This GPT-5.5 Instant update strengthens that argument.
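
To put rough numbers on that split, here is a minimal sketch in Python. It assumes only the $20 per user per month price quoted above; the function names and the example of three power users on a ten-person team are hypothetical, chosen for illustration, and real pricing may differ by plan or region.

```python
# Price quoted in the article: ChatGPT Plus at $20 per user per month.
PLUS_PRICE_PER_MONTH = 20


def annual_license_cost(paid_seats, price_per_month=PLUS_PRICE_PER_MONTH):
    """Yearly spend in USD for a given number of paid seats."""
    if paid_seats < 0:
        raise ValueError("paid_seats must be non-negative")
    return paid_seats * price_per_month * 12


def annual_savings(team_size, power_users):
    """Yearly savings from licensing only power users instead of everyone."""
    if not 0 <= power_users <= team_size:
        raise ValueError("power_users must be between 0 and team_size")
    return annual_license_cost(team_size) - annual_license_cost(power_users)


if __name__ == "__main__":
    # Ten paid seats, as in the article's example.
    print(annual_license_cost(10))  # 2400
    # Hypothetical split: 3 power users keep Plus, 7 move to the free tier.
    print(annual_savings(10, 3))    # 1680
```

The per-seat figure works out to $240 per year, so each employee moved to the free tier saves that amount, provided the free version genuinely covers what they do.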

What you can do with this right now

If your employees use ChatGPT for health-related questions, the answers are meaningfully better than they were two months ago. That's genuinely good news. It doesn't change the baseline expectation: ChatGPT is not a doctor, and your team should know the difference.

Four steps worth taking this week:

Map which employees use ChatGPT and for what. Shadow AI research shows that nearly half of SMBs share company data with free AI tools without realizing it. For health data, that matters more: medical information qualifies as sensitive personal data under GDPR, which means a data incident carries greater regulatory exposure than a regular data breach.
Revisit your ChatGPT licenses. If your team primarily uses it for routine questions rather than advanced analysis, the free version may now be sufficient. That's up to $240 per employee per year in potential savings.
Set a short internal guideline on AI for health advice. Not to ban it, but to set realistic expectations. A single sentence like "ChatGPT can help you understand medical information, but it doesn't replace a doctor" is enough.
If you work in healthcare or handle health data, prepare your team for patients who arrive with AI-generated questions and interpretations. That's already happening. TheAIDaily's AI in healthcare data makes the trajectory clear: this will become the norm, not the exception.