AI models are trained on large datasets to make predictions, but they can sometimes “hallucinate,” meaning they produce false or inaccurate statements. This often happens due to incomplete or biased training data.
Jesse Kommandeur, a strategic analyst at the Hague Centre for Strategic Studies, likens it to baking a cake without a full recipe—where you rely on past experiences to make educated guesses. Sometimes the result is good, but other times it fails.
“The AI is essentially trying to ‘bake’ the final output (like a text or decision) using the incomplete ‘recipes’ it has learned,” Kommandeur explained in an email.
There have been several high-profile instances of AI chatbots giving false or misleading answers. Lawyers, for example, have submitted court filings citing non-existent cases invented by AI models, and earlier this year, Google’s AI-generated search summaries were found to serve up inaccurate information.
A 2023 analysis by the company Vectara revealed that AI models hallucinated between 3% and 27% of the time, depending on the tool. Additionally, the non-profit Democracy Reporting International warned ahead of the European elections that none of the most popular chatbots delivered “reliably trustworthy” answers to election-related questions.
Could this new tool address the issue of hallucinations?
“Generative AI doesn’t truly reflect, plan, or think; it simply responds sequentially to inputs. We’ve seen the limitations of this approach,” said Vasant Dhar, a professor at New York University’s Stern School of Business and Center for Data Science.
“While it’s possible that the new correction capability will reduce hallucinations, it’s virtually impossible to eliminate them completely with the current architecture,” he noted. Ideally, Dhar said, a company would want to be able to claim that the feature reduces hallucinations by a specific percentage.
“That would necessitate a substantial amount of data on known hallucinations, along with testing to determine if this prompt engineering method effectively reduces them. That’s quite a demanding task, which is why they haven’t made any quantitative claims about how much it reduces hallucinations,” he said.
Kommandeur reviewed a paper that Microsoft confirmed it had published on the correction feature. He noted that while it “appears promising and employs a methodology I haven’t seen before, it’s likely that the technology is still evolving and may have its limitations.”
Gradual Enhancements
Microsoft acknowledges that hallucinations in AI models have hindered their use in critical fields such as medicine, as well as their broader deployment.
“All these technologies, including Google Search, are in a phase of continuous incremental improvement once the core product is established,” said Dhar.
“In the long run, I believe that investing in AI could become problematic if the models continue to hallucinate, as these errors may lead to misinformation and flawed decision-making,” noted Kommandeur.
“However, in the short term, I think large language models (LLMs) provide significant value in daily life, enhancing efficiency, which makes us somewhat overlook the issue of hallucinations,” he added.