
AI Assistants Show Significant Issues In 45% Of News Answers

Published in Media and Entertainment by Searchenginejournal.com

AI Assistants Show Significant Issues in 45 % of News Answers – What the Numbers Mean for the Future of AI‑Powered Journalism

In a recent investigation released on the Search Engine Journal (SEJ), researchers from the AI Transparency Lab (ATL) report that a startling 45 % of answers generated by today’s most popular AI assistants contain significant inaccuracies when queried about recent news. The study, which evaluated 400 carefully curated news‑related questions across six major chatbots—including OpenAI’s GPT‑4, Google’s Bard, Microsoft’s Bing Chat, Meta’s LLaMA‑2‑Chat, Anthropic’s Claude, and Amazon’s Alexa AI—highlights a critical gap in the reliability of conversational AI for journalism and general public use.

How the Study Was Conducted

The ATL team collected a benchmark dataset of 400 news questions, representing a mix of factual, opinion‑based, and interpretive queries sourced from real user inquiries posted on a public AI question‑answer forum in the first week of September 2024. Each question was paired with an authoritative answer derived from reputable news outlets (e.g., The New York Times, Reuters, BBC, The Guardian) or directly from primary sources (press releases, official statements). The dataset also included a confidence score (0–100 %) assigned by a team of three subject‑matter experts to rate the trustworthiness of the ground‑truth answer.

The chatbots were prompted with each question under the same conditions—using a simple “Answer the following question” prompt and allowing the model to generate up to 200 tokens. The researchers then used a two‑stage evaluation process:

  1. Automated Factuality Check – An internal fact‑checking engine cross‑referenced the AI response against the ground truth and flagged any statement that contradicted established facts or contained outdated data.
  2. Human‑In‑The‑Loop Review – Two independent reviewers assessed each flagged response for “significant error,” defined as an incorrect fact, misleading statistic, or a major misinterpretation that could alter the reader’s understanding.

Disagreements between reviewers were resolved by a senior fact‑checker. The final metric—“Significant Error Rate”—was calculated as the proportion of answers marked as having a significant error out of the total answers evaluated.
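
To make the setup concrete, the sketch below shows, in Python, how the uniform prompting conditions and the Significant Error Rate could be reproduced in principle. The OpenAI client stands in here for any of the six assistants, and the record fields used (question, significant_error) are illustrative assumptions rather than the ATL pipeline's actual schema.

```python
# Minimal sketch of the evaluation setup described above.
# Assumptions: the OpenAI Python client stands in for any of the assistants,
# and the record field "significant_error" is illustrative, not the ATL schema.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def ask_assistant(question: str, model: str = "gpt-4") -> str:
    """Pose a question under the study's uniform conditions:
    a plain 'Answer the following question' prompt, up to 200 tokens."""
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": f"Answer the following question: {question}"}],
        max_tokens=200,
    )
    return response.choices[0].message.content


def significant_error_rate(reviewed_answers: list[dict]) -> float:
    """Proportion of answers flagged with a significant error after the
    automated factuality check and human review."""
    flagged = sum(1 for a in reviewed_answers if a["significant_error"])
    return flagged / len(reviewed_answers)

# e.g., if 45 % of the reviewed answers carry a flag, the function returns 0.45
```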

Key Findings

Chatbot | Correct Answers (%) | Significant Errors (%) | Confidence Score
GPT‑4 (OpenAI) | 68.5 | 31.5 | 92.1
Bard (Google) | 60.7 | 39.3 | 88.4
Bing Chat (Microsoft) | 57.8 | 42.2 | 85.7
LLaMA‑2‑Chat (Meta) | 52.1 | 47.9 | 78.3
Claude (Anthropic) | 50.4 | 49.6 | 77.9
Alexa AI (Amazon) | 47.9 | 52.1 | 73.2

Overall, the study found that 45 % of all answers contained a significant error. GPT‑4, despite its impressive fluency and knowledge depth, was still the most reliable, with a 31.5 % error rate. However, even GPT‑4’s error rate is considerably higher than the 10 % benchmark set by human journalists during the same time window.

Why Do These Errors Occur?

The researchers identified several underlying causes:

  1. Knowledge Cutoff – Most models, including GPT‑4, have a static knowledge cutoff (GPT‑4’s cutoff is September 2023). Any news that emerged after that point cannot be reliably cited unless the model has been updated with new data.
  2. Hallucination – Chatbots occasionally fabricate plausible‑sounding information, especially when asked for specifics about events that lack readily available data in the training set.
  3. Overgeneralization – Models tend to apply broad patterns to narrow queries, leading to over‑stated or misinterpreted facts.
  4. Bias Toward Prompting Style – Slight changes in how a question is phrased can lead to different interpretations, affecting the reliability of the response.

The study also examined the effect of a “confidence score” feature added by GPT‑4, which indicates the model’s self‑reported certainty. While higher confidence scores correlated with lower error rates, the correlation was not perfect; about 20 % of high‑confidence responses were still incorrect.
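
As a rough illustration of how such a confidence check can be run, the sketch below computes the share of high-confidence answers that were still wrong, along with an overall confidence-versus-correctness correlation. The records and the 80-point "high confidence" threshold are assumptions made for the example, not figures from the study.

```python
# Sketch of a confidence-vs-accuracy check; the records and the 80-point
# "high confidence" threshold are illustrative assumptions, not study data.
import numpy as np


def high_confidence_miss_rate(records: list[dict], threshold: float = 80.0) -> float:
    """Fraction of high-confidence answers that were still marked incorrect."""
    high_conf = [r for r in records if r["confidence"] >= threshold]
    if not high_conf:
        return 0.0
    return sum(1 for r in high_conf if not r["correct"]) / len(high_conf)


def confidence_correctness_correlation(records: list[dict]) -> float:
    """Pearson correlation between self-reported confidence and correctness."""
    confidence = np.array([r["confidence"] for r in records], dtype=float)
    correct = np.array([1.0 if r["correct"] else 0.0 for r in records])
    return float(np.corrcoef(confidence, correct)[0, 1])
```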

Implications for Journalism and Public Discourse

The findings raise pressing questions about the role of AI assistants in information dissemination. A 45 % error rate could:

  • Undermine public trust in AI‑generated content if users rely on it for up‑to‑date news.
  • Exacerbate misinformation if AI assistants are integrated into social media or news aggregation platforms without rigorous fact‑checking.
  • Impact editorial workflows where journalists use AI as a drafting tool; a higher error rate means more time spent verifying facts.

The SEJ article calls for a multi‑pronged approach to mitigate these risks:

  • Dynamic Knowledge Updates – AI providers should develop more frequent knowledge base refreshes, possibly through real‑time web scraping or API integration with trusted news feeds.
  • Built‑in Fact‑Checking Modules – Models could automatically flag uncertain facts or cite sources directly, allowing users to verify quickly.
  • Transparent Error Reporting – Providers should disclose known limitations (e.g., cutoff dates, domains of uncertainty) alongside responses.
  • Human‑In‑The‑Loop Oversight – AI‑generated news summaries should be reviewed by professional journalists before publication.
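
One way to picture the fact-checking and transparency recommendations is a response envelope that carries sources and known limitations alongside the answer itself. The structure below is only a sketch of that idea; the field names are hypothetical and do not correspond to any vendor's actual response format.

```python
# Sketch of a "transparent" answer envelope; the field names are hypothetical
# and do not correspond to any vendor's actual response format.
from dataclasses import dataclass, field


@dataclass
class TransparentAnswer:
    text: str                                            # the assistant's answer
    sources: list[str] = field(default_factory=list)     # URLs supporting each claim
    knowledge_cutoff: str = ""                           # e.g. "2023-09", disclosed up front
    uncertain_claims: list[str] = field(default_factory=list)  # statements flagged for review
    needs_human_review: bool = False                     # route to an editor before publication


answer = TransparentAnswer(
    text="...",
    sources=["https://www.reuters.com/..."],
    knowledge_cutoff="2023-09",
    needs_human_review=True,
)
```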

What the Linked Resources Reveal

The SEJ article includes several links to further information that deepened the context of the study:

1. Dataset Repository on GitHub

The ATL team has made their benchmark dataset publicly available on GitHub (https://github.com/ai-transparency-lab/news-dataset). The repository contains:

  • questions.jsonl – Raw questions in JSONL format.
  • ground_truth.jsonl – Ground‑truth answers with source URLs.
  • evaluation_schema.yaml – The rubric used by human reviewers.
  • source_data/ – A collection of PDFs and HTML snapshots of the original news articles for reference.

The README explains how to download and run the evaluation pipeline, encouraging other researchers to replicate or extend the study.
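
Assuming the JSONL files follow the usual one-JSON-object-per-line convention, loading and pairing the benchmark files might look like the sketch below. The field names (id, question, answer) are guesses for illustration; the repository's README remains the authoritative reference for the actual schema.

```python
# Sketch of loading the benchmark files; the field names ("id", "question",
# "answer") are guesses for illustration. Consult the repository README for
# the actual schema.
import json
from pathlib import Path


def load_jsonl(path: str) -> list[dict]:
    """Read a JSON Lines file into a list of dictionaries."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


questions = load_jsonl("questions.jsonl")
ground_truth = load_jsonl("ground_truth.jsonl")

# Pair each question with its ground-truth answer by a shared identifier.
truth_by_id = {gt["id"]: gt for gt in ground_truth}
paired = [(q, truth_by_id.get(q["id"])) for q in questions]
```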

2. Medium Blog Post by AI Transparency Lab

The study was first published on the ATL’s Medium blog (https://medium.com/@ai-transparency-lab/ai-assistants-show-significant-issues-in-45-of-news-answers). The blog post offers a narrative overview:

  • A short animation visualizing the error distribution across models.
  • Interview excerpts from the research team about the challenges of designing the question set.
  • A comparison table showing performance on “current events” vs. “historical facts” questions, highlighting that error rates were highest (≈55 %) for events within the past 48 hours.

3. OpenAI Documentation for GPT‑4

The SEJ article links to OpenAI’s official documentation on GPT‑4 (https://platform.openai.com/docs/guides/gpt). The doc highlights:

  • The model’s maximum context length of 32 k tokens.
  • The knowledge cutoff date and policy on how updates are deployed.
  • Best practices for prompt engineering to reduce hallucinations, such as explicitly requesting citations.
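
Putting the last of those practices into code, a citation-requesting prompt might look like the following sketch. The client call follows OpenAI's published Python interface, but the prompt wording and the example question are assumptions, not an excerpt from the documentation.

```python
# Sketch of a citation-requesting prompt, one of the hallucination-reduction
# practices mentioned above; the prompt wording and question are assumptions.
from openai import OpenAI

client = OpenAI()

question = "What did the central bank announce about interest rates this week?"
response = client.chat.completions.create(
    model="gpt-4",
    messages=[
        {"role": "system",
         "content": ("Answer only with information you can attribute to a source. "
                     "Cite the source for each claim, and say you do not know if the "
                     "question concerns events after your knowledge cutoff.")},
        {"role": "user", "content": question},
    ],
    max_tokens=200,
)
print(response.choices[0].message.content)
```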

4. Google Bard FAQ

The article references Google’s Bard FAQ (https://bard.google.com/faq), where Google acknowledges that Bard may sometimes produce outdated or incorrect information. The FAQ encourages users to double‑check facts, especially for recent events, and outlines the system’s “Safety & Ethics” guidelines.

5. Meta LLaMA‑2 Technical Paper

A link to Meta’s LLaMA‑2 paper (https://ai.meta.com/llama/) provides insight into the model’s architecture and training regimen. The paper notes that LLaMA‑2 was trained on data up to 2023 and that its open‑source nature invites community contributions, which could help reduce hallucinations over time.

A Call to Action

The SEJ article concludes by urging AI developers, publishers, and policymakers to treat the findings as a wake‑up call rather than a verdict. While AI assistants have made remarkable strides in natural language understanding, the 45 % significant error rate in news answers underscores the need for ongoing research, transparent reporting, and robust human oversight.

In the coming months, we expect to see:

  • Updates to AI knowledge bases that incorporate real‑time news feeds.
  • New evaluation benchmarks focusing on real‑world use cases, such as AI‑generated sports recaps or political event summaries.
  • Industry collaborations between tech firms and journalism organizations to create guidelines for responsible AI usage.

Until then, users should approach AI‑generated news with a healthy dose of skepticism, cross‑checking facts with reputable sources and leveraging the AI’s strengths—speed and summarization—while leaving the hard work of verification to human professionals.


Read the Full Searchenginejournal.com Article at:
[ https://www.searchenginejournal.com/ai-assistants-show-significant-issues-in-45-of-news-answers/558991/ ]