LeDuc's Little LLM List

Venn diagram showing that Discriminant AI is separate from Generative AI. Further, LLMs are a subset of Generative AI, and Machine Learning is mostly Discriminant but overlaps a little with Generative AI.


The banner I used for this list is a handy figure we use at my day-job to show the relationship between different AI technologies. The key features are that Generative AI is different from Discriminant AI and Machine Learning, and that Large Language Models are just one type of Generative AI.


TL;DR: This is a list of references on how the use of LLMs and related Generative AI methods underperforms relative to traditional approaches.


It is 2025 and you don't need me to tell you that Generative AI just hit the peak of the Hype Cycle. As it starts descending, I figured I'd keep a list of useful references about how it is overhyped. 

I should have listed things by date. My bad.

November 2025

Nature has something to say...

Foundation models fail to differentiate opinion from fact; see this paywalled paper.

"In particular, all models tested systematically fail to acknowledge first-person false beliefs, with GPT-4o dropping from 98.2% to 64.4% accuracy and DeepSeek R1 plummeting from over 90% to 14.4%."

Metrics of Utility

A technical paper accidentally shows the limits of current foundation models. What I like about this paper is that Figure 1 shows next to no improvement in the new foundation models over the last year.

Don't replace Physicians with LLMs

Modern LLMs are not doing well at diagnosing patients--and given that the last paper showed that they aren't getting better, I wouldn't give up on Med School just yet. 

October 2025

AI Makes us think we are smarter than we are

No surprise here, but if you are on the first peak of the old Dunning-Kruger curve (which is misnamed), then Foundation AI makes you think you are even better... See this link, and here is the peer-reviewed source.

AI for Coding is Overhyped

Here is a new report about AI for coding being overhyped: AI for Coding. Nothing really new, just that LLMs do not increase the performance of software engineers.

Rich's comment. The reason is obvious to anyone who has tried to use LLMs. They spit out code that almost works, and it turns out it takes longer to find the errors in randomly assembled code than it does to write the code in the first place.

Given that Generative AI is no longer improving in its performance, it stands to reason that this is not going to change anytime soon.

Before October 2025

LLM Inherent Limitations

A nice monograph on LLM hallucinations


Generative AI Underperforms

The BBC reports that LLMs make for wacky fast-food experiences.

Blog coverage of a paper showing that LLMs fail on 70% of office tasks.

A broad "AI is hard to make useful" piece from The Conversation, here.

In gene prediction, there is a nice Nature Methods paper from our friends at Heidelberg.

Chain-of-Thought Reasoning in LLMs is over-hyped.

Model Evaluation & Threat Research (METR): Using LLMs to help you code slows you down. Here is an online link. Here is a PDF.

Media Coverage: AI Agents wrong about 70% of the time, here.

Jason Sanford: A journalist writes about GenAI underperforming, here.

Chinese Call Center Problems, here.


Case Studies

Survey showing software engineers pulling back from LLMs to write code, here. This matches what I'm seeing at my work. People use LLMs to help point them toward the right answer, but not to create the code: "Hey LLM, how do I iterate over the records in a data frame in R?" rather than "LLM, write code to take each row in an R data frame and do this other calculation." In the first example, you ask the LLM one specific question and it is likely to return a working solution--or something close. In the second, it needs to find two working solutions and put them together. And since there is nothing in their training that does this, they fail, and you spend hours trying to figure out what is wrong.
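To make the contrast concrete, here is a minimal sketch in Python with pandas (an analogue of the R data frame case; the data and column names are made up for illustration). The one-liner is the kind of single, well-documented idiom an LLM will usually answer correctly; the comments mark where the composed "two solutions glued together" version tends to go wrong.

```python
# Sketch of the "one specific question" pattern, using Python/pandas
# as a stand-in for R data frames. Hypothetical data for illustration.
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Step 1: iterating over the rows of a data frame. This idiom appears
# everywhere in the training data, so an LLM usually gets it right.
row_sums = [int(row["a"] + row["b"]) for _, row in df.iterrows()]

# Step 2 would be the "other calculation" bolted onto the iteration.
# Composing the two steps is where generated code tends to break, and
# where the debugging time goes.
print(row_sums)  # [11, 22, 33]
```

The point is not that the snippet is hard; it is that the first step alone is a safe question to ask an LLM, while the combined request is where you end up checking the output by hand anyway.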

Korea is moving away from LLM-based textbooks.

Vibe coding issues, here.


Economics

Tom's Hardware reports on an economist who is concerned that we are heading into a Tech Company bubble larger than the old dot-com bubble. It can be found here.

I've been trying to find something like a peer-reviewed metric for measuring LLM performance, but everything that I find appears to be written by marketing groups of major corporations. If I find any, I'll add them here.


Bonus: LLMs in the Peer-Reviewed Literature

Scientists have been slowly using more and more LLMs to write peer-reviewed papers, with the most coming from the Computer Sciences.

Bonus: Rigor and Reproducibility

Fake peer-reviewed papers are a thing.


Bonus: Quantum Computing

In my world, the people who oversell Generative AI usually also talk up Quantum Computing, which I find funny, and a bit sad.

I felt I had to keep track of this story from the Register.


If you know of any that I could add to my list (or if you have other suggestions), please feel free to comment below.


