LeDuc's Little LLM List

Venn diagram showing that Discriminant AI is separate from Generative AI; LLMs are a subset of Generative AI; and Machine Learning is mostly Discriminant but overlaps slightly with Generative AI.


The banner I used for this list is a handy figure we use at my day-job to show the relationship between different AI technologies. The key features are that Generative AI is different from Discriminant AI and Machine Learning, and that Large Language Models are just one type of Generative AI.


TL;DR. This is a list of references about how the use of LLMs and related Generative AI methods underperforms relative to traditional approaches.


It is 2025 and you don't need me to tell you that Generative AI has just hit the peak of the Hype Cycle. As it starts descending, I figured I'd keep a list of useful references about how it is overhyped.

LLM Inherent Limitations

A nice monograph on LLM hallucinations


Generative AI Underperforms

The BBC reports that LLMs make for wacky fast-food experiences.

Blog coverage of a paper showing that LLMs fail on 70% of office tasks.

A broad "AI is hard to make useful" piece from The Conversation, here.

In gene prediction, there is a nice Nature Methods paper from our friends at Heidelberg.

Chain-of-Thought Reasoning in LLMs is over-hyped.

Model Evaluation & Threat Research (METR): Using LLMs to help you code slows you down. Here is an online link. Here is a PDF.

Media Coverage: AI Agents are wrong about 70% of the time, here.

Jason Sanford: A journalist writes about GenAI underperforming, here.

Chinese Call Center Problems, here.


Case Studies

Survey showing software engineers pulling back from using LLMs to write code, here. This matches what I'm seeing at my work. People use LLMs to help point them toward the right answer, but not to create the code: "Hey LLM, how do I iterate over the records in a data frame in R?" rather than "LLM, write code to take each row in an R data frame and do this other calculation." In the first example, you ask the LLM one specific question and it is likely to return a working solution, or something close. In the second, it needs to find two working solutions and put them together. And since there is nothing in its training that does this, it fails, and you spend hours trying to figure out what is wrong.
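The post's own examples are phrased for R; as an illustrative stand-in, here is the same split sketched in Python with pandas (the data frame and the "other calculation" are invented for the example). The narrow question maps to one well-known idiom; the composed request chains that idiom with a second step.

```python
import pandas as pd

# A tiny data frame standing in for the "R data frame" in the post.
df = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Narrow question: "how do I iterate over the rows?" -- one idiom, easy to get right.
rows = [row for _, row in df.iterrows()]

# Composed task: iterate over rows AND apply a further calculation.
# The "other calculation" here is a placeholder: sum the two columns per row.
totals = [row["a"] + row["b"] for _, row in df.iterrows()]
# totals == [11, 22, 33]
```

The point of the split: the first step is a single, heavily documented pattern, while the second requires correctly stitching two patterns together, which is where (in my experience) the generated code tends to break.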

Korea is moving away from LLM-based textbooks.

Vibe coding issues, here.


Economics

Tom's Hardware reports on an economist who is concerned that we are heading into a tech-company bubble larger than the old dot-com one. It can be found here.

I've been trying to find something like a peer-reviewed metric for measuring LLM performance, but everything that I find appears to be written by marketing groups of major corporations. If I find any, I'll add them here.


Bonus: LLMs in the Peer-Reviewed Literature

Scientists have been slowly using LLMs more and more to write peer-reviewed papers, with the most use coming from the computer sciences.

Bonus: Rigor and Reproducibility

Fake peer-reviewed papers are a thing.


Bonus Quantum Computing

In my world, the people who oversell Generative AI usually also talk up Quantum Computing, which I find funny and a bit sad.

I felt I had to keep track of this story from the Register.


If you know of any that I could add to my list (or if you have other suggestions), please feel free to comment below.


