Saturday, May 10, 2025

Please Don’t Fall for the Tech Bro’s Trap…

 

Or, "What Everyone should know about AI."


TL;DR. AI is not a single thing. It is an arbitrary collection of algorithms grouped together only by the fact that they all need a lot of computational power to work. I believe some people want us to sound ignorant when we complain about what they are doing with AI.

I’m sorry, but I like “clickbait-y” titles. Still, I think this one is true. I think there are people who are trying to make a fortune off developing and deploying artificial intelligence infrastructure, and I don’t think they want the common person, or at least the common voter, to understand certain basics.

Background. 1. I’m a scientist who works in informatics. 2. Last week I was at a public town-hall meeting (held by a group of rational and thoughtful people) about how the Canadian government should spend $505 million to support AI. 3. Next week I’ll be in a meeting about how we should be teaching AI to the biomedical community. 4. Three weeks ago, I was subjected to an interview with a candidate whose answer to every question was, “We should use AI,” but who was never more specific than that. 5. Since February, I’ve spent at least one hour a day (often more) working on deploying AI on a certain real-world problem.

AI is everywhere, and I know a little bit about it.

Problem. I read a lot of material, both professionally and in the media, where people misuse the term Artificial Intelligence. They say “AI” when they should use some other term, but, frequently, they don’t know what that term is. The reason I can’t simply tell you what that term is, is that we are using “AI” to refer to hundreds of different things. And some people are using some of those hundreds of different things inappropriately, to make money while hurting other people. I think those people want us to sound ignorant when we complain about what they are doing.

They need us to sound like ill-informed conspiracy theorists in order for them to make their money.

I would like us not to do that.

What I tell Scientists. In the last few months, I have read several grant proposals from researchers here in Canada who used the term “artificial intelligence,” and what I have told them is, “Read the sentence, but replace the word ‘AI’ with ‘statistics,’ and ask if the sentence still makes sense.”

Examples:

1. “AI is revolutionizing this field of study, opening new doors in hypothesis generation.”

This statement makes perfect sense. It is a broad statement about how new methods are changing a field of study. If I thought “statistics” was revolutionizing a field, such a sentence would make perfect sense.

2. “We will then analyze this data using AI to determine the cause of medical condition X.”

In 2025, no scientist should ever write this second statement. If I wrote, “We will then analyze this data using statistics…,” I would sound like an idiot. What statistics? How will you analyze the data? How will you judge the validity of the results of your approach?

I can’t tell you how many times I have written something like, “the differential expression analysis of the proteomic data will use hierarchical linear models with the sample replicates nested within each biological sample and the injection replicates nested within each sample.”

I don’t need you to understand what I just wrote there; I just need you to hear the difference between that and example #2. My comment is full of specific details about what I am doing and how.
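For the curious, here is roughly what that kind of model can look like in code. This is a minimal sketch in Python using statsmodels, not my actual pipeline, and every column name in it (log_intensity, condition, bio_sample, sample_rep, injection_rep) is a hypothetical stand-in:

```python
# A minimal sketch of a nested hierarchical (mixed) linear model,
# assuming a long-format proteomics table with hypothetical columns:
#   log_intensity : log-transformed protein abundance
#   condition     : the fixed effect of interest (e.g., treated vs. control)
#   bio_sample    : biological sample ID
#   sample_rep    : sample replicate, nested within bio_sample
#   injection_rep : injection replicate, nested within sample_rep
import pandas as pd
import statsmodels.formula.api as smf

df = pd.read_csv("protein_x_long.csv")  # hypothetical input file

model = smf.mixedlm(
    "log_intensity ~ condition",   # fixed effect: experimental condition
    data=df,
    groups="bio_sample",           # random intercept per biological sample
    vc_formula={                   # variance components for the nested replicates
        "sample_rep": "0 + C(sample_rep)",
        "injection_rep": "0 + C(sample_rep):C(injection_rep)",
    },
)
result = model.fit()
print(result.summary())
```

In R, the same structure would be lme4’s (1 | bio_sample / sample_rep / injection_rep). The point is not the syntax; it is that the nesting is stated explicitly, so a reviewer can check it.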

Aside. I find it funny that the folks I call “Tech Bros” consider the sentence I just quoted about proteomics and hierarchical linear models to be machine learning, and therefore part of AI. (A lot of people lump those together as “machine learning and AI.”) Meanwhile, I consider the approach to be part of classical, frequentist statistics, and trace its roots back to R.A. Fisher himself.

Different Tools. AI is a collection of different tools. So is Machine Learning and so is Statistics. Not everyone needs to know all the different tools, but everyone needs to know that this is true. One thing that happens is that new names are given to basically the same thing. For example, I work a lot with something called Biologically Informed Neural Networks, or BINNs. If I put out a new tool for using BINNs on proteomic data, I might call it ProBINN—or whatever. Suddenly we have a new name. We have one approach called BINNs and another called ProBINN and soon there are a dozen different terms and no one can know them all and everyone is confused.
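To make that concrete: the core trick in a BINN, at least as I use the term, is a neural-network layer whose connections are masked by prior biological knowledge, so each protein feeds only into the pathways it is annotated to. Here is a minimal sketch; the pathway map and all the sizes are invented for illustration:

```python
# Minimal sketch of the core idea behind a Biologically Informed
# Neural Network (BINN): a linear layer whose weights are masked by
# prior biological knowledge, so each protein connects only to the
# pathways it belongs to. All names and sizes here are invented.
import torch
import torch.nn as nn

n_proteins, n_pathways = 6, 3

# mask[i, j] = 1 if protein j is a member of pathway i (hypothetical map)
mask = torch.tensor([
    [1, 1, 0, 0, 0, 0],   # pathway A: proteins 0, 1
    [0, 0, 1, 1, 1, 0],   # pathway B: proteins 2, 3, 4
    [0, 0, 0, 0, 1, 1],   # pathway C: proteins 4, 5
], dtype=torch.float32)

class MaskedLinear(nn.Linear):
    """A linear layer that zeroes out weights not permitted by the mask."""
    def __init__(self, mask: torch.Tensor):
        super().__init__(mask.shape[1], mask.shape[0])
        self.register_buffer("mask", mask)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # Applying the mask on every forward pass keeps the disallowed
        # connections at zero even as the other weights are trained.
        return nn.functional.linear(x, self.weight * self.mask, self.bias)

layer = MaskedLinear(mask)
pathway_activity = torch.relu(layer(torch.randn(8, n_proteins)))  # batch of 8
```

Whatever a new tool ends up being called, this masked layer is usually the idea underneath.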

Bad Actors. There are bad actors who hope to profit from confusion. I have personally met one of these. The individual straight-up lied about what they were doing and hoped no one would catch the specifics amid the confusion of technical jargon.

Some people consider the hierarchical linear models that I have been using for over twenty years to be AI. I don’t, but they do. Why? Because hierarchical linear models work. They provide solid, technically correct answers (when used correctly) and are tried and true. By lumping them into AI, they make the field appear to be rigorous, and thus, by extension, the tools they want to deploy must also be rigorous.

And (this is what I argue they believe) if you don’t understand what they are doing, your opinion must not count. They would say, “That person doesn’t understand technology; ignore them.”

“Stealing a bunch of other people’s work and making stereotypes of them”

Everyone needs to know these three classes of AI.

1. Large Language Models, or LLMs. (Read More) This is a broad class of AI tools that form the backbone of things like ChatGPT and the annoying “AI assistants” that tech bros are shoving down our throats in practically every application we open.

2. Generative Adversarial Networks, or GANs. (Read More) This is another broad category of generative tools (along with the diffusion models that have largely succeeded them) which are used to steal artists’ work and create copies (or stereotypes) of it from text prompts.

I have a colleague who described these two techniques as “Stealing a bunch of other people’s work and making stereotypes of them”. These tools require massive training data sets—the larger the better. They don’t have to be tools of evil, but they are. It would be great to have an ethically sourced GAN. And I’m not sure there is enough text for an ethically sourced LLM, but that would be interesting also.
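For readers wondering what “adversarial” means here: a GAN pits two networks against each other, a generator that fabricates samples and a discriminator that tries to tell the fakes from real training data. A bare-bones sketch of that loop, with toy sizes and random stand-in data rather than anyone’s scraped artwork:

```python
# Bare-bones sketch of the adversarial loop at the heart of a GAN.
# Toy dimensions and random "real" data; an actual system trains on a
# massive scraped dataset, which is exactly the ethical problem.
import torch
import torch.nn as nn

latent_dim, data_dim = 16, 32
G = nn.Sequential(nn.Linear(latent_dim, 64), nn.ReLU(), nn.Linear(64, data_dim))
D = nn.Sequential(nn.Linear(data_dim, 64), nn.ReLU(), nn.Linear(64, 1))

opt_g = torch.optim.Adam(G.parameters(), lr=2e-4)
opt_d = torch.optim.Adam(D.parameters(), lr=2e-4)
bce = nn.BCEWithLogitsLoss()

for step in range(1000):
    real = torch.randn(64, data_dim)        # stand-in for real training data
    fake = G(torch.randn(64, latent_dim))

    # Discriminator: learn to score real data high and fakes low.
    d_loss = bce(D(real), torch.ones(64, 1)) + \
             bce(D(fake.detach()), torch.zeros(64, 1))
    opt_d.zero_grad(); d_loss.backward(); opt_d.step()

    # Generator: learn to fool the discriminator.
    g_loss = bce(D(fake), torch.ones(64, 1))
    opt_g.zero_grad(); g_loss.backward(); opt_g.step()
```

Nothing in that loop is evil. The evil, as my colleague says, is in what gets fed to it.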

The people making the decisions on how to allocate resources are choosing to steal people’s copyrighted work (I understand they took 32 of my copyrighted works) and figure that if they make enough money, they can buy off the courts and never get into trouble.

They might be right. Only time will tell.

3. Everything else. And then there are thousands of other approaches that are called AI. Many actually solve real-world problems when used correctly, though many do not. But those are technical issues, and I currently see no harm in most people only learning about them when they are either interested in them or find they need one to solve a particular problem. This category includes things like neural networks, gradient descent, and random forests, and even my old friend, hierarchical linear models.

What I want you to do. When you complain about the misuse of AI, be specific whenever possible. Say, “Students are using Large Language Models to write their term papers.” Say, “Generative Adversarial Networks are being used illegally to put artists out of work,” because they are. When you say “AI” instead of the technically correct term, the bad actors will argue that you don’t know what you are talking about, and therefore that your opinion should be dismissed.

As always, thank you for reading this and I welcome any comments or questions below.
