Or, "What Everyone should know about AI."
TL;DR. AI is not a single thing. It is an arbitrary collection of algorithms grouped together only by the fact that they all need a lot of computational power to work. I believe some people want us to sound ignorant when we complain about what they are doing with AI.
I'm sorry, but I like "clickbait-y" titles. I also think this one is true. I think there are people who are trying to make a fortune off developing and deploying artificial intelligence infrastructure, and I don't think they want the common person, or at least the common voter, to understand certain basics.
Background.
1. I'm a scientist who works in informatics.
2. Last week I was at a public town-hall meeting (held by a group of rational and thoughtful people) about how the Canadian government should spend $505 million to support AI.
3. Next week I'll be in a meeting about how we should be teaching AI to the biomedical community.
4. Three weeks ago, I was subjected to an interview with a candidate whose answer to every question was, "We should use AI," but who was never more specific than that.
5. Since February, I've spent at least one hour a day (often more) working on deploying AI on a certain real-world problem.
AI is everywhere, and I know a little bit about it.
Problem. I read a lot of material, both professionally and in the media, where people misuse the term Artificial Intelligence. They say "AI" when they should use some other term, but, frequently, they don't know what that term is. The reason I can't just tell you what that term is: we are using "AI" to refer to hundreds of different things. And some people are using some of those hundreds of things inappropriately, to make money while hurting other people. I think those people want us to sound ignorant when we complain about what they are doing. They need us to sound like ill-informed conspiracy theorists in order to make their money.
I would like us not to do that.
What I tell Scientists. In the last few months, I have read several grant proposals from researchers here in Canada who used the term "artificial intelligence," and what I have told them is, "Read the sentence, but replace the word 'AI' with 'statistics', and ask if the sentence still makes sense."
Examples:
1. “AI is revolutionizing this field of study, opening new
doors in hypothesis generation.”
This statement makes perfect sense. It is a broad statement
about how new methods are changing a field of study. If I thought “statistics”
was revolutionizing a field, such a sentence would make perfect sense.
2. "We will then analyze this data using AI to determine the cause of medical condition X."
In 2025 no scientist should ever write this second
statement. If I wrote, “We will then analyze this data using statistics…” I
would sound like an idiot. What statistics? How will you analyze the data? How
will you judge the validity of the results of your approach?
I can’t tell you how many times I have written something
like, “the differential expression analysis of the proteomic data will use
hierarchical linear models with the sample replicates nested within each
biological sample and the injection replicates nested within each sample.”
I don’t need you to understand what I just wrote there; I
just need you to hear the difference between that and example #2. My comment is
full of specific details about what I am doing and how.
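For anyone who does want a concrete picture, here is a minimal sketch of what such a nested model could look like, written in Python with the statsmodels package. To be clear: this is not my actual pipeline. The file name and column names are made up for illustration, and I have left the injection replicates in the residual term to keep the example short.

    import pandas as pd
    import statsmodels.formula.api as smf

    # Hypothetical tidy table: one row per injection, with made-up column names.
    df = pd.read_csv("protein_X_intensities.csv")

    # Fixed effect: experimental condition.
    # Random effect: sample replicates nested within each biological sample.
    model = smf.mixedlm(
        "log_abundance ~ condition",
        data=df,
        groups=df["bio_sample"],
        vc_formula={"sample_rep": "0 + C(sample_rep)"},
    )
    result = model.fit()
    print(result.summary())

The point is not this particular code. The point is that every choice in it (the model, the grouping, the nesting) is a specific, checkable decision, which is exactly what example #2 lacks.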
Aside. I find it funny that the folks I call "Tech Bros" consider that sentence I have always written about proteomics and hierarchical linear models to be machine learning, and therefore part of AI. (A lot of people lump those together: "machine learning and AI.") Meanwhile, I consider the approach to be part of classical, frequentist statistics, and I trace its roots back to R.A. Fisher himself.
Different Tools. AI is a collection of different
tools. So is Machine Learning and so is Statistics. Not everyone needs to
know all the different tools, but everyone needs to know that this is true.
One thing that happens is that new names are given to basically the same thing.
For example, I work a lot with something called Biologically Informed Neural
Networks, or BINNs. If I put out a new tool for using BINNs on proteomic data,
I might call it ProBINN—or whatever. Suddenly we have a new name. We have one
approach called BINNs and another called ProBINN and soon there are a dozen
different terms and no one can know them all and everyone is confused.
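In case it helps to picture what the "biologically informed" part means, here is a toy sketch of the core idea as I understand it: a neural-network layer whose connections are limited by a 0/1 protein-to-pathway membership mask, so each pathway node only sees the proteins that belong to it. This is written in Python with PyTorch under my own assumptions; it is not ProBINN (which, again, I made up) or any other published tool.

    import torch
    import torch.nn as nn

    class PathwayMaskedLinear(nn.Module):
        """A linear layer restricted by a pathway membership mask."""

        def __init__(self, mask: torch.Tensor):
            super().__init__()
            # mask: (n_pathways, n_proteins); 1 means the protein is in the pathway.
            self.register_buffer("mask", mask.float())
            self.weight = nn.Parameter(torch.randn_like(self.mask) * 0.01)
            self.bias = nn.Parameter(torch.zeros(mask.shape[0]))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # Zero out every weight that is not a real protein-pathway membership.
            return x @ (self.weight * self.mask).T + self.bias

    # Toy usage: six proteins feeding two pathway nodes.
    mask = torch.tensor([[1, 1, 1, 0, 0, 0],
                         [0, 0, 0, 1, 1, 1]])
    layer = PathwayMaskedLinear(mask)
    pathway_scores = layer(torch.randn(4, 6))  # four samples in, (4, 2) scores out

Call that layer BINN, ProBINN, or anything else; the underlying trick stays the same, and only the name changes.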
Bad Actors. There are bad actors who hope to profit from this confusion. I have personally met one of them. The individual straight up lied about what they were doing and hoped that, in the confusion of technical jargon, no one would notice the specifics.
Some people consider the hierarchical linear models that I
have been using for over twenty years to be AI. I don’t, but they do. Why?
Because hierarchical linear models work. They provide solid, technically
correct answers (when used correctly) and are tried and true. By lumping them into AI, they make the whole field appear rigorous, and thus, by extension, the tools they actually want to deploy seem rigorous too.
And (this, I would argue, is what they are saying) if you don't understand what they are doing, your opinion must not count. They would say, "That person doesn't understand technology; ignore them."
"Stealing a bunch of other people's work and making stereotypes of them"
Everyone needs to know these three classes of AI.
1. Large Language Models, or LLMs. This is a broad class of AI tools that form the backbone of things like ChatGPT and all those annoying "AI assistants" that the tech bros are shoving down our throats in practically every application we open.
2. Generative Adversarial Networks, or GANs. This is another broad category of tools, used to steal artists' work and create copies (or stereotypes) of it from text prompts.
I have a colleague who described these two techniques as “Stealing
a bunch of other people’s work and making stereotypes of them”. These tools
require massive training data sets—the larger the better. They don’t have to be
tools of evil, but they are. It would be great to have an ethically sourced
GAN. And I’m not sure there is enough text for an ethically sourced LLM, but
that would be interesting also.
The people making the decisions on how to allocate resources are choosing to steal people's copyrighted work (I understand they took 32 of my copyrighted works) and figure that if they make enough money, they can buy off the courts and never get into trouble.
They might be right. Only time will tell.
3. Everything else. And then there are thousands of other approaches that get called AI. Many actually solve real-world problems when used correctly, though many do not. But those are technical issues, and I currently see no harm in most people learning about them only when they are either interested in them or find that they need one to solve a particular problem.
This category includes things like neural networks, gradient descent, and random
forests, and even my old friend hierarchical linear models.
What I want you to do. When you complain about the misuse
of AI, whenever possible, be specific. Say, “Students are using Large Language
Models to write their term papers.” Say,
“Generative Adversarial Networks are being used illegally to put artists out of
work," because they are. When you say "AI" instead of the technically correct term, the bad actors will argue that you don't know what you are talking about, and that your opinion should therefore be dismissed.
As always, thank you for reading this and I welcome any
comments or questions below.