Unproductive Claims about AI in 2024

I'm all for reasonable disagreements, but I find a lot of the current conversation around generative AI relatively unproductive. Every keynote speech at every conference I've been to this year has repeated some trite phrases that might make for a good sound bite but don't hold up to much critical consideration. Even well-meaning people making good-faith arguments sometimes unintentionally conflate terms and end up saying something less clearly than they intend. I do this myself.

In an attempt to further the conversation usefully, I wanted to point out some of the phrases that people use almost axiomatically that I don't think are actually true. I'm open to having my mind changed, though, so please do reach out if you feel otherwise.

In this article, I want to talk about two phrases that I've heard a half dozen times on various stages in the last few months: "A model is only as good as the data it's trained on" and "LLMs hallucinate, so they're a non-starter for any real use-cases".

"A model is only as good as the data it's trained on."

This is one of the phrases I find most frustrating, because it actually used to be true. Many people saying this today truly are doing their best, and are often repeating what they have heard from some legitimate, well-meaning authority; they just don't realize that the situation has changed.

*A brief aside on super resolution*

In a bygone era (i.e. when the iPhone 4 was the "hot new thing"), we used to believe that you couldn't take a 64x64 blurry image and turn it into a 128x128 sharp image. You could maybe sharpen some edges in the 64x64, but you couldn't create new pixels to reflect details that the original camera/sensor did not capture -- you couldn't just create information out of nothing. This is why we all found the "ENHANCE!" meme hilariously unbelievable.

There were some approaches to infilling textures and such -- Photoshop had the rubber stamp/clone tools, for example -- but you still had to be an artist to use them believably.

It's important to realize the difference -- the artist isn't "revealing" something via Photoshop that was always there and just hiding in the blurriness. The artist is just drawing a believable scene, even though it may never have existed. They're imagining how to fill in the pixels that weren't there before, and they're mostly using their experience and artistic judgment to figure out which pixels are likely and/or believable.

Well, it turns out that this is now possible to do with computers.

Models can be trained to do a similar thing surprisingly well. Here's a random one I found with < 30 seconds of Googling.

https://ge.in.tum.de/2019/03/25/new-results-from-our-super-resolution-gan/

How these models work and how they're trained is itself super interesting, but not the point of this article. Suffice it to say that, in a certain light, what they do isn't that dissimilar from how a human artist would approach it.
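If you're curious what that looks like in code, here's a minimal sketch (in PyTorch, deliberately toy-sized, and not the architecture from the linked work): the network learns to add plausible detail on top of a simple interpolation, so the "new" pixels come from patterns absorbed during training rather than from the input image alone.

```python
# Illustrative toy only -- a tiny learned 2x upscaler, not a production model.
import torch
import torch.nn as nn

class TinySuperRes(nn.Module):
    """Upscale 2x: learn residual detail on top of plain bilinear interpolation."""
    def __init__(self):
        super().__init__()
        self.upsample = nn.Upsample(scale_factor=2, mode="bilinear", align_corners=False)
        self.refine = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 3, kernel_size=3, padding=1),
        )

    def forward(self, low_res):
        coarse = self.upsample(low_res)      # 64x64 -> 128x128, no new detail yet
        return coarse + self.refine(coarse)  # learned detail added on top

# One illustrative training step: the sharp original supervises the detail
# the network is allowed to "imagine" for blurry inputs it has never seen.
model = TinySuperRes()
low_res = torch.rand(1, 3, 64, 64)     # stand-in for a blurry 64x64 photo
high_res = torch.rand(1, 3, 128, 128)  # stand-in for the sharp 128x128 original
loss = nn.functional.l1_loss(model(low_res), high_res)
loss.backward()
```

Train something like this on enough (blurry, sharp) pairs and it gets surprisingly good at guessing believable detail -- which is essentially the artist's trick, automated.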

Back to models and their data

My point is this: Something that was a truism in our field for decades -- so much so that filmmakers were lambasted for suggesting otherwise -- has recently stopped being true. Things we fervently and earnestly believed (with good reason) are no longer true. Well-intentioned people repeating what they heard from an expert at a conference a few years ago may not realize how rapidly the field is shifting.

Yes, many types of models are merely reflections of their data.

Yes, data quality is important, and worth paying attention to.

Yes, data biases are real, and need to be understood and addressed.

Nonetheless, many current applications of AI leverage foundation models that came from elsewhere. These models have often been trained on vast enough corpora of data to have developed some approximations of (internal, mostly uninterpretable, perhaps brittle) world models in a variety of domains. Your application of that foundation model is not necessarily constrained by how limited your dataset is -- there are reasons to believe that the model may be able to extrapolate, and there are techniques to both encourage that and to ensure that it does so correctly.
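To make that less abstract, here's a rough sketch of one such technique -- few-shot prompting plus a consistency check -- where `call_foundation_model`, `Example`, and `answer_with_grounding` are hypothetical stand-ins for whatever model and data you actually use, not any particular vendor's API.

```python
# A sketch, not a definitive implementation: your small dataset serves as
# grounding and as a check on the model's extrapolation, not as a hard ceiling.
from dataclasses import dataclass

@dataclass
class Example:
    question: str
    answer: str

def call_foundation_model(prompt: str) -> str:
    """Hypothetical placeholder -- swap in your actual model client here."""
    raise NotImplementedError

def answer_with_grounding(question: str, examples: list[Example]) -> str:
    # Encourage extrapolation in your domain by showing a few of your own
    # examples (few-shot prompting) instead of training a model from scratch.
    shots = "\n".join(f"Q: {e.question}\nA: {e.answer}" for e in examples[:3])
    draft = call_foundation_model(f"{shots}\nQ: {question}\nA:")

    # Then check the extrapolation: ask for a consistency judgment against the
    # same examples, and flag unsupported answers for review rather than
    # trusting them blindly.
    verdict = call_foundation_model(
        f"Given these examples:\n{shots}\n\n"
        f"Is the following answer consistent with them? Reply yes or no.\n{draft}"
    )
    return draft if verdict.strip().lower().startswith("yes") else f"[needs review] {draft}"
```

None of this makes the foundation model infallible; it just means your own data becomes a resource for steering and checking it, rather than a ceiling on what it can do.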

Many students only know what was in their textbooks, but as they mature and think critically, they can spot (and eventually fill) gaps in their own training.

Models used to know mostly only what they'd been told, but the field has shifted.

"GenAI models can hallucinate, so while they make for fun toys, they're a non-starter for any 'real' use-cases."

When I've heard this argument, it has mostly seemed to come from folks looking for a reason to justify the status quo (or perhaps some other plan). I worry that it's often intended as a knock-down argument or a cover for something else, and is mostly a reason to end the conversation rather than to engage with it. I'm not sure it's a deeply held belief among folks being intellectually rigorous, but if it is, I'd love to better understand it.

These positions are often expressed by folks typing on their smartphones, connected over cellular data connections, to cloud services like LinkedIn or Twitter/X. Ironically, all three of those types of technologies are "reliable enough" even though they're built on intrinsically unreliable components.

We have a long history of building reliable systems out of unreliable components. This is not news to any engineering leader.

This was one of the key innovations in cloud computing: we realized we could build high-quality, high-availability infrastructure out of a lot of essentially commodity hardware, as long as we built the right software layers on top. Systemic reliability does not require that each individual component be amazingly reliable; it requires that you build in the redundancy, monitoring, and architecture to be robust to individual components being unreliable.

Naturally, you would still prefer components as reliable as you can afford, but you're now viewing that as a tradeoff: is it easier to get reliability via more reliable components or via architectural redundancy?
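For concreteness, here's a minimal sketch of that architectural answer -- retries plus redundancy wrapped around a deliberately unreliable component (`flaky_component` is made up purely for illustration).

```python
# Reliability from architecture, not from perfect parts.
import random
import time

def flaky_component(x: int) -> int:
    """Deliberately unreliable stand-in: fails ~30% of the time."""
    if random.random() < 0.3:
        raise RuntimeError("transient failure")
    return x * 2

def with_retries(fn, *args, attempts: int = 5, base_delay: float = 0.05):
    """Retry with exponential backoff around any unreliable call."""
    for attempt in range(attempts):
        try:
            return fn(*args)
        except RuntimeError:
            if attempt == attempts - 1:
                raise
            time.sleep(base_delay * (2 ** attempt))

def with_quorum(fn, *args, replicas: int = 3):
    """Run redundant replicas and take the majority answer."""
    results = [with_retries(fn, *args) for _ in range(replicas)]
    return max(set(results), key=results.count)

print(with_quorum(flaky_component, 21))  # 42, despite individual failures along the way
```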

We also have a long history of building reliable organizations out of unreliable, finicky humans. Again, not perfect, but we have governments and militaries and companies and mission-driven organizations that are able to collectively present a more reliable interface to the world than any of the individual humans that comprise them. (I'm going to skip over all the political commentary that you might be tempted to make in the USA in 2024.)

With thoughtfulness, creativity, and rigor, we've been able to engineer high-quality systems and organizations by carefully assembling (relatively) unreliable people and components into the right structure. Beautifully fallible humans have been sufficient to serve every role in every industry so far, given appropriate training and careful checks and balances. We've been happy to monitor and evaluate their performance and to add the guardrails needed to achieve the right outcomes.

Why do we doubt our ability to do that with this new technology?

It may have novel forms of unreliability, and we may still be discovering the best ways to design reliability around it, but IMHO it's much more productive to talk about those things than to use this as an excuse to disengage entirely.
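As one simplified example of what "designing reliability around it" can mean, here's a sketch that wraps a hypothetical `ask_model` call (and a made-up invoice-extraction task) in schema validation, retries, and a human-review fallback -- the same playbook we already use for other unreliable components.

```python
# A sketch under stated assumptions: `ask_model` is a placeholder, not a real API.
import json

def ask_model(prompt: str) -> str:
    """Hypothetical placeholder for an LLM call -- replace with your provider's client."""
    raise NotImplementedError

def extract_invoice_total(invoice_text: str, max_attempts: int = 3) -> dict:
    prompt = (
        'Return ONLY a JSON object like {"total": 123.45, "currency": "USD"} '
        f"for this invoice:\n{invoice_text}"
    )
    for _ in range(max_attempts):
        raw = ask_model(prompt)
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed output; retry
        # Structural and sanity checks catch most hallucinated or sloppy outputs.
        if (isinstance(data, dict)
                and isinstance(data.get("total"), (int, float))
                and data["total"] >= 0
                and isinstance(data.get("currency"), str)):
            return data
    # Couldn't get a verifiable answer: hand off to a human rather than guess.
    return {"needs_human_review": True, "raw_input": invoice_text}
```

It's not glamorous, but it is engineering -- and it's a far more interesting conversation than "it hallucinates, so never mind."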

____

I’m always open to being convinced otherwise, so please do let me know if you feel strongly on either of these two. I have half a dozen more of these I want to write about, so if this is the kind of thing you Have Informed Opinions About, I’d love to share a draft of the next article with you.

– Ankur Kalra, Founder & CEO @ Hop