The Art of Artificial: How AI Creates Images and Videos

Understanding the tech that's changing how we create and consume visual content

Hello friends,

Welcome to another edition of AI for the Rest of Us. And a double welcome to our new subscribers who’ve joined over the past few weeks. Thank you for coming along for the ride.

If you haven’t yet upgraded your membership, now is the time because we’re running a promotion where you get your first two months free. Click the button below, click “Login” at the top of the page (if you’re not already logged in), and once logged in, click the silhouette in the top right. From the menu, click Upgrade and you’ll land on the right page. No code to enter, nothing to pay right now, and you can cancel any time before the 60 days are up.

Once you’ve upgraded, you can view this edition and all the previous ones on our website. And starting in two weeks, you’ll get the full newsletter in your inbox every other Friday.

Oh hey, I’ve got two more bonuses for you non-premium subscribers: we’re including the full In the Know section and the Audio Preview in this week’s newsletter (see below) to give you a taste of what you get by upgrading. We hope you enjoy them.

OK, on to the topic: the world of AI-generated images and videos. We’ve got everything you need to know about the technology, our take on it, and plenty of things to learn and do.

Ready to learn how AI turns prompts into pixels? Here we go...

– Kyser

P.S. Be sure to follow us on Instagram for content throughout the week.

In the Know

Stepping Back
Before we jump in, we need to take a small step back and talk (again) about some of the basics of AI and how the various sub-fields of AI fit together.

Back in Edition #2, we defined AI using this simple, no-duh definition:

Artificial Intelligence is technology that tries to mimic human intelligence.

It is technology. It is made up of data, algorithms, and computing power. It doesn’t have feelings, self-awareness, or empathy.

It tries to mimic human intelligence. Human intelligence is a wildly complicated topic, one that even the smartest scientists in the world don’t fully understand. So, I say it tries because it can’t mimic something we don’t fully understand.

It does this by identifying patterns and following instructions. This allows it to perform tasks that may seem intelligent but are actually based on complex calculations, predefined rules, and/or statistical analysis.

We’ve talked a lot about Large Language Models – a type of AI software that processes and generates words. An LLM learns by analyzing enormous amounts of written material, like books and websites, and recognizing patterns in language. When we ask it a question or give it a task (i.e. prompt it), it uses the patterns it has observed to create its response. It is quite literally predicting the most likely next words.
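For the curious, here’s a toy sketch in Python of what “predicting the most likely next words” looks like. The words and probabilities below are completely made up; a real LLM learns these from its training data and juggles vastly more possibilities.

```python
# Toy sketch of next-word prediction (made-up numbers, not a real LLM).
import random

# Pretend the model has learned these probabilities for the word that
# comes after the phrase "manatees are ..."
next_word_probs = {"gentle": 0.45, "slow": 0.30, "aquatic": 0.20, "unicyclists": 0.05}

words = list(next_word_probs)
weights = list(next_word_probs.values())

# Sample the next word; likelier words get picked more often.
print(random.choices(words, weights=weights, k=1)[0])
```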

How AI Image and Video Generation Works
OK that’s great. But how the heck does this work with images and videos?

Well, when it comes to creating visuals, AI is basically trying to mimic our ability to visualize and create things based on ideas or descriptions. Think of it as a digital art apprentice that’s studied millions of pieces of art, paying special attention to their patterns, shapes, and visual features. You tell the apprentice what you want to create, it thinks through its art library, and it creates something new based on everything it knows. And all of this happens in a matter of seconds. There’s a twist, though. This apprentice can be stubborn and tends to be quirky. It takes instructions, but you don’t always know which instructions it likes best. Oh and btw, this apprentice has improved its craft exponentially over the past few years and will likely continue to improve at an insane rate. More on that below.

Before we get too technical with how these work, we need to talk about one important thing: pixels. You know when you zoom waaaay in on a digital image and it gets all blocky? Or you go right up to your TV like a three-year-old and see those teeny tiny squares of color? Those are pixels. Each pixel is like a single dot of color. Put millions of these dots together, and voila, you’ve got a picture. When we talk about creating images using AI, the AI is really just deciding what color each of these dots should be. Keep this in the front of your head for now.
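If you like seeing things spelled out, here’s a tiny Python sketch of that idea. The numbers are made up; the point is that an image is nothing more than a grid of color values.

```python
# A digital image is just a grid of pixels, and each pixel is one color,
# usually stored as (red, green, blue) values from 0 to 255.
tiny_image = [
    [(255, 0, 0), (0, 255, 0)],    # top row: a red pixel, then a green pixel
    [(0, 0, 255), (255, 255, 0)],  # bottom row: a blue pixel, then a yellow pixel
]

# An AI image generator's whole job boils down to choosing the number
# that goes in each slot, for millions of pixels instead of just four.
print(tiny_image[0][0])  # the color of the top-left pixel: (255, 0, 0)
```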

OK, let’s get technical. Here’s how these things work:

  1. Training: Just like LLMs are trained on tons of text, image generation AI is trained on millions of images. It learns patterns about how pixels are arranged to form different objects, scenes, and styles.

  2. Understanding Prompts: When you give the AI a prompt like, say, “a manatee riding a unicycle,” it breaks down the key elements: manatee, riding, unicycle. It then looks for patterns in its training data related to these concepts. To get really technical, it actually uses something called probability distributions to determine how different elements (like “manatee” or “riding”) are visually represented and how they should fit together in the image. 🧐

  3. Creating the Image: The AI starts with a canvas full of random noise (think TV static) and gradually shapes it into the image. It’s like a hyper-speed game of Pictionary, where the AI is both the artist and the judge. (There’s a little code sketch just below if you want to see the shape of steps 3 and 4.)

  4. Refining: The AI keeps tweaking the image, making sure the manatee looks manatee-like, the unicycle looks rideable, and the whole scene makes visual sense according to what it’s learned.

“a manatee riding a unicycle”
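Here’s that four-step recipe as a cartoon in Python. Every helper below is a made-up stand-in (real generators use trained neural networks, not these toy functions), but the overall shape is the idea: break down the prompt, start from noise, and refine until it looks right.

```python
import random

def break_down_prompt(prompt):
    """Step 2 (cartoon): pull out the key elements of the prompt."""
    return prompt.split()

def random_noise(num_pixels):
    """Step 3 (cartoon): the starting 'canvas' is just random pixel values."""
    return [random.random() for _ in range(num_pixels)]

def refine(pixels, target):
    """Step 4 (cartoon): nudge every pixel a bit closer to what the prompt needs.
    In a real model, a trained network decides the nudge; here we fake it."""
    return [p + 0.1 * (t - p) for p, t in zip(pixels, target)]

elements = break_down_prompt("a manatee riding a unicycle")
target = [0.8, 0.2, 0.5, 0.9]        # pretend these four numbers are the "right" image
image = random_noise(len(target))    # step 3: start from noise

for step in range(50):               # step 4: keep refining
    image = refine(image, target)

print(elements)                      # ['a', 'manatee', 'riding', 'a', 'unicycle']
print([round(p, 2) for p in image])  # ends up very close to the target
```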

That’s for still images. For video generation, I want you to imagine one of those flip books you had as a kid. Each image is slightly different, and when you thumb through it quickly, you get a moving image. The process above happens for each frame – or in flip book parlance, each page – and then the frames are stitched together to make a video.
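And here’s the flip-book idea in the same cartoon style. The generate_frame function is a made-up stand-in for the whole image process above; a real video model also has to keep neighboring frames consistent so the motion looks smooth, which is part of why video is harder than stills.

```python
def generate_frame(prompt, frame_number):
    """Stand-in for the whole image-generation process, one 'page' at a time."""
    return f"frame {frame_number}: {prompt}"

prompt = "a manatee riding a unicycle"
frames = [generate_frame(prompt, i) for i in range(24)]  # roughly one second at 24 frames per second

for frame in frames[:3]:  # flip through the first few pages
    print(frame)
```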

Tracking with me here? OK great. Let’s get even more technical.

Getting more technical
Here’s where we need to break down a few key tech terms:

Generative Adversarial Networks (GANs): Imagine two AIs playing an endless game of art forgery. One AI (the generator) creates images, while the other (the discriminator) tries to spot the fakes. They keep at it until the generator gets so good it can consistently fool the discriminator. It’s like an artistic arms race, but instead of weapons, they’re crafting increasingly realistic images. That’s how GANs work.
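If you’re comfortable peeking at code, here’s a toy version of that forgery game in Python. It assumes the PyTorch library is installed, and it works on plain numbers instead of images, but the tug-of-war between the two networks is the same idea.

```python
import torch
import torch.nn as nn

# The forger: turns random noise into a "fake" data point.
generator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1))
# The detective: outputs a score near 1 for "real" and near 0 for "fake".
discriminator = nn.Sequential(nn.Linear(1, 16), nn.ReLU(), nn.Linear(16, 1), nn.Sigmoid())

g_opt = torch.optim.Adam(generator.parameters(), lr=0.01)
d_opt = torch.optim.Adam(discriminator.parameters(), lr=0.01)
loss_fn = nn.BCELoss()

for step in range(2000):
    real = torch.randn(64, 1) * 0.5 + 3.0  # "real" data: numbers clustered around 3
    fake = generator(torch.randn(64, 1))    # the forger's latest attempts

    # Train the detective: learn to call real data 1 and fakes 0.
    d_opt.zero_grad()
    d_loss = loss_fn(discriminator(real), torch.ones(64, 1)) + \
             loss_fn(discriminator(fake.detach()), torch.zeros(64, 1))
    d_loss.backward()
    d_opt.step()

    # Train the forger: try to make the detective say 1 for its fakes.
    g_opt.zero_grad()
    g_loss = loss_fn(discriminator(fake), torch.ones(64, 1))
    g_loss.backward()
    g_opt.step()

# After training, the forger's numbers should drift toward the real cluster near 3.
print(generator(torch.randn(5, 1)).detach().squeeze())
```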

Diffusion Models: These work differently. Think of it like this: You start with a blurry, noisy image (like an old TV with bad reception), and the AI learns how to gradually clear it up into a sharp picture. Then, to create new images, it runs this process in reverse – starting with random noise and step-by-step turning it into a clear image based on your prompt. You can actually see this process happening when you use some of the more popular image generators like Midjourney – which we’ll be doing in Play Time below.
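Here’s the “bad reception” half of that, sketched in Python with made-up numbers. We take a tiny clean “image” (just four values) and blend in a bit of noise at each step until it’s basically static. A real diffusion model is trained to run this film backwards, removing a little noise at each step until a brand-new image emerges.

```python
import random

clean_image = [0.9, 0.1, 0.8, 0.3]  # pretend these four numbers are a tiny image

# Forward diffusion (cartoon): blend in a little more noise at every step.
image = clean_image[:]
for step in range(10):
    image = [0.8 * p + 0.2 * random.gauss(0, 1) for p in image]
    # after enough steps, the original picture is basically unrecognizable static

print([round(p, 2) for p in image])

# Generating a new image means running this in reverse: start from pure static
# and let a trained network remove a little noise at each step, guided by the
# prompt, until a picture appears. Learning that "remove the noise" step is
# what takes millions of training images, so there's no toy version of it here.
```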

Most of today’s popular AI image generators use variations of Diffusion Models. They’ve become the go-to choice because they’re great at creating high-quality, diverse images. But GANs are still around and used in various applications.

The key thing to remember is this: Just like LLMs don’t truly understand language, these AI image and video generators don’t understand images the way we do. They’re not thinking, “Ah yes, it’s super normal for a manatee to ride a unicycle and there’s nothing weird about that at all.” (FWIW, they don’t seem to get my sarcasm either.) They’re recognizing patterns and making statistical guesses about which pixels should go where to match the prompt and look “right” based on their training.

The results can be mind-blowingly realistic or hilariously off-base. You might get a perfectly photorealistic manatee on a unicycle, or you might get a unicycle with flippers. It’s part of the charm – and challenge – of AI-generated visuals.

Remember, this is a simplified explanation. The actual tech is more complex, but the basic idea is the same: pattern recognition, statistical analysis, and a whole lot of computer processing power coming together to create images and videos that can fool, delight, or confuse us humans.

What are the implications?
First things first ... if you haven’t read Edition #6 on Deepfakes, do that. It lays out some of the more important implications.

When it comes to originality, ownership, and copyright, these are thorny issues we’re kinda sorta figuring out. It’s literally a topic for another day because we plan to address it in a future edition. Stay tuned.

For the purposes of today and image/video generation, one of the bigger implications has to do with the future of creativity. Any creative will tell you that these image and video tools have changed everything, or at the very least will change most things. But will they replace creatives? Maybe some, but probably not most. I just don’t see how that happens, and there are many, many creatives who believe the same.

I/we believe these tools will amplify the work in ways we’re just beginning to understand. And it’s already happening – just check out the video below in AI in the Wild.

Maybe more importantly for the rest of us, I believe it democratizes creativity. Take me for example. I don’t do Photoshop and I certainly can’t create a design from scratch. Heck, my three-year-old draws better stick figures than me. But I’ve used Midjourney to generate every one of the images you’ve seen in these newsletters. I’ve put together presentations where every image in my slides was generated using AI. And they’re good. I frequently ask my kids to share their wackiest ideas for visuals and run them through these tools. You can’t tell me that’s not expanding my kids’ capacity for creativity. Perhaps the biggest downside is that my three-year-old is 100% convinced that manatees ride unicycles – and he tells me that his next zoo visit is gonna prove it. 🤷🏻‍♂️

Let’s Learn Something

The rest of the newsletter is for paying members only. I explain on the website why we do this – just click here and scroll down to the FAQs. If you have any questions about it, feel free to reach out to me.

You can upgrade or downgrade at any point (remember, we’ve got a promotion running – your first two months are free and it’s already baked in when you upgrade), or you can stick with free until you’re ready to join. We’re just glad to have you part of the community. Check us out on Instagram for news, content, and fun stuff.

AI in the Wild

This section is for paid members only.

Consider joining!

It’s Play Time

Newcomers [AI is new to me]

This section is for paid members only.

Explorers [I’m comfortable with AI]

This section is for paid members only.

And that wraps up another edition of AI for the Rest of Us. If you created something, please share it with me and I’ll post it this weekend or next week on Instagram.

We hope you enjoyed this one, and as always, I welcome any and all feedback.

Until next time …
