OpenAI has announced the upcoming launch of DALL-E 3, the next version of its image-generation model. The announcement has been met with some fanfare. There was a really big jump in quality between DALL-E and DALL-E 2, and what OpenAI has shown off from DALL-E 3 is … better? More on this at the end. Still, the announcement got me thinking about what AI is, what an AI business actually looks like, and how far OpenAI is from making any money.
First, AI. It's a few different things at once. At a baseline, it's just math: long chains of linear algebra (mostly matrix multiplications) interleaved with simple nonlinear functions. If you really wanted to, you could write out the entirety of ChatGPT as a single math equation, though it would be mind-bogglingly long. Stephen Wolfram has written a great piece that really gets into how these models work, and I encourage you all to read it. For now, I'll define AI as probabilistic tools, trained on large amounts of data, that can replicate the outputs of complex tasks. The important takeaway is that we can train AI to perform tasks we don't have a good theory for, like writing text or identifying a cat. A cat can't be described in pure math, but a purely mathematical model (an AI) can identify one. There's also a level of randomness in AI, especially generative AI, which designers put there intentionally to encourage a diversity of results. It's worth calling out that training AI takes a lot of human labor: getting data into a form that AI can use is a largely manual process.
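To make the randomness point concrete, here is a minimal Python sketch, assuming a text model that has already produced scores for a handful of candidate next words. The words, the scores, and the `temperature` knob are illustrative assumptions, not OpenAI's actual implementation; temperature sampling is simply one common way designers dial this kind of randomness up or down. The model's output is just numbers, and a weighted random draw turns those numbers into a word.

```python
import math
import random

def softmax(scores, temperature=1.0):
    """Turn raw model scores into probabilities; temperature controls how peaked they are."""
    scaled = [s / temperature for s in scores]
    top = max(scaled)  # subtract the max before exponentiating, for numerical stability
    exps = [math.exp(s - top) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_word(words, scores, temperature=1.0, rng=random):
    """Pick one word at random, weighted by its probability."""
    probs = softmax(scores, temperature)
    return rng.choices(words, weights=probs, k=1)[0]

# Hypothetical scores a model might assign to the next word after "The cat sat on the".
words = ["mat", "sofa", "roof", "moon"]
scores = [3.0, 2.0, 1.0, -1.0]

rng = random.Random(42)  # fixed seed so the sketch is repeatable
# Low temperature: almost always the top-scoring word.
print([sample_word(words, scores, temperature=0.2, rng=rng) for _ in range(5)])
# Higher temperature: more variety in the results.
print([sample_word(words, scores, temperature=1.5, rng=rng) for _ in range(5)])
```

With these made-up scores, "mat" gets about 99% of the probability at a temperature of 0.2 but only about 54% at 1.5, so the same underlying math yields noticeably more varied output.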
That leads nicely into the second thing AI is: a tool to mediate human labor. This mediation mainly takes four forms: automating (replacing a human task with a machine), deskilling (making the task easier), fragmenting (breaking big, complex tasks into small, simple ones), and casualizing (turning a skilled profession into casual work that anyone can pick up). Automating and deskilling are most obvious with image-generation tools like DALL-E 3, whose whole branding revolves around anyone being able to create images that would otherwise have taken great skill. But DALL-E is also fragmenting the labor pool: companies like OpenAI rely on the exploitation of labor in the Global South to tag images.
Finally, AI is also a potential source of new knowledge. As I mentioned earlier, we can train AI to complete tasks without a firm theoretical understanding of those tasks. Many scientists and academics have already shown that AI can lead humans to new understanding: they first model the problem, find solutions, and then work backward from those solutions to a useful theory. This isn't particularly new (people have been doing it since at least 2016, with some of the earliest work in materials informatics), but it is very valuable for groups doing this kind of knowledge work.
The basic issue for OpenAI is that DALL-E 3 is not a product. What business is going to sit on top of image generation? The most obvious answer is a Shutterstock-style online image platform, but that's not a very attractive business, especially for the most hyped-up startup of the decade. Shutterstock brought in about USD 850 million in revenue last year. Moreover, its cost of goods sold (predominantly the cost of the images) was only about USD 350 million; the far larger part of its costs came from marketing, sales, and general administration. How much has OpenAI spent on DALL-E 3? It's difficult to say, but GPT-4 took more than USD 100 million to train, and it's hard to imagine OpenAI spent less on DALL-E 3, given how data- and labor-intensive image processing is. Even if OpenAI could take all of Shutterstock's business and do it much better, that still isn't a very attractive prize.
That's surely not the business plan OpenAI has in mind, but my point is that AI with splashy capabilities doesn't necessarily have a good value proposition. AI creates value in two ways: first, by automating labor (the roughly USD 350 million in image costs that OpenAI could theoretically save a Shutterstock-style business), and second, by generating knowledge (the value of accelerating R&D or pushing us toward new discoveries). Image-generation platforms don't really offer much of either kind of value. This lack of clear value leaves me skeptical that OpenAI is going to disrupt any real businesses any time soon.
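The Shutterstock arithmetic can be sketched in a few lines of Python, using only the approximate figures quoted above (in millions of USD). The helper name is hypothetical, and the sketch assumes, as the argument does, that cost of goods sold is the ceiling on what automating the images could save.

```python
# Approximate figures quoted in the text, in millions of USD.
SHUTTERSTOCK_REVENUE = 850
SHUTTERSTOCK_COGS = 350  # predominantly the cost of the images themselves

def automation_ceiling(revenue, cogs):
    """Hypothetical helper: the most labor cost automation could replace, and its share of revenue."""
    return cogs, cogs / revenue

saving, share = automation_ceiling(SHUTTERSTOCK_REVENUE, SHUTTERSTOCK_COGS)
# Revenue above cost of goods sold pays for marketing, sales, admin, and profit,
# none of which better image generation addresses on its own.
untouched = SHUTTERSTOCK_REVENUE - SHUTTERSTOCK_COGS

print(f"Most automation could save: ~USD {saving}M a year ({share:.0%} of revenue)")
print(f"Revenue that image generation does not touch: ~USD {untouched}M")
```

Under those assumptions, the automation ceiling is about 41% of revenue, leaving roughly USD 500 million of the business that better image generation alone would not change.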
All this assumes that the images produced by DALL-E 3 are as good as the stock images on Shutterstock. They are not. The DALL-E 3 announcement came with a bunch of images, which were presumably curated; I would certainly put my best foot forward when launching a new product. Still, even these curated images have many of the characteristic flaws of AI-generated images: strange details and errors that a human would never make. OpenAI says it has made the software much easier to use, and maybe it has, but the improvement is nowhere near the jump from DALL-E to DALL-E 2. I can only conclude that it was too difficult (or, more realistically, too expensive) to train the model to avoid these issues. Supposedly smart people like Eliezer Yudkowsky are calling for airstrikes on data centers because AI might improve too quickly, yet the best-funded AI startup in the world can't get AI to accurately draw a phone. We're all going to be just fine.