Teleprompt moves to DALL-E 3
The image generation game
Last year I built a little Wordle clone, with the premise that you had to pick words from a prompt instead of letters from a word.
Essentially, the game gives you an image and you are expected to recreate it by guessing the prompt, which is made up of 5 words (and maybe a small word that is given to you). Beyond that, the game plays mostly like Wordle: the words turn green/yellow/gray based on their correctness.
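To make the green/yellow/gray rule concrete, here is a minimal Python sketch of Wordle-style scoring applied to whole words. It is an illustration and not the game's actual code: the function name `score_guess`, the duplicate-word handling, and the example prompt words are all assumptions.

```python
from collections import Counter


def score_guess(guess, answer):
    """Score each guessed word as 'green', 'yellow', or 'gray'.

    Hypothetical helper: green = right word in the right slot,
    yellow = word is in the prompt but in a different slot,
    gray = word is not in the prompt (or already accounted for).
    """
    if len(guess) != len(answer):
        raise ValueError("guess and answer must have the same number of words")

    guess = [word.lower() for word in guess]
    answer = [word.lower() for word in answer]
    colors = ["gray"] * len(guess)

    # First pass: mark exact matches and count the answer words left over.
    remaining = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            colors[i] = "green"
        else:
            remaining[a] += 1

    # Second pass: a misplaced word is yellow only while copies remain.
    for i, g in enumerate(guess):
        if colors[i] == "gray" and remaining[g] > 0:
            colors[i] = "yellow"
            remaining[g] -= 1

    return colors


# Made-up prompt words, not one of the game's real prompts.
answer = ["small", "orange", "cat", "cardboard", "box"]
guess = ["orange", "small", "cat", "wooden", "box"]
print(score_guess(guess, answer))
# ['yellow', 'yellow', 'green', 'gray', 'green']
```

The two-pass approach keeps a repeated guess word from being marked yellow more times than it appears in the prompt.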
For example, the image below was created with a prompt in the form of “blank blank blank in a blank blank”.
I mostly wanted to play with some image generation AI and see if I could build a functional game in one day. It ended up taking two days, mostly because of the complexity involved in integrating Stable Diffusion.
In honor of yesterday’s Wordle (yes that’s my name), I spent part of my last day of 2023 making a significant upgrade to Teleprompt.
Early Architecture
Stable Diffusion is pretty impressive, and I don’t want to sound too negative about it because it is extremely powerful and flexible. For my purposes it may actually have been “too flexible,” and I ended up building more than I absolutely needed to. I probably should have looked harder for a hosted solution, but when I didn’t find one quickly I decided to self-host Stable Diffusion from my house and use message passing to integrate the game itself with the image generation. It worked, but architecturally it was not a good idea. Honestly, it was a “ship it” decision that was never meant to last long.
The architecture looked roughly like this:
Beautiful Images
Beyond the architectural oddness and brittleness of self-hosting, the images were less visually appealing than, for instance, Midjourney at the time. Often, the images looked weird or simply didn’t represent the prompt very well.
Unfortunately, Midjourney didn’t create an API that would have made integration easy. I think their use of Discord as their primary (only?) interface is pretty clever and you can’t argue with their success. Their images have been excellent but their lack of an API blocked me from using them.
However, when OpenAI released DALL-E 3 and made it available via API in November, I knew I had to make the switch. The images produced were better and now I could clean up my architecture.
The dragon image above was created with DALL-E 3. It is simply visually stunning. You may want to try to create it in the game before reading more.
Nuanced Images
Perhaps more importantly for game play, there is detail and nuance to the DALL-E 3 images that wasn’t captured in previous tools.
Look at the following variations of the above dragon image with slightly wrong prompt words. DALL-E does a great job of bringing out these small differences. With Stable Diffusion they were often not visible, which made the game harder to play.
Updated Architecture
Now that I had an API I could use to generate images, I could simplify the architecture significantly.
Not only did I no longer need to self-host Stable Diffusion, I could also get rid of the custom Python post-generation script that was responsible for publishing the file to AWS S3 and informing the web app that the generation was complete.
Instead, I’m using an Elixir/OTP abstraction called a GenServer to asynchronously call the (synchronous) DALL-E API. When the image generation is complete, their API returns and I can handle the response entirely within the app. I actually like the webapp + processor architecture from before, which would give better scaling characteristics, but this is simpler for now, and I would want to add a queue and other elements before splitting the project up.
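The general pattern here, wrapping a blocking call so the caller is not held up and handling the response in-process, can be sketched in a few lines. The sketch below is in Python with a thread pool, not Elixir, and is not the app's GenServer code. `generate_image` is a hypothetical stand-in that only simulates a slow synchronous API, and `ImageWorker`, `game_id`, and the placeholder URL are assumptions made for illustration.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def generate_image(prompt):
    """Hypothetical stand-in for a blocking image-generation API call."""
    time.sleep(0.1)  # simulate waiting for the remote service
    return {"prompt": prompt, "url": "https://example.invalid/image.png"}


class ImageWorker:
    """Runs blocking generation calls off the caller's thread."""

    def __init__(self, generate=generate_image, max_workers=2):
        self._generate = generate
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self.results = {}  # game_id -> ("ok", payload) or ("error", message)

    def request(self, game_id, prompt):
        # submit() returns immediately; the blocking call runs in the pool.
        future = self._pool.submit(self._generate, prompt)
        future.add_done_callback(lambda f: self._handle(game_id, f))
        return future

    def _handle(self, game_id, future):
        # Success and failure are both handled inside the app.
        try:
            self.results[game_id] = ("ok", future.result())
        except Exception as exc:
            self.results[game_id] = ("error", str(exc))

    def shutdown(self):
        # Waits for in-flight work (and its callbacks) to finish.
        self._pool.shutdown(wait=True)


if __name__ == "__main__":
    worker = ImageWorker()
    worker.request("game-1", "small orange cat in a cardboard box")
    print("request submitted; the caller was not blocked")
    worker.shutdown()
    print(worker.results["game-1"])
```

A GenServer adds things this sketch lacks, such as a mailbox, supervised restarts, and explicit state, so treat it only as a rough analogy for the "async wrapper around a synchronous API" idea.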
Next Steps
Future focus will be around game play.
Right now it isn’t immediately clear how the game works, the UI is terrible on mobile, and the game really needs a dictionary with automated synonym / stemming detection.
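As a rough idea of what synonym / stemming detection could look like, here is a deliberately naive Python sketch that normalizes a guess before comparing it with the prompt word. Nothing here is in the game today: the suffix rules, the tiny `SYNONYMS` table, and the function names are all assumptions, and a real stemmer and dictionary would do much better.

```python
# Hypothetical synonym table mapping a stemmed word to a canonical form.
SYNONYMS = {
    "large": "big",
    "huge": "big",
    "kitten": "cat",
}


def naive_stem(word):
    """Strip a few common English suffixes; crude by design."""
    word = word.lower().strip()
    for suffix in ("ing", "es", "s"):
        # Keep at least three characters so short words stay intact.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def canonical(word):
    """Stem the word, then map known synonyms to one canonical form."""
    stem = naive_stem(word)
    return SYNONYMS.get(stem, stem)


def same_word(guess, target):
    """True when two words normalize to the same canonical form."""
    return canonical(guess) == canonical(target)


print(same_word("Dragons", "dragon"))  # True
print(same_word("huge", "big"))        # True
print(same_word("sky", "sea"))         # False
```

Suffix stripping like this mishandles plenty of words (for example, "houses" and "house" end up with different stems), which is exactly why a proper dictionary is on the to-do list.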
But with all that said, I think the game is rather playable and now it has beautiful images.