Teleprompt moves to DALL-E 3

The image generation game

Byron Salty
4 min read · Jan 1, 2024

Last year I built a little Wordle clone, with the premise that you had to pick words from a prompt instead of letters from a word.

Essentially, the game gives you an image and you are expected to recreate it by guessing the prompt, which is made up of 5 words (and maybe a small word that is given to you). Beyond that, the game is mostly like Wordle: the words turn green/yellow/gray based on their correctness.
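As a rough illustration of that coloring rule, here is a minimal Python sketch. It is not the game's actual code: the function name `score_guess`, the example words, and the exact handling of repeated words (borrowed from how Wordle treats repeated letters) are all assumptions.

```python
from collections import Counter


def score_guess(guess, answer):
    """Return 'green', 'yellow', or 'gray' for each guessed word.

    Hypothetical helper: assumes Wordle-style rules applied to whole words.
    """
    if len(guess) != len(answer):
        raise ValueError("guess and answer must have the same number of words")

    guess = [word.lower() for word in guess]
    answer = [word.lower() for word in answer]
    colors = ["gray"] * len(guess)

    # First pass: mark exact matches and count the answer words left over.
    remaining = Counter()
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            colors[i] = "green"
        else:
            remaining[a] += 1

    # Second pass: a word in the wrong position is yellow, but only while
    # unmatched copies of it are still available in the answer.
    for i, g in enumerate(guess):
        if colors[i] != "green" and remaining[g] > 0:
            colors[i] = "yellow"
            remaining[g] -= 1

    return colors


# Made-up puzzle in the form "blank blank blank in a blank blank".
answer = ["small", "blue", "robot", "sunny", "garden"]
guess = ["blue", "small", "robot", "dark", "garden"]
print(score_guess(guess, answer))
```

The two-pass approach matters for repeated words: greens are claimed first so that a misplaced duplicate does not turn yellow when the answer has no spare copy of that word.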

For example, the image below was created with a prompt in the form of “blank blank blank in a blank blank”

Teleprompt puzzle for 2024-01-01

I mostly wanted to play with some image generation AI and see if I could build a functional game in one day. It ended up taking two days — mostly because of the complexity involved in integrating Stable Diffusion.

In honor of yesterday’s Wordle (yes that’s my name), I spent part of my last day of 2023 making a significant upgrade to Teleprompt.

Wordle: 2023-12-31

Early Architecture

Stable Diffusion is pretty impressive, and I don’t want to sound too negative about it because it is extremely powerful and flexible. For my purposes it may have actually fallen into the category of “too flexible,” and I ended up building more than I absolutely needed to. I probably should have looked harder for a hosted solution, but when I didn’t find one quickly I decided to self-host Stable Diffusion from my house and use message passing to integrate the game itself with the image generation. It worked, but architecturally it was not a good idea, and honestly it was a “ship it” decision that was never meant to last long.

The architecture looked roughly like this:

Architecture with Stable Diffusion

Beautiful Images

Beyond the architectural oddness and brittleness of self-hosting, the images were less visually appealing than, for instance, Midjourney at the time. Often, the images looked weird or simply didn’t represent the prompt very well.

Unfortunately, Midjourney didn’t create an API that would have made integration easy. I think their use of Discord as their primary (only?) interface is pretty clever and you can’t argue with their success. Their images have been excellent but their lack of an API blocked me from using them.

However, when OpenAI released DALL-E 3 and made it available via API in November, I knew I had to make the switch. The images produced were better and now I could clean up my architecture.

The dragon image above was created with DALL-E 3. It is simply visually stunning. You may want to try to create it in the game before reading more.

The following will spoil the dragon clue

Nuanced Images

Perhaps more importantly for game play, there is detail and nuance to the DALL-E 3 images that wasn’t captured in previous tools.

Look at the following variations of the above dragon image, each generated with slightly wrong prompt words. DALL-E does a great job of bringing out these small differences. Previously they were not visible in Stable Diffusion, which made the game play harder.

DALL-E captures the slight prompt variations well

Updated Architecture

Now that I had an API I could use to generate images, I could simplify the architecture significantly.

Today’s simplified architecture

I no longer needed to self-host Stable Diffusion, and I could also get rid of the custom Python post-generation script that was responsible for publishing the file to AWS S3 and informing the web app that the generation was complete.

Instead, I’m using an Elixir feature called a GenServer to asynchronously call the (synchronous) DALL-E API. When the image generation is complete, their API returns and I can handle the response entirely within the app. I actually like the webapp + processor architecture from before, which would give better scaling characteristics, but this is simpler for now, and I would want to add a queue and other elements before I split the project up.
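To show the general shape of that pattern, here is a rough Python analogue. It is not the Elixir GenServer from the app, and it does not call the real DALL-E API: `generate_image` is a hypothetical stand-in for any blocking image request, and `ImageWorker`, `puzzle_id`, and the result dictionary are names invented for this sketch.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def generate_image(prompt):
    """Hypothetical stand-in for a blocking image-generation API call."""
    time.sleep(0.1)  # simulate a slow, synchronous request
    return "https://example.invalid/" + prompt.replace(" ", "-") + ".png"


class ImageWorker:
    """Runs blocking generation calls in the background and records results."""

    def __init__(self, generate=generate_image, max_workers=1):
        self._generate = generate
        self._pool = ThreadPoolExecutor(max_workers=max_workers)
        self.results = {}  # puzzle_id -> status dictionary

    def request(self, puzzle_id, prompt):
        # Returns immediately; the blocking call runs on a worker thread.
        future = self._pool.submit(self._generate, prompt)
        future.add_done_callback(lambda f: self._on_done(puzzle_id, f))
        return future

    def _on_done(self, puzzle_id, future):
        # Handle the response inside the app once the API call returns.
        error = future.exception()
        if error is not None:
            self.results[puzzle_id] = {"status": "failed", "error": str(error)}
        else:
            self.results[puzzle_id] = {"status": "ready", "url": future.result()}

    def shutdown(self):
        # Wait for in-flight requests (and their callbacks) to finish.
        self._pool.shutdown(wait=True)


worker = ImageWorker()
worker.request("2024-01-01", "small blue robot in a sunny garden")
print("request submitted, the caller is not blocked")
worker.shutdown()
print(worker.results["2024-01-01"]["status"])
```

The caller hands off the slow request and carries on, and the completion handler lives in the same process, so no separate script or message bus is needed to announce that an image is ready.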

Next Steps

Future focus will be around game play.

Right now it’s hard to immediately understand how the game works, the UI is terrible on mobile, and the game really needs a dictionary with automated synonym / stemming detection.

But with all that said, I think the game is rather playable and now it has beautiful images.

If you liked this article, show your support with a clap!

Be sure to follow my Medium page for more articles like this and sign up to receive emails when I post so you never miss out.
