About

Published March 18th 2024, by Kevin Nørby Andersen

What happens if we build a tool that assists in generating new ideas when we are not working?

An idea factory

Idea Factory is a speculative AI tool by super ultra that generates images for projects while we are off the clock.

Every day, after human working hours, “the factory” spins up and generates images for each active project, based on a textual prompt.

The factory is open from 6pm-midnight, Copenhagen time. The next morning at 7am, this site rebuilds to show the latest factory output.
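As a minimal sketch of the opening hours described above, the check below tests whether a given moment falls inside the 6pm-midnight window in Copenhagen. The function names `copenhagenHour` and `isFactoryOpen` are hypothetical, and the actual scheduling mechanism used by the factory may well differ.

```typescript
// Returns the hour of day (0-23) in Copenhagen for a given instant.
function copenhagenHour(at: Date): number {
  const parts = new Intl.DateTimeFormat("en-GB", {
    timeZone: "Europe/Copenhagen",
    hour: "2-digit",
  }).formatToParts(at);
  const hour = parts.find((part) => part.type === "hour")?.value ?? "";
  // The modulo guards against runtimes that print midnight as "24".
  return Number(hour) % 24;
}

// Assumption: "open" means 18:00 up to, but not including, 00:00 local time.
function isFactoryOpen(at: Date): boolean {
  return copenhagenHour(at) >= 18;
}
```

Using the `Europe/Copenhagen` time zone name rather than a fixed UTC offset keeps the window correct across daylight saving changes.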

Background

One of the aspirations for super ultra as a design studio is to design and develop tools and products, across bits and atoms, that extend human ability to think and create.

Idea Factory comes from a thought experiment: what happens if there is a tool that assists in generating new ideas when we are not working?

This prototype was built to investigate how an asynchronous Generative AI-enabled tool might:

Aid in divergent thinking
Be an uninterrupting, creative sparring partner

It also served as a great opportunity to learn how to:

Work with image generation technologies
Build an automated tool
Deploy AI applications online

Tools That Extend Ability

Text and image generation technologies like ChatGPT and DALL-E-3 are powerful tools, but in the creative process, it is important to be critical and reflect on how the tools we use affect our ideas. These current tools have limitations that provide new opportunities:

Current tools like ChatGPT or Midjourney are synchronous, meaning they only work when you tell them to.
Current tools require your constant, full attention.

Tool building can be a powerful practice to change the way we think and create. At the time of writing, it is not obvious that the chatbot interface of ChatGPT, whether for text-to-text generation (using GPT) or text-to-image generation (using DALL-E-3), is the best interface for creative work.

We need to explore new interfaces, modalities and mental models for how to make these generative AI models serve human thought and expression, so that we build tools that augment us rather than tools that replace us.

How It Works

Idea Factory consists of two parts:

Factory bot, for image generation and uploading
Factory website, for displaying projects and associated images

Between the two, a shared database and cloud storage are used to store and retrieve project data and images. For the database, we use MongoDB with a Prisma schema. For cloud storage, we use Google Cloud Storage.

Bot

Factory bot is a Node.js script that runs on a schedule. It scans the project database, and for each active project it generates an image based on that project's latest prompt. It uses OpenAI's DALL-E-3 to generate images.
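The loop described above could be sketched roughly as follows. This is not the actual bot code: the `Project` and `Prompt` shapes, the `FactoryDeps` interface and the function names are all assumptions made for illustration. The database, image generation and upload steps are passed in as functions, so the sketch makes no claims about the real Prisma, OpenAI or Google Cloud Storage calls.

```typescript
// Hypothetical data shapes; the real Prisma schema is not shown in this article.
interface Prompt {
  text: string;
  createdAt: Date;
}

interface Project {
  id: string;
  name: string;
  active: boolean;
  prompts: Prompt[];
}

// Assumed dependencies, injected so the loop can run without network access.
interface FactoryDeps {
  listProjects: () => Promise<Project[]>;
  generateImage: (prompt: string) => Promise<Uint8Array>;
  uploadImage: (projectId: string, image: Uint8Array, createdAt: Date) => Promise<void>;
}

// Picks the most recently created prompt, or undefined if the project has none.
function latestPrompt(project: Project): Prompt | undefined {
  return [...project.prompts].sort(
    (a, b) => b.createdAt.getTime() - a.createdAt.getTime(),
  )[0];
}

// Generates one image per active project and returns the ids that succeeded.
async function runFactory(deps: FactoryDeps, now: Date = new Date()): Promise<string[]> {
  const processed: string[] = [];
  for (const project of await deps.listProjects()) {
    if (!project.active) continue;
    const prompt = latestPrompt(project);
    if (!prompt) continue;
    try {
      const image = await deps.generateImage(prompt.text);
      await deps.uploadImage(project.id, image, now);
      processed.push(project.id);
    } catch (error) {
      // One failing project should not stop the rest of the run.
      console.error(`Generation failed for ${project.name}:`, error);
    }
  }
  return processed;
}
```

In a real run, `generateImage` would wrap the image generation API and `uploadImage` would write to cloud storage. Keeping them behind plain functions also makes it easy to swap in another image model later.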

Website

Factory website is a SvelteKit app that rebuilds every day at 7am. It fetches the latest images from the shared cloud storage and displays them in a grid. For each project, images are displayed in an Instagram-style carousel, grouped by day.
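To illustrate the grouping by day, here is a small sketch. The `FactoryImage` and `DayGroup` shapes and the function names are hypothetical, and bucketing by Copenhagen calendar day is an assumption based on the factory's opening hours rather than a description of the actual site code.

```typescript
// Hypothetical shape of an image record fetched from storage.
interface FactoryImage {
  url: string;
  createdAt: Date;
}

interface DayGroup {
  day: string; // YYYY-MM-DD, Copenhagen calendar day
  images: FactoryImage[];
}

// Formats an instant as a YYYY-MM-DD key in the given time zone.
function dayKey(date: Date, timeZone: string = "Europe/Copenhagen"): string {
  const parts = new Intl.DateTimeFormat("en-GB", {
    timeZone,
    year: "numeric",
    month: "2-digit",
    day: "2-digit",
  }).formatToParts(date);
  const get = (type: string): string =>
    parts.find((part) => part.type === type)?.value ?? "";
  return `${get("year")}-${get("month")}-${get("day")}`;
}

// Buckets images per day and returns the newest day first.
function groupByDay(images: FactoryImage[]): DayGroup[] {
  const groups = new Map<string, FactoryImage[]>();
  for (const image of images) {
    const key = dayKey(image.createdAt);
    const bucket = groups.get(key) ?? [];
    bucket.push(image);
    groups.set(key, bucket);
  }
  return Array.from(groups.entries())
    .sort(([a], [b]) => b.localeCompare(a))
    .map(([day, dayImages]) => ({ day, images: dayImages }));
}
```

Each `DayGroup` could then feed one carousel, with the images inside a day kept in the order they were fetched.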

Reflections

Divergent Thinking

Idea Factory was built at a time when multiple projects were in the works, and it made sense to have lots of images generated. For wide, divergent exploration, the tool was helpful. For example, for the Sonic Jewelry project, switching between styles (realistic, illustrative, sketch) sparked different ideas.

It seems that an illustrative/sketchy style gives better results when generating an object that doesn't exist in the world yet (like electronic jewelry), which makes sense given that sketching and illustration are styles that often precede making things real.

Convergent Thinking

Unfortunately, the tool proved less useful as ideas started to converge around a direction. At least at the time of writing this, the DALL-E-3 API that is used for the image generation has significant limitations compared to the ChatGPT version and competitor products:

No inpainting, i.e. it is not possible to paint over the generated images to make targeted edits
The API doesn’t return the “seed ID” of the generated image, a unique identifier that can be referenced later on. E.g. “Generate an image like <SEED-ID> but in the style of line art”
The API does not take an image as input, so all prompts have to be written

Given the rapid development of models like Stable Diffusion, it should be possible to extend the tool with these capabilities.

Future Work

Given these reflections, we outline possible future work for Idea Factory:

Images as moodboards
The ability to render multiple images on a larger canvas in the style of moodboards
Synthetic prompts
It should be possible to chain the image generation with text generation. For example, “a piece of electronic jewelry” as a prompt could be deconstructed into nouns and verbs and extended with different adjectives that would then turn into new prompts.
Nudging
Enable the ability to select images and mark them “more like this”/“less like that”, and use that as input for the next round of image generation
Image-to-image
Prompt with an image as input rather than with text. This is a technique already available in many image generation models
Seed ID
Every image should have a unique ID that can be used to reference it in the future. Example: “Remember 9808458973? Generate something like that, but with a nice sunset behind it”
Live painting
Use something like TLDraw to enable painting in the tool, instead of having to go into other tools
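As a rough illustration of the synthetic prompts idea above, the sketch below combines a subject with adjectives and styles to produce prompt variations. It swaps the proposed text generation step for fixed word lists, so it only shows the combination step. The function name `expandPrompt` and the adjectives are made up for the example.

```typescript
// Builds prompt variations by pairing every adjective with every style.
// Assumption: adjectives and styles come from fixed lists; in the idea
// described above they would be produced by a text generation model.
function expandPrompt(subject: string, adjectives: string[], styles: string[]): string[] {
  const prompts: string[] = [];
  for (const adjective of adjectives) {
    for (const style of styles) {
      prompts.push(`${adjective} ${subject}, ${style} style`);
    }
  }
  return prompts;
}

const variations = expandPrompt(
  "piece of electronic jewelry",
  ["delicate", "modular"],
  ["realistic", "illustrative", "sketch"],
);
// variations holds 6 prompts, one per adjective and style pair.
```

Each resulting string could be stored as a new project prompt and picked up on the factory's next run.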