Creating Beyond Words: Generative AI as Creative Instruments and Evolving Narratives
In recent years, the remarkable progress in text-to-image AI models has captured widespread attention and sparked mainstream adoption of these innovative co-creative interfaces. This has led to a mix of excitement, curiosity, and concern. Concurrently, the flourishing open-source development of text-to-image models has democratized access to AI tools, extending beyond experts, tech giants, and professional technologists.


This 7-week course will explore the landscape of text-to-image AIs, focusing on well-known models like Stable Diffusion and the latest Flux model. We'll examine their potential for new modes of content creation and how they can help us reassess our language patterns.


We'll concentrate on developing effective prompting and image-making practices, explore various image synthesis skills related to text-to-image AIs, train our own models to create custom visuals, and learn to generate animations from text.
Additionally, we'll discuss how these tools can integrate into the workflows of artists and technologists, their potential benefits for researchers, and the important considerations and precautions when creating with these AIs.
The class is structured into two modules:
A 4-week foundation module that establishes a solid base for using text-to-image models and introduces essential tools.
A 3-week advanced module that delves into customization techniques.
Meet your instructor
Tong Wu and Yuguang Zhang
Tong Wu and Yuguang Zhang are a new media artist duo raised on the Internet. They now co-exist with their digital doubles in Brooklyn, New York, and the Chrome browser. Their joint artistic practice, which incorporates fine-tuned generative AI models, immersive installations, web-based interactions, 3D animations, community workshops and performance, explores the dynamic relationships we could cultivate with autonomous intelligent systems, and the societal and cultural shifts that accompany these interactions.
Since beginning their joint practice in 2021, they have created and led community workshop series focusing on generative AI as a creative and artistic tool at organizations such as CultureHub Art and Technology Community, Yale University, School of Visual Arts, Pratt Institute, BlackStar Film Festival, University of Connecticut, University of Illinois Urbana-Champaign, and Emerson College, among others.
Week 1

Introduction to Text-to-Image AIs
Week 1 lays the foundation for the whole course. It introduces the history of image synthesis AI, how text-to-image models came about, the building-block ML models and components needed to make a text-to-image model, the architecture of the open-source model Stable Diffusion, and how to generate an image with Stable Diffusion (see the code sketch after the outline below).
Week 1
Outline:
- Overview of Image Synthesis AIs and Text-to-Image AIs
- Intro to Open Source Text-to-Image Model – Stable Diffusion
- What is a diffusion process and what is a diffusion model
- Stable Diffusion and its predecessor Latent Diffusion
- Anatomy of Stable Diffusion: what it consists of and what each component does
- The Text-Encoder
- The Autoencoder: VAE
- The U-Net
- Set Up and Run Stable Diffusion Locally
- Generating a Single Image with Stable Diffusion
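To preview what "Generating a Single Image with Stable Diffusion" looks like in practice, here is a minimal sketch using the Hugging Face diffusers library. The model ID, prompt, and parameter values are illustrative assumptions, not a fixed recipe; any compatible Stable Diffusion checkpoint you have set up locally will work the same way.

```python
# Minimal sketch: generate one image with Stable Diffusion via diffusers.
# Model ID, prompt, and parameters are illustrative placeholders.
import torch
from diffusers import StableDiffusionPipeline

# Load an SD 1.5-style checkpoint; swap in any compatible model you have access to.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")  # or "mps" / "cpu" depending on your hardware

# Fixing the seed makes the run reproducible, which helps when comparing parameters.
generator = torch.Generator("cuda").manual_seed(42)

image = pipe(
    prompt="a watercolor painting of a lighthouse at dusk",
    num_inference_steps=30,   # how many denoising steps the sampler takes
    guidance_scale=7.5,       # how strongly the prompt steers the generation
    generator=generator,
).images[0]

image.save("week1_image.png")
```

Recording the seed, step count, guidance scale, and model checkpoint alongside each output makes the comparisons in this week's assignment much easier.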
Assignment
Try generating 3–5 images using Stable Diffusion and pick your favorite. Keep a note of every parameter used to generate the image, as well as the model information. Write down what you like about it, what you don't like about it, and how you hope it could be improved.
Week 2

Stable Diffusion Params & Prompting 101
Week 2 will take a look at different SD models and the key diffusion params that control the image synthesis process.
We will then explore the text space of these models: specifically, how to guide generation in particular directions by changing only the text prompt / text embeddings while keeping all other settings untouched (a short code sketch follows the outline below).
Week 2
Outline:
- Understanding Model Versions
- SD 1.5
- SD 2
- SDXL
- SDXL Turbo / LCM
- SDXL Turbo / SD Hyper
- SD 3
- Flux
- Understanding the Params
- Samplers, Denoising Schedulers
- Prompting 101 in WebUI
- Common Prompting Techniques
- Prompt Weights and Prompt Editing
- Developing Your Prompting Practices
- Learning to Prompt with the Help of Machines
- Text Encoders (CLIP / OpenCLIP)
- CLIP Interrogator
- Negative Prompts
- Text Embeddings and How to Use Them
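To make the "change only the text" exercise concrete, here is a minimal sketch of holding the seed, sampler settings, and negative prompt fixed while varying the prompt, again using the diffusers library. The model ID, prompts, and parameter values are illustrative assumptions.

```python
# Minimal sketch: explore the text space while everything else stays fixed.
# Only the prompt changes between runs; seed and settings are held constant.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

base_prompt = "portrait of an astronaut, studio lighting"
variants = [
    base_prompt,
    base_prompt + ", oil painting, impasto brush strokes",
    base_prompt + ", 35mm film photo, shallow depth of field",
]

for i, prompt in enumerate(variants):
    # Re-seed before every call so only the text differs between images.
    generator = torch.Generator("cuda").manual_seed(1234)
    image = pipe(
        prompt=prompt,
        negative_prompt="blurry, low quality, deformed hands",
        num_inference_steps=30,
        guidance_scale=7.5,
        generator=generator,
    ).images[0]
    image.save(f"week2_variant_{i}.png")
```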
Assignment
Download a new trained/tuned base model and try 5 generations by re-using a prompt from week 1. Compare the differences and try to describe them in words.
Make one or more variations or improvements on each of these 5 generations using only the text prompt/embedding, keeping all other parameters unchanged as much as possible. Explain why you made each change and whether it worked out as intended. You may also use public embeddings.
Optional: generate a new set of images using the new techniques.
Week 3

Working in the Image Space
Week 3 will dive into the image space of text-to-image AIs, take a closer look at the diffusion process that actually generates the image from pure noise, and introduce different methods we can use to intervene in the generation process via image inputs (see the sketch after the outline below).
We’ll also investigate how to use design / sketching tools to create base / helper images to guide the generation process.
Week 3
Outline:
- How Image-to-Image Works
- Denoising Strength
- Generating an Image Mid-way
- How to Properly Configure Image-to-Image Generation
- Modifying the Image
- What is Inpainting and How Does It Work
- What is Outpainting and How Does It Work
- Guide the Generation with an Additional Helper Image
- Using Masks
- Sketch Masks
- Inpainting Models
- Other Img2Img Scripts
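Here is a minimal sketch of image-to-image generation with diffusers that sweeps the denoising strength, so you can see how much of the input image survives at each setting. The model ID, input file name, and parameter values are illustrative assumptions.

```python
# Minimal sketch: img2img with diffusers. The input image is partially noised
# and then denoised toward the prompt; `strength` controls how much of the
# original image is kept.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# A base / helper image, e.g. a rough sketch made in a design tool.
init_image = Image.open("sketch.png").convert("RGB").resize((512, 512))

for strength in (0.3, 0.6, 0.9):
    generator = torch.Generator("cuda").manual_seed(7)
    image = pipe(
        prompt="a cozy cabin in a snowy forest, concept art",
        image=init_image,
        strength=strength,        # low keeps the input, high nearly ignores it
        num_inference_steps=40,
        guidance_scale=7.5,
        generator=generator,
    ).images[0]
    image.save(f"week3_strength_{strength}.png")
```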
Assignment
Choose three different img2img techniques from this week and re-generate your image from week 1 or week 2 by using it as the image input, keeping all other parameters untouched if possible. Explain why you made each change and whether it worked out as intended.
Optional: try iteratively feeding your img2img output back in as the new input for a few rounds of generation. See what you get, and whether you can reduce artifacts by combining techniques from weeks 2 and 3.
Week 4

Platforms and Tools
In Week 4 we will look at tools and platforms beyond the official Stable Diffusion implementation that provide alternative image synthesis capabilities using their own text-to-image AIs. We'll compare their differences, highlight each platform's unique features, and introduce tools to evaluate and improve image generation.
Week 4
Outline:
- Platforms / Tools
- DALL-E 3
- Midjourney
- DreamStudio / ClipDrop
- Lexica.art

Also, we'll look into the libraries/scripts/extensions available in SD Web UI and see how they can be used to improve the image generation workflow, such as:
- Image Info
- Aspect Ratio
- Infinite Image Browser
- Regional Prompter
- ADetailer

And other platforms such as Forge to check out Flux models:
- Flux - NF4
Week 5

Advanced Image Control
In Week 5 we will dive deep into two of the most important components that provide image-based guidance during generation: ControlNet and IP-Adapters (see the code sketch after the outline below).
Week 5
Outline:
- How ControlNet Works
- ControlNet Models
- Canny
- SoftEdge
- OpenPose
- Reference-only
- Multi-ControlNet
- Using ControlNet for Inpainting
- ControlNet Model Versions
- IP-Adapters
- How IP-Adapters Work
- IP-Adapter Image Prompt
- IP-Adapter Advanced Weighting
- IP-Adapter Model Versions
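Here is a minimal sketch of Canny-based ControlNet conditioning with diffusers: an edge map extracted from a reference image constrains the composition while the prompt sets the content and style. The model IDs, file names, and parameter values are illustrative assumptions.

```python
# Minimal sketch: condition generation on a Canny edge map with ControlNet.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Build the edge map that will constrain the composition.
source = np.array(Image.open("reference.png").convert("RGB"))
gray = cv2.cvtColor(source, cv2.COLOR_RGB2GRAY)
edges = cv2.Canny(gray, 100, 200)
control_image = Image.fromarray(np.stack([edges, edges, edges], axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="a stained-glass window of a fox",
    image=control_image,                 # the edge map guides the structure
    controlnet_conditioning_scale=1.0,   # lower values loosen the constraint
    num_inference_steps=30,
    generator=torch.Generator("cuda").manual_seed(3),
).images[0]
image.save("week5_controlnet.png")
```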
Assignment
Make 5 generations using one or more ControlNet models, and 5 generations using IP-Adapter models. Document the before / after, why you chose these models, and whether they worked as intended.
Week 6

Image Model Customization
Week 6 will focus on training a text-to-image LoRA model to create coherent visual subjects and styles. We'll explore training / fine-tuning methods that target different components of the model and see what each is good at and when to use it (a sketch of applying a trained LoRA follows the outline below).
Week 6
Outline:
- Understanding LoRA Models
- Different Types of LoRA
- Using LoRA Models
- Dataset Curation
- Requirements for an Effective Training Image Dataset
- Strategies for Creating Datasets for Different Purposes
- Re-training the Stable Diffusion Model
- Using Tools
- Using Scripts
- DreamBooth & LoRA
- DreamBooth Fine-tuning
- LoRA Fine-tuning
- Aesthetic Gradients
- What is an Aesthetic Gradient
- How to Create Your Own Aesthetic Gradients and Apply Them
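Once a LoRA has been trained, applying it in diffusers is lightweight: load the adapter weights on top of the base pipeline and include the trigger token in the prompt. The checkpoint path, file name, and trigger token below are placeholders for whatever your own training run produces; this is a sketch, not a fixed recipe.

```python
# Minimal sketch: apply a trained LoRA on top of a base Stable Diffusion model.
# "./my_lora", the weight file name, and "sks_subject" are placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load the LoRA weights produced by your training run.
pipe.load_lora_weights("./my_lora", weight_name="my_subject_lora.safetensors")

image = pipe(
    prompt="a photo of sks_subject riding a bicycle",  # include your trigger token
    num_inference_steps=30,
    guidance_scale=7.5,
    cross_attention_kwargs={"scale": 0.8},  # LoRA strength; 1.0 applies it fully
    generator=torch.Generator("cuda").manual_seed(11),
).images[0]
image.save("week6_lora.png")
```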
Assignment
You will be paired up in groups of two.
Experiment with training a LoRA model for your classmate. Document your training / tuning process and make 5 generations.
Week 7

Animation
In Week 7 we will explore different ways to stitch multiple images together and create animations / videos. We'll also look at some text-to-video tools and compare their differences (a simple frame-blending sketch follows the outline below).
Week 7
Outline:
- How to Make AI Animations
- Ways of Connecting Frames
- Blend Frames
- Interpolation
- Warping
- Creating and Managing the Generation Sequence
- Camera Movements
- 2D vs 3D
- Timing the Camera Movement
- Txt2Vid Animation with Deforum
- Txt2Vid Animation with AnimateDiff
- Other Text-to-Video Tools
- Luma / Runway / Kling
- CogVideoX
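As a first taste of connecting frames, here is a minimal sketch of the simplest approach from the outline: blending (cross-fading) between two generated keyframes with Pillow to produce in-between frames. The file names and frame count are illustrative; latent-space interpolation, optical-flow warping, and dedicated tools like Deforum or AnimateDiff build on the same "fill the gap between keyframes" idea with much better temporal coherence.

```python
# Minimal sketch: cross-fade between two generated keyframes to create
# in-between frames. File names and frame count are placeholders.
from PIL import Image

frame_a = Image.open("keyframe_a.png").convert("RGB")
frame_b = Image.open("keyframe_b.png").convert("RGB").resize(frame_a.size)

num_inbetweens = 8
for i in range(num_inbetweens + 1):
    t = i / num_inbetweens          # 0.0 -> pure frame_a, 1.0 -> pure frame_b
    blended = Image.blend(frame_a, frame_b, t)
    blended.save(f"week7_frame_{i:03d}.png")
```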