Creating Beyond Words: Generative AI as Creative Instruments and Evolving Narratives
In recent years, the remarkable progress in text-to-image AI models has captured widespread attention and sparked mainstream adoption of these innovative co-creative interfaces. This has led to a mix of excitement, curiosity, and concern. Concurrently, the flourishing open-source development of text-to-image models has democratized access to AI tools, extending beyond experts, tech giants, and professional technologists.


This 7-week course will explore the landscape of text-to-image AIs, focusing on well-known models like Stable Diffusion and the latest Flux model. We'll examine their potential for new modes of content creation and how they can help us reassess our language patterns.


We'll concentrate on developing effective prompting and image-making practices, explore various image synthesis skills related to text-to-image AIs, train our own models to create custom visuals, and learn to generate animations from text.
Additionally, we'll discuss how these tools can integrate into the workflows of artists and technologists, their potential benefits for researchers, and the important considerations and precautions when creating with these AIs.
The class is structured into two modules:
A 4-week foundation module that establishes a solid base for using text-to-image models and introduces essential tools.
A 3-week advanced module that delves into customization techniques.
Meet your instructor
Tong Wu and Yuguang Zhang
Tong Wu and Yuguang Zhang are a new media artist duo raised on the Internet. They now co-exist with their digital doubles in Brooklyn, New York, and the Chrome browser. Their joint artistic practice, which incorporates fine-tuned generative AI models, immersive installations, web-based interactions, 3D animations, community workshops and performance, explores the dynamic relationships we could cultivate with autonomous intelligent systems, and the societal and cultural shifts that accompany these interactions.
Since beginning their joint practice in 2021, they have created and led community workshop series focusing on generative AI as a creative and artistic tool at organizations such as CultureHub Art and Technology Community, Yale University, School of Visual Arts, Pratt Institute, BlackStar Film Festival, University of Connecticut, University of Illinois Urbana-Champaign, and Emerson College, among others.
Week 1

Introduction to Text-to-Image AIs
Week 1 lays the foundation for the whole course. It introduces the history of image synthesis AI, how text-to-image models came about, the building-block ML models and components needed to make a text-to-image model, the architecture of the open-source model Stable Diffusion, and how to generate an image with Stable Diffusion (see the code sketch after the outline below).
Week 1
Outline:
- Overview of Image Synthesis AIs and Text-to-Image AIs
- Intro to Open Source Text-to-Image Model – Stable Diffusion
- What is a diffusion process and what is a diffusion model
- Stable Diffusion and its predecessor Latent Diffusion
- Anatomy of Stable Diffusion: what it consists of and what each component does
- The Text-Encoder
- The Autoencoder: VAE
- The U-Net
- Set Up and Run Stable Diffusion Locally
- Generating a Single Image with Stable Diffusion
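To preview what "Generating a Single Image with Stable Diffusion" looks like in practice, here is a minimal sketch using the Hugging Face diffusers library. The model ID, prompt, and parameter values are illustrative assumptions, not a fixed recipe; any compatible Stable Diffusion checkpoint you have set up locally will work the same way.

```python
# Minimal sketch: generate one image with Stable Diffusion via diffusers.
# Model ID, prompt, and parameters are illustrative placeholders.
import torch
from diffusers import StableDiffusionPipeline

# Load an SD 1.5-style checkpoint; swap in any compatible model you have access to.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
).to("cuda")  # or "mps" / "cpu" depending on your hardware

# Fixing the seed makes the run reproducible, which helps when comparing parameters.
generator = torch.Generator("cuda").manual_seed(42)

image = pipe(
    prompt="a watercolor painting of a lighthouse at dusk",
    num_inference_steps=30,   # how many denoising steps the sampler takes
    guidance_scale=7.5,       # how strongly the prompt steers the generation
    generator=generator,
).images[0]

image.save("week1_image.png")
```

Recording the seed, step count, guidance scale, and model checkpoint alongside each output makes the comparisons in this week's assignment much easier.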
Assignment
Try generating 3–5 images using Stable Diffusion and pick your favorite. Keep a note of every parameter used to generate the image, as well as the model information. Write down what you like about it, what you don't like about it, and how you hope it could be improved.
Week 2

Stable Diffusion Params & Prompting 101
Week 2 will take a look at different SD models and the key diffusion params that control the image synthesis process.
We will then explore the text space of these models: specifically, how to guide generation in particular directions by changing only the text prompt / text embeddings while keeping all other settings untouched (a short code sketch follows the outline below).
Week 2
Outline:
- Understanding Model Versions
- SD 1.5
- SD 2
- SDXL
- SDXL Turbo / LCM
- SDXL Turbo / SD Hyper
- SD 3
- Flux
- Understanding the Params
- Samplers, Denoising Schedulers
- Prompting 101 in WebUI
- Common Prompting Techniques
- Prompt Weights and Prompt Editing
- Developing Your Prompting Practices
- Learning to Prompt with the Help of Machines
- Text Encoders (CLIP / OpenCLIP)
- CLIP Interrogator
- Negative Prompts
- Text Embeddings and How to Use Them
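To make the "change only the text" exercise concrete, here is a minimal sketch of holding the seed, sampler settings, and negative prompt fixed while varying the prompt, again using the diffusers library. The model ID, prompts, and parameter values are illustrative assumptions.

```python
# Minimal sketch: explore the text space while everything else stays fixed.
# Only the prompt changes between runs; seed and settings are held constant.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

base_prompt = "portrait of an astronaut, studio lighting"
variants = [
    base_prompt,
    base_prompt + ", oil painting, impasto brush strokes",
    base_prompt + ", 35mm film photo, shallow depth of field",
]

for i, prompt in enumerate(variants):
    # Re-seed before every call so only the text differs between images.
    generator = torch.Generator("cuda").manual_seed(1234)
    image = pipe(
        prompt=prompt,
        negative_prompt="blurry, low quality, deformed hands",
        num_inference_steps=30,
        guidance_scale=7.5,
        generator=generator,
    ).images[0]
    image.save(f"week2_variant_{i}.png")
```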
Assignment
Download a new trained/tuned base model and try 5 generations by re-using a prompt from week 1. Compare the differences and try to describe them in words.
Make one or more variations or improvements on each of these 5 generations using only the text prompt/embedding, keeping all other parameters unchanged as much as possible. Explain why you made each change and whether it worked out as intended. You may also use public embeddings.
Optional: generate a new set of images using the new techniques.
Week 3

Working in the Image Space
Week 3 will dive into the image space of text-to-image AIs, take a closer look at the diffusion process that actually generates the image from pure noise, and introduce different methods we can use to intervene in the generation process via image inputs (see the sketch after the outline below).
We’ll also investigate how to use design / sketching tools to create base / helper images to guide the generation process.
Week 3
Outline:
- How Image-to-Image Works
- Denoising Strength
- Generating an Image Mid-way
- How to Properly Configure Image-to-Image Generation
- Modifying the Image
- What is Inpainting and How Does It Work
- What is Outpainting and How Does It Work
- Guide the Generation with an Additional Helper Image
- Using Masks
- Sketch Masks
- Inpainting Models
- Other Img2Img Scripts
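Here is a minimal sketch of image-to-image generation with diffusers that sweeps the denoising strength, so you can see how much of the input image survives at each setting. The model ID, input file name, and parameter values are illustrative assumptions.

```python
# Minimal sketch: img2img with diffusers. The input image is partially noised
# and then denoised toward the prompt; `strength` controls how much of the
# original image is kept.
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image

pipe = StableDiffusionImg2ImgPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# A base / helper image, e.g. a rough sketch made in a design tool.
init_image = Image.open("sketch.png").convert("RGB").resize((512, 512))

for strength in (0.3, 0.6, 0.9):
    generator = torch.Generator("cuda").manual_seed(7)
    image = pipe(
        prompt="a cozy cabin in a snowy forest, concept art",
        image=init_image,
        strength=strength,        # low keeps the input, high nearly ignores it
        num_inference_steps=40,
        guidance_scale=7.5,
        generator=generator,
    ).images[0]
    image.save(f"week3_strength_{strength}.png")
```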
Assignment
Choose three different img2img techniques from this week and re-generate your image from week 1 or week 2 by using it as the image input, keeping all other parameters untouched if possible. Explain why you made each change and whether it worked out as intended.
Optional: try iteratively feeding your img2img output back in as the new input for a few rounds of generation. See what you get, and whether you can reduce artifacts by combining techniques from weeks 2 and 3.
Week 4

Platforms and Tools
In Week 4 we will look at tools and platforms beyond the official Stable Diffusion implementation that provide alternative image synthesis capabilities using their own text-to-image AIs. We'll compare their differences, highlight each platform's unique features, and introduce tools to evaluate and improve image generation.
Week 4
Outline:
- Platforms / Tools
- DALL-E 3
- Midjourney
- DreamStudio / ClipDrop
- Lexica.art

Also, we'll look into the libraries/scripts/extensions available in SD Web UI and see how they can be used to improve the image generation workflow, such as:
- Image Info
- Aspect Ratio
- Infinite Image Browser
- Regional Prompter
- ADetailer

And other platforms such as Forge to check out Flux models:
- Flux - NF4
Week 5

Advanced Image Control
In Week 5 we will dive deep into two of the most important components that provide image-based guidance during generation: ControlNet and IP-Adapters (see the code sketch after the outline below).
Week 5
Outline:
- How ControlNet Works
- ControlNet Models
- Canny
- SoftEdge
- OpenPose
- Reference-only
- Multi-ControlNet
- Using ControlNet for Inpainting
- ControlNet Model Versions
- IP-Adapters
- How IP-Adapters Work
- IP-Adapter Image Prompt
- IP-Adapter Advanced Weighting
- IP-Adapter Model Versions
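Here is a minimal sketch of Canny-based ControlNet conditioning with diffusers: an edge map extracted from a reference image constrains the composition while the prompt sets the content and style. The model IDs, file names, and parameter values are illustrative assumptions.

```python
# Minimal sketch: condition generation on a Canny edge map with ControlNet.
import cv2
import numpy as np
import torch
from PIL import Image
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline

# Build the edge map that will constrain the composition.
source = np.array(Image.open("reference.png").convert("RGB"))
gray = cv2.cvtColor(source, cv2.COLOR_RGB2GRAY)
edges = cv2.Canny(gray, 100, 200)
control_image = Image.fromarray(np.stack([edges, edges, edges], axis=-1))

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-canny", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    controlnet=controlnet,
    torch_dtype=torch.float16,
).to("cuda")

image = pipe(
    prompt="a stained-glass window of a fox",
    image=control_image,                 # the edge map guides the structure
    controlnet_conditioning_scale=1.0,   # lower values loosen the constraint
    num_inference_steps=30,
    generator=torch.Generator("cuda").manual_seed(3),
).images[0]
image.save("week5_controlnet.png")
```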
Assignment
Make 5 generations using one or more ControlNet models, and 5 generations using IP-Adapter models. Document the before / after, why you chose these models, and whether they worked as intended.
Week 6

Image Model Customization
Week 6 will focus on training a text-to-image LoRA model to create coherent visual subjects and styles. We'll explore training / fine-tuning methods that target different components of the model and see what each is good at and when to use it (a sketch of applying a trained LoRA follows the outline below).
Week 6
Outline:
- Understanding LoRA Models
- Different Types of LoRA
- Using LoRA Models
- Dataset Curation
- Requirements for an Effective Training Image Dataset
- Strategies for Creating Datasets for Different Purposes
- Re-training the Stable Diffusion Model
- Using Tools
- Using Scripts
- DreamBooth & LoRA
- DreamBooth Fine-tuning
- LoRA Fine-tuning
- Aesthetic Gradients
- What is an Aesthetic Gradient
- How to Create Your Own Aesthetic Gradients and Apply Them
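Once a LoRA has been trained, applying it in diffusers is lightweight: load the adapter weights on top of the base pipeline and include the trigger token in the prompt. The checkpoint path, file name, and trigger token below are placeholders for whatever your own training run produces; this is a sketch, not a fixed recipe.

```python
# Minimal sketch: apply a trained LoRA on top of a base Stable Diffusion model.
# "./my_lora", the weight file name, and "sks_subject" are placeholders.
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Load the LoRA weights produced by your training run.
pipe.load_lora_weights("./my_lora", weight_name="my_subject_lora.safetensors")

image = pipe(
    prompt="a photo of sks_subject riding a bicycle",  # include your trigger token
    num_inference_steps=30,
    guidance_scale=7.5,
    cross_attention_kwargs={"scale": 0.8},  # LoRA strength; 1.0 applies it fully
    generator=torch.Generator("cuda").manual_seed(11),
).images[0]
image.save("week6_lora.png")
```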
Assignment
You will be paired up in groups of two.
Experiment with training a LoRA model for your classmate. Document your training / tuning process and make 5 generations.
Week 7

Animation
In Week 7 we will explore different ways to stitch multiple images together and create animations / videos. We'll also look at some text-to-video tools and compare their differences (a simple frame-blending sketch follows the outline below).
Week 7
Outline:
- How to Make AI Animations
- Ways of Connecting Frames
- Blend Frames
- Interpolation
- Warping
- Creating and Managing the Generation Sequence
- Camera Movements
- 2D vs 3D
- Timing the Camera Movement
- Txt2Vid Animation with Deforum
- Txt2Vid Animation with AnimateDiff
- Other Text-to-Video Tools
- Luma / Runway / Kling
- CogVideoX
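As a first taste of connecting frames, here is a minimal sketch of the simplest approach from the outline: blending (cross-fading) between two generated keyframes with Pillow to produce in-between frames. The file names and frame count are illustrative; latent-space interpolation, optical-flow warping, and dedicated tools like Deforum or AnimateDiff build on the same "fill the gap between keyframes" idea with much better temporal coherence.

```python
# Minimal sketch: cross-fade between two generated keyframes to create
# in-between frames. File names and frame count are placeholders.
from PIL import Image

frame_a = Image.open("keyframe_a.png").convert("RGB")
frame_b = Image.open("keyframe_b.png").convert("RGB").resize(frame_a.size)

num_inbetweens = 8
for i in range(num_inbetweens + 1):
    t = i / num_inbetweens          # 0.0 -> pure frame_a, 1.0 -> pure frame_b
    blended = Image.blend(frame_a, frame_b, t)
    blended.save(f"week7_frame_{i:03d}.png")
```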