Unusual Coloring Book - AI-Powered Fairy Tale Coloring Experience

📋 Project Overview & Problem Statement

Challenge: Standard AI coloring books generate disconnected images with no narrative depth. Children's story apps produce text but no visual activity. No existing product combines perspective-shifting storytelling with an interactive coloring experience in the same session.

Solution: Unusual Coloring Book lets the user step inside a classic fairy tale — Cinderella — and choose how each page unfolds from a menu of three pre-written story paths. Each choice triggers a single AI illustration call. The user then colours the image directly in the browser and can export the finished book as a print-ready PDF.

Key Benefits

No waiting for story text — all narrative is pre-written, only the illustration is generated
Meaningful choices — each page offers 3 thematically distinct paths that work together in any combination
Seamless continuity — any page 1 choice connects naturally to any page 2 choice, up through page 6
Interactive coloring canvas — flood-fill, 16-colour palette, undo, clear, page navigation
AI illustration adjustment — type a short instruction to redraw the scene before colouring
PDF export — print-ready layout: one illustration + story text per landscape page

📖 Story Structure

A single story — Cinderella — with 6 chapters, each offering 3 pre-coded narrative paths. 18 story texts and 18 image prompts are all fixed in the backend. The AI only generates the illustration for the chosen path.

Cover: AI-generated, cached after first load
Page 1: The Morning Before the Ball (A / B / C)
Page 2: The Invitation Arrives (A / B / C)
Page 3: Getting Ready (A / B / C)
Page 4: The Grand Ballroom (A / B / C)
Page 5: Something Changes (A / B / C)
Page 6: The Morning After (A / B / C)

Every option on page N was written to connect seamlessly to every option on page N+1. The user's path through the story feels unique regardless of which combination they choose.
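The numbers behind this structure can be checked with a short sketch. The page titles come from the list above; the names `PAGES`, `OPTIONS`, `story_slots`, and `all_paths` are illustrative assumptions and not the project's actual data model.

```python
from itertools import product

# Chapter titles as listed above; every page shares the option labels A/B/C.
PAGES = [
    "The Morning Before the Ball",
    "The Invitation Arrives",
    "Getting Ready",
    "The Grand Ballroom",
    "Something Changes",
    "The Morning After",
]
OPTIONS = ("A", "B", "C")

# One pre-written story text and one image prompt exist per (page, option) pair.
story_slots = [
    (page_no, option)
    for page_no in range(1, len(PAGES) + 1)
    for option in OPTIONS
]

# A complete read-through picks exactly one option on each of the six pages.
all_paths = list(product(OPTIONS, repeat=len(PAGES)))

print(len(story_slots))  # 18
print(len(all_paths))    # 729
```

Because any option on one page connects to any option on the next, every tuple in `all_paths` is a valid story, while only 18 texts and 18 prompts have to be written.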

🖥️ Application Features

🎭 Story Cover Page

The app opens with an AI-generated cover illustration for Cinderella, cached server-side after the first request. The user clicks the cover to begin the story.

📝 Option Selector

Each page presents 3 option cards (A, B, C), each showing a one-line preview and the full story text. Selecting an option triggers a single Gemini image generation call.

🎨 Flood-Fill Coloring Canvas

The canvas uses a BFS (breadth-first search) flood-fill algorithm on an HTML5 Canvas. Click any enclosed area to fill it with the selected colour. An undo stack (10 steps), a clear button, and page navigation are also available.
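As a rough illustration of the technique, here is a minimal BFS flood-fill with a 10-step undo stack, written in Python on a small character grid. The real app works on canvas pixel data in the browser; the `flood_fill` function, the `Canvas` class, and the grid representation are assumptions made for this sketch.

```python
from collections import deque

MAX_UNDO = 10  # undo depth stated above


def flood_fill(grid, start_row, start_col, new_color):
    """Return a copy of grid with the 4-connected region at the start cell recoloured."""
    rows, cols = len(grid), len(grid[0])
    target = grid[start_row][start_col]
    result = [row[:] for row in grid]
    if target == new_color:
        return result  # nothing to do, and avoids an endless loop
    queue = deque([(start_row, start_col)])
    result[start_row][start_col] = new_color
    while queue:
        r, c = queue.popleft()  # FIFO order makes this breadth-first
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols and result[nr][nc] == target:
                result[nr][nc] = new_color  # mark when enqueued, so no cell is queued twice
                queue.append((nr, nc))
    return result


class Canvas:
    """Holds the current picture plus a bounded history of earlier states."""

    def __init__(self, grid):
        self.grid = [row[:] for row in grid]
        self.undo_stack = deque(maxlen=MAX_UNDO)  # oldest snapshot is dropped automatically

    def fill(self, row, col, color):
        self.undo_stack.append(self.grid)  # safe: flood_fill never mutates its input
        self.grid = flood_fill(self.grid, row, col, color)

    def undo(self):
        if self.undo_stack:
            self.grid = self.undo_stack.pop()


# "#" is an outline pixel, "." is unpainted paper.
picture = [list(row) for row in ["#####", "#..#.", "#..#.", "#####"]]
canvas = Canvas(picture)
canvas.fill(1, 1, "R")
print(["".join(row) for row in canvas.grid])  # ['#####', '#RR#.', '#RR#.', '#####']
```

The fill stops at the outline, so the cells in the last column stay unpainted. On real canvas data the comparison would typically allow some tolerance for anti-aliased edges, which this sketch leaves out.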

🖼️ Illustration Adjustment

After a page is generated, the user can type a short instruction (e.g. "make the background darker") to redraw the scene. The redraw uses Gemini image editing and keeps the same coloring-book style.

📄 PDF Export

jsPDF generates a landscape A4 PDF. Each page shows the illustration at the top and the story text below. The file is named and downloaded immediately from the browser.

🗂️ Thumbnail Navigation

A strip of thumbnails in the sidebar shows all generated pages. The user can click any thumbnail to jump to that page and colour it.

🤖 AI Integration & Intelligence

🖼️ Gemini Nano Banana 2 — Image Generation

Each illustration is generated from a pre-written prompt describing the scene in coloring-book style: thick black outlines, white fill, child-safe, no shading. All 18 prompts are fixed; the user's choice only determines which prompt is sent to Gemini.

💾 Server-Side Caching

Generated images are cached in memory on the backend (by page + option key). If the user or another session requests the same combination, the cached image is returned instantly — no second API call.
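A minimal sketch of this kind of cache, assuming a plain dictionary keyed by `(page_number, option_id)`. The `ImageCache` class and the `fake_generate` stand-in are hypothetical; in the real backend the generator would be the Gemini image call.

```python
from typing import Callable, Dict, Tuple

CacheKey = Tuple[int, str]  # (page_number, option_id)


class ImageCache:
    """In-memory cache: the generator runs at most once per key (single-threaded use)."""

    def __init__(self, generate: Callable[[int, str], bytes]) -> None:
        self._generate = generate
        self._images: Dict[CacheKey, bytes] = {}

    def get(self, page_number: int, option_id: str) -> bytes:
        key = (page_number, option_id)
        if key not in self._images:
            # Cache miss: pay for one generation call and remember the result.
            self._images[key] = self._generate(page_number, option_id)
        return self._images[key]


# Stand-in for the image model so the sketch runs without an API key.
calls = []


def fake_generate(page_number: int, option_id: str) -> bytes:
    calls.append((page_number, option_id))
    return f"image-{page_number}{option_id}".encode()


cache = ImageCache(fake_generate)
cache.get(1, "A")
cache.get(1, "A")  # served from memory, no second call
cache.get(2, "C")
print(len(calls))  # 2
```

Because the dictionary lives in the server process, its contents are lost when the backend restarts. With only 18 possible keys there is no need for eviction.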

✏️ Gemini Image Editing

The user can submit a plain-text instruction to adjust an existing illustration. The backend sends the original image bytes plus the instruction to Gemini, and the updated image replaces the original in the canvas.

📕 Cover Generation

The Cinderella cover illustration is generated once on first load from a fixed, highly-detailed coloring-book prompt. It is cached on the server and served instantly on all subsequent loads.

🏗️ Technical Architecture

Frontend

React 18, TypeScript, Vite, HTML5 Canvas API, jsPDF, Fredoka One / Nunito fonts

Backend

Python 3.13, FastAPI, Uvicorn, google-genai SDK, in-memory cache

System Architecture

Vite dev server (port 3000) proxies /api requests to FastAPI (port 8000)
GET /story — returns full story structure: 6 pages × 3 options with story text (no images, instant)
GET /story/cover — returns the cover illustration (generated once, then cached in memory until the backend restarts)
POST /generate/page-image — takes page_number + option_id, returns base64 illustration (cached after first call)
POST /edit/image — takes an existing image and a text instruction, returns the updated image
All 18 page+option combinations can be pre-warmed by hitting each combination once; subsequent loads are instant
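Pre-warming could be scripted along the following lines. This is a sketch under several assumptions: that the backend accepts a JSON body with `page_number` and `option_id`, that option identifiers are the letters A, B, and C, and that the route is reachable on port 8000 without the `/api` prefix used by the Vite proxy. The helper names are hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # backend port from the list above
OPTION_IDS = ("A", "B", "C")        # assumed identifier format


def prewarm_payloads(pages: int = 6):
    """Build one request body per page+option combination (6 x 3 = 18)."""
    return [
        {"page_number": page, "option_id": option}
        for page in range(1, pages + 1)
        for option in OPTION_IDS
    ]


def prewarm(base_url: str = BASE_URL) -> None:
    """POST every combination once so later requests hit the server-side cache."""
    for payload in prewarm_payloads():
        request = urllib.request.Request(
            f"{base_url}/generate/page-image",
            data=json.dumps(payload).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            response.read()  # the image itself is discarded; only the cache fill matters
```

Calling `prewarm()` while the backend is running would trigger up to 18 image generations in sequence, so it is best run once after startup.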

⚙️ Development Setup

Prerequisites

Python 3.10+
Node.js 18+
Gemini API key (from Google AI Studio)

Quick Start (Windows — double-click)

1. Add your Gemini API key to backend/.env:
   GEMINI_API_KEY=your_key_here
   GEMINI_TEXT_MODEL=gemini-3-flash-preview
   GEMINI_IMAGE_MODEL=gemini-3.1-flash-image-preview
   PORT=8000

2. Double-click open-app.bat
   — Opens two terminal windows: UCB-Backend and UCB-Frontend
   — Backend starts at http://localhost:8000
   — Frontend starts at http://localhost:3000

3. Open http://localhost:3000 in your browser

Manual Setup

# Backend
cd backend
python -m venv venv
venv\Scripts\pip install -r requirements.txt
venv\Scripts\python main.py

# Frontend (separate terminal)
cd frontend
npm install
npm run dev

Environment Variables (backend/.env)

GEMINI_API_KEY=your_key_here
GEMINI_TEXT_MODEL=gemini-3-flash-preview
GEMINI_IMAGE_MODEL=gemini-3.1-flash-image-preview
PORT=8000
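One way the backend might read these values is sketched below. It assumes the variables are already present in the process environment (for example, loaded from backend/.env by a tool such as python-dotenv); the `Settings` tuple and `load_settings` function are hypothetical names, and the fallback values simply mirror the example file above.

```python
import os
from typing import Mapping, NamedTuple


class Settings(NamedTuple):
    api_key: str
    text_model: str
    image_model: str
    port: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the four variables, failing early if the API key is missing."""
    api_key = env.get("GEMINI_API_KEY", "")
    if not api_key or api_key == "your_key_here":
        raise RuntimeError("GEMINI_API_KEY is not set in backend/.env")
    return Settings(
        api_key=api_key,
        # Fallbacks mirror the example values shown above.
        text_model=env.get("GEMINI_TEXT_MODEL", "gemini-3-flash-preview"),
        image_model=env.get("GEMINI_IMAGE_MODEL", "gemini-3.1-flash-image-preview"),
        port=int(env.get("PORT", "8000")),
    )
```

Passing a plain dictionary instead of `os.environ` makes the function easy to exercise without touching the real environment.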

📊 Key Metrics

1 AI call per page (image only, no text generation)
18 pre-coded story paths (6 pages × 3 options)
729 unique story combinations (3⁶)
0s text generation wait (all text is pre-coded)

Image generation is the only AI call — no text pipeline, no character engine, no story arc generation
Server-side caching means popular combinations load instantly for all users after the first request
jsPDF runs entirely in the browser — no server-side PDF rendering required
The entire story content (18 texts + 18 prompts) is pre-written, and the story text is loaded on page init, so browsing the options involves no AI latency; only selecting an option triggers an image generation call
