Bright Path Tuition Centre

Core Source Code & NL-to-SQL Implementation

Gemini 2.5 Flash FastAPI SQLite Python GitHub Pages

About This Code Showcase

This curated code showcase demonstrates how Bright Path Tuition Centre translates natural-language management questions into safe SQL queries against a 7-table database of 185 children, 22 specialist tutors, and 63 classes.

The snippets below highlight the config-driven schema, seeded demo data, prompt template, and SQL safety validation. Deployment scripts and API keys are omitted for clarity.

Source Files

config.yaml — 7-table schema + business rules + quick actions
seed_generator.py — Builds data.db with 9 seeded demo stories
main.py — FastAPI app, NL-to-SQL via Gemini
index.html — Hybrid demo page (presets + client-side Gemini)
data_context.json — Compact database context for client-side queries
Dockerfile — Cloud Run container
requirements.txt — Python dependencies
README.md — Portfolio-ready documentation

config.yaml — Schema & Business Rules

The config file defines the full 7-table schema, business rules, and quick actions. The AI agent receives this as context in every query. Here are key tables:

config.yaml — Table Definitions (excerpt)

schema:
  tables:

    students:
      # 185 children across Preschool to P6
      columns:
        - student_id: INTEGER PRIMARY KEY
        - name: TEXT
        - year_group: TEXT  # Preschool, P1, P2, ... P6
        - date_of_birth: DATE
        - enrolment_date: DATE
        - parent_contact: TEXT
        - status: TEXT  # active, withdrawn, graduated

    tutors:
      # 22 specialist tutors with subject expertise
      columns:
        - tutor_id: INTEGER PRIMARY KEY
        - name: TEXT
        - subject_expertise: TEXT  # Maths, English, BM, Science
        - hire_date: DATE
        - status: TEXT  # active, on_leave, terminated

    assessments:
      # Score records per child per subject
      columns:
        - assessment_id: INTEGER PRIMARY KEY
        - student_id: INTEGER REFERENCES students
        - subject: TEXT
        - score: INTEGER  # 0-100
        - assessment_date: DATE
        - assessment_type: TEXT  # PBD, UASA, internal

business_rules:
  - "Always refer to learners as 'children', never 'students'"
  - "Year 4 is the MOE anchor year from 2026"
  - "P5/P6 children below P4 baseline score trigger escalation"
  - "Maximum class size is 10 children"
  - "Improvement = latest score minus previous score"
  - "Top improver threshold: 15+ marks between assessments"
                

seed_generator.py — Seeded Demo Stories

The seed generator builds a realistic database with 9 named children whose data tells specific stories — enabling meaningful demo queries:

seed_generator.py — Seeded Story Children

# 9 seeded children with specific learning trajectories
# These stories make demo queries return meaningful results

SEEDED_STORIES = [
    {
        "name": "Ahmad Rizqi",
        "year_group": "P4",
        "story": "Year 4 anchor child — strong in BM, struggling in Maths",
        "scores": {
            "BM": [82, 85, 88],
            "Maths": [45, 48, 52],
            "English": [70, 72, 75],
            "Science": [65, 68, 70]
        }
    },
    {
        "name": "Mei Ling Tan",
        "year_group": "P6",
        "story": "P6 escalation — below P4 baseline in Science, needs urgent help",
        "scores": {
            "BM": [60, 58, 55],
            "Maths": [55, 52, 50],
            "English": [65, 63, 60],
            "Science": [40, 38, 35]
        }
    },
    {
        "name": "Darren Lim",
        "year_group": "P5",
        "story": "Top improver — jumped 20 marks in English after tutor switch",
        "scores": {
            "BM": [70, 72, 75],
            "Maths": [78, 80, 82],
            "English": [50, 55, 70],
            "Science": [72, 74, 76]
        }
    },
    # ... 6 more seeded stories covering:
    # - Preschool readiness check
    # - Consistent high performer (P3)
    # - Withdrawn child (attendance drop)
    # - New enrolment (P1, only 1 assessment)
    # - Multi-subject weakness (P4 anchor concern)
    # - Tutor quality comparison anchor child
]

def seed_database(db_path: str):
    """Build data.db with 185 children (9 seeded + 176 random)"""
    conn = sqlite3.connect(db_path)

    # Insert seeded children first
    for child in SEEDED_STORIES:
        insert_student(conn, child)
        insert_assessments(conn, child)

    # Fill remaining slots with randomised children
    for i in range(176):
        insert_random_student(conn, i)

    conn.commit()
    conn.close()
                

main.py — FastAPI App & NL-to-SQL

The FastAPI backend receives natural language questions, constructs a Gemini prompt with the full schema context, generates SQL, validates it for safety, and returns formatted results:

main.py — Prompt Template

SYSTEM_PROMPT = """You are a data assistant for Bright Path Tuition Centre.
You translate natural language questions into SQLite SQL queries.

IMPORTANT RULES:
- Always refer to learners as "children", never "students"
- Year 4 (P4) is the MOE anchor year from 2026
- P5/P6 children scoring below P4 baseline (60%) trigger escalation
- Maximum class size is 10 children
- Improvement = latest score - previous score
- Top improver threshold: 15+ marks between assessments
- Always use SELECT only — never INSERT, UPDATE, DELETE, or DROP

DATABASE SCHEMA:
{schema_context}

Given the user's question, return ONLY a valid SQLite SELECT query.
Do not include explanations or markdown formatting.
"""

async def generate_sql(question: str) -> str:
    """Send question + schema to Gemini, get SQL back"""
    model = genai.GenerativeModel("gemini-2.5-flash")
    response = model.generate_content(
        SYSTEM_PROMPT.format(schema_context=schema_context)
        + f"\n\nQuestion: {question}",
        generation_config=genai.GenerationConfig(
            temperature=0.0,
            max_output_tokens=500
        )
    )
    return response.text.strip()
                

main.py — SQL Safety Validation

def is_safe_sql(sql):
    """Block anything that isn't a read-only SELECT or CTE (WITH ... SELECT)."""
    upper = sql.upper().strip().rstrip(";").strip()

    # Allow both plain SELECT and WITH...SELECT (CTE pattern)
    if not (upper.startswith("SELECT") or upper.startswith("WITH")):
        return False

    # Word-boundary check so 'created_at' doesn't trip 'CREATE'
    dangerous = ["INSERT", "UPDATE", "DELETE", "DROP",
                 "ALTER", "TRUNCATE", "EXEC",
                 "REPLACE", "ATTACH"]

    for keyword in dangerous:
        if re.search(r"\b" + keyword + r"\b", upper):
            return False

    return True


@app.post("/ask")
async def ask_question(request: QuestionRequest):
    """Main endpoint: question in, answer out"""
    sql = await generate_sql(request.question)

    if not is_safe_sql(sql):
        return {"error": "Generated query failed safety check"}

    # Execute validated SQL against data.db
    conn = sqlite3.connect("data.db")
    cursor = conn.execute(sql)
    columns = [desc[0] for desc in cursor.description]
    rows = cursor.fetchall()
    conn.close()

    # Format results for the frontend
    results = [dict(zip(columns, row)) for row in rows]
    return {"sql": sql, "results": results, "count": len(results)}
                

Technical Implementation Notes

Key Design Decisions

Config-driven schema: The entire database schema and business rules live in config.yaml, making it easy to add tables or update rules without touching application code
Temperature 0.0: Deterministic SQL generation ensures the same question always produces the same query — critical for trust in a management tool
Seeded demo stories: Nine children with scripted learning trajectories ensure demo queries always return meaningful, explainable results
Hybrid deployment: GitHub Pages serves a static demo with client-side Gemini calls; FastAPI serves the full backend for Cloud Run deployment
Read-only SQL enforcement: The is_safe_sql function uses word-boundary regex to block dangerous SQL keywords without false-positives on column names like 'created_at'. Supports both SELECT and WITH/CTE queries.
Malaysian education context: Business rules encode post-UPSR assessment norms (PBD, UASA, Year 4 anchoring) that generic AI tools would not know

Why SQLite?

Zero configuration: No database server to provision — ideal for a single-centre management tool
Fast reads: All analytics queries run in milliseconds on 185 children
Portable: The entire database is a single file, making backups and demo resets trivial

Try Demo View Full Project Details