Architecture decisions
ADRs for Strand A — pedagogical bot infrastructure
Working document. ADR-style: context → options → decision → consequences. The Sign-off column tracks handover ownership — every decision needs a second signature before it’s settled.
Order is rough priority: privacy and platform foundation first, infrastructure and mechanics after, AILANG disclosure last.
Overview
A short non-technical pass over the architecture before the implementation details. The diagrams in this overview are what JB and AR will want to walk teachers through; the ADRs and the technical system view that follow are the engineering detail underneath.
What students and teachers see — an activity
An activity is what a student group works through. It’s two pieces designed together, not separately:
flowchart LR
Tutor["💬 <b>Tutor</b><br/>System prompt + behaviour<br/>(e.g. Socratic projectile-motion tutor,<br/>never gives the answer)"]
Artefact["🧪 <b>Workbench artefact</b><br/>Interactive piece on the right side<br/>(simulator, dataset, sandbox, ...)"]
Activity["📚 <b>Activity</b><br/>What a student group works through<br/>(chat + workbench together)"]
Tutor --> Activity
Artefact --> Activity
style Tutor fill:#fff8f8,stroke:#901a1e
style Artefact fill:#fff8f8,stroke:#901a1e
style Activity fill:#f5f5f5,stroke:#333,stroke-width:2px
The tutor and the workbench reference each other. The tutor’s prompt names specific elements of the workbench (“the blue arrow”, “the v_x graph”); the workbench narrates the student’s actions back into the chat (“Adjusted v₀ to 17.5 m/s ✓”). Designed together they form a coherent learning experience; bolted together loosely they don’t.
Terminology bridge. In engineering language, a tutor is a skill and a workbench is an MCP App. The rest of this page uses those terms interchangeably with tutor and workbench.
Who has access to what — the teacher → class → group → activity hierarchy
flowchart LR
Teacher["👤 Teacher\nUCPH login"]
subgraph Classes["Classes"]
ClassA["7B Physics A"]
ClassB["8A Physics A"]
end
subgraph Groups["Groups (anonymous)"]
Group1["bold-kazoo-87"]
Group2["ruby-petal-72"]
Group3["fluffy-goose-56"]
end
subgraph Activities["Activities"]
L1["Boldkast\nprojectile motion"]
L2["Pendul\nharmonic motion"]
L3["..."]
end
Teacher --> Classes
Classes --> Groups
Groups --> Activities
style Teacher fill:#fff8f8,stroke:#901a1e
style Classes fill:#f5f5f5,stroke:#333
style Groups fill:#f5f5f5,stroke:#333
style Activities fill:#f5f5f5,stroke:#333
- Teacher logs in with UCPH credentials. Owns one or more classes.
- Class is the teacher’s roster unit (e.g. “Class 7B Physics A”). Has one or more groups.
- Group is what students actually join — a short, dictation-friendly code (bold-kazoo-87) the teacher hands out. Three students sharing a phone is the default unit, matching what AR observed in the school visit. Groups are anonymous: no names, no emails, no PII (see ADR-001).
- Activity is what each group can access — a paired tutor + workbench artefact. The teacher decides which activities each class can use. An activity can be reused across classes; a class can have many activities available at once.
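As a rough sketch of how the teacher → class → group → activity hierarchy could be represented, the dataclasses below model ownership and a per-class activity check. All class, field and function names (`Teacher`, `SchoolClass`, `Group`, `Activity`, `group_can_open`) are hypothetical and are not the platform's actual schema. `Group` holds only its code, which matches the no-PII rule in ADR-001.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Activity:
    """A paired tutor (skill) + workbench artefact (MCP App)."""
    slug: str
    tutor_skill: str
    workbench_app: str


@dataclass
class Group:
    """An anonymous group: only a code, no names, emails or device IDs."""
    code: str


@dataclass
class SchoolClass:
    """The teacher's roster unit; activities are enabled per class."""
    name: str
    groups: list[Group] = field(default_factory=list)
    activities: set[str] = field(default_factory=set)  # enabled activity slugs


@dataclass
class Teacher:
    """Authenticated via UCPH login; owns one or more classes."""
    ucph_id: str
    classes: list[SchoolClass] = field(default_factory=list)


def group_can_open(teacher: Teacher, group_code: str, activity_slug: str) -> bool:
    """A group may open an activity only if its class has that activity enabled."""
    for school_class in teacher.classes:
        if any(g.code == group_code for g in school_class.groups):
            return activity_slug in school_class.activities
    return False  # unknown group code


# Hypothetical example data, using names from the diagram above.
boldkast = Activity("boldkast", "projectile-tutor", "boldkast-sim")
teacher = Teacher(
    ucph_id="teacher-1",
    classes=[SchoolClass("7B Physics A", [Group("bold-kazoo-87")], {boldkast.slug})],
)
print(group_can_open(teacher, "bold-kazoo-87", "boldkast"))  # True
print(group_can_open(teacher, "bold-kazoo-87", "pendul"))    # False
```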
How the pieces fit together — composable, not monolithic
flowchart TB
subgraph Frontends["Front ends — any number"]
Web["Student web app<br/>(today)"]
CLI["aiplatform CLI<br/>(today)"]
Future["Future fronts:<br/>Telegram, mobile app,<br/>standalone MCP App, ..."]
end
Backend["<b>Backend</b><br/>Activities (skills) · Teacher / class / group auth<br/>Model router · Chat logs · Budget caps · Telemetry"]
subgraph Plugins["Plugins — any number"]
Workbenches["🧪 Workbench artefacts<br/>(MCP Apps: simulators,<br/>data viewers, sandboxes)"]
Tools["🛠 Backend tools<br/>(MCP servers: RAG retrieval,<br/>code execution, document parse)"]
end
LLMs["🧠 AI providers<br/>Claude · Gemini · self-hosted Ollama"]
Web --> Backend
CLI --> Backend
Future --> Backend
Backend --> Workbenches
Backend --> Tools
Backend --> LLMs
style Backend fill:#fff8f8,stroke:#901a1e,stroke-width:2px
style Frontends fill:#f0f4f8,stroke:#3a5a7a
style Plugins fill:#f0f4f8,stroke:#3a5a7a
style LLMs fill:#f5f5f5,stroke:#333
Three things this picture says:
- The backend is one stable surface. Activities, auth, model routing, chat logs, budget caps and telemetry all live there. The same backend serves today’s student web app, today’s CLI (aiplatform smoke jutland), and any future front end — Telegram, an institutional mobile app, anything that speaks the same protocols.
- Front ends are interchangeable. The student web app is just one consumer. A teacher could in principle interact via CLI; future classroom installations could use a touch-screen kiosk app or a Telegram channel; the activities and the data don’t change.
- Plugins extend the backend without modifying it. Workbench artefacts (interactive widgets like the Boldkast simulator) and backend tools (RAG search, code execution, document parsing) each plug in through standardised protocols. A workbench can be used inside an activity, paired with a tutor — or it can be opened standalone (a teacher could embed the Boldkast simulator alone, no tutor, on a different page or in a different app). The pieces compose.
Why this composability matters for AIPLA: activities that teachers and AR build today become reusable beyond AIPLA. The Boldkast activity works on AIPLA’s web app, in the CLI, and would work in any future front end. The same workbench artefact could be embedded in a textbook publisher’s app, or shared with another physics-education project, without rewriting it. The architecture intentionally does not lock anything into a single product surface.
Platform foundation
AIPLA sits on two open layers from the sunholo-data ecosystem, with the AIPLA-specific configuration on top:
- AI Protocol Platform — open-source template (Apache 2.0) for the application layer: AG-UI streaming, A2UI declarative rendering, MCP tool integration, A2A agent discovery, ADK orchestration, multi-provider model routing, OpenTelemetry observability, and a LOCAL_MODE workshop path. Cloud-agnostic — runs on GCP, AWS, Azure, or on-premises. Used inside the Aitana assistant product among others.
- Multivac — Sunholo’s wider AI platform, providing the infrastructure layer: managed model access (Vertex AI Gemini, Ollama hosting), Postgres + pgvector, Cloud Storage, identity, telemetry sinks. AIPLA’s prototype runs on Multivac in an EU region; the self-hosting migration target is replacing Multivac with UCPH equivalents while keeping the template layer intact.
AIPLA-specific work is the configuration on top of both: physics skill packs, the capability-floor router wiring, anonymous group IDs (ADR-001), and EU-region constraints. AIPLA’s repo lives at sunholo-data/cphu-aipla-app, instantiated from the ai-protocol-platform template on 2026-05-19. See ADR-002 for the template adoption rationale and scope discipline.
System view
flowchart LR
User["Student group · Teacher<br/>(student: anonymous group ID, no auth<br/>teacher: UCPH SSO for admin)"]
OnDevice["On-device model<br/>Apple Intelligence · Gemini Nano · WebLLM<br/>tier 4 — gold-standard privacy"]
subgraph Platform["Application — AI Protocol Platform template"]
direction LR
Front["Frontend<br/>Next.js + AG-UI"]
Back["Backend<br/>FastAPI + Google ADK"]
Router["Model router<br/>capability-floor driven"]
end
subgraph Multivac["Infrastructure — Multivac (EU region)"]
direction TB
DB[("Postgres + pgvector<br/>logs · RAG · identity")]
Storage[("Cloud Storage<br/>uploads")]
Claude["Anthropic Claude<br/>tier 1 — cloud-agnostic text"]
Gemini["Google Gemini<br/>tier 2 — EU multimodal"]
Ollama["Self-hosted Ollama<br/>tier 3 — case-by-case per task"]
end
User --> Front
Front -.tier 4 routes here.-> OnDevice
Front --> Back
Back --> DB
Back --> Storage
Back --> Router
Router --> Claude
Router --> Gemini
Router --> Ollama
style User fill:#fff,stroke:#666
style OnDevice fill:#fff,stroke:#2d7d3a,stroke-width:2px,stroke-dasharray:5
style Platform fill:#fff8f8,stroke:#901a1e,stroke-width:2px
style Multivac fill:#f0f4f8,stroke:#3a5a7a,stroke-width:2px
style Router fill:#f5f5f5,stroke:#333,stroke-width:2px
Pink: the template (application layer). Blue: Multivac (infrastructure layer). Self-hosting migration target is to replace the Multivac layer with UCPH equivalents while keeping the template unchanged.
Core vs extensions
The template provides a fixed core of capabilities. Everything else AIPLA adds is a swappable extension that plugs in through one of three protocols: MCP (tools and data), MCP Apps (interactive UI surfaces in sandboxed iframes), or A2UI (declarative UI components rendered inline in chat).
This separation matters for three reasons: successors can add new plugins without touching template internals (handover), extensions can be released individually as Apache-2.0 community-contributable modules (open-source default), and the ADR-002 scope-discipline table defines what ships in AIPLA v1 vs. later.
| Layer | Protocol | What it is | Examples for AIPLA | AIPLA v1 |
|---|---|---|---|---|
| Application core | (template, fixed) | AG-UI streaming, ADK orchestration, skills framework, model router, auth/session, log capture, telemetry | n/a — provided | ✅ Adopted as-is |
| Model providers | (template adapter) | Cloud + local model backends | Claude, Gemini, Ollama, on-device — see ADR-003 | ✅ All four tiers |
| Skills (app configuration) | (template) | Teacher-configurable bot setups | Physics tutors per topic, problem-set helpers, lab assistants | ✅ Multiple per topic |
| Data + computation tools | MCP server | Anything that fetches data, runs computation, or wraps an external service | RAG retrieval (ADR-010), code execution sandbox, document parse via AILANG Parse (ADR-004), domain search | ✅ pgvector RAG · code sandbox · AILANG Parse · 🟡 graph DB (stretch — for C3) |
| Interactive UI extensions | MCP Apps | Sandboxed iframes with their own state and UI | GeoGebra widget, Tracker / sensor-data viewer, concept-map editor for student models (C3), simulation workbench (Strand B) | 🟡 concept-map editor (C3 stretch) · Strand-B target |
| Declarative UI extensions | A2UI | Backend-emitted structured UI elements rendered inline in chat | Problem-hint cards · LaTeX/KaTeX formula blocks · structured feedback panels · teacher-config forms | ✅ Hint cards · LaTeX · feedback panels |
Why this matters for the student-model stretch (Strand C3). The most speculative item maps cleanly onto this architecture: a graph database via MCP for storing per-student concept networks, plus a concept-map editor as an MCP App for both teacher reference-model authoring and student-facing formative-feedback display. Both pieces are swappable, both follow open protocols, and both could be released as standalone Apache-2.0 extensions other physics-education projects can reuse.
Index
| # | Decision | Status | Sign-off |
|---|---|---|---|
| 001 | Student identity: no auth, anonymous group IDs | Decided | M, JB ✓ (2026-05-18) |
| 002 | Strand A built on the AI Protocol Platform | Decided | M, JB ✓ (2026-05-18) |
| 003 | LLM provider mix (Claude · Gemini · Ollama · on-device) | Decided | M |
| 004 | Document parsing via AILANG Parse | Decided | M |
| 005 | Chat log storage | Decided (pending consent details) | M |
| 006 | Cloud provider for prototype: GCP EU via Multivac | Decided | M, JB ✓ (2026-05-18) |
| 007 | Cloud region: europe-north1 (Finland) | Decided | M, [JB / UCPH IT TBC] |
| 008 | Model abstraction / routing layer | Decided (via template) | M |
| 009 | Backend stack | Decided (via template) | M |
| 010 | RAG store | Lean: pgvector · graph DB for C3 stretch | — |
| 011 | Multimodal input handling | Decided · via AILANG Parse (swappable backends) | M |
| 012 | AILANG ecosystem in AIPLA (utilities, not runtime) | Decided | M, [JB to be informed] |
| 013 | Artefact safety / content-review pipeline | Decided | M |
| 014 | Per-group / per-class budget enforcement | Decided | M |
| 015 | Unified multi-surface UI — AI directs the layout | Decided | M |
ADR-001 — Student identity: no auth, anonymous group IDs
Context. The brief is explicit that students should not need accounts. The strongest privacy posture for a research project deploying in Danish schools is to collect no student identity data at all — not pseudonyms, not first names, not device identifiers, nothing.
The brief also notes the typical configuration: one shared phone per three students. The natural unit of analysis is therefore the group, not the individual student. Research outcomes are interpreted at the group / class level, not per-student.
Decision. No student authentication. Students join sessions by entering a system-assigned anonymous group ID (e.g., grp-7B-3, or a short opaque token). The teacher generates group IDs when setting up an activity; each group enters its ID on the shared device to join. Chat logs key on group ID. Nothing personally identifying about any student is captured anywhere in the system.
Teacher auth is separate and minimal: required only for admin functions (creating a class session, generating group IDs, uploading lesson materials, viewing aggregated research data). Use UCPH institutional SSO where available (Firebase Auth federated with UCPH IDP, or whatever JB’s preferred path is). Teachers are existing institutional users; this is the cheapest GDPR posture for the teacher side.
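Below is a minimal sketch of generating dictation-friendly group codes. It assumes the adjective-noun-number format from the overview (bold-kazoo-87) is what the teacher UI hands out; the `grp-7B-3` style would need a different generator. The word lists, `new_group_code` and `is_valid_group_code` are hypothetical. Python's `secrets` module is used so codes cannot be predicted from earlier ones, and nothing in a code encodes who is in the group.

```python
import re
import secrets

# Hypothetical, deliberately tiny word lists; a real deployment would use
# longer lists of short, unambiguous words that are easy to say aloud.
ADJECTIVES = ["bold", "ruby", "fluffy", "calm", "swift", "amber"]
NOUNS = ["kazoo", "petal", "goose", "comet", "otter", "maple"]

CODE_PATTERN = re.compile(r"[a-z]+-[a-z]+-[0-9]{2}")


def new_group_code(existing: set[str], max_tries: int = 100) -> str:
    """Return an unused adjective-noun-NN code such as 'bold-kazoo-87'."""
    for _ in range(max_tries):
        code = (
            f"{secrets.choice(ADJECTIVES)}-"
            f"{secrets.choice(NOUNS)}-"
            f"{secrets.randbelow(90) + 10}"  # always two digits: 10..99
        )
        if code not in existing:
            return code
    raise RuntimeError("code space exhausted; enlarge the word lists")


def is_valid_group_code(code: str) -> bool:
    """Cheap format check before looking the code up in the database."""
    return CODE_PATTERN.fullmatch(code) is not None


# Teacher sets up three groups for one activity.
class_codes: set[str] = set()
for _ in range(3):
    class_codes.add(new_group_code(class_codes))
print(sorted(class_codes))
```

With six adjectives, six nouns and 90 numbers the space holds only 3,240 codes. That is plenty for one class but would call for longer word lists across a whole school.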
Why this works.
- Privacy by design (GDPR Article 25), enforced by construction — the system is architecturally incapable of collecting student personal data, rather than relying on operational discipline to anonymise after the fact. UCPH data-protection review becomes substantially simpler because the personal-data category is empty by design.
- Data minimisation (GDPR Article 5) — only the strictly necessary group-level signals are processed. There is no “we might need it later” data hoarded.
- No identity-map maintenance — researchers don’t need a translation layer between pseudonyms and real students; there is no real-student data to translate.
- Friction-free onboarding — students don’t sign up, don’t enter emails, don’t reset passwords. Open the URL, enter the group ID, start working.
- Matches the actual classroom configuration — one phone, three students, group is the natural unit anyway.
Trade-offs.
- Cannot follow individual students across sessions or topics. Research is per-group, not per-student. This is a deliberate constraint, not a bug; the brief frames research at the classroom level.
- Partial mitigation: group IDs can persist across sessions, enabling longitudinal per-group analysis without ever becoming identifying.
Consequences for ADR-005. Anonymisation collapses to a non-issue: there is no PII to anonymise. The “separate identity map that researchers cannot access casually” from the earlier draft is removed — the map doesn’t exist.
Open question for JB. Confirm teacher-auth mechanism (UCPH SSO, Firebase federated, or other) and the granularity of group IDs (per-class only, or finer for small-group work within a class).
ADR-002 — Strand A built on the AI Protocol Platform
Context. Building Strand A’s pedagogical bot infrastructure from scratch within the four-month contract is feasible but tight: GDPR-grade auth, multi-provider model routing, log capture with anonymisation, multimodal upload handling, teacher-facing skill configuration, and the supporting telemetry are each multi-week tracks. The AI Protocol Platform (Apache 2.0, public repo on the sunholo-data org) is an open-source template that already solves most of these. It is cloud-agnostic (GCP, AWS, Azure, on-prem) and designed by M to serve a class of projects including AIPLA — already used inside Sunholo’s wider Multivac AI platform offering and in the Aitana assistant product.
What the template provides off the shelf:
- Protocols — AG-UI streaming, A2UI declarative rendering, MCP tool integration, A2A agent discovery
- Orchestration — Google ADK with sessions, memory, artifacts, evaluation
- Multi-provider model routing — Gemini, Claude, OpenAI through one interface
- Observability — OpenTelemetry → Cloud Trace + Cloud Logging + BigQuery, all inside the internal trust boundary
- LOCAL_MODE — clone-to-working-chat-UI in under 30 minutes, no cloud credentials required. Critical for postdoc-2 onboarding and for any teacher dialogue session where we want a portable demo.
- Skills wizard — non-developer skill creation, matching the brief’s “teacher bot configuration” requirement
- LaTeX rendering — already supported via the ailang-parse integration (relevant for physics formulae output)
Options considered.
- Build Strand A bot infrastructure from scratch in Python/FastAPI on Cloud Run (originally implied by ADR-006)
- Adopt LangChain or a similar open-source framework as the base
- Adopt the AI Protocol Platform template as the foundation and add AIPLA-specific configuration on top
Decision. Adopt the AI Protocol Platform template. AIPLA work is the configuration, skill packs, EU-region pinning, anonymous group join (ADR-001), and self-host migration path on top of an already-mature template.
Scope discipline. The template has more surface area than AIPLA needs in four months. The risk is that because features exist, we feel obliged to use them. Explicit opt-in list for AIPLA v1:
| Feature | AIPLA v1 | Notes |
|---|---|---|
| Web frontend + AG-UI streaming | ✅ Use | Core teacher/student UI |
| Skills wizard + skill configuration | ✅ Use | Matches brief requirement directly |
| Multi-provider model routing | ✅ Use | Wired to capability-floor eval |
| LOCAL_MODE | ✅ Use | Postdoc-2 onboarding, portable demos |
| Anonymised chat-log capture | ✅ Use | Core research data requirement |
| LaTeX rendering | ✅ Use | Physics formula output |
| A2UI declarative components | ✅ Use | Problem-set hint cards, LaTeX formula blocks, structured feedback, teacher-config forms |
| MCP tool integration | ✅ Use | Code-exec sandbox, document parsing, artefact rendering, RAG retrieval |
| MCP Apps (sandboxed iframes) | ✅ Use | Hosts teacher-generated interactive artefacts (HTML simulations, problem workbenches) in chat. Gated by ADR-013 safety pipeline. Pattern based on AR’s existing GenAI trials. |
| A2A agent discovery | ❌ Skip v1 | Useful long-term, not required for the pilot |
| Telegram / Email / WhatsApp channels | ❌ Skip v1 | Available; not needed for STX teacher pilot |
| Vertex AI Search | ❌ Skip v1 | pgvector covers AIPLA’s RAG needs per ADR-010 |
| Real-time collaboration features | ❌ Skip v1 | Out of brief scope |
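One way to make the opt-in list enforceable is an explicit allow-list checked in code. This is a hypothetical sketch: the feature keys, `FeatureNotInScope` and `require_feature` are illustrative names that mirror the table. They are not something the template provides.

```python
# Hypothetical feature keys mirroring the AIPLA v1 opt-in table above.
V1_ENABLED = frozenset({
    "web_frontend_ag_ui",
    "skills_wizard",
    "model_routing",
    "local_mode",
    "chat_log_capture",
    "latex_rendering",
    "a2ui_components",
    "mcp_tools",
    "mcp_apps",
})

V1_SKIPPED = frozenset({
    "a2a_discovery",
    "telegram_email_whatsapp",
    "vertex_ai_search",
    "realtime_collaboration",
})


class FeatureNotInScope(RuntimeError):
    """Raised when code uses a template feature AIPLA v1 has not opted into."""


def require_feature(name: str) -> None:
    """Fail loudly unless `name` is on the explicit v1 opt-in list.

    Unknown names fail too: anything not listed is out of scope by default,
    so adopting a feature means editing this list (and the ADR) first.
    """
    if name not in V1_ENABLED:
        status = "skipped for v1" if name in V1_SKIPPED else "not listed"
        raise FeatureNotInScope(f"{name!r} is {status}; see ADR-002 scope table")


require_feature("mcp_apps")  # passes silently
try:
    require_feature("a2a_discovery")
except FeatureNotInScope as err:
    print(err)  # 'a2a_discovery' is skipped for v1; see ADR-002 scope table
```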
Disclosure (handover & conflict of interest). The AI Protocol Platform is M’s own work. This is parallel to the AILANG disclosure in ADR-012 and needs the same explicit visibility with JB. Three mitigating points:
- The template is Apache 2.0 licensed and the repo is public; AIPLA’s open-source-at-end-of-project goal is fully compatible
- The template is multi-purpose — designed for a class of projects, not AIPLA-specific. AILANG itself is not required to use the template (it’s used inside the platform for LaTeX parsing and capability benchmarks, but the template supports plain Python stacks)
- M’s architectural authority on AIPLA (per JB’s brief) covers this kind of choice, but the choice should be named rather than implicit
Consequences.
- Build-time saving. ~6 of the 9 Strand A build weeks are absorbed by the template. Frees time for physics-specific extensions, the eval, and the teacher pilot.
- UCPH self-host implications. A few template defaults (Firestore, Firebase Auth for non-LOCAL_MODE) need explicit UCPH equivalents in the self-hosting page; the IT conversation must be informed by actual dependencies, not a generic GCP stack.
- Handover surface. Successors learn the template via its own docs and LOCAL_MODE — no cloud credentials required to come up to speed.
- Update cadence. Template tracks its own roadmap. Needs explicit version-pinning and upstream-tracking in the handover package.
ADR-003 — LLM provider mix (4 tiers: AI API / Server / Server-local / On-device)
Context. AIPLA needs strong text reasoning for problem-set hints, strong multimodal for worksheet photos, EU data residency for student-facing use, and a credible path to self-hosted models for the UCPH on-prem migration target. No single tier gives the best answer across all of these — the strategy is to route per task class via the capability-floor eval.
Decision. Four tiers, distinguished by where the model runs and what hardware is required:
| Tier | Where it runs | Hardware needed | Example models | Why this tier exists |
|---|---|---|---|---|
| 1 — AI API | Cloud-hosted service | None on our side — API call | Claude Opus 4.7 (cloud-agnostic via Anthropic / Bedrock / Vertex); Gemini 2.5 / 3.1 Pro (Vertex AI, EU regions); GPT-5.5 | Highest current capability; fastest path to working prototype; EU residency available |
| 2 — Self-hosted server | UCPH GPU cluster (or equivalent) | 4–8× H100 / H200 / B200 (~280–800+ GB VRAM total, NVLink) | DeepSeek V4 Pro (1.6T MoE, 49B active); DeepSeek V4 Flash (284B MoE, 13B active); future large open-weight | Open-weight at near-frontier capability — DeepSeek V4 Pro is at ~90% on GPQA Diamond. Full institutional data sovereignty; zero API spend. |
| 3 — Server-local | Single workstation or small server (M’s 128 GB Mac during proto, or a single H100/A100) | ~30–60 GB unified memory or single high-end GPU | Qwen 3.5 27B; Gemma 4 31B; Phi-4 14B; smaller DeepSeek distills | Mid-size open-weight that fits without cluster hardware. Good for development, demos, smaller departmental deployments, and LOCAL_MODE portability. |
| 4 — On-device | Student device (phone, tablet, laptop) | iPhone 15 Pro+, Pixel 8+, Samsung S24+, modern laptop with NPU | Apple Intelligence (~3B); Gemini Nano (3.25B); WebLLM browser models; small Phi / Gemma variants | Gold-standard privacy — data never leaves the device. Also enables offline use (playground, poor-WiFi labs). Constrained to lighter tasks: summarisation, formatting, light Q&A. |
Concrete model selection per task is driven by capability-floor eval results — see Evaluation.
v0.1 active model (2026-05-20): gemini-3.5-flash on Vertex AI global endpoint (GA 2026-05-19). Cross-provider fallback documented: Claude Sonnet 4.6. Router-overridable per ADR-008. Choice is provisional — calibrate against the capability-floor eval once it’s running on AIPLA tasks.
Consequences.
- The router handles all four tiers behind one interface; on-device adds a thin client-side adapter.
- Claude cloud-agnosticism is the GCP hedge — moving off GCP would change only auth endpoints, not providers. Gemini stays Vertex-EU for multimodal.
- Tiers 2 and 3 are both Ollama-runnable open-weight; the distinction is GPU cluster (Tier 2: DeepSeek V4 Pro class) vs single workstation (Tier 3: Qwen / Gemma 4 class).
- Migration trajectory: as the eval’s local-readiness fraction rises per task class, traffic shifts from Tier 1 toward Tiers 2–4. Router falls back tier 4 → 3 → 2 → 1 when a higher-privacy tier is unavailable.
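The tier fallback above can be read as follows. Among the tiers whose models clear the capability floor for a task class, take the most private one that is reachable right now. The sketch below implements that reading. `choose_tier`, `RouteDecision` and the eligibility and availability sets are hypothetical and are not the template router's API.

```python
from dataclasses import dataclass

# Preference order from ADR-003: most private tier first.
TIER_PREFERENCE = (4, 3, 2, 1)
TIER_NAMES = {1: "AI API", 2: "Self-hosted server", 3: "Server-local", 4: "On-device"}


@dataclass(frozen=True)
class RouteDecision:
    tier: int
    reason: str


def choose_tier(eligible: set[int], available: set[int]) -> RouteDecision:
    """Pick the most private tier that is both eligible and available.

    eligible:  tiers whose models clear the capability floor for this task
               class, according to the eval.
    available: tiers reachable right now (e.g. no on-device model on an
               older phone, local server down).
    """
    for tier in TIER_PREFERENCE:
        if tier in eligible and tier in available:
            return RouteDecision(tier, f"tier {tier} ({TIER_NAMES[tier]})")
    raise LookupError("no eligible tier is available for this task class")


# Task where (hypothetically) only tier 1 clears the floor today.
print(choose_tier(eligible={1}, available={1, 2, 3, 4}).tier)  # 1
# Task where all tiers clear the floor, but this phone has no on-device
# model, so routing falls through to tier 3.
print(choose_tier(eligible={1, 2, 3, 4}, available={1, 3}).tier)  # 3
```

As the eval's local-readiness fraction rises, more task classes gain tiers 2 to 4 in their `eligible` set. Traffic then shifts without any change to this function.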
ADR-004 — Document parsing via AILANG Parse
Context. AIPLA bots ingest a mix of teacher and student documents: lesson plans (DOCX), worksheets, sensor exports (CSV/XLSX), email threads (EML), slides (PPTX), lab reports (PDF), photos (PNG/JPG). Passing every upload to a multimodal LLM is wasteful in tokens, slow, and weaker on privacy than necessary.
What AILANG Parse (Apache 2.0) provides.
flowchart LR
Upload[Uploaded document]
Parse{AILANG Parse}
Det["Deterministic XML parse<br/>13 formats · zero LLM tokens<br/>DOCX · PPTX · XLSX · ODT · ODP · ODS<br/>HTML · MD · CSV · EPUB · EML · MBOX · TEX"]
AI["AI multimodal extraction<br/>2 formats · routed via model router<br/>PDF · images"]
Out["Structured blocks<br/>+ markdown"]
Upload --> Parse
Parse --> Det
Parse --> AI
Det --> Out
AI --> Out
style Det fill:#e8f5e9,stroke:#2d7d3a
style AI fill:#fce7e8,stroke:#901a1e
style Parse fill:#f5f5f5,stroke:#333,stroke-width:2px
The deterministic path is load-bearing for privacy: a Word doc or Excel sheet is parsed from its XML directly — the content never reaches an LLM. That covers most formats in Danish stx physics teaching (lesson materials, problem sets, lab data, email).
The AI path is only invoked for genuinely image-shaped content (scanned PDFs, hand-drawn diagrams) and routes through the same capability-floor-driven model router as chat tasks.
Decision. Document upload pipeline routes through AILANG Parse first. Office and structured formats take the deterministic path. PDF and image extraction routes through the template’s model router; the capability-floor eval determines which model handles which kind of PDF/image task.
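The path decision in this pipeline is a dispatch on file type. The sketch below only decides which path an upload would take, using the format lists from the diagram. It does not call AILANG Parse, whose API is not shown here. `ingest_path` and its return labels are hypothetical names.

```python
from pathlib import Path

# Format lists taken from the AILANG Parse diagram above.
DETERMINISTIC = frozenset({
    "docx", "pptx", "xlsx", "odt", "odp", "ods",
    "html", "md", "csv", "epub", "eml", "mbox", "tex",
})
# PDF plus image extensions: the only uploads that reach a model.
AI_PATH = frozenset({"pdf", "png", "jpg", "jpeg"})


def ingest_path(filename: str) -> str:
    """Return which path an upload would take.

    'deterministic': parsed from the file's own structure, zero LLM tokens,
                     content stays inside the trust boundary.
    'model_router':  PDF or image, sent through the capability-floor router.
    """
    ext = Path(filename).suffix.lower().lstrip(".")
    if ext in DETERMINISTIC:
        return "deterministic"
    if ext in AI_PATH:
        return "model_router"
    raise ValueError(f"unsupported upload type: {filename!r}")


for name in ["lesson_plan.DOCX", "sensor_export.csv", "worksheet_photo.jpg"]:
    print(name, "->", ingest_path(name))
# lesson_plan.DOCX -> deterministic
# sensor_export.csv -> deterministic
# worksheet_photo.jpg -> model_router
```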
Why this is a strong GDPR move.
| Format | What gets parsed | Where the content goes |
|---|---|---|
| DOCX, PPTX, XLSX, ODT, ODP, ODS | Lesson plans, slides, worksheets, exports | Local parsing — no external call |
| CSV, EML, MBOX, HTML, MD, TEX, EPUB | Tabular data, email, structured text | Local parsing — no external call |
| PDF, JPG, PNG | Scans, hand-drawn diagrams, screenshots | Multimodal LLM via model router (EU region; local Ollama future) |
For a UCPH data-protection review, “13 of 15 supported formats never leave the trust boundary” is a stronger story than any single-vendor multimodal approach.
Disclosure. AILANG Parse is part of the AILANG ecosystem (ADR-012) — same disclosure considerations apply. It is published as a standalone Apache 2.0 library with Python, JavaScript, and Go SDKs and is used by projects beyond AIPLA. The AI Protocol Platform template integrates it natively.
Consequences.
- Cost & latency. Substantially fewer LLM tokens on ingest; deterministic parsing is sub-second vs. multi-second multimodal calls.
- UCPH self-host friendliness. AILANG Parse runs anywhere — CLI, Python SDK, or local service. No cloud dependency.
- Calibration tracking. Extraction quality is part of the capability-floor benchmark (T5 worksheet OCR, T6 tabular data) — exactly what the eval is designed for.
ADR-005 — Chat log storage
Context. Researcher access to chat logs is a core brief requirement. With ADR-001 eliminating student authentication entirely, the anonymisation question largely collapses: there is no student PII to anonymise.
This is the privacy-by-design (GDPR Article 25) commitment made concrete. The relevant regulatory frameworks are:
- GDPR — general personal-data processing (Article 25 privacy by design, Article 5 data minimisation, Article 35 DPIA for educational contexts involving minors)
- ePrivacy Directive — electronic communications specifically; chat logs fall within its scope independently of GDPR. ePrivacy obligations sit alongside GDPR, not inside it.
Decision. Log every chat interaction, keyed by anonymous group ID, into a researcher-accessible BigQuery dataset via the template’s OpenTelemetry sink (per Multivac’s OBSERVABLE-BY-DEFAULT axiom). Retention period and access scope follow the consent form (JB owns).
What gets stored:
- Full prompt and response content (group ID is the only “identifier”)
- Timestamps, skill / topic context, model used (the latter feeds the capability-floor eval)
- Any uploaded resources by reference (content sits in Cloud Storage, EU-region)
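Below is a hypothetical record shape for one logged turn. It assumes a flat row per chat turn in the BigQuery dataset. `ChatLogRecord` and its field names are illustrative, and the `gs://` reference uses a placeholder bucket. The point is that the schema has no column in which a student identifier could land.

```python
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class ChatLogRecord:
    """One chat turn as stored for researchers (hypothetical schema).

    There is deliberately no field for names, emails, UCPH IDs or device
    identifiers: the anonymous group code is the only key.
    """
    group_id: str       # e.g. "bold-kazoo-87"
    skill: str          # tutor / topic context
    model: str          # feeds the capability-floor eval
    prompt: str
    response: str
    upload_refs: tuple[str, ...] = ()  # Cloud Storage references, not content
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )

    def to_row(self) -> dict:
        """Flatten to a plain dict suitable for a table insert."""
        row = asdict(self)
        row["upload_refs"] = list(self.upload_refs)
        return row


record = ChatLogRecord(
    group_id="bold-kazoo-87",
    skill="boldkast",
    model="gemini-3.5-flash",
    prompt="Why does the ball land further at 45 degrees?",
    response="What happens to v_x and v_y as you change the angle?",
    upload_refs=("gs://example-bucket/uploads/abc123.png",),  # placeholder
)
```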
Observability instrumentation (as of v0.1+ build): the OTel pipeline is now wired at the following points, all keyed on anonymous group ID:
| Signal | What it captures | Used for |
|---|---|---|
| Group-join span | Group code, activity, timestamp | Session boundary; feeds teacher dashboard “active now” |
| Chat turn span | Role, turn index, model, latency | Research log; capability-floor eval input |
| Workbench state write | mcp_app_context.{skill}.{field}, value, timestamp | Session report sim-run aggregates; teacher analytics |
| Progress checklist tick | Step index, timestamp | Teacher dashboard; maps to DRA coverage in v1.2 |
| Proactive greet span | Whether greet fired, latency | Tutor responsiveness monitoring |
All spans go to Cloud Trace (real-time) + BigQuery (research-scale query). No span contains PII — group ID is the only identifier, per ADR-001.
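Below is a sketch of how span attributes for these signals could be assembled. It assumes an allow-list of caller-supplied attribute keys, so a new signal cannot quietly start carrying extra identifiers. The key names, `span_attributes` and `workbench_state_key` are hypothetical. With the OpenTelemetry Python API, the resulting dict could be passed to a span's `set_attributes` method.

```python
import time
from collections.abc import Mapping
from typing import Union

AttrValue = Union[str, bool, int, float]

# Hypothetical caller-supplied keys for the five signals in the table above.
# `group.id` and `ts` are added by span_attributes itself, never by callers.
ALLOWED_KEYS = frozenset({
    "activity", "turn.role", "turn.index", "model", "latency_ms",
    "state.key", "state.value", "step.index", "greet.fired",
})


def workbench_state_key(skill: str, field: str) -> str:
    """Key layout used by the workbench state write signal."""
    return f"mcp_app_context.{skill}.{field}"


def span_attributes(group_id: str, attrs: Mapping[str, AttrValue]) -> dict[str, AttrValue]:
    """Attach the anonymous group ID and a timestamp; refuse unlisted keys.

    An allow-list (rather than a PII block-list) means any new attribute has
    to be added here deliberately before a span can carry it.
    """
    unknown = set(attrs) - ALLOWED_KEYS
    if unknown:
        raise KeyError(f"attributes not on the allow-list: {sorted(unknown)}")
    return {"group.id": group_id, "ts": time.time(), **attrs}


write = span_attributes("bold-kazoo-87", {
    "activity": "boldkast",
    "state.key": workbench_state_key("boldkast", "v0"),
    "state.value": 17.5,
})
print(write["state.key"])  # mcp_app_context.boldkast.v0
```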
What does not get stored:
- Real names, email addresses, UCPH IDs, device identifiers — none of these enter the system per ADR-001
- Any data linking a group ID back to specific students
Pending. Consent form drives final retention period and the scope of researcher access; the working assumption above can be tightened or extended once the consent text is settled.
DPIA recommendation. Even though personal data is out of scope by design, an educational research project involving minors benefits from a brief Data Protection Impact Assessment (GDPR Article 35). The DPIA documents why personal-data processing was avoided architecturally and is useful evidence for both UCPH data-protection review and external scrutiny (Ministry, parents). M has prior privacy-by-design and ePrivacy work in Danish digital contexts and can draft the DPIA scaffold; JB and UCPH data-protection sign off.
ADR-006 — Cloud provider for prototype: GCP EU via Multivac
Context. JB’s brief requires EU-hosted, GDPR-compliant infrastructure that avoids PII leakage to US providers where possible and holds up to UCPH data-protection review. The mid-July prototype deadline (~9 weeks) pushes us toward a managed stack.
Options considered.
- GCP EU regions (Vertex AI, Cloud Run, Firestore) — full stack from one vendor with EU residency
- AWS EU (Bedrock + Lambda) — comparable but Bedrock’s Claude routing has US-control concerns
- Azure EU — workable, but M is less familiar with it
- Mistral La Plateforme + EU-hosted backend (Scaleway / Hetzner) — most EU-native but more pieces to assemble
- Pure self-host (UCPH server) — only viable if UCPH IT responds with a usable timeline
Decision. GCP EU for prototype, deployed via Multivac (Sunholo’s AI platform — see Platform foundation). Vertex AI in europe-north1 (Finland) or europe-west3 (Frankfurt), with the choice between them made in ADR-007. Single-vendor EU-resident stack: cleanest DPA story for UCPH review.
Consequences.
- Multivac provides managed access to Vertex AI, Ollama hosting, Postgres, and Cloud Storage. AIPLA’s infrastructure surface is Multivac’s surface in the prototype.
- UCPH self-host migration becomes the dual track — see Self-hosting. The application layer (template) is portable; the infrastructure layer (Multivac) is what gets swapped for UCPH equivalents.
- Cost is API-billed, not infrastructure capex; tracked per task via the capability-floor evaluation.
- M has Vertex AI familiarity (AILANG uses it) — reduces ramp time.
ADR-007 — Cloud region
Context. GCP offers several EU regions; the relevant question is which best fits a Danish research project with student data and a sustainability-conscious institution.
Options shortlist.
| Region | Jurisdiction | Power source | Notes |
|---|---|---|---|
| europe-north1 (Finland) | EU · Nordic | 100% renewable (Hamina datacenter) | Strongest carbon-neutrality story; Nordic regional alignment; Vertex AI Gemini available |
| europe-west3 (Frankfurt) | EU · Germany | Mixed | Most-used EU region; first to get new GCP features; broadest service availability |
| europe-west1 (Belgium) | EU | Mixed | Equivalent to Frankfurt on services; less differentiated |
| europe-west4 (Netherlands) | EU | Mixed | Similar to Belgium |
Decision. europe-north1 (Finland) as primary, with europe-west3 (Frankfurt) as fallback if a specific service availability gap appears during build.
Why Finland over Frankfurt:
- Nordic alignment — Danish school data hosted in a Nordic country is a marginally cleaner political story than Germany; UCPH data-protection review and parent/teacher communication both benefit
- 100% renewable power at the Hamina datacenter is a genuine differentiator for an academic project with sustainability stakeholders
- Latency from Copenhagen is comparable to Frankfurt (~20–30ms ballpark either way) — not a meaningful difference for chat-style UX
Frankfurt fallback is reserved for a specific scenario: if a Vertex AI feature AIPLA needs lands in europe-west3 but not europe-north1 during the contract window, swap is cheap (regional config change, no architectural impact).
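The Frankfurt fallback rule amounts to a subset check on service availability. In this hypothetical sketch, `choose_region` and the availability snapshot are illustrative. The service names are placeholders, and real availability would come from provider documentation, not a hard-coded dict.

```python
PRIMARY = "europe-north1"   # Finland
FALLBACK = "europe-west3"   # Frankfurt


def choose_region(required: set[str], availability: dict[str, set[str]]) -> str:
    """Stay in the primary region unless it lacks a required service.

    `availability` maps region -> services/models offered there. It is an
    input filled in by hand; nothing is looked up from the provider here.
    """
    if required <= availability.get(PRIMARY, set()):
        return PRIMARY
    if required <= availability.get(FALLBACK, set()):
        return FALLBACK
    missing = required - availability.get(FALLBACK, set())
    raise LookupError(f"neither region offers {sorted(missing)}")


# Hypothetical availability snapshot, for illustration only.
snapshot = {
    "europe-north1": {"cloud-run", "vertex-model-a"},
    "europe-west3": {"cloud-run", "vertex-model-a", "vertex-model-b"},
}
print(choose_region({"cloud-run", "vertex-model-a"}, snapshot))  # europe-north1
print(choose_region({"vertex-model-b"}, snapshot))               # europe-west3
```

Because the swap is a single region value, it stays a config change with no architectural impact. That is the property this ADR relies on.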
v0.1 reality (2026-05-20). Gemini 3.5 Flash is not GA in europe-north1 at the time of the Jutland deploy. v0.1 uses Vertex AI global endpoint with a project-level Data Residency policy pinning storage and processing to the EU. Same compliance posture; region-config refinement deferred until europe-north1 reaches GA for the chosen model. Cloud Run service itself remains in europe-north1.
Pending. Confirm with JB / UCPH IT that this composition (global endpoint + EU Data Residency policy) is acceptable. If UCPH data-protection requires strict regional pinning at the endpoint level, fall back to europe-west3 with whichever Gemini variant is GA there.
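The region decision (primary, v0.1 global endpoint, strict-pinning fallback) can be kept as a small piece of config logic so that any later swap stays a config change. Below is a minimal sketch in Python, matching the backend stack. It assumes a hypothetical availability map from model name to the regions where that model is GA. RegionPlan and plan_regions are illustrative names and are not part of the template. Real GA status has to come from Google's published Vertex AI locations list.

```python
from dataclasses import dataclass

PRIMARY = "europe-north1"  # Finland: primary region per ADR-007
FALLBACK = "europe-west3"  # Frankfurt: fallback region per ADR-007
GLOBAL = "global"          # Vertex AI global endpoint, as used in v0.1


@dataclass(frozen=True)
class RegionPlan:
    vertex_location: str          # where model calls are sent
    needs_residency_policy: bool  # True when EU residency rests on the project-level policy


def plan_regions(model: str, ga_regions: dict[str, set[str]], strict_regional: bool) -> RegionPlan:
    """Choose a Vertex AI location for `model` (hypothetical helper).

    ga_regions maps a model name to the set of regions where it is GA.
    strict_regional models UCPH requiring endpoint-level regional pinning.
    """
    available = ga_regions.get(model, set())
    if PRIMARY in available:
        return RegionPlan(PRIMARY, needs_residency_policy=False)
    if not strict_regional:
        # v0.1 composition: global endpoint plus the EU Data Residency policy.
        return RegionPlan(GLOBAL, needs_residency_policy=True)
    if FALLBACK in available:
        return RegionPlan(FALLBACK, needs_residency_policy=False)
    # Strict pinning and no regional GA: the caller must pick another Gemini variant.
    raise LookupError(f"{model} is not GA in {PRIMARY} or {FALLBACK}")


# Example with made-up availability data:
print(plan_regions("gemini-flash", {"gemini-flash": {"europe-west3"}}, strict_regional=False))
```

If UCPH IT rejects the global-endpoint composition, only the strict_regional input changes. The Cloud Run region is not touched.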
ADR-008 — Model abstraction / routing layer
Context. The brief implies, and the 2026-05-15 conversation confirmed, that the provider should be a config swap rather than a code change. The capability-floor eval determines which model is routed for which task class.
Decision. Use the AI Protocol Platform template’s built-in model router, which already abstracts across Claude, Gemini, OpenAI, and Ollama behind one interface. Adoption follows from ADR-002. The capability-floor eval matrix drives per-task routing config; the four-tier model mix in ADR-003 defines the destinations.
Consequences. Provider swap is a config change, not a code change. The on-device tier (tier 4) needs a thin client-side adapter that the router can fall through to, since on-device dispatch is browser-side. AILANG-Parse-related ML calls run through this same router for their AI path (PDFs and images — see ADR-004).
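To make "provider swap is a config change" concrete, here is a minimal sketch of config-driven routing. It assumes a hypothetical routing table keyed by task class, where each entry is an ordered list of candidate routes. The task-class names, the model identifiers, and the Route/route helpers are illustrative and are not the template's router API. The tier numbers follow the ADR-003 references elsewhere on this page: tier 1 is Claude, tier 2 is Gemini EU, and tier 4 is on-device.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Route:
    tier: int       # tier number in ADR-003's four-tier mix
    provider: str
    model: str      # placeholder identifiers, not real model names


# Hypothetical routing table. In practice it would be loaded from a config file
# produced by the capability-floor eval, so a provider swap edits data, not code.
ROUTING: dict[str, list[Route]] = {
    "socratic_chat": [Route(2, "gemini-eu", "tier2-model"), Route(1, "claude", "tier1-model")],
    "image_extraction": [Route(2, "gemini-eu", "tier2-model")],
    # On-device (tier 4) comes last: the router falls through to it and the
    # browser-side adapter performs the actual dispatch.
    "short_ack": [Route(2, "gemini-eu", "tier2-model"), Route(4, "on-device", "browser-model")],
}


def route(task_class: str, unavailable: frozenset[str] = frozenset()) -> Route:
    """Return the first configured route whose provider is currently available."""
    for candidate in ROUTING.get(task_class, []):
        if candidate.provider not in unavailable:
            return candidate
    raise LookupError(f"no available route for task class {task_class!r}")
```

With this structure, re-ordering or replacing entries in ROUTING is the whole of a provider swap, and the calling code does not change.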
ADR-009 — Backend stack
Decision. Determined by ADR-002 (adopting the AI Protocol Platform template): Python 3.11+ / FastAPI / Google ADK, packaged via uv, deployed as a Cloud Run service on Multivac in the prototype phase.
Why this stack:
- FastAPI — the template’s HTTP/SSE surface; AG-UI streaming, structured tool calls, and OpenAPI generation come standard
- Google ADK — agent orchestration (sessions, memory, artifacts, eval); aligns with ADR-011 for multimodal handling via Gemini and with the model router in ADR-008
- uv — fast Python package management; reproducible builds for handover
- Cloud Run — containerised, EU-region available, autoscaling; portable to UCPH Kubernetes per the self-hosting migration
Consequences. Successors operate the same stack the template documents — Multivac and Aitana use it, so the operational patterns are well-trodden. AIPLA’s contribution is the physics-specific skills and MCP extensions on top, not the backend plumbing itself.
ADR-010 — RAG store
Context. Two retrieval needs, with different shapes:
- Text RAG (priority) — teachers upload curriculum extracts, problem sets, lab guides. Bots retrieve at query time. Vector similarity over chunked documents. Must be EU-resident.
- Concept-network storage (stretch — for the student-models work in C3) — per-student concept graphs of nodes and relations, compared to a reference model. Graph traversal, topology comparison.
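As an illustration of what "topology comparison" could mean for C3, here is a minimal sketch. It assumes each concept network is stored as a set of (concept, relation, concept) triples. The triple format and the compare_to_reference helper are hypothetical. Jaccard overlap on edges is only a simple stand-in; the actual measure is for the C3 scoping note to define.

```python
Edge = tuple[str, str, str]  # (source concept, relation, target concept)


def compare_to_reference(student: set[Edge], reference: set[Edge]) -> dict:
    """Compare a student's concept graph with a reference graph by edge overlap."""
    shared = student & reference
    union = student | reference
    return {
        "missing": sorted(reference - student),  # reference relations the student lacks
        "extra": sorted(student - reference),    # relations absent from the reference
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


reference = {("force", "causes", "acceleration"), ("mass", "resists", "acceleration")}
student = {("force", "causes", "acceleration"), ("force", "causes", "velocity")}
print(compare_to_reference(student, reference))
```

In this toy example, the "extra" edge is the classic misconception that force causes velocity. This is why a comparison needs both directions (missing and extra), not just an overlap score.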
Options.
| Option | Text RAG fit | Concept-graph fit | Notes |
|---|---|---|---|
| Vertex AI Vector Search | Good | Poor | Managed, EU-region, expensive at low volume |
| ChromaDB on Cloud Run | Good | Poor | Simple, cheap; vector-only |
| pgvector on Postgres | Good | Partial | Relational + vectors in one place; concept graphs via edge tables work but are awkward |
| Neo4j / Memgraph (graph DB) | Poor | Excellent | Native graph traversal; usable via MCP server |
| Dual store — pgvector for text + graph DB for C3 | Best | Best | More moving parts; concept-graph layer only activates if C3 stretch is pursued |
Lean.
- For v1 (text RAG): pgvector on Postgres — combines metadata, ACLs, and vectors in one place; cleanly migratable to UCPH-hosted Postgres later. Wraps as an MCP server per the Core vs extensions framing.
- For C3 stretch (concept networks): add a graph DB (Neo4j AuraDB EU, Memgraph, or pgvector + edge tables if we want a single-store fallback) as a second MCP server. Decision deferred until the C3 scoping note recommends investment — see strands.qmd.
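Keeping ACL metadata and embeddings in the same Postgres row lets a single query filter by access and rank by similarity. The sketch below mimics that query shape in memory. It assumes a hypothetical Chunk record with a class_id ACL column. In Postgres, the equivalent is a WHERE clause on class_id plus ORDER BY on pgvector's cosine-distance operator (<=>), as shown in the docstring. The table and column names are illustrative.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Chunk:
    doc_id: str
    class_id: str                 # ACL column: which class may retrieve this chunk
    text: str
    embedding: tuple[float, ...]


def cosine_distance(a, b) -> float:
    """1 - cosine similarity, the quantity pgvector's <=> operator returns."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def retrieve(chunks, query_embedding, class_id, k=3):
    """In-memory analogue of:

    SELECT doc_id, text FROM chunks
    WHERE class_id = %s
    ORDER BY embedding <=> %s
    LIMIT %s;
    """
    allowed = [c for c in chunks if c.class_id == class_id]  # ACL filter first
    allowed.sort(key=lambda c: cosine_distance(c.embedding, query_embedding))
    return allowed[:k]
```

Applying the ACL filter in the same query as the similarity ranking means no chunk from another class can reach the bot's context.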
Why this works. Both retrieval surfaces are MCP servers — separate plugins, no template changes needed to swap or add either. UCPH self-host equivalents exist for both: Postgres+pgvector and Memgraph (Apache 2.0) both run on UCPH-managed VMs.
Pending. Confirm teacher-upload volume estimates with JB before final scale sizing. Graph-DB decision waits on C3 scoping investment recommendation.
ADR-011 — Multimodal input handling
Context. Students and teachers upload a mix of formats: photos of hand-drawn free-body diagrams (JPG/PNG), screenshots of Tracker output (PNG), worksheet scans (PDF), CSV / Excel exports from sensors, lesson plans (DOCX/PPTX), email threads (EML). Each of these has different optimal handling — and from a privacy standpoint, the goal is to send as little to a remote model as possible.
Decision. All uploads route through AILANG Parse first (see ADR-004). AILANG Parse selects the right backend per format:
| Format type | Backend | Where the content goes |
|---|---|---|
| DOCX, PPTX, XLSX, ODT, ODP, ODS, CSV, EML, HTML, MD, TEX, EPUB | Deterministic XML parser | Local — content never reaches an LLM |
| PDF, JPG, PNG | AI backend (currently cloud multimodal) | Cloud LLM via the model router — tier 2 (Gemini EU) or tier 1 (Claude) |
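The format split in the table above amounts to a small dispatch step. Here it is sketched assuming routing by file extension. The function name and the "local"/"ai" labels are hypothetical. AILANG Parse's own format detection may inspect file content rather than trusting extensions.

```python
from pathlib import PurePath

# Deterministic XML/text parser: content never reaches an LLM.
LOCAL_FORMATS = {
    "docx", "pptx", "xlsx", "odt", "odp", "ods",
    "csv", "eml", "html", "md", "tex", "epub",
}
# AI backend (currently cloud multimodal, reached via the model router).
AI_FORMATS = {"pdf", "jpg", "png"}


def parse_backend(filename: str) -> str:
    """Return "local" or "ai" for a supported upload; reject anything else."""
    ext = PurePath(filename).suffix.lower().lstrip(".")
    if ext in LOCAL_FORMATS:
        return "local"
    if ext in AI_FORMATS:
        return "ai"
    raise ValueError(f"unsupported upload format: {ext or '(none)'}")
```

Rejecting unknown formats outright, instead of defaulting to the AI path, keeps the privacy property explicit: nothing reaches a model unless its format is listed in the AI row.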
The AI-backend layer is swappable: AILANG Parse currently calls cloud multimodal models, and local OCR models are on its roadmap. Local OCR currently underperforms cloud multimodal on physics-shaped content (hand-drawn diagrams, equations). Once it closes the gap, the swap is a config change, and the capability-floor eval captures when that is worth doing.
Code execution (geometry / arithmetic checks, CSV computation, simulation snippets) is separate from input handling and plugs in as an MCP server extension. Likely candidates: Cloud Run Jobs sandbox, or a containerised Python/JS sandbox MCP server. The student-facing bot calls the code-exec MCP server as a tool, not as part of upload processing.
Consequences.
- The privacy story is concrete: 12 of the 15 supported upload formats are extracted locally with no LLM involvement. Only image-shaped content (PDF, JPG, PNG) goes to a model.
- The AI backend for image extraction is the same as the chat-routing AI — one router, one telemetry surface, one billing surface.
- Migration path is clear: when local OCR matures (AILANG Parse roadmap), the image-extraction path shifts from cloud to local without touching application code.
- Code execution is intentionally a separate extension; we can ship multimodal upload handling in v1 and add code-exec when needed without re-architecting either.
ADR-012 — AILANG ecosystem in AIPLA (utilities, not runtime)
Context. AILANG is M’s research language for AI applications (Apache 2.0, sunholo-data/ailang). It has its own runtime, routing, and model integrations. The question for AIPLA is whether the production stack uses AILANG as the runtime, uses AILANG-built tools as components, or both.
Decision. AIPLA’s production runtime is the AI Protocol Platform template’s Python + Google ADK stack — not AILANG. AIPLA does, however, use specific AILANG-built tools as standalone components, all open-source and individually swappable:
| AILANG-built tool | Role in AIPLA | How it integrates |
|---|---|---|
| AILANG Parse | Document parsing — see ADR-004 | MCP server. Used for its deterministic-XML-parsing privacy properties, not because of its origin. |
| AILANG capability benchmarks | Reference data for the capability-floor eval starting model panel | Eval methodology is independent of AILANG; benchmarks are a published starting reference. |
| Future utilities | Additional MCP servers as appropriate | Each plugs in independently; no AILANG fluency required to maintain the runtime. |
Disclosure (handover & conflict of interest). Two layers:
- AILANG itself is M’s research work — disclosed to JB. AIPLA does not depend on AILANG-the-runtime, so successors do not need to learn AILANG to maintain AIPLA. They maintain Python.
- AILANG-built utilities used in AIPLA are standalone open-source projects with non-AILANG SDKs (Python, JS, Go). Choosing them is no different in principle from choosing any other open-source tool — the disclosure is so JB can see the lineage explicitly.
Consequences.
- Successors maintain AIPLA in Python — no AILANG fluency required
- Each AILANG-built MCP server is swappable (AILANG Parse could be replaced by Docling or MarkItDown, though none currently match its deterministic-XML privacy)
- The eval framework is AIPLA’s own; AILANG’s benchmarks are a starting reference, not a dependency
ADR-013 — Artefact safety: content-review pipeline for generated HTML
Context. AIPLA bots generate two kinds of artefacts that get rendered in front of students:
- Static illustrations (SVG, PNG) — embedded inline via A2UI
- Interactive HTML simulations — embedded via MCP Apps in sandboxed iframes (form factor anchored on AR’s existing GenAI trials — see notes/2026-05-18-aswin-trials-analysis.md)
Generated HTML is untrusted by default — the model can be prompted, accidentally or intentionally, into emitting external <script src=…>, fetch() calls, or unsafe DOM. A naive “model outputs HTML → iframe renders it” pipeline fails the GDPR/safety bar AIPLA must hold.
Decision. All generated artefacts pass through a server-side review pipeline before they reach the student-facing iframe. Pipeline runs as an MCP server (physics-artefact-render, swappable).
| Check | What it catches |
|---|---|
| HTML parse + tag allow-list | No <script src=…>; only inline scripts permitted |
| Static analysis on inline JS | No fetch(, XMLHttpRequest, WebSocket, eval, no external URLs in string literals |
| Subresource scanning | No <img src="http*://...">, no external CSS/font imports |
| Size limits | Max ~200 KB per artefact; oversized payloads rejected |
| Headless render preview | Confirms the artefact actually displays without runtime errors |
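The first four checks can be sketched with the Python standard library alone. The sketch assumes the pipeline receives each artefact as a single HTML string. Regex scanning of inline JS is a coarse stand-in for real static analysis; a production version would parse the JavaScript. The headless render preview is left out because it needs a browser. The review_artefact name, the tag allow-list and the URL-attribute list are illustrative.

```python
import re
from html.parser import HTMLParser
from typing import Optional

MAX_BYTES = 200 * 1024  # "Max ~200 KB per artefact"
# Illustrative allow-list; the real one would be tuned to the sim library.
ALLOWED_TAGS = {
    "html", "head", "body", "meta", "title", "style", "script", "div", "span", "p",
    "h1", "h2", "h3", "label", "button", "input", "select", "option", "canvas", "img",
    "svg", "g", "path", "line", "circle", "rect", "polyline", "text",
    "table", "thead", "tbody", "tr", "th", "td",
}
URL_ATTRS = {"src", "href", "srcset", "action", "poster"}
FORBIDDEN_JS = re.compile(r"\bfetch\s*\(|\bXMLHttpRequest\b|\bWebSocket\b|\beval\s*\(")
EXTERNAL_URL = re.compile(r"https?://", re.IGNORECASE)


class _ArtefactScanner(HTMLParser):
    def __init__(self) -> None:
        super().__init__()
        self.problems: list[str] = []
        self._open_raw: Optional[str] = None  # "script" or "style" while inside one

    def handle_starttag(self, tag, attrs):
        if tag not in ALLOWED_TAGS:
            self.problems.append(f"tag not allowed: <{tag}>")
        for name, value in attrs:
            value = value or ""
            if tag == "script" and name == "src":
                self.problems.append("external script: <script src=...>")
            elif name in URL_ATTRS and (EXTERNAL_URL.search(value) or value.startswith("//")):
                self.problems.append(f"external resource: <{tag} {name}={value!r}>")
        if tag in ("script", "style"):
            self._open_raw = tag

    def handle_endtag(self, tag):
        if tag == self._open_raw:
            self._open_raw = None

    def handle_data(self, data):
        # html.parser delivers <script>/<style> bodies as raw data.
        if self._open_raw == "script" and FORBIDDEN_JS.search(data):
            self.problems.append("forbidden API in inline script")
        if self._open_raw and EXTERNAL_URL.search(data):
            self.problems.append(f"external URL inside <{self._open_raw}>")
        if self._open_raw == "style" and "@import" in data:
            self.problems.append("CSS @import")


def review_artefact(html_text: str) -> list[str]:
    """Return a list of problems; an empty list means the static checks passed."""
    if len(html_text.encode("utf-8")) > MAX_BYTES:
        return [f"artefact larger than {MAX_BYTES} bytes"]
    scanner = _ArtefactScanner()
    scanner.feed(html_text)
    scanner.close()
    return scanner.problems
```

Returning every problem, not just the first, gives the teacher-preview step a complete list to show.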
Iframe-side defence in depth:
- sandbox="allow-scripts" only: explicitly no allow-same-origin, allow-top-navigation, or allow-popups
- Content Security Policy: default-src 'none'; script-src 'unsafe-inline'; style-src 'unsafe-inline'
- All AI communication via postMessage to the parent frame, never directly from the iframe to a model API
- Resource limits: max DOM size, runtime budget before the iframe gets killed
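The sandbox and CSP settings can be applied by wrapping each reviewed artefact in an iframe that uses srcdoc. The sketch below builds that element as a server-rendered string. Delivering the CSP as a meta http-equiv tag inside the framed document is one option; a response header is another. The wrap_artefact helper is hypothetical. The postMessage-only rule and the resource limits are browser-side and are not covered here.

```python
import html

CSP = "default-src 'none'; script-src 'unsafe-inline'; style-src 'unsafe-inline'"
SANDBOX = "allow-scripts"  # deliberately no allow-same-origin / allow-top-navigation / allow-popups


def wrap_artefact(artefact_html: str, title: str) -> str:
    """Return an <iframe> element that renders a reviewed artefact in isolation."""
    # The CSP meta tag must precede any script in the framed document.
    doc = f'<meta http-equiv="Content-Security-Policy" content="{CSP}">' + artefact_html
    # srcdoc is an attribute value, so the whole document must be HTML-escaped.
    return (
        f'<iframe sandbox="{SANDBOX}" title="{html.escape(title)}" '
        f'srcdoc="{html.escape(doc, quote=True)}"></iframe>'
    )
```

Because the sandbox omits allow-same-origin, the framed document gets an opaque origin. Even an inline script that slipped past review therefore cannot read the host page's cookies or storage.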
Library bypass. The vetted artefact library (GitHub-hosted for portability) holds teacher-reviewed starting points. Artefacts pulled from the library skip the pipeline once they’ve been approved into it.
Consequences.
- ~1–2s added latency between bot output and student-visible render. Acceptable.
- Teacher-preview-before-publish becomes a UX requirement, not optional.
- The pipeline is one MCP server — successors can extend the rules without touching the backend.
- v1 scope: the pipeline ships when AI-generated artefacts ship. v1 uses a hand-curated sim library (built with AR), so the pipeline isn’t on the August critical path. It’s architectural groundwork for Year-2 when teachers generate artefacts interactively.
ADR-014 — Per-group / per-class budget enforcement
Context. The brief requires central billing (no per-user keys, no per-user Anthropic accounts). That solves who pays; it doesn’t solve how much each cohort can spend. Without per-group / per-class budgets, one class running heavy multimodal queries could drain the project’s monthly cap and block others. AIPLA’s centralised API-key model (no user-provided keys per ADR-013) is itself a security feature — but only if AIPLA can also bound the spend it enables.
Decision. Budgets enforced at the model router at two levels, with skill-specific multipliers:
| Scope | Default budget (placeholder — calibrate post-eval) | Behaviour |
|---|---|---|
| Per group ID (anonymous, per ADR-001) | ~€2 / week | Soft warning at 80%; hard block at 100% |
| Per class | ~€50 / month | Same — applies across all groups in the class |
| Per skill (multiplier) | Image gen ≈ 5×, multimodal ≈ 3×, text ≈ 1× | Multiplies the group/class counter per request |
Counters derive from the OpenTelemetry usage stream already in ADR-005 — no separate counter store needed for v1. Teacher dashboard exposes current usage and time-to-reset; admin can raise/lower limits per class.
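The counter logic in the table is sketched below. It assumes costs arrive in euros per request and that the group and class scopes are checked together. The skill multiplier is read as scaling the recorded charge. BudgetCounter, check, record, and the multiplier keys are hypothetical and are not the platform's enforcer interface. The limits and multipliers are the table's placeholders.

```python
from dataclasses import dataclass

# Placeholder multipliers from the table (calibrate post-eval).
MULTIPLIERS = {"image_gen": 5.0, "multimodal": 3.0, "text": 1.0}
SEVERITY = {"ok": 0, "warn": 1, "block": 2}


@dataclass
class BudgetCounter:
    limit_eur: float        # e.g. 2.0 per group per week, 50.0 per class per month
    spent_eur: float = 0.0

    def status(self) -> str:
        used = self.spent_eur / self.limit_eur
        if used >= 1.0:
            return "block"  # hard block at 100%
        if used >= 0.8:
            return "warn"   # soft warning at 80%
        return "ok"


def check(group: BudgetCounter, class_: BudgetCounter) -> str:
    """Before an LLM call: the stricter of the group and class scopes wins."""
    return max(group.status(), class_.status(), key=SEVERITY.__getitem__)


def record(group: BudgetCounter, class_: BudgetCounter, cost_eur: float, skill_kind: str) -> float:
    """After an LLM call: charge both scopes, scaled by the skill multiplier."""
    charged = cost_eur * MULTIPLIERS[skill_kind]
    group.spent_eur += charged
    class_.spent_eur += charged
    return charged
```

Taking the stricter of the two scopes means a group cannot overspend its own budget even when its class still has headroom, and vice versa.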
Consequences.
- Predictable cost per cohort. JB can tell teachers “your class gets X € / month” with confidence.
- One badly-prompted skill or runaway loop in one class doesn’t drain another’s budget.
- The numbers above are placeholders; real usage data from the eval calibrates them post-Jutland.
- JB’s GCP billing alert (1000 DKK / month) remains the outermost ceiling — these per-cohort caps sit inside it.
- Selling point for UCPH data-protection: no PII to API providers, no per-user keys, predictable spend, no surprise bills.
- Platform support landed 2026-05-19 (Sprint 2.12). backend/budget/enforcer.py ships a BudgetEnforcer Protocol; the platform consults it before every LLM call and records realised cost afterwards. Per-skill opt-in via tool_configs.budget: identity_key, cost_multiplier, exempt. Block → AG-UI RUN_ERROR + BudgetBanner UI. AIPLA configures rather than builds.
- v1 scope: per-class enforcement plus skill multipliers (now available in the platform with no extra work). Two-level (per-class + per-group) enforcement is a thin custom enforcer over the same Protocol if needed; defer unless evidence calls for it.
ADR-015 — Unified multi-surface UI: AI directs the layout
Context. Teachers and students have different needs from AIPLA — teachers configure skills, review artefacts, monitor budgets; students engage with tutors and consume artefacts. The reflex is to build a separate teacher dashboard alongside the student chat. Both that reflex and the opposite reflex (“just stuff everything inline in a scrolling chat”) are wrong:
- Separate apps double the v1 surface area
- Chat-as-prime-driver makes dashboards scroll away into history; teachers lose context
Decision. Use a single UI app composed of multiple named surfaces. The AI directs which surface each output goes to. Chat is one surface among several — a conversational signal channel, not the primary UX driver.
A2UI’s surfaceId mechanism is built for this: a surface is “a canvas for components (dialog, sidebar, main view)” with its own component tree and data model. The agent targets a surface by name; updates flow there independently of other surfaces.
AIPLA surface layout (v1):
| Surface | Role | Typical occupants |
|---|---|---|
| chat | Conversational channel | User input, agent text responses, transient acknowledgements |
| workspace | Primary work area (the current active view) | Live dashboards (class-status), open artefact (physics-sim-builder MCP App), search results |
| sidebar | Persistent context | Current class, current group, quick-switch, available skills |
| modal | Blocking focused task | Artefact-approval review, confirm-class-deletion, login |
Mapping skills to surfaces:
| Teacher task | Skill | Target surface |
|---|---|---|
| See current class usage + budget | class-status | workspace (live A2UI dashboard) |
| Generate a physics simulation | physics-sim-builder | workspace (MCP App) |
| Review and publish pending artefacts | review-artefacts | modal (focus on approval) |
| Create class / generate group IDs | manage-class | workspace |
| Search anonymised chat logs | chat-log-search | workspace |
| Configure a problem-set helper bot | problem-set-helper-config | workspace |
| Switch between classes | (sidebar component) | sidebar |
Why this works:
- One UI app to maintain. No /teacher routes, no separate auth flow, no second deploy target.
- Dashboards don’t scroll away. A teacher’s class-status lives in workspace and updates live; the chat conversation continues in chat without burying it.
- Aligns with the AI Protocol Platform template philosophy: skills emit rich UI via A2UI / MCP Apps. Multi-surface is what A2UI was designed for.
- Auth-driven role filtering. UCPH SSO sessions get teacher-role skills; anonymous group-ID sessions (per ADR-001) get student-role skills. The skill registry filters by role and the AI picks the appropriate target surface per skill.
- Mobile-friendly. Renderer collapses surfaces sensibly on small screens (workspace foregrounded, chat as a tab); A2UI doesn’t care about layout — that’s the renderer’s job.
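Role filtering and default surfaces can both live in skill metadata. Here is a minimal sketch. It assumes a hypothetical SkillMeta record carrying a role and a default_surface, plus an optional agent-supplied override. The teacher skill names and their surfaces come from the mapping table. The field names and the student-role "projectile-tutor" entry are illustrative.

```python
from dataclasses import dataclass
from typing import Optional

SURFACES = {"chat", "workspace", "sidebar", "modal"}


@dataclass(frozen=True)
class SkillMeta:
    name: str
    role: str             # "teacher" (UCPH SSO) or "student" (anonymous group ID)
    default_surface: str


REGISTRY = [
    SkillMeta("class-status", "teacher", "workspace"),
    SkillMeta("physics-sim-builder", "teacher", "workspace"),
    SkillMeta("review-artefacts", "teacher", "modal"),
    SkillMeta("manage-class", "teacher", "workspace"),
    SkillMeta("projectile-tutor", "student", "chat"),  # hypothetical student skill
]


def skills_for(role: str) -> list[SkillMeta]:
    """Role filter applied when a session starts (SSO vs group ID)."""
    return [s for s in REGISTRY if s.role == role]


def target_surface(skill: SkillMeta, override: Optional[str] = None) -> str:
    """Default from metadata; the agent may override, e.g. a short ack to chat."""
    surface = override or skill.default_surface
    if surface not in SURFACES:
        raise ValueError(f"unknown surface {surface!r}")
    return surface
```

Validating the override against the fixed surface set keeps the agent from inventing regions that the renderer cannot mount.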
Consequences.
- Significant v1 scope reduction: one auth path, one UI deploy, one set of components to test.
- The agent must learn to pick the right surface per skill. Surface choice is part of skill metadata, defaulted in the skill template and overridable by the agent if context demands (e.g., a small ack goes to chat, the full result goes to workspace).
- Renderer (frontend) must support the four surfaces and route components from A2UI’s surfaceId to the right region. The AI Protocol Platform template’s reference frontend already ships multi-surface support (A2UISurfaceMount, SurfaceRegistry), verified 2026-05-19.
- Mobile renderer collapses surfaces gracefully.
- v1 scope: chat-primary with a workspace surface for embedded sims. Full four-surface layout (sidebar, modal) is Year-2; the architecture supports it but v1 doesn’t need it shipped. Over-deliver if time permits.
Live updates without new chat messages
Both A2UI and MCP Apps support server-pushed updates to existing rendered components — and updates flow to the target surface independently of the chat surface.
- A2UI binds data values to component paths. The agent emits a dataModelUpdate event (e.g., surfaceId: "workspace", path /class/budget_used = 42) and only the affected component patches (Agentic UI write-up).
- MCP Apps receive ui/toolResult messages over postMessage whenever the host’s tool data changes (MCP Apps spec).
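Path-based data binding can be illustrated with a small in-memory model. Components register the paths they read on a given surface, and an update to one path returns only the components bound to it. This is a sketch of the concept only. It assumes "/"-separated paths with exact matching, as in the /class/budget_used example. It does not reproduce A2UI's message schema, and SurfaceModels is a hypothetical name.

```python
from collections import defaultdict


class SurfaceModels:
    """Per-surface data models with path-bound components (illustrative only)."""

    def __init__(self) -> None:
        self.values = defaultdict(dict)   # surface -> {path: value}
        self.bindings = defaultdict(set)  # (surface, path) -> component ids

    def bind(self, surface: str, path: str, component_id: str) -> None:
        self.bindings[(surface, path)].add(component_id)

    def update(self, surface: str, path: str, value) -> list[str]:
        """Set one path and return the components that need re-rendering."""
        self.values[surface][path] = value
        return sorted(self.bindings.get((surface, path), set()))


models = SurfaceModels()
models.bind("workspace", "/class/budget_used", "budget-gauge")
models.bind("workspace", "/class/active_groups", "group-list")
print(models.update("workspace", "/class/budget_used", 42))
```

Keying bindings by surface as well as by path is what lets a workspace dashboard update without touching anything in chat.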
Practical pattern for AIPLA — which skills benefit from live updates vs one-shot:
| Pattern | Skills | Why |
|---|---|---|
| Live-update on workspace (data-bound, server pushes patches) | class-status, review-artefacts, chat-log-search | Background state changes as students use the system; the teacher shouldn’t have to refresh |
| One-shot on workspace (generate → render → publish) | physics-sim-builder, illustration-builder, manage-class, misconception-pair | A single user action completes a discrete task |
For live-update skills the rendered component stays in workspace and continues receiving patches; the chat surface continues underneath without burying it. The teacher’s view of class status is always at hand, not scrolled away.
Workbench → chat narration: “human tool-use cards”
The reverse direction also matters: when a student manipulates a workbench artefact (moves a slider, switches a preset, checks a self-assessment item), the chat surface should reflect that action so the tutor’s responses can reference it. This is the inverse of dashboard live-updates — student-side actions narrated into the conversation.
The pattern that landed in v0.1 production (2026-05-21) is human tool-use cards: a sandboxed iframe MCP App emits structured postMessage events on slider-end and preset-change; the host renders each as a small inline card in chat (e.g., “Justerede v₀ til 17.5 m/s ✓”); the agent’s prompt actively references those cards plus the current iframe-context state. The iframe-message harness is a standard host-side pattern — any future MCP-App artefact gets this for free, with sandbox auth via e.source (not just origin) and slider debouncing built in.
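The host-side harness runs in the browser, but its logic is small enough to show in Python alongside the other sketches. Assumptions: each message carries a control label, value, unit and timestamp. The source check compares an opaque handle, standing in for e.source, against the iframe the host registered. Debouncing keeps only the last event per control within a quiet window, applied here to a batch instead of with live timers. SimEvent, narrate, and the 300 ms window are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SimEvent:
    source: object  # stands in for MessageEvent.source (the sending window)
    control: str    # e.g. "v₀"
    value: float
    unit: str
    t_ms: int


def narrate(events, registered_source, quiet_ms=300):
    """Return chat-card texts for trusted, debounced events, in time order."""
    # Source check by identity, not just origin: sandboxed frames share the "null" origin.
    trusted = sorted((e for e in events if e.source is registered_source), key=lambda e: e.t_ms)
    cards = []
    for i, e in enumerate(trusted):
        superseded = any(
            later.control == e.control and later.t_ms - e.t_ms < quiet_ms
            for later in trusted[i + 1:]
        )
        if not superseded:
            cards.append(f"Justerede {e.control} til {e.value:g} {e.unit} ✓")
    return cards


sim_frame, other_frame = object(), object()
events = [
    SimEvent(sim_frame, "v₀", 15.0, "m/s", 0),
    SimEvent(sim_frame, "v₀", 16.0, "m/s", 100),
    SimEvent(sim_frame, "v₀", 17.5, "m/s", 200),
    SimEvent(other_frame, "v₀", 99.0, "m/s", 250),  # not the registered iframe: dropped
    SimEvent(sim_frame, "θ", 45.0, "°", 1000),
]
print(narrate(events, sim_frame))
```

Checking the sending window by identity matters because sandboxed iframes without allow-same-origin all report the opaque "null" origin. An origin check alone cannot tell them apart.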
Why this matters: the workbench is no longer silent to the chat. Student exploration on the sim becomes part of the conversation timeline, and the tutor’s Socratic prompts can substantively reference what the student actually did (“what do you notice about how this much lower gravity on the Moon affects the flight path?”) — not just respond to questions in the abstract.
Architectural principle: the artefact is the sim, not the lesson
Established during the LED Planck integration (2026-05-27). The initial attempt wrapped the full standalone lab — checklist, data table, Planck calculator, error display, instructions — inside the workbench iframe. This was wrong.
The rule: a workbench artefact contains the interactive simulation element only. Everything else belongs to the platform:
| What | Where |
|---|---|
| Interactive simulation / virtual lab core | Workbench artefact (sandboxed iframe) |
| Instructions and task framing | Tutor system prompt |
| Data recording and results | Lab notebook workbench type (v1.1) or tutor-guided freeform |
| AI hints and Socratic questions | Tutor |
| Adaptive quizzes | Tutor |
| Self-assessment checklist | Platform progress component |
This is the Boldkast model: the Boldkast artefact is the physics simulator only — sliders, trajectory canvas, readouts. The tutor is the lesson. When onboarding an external teaching tool (jitt.dk apps, KineBot, new lab HTML), extract the sim core and strip the rest; the tutor prompt replaces what was stripped.
The rule holds for all five workbench types (App, Drawing board, Experiment tool, Video analysis, Lab notebook). The artefact produces state the tutor can read; the tutor makes the pedagogical use of that state.