# DocWright Architecture

DocWright is a self-hostable, FOSS collaborative document-authoring and
publishing platform: a cross between Overleaf, Google Docs, and a desktop DTP
program, with **sovereign, Git-backed files**, print-grade PDF, LaTeX-quality
math, and BIM awareness. It runs on **PHP 8.3+ (class-first) · Python 3.11+ ·
PostgreSQL 16+**, behind nginx, with **no Docker** and **no full-framework
lock-in**, and ships under **GPL-3.0-or-later**.

This document describes the system as it actually exists in the repository, and
is honest about what is implemented versus planned (see
[Implementation status](#implementation-status)).

---

## 1. The central decision: one canonical model, many backends

Everything hangs off a single design choice: **one canonical document model
with multiple render backends.** The document is stored once as a strict,
versioned AST (`spec/document.schema.json`); every editor reads and writes that
tree, and every export (PDF, DOCX, web) derives from it. This is what reconciles
the otherwise-conflicting requirements — true WYSIWYG *and* LaTeX quality *and*
faithful DOCX *and* portable sovereign files — without any one of them
compromising the others.
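
As a rough illustration of "one tree, many backends", the sketch below walks a single miniature AST with two independent renderers. It is a minimal sketch, not DocWright code: the node shapes (`doc`, `heading`, `paragraph`, `text`) and the function names are assumptions made for the example, and the real node types are whatever `spec/document.schema.json` defines.

```python
"""One tree, several renderers: each backend walks the same AST."""
from html import escape

# Hypothetical miniature AST; the real schema is spec/document.schema.json.
DOC = {
    "type": "doc",
    "children": [
        {"type": "heading", "level": 1,
         "children": [{"type": "text", "value": "Intro"}]},
        {"type": "paragraph",
         "children": [{"type": "text", "value": "a < b"}]},
    ],
}


def text_of(node: dict) -> str:
    """Concatenate the text leaves found under a node."""
    if node["type"] == "text":
        return node["value"]
    return "".join(text_of(child) for child in node.get("children", []))


def render_html(node: dict) -> str:
    """Web backend: map each node type to an HTML element."""
    kind = node["type"]
    if kind == "text":
        return escape(node["value"])
    inner = "".join(render_html(child) for child in node.get("children", []))
    if kind == "heading":
        return f"<h{node['level']}>{inner}</h{node['level']}>"
    if kind == "paragraph":
        return f"<p>{inner}</p>"
    return inner  # "doc" and unknown containers only emit their children


def render_markdown(node: dict) -> str:
    """A second backend over the same tree; no data is duplicated."""
    kind = node["type"]
    if kind == "text":
        return node["value"]
    if kind == "heading":
        return "#" * node["level"] + " " + text_of(node) + "\n\n"
    if kind == "paragraph":
        return text_of(node) + "\n\n"
    return "".join(render_markdown(child) for child in node.get("children", []))
```

Adding an export format in this model means adding one more walker; the stored document does not change.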

The unifying trick for print fidelity is **CSS Paged Media**: a single
stylesheet (`styles/theme.css` + `styles/master-pages.css` in the bundle) drives
both the in-browser paged preview and the print PDF, so what the user sees on
screen is what the PDF prints.

## 2. The two-runtime split: PHP orchestrates, Python computes

- **PHP** owns the request/response world: authentication, sessions, RBAC,
  projects, sharing, API keys, webhooks, job orchestration, and the REST API. It
  is class-first (typed properties, `strict_types`, PSR-12, constructor DI) and
  ships **its own** PSR-4 autoloader, DI container, router, and migrator — the
  core carries **no hard Composer dependency** (see `THIRD_PARTY.md`).
- **Python** owns the heavy lifting that is best served by the FOSS CLI
  ecosystem: PDF rendering (headless Chromium + Paged.js primary; WeasyPrint
  fallback; Tectonic for LaTeX), DOCX via Pandoc/LibreOffice, image proxies via
  libvips, PDF imposition via pikepdf/pypdf, BIM parsing via IfcOpenShell, and
  transliteration/spell/grammar. These run as **queued CLI jobs** with
  `nice`/`ionice`, content-addressed caching, and per-user concurrency caps.
- A small **Python realtime service** (pycrdt over WebSockets) is the planned
  home for CRDT collaboration, because PHP is a poor fit for many long-lived
  sockets.
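
The worker conventions named in the Python bullet above (CLI jobs under `nice`/`ionice`, and content-addressed caching) could look roughly like this. It is a sketch under stated assumptions: the helper names, the niceness value, and the choice of SHA-256 over canonical JSON are illustrative, not taken from the repository.

```python
import hashlib
import json


def low_priority_argv(argv: list[str], niceness: int = 10) -> list[str]:
    """Prefix a command so it runs at reduced CPU and idle I/O priority."""
    # `nice -n N` lowers CPU priority; `ionice -c 3` selects the idle I/O class.
    return ["nice", "-n", str(niceness), "ionice", "-c", "3", *argv]


def cache_key(job_type: str, payload: dict) -> str:
    """Hash a canonical JSON form so equal payloads always share one key."""
    canonical = json.dumps(
        {"type": job_type, "payload": payload},
        sort_keys=True,              # key order must not change the hash
        separators=(",", ":"),       # no insignificant whitespace
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

A worker would pass the wrapped list to `subprocess.run(...)` and look the key up before rendering; two requests with the same payload then resolve to the same cached output.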

## 3. System diagram (spec §4)

```
                    ┌─────────────────────────── Browser (SPA) ───────────────────────────┐
                    │  WYSIWYG editor (ProseMirror/TipTap)   LaTeX source mode (CodeMirror6)│
                    │  Math input (MathLive)  Paged.js live pixel-accurate paged preview    │
                    │  Yjs CRDT client  •  low-res image proxies  •  zoom/impose UI          │
                    └───────▲───────────────────────▲───────────────────────▲───────────────┘
                            │ REST/JSON + API keys   │ WebSocket (CRDT sync)  │ static web output
                            │                        │                        │
        ┌───────────────────┴──────────┐   ┌─────────┴──────────┐             │
        │  PHP app (class-first)        │   │ Python realtime svc │            │
        │  Auth, RBAC, users, groups,   │   │ (pycrdt/Yrs +       │            │
        │  quotas & priority, projects, │   │  websockets)        │            │
        │  sharing, API keys, webhooks, │   └─────────┬──────────┘            │
        │  job orchestration, REST API  │             │ snapshots             │
        └───────┬───────────────┬───────┘             │                       │
                │               │ enqueue (priority)  │                       │
                │               ▼                      ▼                       │
                │       ┌──────────────────────────────────────┐              │
                │       │ PostgreSQL 16                          │             │
                │       │ • app data (users, ACL, metadata…)     │             │
                │       │ • job queue (FOR UPDATE SKIP LOCKED,   │             │
                │       │   priority column, per-user limits)    │             │
                │       └───────────────▲──────────────────────┘              │
                │                        │ claim jobs                          │
                ▼                        │                                     ▼
        ┌───────────────────────────────┴─────────────────────────────────────────┐
        │  Python worker pool (CLI tools, nice/ionice, cached, content-addressed)   │
        │  • headless Chromium + Paged.js    → PRIMARY hi-print PDF (pixel-parity)    │
        │  • WeasyPrint (Python)             → lightweight/fallback PDF from same CSS  │
        │  • Tectonic (LaTeX)                → TeX-grade PDF for math/academic         │
        │  • Pandoc + LibreOffice headless   → DOCX (export & import)                  │
        │  • libvips/pyvips                  → hi-res original + low-res web proxy     │
        │  • pikepdf/pypdf                  → booklet imposition, poster tiling, n-up │
        │  • IfcOpenShell                    → BIM/IFC parse, element GUID linking      │
        │  • Aksharamukha/indic-translit/Varnam → Indian-language transliteration       │
        │  • Hunspell + LanguageTool (self-hosted) → spell & grammar                    │
        │  • Graphviz/Mermaid CLI            → smart-art/diagram rendering              │
        └───────────────────────────────────────────────────────────────────────────┘
                                            │ reads/writes
                                            ▼
        ┌───────────────────────────────────────────────────────────────────────────┐
        │  Project store on disk = ONE Git repo per project (the "file", sovereign)  │
        │  document.json (AST) │ *.tex (if LaTeX mode) │ assets/ (content-addressed)   │
        │  styles/ (CSS tokens + master pages) │ refs.bib / CSL │ bim/ links │ meta   │
        └───────────────────────────────────────────────────────────────────────────┘
```

## 4. The plugin core (thin core + first-party plugins)

DocWright is a **small, stable kernel plus composable plugins.** The kernel owns
only what must be central; every feature area is either a bundled first-party
provider or a plugin, registered through a typed extension-point registry.

The kernel lives in `app/src/Kernel/` and is semantically versioned
(`Kernel::API_VERSION`, currently `1.0.0`). Its stable surface:

| Kernel component | File | Role |
|---|---|---|
| `Kernel` | `Kernel.php` | Owns the container, event bus, hooks, extension registry, config, logger, router; boots providers with fault containment |
| `Container` | `Container.php` | PSR-11-style DI: bind / singleton / instance / alias / tag, plus reflection autowiring and a circular-dependency guard |
| `EventBus` | `EventBus.php` | Synchronous + deferred domain events (`document.saved`, `render.completed`, …) and global observers (audit/tracing) |
| `HookRegistry` | `HookRegistry.php` | WordPress-style **actions** (do-on-event) and **filters** (transform-a-value) so plugins deepen a subsystem without forking it |
| `ExtensionPoints` | `ExtensionPoints.php` | Typed registry of named extension points; every contribution is tagged with its plugin id so a failing/disabled plugin can be cleanly withdrawn |
| `ServiceProvider` | `ServiceProvider.php` | The `register()` / `boot()` unit every subsystem and plugin implements |
| Contracts | `Kernel/Contract/*.php` | Interfaces a contribution must satisfy: `RenderBackend`, `ExportFormat`, `AuthProvider`, `ApiModule`, `WorkflowPack` |
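
To make two of the rows above concrete, here is a small model of an extension-point registry that tags contributions by plugin id, and of a filter chain. The kernel itself is PHP; this Python sketch only illustrates the idea, and its class names, method names, and the `document.title` filter name are assumptions (only the `auth.providers` point name appears elsewhere in this document).

```python
from collections import defaultdict
from typing import Any, Callable


class ExtensionPoints:
    """Named extension points; each contribution remembers its plugin id."""

    def __init__(self) -> None:
        self._items: dict[str, list[tuple[str, Any]]] = defaultdict(list)

    def contribute(self, point: str, plugin_id: str, item: Any) -> None:
        self._items[point].append((plugin_id, item))

    def all(self, point: str) -> list[Any]:
        return [item for _, item in self._items[point]]

    def withdraw(self, plugin_id: str) -> None:
        """Drop everything a failing or disabled plugin contributed."""
        for point in self._items:
            self._items[point] = [
                entry for entry in self._items[point] if entry[0] != plugin_id
            ]


class Hooks:
    """Filters transform a value in priority order (lower runs first)."""

    def __init__(self) -> None:
        self._filters: dict[str, list[tuple[int, Callable[[Any], Any]]]] = defaultdict(list)

    def add_filter(self, name: str, fn: Callable[[Any], Any], priority: int = 10) -> None:
        self._filters[name].append((priority, fn))

    def apply_filters(self, name: str, value: Any) -> Any:
        # sorted() is stable, so equal priorities keep registration order.
        for _, fn in sorted(self._filters[name], key=lambda pair: pair[0]):
            value = fn(value)
        return value
```

Because every contribution carries its plugin id, disabling a plugin is a single `withdraw()` call rather than a search through each subsystem.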

`app/bootstrap.php` builds the kernel, registers the first-party providers, then
loads discovered plugins (`app/src/Plugins/PluginManager.php`) in dependency
order on top of the same core. The **core-vs-plugin boundary** and the full
extension-point catalogue are documented in [`PLUGINS.md`](PLUGINS.md).
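
Loading plugins "in dependency order" is a topological sort. The sketch below shows one way to compute such an order with Python's standard `graphlib`; it is not how `PluginManager.php` is written, and the plugin ids other than `example-hello` are invented for the example.

```python
from graphlib import CycleError, TopologicalSorter


def load_order(requires: dict[str, set[str]]) -> list[str]:
    """Return plugin ids so that each plugin follows the plugins it requires.

    `requires` maps a plugin id to the set of plugin ids it depends on.
    """
    try:
        return list(TopologicalSorter(requires).static_order())
    except CycleError as exc:
        # args[1] holds the list of nodes that form the detected cycle.
        raise ValueError(f"circular plugin dependency: {exc.args[1]}") from exc
```

For example, `load_order({"bim-viewer": {"ifc-core"}, "ifc-core": set(), "example-hello": set()})` places `ifc-core` before `bim-viewer`; the relative position of unrelated plugins is not guaranteed.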

First-party providers wired in `bootstrap.php` (each guarded by `class_exists`,
so a subsystem that is not yet built is simply skipped):

```
Support\CoreServiceProvider          Auth\AuthServiceProvider
Session\SessionServiceProvider       Rbac\RbacServiceProvider
Projects\ProjectServiceProvider      Jobs\JobServiceProvider
Workflow\WorkflowServiceProvider     Comments\CommentServiceProvider*
Sharing\SharingServiceProvider       Feedback\FeedbackServiceProvider
Api\ApiServiceProvider
```

\* declared in the bootstrap list but not yet present as a class
(`app/src/Comments/` is empty); see
[Implementation status](#implementation-status).

## 5. The sovereign store: one Git repo per project

Each project **is** a Git repository under `DOCWRIGHT_DATA_DIR` (default
`/opt/docwright/data`). The PostgreSQL row only *indexes* the on-disk bundle —
the bundle is the source of truth, portable, and readable without DocWright.

`app/src/Git/GitStore.php` wraps the `git` CLI (argv arrays, never a shell
string, so there is no shell-injection surface; libgit2 bindings could replace
it transparently). It provides `init`, `commitAll`, `tag`, `log`, `show`,
`diff`, `restore`, and `archiveZip`.
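
The argv-array point can be shown in a few lines. This is a Python sketch of the same idea, not a port of `GitStore.php`; the function names are assumptions.

```python
import subprocess


def git_argv(repo_path: str, *args: str) -> list[str]:
    """Build the argument vector; every value stays exactly one argv entry."""
    return ["git", "-C", repo_path, *args]


def run_git(repo_path: str, *args: str) -> str:
    """Run git without a shell and return stdout; raises on a non-zero exit."""
    result = subprocess.run(
        git_argv(repo_path, *args),
        check=True,            # surface git failures as exceptions
        capture_output=True,
        text=True,
    )
    return result.stdout
```

A label such as `v1"; rm -rf ~ #` is passed to git as one literal argument because no shell ever parses it. Values that begin with `-` can still be read as options, so user-supplied refs or paths would need their own guard.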

**Every save is a commit** (`ProjectService::saveDocument`): the AST is
validated against `spec/document.schema.json`, written to disk, committed
(autosave → `Autosave`; named version → `Version: <label>` plus an annotated
git tag), and indexed as a row in `versions`. Revision history, diff, restore,
and the export-zip all come from this one primitive. A **restore** checks the
tree out at a revision and commits it forward (history is never rewritten).
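
The commit-message rule for the two save kinds can be sketched as a function that returns the git argument lists for one save. This is an illustration only: the function name and the separate `tag` parameter are assumptions (a tag name must be a valid Git ref, so it cannot simply reuse a free-text label), and schema validation and the `versions` row are left out.

```python
from __future__ import annotations


def save_plan(kind: str, label: str | None = None,
              tag: str | None = None) -> list[list[str]]:
    """Return the git argument lists for one save, in execution order."""
    if kind == "autosave":
        message = "Autosave"
    elif kind == "named":
        if not label or not tag:
            raise ValueError("a named version needs a label and a tag name")
        message = f"Version: {label}"
    else:
        raise ValueError(f"unknown save kind: {kind}")

    plan = [["add", "--all"], ["commit", "-m", message]]
    if kind == "named":
        # `git tag -a` creates an annotated tag on the new commit.
        plan.append(["tag", "-a", tag, "-m", message])
    return plan
```

`save_plan("named", "First issue", "v1")` yields an add, a commit with the message `Version: First issue`, and an annotated tag `v1`.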

Canonical bundle layout (per spec §5):

```
<project>.git/
├── docwright.json        project manifest (title, type, mode, template, locale)
├── document.json         the canonical AST (spec/document.schema.json)
├── content/*.tex         present when LaTeX mode is used
├── styles/               theme.css (tokens) + master-pages.css (@page rules)
├── assets/               content-addressed originals + assets/proxies/*.webp
├── refs/                 library.bib / CSL styles (never secrets)
├── bim/links.json        document anchors → IFC GlobalId + cached metadata
└── .docwright/           git-ignored local caches (render output, hashes)
```

## 6. The Postgres priority job queue

Heavy work is dispatched through a **PostgreSQL-backed priority queue**. There
is no Redis, which keeps the footprint minimal and spares a stand-alone install
an extra service to run. The `jobs` table (migration `0004_jobs.sql`) is
consumed with `FOR UPDATE SKIP LOCKED`, ordered by `priority` then age, with:

- a `priority` column (lower = served first) fed by per-user and per-group tiers,
- `payload_hash` for **content-addressed caching** (identical renders reuse
  output),
- an `idempotency_key` unique index for safe API retries (`Idempotency-Key`),
- per-owner in-flight caps (`jobs_owner_inflight_idx`) to enforce fair use so no
  single user can monopolise the workers.
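
A claim step with these properties might look like the sketch below. The SQL uses only columns listed for `jobs` in this document, but the status values (`queued`, `running`), the use of `available_at` as the age tie-breaker, and the psycopg-style `%(worker)s` placeholder are assumptions. The `pick_next` function is an in-memory model of the same ordering plus a per-owner cap, so the behaviour can be checked without a database.

```python
from __future__ import annotations

from dataclasses import dataclass

# Hedged sketch of a claim query; SKIP LOCKED lets concurrent workers pass
# over rows another worker has already locked instead of waiting on them.
CLAIM_SQL = """
UPDATE jobs
   SET status = 'running', locked_by = %(worker)s, attempts = attempts + 1
 WHERE id = (
         SELECT id
           FROM jobs
          WHERE status = 'queued' AND available_at <= now()
          ORDER BY priority ASC, available_at ASC
          LIMIT 1
            FOR UPDATE SKIP LOCKED
       )
RETURNING id, type, payload;
"""


@dataclass
class Job:
    id: int
    owner_id: int
    priority: int        # lower is served first
    available_at: float  # stand-in for a timestamp


def pick_next(queued: list[Job], in_flight: dict[int, int], cap: int) -> Job | None:
    """Model the claim order: skip owners at their cap, then priority, then age."""
    eligible = [job for job in queued if in_flight.get(job.owner_id, 0) < cap]
    return min(eligible, key=lambda job: (job.priority, job.available_at), default=None)
```

With a cap of 2, an owner who already has two jobs running is skipped even when their next job has the best priority, which is the fair-use effect described above.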

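`app/src/Jobs/` contains the `JobQueue`, `JobHandler`, and `RenderService`
classes, with render backends under `Jobs/Backend/` and handlers under
`Jobs/Handler/`. [Implementation status](#implementation-status) lists which
handlers are live and which still wait on vendored binaries.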

## 7. Isolation model (both directions)

DocWright must not affect, and must not be affected by, anything else on the
host. This is achieved with OS-native mechanisms, never containers:

- a dedicated system user (`docwright`, no login);
- a **dedicated PostgreSQL database + role**, reached over the local unix socket
  with peer auth (passwordless DSN `pgsql:host=/var/run/postgresql;dbname=docwright`);
- a **dedicated PHP-FPM master** (a full standalone FPM config, not a pool
  dropped into the system php-fpm) on its own unix socket with bounded
  `pm.max_children` and resource limits;
- **namespaced systemd units** (`docwright-web`, `docwright-worker@`, and the
  planned `docwright-realtime`) on **configurable** ports; nothing hard-codes a
  well-known port;
- project-local dependencies only: a Python venv and bundled `vendor-bin/`
  binaries under the prefix; no global npm/pip installs at runtime;
- **no global config mutation** — `ops/` ships nginx / php-fpm / systemd
  templates that the installer renders into the prefix and enables; it never
  rewrites shared files.

See [`INSTALL.md`](INSTALL.md) for the concrete install flow.

---

## Data model (ERD overview)

All objects live in the app's own database (`migrations/0001`–`0010`). Grouped
by subsystem:

### Kernel / core (`0001_core.sql`)

```
plugins(id, version, enabled, capabilities, config)          — plugin registry (§4A)
audit_log(id, at, actor_id, actor_kind, action, object_*, ip)— immutable security log
secrets(id, owner_id → users, name, ciphertext)              — AEAD-encrypted creds
settings(key, value)                                         — instance-wide settings
```

### Identity, RBAC, sessions (`0002_identity.sql`)

```
users(id, username, email, password_hash[argon2id], totp_*, priority_tier, ...)
groups(id, name, priority_tier)          user_groups(user_id, group_id)
roles(id, name, is_system)               permissions(id, name, plugin_id)
role_permissions(role_id, permission_id)
user_roles(user_id, role_id)             group_roles(group_id, role_id)

sessions(id, user_id, idle_expires_at, absolute_expires_at, ip, ua, stepup_at, revoked)
refresh_tokens(id, user_id, token_hash, expires_at, used_at, revoked, parent_id)
quotas(subject_type, subject_id, max_concurrency, priority_tier, storage_mb, render_per_day)
usage_counters(subject_type, subject_id, day, renders, storage_mb, in_flight)
user_preferences(user_id, prefs)         — keymap overrides, format-painter defaults
```
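
The two expiry columns on `sessions` imply a sliding idle deadline bounded by a fixed absolute one. A minimal sketch of that check, assuming a session is represented as a dict keyed by the column names above (the function names and window lengths are illustrative):

```python
from datetime import datetime, timedelta


def session_is_valid(session: dict, now: datetime) -> bool:
    """A session is usable only if not revoked and inside both deadlines."""
    return (
        not session["revoked"]
        and now < session["idle_expires_at"]
        and now < session["absolute_expires_at"]
    )


def touch(session: dict, now: datetime, idle_window: timedelta) -> None:
    """Slide the idle deadline forward, but never past the absolute one."""
    session["idle_expires_at"] = min(
        now + idle_window, session["absolute_expires_at"]
    )
```

Activity extends the idle deadline on each request, while the absolute deadline forces a fresh login no matter how active the session has been.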

### Projects & versions (`0003_projects.sql`)

```
projects(id, slug, title, doc_type, edit_mode, template_id, repo_path,
         owner_id → users, locale, workflow_state, archived)
project_members(project_id, user_id, role)   — per-project role atop global RBAC
templates(id, name, doc_type, skeleton_path, builtin, created_by)
versions(id, project_id, git_ref, label, kind[autosave|named|signoff],
         author_id, signer_id, signoff_hash)  — indexes Git history
```

### Jobs (`0004_jobs.sql`)

```
jobs(id, type, status, priority, owner_id, project_id, payload, payload_hash,
     result, progress, attempts, idempotency_key, locked_by, available_at, ...)
job_logs(id, job_id, at, level, message)
```

### Sharing (`0005_sharing.sql`)

```
share_links(id, token, project_id, version_ref, scope[view|comment|edit],
            allow_anonymous, expires_at, max_uses, uses, revoked)
```
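
The `share_links` columns suggest a usability check along the following lines. This is a sketch under assumptions: it treats `expires_at` and `max_uses` as nullable (meaning "no limit"), orders the scopes as view < comment < edit, and models the scope/RBAC intersection as taking the weaker of the two; none of that is confirmed by the schema listing.

```python
from __future__ import annotations

from datetime import datetime

# Assumed ordering of the three scopes, weakest first.
SCOPE_RANK = {"view": 0, "comment": 1, "edit": 2}


def link_is_usable(link: dict, now: datetime) -> bool:
    """Reject revoked, expired, or used-up links."""
    if link["revoked"]:
        return False
    if link["expires_at"] is not None and now >= link["expires_at"]:
        return False
    if link["max_uses"] is not None and link["uses"] >= link["max_uses"]:
        return False
    return True


def effective_scope(link_scope: str, rbac_scope: str) -> str:
    """A link never grants more than the visitor's own permissions allow."""
    return min(link_scope, rbac_scope, key=SCOPE_RANK.__getitem__)
```

Under these assumptions an `edit` link opened by someone whose role allows only commenting behaves as a `comment` link.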

### Comments & RFI threads (`0006_comments.sql`)

```
comments(id, project_id, parent_id → comments, thread_root_id, anchor,
         author_id, body, kind[comment|markup|rfi|action_item],
         state[open|responded|resolved|closed],
         assignee_user_id, assignee_group_id, response_restricted, is_private, due_at)
comment_participants(comment_id, user_id)
```

### Workflow engine (`0007_workflow.sql`)

```
workflow_definitions(id, version, object_type, initial_state, definition[jsonb],
                     source[builtin|admin|plugin:<id>], active)   — states/transitions as data
workflow_instances(id, definition_id, object_type, object_id, current_state,
                   assignee_user_id, assignee_group_id, due_at)
workflow_transitions_log(id, instance_id, transition_id, from_state, to_state,
                         actor_id, reason, at)                    — immutable audit
```
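
"States/transitions as data" means the engine interprets a definition instead of hard-coding a state machine. The sketch below uses the Draft → In Review → Approved → Issued sequence mentioned in the status table; the JSON shape of `definition`, the transition ids, and the function name are assumptions, and the RBAC, assignee, and step-up gates are omitted.

```python
# Hypothetical shape for the jsonb `definition` column.
DEFINITION = {
    "initial_state": "draft",
    "transitions": [
        {"id": "submit", "from": "draft", "to": "in_review"},
        {"id": "approve", "from": "in_review", "to": "approved"},
        {"id": "issue", "from": "approved", "to": "issued"},
    ],
}


def apply_transition(definition: dict, instance: dict, transition_id: str,
                     actor_id: int, log: list) -> str:
    """Move the instance if the transition is legal from its current state."""
    for transition in definition["transitions"]:
        if (transition["id"] == transition_id
                and transition["from"] == instance["current_state"]):
            # Append-only record, mirroring workflow_transitions_log.
            log.append({
                "transition_id": transition_id,
                "from_state": transition["from"],
                "to_state": transition["to"],
                "actor_id": actor_id,
            })
            instance["current_state"] = transition["to"]
            return instance["current_state"]
    raise ValueError(
        f"{transition_id!r} is not allowed from {instance['current_state']!r}"
    )
```

Changing a workflow then means inserting a new definition row, not shipping new code.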

### API keys & webhooks (`0008_api.sql`)

```
api_keys(id, owner_id, name, prefix, key_hash, scopes, rate_limit_per_min,
         ip_allowlist, expires_at, last_used_at, revoked)
webhooks(id, owner_id, url, secret[HMAC], event_types, active)
webhook_deliveries(id, webhook_id, event_type, payload, status, attempts,
                   next_attempt_at, response_code)
```
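
Storing a `prefix` plus a `key_hash`, and giving each webhook an HMAC secret, points at a pattern like the one below. It is a sketch, not the project's implementation: the `dw_` key format, SHA-256 as the hash, and hex-encoded HMAC-SHA256 signatures are all assumptions.

```python
import hashlib
import hmac
import secrets


def new_api_key() -> tuple[str, str, str]:
    """Return (full_key, prefix, key_hash); only prefix and hash are stored."""
    prefix = secrets.token_hex(4)          # short, non-secret lookup handle
    secret = secrets.token_urlsafe(32)     # shown to the user exactly once
    full_key = f"dw_{prefix}_{secret}"
    key_hash = hashlib.sha256(full_key.encode("utf-8")).hexdigest()
    return full_key, prefix, key_hash


def verify_api_key(presented: str, stored_hash: str) -> bool:
    """Hash the presented key and compare in constant time."""
    digest = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(digest, stored_hash)


def sign_webhook(secret: bytes, body: bytes) -> str:
    """Signature a receiver can recompute over the raw request body."""
    return hmac.new(secret, body, hashlib.sha256).hexdigest()
```

The database never holds a usable key, and a webhook receiver that knows the shared secret can reject any delivery whose body does not match its signature.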

### Feedback & suggestions (`0009_feedback.sql`)

```
feedback(id, kind, message, context, submitter_id)
suggestions(id, title, body, status[new|triaged|planned|in_progress|done|declined],
            priority, tags, votes, submitter_id)
suggestion_votes(suggestion_id, user_id)
suggestion_activity(id, suggestion_id, actor_id, action, detail, at)
```

### Bootstrap data (`0010_bootstrap_data.sql`)

Idempotent seed of the five system roles (`admin`, `manager`, `editor`,
`reviewer`, `viewer`), the RBAC permission catalogue and role→permission grants
(deny-by-default; every capability is an explicit grant), the three built-in
workflow definitions (`document-control`, `comment-rfi`, `suggestion`), and the
six built-in template records.
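
Deny-by-default reduces to a membership test: a request is allowed only if some role the user holds carries an explicit grant. In this sketch the role names come from the seed list above, while the permission names and the function name are invented for illustration.

```python
# role -> granted permissions; permission names are hypothetical.
GRANTS: dict[str, set[str]] = {
    "viewer": {"project.read"},
    "editor": {"project.read", "project.write"},
}


def is_allowed(roles: set[str], permission: str,
               grants: dict[str, set[str]] = GRANTS) -> bool:
    """Allow only on an explicit grant; unknown roles and permissions deny."""
    return any(permission in grants.get(role, set()) for role in roles)
```

`roles` would be the union of a user's direct roles (`user_roles`) and those inherited through groups (`group_roles`); an empty set allows nothing.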

---

## Implementation status

Honest snapshot of what exists in code today, mapped to the spec's milestones
(§20). The database schema (all migrations) and the extension-point catalogue
are complete and ahead of the PHP wiring; several subsystems have their tables
and extension seams defined but their providers/workers not yet built.

| Area | Milestone | Status | Evidence in repo |
|---|---|---|---|
| Plugin kernel (DI, events, hooks, extension points) | M0 | **Implemented & live** | `app/src/Kernel/*`, `PluginManager`, `example-hello` |
| Install / ops / doctor / migrations | M0 | **Implemented & live** | `install.sh`, `uninstall.sh`, `Makefile`, `ops/`, `Console/Doctor.php`, `migrations/*` |
| Example plugin proving every seam | M0 | **Implemented** | `plugins/example-hello/` (php + js + manifest + migrations dir) |
| Identity / auth (password, Argon2id, TOTP) | M1 | **Implemented & live** | `app/src/Auth/*`, `PasswordAuthProvider` on `auth.providers` |
| Sessions (Postgres store, idle/absolute timeouts, step-up, log-out-everywhere) | M1 | **Implemented & live** | `app/src/Session/*`, `sessions`/`refresh_tokens` tables |
| RBAC + quotas/priority tiers | M1 | **Implemented & live** | `app/src/Rbac/*`, `0002`/`0010` |
| Projects, sovereign Git store, save-as-commit, history, restore, export-zip | M2 | **Implemented & live** | `app/src/Projects/*`, `app/src/Git/GitStore.php` |
| Baseline editor + paged preview + command palette/shortcuts/format painter | M2/M3 | **Implemented (baseline)** | `web/js/editor.js`, `web/views/app/*`; dependency-free. The rich editor libraries (ProseMirror/TipTap, CodeMirror, MathLive) are not yet vendored into `web/js/vendor/` |
| Document AST + JSON Schema | M2 | **Implemented** | `spec/document.schema.json` |
| Priority job queue + worker pool | M4 | **Implemented & live** | `app/src/Jobs/*`, `JobQueue` (FOR UPDATE SKIP LOCKED, per-user concurrency, content-addressed cache), `docwright-worker@` units running |
| Render: web/static-HTML export | M4 | **Implemented & live** | `Jobs/Handler/WebExportHandler` — zero extra binaries |
| Render: PDF (Chromium primary / WeasyPrint fallback), DOCX (Pandoc) | M4/M7 | **Implemented; binaries pending** | `Jobs/Backend/*`, `Jobs/Handler/{Pdf,Docx}*`; the handlers report the missing binaries cleanly until `bootstrap-vendor.sh` vendors Chromium/WeasyPrint |
| Workflow engine | M7A | **Implemented & live** | `app/src/Workflow/*`; RBAC/assignee/step-up gated + audited; drives Draft→In Review→Approved→Issued |
| API platform (OpenAPI 3.1, scoped keys, async jobs, AST/operations API for AI) | M9 | **Implemented & live** | `app/src/Api/*`, `app/src/ApiKeys/*`, `/api/openapi.json`, `/api/docs`, JSON-Patch ops w/ ETag/If-Match |
| Sharing (links, QR, expiry, scope∩RBAC) | M9 | **Implemented & live** | `app/src/Sharing/*`, `/s/{token}`, client QR via vendored `web/js/vendor/qrcode.js` |
| Feedback & suggestion tracker | M9/16A | **Implemented & live** | `app/src/Feedback/*`, kanban board, upvote, RBAC triage |
| Collaboration / realtime CRDT | M8 | **Planned** | `realtime/` empty; `AsyncAPI` seam only |
| Comments / directed RFI threads | M8 | **Schema only** | `0006`; `app/src/Comments/` empty (workflow `comment-rfi` defined) |
| Rich editor libs (ProseMirror/CodeMirror/MathLive/Paged.js) | M3/M6 | **Baseline live; libs pending** | `web/js/editor.js` dependency-free baseline; the heavy libs are to be vendored at install time |
| Webhooks (HMAC-signed) / AsyncAPI | M9 | **Schema + event bus** | `0008`, `webhook.events` extension point + events fired; delivery worker pending |
| Indian languages / spell / grammar | M10 | **Planned** | `translit.schemes` / `spellgrammar.engines` extension points; `workers/translit`, `workers/spellgrammar` empty |
| BIM / IFC | M11 | **Planned** | `bim.adapters` extension point; `bim/links.json` bundle slot; `workers/bim` empty |

**Live deployment:** M0–M2 plus the job queue/render pipeline (web export), the
API platform (incl. the AI operations API), the workflow engine, sharing, and
the feedback tracker are deployed and running at <https://docwright.jsp.net.in>.

Where a subsystem is "schema only", the tables, seed data, and kernel extension
points are in place so the feature can be delivered as a first-party provider or
plugin without a core change — that is the whole point of the thin-core design.
