# petersweb-infra/nixos — CLAUDE.md

## What this repo is

NixOS configuration for a single Hetzner server ("mainframe") running Philip Peterson's personal/Quine Foundation infrastructure. One machine, one flake configuration: `nixosConfigurations.mainframe`.

## Applying changes

```bash
./apply.sh   # git pull + nixos-rebuild switch --flake .#mainframe
# or manually:
nixos-rebuild switch --flake /root/petersweb-infra/nixos#mainframe
```

## File layout

| Path | Purpose |
|---|---|
| `flake.nix` | Single flake, defines `nixosConfigurations.mainframe` |
| `hetzner.nix` | Hardware config: GRUB on `/dev/sda`, static networking, openssh |
| `linux.nix` | Main system config: services, secrets, docker containers, ACME certs |
| `nginx.nix` | Nginx virtual hosts and reverse proxies |
| `firewall.nix` | Open TCP ports |
| `disk-config.nix` | disko disk layout |
| `cloned_repos/` | `pullomatic` configs for auto-pulling git repos to `/etc/pullomatic/` |
| `arion/` | Arion (docker-compose-like) for Forgejo |
| `arion-riverside/` | Arion for the Riverside service |
| `pullomatic/` | Rust tool that watches git remotes and pulls on a schedule |
| `invoke-ddns/` | Python DDNS updater for NearlyFreeSpeech DNS |
| `secrets/` | agenix-encrypted secrets |
| `keys/` | SSH public keys used as age recipients |
| `system/` | User definitions and home-manager config |
| `pdxdestiny/` | Static site files for pdxdestiny.com |
| `vnc-desktop/` | Dockerfile + build scripts for the KDE Plasma VNC desktop container |

## Secrets (agenix)

Secrets live in `secrets/*.age`. They are encrypted with the key in `keys/mainframe.pub` (which is identical to `/root/.ssh/id_rsa_nix.pub` on the server).

**Important:** Agenix uses three identity paths for decryption (see activation script):

1. `/etc/ssh/ssh_host_rsa_key`
2. `/etc/ssh/ssh_host_ed25519_key`
3. `/root/.ssh/id_rsa_nix` ← **this is the actual working key**

The decrypted secrets land at `/run/agenix/<name>` at boot.

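
To confirm which of the identities above actually decrypts a given secret, a small loop can try them in order. This is a minimal sketch, not what agenix itself runs: `find_working_identity` is a hypothetical helper, and it assumes `age` is on `PATH` (for example inside `nix shell nixpkgs#age`).

```bash
# Hypothetical helper: print the first identity file that decrypts the secret.
# Usage: find_working_identity <secret.age> <identity>...
find_working_identity() {
  secret_file=$1
  shift
  for identity in "$@"; do
    # Skip identities that do not exist or are unreadable on this machine.
    [ -r "$identity" ] || continue
    if age -d -i "$identity" "$secret_file" >/dev/null 2>&1; then
      printf '%s\n' "$identity"
      return 0
    fi
  done
  # No identity could decrypt the file.
  return 1
}
```

Called with a secret and the three paths listed above, it would be expected to print `/root/.ssh/id_rsa_nix` on the server, assuming the note about the working key still holds.
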
### Secret format matters

The NixOS `gitea-actions-runner` module reads the token via `EnvironmentFile=`, so the secret file must be in `KEY=VALUE` format:

- `forgejo-runner-token.age` → must contain `TOKEN=<raw_token>` (not just the raw token)
- `nearlyfreespeech.age` → contains `NEARLYFREESPEECH_API_KEY=...` and `NEARLYFREESPEECH_LOGIN=...`
- `webdav.age` → contains `WEBDAV_PASSWORD=...`
- `anthropic-api-key.age` → contains `ANTHROPIC_API_KEY=...`
- `postmark.age` → contains `POSTMARK_SERVER_TOKEN=...`

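
Because a raw token in place of `TOKEN=<raw_token>` is easy to miss, it can help to check the plaintext before encrypting it. This is a minimal sketch under a simplified rule (every line that is not blank or a `#` comment must start with `NAME=`); systemd's real `EnvironmentFile=` parser accepts more than this, and `is_env_file` is a hypothetical helper name.

```bash
# Hypothetical helper: succeed only if stdin looks like KEY=VALUE lines.
# Blank lines and lines starting with '#' are ignored.
is_env_file() {
  # grep -v selects lines that are NOT blank, comments, or NAME=value.
  # If any such line exists, the input is rejected.
  if grep -Ev '^[[:space:]]*(#.*)?$|^[A-Za-z_][A-Za-z0-9_]*=' >/dev/null; then
    return 1
  fi
  return 0
}
```

For example, `printf 'TOKEN=newvalue\n' | is_env_file` succeeds, while piping in a bare token fails.
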
### Re-encrypting a secret

```bash
# Encrypt new content for the mainframe key
printf "TOKEN=newvalue\n" | nix run nixpkgs#age -- \
  -r "$(cat /root/petersweb-infra/nixos/keys/mainframe.pub)" \
  -o /root/petersweb-infra/nixos/secrets/forgejo-runner-token.age

# Verify it decrypts correctly
nix run nixpkgs#age -- -d -i /root/.ssh/id_rsa_nix \
  /root/petersweb-infra/nixos/secrets/forgejo-runner-token.age
```

Note: `secrets/default.nix` is the agenix recipients file. Agenix looks for `secrets.nix` by default, so using the agenix CLI with this repo's `default.nix` would require a symlink or passing the path manually. Use `age` directly instead (as above).

## Key services

| Service | Description |
|---|---|
| `gitea-runner-ubuntu.service` | Forgejo (Gitea) Actions CI runner, uses docker images |
| `forgejo-arion.service` | Forgejo itself, run via Arion/Podman |
| `riverside-arion.service` | Riverside app, run via Arion/Docker |
| `podman-navidrome.service` | Navidrome music server on port 4533 |
| `podman-nextcloud.service` | Nextcloud/SSH container on port 8087 |
| `podman-sync.io.service` | sync.io app on port 9090 |
| `podman-blog-quine.service` | Blog on port 3010 |
| `podman-coldairnetworks.service` | Cold Air Networks site on port 3012 |
| `podman-vnc-desktop.service` | KDE Plasma desktop, noVNC on port 6080 (localhost only) |
| `build-vnc-image.service` | Builds the VNC desktop image from `vnc-desktop/`; runs before `podman-vnc-desktop` |
| nginx | Reverse proxy + ACME certs for multiple domains |

## Virtualisation

- **Podman** is used for all OCI containers (`virtualisation.oci-containers.backend = "podman"`), including navidrome, nextcloud, blog, and the VNC desktop, and for Forgejo via Arion.
- **Docker** is still present for the Riverside Arion stack.
- `DOCKER_HOST` for the gitea-runner is set to `unix:///run/podman/podman.sock`.
- The gitea-runner runs docker images for CI jobs, so the `gitea-runner` user is in the `docker` and `podman` supplementary groups.

## VNC desktop

`podman-vnc-desktop.service` runs a KDE Plasma desktop inside a container, accessible via noVNC at `localhost:6080` (reverse-proxied by nginx). The image is built locally; no registry is involved.

- **Image source**: `vnc-desktop/Dockerfile` (Ubuntu 24.04, TigerVNC, KDE, Firefox, patched Discover)
- **Auto-rebuild**: `build-vnc-image.service` runs on boot and on `nixos-rebuild switch` whenever `vnc-desktop/` changes. The trigger is `vncContext = builtins.path { path = ./vnc-desktop; }`, a Nix store path that changes whenever any file in the directory changes.
- **Auto-restart**: `podman-vnc-desktop.service` has `restartTriggers = [ vncContext ]`, so the container restarts automatically after a rebuild during `nixos-rebuild switch`.
- **Secrets**: `VNC_PASSWORD` and `ROOT_PASSWORD` come from `age.secrets.vnc-password`.
- **Discover logging**: `vnc-desktop/discover-logging/` contains a build-time patch (`patch.py`) that instruments `PKTransaction.cpp` with `qWarning` calls to diagnose hanging installs. Logs are visible via `podman logs vnc-desktop`.

## Networking / DNS

- Dynamic DNS via `invoke-ddns` (NearlyFreeSpeech provider).
- ACME certs issued via DNS challenge for `philippeterson.com` and `webdav.philippeterson.com`.
- Forgejo accessible on ports 3000 (HTTP) and 2200 (SSH).

## OpenClaw

OpenClaw runs as two Arion/Podman containers defined in `arion-openclaw/arion-compose.nix`, both using `network_mode = "host"` so they share the host's `127.0.0.1`.

| Container | Image | Port | Role |
|---|---|---|---|
| `openclaw-gateway` | `node:22-alpine` | 18789 (WebSocket) | OpenClaw Gateway (`openclaw@latest`) |
| `openclaw` | `node:22-alpine` | 4310 (HTTP) | OpenClaw Control Center (SSR UI) |

### Volumes and paths

| Host path | Container path | Notes |
|---|---|---|
| `/var/openclaw/gateway` | `/app` in the gateway container, `/gateway` in the app container | npm install location for the `openclaw` package |
| `/var/openclaw/app` | `/app` in the app container | Control center git clone + runtime files |
| `/root/.openclaw` | `/root/.openclaw` | OpenClaw home; shared **read-write** by both containers |

`/root/.openclaw` must be **writable** in the app container (not `:ro`): the CLI writes state files at startup, and connection probes fail with `EROFS` otherwise.

The CLI's effective state dir is `/root/.openclaw/.openclaw/` (double-nested: the CLI treats `OPENCLAW_HOME` as HOME and appends `.openclaw/` internally).

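
The double nesting is easy to get wrong when looking for state files on the host. Below is a minimal sketch of the path arithmetic described above. `openclaw_state_dir` is a hypothetical helper, and the fallback to `$HOME` when `OPENCLAW_HOME` is unset is an assumption rather than confirmed CLI behavior.

```bash
# Hypothetical helper: print the CLI's effective state directory.
# Usage: openclaw_state_dir [home-dir]   (defaults to $OPENCLAW_HOME, then $HOME)
openclaw_state_dir() {
  home_dir=${1:-${OPENCLAW_HOME:-$HOME}}
  # Drop one trailing slash, then append the directory the CLI adds itself.
  printf '%s/.openclaw\n' "${home_dir%/}"
}
```

With `OPENCLAW_HOME=/root/.openclaw`, this yields the nested `/root/.openclaw/.openclaw` path, which is where subdirectories such as `identity/` live.
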
### Auth and connectivity

- Gateway runs with `--auth none --dev`. In `--auth none` mode, clients must still present either a device identity (challenge-response) or any token via `OPENCLAW_GATEWAY_TOKEN`.
- `OPENCLAW_GATEWAY_TOKEN=openclaw-local-dev` is set in the app container; this lets the CLI probes connect immediately without waiting for device auto-approval.
- Device identity lives at `/root/.openclaw/.openclaw/identity/device.json`. In `--dev` mode the gateway auto-approves the local device after first contact.
- The control center calls `openclaw status --json` and `openclaw gateway status --json` as CLI subprocesses (not via WebSocket directly). The binary path is set via `OPENCLAW_BIN_PATH=/gateway/node_modules/.bin/openclaw`.

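
The probes the control center runs can be reproduced by hand inside the app container, which helps separate gateway problems from UI problems. This is a minimal sketch using only the variables and subcommands listed above. `probe_gateway` is a hypothetical wrapper, and its defaults mirror the values set for the app container.

```bash
# Hypothetical wrapper: run the same two CLI probes the control center uses.
probe_gateway() {
  bin=${OPENCLAW_BIN_PATH:-/gateway/node_modules/.bin/openclaw}
  # Reuse an existing token if one is set; otherwise fall back to the
  # value configured for the app container.
  OPENCLAW_GATEWAY_TOKEN=${OPENCLAW_GATEWAY_TOKEN:-openclaw-local-dev}
  export OPENCLAW_GATEWAY_TOKEN
  # The second probe only runs if the first one succeeds.
  "$bin" status --json && "$bin" gateway status --json
}
```

A non-zero exit status from `probe_gateway` points at the CLI or gateway rather than at the control center UI.
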
### nginx

`claw.quineglobal.com` is proxied to `127.0.0.1:4310`. Key settings:

- `forceSSL = false; addSSL = true`: Cloudflare Flexible SSL sends plain HTTP to origin; `forceSSL = true` would create a redirect loop.
- `basicAuthFile = "/var/openclaw/htpasswd"` (credentials: `ironmagma / Nargism333`).
- WebSocket upgrade headers are set (`Upgrade`, `Connection: upgrade`) so the control center's live-update SSE works through the proxy.

### Control center startup sequence

The app container startup script (in `arion-compose.nix`):

1. `apk add git`
2. Clones `https://github.com/TianyiDataScience/openclaw-control-center.git` to `/app/repo` (once)
3. Patches `src/ui/server.ts` and `src/runtime/ui-preferences.ts` via `sed` to default language to `"en"` instead of `"zh"`
4. `npm install && npm run build && npm run dev:ui`

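
The sequence above can be sketched as a shell script. This illustrates the order of operations and is not the script from `arion-compose.nix`: the function names are hypothetical, and the `sed` expression is an assumed stand-in for the real patch, whose exact pattern is not reproduced here.

```bash
REPO_URL=https://github.com/TianyiDataScience/openclaw-control-center.git
REPO_DIR=${REPO_DIR:-/app/repo}

# Step 2: clone only when the checkout is missing, so restarts reuse it.
clone_once() {
  [ -d "$REPO_DIR/.git" ] || git clone "$REPO_URL" "$REPO_DIR"
}

# Step 3 (assumed pattern): swap a quoted "zh" for "en" in each given file.
patch_default_language() {
  for file in "$@"; do
    sed -i 's/"zh"/"en"/g' "$file"
  done
}

# Steps 1-4 chained so that any failure stops the sequence.
start_control_center() {
  apk add git &&
  clone_once &&
  cd "$REPO_DIR" &&
  patch_default_language src/ui/server.ts src/runtime/ui-preferences.ts &&
  npm install && npm run build && npm run dev:ui
}
```

Guarding the clone on the presence of `.git` is one way to get the "once" behavior; the real script may use a different check.
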
### Usage connector sources

The Settings → Usage panel tracks 6 data sources. Current status:

| Source | Status | How to connect |
|---|---|---|
| Context capacity | Connected | `runtime/model-context-catalog.json` exists at `/var/openclaw/app/repo/runtime/` |
| Provider attribution | Connected | Derived from context catalog |
| Digest history | Partial (auto) | Builds up as the monitor runs over time |
| Request counts | Not connected | Needs real AI requests through the gateway |
| Budget limit | Not connected | Add cost thresholds to agent config |
| Subscription usage | Not connected | Add `runtime/subscription-snapshot.json` or provider billing snapshot |

The `model-context-catalog.json` format:

```json
{ "models": [{ "match": "gpt-5.5", "contextWindowTokens": 200000, "provider": "openai" }, ...] }
```

`match` is compared case-insensitively against the model name reported by the runtime.

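
The matching rule can be illustrated with a small shell check. This is a minimal sketch that assumes `match` must equal the whole model name; the control center may instead match substrings, since only case-insensitivity is documented here. `lower` and `model_matches` are hypothetical helpers.

```bash
# Hypothetical helper: lowercase a string.
lower() {
  printf '%s' "$1" | tr '[:upper:]' '[:lower:]'
}

# Succeed when the catalog `match` value equals the model name, ignoring case.
# Usage: model_matches <match> <model-name>
model_matches() {
  [ "$(lower "$1")" = "$(lower "$2")" ]
}
```

Under this assumption, `model_matches gpt-5.5 GPT-5.5` succeeds, so a catalog entry does not need to reproduce the runtime's capitalization.
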
### Restarting / rebuilding

After changing `arion-compose.nix`, a `nixos-rebuild switch` regenerates the compose YAML but **does not recreate running containers**. You must force recreation:

```bash
podman rm -f openclaw   # or openclaw-gateway
systemctl restart arion-openclaw
```

### Cloudflare SSL gotcha

This server sits behind Cloudflare in **Flexible** mode (Cloudflare → origin over plain HTTP). Any `nginx.nix` virtualHost for a Cloudflare-proxied domain must use `forceSSL = false; addSSL = true`, not `forceSSL = true`. The latter causes an infinite redirect loop: Cloudflare sends HTTP, nginx redirects to HTTPS, and Cloudflare fetches the HTTPS URL from the origin over plain HTTP again.

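
A misconfigured vhost can be spotted from the origin's response to a plain-HTTP request, which is what Cloudflare sends in Flexible mode. This is a minimal sketch: `redirects_to_https` is a hypothetical helper that reads response headers on stdin and succeeds when they include a `Location:` header pointing at an `https://` URL.

```bash
# Hypothetical helper: detect an HTTP-to-HTTPS redirect in response headers.
redirects_to_https() {
  # Header names are case-insensitive, hence -i; -q suppresses output.
  grep -Eiq '^location:[[:space:]]*https://'
}
```

On the server, piping `curl -sI -H 'Host: claw.quineglobal.com' http://127.0.0.1/` into `redirects_to_https` should fail for a correctly configured vhost; success means the domain would loop behind Cloudflare.
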
## Known gotchas

- `gitea-runner` is a `DynamicUser` in the systemd service, so it has no persistent uid. Setting `age.secrets.forgejo-runner-token.owner = "gitea-runner"` causes a chown error at activation; use `owner = "root"` instead (systemd reads the `EnvironmentFile` as root before dropping privileges for the service).
- `secrets/default.nix` must have the public key from `keys/mainframe.pub` as the recipient; if the host SSH keys change, you must also update `mainframe.pub` and re-key all secrets.
- `pullomatic` uses `/root/.ssh/id_rsa.pem` (a PEM-format SSH key) to pull private git repos.
- **ACME cyclic dependency list**: `linux.nix` has a `systemd.services.nginx.after = lib.mkForce [...]` list that breaks a systemd cycle between nginx and ACME services. Every new domain added with `enableACME = true` in `nginx.nix` **must** also have its `acme-selfsigned-<domain>.service` added to this list in `linux.nix`, otherwise nixos-rebuild will fail with a cyclic dependency error.

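
Checking the ACME list by hand is error-prone, so a quick lookup can catch a missing entry before a rebuild. This is a minimal sketch that assumes the unit names appear literally as strings in `linux.nix` and that the domains are supplied by hand. `missing_acme_units` is a hypothetical helper that prints each `acme-selfsigned-<domain>.service` name not found in the given file.

```bash
# Hypothetical helper: list ACME self-signed units absent from a config file.
# Usage: missing_acme_units <linux.nix> <domain>...
missing_acme_units() {
  config_file=$1
  shift
  for domain in "$@"; do
    unit="acme-selfsigned-$domain.service"
    # -F treats the unit name as a fixed string, so dots are not wildcards.
    grep -Fq "$unit" "$config_file" || printf '%s\n' "$unit"
  done
}
```

Empty output means every listed domain already has its unit in the file.
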