Compare commits

..

1 Commits

Author SHA1 Message Date
Andras Bacsai c45a0b35ec feat(service): wire format and show-sensitive flags to get and list commands 2026-04-16 11:30:44 +02:00
85 changed files with 118 additions and 11535 deletions
-8
@@ -42,14 +42,6 @@ linters:
    exhaustive:
      default-signifies-exhaustive: true
    revive:
      rules:
        - name: var-naming
          arguments:
            - []
            - []
            - - skipPackageNameChecks: true
    staticcheck:
      checks: ["all", "-ST1005", "-S1016"]
-326
@@ -174,332 +174,6 @@ type Resource struct {
- UUIDs are more secure (don't expose database sequencing)
- Coolify API uses UUIDs as the primary resource identifier
## `coolify init` — WireGuard mesh + Podman bootstrap (alpha, v5)
**This subcommand is an outlier**: it does NOT talk to the Coolify API. It SSHes into remote hosts and installs/configures WireGuard, Podman, the bridge network, and a firewall scaffold. It's the fleet-provisioning command tree consumed by the v5 control plane (coold), split into three intent-scoped subcommands — `bootstrap`, `extend`, `upgrade` — plus a read-only `plan`. Coolify's backend calls `extend` when the operator adds a server and `upgrade` when agent versions move; direct-CLI operators run `bootstrap` for the initial install.
### What it does
- Establishes a full-mesh WireGuard overlay across N hosts.
- Each host gets a mgmt IP `/32` from `--wg-mgmt-pool` (default `100.64.0.0/16`, RFC 6598 CGNAT) on `wg0`.
- For every namespace (see **Namespaces** below; default: just `default`), each host gets a container subnet `/<container-prefix>` carved from the shared `--container-pool` (default `10.210.0.0/16`, default prefix `/24`). Each namespace is owned by its own Podman bridge named `coolify-<namespace>-mesh` (default → `coolify-default-mesh`).
- Installs Podman + enables `podman.socket` + creates every namespace bridge + installs `coolify-mesh-fw.service` (always; required for v5 runtime).
- Downloads and installs coold + corrosion (v5 control-plane agents; always) from GitHub releases on each remote host. Release tag controlled by `--coold-version` / `--corrosion-version` (default `nightly`). coold receives the full namespace list via `COOLD_NAMESPACES=<ns>:<network>:<gateway-ip>,...` so it can bind DNS and track rules per namespace.
- Installs default-deny firewall scaffold by default — host-global `COOLIFY-INTRA` + empty `COOLIFY-ALLOW` chains, with FORWARD jumps for every namespace subnet. Use `--skip-default-deny` to fall back to blanket-allow (mode A) for testing.
### Architecture (why this layout)
The mgmt pool and container pool are **separate** so the Podman bridge can own the full container `/24` without conflicting with `wg0`. Pattern adopted from uncloud (psviderski/uncloud).
WG config per host (e.g. host A with two namespaces `default` + `alpha`):
```
[Interface]
Address = 100.64.0.1/32 # mgmt IP, NOT in container pool
ListenPort = 51820
PrivateKey = <gen on host>
[Peer] # one per other host
PublicKey = <peer pubkey>
AllowedIPs = 100.64.0.2/32, 10.210.1.0/24, 10.220.1.0/24 # mgmt + every namespace subnet
Endpoint = <peer SSH ip>:51820
```
Critical: `AllowedIPs` lists the peer's full per-namespace `/24`s so the kernel routes each namespace subnet via `wg0`. Namespace order is deterministic (sorted) so `wg0.conf` is stable across re-runs.
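The deterministic ordering described above can be sketched in Go as follows. This is a minimal illustration, not the code in `internal/wireguard/config.go`: the `peer` type, the `allowedIPs` function, and the sample subnets are all hypothetical.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// peer is a hypothetical stand-in for one remote host in the mesh.
type peer struct {
	MgmtIP  string            // e.g. "100.64.0.2"
	Subnets map[string]string // namespace -> container subnet CIDR
}

// allowedIPs returns the mgmt /32 followed by every namespace subnet,
// ordered by namespace name so the rendered line is stable across runs.
func allowedIPs(p peer) string {
	names := make([]string, 0, len(p.Subnets))
	for ns := range p.Subnets {
		names = append(names, ns)
	}
	sort.Strings(names) // map iteration order is random; sorting fixes it

	parts := []string{p.MgmtIP + "/32"}
	for _, ns := range names {
		parts = append(parts, p.Subnets[ns])
	}
	return strings.Join(parts, ", ")
}

func main() {
	// Illustrative values only; real subnets come from the allocator.
	p := peer{
		MgmtIP:  "100.64.0.2",
		Subnets: map[string]string{"default": "10.210.1.0/24", "alpha": "10.210.5.0/24"},
	}
	fmt.Println("AllowedIPs = " + allowedIPs(p))
}
```

Sorting before rendering is what keeps a re-run from producing a textually different `wg0.conf` when nothing has changed.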
Every namespace bridge `coolify-<ns>-mesh` is created with `--disable-dns --label io.coolify.managed=true --label io.coolify.namespace=<ns>` — the bridge gateway `:53` is reserved for coold's embedded cluster DNS (see CONTROL_PLANE.md §5). Pre-alpha networks with `dns_enabled=true` are detected on re-run and recreated.
Firewall service (`coolify-mesh-fw.service`) installed unconditionally and stays host-global:
- POSTROUTING `RETURN` rule per namespace subnet prevents Podman MASQUERADE from rewriting container egress source on `wg0`.
- Mode A (`--skip-default-deny`): blanket FORWARD ACCEPT for every namespace subnet.
- Mode B (default): `COOLIFY-INTRA` chain (ESTABLISHED accept → `COOLIFY-ALLOW` → DROP), FORWARD jumps for `-s/-d <ns-subnet>` per namespace. v5 control plane (coold) fills `COOLIFY-ALLOW`.
### Cross-host vs intra-host firewall
- **Cross-host default-deny WORKS** — those packets cross interfaces (wg0 ↔ bridge) and traverse iptables FORWARD. Empirically verified.
- **Intra-host (same bridge) is NOT enforced** — Linux + netavark + Ubuntu 24.04 quirk: bridge L2 traffic bypasses iptables FORWARD even with `bridge-nf-call-iptables=1`. v5 control plane handles intra-host isolation via per-app podman networks (`--opt isolate=true`), not iptables.
### Subcommands
Three intent-scoped subcommands. Each runs the same probe → plan → filter → apply → verify pipeline; what differs is the filter applied to the action list. The filter lives in `internal/wireguard/intent.go` (`ValidateIntent` + `filterByIntent`). Suppressed actions surface on `plan.Skipped` so the preview shows operators what would have fired and why.
```bash
coolify init plan --servers IP1,IP2,IP3 --ssh-key KEY [--intent bootstrap|extend|upgrade]
coolify init bootstrap --servers IP1,IP2,IP3 --ssh-key KEY [--yes]
coolify init extend --servers IP1,IP2,IP3,IP4 --new-hosts IP4 --ssh-key KEY [--allow-replace]
coolify init upgrade --servers IP1,IP2,IP3 --ssh-key KEY --coold-version v1.7.0 [--allow-nightly]
```
- `plan` is read-only: probes, reconstructs, shows what the selected intent would execute. Default intent is `bootstrap` (broadest preview).
- `bootstrap` is the first-time install — every applicable action on every host. Keeps the interactive alpha gate (unless `--yes`, `COOLIFY_NON_INTERACTIVE=1`, or non-TTY). 2-phase parallel: phase 1 = install + keygen + podman + socket + IP forward. Re-probe. Phase 2 = write WG config + enable/reload service + create podman networks + install firewall + install coold/corrosion (+ scheduler on `--central` + builder on `--builder-hosts`).
- `extend` adds the hosts listed in `--new-hosts` (required subset of `--servers`) to an existing mesh. Brand-new hosts get the full first-time install. Existing hosts get **only peer-refresh** actions (WG config rewrite picks up the new peer's mgmt `/32` + namespace `/24`s in `AllowedIPs`, corrosion peer list refreshed, firewall unit reinstalled only when the namespace list changed). Agent binaries are not re-downloaded on existing hosts. Destructive-replace actions (podman network recreate because of `dns_enabled=true` drift or a subnet/label mismatch) are **blocked on existing hosts** unless `--allow-replace` is passed. The corrosion-schema wipe-DB branch is never unlocked — resolve schema drift with `upgrade` on a fresh schema.
- `upgrade` bumps agent binaries across every host. Only binary-fetch actions (`install-coold`, `install-corrosion`, `install-scheduler`, `install-builder`) and their follow-up service restarts (`install-coold-service`, `install-corrosion-service`, `install-scheduler-service`) run. WG config, podman networks, firewall rules, and the corrosion schema stay untouched. `nightly` tags are rejected by default (they force a re-install every run); pin a version with `--coold-version=v1.7.0` etc. or pass `--allow-nightly`.
`extend` and `upgrade` skip the interactive alpha gate because they are the paths the Coolify backend calls in production. `bootstrap` keeps the gate for direct-CLI runs.
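The per-intent filtering rules above can be sketched as a small Go function. This is a simplified illustration under stated assumptions, not the contents of `internal/wireguard/intent.go`: the category set is reduced (`catInstall` is a hypothetical catch-all for first-time setup work), the `action` and `skipped` types are invented for the example, and action names other than `install-coold` are made up.

```go
package main

import "fmt"

type intent int

const (
	intentBootstrap intent = iota // zero value
	intentExtend
	intentUpgrade
)

type category int

const (
	catInstall category = iota // hypothetical: first-time setup work
	catPeerRefresh
	catDestructiveReplace
	catVersionBump
	catWipeDB
)

type action struct {
	Host string
	Name string
	Cat  category
}

type skipped struct {
	Action action
	Reason string
}

// skipReason returns "" when the action may run under the given intent.
func skipReason(in intent, a action, isNew, allowReplace bool) string {
	switch in {
	case intentExtend:
		switch {
		case a.Cat == catWipeDB:
			return "wipe-DB is never unlocked by extend"
		case isNew, a.Cat == catPeerRefresh:
			return "" // new hosts get the full install; existing hosts get peer refresh
		case a.Cat == catDestructiveReplace && allowReplace:
			return ""
		case a.Cat == catDestructiveReplace:
			return "destructive replace on an existing host needs --allow-replace"
		default:
			return "existing hosts receive peer-refresh actions only"
		}
	case intentUpgrade:
		if a.Cat != catVersionBump {
			return "upgrade only bumps agent binaries"
		}
	}
	return "" // bootstrap runs everything
}

// filterByIntent splits actions into those that run and those suppressed.
func filterByIntent(in intent, actions []action, newHosts map[string]bool, allowReplace bool) (run []action, skip []skipped) {
	for _, a := range actions {
		if r := skipReason(in, a, newHosts[a.Host], allowReplace); r != "" {
			skip = append(skip, skipped{a, r})
			continue
		}
		run = append(run, a)
	}
	return run, skip
}

func main() {
	acts := []action{
		{"A", "write-wg-config", catPeerRefresh},
		{"A", "install-coold", catVersionBump},
		{"D", "install-coold", catVersionBump},
	}
	run, skip := filterByIntent(intentExtend, acts, map[string]bool{"D": true}, false)
	fmt.Println(len(run), "to run,", len(skip), "skipped")
	for _, s := range skip {
		fmt.Printf("skip %s on %s: %s\n", s.Action.Name, s.Action.Host, s.Reason)
	}
}
```

Returning the suppressed actions with a reason, rather than dropping them, mirrors the `plan.Skipped` behaviour described above: the preview can show what would have fired and why it did not.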
### Flags (defined in `cmd/init/flags.go`)
Persistent (inherited by `plan`, `bootstrap`, `extend`, `upgrade`):
| Flag | Default | Purpose |
|---|---|---|
| `--servers` | required | comma-separated SSH IPs (full list of every host in the mesh, including already-converged ones on extend/upgrade) |
| `--ssh-key` | required | path to SSH private key |
| `--ssh-passphrase-prompt` | false | prompt for key passphrase (also reads `COOLIFY_SSH_PASSPHRASE` env) |
| `--ssh-user` | `root` | SSH user |
| `--ssh-port` | `22` | SSH port |
| `--wg-mgmt-pool` | `100.64.0.0/16` | mgmt IP pool, /32 per host on wg0 |
| `--container-pool` | `10.210.0.0/16` | container pool, carved per host |
| `--container-prefix` | `24` | per-host container subnet prefix |
| `--wg-interface` | `wg0` | WG iface name on remote |
| `--wg-listen-port` | `51820` | WG UDP port |
| `--namespaces` | `default` | comma-separated list of namespaces. Each creates its own `coolify-<ns>-mesh` bridge with its own per-host `/24` carved from `--container-pool` |
| `--skip-default-deny` | false | skip the default-deny firewall scaffold. Default installs COOLIFY-INTRA + empty COOLIFY-ALLOW chains for cross-host deny |
| `--coold-version` | `nightly` | release tag to download for coold (e.g. `nightly`, `v1.2.3`). `nightly` always re-downloads on every run; pinned tags skip when the on-host version marker matches. Fetched from `coollabsio/coold` GitHub releases on the remote host. |
| `--corrosion-version` | `nightly` | release tag to download for corrosion. Same drift semantics as `--coold-version`. Fetched from `coollabsio/corrosion` GitHub releases. |
| `--scheduler-version` | `nightly` | release tag for scheduler (only fetched when `--central` is set). |
| `--corrosion-gossip-port` | `8787` | corrosion SWIM gossip port (bound to wg0 mgmt IP) |
| `--corrosion-api-port` | `8080` | corrosion HTTP API port (bound to 127.0.0.1) |
| `--central` | `""` | SSH address of the central VM (must be in `--servers`). When set, scheduler installs there and per-host JWTs are pushed to every peer. Empty = skip scheduler setup. |
| `--enable-builder` | true | cluster-wide shorthand: enable the builder capability on every host (requires `--central`). Ignored when `--builder-hosts` is set. |
| `--builder-hosts` | `[]` | explicit subset of `--servers` to enroll with the builder capability. Takes precedence over `--enable-builder`. |
| `--builder-capacity` | `2` | concurrent builds per host (`COOLD_BUILDER_CAPACITY`) |
| `--builder-cpu-quota` | `200%` | systemd CPUQuota per build subprocess |
| `--builder-memory-max` | `2G` | systemd MemoryMax per build subprocess |
| `--builder-timeout-secs` | `1800` | wall-clock cap per build |
| `--concurrency` | `10` | parallel SSH connections |
| `--ssh-timeout` | `30s` | SSH connect timeout |
| `--yes`, `-y` | false | skip alpha confirmation prompt (honored by `bootstrap`; `extend` and `upgrade` always skip it) |
Subcommand-local:
| Flag | Subcommand | Default | Purpose |
|---|---|---|---|
| `--intent` | `plan` | `bootstrap` | preview filter: `bootstrap` (all actions), `extend` (treat `--new-hosts` as fresh, existing hosts peer-refresh only), `upgrade` (version bumps only) |
| `--new-hosts` | `extend` | required | comma-separated subset of `--servers` that is brand-new this run. Only these hosts receive the full install; all other hosts get peer-refresh only. |
| `--allow-replace` | `extend` | false | unlock destructive-replace actions on existing hosts (e.g. recreating a drifted podman bridge). Off by default — drifted existing hosts surface as skipped actions. |
| `--allow-nightly` | `upgrade` | false | permit `nightly` as a version tag. Off by default because `nightly` re-installs every run instead of only when the pinned version changes. |
### Namespaces
Namespaces are the tenancy unit the mesh carries. A namespace is:
- **A podman bridge network** on every host, named `coolify-<ns>-mesh` (default → `coolify-default-mesh`), labelled `io.coolify.managed=true` + `io.coolify.namespace=<ns>`.
- **A per-host `/<container-prefix>` subnet** carved from the shared `--container-pool`. Allocation is deterministic across `(namespace, host)` pairs so re-runs reproduce the same layout.
- **A DNS view** coold serves on that bridge's gateway: records take the shape `<container>.<namespace>.coolify.internal`. Bare `<container>.coolify.internal` is deliberately NXDOMAIN — callers must fully qualify.
- **A firewall tenant**: allow-rule cids hash the namespace in, so identical src/dst/proto/port tuples in different namespaces are distinct rules. iptables chains stay host-global (`COOLIFY-INTRA` / `COOLIFY-ALLOW`) for alpha; namespace isolation comes from separate podman bridges + namespace-qualified allow rules.
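The deterministic per-`(namespace, host)` carve can be sketched as below. This is a minimal IPv4-only illustration, not the allocator in `internal/wireguard/subnet.go`: `nthSubnet`, `allocate`, and the `namespace/host` key format are hypothetical, and the real allocator's validity checks (out-of-pool, duplicates, and so on) are not modelled.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net/netip"
)

// nthSubnet returns the i-th subnet of length bits inside pool (IPv4 only).
func nthSubnet(pool string, bits, i int) (string, error) {
	p, err := netip.ParsePrefix(pool)
	if err != nil {
		return "", err
	}
	if !p.Addr().Is4() || bits < p.Bits() || bits > 32 {
		return "", fmt.Errorf("invalid pool %s or prefix /%d", pool, bits)
	}
	if i < 0 || i >= 1<<(bits-p.Bits()) {
		return "", fmt.Errorf("index %d does not fit in %s", i, pool)
	}
	a := p.Masked().Addr().As4()
	base := binary.BigEndian.Uint32(a[:]) + uint32(i)<<(32-bits)
	var b [4]byte
	binary.BigEndian.PutUint32(b[:], base)
	return netip.PrefixFrom(netip.AddrFrom4(b), bits).String(), nil
}

// allocate keeps every existing assignment and gives each new key the
// lowest free subnet, so adding a key never shifts earlier assignments.
func allocate(pool string, bits int, existing map[string]string, keys []string) (map[string]string, error) {
	out := make(map[string]string, len(keys))
	used := make(map[string]bool)
	for k, s := range existing {
		out[k], used[s] = s, true
	}
	next := 0
	for _, k := range keys {
		if _, ok := out[k]; ok {
			continue // preserve what is already on the host
		}
		for {
			s, err := nthSubnet(pool, bits, next)
			if err != nil {
				return nil, err // pool exhausted
			}
			next++
			if !used[s] {
				out[k], used[s] = s, true
				break
			}
		}
	}
	return out, nil
}

func main() {
	keys := []string{"default/A", "default/B", "default/C"}
	first, _ := allocate("10.210.0.0/16", 24, nil, keys)
	second, _ := allocate("10.210.0.0/16", 24, first, append(keys, "default/D"))
	fmt.Println(first["default/A"], second["default/A"], second["default/D"])
}
```

Feeding the previous result back in as `existing` is what models the "re-runs reproduce the same layout" property: state reconstructed from the hosts always wins over a fresh computation.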
Config knobs:
- `coolify init bootstrap --namespaces default,alpha,beta` provisions every namespace on every host in one pass. Re-running `bootstrap` (or running `extend` with the new namespace in `--namespaces`) installs only the new per-namespace assets (bridge + FORWARD jumps + WG `AllowedIPs` refresh + firewall unit reinstall because of unit-hash drift). Removing a namespace is **not** idempotent today — destroy/rebuild is the documented path for alpha.
- `coolify firewall --namespace <ns>` (default `default`) scopes allow/revoke/list/containers to one namespace. `list` and `containers` also accept `--all-namespaces` for cross-namespace observability.
- coold receives the full namespace list via `COOLD_NAMESPACES=<ns>:<network>:<gateway-ip>,…` (see `internal/services/coold.go`). DNS binds and rule storage derive from that.
Deliberately deferred (tracked in the active plan):
- Per-namespace iptables chains. Host-global keeps kernel state simple; revisit when a user asks for kernel-enforced per-namespace default-deny.
- Cross-namespace L2 bridging. Different namespaces = different podman bridges = no intra-host connectivity. Cross-namespace flows require explicit allow rules + dual-attach containers.
- Wildcard / DNS search domain. Start strict; loosen once real workloads push back.
### Code layout
- `cmd/common/` — flag structs shared between `init` and `firewall`.
  - `sshmesh.go`: `SSHMeshFlags` + `BindSSHMeshFlags`, `BuildSSHClient`, `ParseSSHTimeout`, `ResolvePassphrase`, `Validate`.
  - `meshnet.go`: `MeshNetFlags` (namespaces + container pool/prefix) + `BindMeshNetMultiFlags` (init-style: many namespaces) + `BindMeshNetSingleFlags` (firewall-style: one namespace) + `PodmanNetworkFor(ns)` + `ValidateNamespaces` / `ValidateNamespace` (DNS-label check).
- `cmd/init/` — Cobra subcommands (`init`, `init plan`, `init bootstrap`, `init extend`, `init upgrade`).
  - `flags.go`: `InitFlags` struct (embeds `common.SSHMeshFlags` + `common.MeshNetFlags`) + bindings + SSH client builder. Carries subcommand-scoped knobs: `NewHosts`, `AllowReplace`, `AllowNightly`, `Intent`.
  - `desired.go`: `buildDesired(flags)` maps flags to `wireguard.DesiredMesh`. One source of truth so every subcommand produces the same struct modulo `Intent`.
  - `plan.go`: `runPlan` runs validate, `buildDesired`, `ValidateIntent`, build SSH client, probe, `BuildPlan`, then renders actions + skipped rows. The `--intent` flag selects the filter for preview.
  - `apply.go`: `runApply(ctx, cmd, flags, applyOptions)`, the shared pipeline for all three executing subcommands. `applyOptions{SkipAlphaGate, Header}` differentiates them.
  - `bootstrap.go`: `NewBootstrapCommand` sets `flags.Intent = "bootstrap"` and keeps the alpha gate.
  - `extend.go`: `NewExtendCommand` binds `--new-hosts` + `--allow-replace`, validates the subset, sets `flags.Intent = "extend"`, and skips the alpha gate.
  - `upgrade.go`: `NewUpgradeCommand` binds `--allow-nightly`, sets `flags.Intent = "upgrade"`, and skips the alpha gate.
  - `init.go`: registers the four subcommands; package is `initcmd` (not `init`, which is not a keyword but is the identifier Go reserves for package initializer functions).
- `internal/wireguard/` — pure Go logic (no SSH, no I/O — `apply.go` is the SSH boundary).
  - `state.go`: `ServerState` (with `Namespaces map[string]*NamespaceServerState`), `MeshState`, `DesiredMesh` (with `Intent`, `NewHosts`, `AllowReplace`, `AllowNightly`). `Intent` enum: `IntentBootstrap` (zero value), `IntentExtend`, `IntentUpgrade`.
  - `intent.go`: `ValidateIntent` (pre-plan invariants: extend needs `NewHosts ⊆ Hosts`; upgrade rejects nightly unless opted-in), `filterByIntent` (mutates `plan.Actions` + `plan.Skipped`), `categorize` (action → `catSafeAlways` / `catPeerRefresh` / `catDestructiveReplace` / `catVersionBump` / `catWipeDB` / `catCorrosionSchemaFirstWrite`).
  - `subnet.go`: `Allocate` (per `(namespace, host)` pair: `map[ns]map[host]*net.IPNet`) + `AllocateMgmtIPs` (per-host /32) + conflict detection. Provably stable: adding host D never shifts A/B/C.
  - `config.go`: `RenderConfig` + `WriteConfigCommand` for `wg0.conf` (Address /32, AllowedIPs = mgmt /32 + every peer namespace subnet, deterministic order).
  - `reconstruct.go`: `Probe` (per-namespace podman network inspect + label read) + `Reconstruct` (parallel) + `parseConfigFile`.
  - `plan.go`: `BuildPlan` (pure: desired - actual = actions, then `ValidateIntent` + `filterByIntent`). `Plan.Skipped []SkippedAction` carries intent-filtered entries with reasons. Podman actions carry a `Namespace` field; one create/recreate action per namespace per host.
  - `apply.go`: `ApplyMesh` (2-phase fanout via `internal/ssh/fanout.go`). Phase 2 loops over namespaces per host; firewall unit takes the union of every namespace subnet.
  - `firewall.go`: `coolify-mesh-fw.service` unit generator (two-mode: blanket allow vs default-deny, one FORWARD/POSTROUTING pair per namespace subnet).
- `internal/ssh/` — generic SSH runner + parallel `ForEachServer[T]`.
- `test/fixtures/wg/wg0.conf` — fixture for parser tests.
### Key invariants
- **Reconstructed-only state**: no local state file. Every run re-probes via SSH. State lives on the hosts.
- **Idempotent**: re-running with no changes produces an empty plan. State drift triggers re-converge (e.g. flipping `--skip-default-deny` reinstalls the firewall service; bumping `--coold-version` re-fetches the binary).
- **Intent gates destruction**: `extend` on an existing host never re-downloads agents, never wipes the corrosion DB, and never recreates a drifted podman bridge without `--allow-replace`. Suppressed actions surface on `plan.Skipped` with a reason. `upgrade` never touches WG / podman / firewall / schema.
- **Private key never leaves host**: WG private key generated on remote via `wg genkey`; config written using `PRIVKEY=$(cat /etc/wireguard/privatekey)` shell expansion, so the key material is read and substituted on the remote host only.
- **Atomic config writes**: write to `.conf.tmp`, `mv` to `.conf`.
- **Non-disruptive WG reload**: service-restart uses `systemctl restart wg-quick@wg0 || wg syncconf wg0 <(wg-quick strip wg0)` — the fallback updates peers in kernel without tearing the tunnel.
- **Stable subnet assignment**: existing valid assignments are preserved across re-runs; adding a host never shifts existing `(namespace, host)` `/24`s. Only invalid (out-of-pool, wrong prefix, duplicate, network/broadcast IP) trigger reassignment with a warning.
- **Firewall reinstall is content-hashed**: `coolify-mesh-fw.service` is only rewritten when its expected unit text differs from the on-host sha256, so noisy restarts don't happen on converged re-runs.
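The content-hash check in the last invariant can be sketched like this. It is a minimal illustration assuming the on-host digest is stored as lowercase hex; `unitHash` and `needsReinstall` are hypothetical names, and the unit text is a placeholder.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// unitHash returns the lowercase hex SHA-256 of the rendered unit text.
func unitHash(unitText string) string {
	sum := sha256.Sum256([]byte(unitText))
	return hex.EncodeToString(sum[:])
}

// needsReinstall reports whether the digest probed from the host differs
// from the digest of the unit we would write. Surrounding whitespace and
// letter case in the probed value are ignored.
func needsReinstall(expectedUnit, onHostSHA256 string) bool {
	got := strings.ToLower(strings.TrimSpace(onHostSHA256))
	return got != unitHash(expectedUnit)
}

func main() {
	unit := "[Unit]\nDescription=coolify mesh firewall (illustrative)\n"
	probed := unitHash(unit) + "\n" // e.g. output of a remote sha256 command

	fmt.Println(needsReinstall(unit, probed))               // converged host
	fmt.Println(needsReinstall(unit+"# changed\n", probed)) // drifted unit
}
```

Comparing digests instead of always rewriting the file is what lets a converged re-run produce an empty plan for the firewall unit.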
### Future control plane (v5 / coold)
`coolify init` owns **fleet provisioning**: first-time bootstrap, adding hosts, and bumping agent versions — each via its own intent-scoped subcommand. Day-to-day container/firewall ops are the v5 control plane's job. See `CONTROL_PLANE.md` for the full spec, including:
- coold per-host agent (REST API on wg0, bind-mounts `/run/podman/podman.sock`, NEVER exposes socket on TCP).
- Service discovery via embedded DNS in coold + Corrosion-replicated sqlite (no env injection, no container restart on backend movement).
- Allow-rule persistence via coold's own DB + `iptables-restore --noflush` or `nft -f` batch (NOT systemd dropins per rule — doesn't scale).
- Cross-host allow rules go on the **destination host** (where DROP would otherwise fire).
When extending `coolify init`, defer dynamic responsibilities to coold. Bootstrap stays narrow: scaffold the mesh, install runtime, prep firewall chains. `extend` and `upgrade` stay narrower still: add peers and bump binaries, nothing else. coold owns everything that changes at runtime.
### Testing init
Tests live in `internal/wireguard/*_test.go` and `cmd/init/*_test.go`:
```bash
go test ./internal/wireguard/... ./cmd/init/... -v
```
Use the SSH `Runner` interface for mocking — never open real SSH connections in unit tests. `internal/ssh/fanout.go` is generic; reuse for any per-server fanout.
## `coolify firewall` — cross-host allow-rule client (alpha, v5)
**This subcommand is the second outlier** (alongside `coolify init`): it does NOT talk to the Coolify API. It is a thin REST client of the **coold** per-host agent installed by `coolify init` (coold install is unconditional as of v1.6.3). `allow` / `revoke` / `list` all go through coold's REST API (`/api/v1/firewall/allow`). `containers` stays SSH+podman because coold has no container surface yet. Transport is **SSH-bounce**: the laptop running the CLI is not a mesh peer, so it SSHes into the target host and the shell there runs `curl "http://$(wg0-mgmt-ip):8443/api/v1/firewall/..."` against the coold instance on that same host. coold is bound to the wg0 mgmt IP, not to `127.0.0.1`, which is why the script resolves the mgmt IP first.
coold owns all kernel-rule + persistence logic (iptables/nft backend detection, `/etc/coolify/allow.rules` snapshot, `coolify-mesh-allow.service`). The CLI never writes iptables or systemd units directly.
### What it does
- Discovers containers on the selected namespace's `coolify-<ns>-mesh` bridge (default `coolify-default-mesh`) across all listed hosts (SSH + `podman ps`). `--all-namespaces` fans out across every managed namespace.
- `POST /api/v1/firewall/allow` / `DELETE /api/v1/firewall/allow/{id}` / `GET /api/v1/firewall/allow` against coold on the host that **owns the destination IP** (per `CONTROL_PLANE.md §3`: rules go on dst host).
- Per-host bearer tokens fetched on demand from `/etc/coolify/api-token` (see `EnsureCooldAPITokenCommand` in `internal/services/coold.go` — each host generates its own random 32-byte hex token at install time).
- Idempotent at the coold level: POST of an identical tuple returns the existing id; DELETE of an unknown id returns 204.
### Subcommands
```bash
coolify firewall containers [--namespace <ns>] [--all-namespaces] # discover containers on coolify-<ns>-mesh (SSH+podman)
coolify firewall list [--namespace <ns>] [--all-namespaces] # GET /allow on every host and merge
coolify firewall allow --namespace <ns> --from <ref> --to <ref> [--port N] [--proto tcp|udp] [--bidirectional]
coolify firewall revoke --namespace <ns> --from <ref> --to <ref> [--port N] [--proto tcp|udp] [--bidirectional]
```
`<ref>` accepts: container name (unique across mesh), `host:name`, short 12-char podman ID, or raw IP.
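The four accepted reference forms can be sketched as one resolver. This is a hypothetical illustration, not the code in `cmd/firewall/resolve.go`: the `container` fields, the return type, and the error messages are assumptions, and the order in which forms are tried is a design choice of this sketch.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// container is a hypothetical discovery row.
type container struct {
	Host, Name, ID, IP string
}

// resolveEndpoint maps a ref (raw IP, host:name, name, or 12-char ID
// prefix) to a single IP, failing when the ref is unknown or ambiguous.
func resolveEndpoint(ref string, all []container) (string, error) {
	// Raw IPs are tried first so IPv6 literals are not mistaken for host:name.
	if addr, err := netip.ParseAddr(ref); err == nil {
		return addr.String(), nil
	}
	host, name, qualified := strings.Cut(ref, ":")
	var matches []container
	for _, c := range all {
		switch {
		case qualified && c.Host == host && c.Name == name,
			!qualified && c.Name == ref,
			!qualified && len(ref) == 12 && strings.HasPrefix(c.ID, ref):
			matches = append(matches, c)
		}
	}
	switch len(matches) {
	case 0:
		return "", fmt.Errorf("no container matches %q", ref)
	case 1:
		return matches[0].IP, nil
	default:
		return "", fmt.Errorf("%q is ambiguous (%d matches); qualify it as host:name", ref, len(matches))
	}
}

func main() {
	cs := []container{
		{"A", "web", "0123456789abcdef", "10.210.0.5"},
		{"B", "web", "fedcba9876543210", "10.210.1.5"},
		{"B", "client", "aaaaaaaaaaaabbbb", "10.210.1.6"},
	}
	for _, ref := range []string{"client", "B:web", "0123456789ab", "10.0.0.9", "web"} {
		ip, err := resolveEndpoint(ref, cs)
		fmt.Println(ref, "->", ip, err)
	}
}
```

A bare name only resolves when it is unique across the mesh, which is why the ambiguous case points the operator at the `host:name` form.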
### Flags
Persistent (inherited from `cmd/common/sshmesh.go` — shared with `coolify init`):
| Flag | Default | Purpose |
|---|---|---|
| `--servers` | required | comma-separated SSH IPs |
| `--ssh-key` | required | SSH private key path |
| `--ssh-passphrase-prompt` | false | prompt for passphrase (also `COOLIFY_SSH_PASSPHRASE` env) |
| `--ssh-user` | `root` | SSH user |
| `--ssh-port` | `22` | SSH port |
| `--concurrency` | `10` | parallel SSH connections |
| `--ssh-timeout` | `30s` | SSH connect timeout |
Firewall-specific persistent:
| Flag | Default | Purpose |
|---|---|---|
| `--namespace` | `default` | mesh namespace the command operates on. Derives podman network `coolify-<ns>-mesh` for container discovery and is sent to coold as part of every rule payload / list query |
| `--all-namespaces` | false | applies to `list` + `containers` only — fans out across every namespace the mesh carries (`allow` / `revoke` still require a specific `--namespace`) |
| `--coold-port` | `8443` | TCP port coold's REST API listens on (wg0 mgmt IP). Must match `COOLD_API_BIND` emitted by `internal/services/coold.go` |
| `--coold-token` | `""` | **optional** bearer-token override (also reads `COOLIFY_COOLD_TOKEN` env). When empty (the default), the CLI SSHes each host and reads `/etc/coolify/api-token` — tokens are per-host, not centrally shared |
Allow/revoke local:
| Flag | Default | Purpose |
|---|---|---|
| `--from` | required | source container ref or raw IP |
| `--to` | required | destination container ref or raw IP |
| `--port` | `0` | dst port (0 = any) |
| `--proto` | `tcp` | `tcp`, `udp`, or `""` (any — requires `--port=0`) |
| `--bidirectional` | false | also install reverse rule on src host (needed for server-initiated flows; conntrack ESTABLISHED handles client-initiated replies) |
### Rule identity
`cid = sha256(namespace|src|dst|proto|port)[:12]`. Namespace defaults to `"default"` on the wire when empty so legacy coold peers keep working. coold computes the cid server-side on POST and returns it in the body; the CLI surfaces it as the user-facing rule ID in `firewall list` output and uses it for DELETE. Stable across calls: `revoke --namespace … --from … --to …` rebuilds the same cid and matches. Identical src/dst/proto/port tuples in different namespaces produce different cids and are managed independently.
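A minimal Go sketch of the identity formula follows. The formula itself comes from the text above; the exact serialization is an assumption of this sketch: a literal `|` separator, the port as a decimal string, and the first 12 characters of the lowercase hex digest. coold's real encoding may differ, so ids produced here are illustrative only.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strconv"
	"strings"
)

// computeID derives a 12-character rule id from the rule tuple.
// An empty namespace is normalized to "default", as described above.
func computeID(namespace, src, dst, proto string, port int) string {
	if namespace == "" {
		namespace = "default"
	}
	// Assumed wire form: fields joined with "|", port in decimal.
	key := strings.Join([]string{namespace, src, dst, proto, strconv.Itoa(port)}, "|")
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])[:12]
}

func main() {
	a := computeID("default", "10.210.0.5", "10.210.1.5", "tcp", 80)
	b := computeID("alpha", "10.210.0.5", "10.210.1.5", "tcp", 80)
	fmt.Println(a, b, a == b)
}
```

Because the id is a pure function of the tuple, `revoke` can rebuild it from the same flags without first listing rules.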
### SSH-bounce transport
Every coold call is wrapped in a single SSH command that first discovers the host's own wg0 mgmt IP and then curls coold at that address from the host itself:
```sh
# emitted for POST / DELETE (hard-fails if wg0 missing — no coold means nothing to apply to)
MGMT=$(ip -4 -o addr show wg0 2>/dev/null | awk '{print $4}' | cut -d/ -f1)
test -n "$MGMT" || { echo "coold mgmt IP (wg0) not found on $(hostname)" >&2; exit 1; }
curl -fsS --max-time 10 \
-H 'Authorization: Bearer <token>' \
-H 'Content-Type: application/json' \
-X POST -d '{"namespace":"default","src":"...","dst":"...","proto":"tcp","port":80}' \
"http://$MGMT:8443/api/v1/firewall/allow"
```
`list` uses the **soft** variant: missing wg0 emits `[]` and exits 0 so a partially-deployed mesh doesn't abort the whole fanout.
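Building the remote command safely depends on quoting every dynamic value for the shell. The sketch below shows one common way to do that in Go. It is an illustration, not the code in `internal/firewall/coold_client.go`: the function bodies, the `rulePayload` type, and the `$MGMT` variable convention are assumptions modelled on the script above.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// shellSingleQuote wraps s in single quotes for a POSIX shell, closing and
// reopening the quotes around any embedded single quote.
func shellSingleQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// rulePayload is a hypothetical mirror of the JSON body shown above.
type rulePayload struct {
	Namespace string `json:"namespace"`
	Src       string `json:"src"`
	Dst       string `json:"dst"`
	Proto     string `json:"proto"`
	Port      int    `json:"port"`
}

// buildCurlAllow renders the curl line for a POST. It expects the remote
// script to have set $MGMT to the wg0 mgmt IP beforehand.
func buildCurlAllow(token string, port int, r rulePayload) (string, error) {
	body, err := json.Marshal(r)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf(
		"curl -fsS --max-time 10 -H %s -H %s -X POST -d %s \"http://$MGMT:%d/api/v1/firewall/allow\"",
		shellSingleQuote("Authorization: Bearer "+token),
		shellSingleQuote("Content-Type: application/json"),
		shellSingleQuote(string(body)),
		port,
	), nil
}

func main() {
	cmd, err := buildCurlAllow("abc", 8443, rulePayload{"default", "10.210.0.5", "10.210.1.5", "tcp", 80})
	if err != nil {
		panic(err)
	}
	fmt.Println(cmd)
}
```

Only the URL is left in double quotes, so `$MGMT` still expands on the remote side while the token and JSON body are passed through verbatim.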
### Per-host token resolution
`cmd/firewall/helpers.go::tokenResolver` hands out tokens per host with a sync.Mutex-guarded cache:
- `--coold-token` (or `COOLIFY_COOLD_TOKEN` env) set → closure returns the override for every host; no SSH fetch.
- Otherwise → first access per host SSHes `cat /etc/coolify/api-token`, caches the result for the rest of the run. Token-fetch failures surface as a `ServerResult.Err` on the owning host (won't poison others).
The cache is scoped to one CLI invocation — no on-disk caching.
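The resolver behaviour described above can be sketched as a closure over a mutex-guarded map. This is a hypothetical illustration, not the code in `cmd/firewall/helpers.go`: `fetchFunc` stands in for the SSH call that reads `/etc/coolify/api-token`, and `newTokenResolver` is an invented name.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchFunc stands in for the SSH round trip that reads the host's token.
type fetchFunc func(host string) (string, error)

// newTokenResolver returns a closure that yields the override for every
// host when one is set, and otherwise fetches each host's token once and
// caches it for the lifetime of the closure (one CLI invocation).
func newTokenResolver(override string, fetch fetchFunc) func(host string) (string, error) {
	if override != "" {
		return func(string) (string, error) { return override, nil }
	}
	var mu sync.Mutex
	cache := make(map[string]string)
	return func(host string) (string, error) {
		mu.Lock()
		defer mu.Unlock()
		if tok, ok := cache[host]; ok {
			return tok, nil
		}
		tok, err := fetch(host)
		if err != nil {
			// Failures are not cached, so the error stays with this host.
			return "", fmt.Errorf("fetch token from %s: %w", host, err)
		}
		cache[host] = tok
		return tok, nil
	}
}

func main() {
	calls := 0
	resolve := newTokenResolver("", func(h string) (string, error) {
		calls++
		return "tok-" + h, nil
	})
	for _, h := range []string{"A", "A", "B"} {
		tok, _ := resolve(h)
		fmt.Println(h, tok)
	}
	fmt.Println("fetches:", calls)
}
```

Holding the lock across the fetch keeps the sketch short but serializes first-time fetches; a per-host lock or a single-flight helper would allow them to run in parallel.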
### Persistence across reboots
**coold owns this now.** On every API mutate, coold regenerates `/etc/coolify/allow.rules` (flat `iptables-save` fragment) and the companion `coolify-mesh-allow.service` restores it on boot via `iptables-restore --noflush`. Pre-coold persistence scaffolding was removed from the CLI when it migrated to REST — same file format, different writer.
### Code layout
- `cmd/common/sshmesh.go` — shared SSH/mesh flag struct `SSHMeshFlags` (+ `BindSSHMeshFlags`, `BuildSSHClient`, `ParseSSHTimeout`, `ResolvePassphrase`, `Validate`).
- `cmd/common/meshnet.go` — shared namespace plumbing: `MeshNetFlags` (namespaces + container pool/prefix), `BindMeshNetMultiFlags` (init: many), `BindMeshNetSingleFlags` (firewall: one), `PodmanNetworkFor(ns)`, `ValidateNamespaces` / `ValidateNamespace`.
- `cmd/firewall/` — Cobra layer.
  - `firewall.go`: `NewFirewallCommand()` parent + subcommand registration.
  - `flags.go`: `FirewallFlags` embeds `common.SSHMeshFlags` + `Namespace` + `AllNamespaces` + `CooldToken` + `CooldPort` + `WGInterface`. `PodmanNetworkName()` derives the bridge name from `Namespace`. `ResolveCooldToken()` returns the override or `""` (meaning "fetch per host").
  - `allow.go`: `allowRevokeFlags`, `emitAllowRevoke` (discover → resolve → build rule with namespace → coold POST/DELETE per rule, resolving token per host).
  - `list.go`: `emitList` fans out `CooldList` via `CooldListAll`, forwarding the namespace query param (or omitting it under `--all-namespaces`).
  - `containers.go`: `containers` subcommand (still SSH+podman). Without `--all-namespaces`: single bridge. With `--all-namespaces`: SSH per host for `podman network ls --filter label=io.coolify.managed=true`, then per-namespace fanout.
  - `resolve.go`: `resolveEndpoint(ref, []Container)` (name / host:name / short-id / raw IP).
  - `helpers.go`: `discoverAllViaPkg`, `discoverAcrossNamespaces`, `discoverNamespacesOnHosts`, `tokenResolver` (per-host cached bearer-token closure).
- `internal/firewall/` — REST client + discovery.
  - `coold_client.go`: `FetchCooldToken`, `CooldApply`, `CooldRevoke`, `CooldList(… , namespace)`, `CooldListAll(… , namespace)`. `buildCurlAllow/Revoke/List`, `shellSingleQuote`, `mgmtIPScript` / `mgmtIPScriptSoft`. `cooldRulePayload` carries `namespace` (required on wire; empty normalized to `"default"`).
  - `discover.go`: `Container` (with `Namespace`), `discoverScript`, `DiscoverContainers(… , namespace, network)`, `DiscoverAll`, `DiscoverAllNamespaces` (fan-out over a `networkFor(ns)` mapper).
  - `rule.go`: `AllowRule` (with `Namespace`), `ComputeID(namespace, src, dst, proto, port)`.
- `internal/models/firewall.go` — table/JSON row types (`ContainerRow`, `AllowRuleRow`) both now carry a `Namespace` column.
- `internal/services/coold.go`: `EnsureCooldAPITokenCommand` (installer writes `/etc/coolify/api-token`, mode 0600), `CooldServiceUnit` emits `COOLD_API_BIND=<mgmt-ip>:8443` + `COOLD_API_TOKEN_FILE=/etc/coolify/api-token` + `COOLD_NAMESPACES=<ns>:<network>:<gateway-ip>,…`.
### Key invariants
- **Destination-host ownership**: every rule lives on exactly one host — the one whose `/24` contains the destination IP. `--bidirectional` adds the reverse rule on the src host.
- **coold is the only kernel writer**: the CLI never runs `iptables` or touches `/etc/coolify/allow.rules` directly. Everything flows through coold's REST API.
- **Per-host tokens by default**: each coold generates its own random token at install. `--coold-token` is an escape hatch for homogeneous test / CI environments, not the common path.
- **Bidirectional is opt-in**: conntrack ESTABLISHED accept (installed by `coolify-mesh-fw.service`) handles reply packets for client-initiated flows. Only set `--bidirectional` for protocols that actually open new connections in both directions.
- **Rule identity is hash, not UUID**: coold computes it server-side so CLI and any future writer agree on the same id for the same tuple.
- **Namespace is part of identity**: `cid = sha256(namespace|src|dst|proto|port)[:12]`. Same tuple in two namespaces = two distinct rules. Empty-string namespace normalizes to `"default"` on the wire so legacy coold peers keep working.
- **Transient token exposure on remote `/proc`**: `curl -H "Authorization: Bearer $TOKEN"` is visible in `/proc/<curl-pid>/cmdline` for the ~ms lifetime of the call, root-only. Acceptable for alpha; TLS + stdin-fed tokens are a follow-up.
### Testing firewall
```bash
go test ./internal/firewall/... ./cmd/firewall/... ./cmd/common/... -v
```
Uses `fakeCooldRunner` / `cmdFakeRunner` pattern (substring → canned stdout map) — same as `cmd/init/plan_test.go`. All SSH calls mocked at the `ssh.Runner` boundary; no real SSH in unit tests. Token-fetch, mgmt-IP script, curl shape, JSON payload, and error propagation are all covered.
### End-to-end flow (verified on real hosts)
After `coolify init bootstrap --servers A,B --namespaces default,alpha ...` ran (coold must be up):
1. Baseline cross-host traffic DROPped by `COOLIFY-INTRA` in every namespace.
2. `coolify firewall containers --servers A,B --ssh-key KEY --all-namespaces` → discovery table columned by namespace.
3. `coolify firewall allow --servers A,B --ssh-key KEY --namespace default --from client --to web --port 80` → CLI SSH-fetches each host's token, POSTs to coold (body includes `"namespace":"default"`), traffic flows in the `default` namespace only.
4. Same tuple with `--namespace alpha` → separate cid, separate rule; doesn't affect `default`.
5. `coolify firewall list --servers A,B --ssh-key KEY --all-namespaces` → merged rules across every namespace on every host with their coold-assigned `cid:…` IDs.
6. `coolify firewall revoke --namespace <ns> …` → coold DELETE, rule gone, traffic DROPped again.
7. Reboot → `coolify-mesh-allow.service` (installed by coold) restores from `/etc/coolify/allow.rules`.
Add `--coold-token <hex>` only when every host was bootstrapped with the same token (CI fixtures, homogeneous test clusters).
## Testing Requirements
**CRITICAL: All code changes MUST include tests. This is non-negotiable.**
-759
@@ -1,759 +0,0 @@
# Coolify v5 Control Plane — Server Management Spec
This document lists everything the Coolify v5 control plane must implement on top of the host provisioning performed by the `coolify init` subcommand tree (`bootstrap` for first install, `extend` for adding hosts, `upgrade` for bumping agent versions) to fully manage a fleet of mesh-connected hosts.
## Architecture overview
```
┌─────────────────────────────────────┐
│ Coolify central UI / API │
│ - Multi-tenant (cloud) or 1-tenant │
│ (self-hosted); same binary │
│ - WSS / gRPC bidi stream listener │
│ on :443 (public) │
│ - Routes commands by host_id │
└────────────────────▲────────────────┘
│ outbound TLS :443 (WSS / gRPC bidi)
│ long-lived, resumable, jittered reconnect
│ per-host JWT (issued at enroll)
┌─────────────────┴──────────────────┐
│ (per-customer gateway, │
│ OPTIONAL — one mesh host │
│ proxies N coolds → 1 stream) │
└─────────────────▲──────────────────┘
│ same stream protocol, over wg0
┌────────────────────┴────────────────┐ ┌─────────────────────────┐
│ coold (per-host agent) │ │ /run/podman/podman.sock│
│ - Dials central (or gateway) out │──┤ bind-mount, host-only │
│ - Local REST on wg0 :8443 │ │ (NEVER on network) │
│ (intra-mesh callers: CLI, peers) │ └─────────────┬───────────┘
│ - Bearer-token authn (both paths) │ │
│ - Talks ONLY to local podman sock │ ▼
└─────────────────────────────────────┘ ┌─────────────────────────────┐
│ podman (containers, nets)  │
└─────────────────────────────┘
```
**Key principles**:
1. **`/run/podman/podman.sock` is never exposed on TCP.** coold bind-mounts it and proxies a curated API. Central Coolify never touches the raw podman socket directly.
2. **coold always dials outbound — never accepts inbound from central or public internet.** One topology for self-hosted and cloud SaaS. Works through any NAT/corp firewall, scales to thousands of hosts per central region (10k+ idle streams are cheap). No "add central to every customer's wg0" — central never joins any mesh.
3. **coold still exposes a local REST API on wg0 mgmt IP** for intra-mesh callers only (the `coolify firewall` CLI via SSH-bounce, other coolds in the same mesh, a per-customer gateway if deployed). Never reachable from public internet; wg0 is the only L3 boundary that can hit it.
4. **Per-customer gateway (optional)**: for large customers, one host in the mesh runs a stream aggregator that dials central once and proxies commands to the other coolds over wg0. Reduces stream fan-out at central from N-per-customer to 1-per-customer; adds one hop of latency. Transparent to both ends — same protocol each side.
## What `coolify init bootstrap` already provides
| Layer | Component | State |
|---|---|---|
| L3 mesh | WireGuard `wg0` per host with mgmt `/32` from `--wg-mgmt-pool` (default `100.64.0.0/16`) | Installed, configured, active |
| L3 mesh | Peer `AllowedIPs = <peer-mgmt>/32, <peer-container>/24` | Configured |
| Container runtime | Podman (distro apt) | Installed |
| Container runtime | `podman.socket` (rootful, `/run/podman/podman.sock`) | Enabled, active |
| Container network | `coolify-mesh` bridge per host with `/24` from `--container-pool` (default `10.210.0.0/16`), gateway `.1` | Created |
| Routing | `net.ipv4.ip_forward=1` (persisted via `/etc/sysctl.d/99-coolify-mesh.conf`) | Enabled |
| Firewall (mode A — `--podman` only) | `coolify-mesh-fw.service` with FORWARD ACCEPT for container subnet + POSTROUTING RETURN to skip podman MASQUERADE on wg0 | Active |
| Firewall (mode B — `--default-deny`) | `COOLIFY-INTRA` chain (ESTABLISHED/RELATED accept → COOLIFY-ALLOW → DROP), FORWARD jumps for `-s/-d <container-subnet>`, blanket ACCEPT removed | Active when set |
| Allow chain | `COOLIFY-ALLOW` (empty filter chain) | Created, ready for runtime rules |
Each host has a stable `(mgmt-ip, container-subnet)` pair. The bootstrap is idempotent: re-running `coolify init bootstrap` only changes what drifted.
---
## What v5 control plane MUST implement
### 1. Inventory & state sync
- **Discovery**: query each host through coold (not the raw `podman.socket`, which is never exposed on the network) for: containers, networks, volumes, images, system stats.
- **Drift detection**: periodically reconcile desired state (Coolify DB) against actual (podman API). Re-converge or alert.
- **Mesh join/leave**: when a host is added or removed from the cluster:
- Add → invoke `coolify init extend --servers <full list> --new-hosts <new host>` (installs the new host end-to-end, regenerates wg0 config on every existing peer with the new mgmt IP + namespace `/24`s, leaves agent binaries on existing hosts untouched).
- Remove → not supported by a first-class subcommand today. Documented workaround for alpha: tear the host down out-of-band (stop services, drop it from DNS) and re-run `coolify init bootstrap` with the reduced `--servers` list in a maintenance window; a dedicated `remove-host` flow is a follow-up.
### 2. Container lifecycle
Every container op is a command sent over coold's outbound stream (central → coold) or a local REST call on coold's wg0 listener (intra-mesh → coold). coold executes the command against the local `/run/podman/podman.sock` Unix socket and streams results back.
- Create container with `--network coolify-mesh` and explicit `--ip` from the host's `/24`.
- Reserve container IPs in the control plane DB. Allocator skips `.1` (bridge gateway), reserves `.2` for coold itself, `.3-.254` for app containers.
- Start, stop, restart, remove.
- Stream logs via `/containers/{id}/logs?follow=true` (coold relays podman API frames over the open control stream).
- Health checks via `/containers/{id}/healthcheck/run`.
- Resource limits, env vars, mounts, volumes, secrets — all standard podman API surfaced through coold.
#### coold is a primitive proxy, not an app brain
coold follows the **kubelet analogue**: it knows containers, images, volumes, networks, iptables, and Corrosion writes. It does **not** know apps, compose, Dockerfiles, buildpacks, or Nixpacks. Central Coolify is the apiserver+controllers: it parses app-level config and compiles it into a sequence of primitive ops streamed to coold.
Test for "should this live in coold?": could a second orchestrator (a Nomad-style competitor) reuse this coold with a different app model? If yes → coold. If no → central.
#### Wire surface (enumerable)
Same endpoint set on both transports (outbound stream from central, local REST on wg0 for intra-mesh callers). New verbs require a coold release — there is no `/podman/raw` passthrough.
```
# Images
POST /api/v1/images/pull {ref, auth?} -> {digest}
GET /api/v1/images -> [{ref, digest, size}]
DELETE /api/v1/images/{ref}
# Containers (filtered podman surface)
POST /api/v1/containers <create spec> -> {id}
POST /api/v1/containers/{id}/start
POST /api/v1/containers/{id}/stop {timeout?}
POST /api/v1/containers/{id}/restart
DELETE /api/v1/containers/{id} {force?}
GET /api/v1/containers/{id} (inspect)
GET /api/v1/containers/{id}/logs?follow=true (streamed)
POST /api/v1/containers/{id}/exec {cmd, tty?} (streamed)
POST /api/v1/containers/{id}/healthcheck/run
# Volumes
POST /api/v1/volumes {name, driver, labels}
DELETE /api/v1/volumes/{name}
GET /api/v1/volumes/{name}
# Networks (bootstrap creates coolify-mesh; extra per-app nets created here)
POST /api/v1/networks {name, driver, options, labels}
DELETE /api/v1/networks/{name}
GET /api/v1/networks
# Firewall (coold = sole writer)
POST /api/v1/firewall/allow {src, dst, proto?, port?} -> {id}
DELETE /api/v1/firewall/allow/{id}
GET /api/v1/firewall/allow
# Service endpoints (Corrosion writer; used by central to register deploys)
POST /api/v1/services/register
DELETE /api/v1/services/{id}/endpoints/{container_id}
GET /api/v1/services/{id}/endpoints
# DNS (diagnostics)
GET /api/v1/dns/lookup/{name}
GET /api/v1/dns/stats
# Host facts (read-only; central scrapes these for observability + scheduling)
GET /api/v1/host/info (podman info, kernel, wg state, load)
GET /api/v1/host/containers (podman ps -a)
GET /api/v1/host/stats (podman stats snapshot)
```
**Deny filter on `POST /containers`** (defense-in-depth even though central is trusted):
- Block `--privileged`, `--cap-add=SYS_ADMIN/NET_ADMIN` unless host is marked `allow_privileged=true`.
- Block host-path bind mounts outside a configurable allowlist (default: none).
- Block host netns (`--net=host`) unless the container is coold itself.
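The deny filter above could be expressed as a small pure function that runs before coold calls the podman socket. The sketch below is a guess at the shape: `CreateSpec`, `HostPolicy`, and their field names are hypothetical, and only the three rules listed above are modelled.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// CreateSpec is a hypothetical subset of the container create body.
type CreateSpec struct {
	Privileged bool
	CapAdd     []string
	HostNet    bool
	BindMounts []string // host paths requested as bind mounts
}

// HostPolicy holds hypothetical per-host switches.
type HostPolicy struct {
	AllowPrivileged bool     // mirrors allow_privileged=true
	BindAllowlist   []string // allowed host path roots; empty by default
	IsCoold         bool     // true only when the container being created is coold itself
}

// checkCreate returns an error for the first rule the spec violates.
func checkCreate(s CreateSpec, p HostPolicy) error {
	if !p.AllowPrivileged {
		if s.Privileged {
			return fmt.Errorf("privileged containers are not allowed on this host")
		}
		for _, c := range s.CapAdd {
			switch strings.TrimPrefix(strings.ToUpper(c), "CAP_") {
			case "SYS_ADMIN", "NET_ADMIN":
				return fmt.Errorf("capability %s is not allowed on this host", c)
			}
		}
	}
	if s.HostNet && !p.IsCoold {
		return fmt.Errorf("host network namespace is reserved for coold")
	}
	for _, m := range s.BindMounts {
		if !underAny(m, p.BindAllowlist) {
			return fmt.Errorf("bind mount %s is outside the allowlist", m)
		}
	}
	return nil
}

// underAny reports whether hostPath, after cleaning "..", sits at or below one of the roots.
func underAny(hostPath string, roots []string) bool {
	clean := path.Clean(hostPath)
	for _, r := range roots {
		root := strings.TrimSuffix(path.Clean(r), "/") + "/"
		if clean+"/" == root || strings.HasPrefix(clean, root) {
			return true
		}
	}
	return false
}

func main() {
	spec := CreateSpec{CapAdd: []string{"NET_ADMIN"}}
	fmt.Println(checkCreate(spec, HostPolicy{}))
	fmt.Println(checkCreate(spec, HostPolicy{AllowPrivileged: true}))
}
```

Cleaning the path before the prefix check is what stops `/srv/data/../../etc` from slipping past an allowlist entry of `/srv/data`.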
Anything not above is not coold's job. No `/apps`, `/deployments`, `/compose`, `/build`, `/podman/raw`. coold does not parse compose, Dockerfiles, buildpacks, or any app-level config — central compiles these into sequences of the primitive ops above and streams them down.
#### Networks
Default = shared `coolify-mesh` bridge. Containers get `.coolify.internal` DNS + flat L3 across the mesh. Users may define extra podman networks per app (docker-compose `networks:` style) via `POST /networks` + container attach on create. Central compiles compose into network-create + container-attach primitives.
#### coold deployment
coold runs as a privileged container on each host (or as a host systemd service). `coolify init bootstrap` puts it in place at install time (and `coolify init upgrade` bumps its version later): binary, systemd unit with `COOLD_API_BIND=<wg0-mgmt-ip>:8443`, random per-host bearer token at `/etc/coolify/api-token` (mode 0600), outbound stream config written atomically to `/etc/coolify/coold.env`.
Reference container spec (equivalent to systemd-service deployment):
```bash
podman run -d --name coold --restart=always \
--network coolify-mesh --ip 10.210.X.2 \
-v /run/podman/podman.sock:/run/podman/podman.sock \
-v /etc/coolify/coold:/etc/coolify/coold:ro \
--security-opt label=disable \
-p 100.64.0.X:8443:8443 \
ghcr.io/coollabs/coold:latest
```
- **Outbound stream**: coold dials `wss://<central-host>/v1/agent` (or gRPC bidi) on start, presenting its per-host JWT. Central routes commands to it by host id over the open stream. Stream is the primary control channel for both self-hosted and cloud SaaS — same code path, same binary.
- **Local REST on wg0 mgmt IP (`100.64.0.X:8443`)**: accepts intra-mesh callers only (the `coolify firewall` CLI via SSH-bounce, other coolds in the same mesh, a per-customer gateway). Not reachable from public internet — wg0 is the L3 boundary. Bearer-token auth on every request.
- **No inbound from central**: central never dials coold. All mutations arrive over the coold-initiated stream; no `COOLIFY-ALLOW` rule for "central → host:8443" needed. Works through NAT/corp firewalls.
#### Control channel transport (stream)
Two candidates; spec-time decision, not per-host:
| Option | Pros | Cons |
|---|---|---|
| **gRPC bidi stream over HTTP/2** *(chosen)* | typed Protobuf schemas, native server-streaming for logs/exec, versionable wire | stricter proxy requirements (some corp proxies still mangle HTTP/2); larger runtime |
| WebSocket (WSS over :443) *(fallback)* | traverses every proxy, tiny overhead, libs everywhere | framing is custom-on-top; manual request/response correlation |
**Decision: gRPC bidi + Protobuf.** Typed schemas + native server-streaming for logs and exec outweigh the proxy risk; WSS remains the documented fallback if gRPC-through-proxy issues show up in the field. Both run on :443, so customer-side egress rules stay unchanged either way.
#### Enrollment
coold registers once at install using a one-time token from central:
```bash
coolify init bootstrap \
--central-url https://cloud.coolify.io \
--enroll-token <one-time-hex>
```
1. coold POSTs `(host_id, wg0_mgmt_ip, container_subnet, enroll_token)` to `https://<central>/v1/enroll`.
2. Central validates the enroll token (scoped to a tenant, single-use, short TTL) and issues a long-lived per-host JWT + TLS-pinned central cert. Response stored in `/etc/coolify/coold.env` (mode 0600).
3. coold burns the enroll token and switches to JWT for the persistent stream.
4. Central revokes by invalidating the JWT in its own DB; next stream reconnect fails auth and the host is quarantined until re-enrolled.
#### Reconnect + fleet-restart storms
Single-central-restart would otherwise trigger simultaneous reconnects from every host. Mitigations:
- **Jittered backoff**: exponential from 1s up to 60s with full jitter. 10k reconnecting hosts spread out over minutes, not seconds.
- **Resumable streams**: stream carries a monotonic `last_seq` per host so central can replay missed commands after reconnect without central-side queueing beyond an in-memory ring buffer.
- **Region sharding**: DNS round-robin or geo-steering across multiple central stream gateways; each gateway holds O(10k) streams. Stateful routing via consistent-hashing on host_id so a host lands on the same gateway across reconnects (cache affinity).
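A minimal sketch of the jittered backoff bullet, assuming "full jitter" means a uniformly random sleep between zero and the exponential ceiling. The function names are illustrative; only the 1s base and 60s cap come from the text above.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const (
	baseDelay = 1 * time.Second  // first-retry ceiling
	maxDelay  = 60 * time.Second // cap on the ceiling
)

// backoffCeiling returns baseDelay * 2^attempt for a 0-based attempt, capped at maxDelay.
func backoffCeiling(attempt int) time.Duration {
	d := baseDelay
	for i := 0; i < attempt && d < maxDelay; i++ {
		d *= 2
	}
	if d > maxDelay {
		d = maxDelay
	}
	return d
}

// fullJitter picks a uniformly random delay in [0, ceiling].
func fullJitter(attempt int) time.Duration {
	return time.Duration(rand.Int63n(int64(backoffCeiling(attempt)) + 1))
}

func main() {
	for attempt := 0; attempt < 8; attempt++ {
		fmt.Printf("attempt %d: ceiling=%v sleep=%v\n",
			attempt, backoffCeiling(attempt), fullJitter(attempt))
	}
}
```

Because each host draws its own random sleep, reconnects after a central restart are spread over the whole window instead of arriving in synchronized waves.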
#### Per-customer gateway (optional)
For customers with 50+ hosts, one designated mesh host runs a **gateway mode coold** (same binary, different role):
- Dials central like any other coold.
- Accepts incoming streams from its peer coolds over wg0 (they dial `wss://<gateway-mgmt-ip>:8443/v1/agent-peer` instead of central).
- Relays commands down, responses up. Maintains O(hosts-in-mesh) inbound streams + 1 outbound to central.
Saves N-1 WAN streams at central per customer; costs one hop of latency + one more thing to keep alive. Opt-in via `coolify init bootstrap --gateway-for-mesh` on the chosen host; peers get `--via-gateway <gateway-mgmt-ip>` at install.
### 3. Network policy (firewall)
When host has `--default-deny` enabled, **all cross-host container traffic is dropped by default**. The control plane decides who talks to whom.
#### Division of labour: bootstrap vs coold vs central
| Layer | Owner | Responsibility |
|---|---|---|
| Chain scaffold (COOLIFY-INTRA, COOLIFY-ALLOW, FORWARD jumps, conntrack early-accept, POSTROUTING RETURN) | `coolify init bootstrap` (also reconverges on `extend`) | Install + idempotently re-converge on flag change. Never touches individual allow rules. |
| Rule metadata (who/when/why, audit log, RBAC, tenant scoping, app→rule mapping) | **Coolify central DB** | Authoritative store. All rich queries, audit trails, and access control live here. |
| Raw rule tuples `(src, dst, proto, port)` on the host | **coold** (single writer) | Apply to kernel + snapshot to `/etc/coolify/allow.rules` for reboot. Stateless-ish — just a cache of what the caller (central Coolify or `coolify firewall` CLI) told it to apply. No metadata, no DB. |
**Key split**: central Coolify owns rich state (metadata, audit, RBAC). Per-host coold owns only the raw rules needed to program the kernel + survive reboot. This keeps coold small and lets a single central DB be the source of truth for all cross-cutting concerns.
**App-topology compilation happens in central.** coold applies the rule tuples it is told to apply; it does not generate rules from app intent (e.g. "allow service `web` → `db`"). Central compiles that from the app model and sends individual `POST /firewall/allow` frames.
**`coolify init` is intentionally not the rule store.** Bootstrap creates the empty allow chain. coold is the sole writer into it. Callers reach coold via two paths: (a) central Coolify over the coold-initiated outbound stream, (b) intra-mesh callers (`coolify firewall` CLI via SSH-bounce, other coolds, optional per-customer gateway) via coold's local REST API on wg0 mgmt IP.
#### Reboot persistence
Works the same pre- and post-coold because both use the same file format:
- `/etc/coolify/allow.rules` — filter-table fragment, `:COOLIFY-ALLOW` + `-A COOLIFY-ALLOW` lines only. Written atomically (`.tmp` + `mv`) on every rule change.
- `/etc/systemd/system/coolify-mesh-allow.service`: `Type=oneshot`, `After=coolify-mesh-fw.service`, `Wants=coolify-mesh-fw.service`. `ExecStart=iptables-restore --noflush /etc/coolify/allow.rules`. `--noflush` means only `COOLIFY-ALLOW` is populated; nothing else is disturbed.
coold owns the file: it rewrites `/etc/coolify/allow.rules` on every successful API mutate, keeping it in sync with the live kernel. The `coolify firewall` CLI never touches the file — it POSTs/DELETEs through coold and coold handles persistence + systemd unit install. One writer, one format.
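The atomic `.tmp` + `mv` write could look roughly like the sketch below. The helper names and the `0600` file mode are assumptions made for the example; the spec only states that the file is replaced atomically on every rule change.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
)

// writeAtomic writes data to a temp file next to target, syncs it, then renames
// it over target, so a reader never observes a half-written snapshot.
func writeAtomic(target string, data []byte, mode os.FileMode) error {
	tmp, err := os.CreateTemp(filepath.Dir(target), filepath.Base(target)+".tmp*")
	if err != nil {
		return err
	}
	defer os.Remove(tmp.Name()) // harmless once the rename has succeeded
	if _, err := tmp.Write(data); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Chmod(mode); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Sync(); err != nil {
		tmp.Close()
		return err
	}
	if err := tmp.Close(); err != nil {
		return err
	}
	return os.Rename(tmp.Name(), target)
}

// writeAndReadBack writes each body in turn to a scratch allow.rules file
// and returns what the file contains at the end.
func writeAndReadBack(bodies ...string) (string, error) {
	dir, err := os.MkdirTemp("", "allow-rules")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	target := filepath.Join(dir, "allow.rules")
	for _, b := range bodies {
		if err := writeAtomic(target, []byte(b), 0o600); err != nil {
			return "", err
		}
	}
	got, err := os.ReadFile(target)
	return string(got), err
}

func main() {
	got, err := writeAndReadBack("-A COOLIFY-ALLOW -s 10.210.0.10 -d 10.210.5.42 -j ACCEPT\n")
	fmt.Printf("%q %v\n", got, err)
}
```

The temp file lives in the same directory as the target because a rename is only atomic within one filesystem.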
#### Allow-rule lifecycle
For an allow `(srcIP, dstIP)`:
- Add ACCEPT to `COOLIFY-ALLOW` on the host that **owns dstIP** (where DROP would otherwise fire).
- For bidirectional traffic (e.g. TCP, ICMP echo+reply), add the reverse `(dstIP, srcIP)` on the host that owns srcIP. (Reply packets traverse THAT host's FORWARD chain when arriving back, and dst-side check fires there.)
- **One unidirectional allow = one rule on one host. One bidirectional allow = two rules on two hosts.**
- Conntrack ESTABLISHED early-accept (installed by bootstrap) handles in-flow follow-up packets — no need to add per-packet rules.
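The placement rule above (ACCEPT goes on the host that owns the destination, plus a mirrored rule for bidirectional traffic) can be sketched as a small planner. `Placement`, `planAllow`, and the host-to-subnet map are hypothetical names; the subnets reuse the example addresses from this document.

```go
package main

import (
	"fmt"
	"net/netip"
)

// Placement says: install "ACCEPT Src -> Dst" in COOLIFY-ALLOW on Host.
type Placement struct {
	Host     string
	Src, Dst string
}

// ownerOf finds the host whose container subnet contains ip.
func ownerOf(ip string, subnets map[string]string) (string, error) {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return "", err
	}
	for host, cidr := range subnets {
		p, err := netip.ParsePrefix(cidr)
		if err != nil {
			return "", err
		}
		if p.Contains(addr) {
			return host, nil
		}
	}
	return "", fmt.Errorf("no host owns %s", ip)
}

// planAllow returns one placement for a unidirectional allow and two for a bidirectional one.
func planAllow(src, dst string, bidirectional bool, subnets map[string]string) ([]Placement, error) {
	dstHost, err := ownerOf(dst, subnets)
	if err != nil {
		return nil, err
	}
	out := []Placement{{Host: dstHost, Src: src, Dst: dst}}
	if bidirectional {
		srcHost, err := ownerOf(src, subnets)
		if err != nil {
			return nil, err
		}
		// The reverse tuple lands on the host that owns the original source.
		out = append(out, Placement{Host: srcHost, Src: dst, Dst: src})
	}
	return out, nil
}

func main() {
	subnets := map[string]string{"host-1": "10.210.0.0/24", "host-3": "10.210.5.0/24"}
	plan, err := planAllow("10.210.0.10", "10.210.5.42", true, subnets)
	fmt.Println(plan, err)
}
```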
#### Persistence + scale model
Per-rule systemd dropins do NOT scale (1000 rules × `daemon-reload` + restart = minutes, fs clutter, audit nightmare). Instead, coold is a thin rule-applier backed by central:
```
coold service (per host)
├─ Snapshot file: /etc/coolify/allow.rules (flat iptables-save fragment)
├─ Boot: systemd unit runs iptables-restore --noflush from file
├─ API mutate: apply iptables -A/-D → regen snapshot via iptables-save
└─ Reconcile: central periodically diffs its DB vs coold's live
`iptables -S COOLIFY-ALLOW`; pushes deltas to re-converge
```
Source of truth for **the set of rules that should exist** = central Coolify DB. Source of truth for **what's programmed in the kernel right now** = kernel itself, mirrored to `/etc/coolify/allow.rules` for reboot. coold does not keep its own DB.
#### Write ordering (crash/reboot safety)
Every mutating call from central → coold follows this sequence:
1. **Central writes to its own DB first** (with its own audit/tenant metadata). Durable with the rest of Coolify's state.
2. **Central sends command over the open stream** to coold with just `(src, dst, proto, port)`. No inbound connection to coold — the stream was already established by coold at boot.
3. **coold applies `iptables -A/-D`** to kernel.
4. **coold regenerates `/etc/coolify/allow.rules`** via `iptables-save` (atomic `.tmp` + `mv`).
5. **coold returns success to central** over the same stream (response carries the request id).
6. **On any failure in steps 3-5**, central marks the row "pending" in its DB and retries / surfaces to operator. Nothing is lost because step 1 is already durable.
Consequences:
- **Crash between steps 3 and 4** → kernel has the rule, file doesn't. Reboot loses the rule. Central's reconcile loop detects divergence (its DB has the rule, live kernel doesn't after boot) and re-pushes. Safe, with a small drift window bounded by reconcile cadence.
- **Crash between steps 4 and 5** → kernel + file both updated, but central didn't get the ack. Central retries; `iptables -C` guard makes the retry a no-op. Safe.
- **coold down when central wants to mutate** → central queues the change and retries on reconnect. No state loss on either side.
- **Central DB is authoritative** — a reboot can only *shrink* the live rule set compared to central's view, never grow it.
Bulk ops (`/bulk`) ship the whole batch in one REST call. coold applies via `iptables-restore --noflush` / `nft -f` (atomic transaction), then regens snapshot once.
Apply paths:
| Backend | Bulk apply (1000 rules) | Atomicity |
|---|---|---|
| `iptables -A` per rule | ~5s | per-rule |
| `iptables-restore --noflush` (preferred for iptables-legacy) | ~50ms | per-batch |
| `nft -f /tmp/rules.nft` (preferred when host uses nftables backend) | ~10ms | atomic transaction |
coold detects the backend (`iptables --version` or presence of nftables socket) and picks the matching apply path. Bootstrap doesn't care.
For **systemctl restart coolify-mesh-fw.service** (e.g. a `coolify init bootstrap` re-run after a flag flip, or `coolify init extend` reinstalling the unit because the namespace list changed): the unit flushes COOLIFY-INTRA but **never flushes COOLIFY-ALLOW** — existing rules survive. If somehow lost (manual `iptables -F COOLIFY-ALLOW`, crash mid-write), central's reconcile loop compares its own DB against `iptables -S COOLIFY-ALLOW` from each host and re-pushes any missing tuples within the reconcile interval.
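The reconcile comparison boils down to a set difference between central's rows and the tuples parsed from `iptables -S COOLIFY-ALLOW`. A minimal sketch follows; the `Tuple` type and `diff` function are illustrative, and handling of `extra` rules (present on the host but not in the DB) is an assumption, since the text above only describes re-pushing missing tuples.

```go
package main

import "fmt"

// Tuple is the raw rule shape coold is told to apply.
type Tuple struct {
	Src, Dst, Proto string
	Port            int
}

// diff returns tuples central wants but the host lacks (missing) and
// tuples the host has but central does not know about (extra).
func diff(desired, live []Tuple) (missing, extra []Tuple) {
	inLive := make(map[Tuple]bool, len(live))
	for _, t := range live {
		inLive[t] = true
	}
	inDesired := make(map[Tuple]bool, len(desired))
	for _, t := range desired {
		inDesired[t] = true
		if !inLive[t] {
			missing = append(missing, t)
		}
	}
	for _, t := range live {
		if !inDesired[t] {
			extra = append(extra, t)
		}
	}
	return missing, extra
}

func main() {
	a := Tuple{"10.210.0.10", "10.210.5.42", "tcp", 5432}
	b := Tuple{"10.210.0.11", "10.210.5.42", "tcp", 5432}
	missing, extra := diff([]Tuple{a, b}, []Tuple{b})
	fmt.Println("re-push:", missing, "unexpected:", extra)
}
```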
#### Allow API surface
Same method/path set is served on both transports — stream (central → coold) and local REST (intra-mesh → coold). Stream = JSON-RPC frames carrying the same `(method, path, body)` tuple; REST = plain HTTP on wg0 mgmt IP :8443.
```
POST /api/v1/firewall/allow {src, dst, proto?, port?, comment?} → returns id
DELETE /api/v1/firewall/allow/{id}
GET /api/v1/firewall/allow list
GET /api/v1/firewall/allow/{id} show + match counters
POST /api/v1/firewall/allow/bulk {add: [...], remove: [...]} atomic batch
POST /api/v1/firewall/reconcile force full reload
```
coold translates each row into the right iptables/nft fragment. Per-port: `-p tcp --dport <N>`. Source/dest IP, CIDR, or set reference (for grouping like "all-frontend-ips").
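As a rough illustration of the row-to-fragment translation, the sketch below renders one allow row as an `iptables` append line for the `COOLIFY-ALLOW` chain. The `Rule` type, the validation choices, and the set of accepted protocols are assumptions; set references are not handled here.

```go
package main

import (
	"fmt"
	"net/netip"
	"strconv"
	"strings"
)

// Rule mirrors the {src, dst, proto?, port?} body; zero values mean "not set".
type Rule struct {
	Src, Dst, Proto string
	Port            int
}

// validIPOrCIDR accepts a single address or a CIDR prefix.
func validIPOrCIDR(s string) bool {
	if _, err := netip.ParseAddr(s); err == nil {
		return true
	}
	_, err := netip.ParsePrefix(s)
	return err == nil
}

// allowLine renders the rule as arguments for an iptables append into COOLIFY-ALLOW.
func allowLine(r Rule) (string, error) {
	if !validIPOrCIDR(r.Src) || !validIPOrCIDR(r.Dst) {
		return "", fmt.Errorf("src and dst must be an IP or CIDR")
	}
	args := []string{"-A", "COOLIFY-ALLOW", "-s", r.Src, "-d", r.Dst}
	switch r.Proto {
	case "":
	case "tcp", "udp", "icmp":
		args = append(args, "-p", r.Proto)
	default:
		return "", fmt.Errorf("unsupported proto %q", r.Proto)
	}
	if r.Port != 0 {
		if r.Proto != "tcp" && r.Proto != "udp" {
			return "", fmt.Errorf("port requires proto tcp or udp")
		}
		if r.Port < 1 || r.Port > 65535 {
			return "", fmt.Errorf("port %d out of range", r.Port)
		}
		args = append(args, "--dport", strconv.Itoa(r.Port))
	}
	args = append(args, "-j", "ACCEPT")
	return strings.Join(args, " "), nil
}

func main() {
	line, err := allowLine(Rule{Src: "10.210.0.10", Dst: "10.210.5.42", Proto: "tcp", Port: 80})
	fmt.Println(line, err)
}
```

Validating `src` and `dst` with a real parser, rather than passing strings through, keeps caller-supplied text from ever reaching the rule file unchecked.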
For very large rule sets: use **nftables sets** so a rule references a set name, and the set membership changes are O(1):
```
nft add element ip filter coolify_allowed_pairs { 10.210.0.10 . 10.210.1.10 }
```
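One static rule like `ct state new ip saddr . ip daddr @coolify_allowed_pairs accept` replaces a linear walk over thousands of rules with a single set lookup (hash or tree based, depending on the set type), so per-packet cost stays nearly flat as the set grows. coold maintains the set rather than thousands of rules. Optional optimization for v5+.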
#### Intra-host isolation (NOT enforced by `--default-deny`)
Linux + netavark + Ubuntu 24.04: bridge L2 traffic bypasses iptables FORWARD even with `bridge-nf-call-iptables=1`. **Containers on the same host's `coolify-mesh` bridge can always reach each other.**
Two paths for v5 to enforce intra-host isolation:
- **(Recommended) Per-app podman networks**: each Coolify service = own podman network with `--opt isolate=true`. Different networks can't talk by default; use `podman network connect` for cross-app.
- Trade-off: each network needs its own `/24` from container pool → wastes pool. Or carve `/27`s (allocator extension needed).
- **(Alternative) ebtables L2 filter**: `ebtables --logical-in podman1 --logical-out podman1 --ip-src X --ip-dst Y -j ACCEPT/DROP`. Independent toolchain, separate persistence. Bridge name discovery needed.
v1 ships without intra-host enforcement. v5 picks one path.
### 4. Container IP allocation per host
The bootstrap gives each host a `/24` (e.g. `10.210.0.0/24`). The control plane:
- Reserves `.1` (bridge gateway, skip).
- Allocates `.2-.254` for containers, deduplicated against running `podman ps` IPs.
- Pins IP via `podman run --ip <IP>` so DNS/firewall rules stay stable.
- Detects exhaustion early; alerts user to grow `--container-pool` or `--container-prefix`.
For `/24` per host: 253 containers max. For higher density: re-bootstrap with `--container-prefix 23` or larger pool.
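A minimal allocator sketch for the rules above, assuming the caller passes the host's subnet and the IPs already seen in `podman ps`. `nextFree` is a hypothetical helper; it skips the network address, the `.1` gateway, and the broadcast address. If coold itself runs as a container on `.2`, the caller would simply include that address in `used`.

```go
package main

import (
	"fmt"
	"net/netip"
)

// nextFree returns the lowest unused address in subnet, skipping the network
// address, the gateway (first host address), and the broadcast address.
func nextFree(subnet string, used []string) (string, error) {
	p, err := netip.ParsePrefix(subnet)
	if err != nil {
		return "", err
	}
	p = p.Masked()
	inUse := make(map[netip.Addr]bool, len(used))
	for _, u := range used {
		a, err := netip.ParseAddr(u)
		if err != nil {
			return "", err
		}
		inUse[a] = true
	}
	gateway := p.Addr().Next() // .1 on a /24
	// Stop before the last address in the prefix (broadcast).
	for a := gateway.Next(); p.Contains(a) && p.Contains(a.Next()); a = a.Next() {
		if !inUse[a] {
			return a.String(), nil
		}
	}
	return "", fmt.Errorf("subnet %s exhausted", p)
}

func main() {
	ip, err := nextFree("10.210.0.0/24", []string{"10.210.0.2", "10.210.0.3"})
	fmt.Println(ip, err)
}
```

Returning an error on exhaustion gives the control plane a single place to raise the "grow `--container-pool` or `--container-prefix`" alert.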
### 5. Service discovery
**Pattern**: embedded DNS server in coold, backed by [Corrosion](https://github.com/superfly/corrosion) (CRDT sqlite gossiped via SWIM across the mesh). No env injection. No container restarts on backend movement.
#### Why DNS-via-coold over alternatives
| Approach | Stable target? | Backend move = restart? | Complexity |
|---|---|---|---|
| Env injection (`DB_HOST=10.210.5.42`) | no — IP changes | yes (rolling redeploy on every change) | medium (template engine + dep graph) |
| **Embedded DNS in coold** | **yes (hostname)** | **no** | **low (~200 LoC)** |
| VIP per service | yes (IP) | no | high (keepalived/BGP/IPVS) |
| Per-host HTTP/TCP proxy | yes (port) | no | medium (proxy config) |
DNS chosen: smallest moving parts, works for any protocol, standard `getaddrinfo()` path, ubiquitous client support.
#### Corrosion schema (replicated sqlite)
```sql
CREATE TABLE services (
id TEXT PRIMARY KEY, -- "myapp.db"
coolify_app_id TEXT NOT NULL,
name TEXT NOT NULL, -- "db"
namespace TEXT NOT NULL, -- "myapp"
port INTEGER, -- canonical port (informational)
updated_at INTEGER NOT NULL -- ms epoch (CRDT clock)
);
CREATE TABLE service_endpoints (
service_id TEXT NOT NULL,
container_id TEXT NOT NULL,
host_mgmt_ip TEXT NOT NULL, -- 100.64.0.X (host running the container)
container_ip TEXT NOT NULL, -- 10.210.X.Y
healthy INTEGER NOT NULL,
updated_at INTEGER NOT NULL,
PRIMARY KEY (service_id, container_id)
);
```
Each coold writes its own host's container facts. Reads are local sqlite (sub-ms). Gossip handles distribution; convergence ~1s in small clusters.
#### Embedded DNS server
```go
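// Sketch (~200 LoC in full), assuming the third-party github.com/miekg/dns
// package. gatewayAddr, c.upstream (default "1.1.1.1:53") and c.healthyIPs
// (wrapping the Corrosion query shown below) are illustrative names.
import (
	"net"
	"strings"

	"github.com/miekg/dns"
)

const zone = ".coolify.internal."

func (c *Coold) serveDNS(gatewayAddr string) error {
	pc, err := net.ListenPacket("udp", gatewayAddr) // bridge gateway IP, 10.210.X.1:53
	if err != nil {
		return err
	}
	for {
		buf := make([]byte, 512)
		n, addr, err := pc.ReadFrom(buf)
		if err != nil {
			return err
		}
		go c.handle(buf[:n], addr, pc)
	}
}

func (c *Coold) handle(query []byte, src net.Addr, pc net.PacketConn) {
	req := new(dns.Msg)
	if err := req.Unpack(query); err != nil || len(req.Question) == 0 {
		return // drop malformed queries
	}
	resp := new(dns.Msg).SetReply(req)
	name := req.Question[0].Name // "myapp.db.coolify.internal."
	if !strings.HasSuffix(name, zone) {
		// Forward to upstream (configurable; default 1.1.1.1).
		if up, err := dns.Exchange(req, c.upstream); err == nil {
			resp = up
		} else {
			resp.SetRcode(req, dns.RcodeServerFailure)
		}
	} else {
		// SELECT container_ip FROM service_endpoints WHERE service_id = ? AND healthy = 1
		ips, err := c.healthyIPs(strings.TrimSuffix(name, zone))
		if err != nil {
			resp.SetRcode(req, dns.RcodeServerFailure)
		} else if len(ips) == 0 {
			resp.SetRcode(req, dns.RcodeNameError) // NXDOMAIN
		}
		for _, ip := range ips {
			hdr := dns.RR_Header{Name: name, Rrtype: dns.TypeA, Class: dns.ClassINET, Ttl: 5}
			resp.Answer = append(resp.Answer, &dns.A{Hdr: hdr, A: net.ParseIP(ip)})
		}
	}
	if out, err := resp.Pack(); err == nil {
		pc.WriteTo(out, src)
	}
}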
```
Listens on **bridge gateway IP** (`10.210.X.1:53`) of the host's `coolify-mesh` bridge — reachable from every container in the host's `/24` via standard kernel routing.
#### Container creation hook
Every container coold creates gets:
```
podman run --dns 10.210.X.1 --dns-search coolify.internal ...
```
App code uses short names: `getaddrinfo("myapp.db", ...)` → libc appends search suffix → `myapp.db.coolify.internal` → coold answers from local Corrosion.
#### Resolution flow
```
1. App in container A on host-1 (10.210.0.10) calls getaddrinfo("myapp.db")
2. libc reads /etc/resolv.conf:
nameserver 10.210.0.1
search coolify.internal
3. UDP query "myapp.db.coolify.internal" → 10.210.0.1:53
4. coold@host-1 reads local Corrosion → 10.210.5.42 (running on host-3)
5. Reply: A 10.210.5.42, TTL=5
6. App opens TCP to 10.210.5.42:5432
7. Routed via wg0 (peer host-3's AllowedIPs covers 10.210.5.0/24)
→ bridge → container
8. (If --default-deny is on, COOLIFY-ALLOW on host-3 must permit
10.210.0.10 → 10.210.5.42.)
```
#### Backend movement (zero restart on dependents)
```
T+0: myapp.db @ 10.210.5.42 on host-3. Endpoint row gossiped.
T+10s: User redeploys myapp.db on host-3.
coold@host-3:
- new container at 10.210.5.43
- INSERT new endpoint row (10.210.5.43)
- DELETE old endpoint row (10.210.5.42)
- kill old container
Corrosion gossips delta.
T+11s: All hosts have updated state.
T+15s: App on host-1 has stale TCP to 10.210.5.42 — broken when old container died.
App's reconnect logic re-resolves myapp.db → 10.210.5.43 → reconnects.
App container NEVER restarted, env NEVER changed.
```
App must have reconnect logic (every reasonable DB/cache client does). DNS provides the new IP transparently.
#### TTL
5s. Trade-off:
- Lower = faster failover, more queries.
- Higher = quieter DNS, slower failover.
Apps with infinite-cache resolvers (Java's `networkaddress.cache.ttl=-1`) won't see updates. Document for users; not coold's problem.
#### Multi-replica services
Resolver returns ALL healthy A records. Apps with proper conn pools (postgres, redis clients) handle multi-target naturally. No client-side LB protocol needed.
#### Health & staleness
- coold marks `healthy=0` on healthcheck fail. DNS stops returning that IP from the next query onward.
- Stale-row TTL: rows older than 60s without heartbeat are pruned (owning coold heartbeats every 15s).
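The two rules above can be sketched as a prune step followed by a health filter. This is an illustration under assumptions: `Endpoint` mirrors only the columns needed here, and timestamps are millisecond epochs as in the `service_endpoints` schema.

```go
package main

import "fmt"

// Endpoint carries the service_endpoints columns this sketch needs.
type Endpoint struct {
	ContainerIP string
	Healthy     bool
	UpdatedAtMs int64 // last heartbeat, ms epoch
}

const staleAfterMs = 60_000 // rows older than 60s without a heartbeat are pruned

// prune drops rows whose last heartbeat is older than staleAfterMs.
func prune(rows []Endpoint, nowMs int64) []Endpoint {
	kept := make([]Endpoint, 0, len(rows))
	for _, r := range rows {
		if nowMs-r.UpdatedAtMs <= staleAfterMs {
			kept = append(kept, r)
		}
	}
	return kept
}

// healthyIPs returns the addresses DNS may hand out.
func healthyIPs(rows []Endpoint) []string {
	var ips []string
	for _, r := range rows {
		if r.Healthy {
			ips = append(ips, r.ContainerIP)
		}
	}
	return ips
}

func main() {
	now := int64(105_000)
	rows := []Endpoint{
		{"10.210.5.42", true, 100_000},
		{"10.210.5.43", false, 100_000},
		{"10.210.5.44", true, 30_000},
	}
	fmt.Println(healthyIPs(prune(rows, now)))
}
```

With a 15s heartbeat, a row has to miss roughly four beats in a row before it ages out at 60s.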
#### TLD
`.coolify.internal`: `.internal` is reserved for private use (set aside by ICANN in 2024; it is not one of the RFC 6761 special-use names). Won't collide with public TLDs. Configurable per-cluster.
#### Failure modes
| Failure | Behaviour |
|---|---|
| coold dies | Cluster DNS resolution stops. systemd restarts coold (~3s). Existing connections survive. Same profile as k8s losing CoreDNS. |
| Corrosion split-brain | Each partition serves local view; CRDT merges cleanly when partition heals. May serve stale IPs during partition. |
| Backend healthy in DB but unreachable | DNS returns IP → app connection fails → app retries. If multi-replica, may pick different one on retry. |
| Container has no `--dns` (created outside coold) | No cluster resolution. Document: only coold-managed containers get discovery. |
| Cross-region high latency | Slower convergence; stale DNS for 10-30s. Acceptable v1. |
#### API surface
Same dual-transport model as the firewall API — stream from central, REST from intra-mesh callers.
```
POST /api/v1/services/register {service_id, app_id, name, namespace, port, container_id, container_ip, host_mgmt_ip}
DELETE /api/v1/services/{service_id}/endpoints/{container_id}
GET /api/v1/services/{service_id}/endpoints
GET /api/v1/services?namespace=myapp
GET /api/v1/dns/lookup/{name} (debug — what coold would answer)
GET /api/v1/dns/stats (qps, hit/miss/forward counts)
```
Most ops are automatic side effects of deploy/scale/health-check. Central rarely calls `/services/register` directly — coold registers on container create, deregisters on remove.
coold writes Corrosion rows on behalf of central (explicit `POST /services/register` frames); it does not infer service identity from container labels. Central supplies `service_id` explicitly so naming policy stays in one place.
#### Bootstrap impact
Minimal. `coolify init bootstrap` creates every `coolify-<ns>-mesh` Podman network with `--disable-dns` so netavark never starts aardvark-dns on the bridge gateway `:53`. coold owns that socket. Bridge gateway IP was always reserved by `MachineIP()`.
Pre-alpha deployments that created the network without `--disable-dns` are detected at plan-time (probe reads `podman network inspect .DNSEnabled`). A `recreate-podman-network` action drops and recreates the network (same subnet, same gateway, but with DNS disabled). Any attached containers are stopped and removed by `podman network rm -f`, so they must be recreated afterwards.
#### Port 53 conflict handling
Three layers protect coold's `10.210.X.1:53` socket:
| Layer | Mechanism | Covers |
|---|---|---|
| 1. Bootstrap | `podman network create --disable-dns` (+ drift recreate) | aardvark-dns squat |
| 2. Bind target | coold binds **bridge gateway IP only**, not `0.0.0.0` and not wg0 mgmt IP | host wildcard DNS daemons (dnsmasq/pihole on `0.0.0.0:53`) and wg0 bloat |
| 3. Preflight | `net.Listen("tcp", gateway+":53")` probe before `ListenPacket` | clear actionable error + systemd `Restart=on-failure` retry |
systemd-resolved on Ubuntu binds `127.0.0.53:53` — no conflict with bridge gateway.
Bind rule: coold DNS is container-facing only (listen on bridge gateway IP). coold REST API is operator-facing (listen on wg0 mgmt IP, port 8443). Separate concerns, separate sockets.
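The layer-3 preflight can be sketched as a TCP bind probe followed by the real UDP listen. `preflight` and `bindDNS` are hypothetical names, and the error wording is only an example of the "clear actionable error" the table asks for.

```go
package main

import (
	"fmt"
	"net"
)

// preflight tries a TCP bind on addr and releases it immediately.
// A failure here usually means another daemon already owns the address.
func preflight(addr string) error {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return fmt.Errorf("dns preflight: cannot bind %s (another DNS daemon on this address?): %w", addr, err)
	}
	return ln.Close()
}

// bindDNS runs the probe, then opens the UDP socket coold serves DNS on.
func bindDNS(addr string) (net.PacketConn, error) {
	if err := preflight(addr); err != nil {
		return nil, err
	}
	return net.ListenPacket("udp", addr)
}

func main() {
	// Port 0 picks any free port for the demo; coold would pass <bridge-gateway-ip>:53.
	pc, err := bindDNS("127.0.0.1:0")
	if err != nil {
		fmt.Println("preflight failed:", err)
		return
	}
	defer pc.Close()
	fmt.Println("listening on", pc.LocalAddr())
}
```

Returning the wrapped error and exiting non-zero lets systemd's `Restart=on-failure` handle the retry.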
### 6. Ingress (public traffic → containers)
`coolify init` doesn't manage public ingress. v5 deploys a reverse proxy (Traefik/Caddy) per host or HA pair:
- Listens on host public IP `:80/:443`.
- Routes `Host: app.example.com` → container IP (over container bridge or wg0 if cross-host).
- Cert management via ACME.
- Coolify generates proxy config from app routing rules.
Important: ingress proxy needs its own podman network OR can share `coolify-mesh`. Sharing means proxy can reach all containers — fine since it's the entrypoint.
### 7. Deployment workflows
Deploy is a **central-side state machine** that compiles app intent (compose / Dockerfile / buildpack / Nixpacks / raw image) into a sequence of coold primitives (see §2 wire surface). coold does not participate in planning — it executes one primitive per frame.
#### Build pipeline (not in coold)
```
git push
Central receives webhook
Builder (BuildKit / Buildpacks / Nixpacks) ← coold NOT involved
- Self-hosted: first mesh host by default;
central may pin via target_host_id per build.
- Cloud: central-run.
Push to registry (registry.coolify.io or customer's) ← coold NOT involved
Central deploy controller → primitive op stream → coold on target host
```
coold's only role in the build path: `POST /images/pull` once the tag exists in the registry.
#### Deploy flow (T0-T10: every frame = one §2 primitive)
```
T0 Central builder clones source, invokes BuildKit / buildpack / nixpacks.
Output: OCI image @ registry.coolify.io/tenant/web:v2.
T1 Central deploy controller picks target host H (scheduler = least-loaded / pin).
T2 Frame: POST /images/pull {ref: "registry.coolify.io/tenant/web:v2"}
coold@H calls podman.sock /images/create, streams progress back.
T3 Frame: POST /volumes {name: "web-data", driver: "local"}
coold@H idempotent; no-op if exists.
T4 Frame: POST /containers (central templates from compose + resolved secrets)
body:
{
"image": "registry.coolify.io/tenant/web:v2",
"name": "web-v2-a3f91",
"network": "coolify-mesh",
"ip": "10.210.H.42",
"dns": ["10.210.H.1"],
"dns_search": ["coolify.internal"],
"env": {"DATABASE_URL": "postgres://…"},
"mounts": [{"volume": "web-data", "target": "/data"}],
"healthcheck": {"test": ["CMD","curl","-f","http://localhost/"], "interval": "5s"},
"labels": {"coolify.app": "web", "coolify.version": "v2"}
}
coold checks deny filter → calls podman.sock /containers/create → returns id.
T5 Frame: POST /containers/{id}/start
coold starts container.
T6 Central polls GET /containers/{id} or subscribes to events.
Wait for healthy; abort + rollback on timeout.
T7 Frame: POST /services/register
coold writes Corrosion row. Gossip distributes; DNS now answers new IP.
T8 Frame: POST /firewall/allow (on dst host — coold = sole kernel writer)
{src: proxy-ip, dst: 10.210.H.42, proto: "tcp", port: 80}
T9 Central ingress controller regenerates proxy config (Caddy/Traefik/nginx)
→ upstreams point to new container IP.
Frame: POST /containers/{proxy-id}/exec (reload) or proxy-specific reload.
T10 Cutover complete. Central retires the old container:
POST /containers/{old-id}/stop {timeout: 10}
DELETE /containers/{old-id}
DELETE /services/web/endpoints/{old-container-id}
DELETE /firewall/allow/{old-rule-id}
```
Every T-frame is one of the narrow primitives in §2. coold never runs compose, never builds, never picks hosts, never reads app config. If a future verb is needed, it gets added to §2 and the coold release, not smuggled through a passthrough.
**coold non-goals for deploy**: no compose parser, no buildpacks, no Dockerfile handler, no Nixpacks, no scheduler, no ingress templating, no rollback orchestration, no secrets store.
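The "one primitive per frame" rule can be sketched from the central side as an ordered list of frames sent one at a time. This is an illustrative sketch only: the `Frame` type, `deployFrames`, and `runFrames` are hypothetical names, the paths are copied from the T2-T8 steps above with `{id}` left as a placeholder, and rollback is left to the caller because coold does no rollback orchestration.

```go
package main

import (
	"errors"
	"fmt"
)

// Frame is a hypothetical representation of one primitive sent to coold.
type Frame struct {
	Step   string
	Method string
	Path   string
}

// errStartFailed is a stand-in failure used by the demo below.
var errStartFailed = errors.New("start failed")

// deployFrames lists the T2-T8 primitives in the order the flow gives them.
// T6 is a central-side poll, not a frame, so it is not listed here.
func deployFrames() []Frame {
	return []Frame{
		{"T2", "POST", "/images/pull"},
		{"T3", "POST", "/volumes"},
		{"T4", "POST", "/containers"},
		{"T5", "POST", "/containers/{id}/start"},
		{"T7", "POST", "/services/register"},
		{"T8", "POST", "/firewall/allow"},
	}
}

// runFrames sends frames in order and stops at the first failure. It returns
// how many frames succeeded so the caller (central) can decide what to undo.
func runFrames(frames []Frame, send func(Frame) error) (int, error) {
	for i, f := range frames {
		if err := send(f); err != nil {
			return i, fmt.Errorf("%s %s %s: %w", f.Step, f.Method, f.Path, err)
		}
	}
	return len(frames), nil
}

func main() {
	// Simulate a host where the container fails to start at T5.
	done, err := runFrames(deployFrames(), func(f Frame) error {
		if f.Step == "T5" {
			return errStartFailed
		}
		return nil
	})
	fmt.Println("frames completed:", done, "error:", err)
}
```

Keeping the sequencing in a central-side loop like this is what lets coold stay a plain executor: it never needs to know which frame comes next.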
### 8. Storage & volumes
- Local podman volumes per host (`/var/lib/containers/storage/volumes`).
- Cross-host: distributed FS (out of scope) OR pin stateful services to a host (anti-affinity rules in scheduler).
- Backup: `podman volume export` + scp to backup target. Coolify orchestrates schedule.
- **v5 alpha decision**: stateful services **pin to host**. Cross-host volume movement / distributed FS is post-alpha.
### 9. Scheduling
**Placement lives in central.** coold provides facts (`GET /host/info`, `/host/stats`, `/host/containers`); central consumes them, picks the target host, and sends the resulting primitives. coold has no placement logic.
When a user creates an app, central decides which host runs it:
- Round-robin / least-loaded / explicit pin.
- Pinned services (DB, persistent volumes) tracked in central DB.
- Re-schedule on host failure (wg0 down, last-handshake stale).
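As a rough illustration of the placement policies listed above, the following sketch picks a host either by explicit pin or by lowest container count. The `Host` type and `pickHost` function are hypothetical; the real scheduler may weigh CPU, memory, or other facts from `/host/stats`.

```go
package main

import (
	"errors"
	"fmt"
)

// Host is a hypothetical summary of the facts central gathers per host.
type Host struct {
	ID         string
	Containers int
	Healthy    bool
}

// pickHost returns the pinned host when pin is set (and the host is healthy),
// otherwise the healthy host with the fewest containers. Ties go to the host
// that appears first in the slice.
func pickHost(hosts []Host, pin string) (string, error) {
	best := -1
	for i, h := range hosts {
		if !h.Healthy {
			continue
		}
		if pin != "" {
			if h.ID == pin {
				return h.ID, nil
			}
			continue
		}
		if best == -1 || h.Containers < hosts[best].Containers {
			best = i
		}
	}
	if best == -1 {
		if pin != "" {
			return "", fmt.Errorf("pinned host %q is missing or unhealthy", pin)
		}
		return "", errors.New("no healthy host available")
	}
	return hosts[best].ID, nil
}

func main() {
	hosts := []Host{
		{ID: "a", Containers: 5, Healthy: true},
		{ID: "b", Containers: 2, Healthy: false},
		{ID: "c", Containers: 3, Healthy: true},
	}
	fmt.Println(pickHost(hosts, ""))
	fmt.Println(pickHost(hosts, "a"))
	fmt.Println(pickHost(hosts, "b"))
}
```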
Failure detection: central polls `wg show wg0 latest-handshakes` via `GET /host/info` on every host, parses seconds-since-handshake; alerts if > N seconds.
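A minimal sketch of the handshake check, assuming the text central receives is the raw `wg show wg0 latest-handshakes` output (one `<public-key> <unix-timestamp>` pair per line, with `0` meaning the peer has never completed a handshake). The function name `stalePeers` and the plain-integer time arguments are illustrative choices, not part of coold.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// stalePeers returns the public keys whose last handshake is older than
// maxAgeSeconds relative to nowUnix, or that never completed a handshake.
// Lines that do not look like "<key> <timestamp>" are skipped.
func stalePeers(output string, nowUnix, maxAgeSeconds int64) []string {
	var stale []string
	for _, line := range strings.Split(strings.TrimSpace(output), "\n") {
		fields := strings.Fields(line)
		if len(fields) != 2 {
			continue
		}
		ts, err := strconv.ParseInt(fields[1], 10, 64)
		if err != nil {
			continue
		}
		if ts == 0 || nowUnix-ts > maxAgeSeconds {
			stale = append(stale, fields[0])
		}
	}
	return stale
}

func main() {
	out := "peerA\t1000\npeerB\t700\npeerC\t0\n"
	// With "now" at 1100 and a 180 s threshold, peerB (400 s old) and
	// peerC (never seen) are reported.
	fmt.Println(stalePeers(out, 1100, 180))
}
```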
### 10. Observability
coold exposes read-only `/host/*` endpoints surfacing the facts below. Central (or a central-side scraper) pulls from each host and feeds Prometheus / VictoriaMetrics. coold does **not** push metrics.
Per-host metrics (over wg0 via coold endpoints):
- `GET /host/info` → podman info (version, storage driver, free space), kernel, wg state, load.
- `GET /host/containers` → `podman ps -a --format json` state.
- `GET /host/stats` → `podman stats --no-stream --format json` CPU/mem per container.
- Wg handshake + transfer bytes via `GET /host/info` (`wg show wg0 dump` internally).
- `iptables -nvL COOLIFY-ALLOW` match counters (for audit) exposed through `GET /firewall/allow` with counters.
Stream into central time-series store (Prometheus / VictoriaMetrics).
### 11. Updates
- Coolify runtime image self-updates (container restart with new image).
- WireGuard / Podman package updates: `coolify init bootstrap` re-runs idempotently and picks up newer packages from apt. Agent (coold/corrosion/scheduler/builder) bumps go through `coolify init upgrade --coold-version vX.Y.Z` etc. Schedule periodic re-apply (weekly?).
- Mesh config changes (new host, removed host) trigger re-apply on all hosts; control plane orchestrates.
### 12. Security posture
- **Private keys never leave hosts**: WG private key generated on remote, never transits SSH (already done by bootstrap).
- **Podman socket access**: `/run/podman/podman.sock` stays as a rootful Unix socket on each host — **NEVER exposed on TCP**. Only **coold** (per-host agent, see §2) has access via bind-mount. coold surfaces a curated REST API over wg0 with TLS + bearer auth. This means:
- Compromise of a non-coold container does NOT grant podman API access.
- coold enforces bearer-token authn and can deny dangerous flags (e.g. `--privileged`) at the API surface. RBAC, per-user/tenant scoping, and business audit live **only** in central Coolify (see §3 split).
- No `podman system service tcp://...` listener; no need for socket-level TLS.
- Central Coolify only knows the coold endpoint, not the underlying socket.
- **SSH access**: bootstrap uses key-based SSH. Control plane should rotate SSH keys per agent install, store in encrypted DB. After bootstrap, day-to-day ops go via coold REST — SSH is for re-bootstrap only.
- **Host firewall (iptables INPUT chain)**: bootstrap doesn't lock down INPUT. v5 should drop public access to ports other than `:51820/udp` (WG), `:22/tcp` (SSH), `:80/:443` (ingress). coold's `:8443` binds to the wg0 IP only, so it's already not on the public interface.
- **coold port reachability**: central never dials in — coold's outbound stream is the control path — so no `COOLIFY-ALLOW` rule for central is needed. coold's local REST on wg0 mgmt IP (`:8443`) is reachable only from inside the mesh, and is used by (a) the `coolify firewall` CLI via SSH-bounce, (b) other coolds in the same mesh, (c) an optional per-customer gateway. Nothing on the public internet reaches coold. Outbound TLS :443 to central must be permitted by the customer's egress firewall — standard for any SaaS agent.
- **Audit**: central Coolify is the sole authoritative audit log — who-when-why metadata for every COOLIFY-ALLOW change. coold writes only an ops/debug request log (request id, endpoint, status, duration) for troubleshooting; it never sees the identity of the human caller, only the bearer token used to reach it.
### 13. Failure modes & recovery
| Failure | Detection | Recovery |
|---|---|---|
| Host SSH unreachable | bootstrap apply error | Manual investigation; node marked unhealthy in DB |
| WG peer offline (`latest_handshake > 180s`) | `wg show` poll | Mark unhealthy; re-schedule containers if pinning permits |
| Podman socket unreachable | API call timeout | Restart `podman.socket`; if persistent, re-bootstrap |
| Firewall service failed | `systemctl is-active != active` | Re-run `coolify init bootstrap`; service is idempotent |
| Container OOM/crash | `podman events` watcher | Restart per restart policy; alert after N crashes |
| Container subnet exhausted | allocator returns error | Alert; offer apply with bigger `--container-prefix` |
| Mgmt IP exhausted | allocator returns error | Alert; rare for /16 |
| `coolify-mesh` bridge missing | `podman network exists` probe exits non-zero | Re-run apply |
| User manually deletes COOLIFY-ALLOW chain | runtime check | Re-run apply (recreates chain via service restart) |
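The "Container subnet exhausted" row is easier to picture with a small allocator sketch. The code below carves the n-th `/<container-prefix>` block out of a pool such as `10.210.0.0/16` and returns an error once the pool runs out. It is an assumption-laden illustration: `nthSubnet` is a hypothetical name, only IPv4 is handled, and the real allocator may order or reserve subnets differently.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net/netip"
)

// nthSubnet returns the n-th /prefix block inside pool (IPv4 only), or an
// error when n is beyond the number of blocks the pool can hold.
func nthSubnet(pool string, prefix, n int) (string, error) {
	p, err := netip.ParsePrefix(pool)
	if err != nil {
		return "", fmt.Errorf("parse pool: %w", err)
	}
	if !p.Addr().Is4() || prefix < p.Bits() || prefix > 30 {
		return "", fmt.Errorf("unsupported pool %s with prefix /%d", p, prefix)
	}
	count := 1 << (prefix - p.Bits())
	if n < 0 || n >= count {
		return "", fmt.Errorf("container pool %s exhausted: subnet %d requested, %d available at /%d", p, n, count, prefix)
	}
	base := p.Masked().Addr().As4()
	v := binary.BigEndian.Uint32(base[:]) + uint32(n)<<(32-prefix)
	var out [4]byte
	binary.BigEndian.PutUint32(out[:], v)
	return netip.PrefixFrom(netip.AddrFrom4(out), prefix).String(), nil
}

func main() {
	fmt.Println(nthSubnet("10.210.0.0/16", 24, 3))
	fmt.Println(nthSubnet("10.210.0.0/16", 24, 256))
}
```

With the default `/16` pool and `/24` prefix there are 256 blocks in total, shared across every (namespace, host) pair, which is why a longer `--container-prefix` (smaller subnets) is the suggested recovery.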
### 14. Multi-tenancy (deferred)
If Coolify ever supports tenant isolation:
- Tenant = own podman network namespace per host.
- Allow rules are always scoped within a tenant; cross-tenant traffic requires an explicit allow.
- Pool subdivided per tenant. Allocator extension.
Not planned for v1 or the initial v5 release.
---
## Out of scope (now and likely v5)
- Rootless containers (would need user namespace mapping, separate sockets per user).
- IPv6 mesh (`fdcc::` style, ip6tables mirror).
- Hardware-level isolation (SELinux profiles, AppArmor).
- Live migration (qemu/criu).
- Distributed storage (Ceph/Longhorn).
- macvlan / SR-IOV networking.
- Autoscaling.
- BGP / external network announcements.
---
## Quick reference — operations the agent CLI should expose
(Future `coolify-cli` subcommands beyond `init`)
```
coolify deploy <app> # build + push + run
coolify scale <app> --replicas N
coolify firewall containers --servers A,B ... # discover mesh containers (SSH+podman)
coolify firewall list --servers A,B ...                      # list allow rules across hosts (coold GET /firewall/allow, SSH-bounced)
coolify firewall allow --from <ref> --to <ref> --port N      # add allow rule (coold POST /firewall/allow, SSH-bounced)
coolify firewall revoke --from <ref> --to <ref> --port N     # remove allow rule (coold DELETE /firewall/allow/{id})
coolify host list # show mesh state, last-handshake, container count
coolify host add <ip> --ssh-key K
coolify host remove <ip>
coolify logs <container>
coolify exec <container> -- sh
```
`coolify firewall` is implemented today as a thin SSH-bounced REST client of coold (§3 above). The laptop running the CLI isn't a mesh peer, so every call SSHes into the target host and runs `curl "http://<wg0-mgmt-ip>:8443/api/v1/firewall/..."` against coold locally. Per-host bearer tokens are fetched from `/etc/coolify/api-token` on demand (with `--coold-token` as an override for homogeneous test clusters).
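The SSH-bounce can be sketched as a function that builds the remote command string to run over SSH. Everything here is illustrative: `bounceCommand` and `shellQuote` are hypothetical helpers, the `Authorization: Bearer` header format and reading the token on the remote side with `cat` are assumptions, and the path suffix after `/api/v1/firewall` is passed in by the caller because the text above elides it.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// shellQuote wraps s in single quotes for a POSIX shell, escaping any
// embedded single quote.
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

// bounceCommand builds the command executed on the target host. The token is
// read on the remote side so it never appears in the local process list.
func bounceCommand(method, mgmtIP, path string) string {
	url := "http://" + net.JoinHostPort(mgmtIP, "8443") + "/api/v1/firewall" + path
	return fmt.Sprintf(
		`curl -fsS -X %s -H "Authorization: Bearer $(cat /etc/coolify/api-token)" %s`,
		shellQuote(method), shellQuote(url))
}

func main() {
	// "/allow" is an assumed suffix, chosen to match the /firewall/allow
	// primitive named elsewhere in this document.
	fmt.Println(bounceCommand("GET", "100.64.0.2", "/allow"))
}
```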
Everything else on the roadmap (`coolify deploy`, `coolify scale`, `coolify logs`, `coolify exec`) targets the **central** API (SaaS or self-hosted central), not coold directly. Central compiles the request into the primitive-op sequence in §7 and streams it to coold. Only `coolify firewall` currently bypasses central and hits coold directly — legacy + test harness until central wires up `/firewall/*` itself.
---
## Summary
`coolify init bootstrap` does the **first-time host install**: WG mesh, podman runtime, bridge network, default-deny scaffold, coold/corrosion/scheduler/builder agents. `coolify init extend` adds hosts to an existing mesh without disturbing converged ones; `coolify init upgrade` bumps agent versions across the fleet. After that, **everything dynamic is the v5 control plane's job**: container lifecycle, allow rules in COOLIFY-ALLOW (via systemd dropins for persistence), scheduling, observability, ingress, updates.
The pieces communicate via:
1. **SSH** for host provisioning + re-converge (idempotent `coolify init bootstrap` / `extend` / `upgrade` re-runs). SSH is the installer channel only, not a steady-state control path.
2. **coold → central outbound stream** (WSS / gRPC bidi on :443) for day-to-day runtime ops from central. One topology for self-hosted and cloud SaaS; central never dials coold, never joins any mesh. Per-customer gateway (optional) collapses N streams into 1 per mesh.
3. **coold local REST API** on wg0 mgmt IP (`http://100.64.0.X:8443`) for intra-mesh callers: the `coolify firewall` CLI via SSH-bounce, other coolds, the per-customer gateway. Never reachable from the public internet.
coold is the *only* process with access to the local podman socket AND the sole writer of allow rules in COOLIFY-ALLOW. Both transports hit the same API surface.
Persistence model:
- Bootstrap state (chains, jumps, conntrack accept) → idempotent `coolify init bootstrap` re-runs (and `extend` when a namespace is added).
- Rule metadata (who/when/why, audit, RBAC, tenant scoping) → central Coolify DB only. coold does not duplicate this.
- Kernel rules → programmed by coold on every API call (from either central Coolify or the `coolify firewall` CLI); mirrored to `/etc/coolify/allow.rules` for reboot via `coolify-mesh-allow.service` (oneshot `iptables-restore --noflush`).
- Today the `coolify firewall` CLI is the primary caller of coold (SSH-bounced REST client with per-host `/etc/coolify/api-token` resolution). Central Coolify will call the same API once wired.
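To make the reboot-persistence bullet concrete, here is a sketch of rendering allow rules into a fragment that `iptables-restore --noflush` can load. The on-disk layout of `/etc/coolify/allow.rules` is not specified in this document, so the format below (a `*filter` section with `-A COOLIFY-ALLOW` lines) is an assumption, as are the `AllowRule` type and `renderRules` function.

```go
package main

import (
	"fmt"
	"strings"
)

// AllowRule is a hypothetical in-memory form of one allow rule.
type AllowRule struct {
	Src   string
	Dst   string
	Proto string // "tcp", "udp", or "" for any protocol
	Port  int
}

// renderRules emits an iptables-restore fragment that appends each rule to
// the existing COOLIFY-ALLOW chain. It assumes the chain was already created
// by bootstrap, so no chain declaration line is written.
func renderRules(rules []AllowRule) string {
	var b strings.Builder
	b.WriteString("*filter\n")
	for _, r := range rules {
		fmt.Fprintf(&b, "-A COOLIFY-ALLOW -s %s -d %s", r.Src, r.Dst)
		if r.Proto != "" {
			fmt.Fprintf(&b, " -p %s --dport %d", r.Proto, r.Port)
		}
		b.WriteString(" -j ACCEPT\n")
	}
	b.WriteString("COMMIT\n")
	return b.String()
}

func main() {
	fmt.Print(renderRules([]AllowRule{
		{Src: "10.210.1.5", Dst: "10.210.2.42", Proto: "tcp", Port: 80},
	}))
}
```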
The podman socket is host-local. There is no TCP podman API. coold is the **authn + privilege boundary** between any caller (central Coolify over the outbound stream, or the `coolify firewall` CLI via SSH-bounced local REST) and the host, AND the kernel-rule applier. Central Coolify owns RBAC, tenant scoping, and the business audit log (who/when/why). coold only verifies a bearer token (per-host static for local REST; per-host JWT for the stream), applies the rule, and keeps an ops/debug request log. `coolify firewall` exercises the local REST surface today; central will exercise the stream surface — same code path end-to-end, different transport.
**coold stays small.** All app-aware logic (compose, Dockerfile, buildpacks, Nixpacks, scheduling, rollback, ingress templating, RBAC, audit) lives in central. coold's wire surface is enumerable (§2); new verbs require a coold release, not a `/podman/raw` passthrough. If coold ever grows a `/apps` or `/compose` endpoint, that is the wrong layer.
+23 -24
@@ -44,27 +44,30 @@ Once you publish the release:
- **Linux**: amd64, arm64
- **macOS (Darwin)**: amd64, arm64
- **Windows**: amd64, arm64
3. Goreleaser injects the version from the tag into the binaries via ldflags (into `internal/version.version`)
3. Goreleaser injects the version from the tag into the binaries
4. Binaries are automatically uploaded to the release
5. A follow-up `update-version` job then:
- Updates the `version` constant in `internal/version/checker.go` to the new tag
- Commits the bump to `v4.x` as `chore: bump version to vX.Y.Z`
- Force-moves the release tag to point at that new commit
6. GoReleaser also publishes a Homebrew formula to the tap at [`coollabsio/homebrew-coolify-cli`](https://github.com/coollabsio/homebrew-coolify-cli) (under `Formula/coolify-cli.rb`), using the `HOMEBREW_TAP_GITHUB_TOKEN` secret
7. The release becomes available at:
5. The release becomes available at:
- GitHub: `https://github.com/coollabsio/coolify-cli/releases/tag/v1.x.x`
- Install script: `curl -fsSL https://cdn.coollabs.io/coolify/install.sh | bash`
- Homebrew: `brew install coollabsio/coolify-cli/coolify-cli`
- `go install`: `go install github.com/coollabsio/coolify-cli/coolify@v1.x.x`
### 3. Verify the Release
After the workflow completes (usually 2-5 minutes), verify without touching your local install:
After the workflow completes (usually 2-5 minutes):
1. Check the release page has all platform binaries (Linux/macOS/Windows × amd64/arm64)
2. Confirm the `update-version` job committed the bump on `v4.x` (look for `chore: bump version to vX.Y.Z`) and that the tag now points at that commit
3. Confirm `internal/version/checker.go` on `v4.x` has the new version
4. Confirm the Homebrew tap has a new `Formula/coolify-cli.rb` commit for this version at https://github.com/coollabsio/homebrew-coolify-cli
1. Check the release page has all platform binaries
2. Test the install script:
```bash
curl -fsSL https://cdn.coollabs.io/coolify/install.sh | bash
coolify version
```
3. Test the auto-update functionality:
```bash
# If you have an older version installed
coolify update
coolify version # Should show the new version
```
4. Verify the version matches your release
## Troubleshooting
@@ -76,10 +79,9 @@ After the workflow completes (usually 2-5 minutes), verify without touching your
- GoReleaser configuration issues
### Version Not Updating
- The version is injected at build time via ldflags into `internal/version.version` — you do **not** need to edit it manually before releasing. The post-release `update-version` job also rewrites `internal/version/checker.go` on `v4.x`.
- If the hardcoded fallback in `internal/version/checker.go` is stale, check that the `update-version` job ran successfully after the release.
- Ensure you committed the version change in `cmd/root.go`
- The tag must start with `v` (e.g., `v1.2.3`, not `1.2.3`)
- Check that the workflow has write permissions (`contents: write` in `release-cli.yml`)
- Check that the workflow has write permissions
### Install Script Not Finding New Version
- Wait a few minutes for GitHub's CDN to update
@@ -92,33 +94,30 @@ Before creating a release:
- [ ] All tests pass: `go test ./internal/...`
- [ ] Code is formatted: `go fmt ./...`
- [ ] Version updated in `cmd/root.go`
- [ ] Changes merged to `v4.x` branch
- [ ] Release notes prepared
> Note: You do **not** need to bump the version manually. GoReleaser injects the tag version via ldflags, and the `update-version` CI job commits the bump to `internal/version/checker.go` after the release.
After creating a release:
- [ ] GitHub Actions workflow completed successfully (both `release-cli` and `update-version` jobs)
- [ ] GitHub Actions workflow completed successfully
- [ ] All platform binaries are present on the release page
- [ ] `internal/version/checker.go` on `v4.x` shows the new version
- [ ] Homebrew tap has a fresh `Formula/coolify-cli.rb` commit
- [ ] Install script downloads the new version
- [ ] `coolify version` returns the correct version
## Configuration Files
The release process uses these configuration files:
- `.goreleaser.yml` - GoReleaser configuration (build matrix, archives, Homebrew tap) - entry point is `./coolify/main.go`
- `.goreleaser.yml` - GoReleaser configuration (build matrix, archives, etc.) - points to `/coolify` as entry point
- `.github/workflows/release-cli.yml` - GitHub Actions workflow
- `scripts/install.sh` - User-facing install script
- `internal/version/checker.go` - Contains `GetVersion()` function that returns the current version
- `coolify/main.go` - Binary entry point for `go install` support
- [`coollabsio/homebrew-coolify-cli`](https://github.com/coollabsio/homebrew-coolify-cli) - External Homebrew tap updated automatically on each release
## Notes
- The CLI has auto-update checking built-in (checks every 10 minutes)
- Users can manually update with `coolify update`
- Install script supports version pinning: `bash install.sh v1.2.3`
- Homebrew users can install via `brew install coollabsio/coolify-cli/coolify-cli` (the tap at https://github.com/coollabsio/homebrew-coolify-cli is auto-updated by GoReleaser)
- Releases are immutable - if you need to fix something, create a new patch version
+1 -1
@@ -7,7 +7,7 @@
#### Linux/macOS
```bash
curl -fsSL https://gitamin.ir/IranAccess/coolify-cli/raw/branch/v4.x/scripts/install.sh | bash
curl -fsSL https://raw.githubusercontent.com/coollabsio/coolify-cli/main/scripts/install.sh | bash
```
It will install the CLI in `/usr/local/bin/coolify` and the configuration file in `~/.config/coolify/config.json`
-11
@@ -5,7 +5,6 @@ import (
"github.com/coollabsio/coolify-cli/cmd/application/create"
"github.com/coollabsio/coolify-cli/cmd/application/env"
"github.com/coollabsio/coolify-cli/cmd/application/previews"
"github.com/coollabsio/coolify-cli/cmd/application/storage"
)
@@ -58,15 +57,5 @@ func NewAppCommand() *cobra.Command {
storageCmd.AddCommand(storage.NewDeleteCommand())
cmd.AddCommand(storageCmd)
// Add previews subcommand with its children
previewsCmd := &cobra.Command{
Use: "previews",
Aliases: []string{"preview"},
Short: "Manage application preview deployments",
Long: `Manage preview deployments created from pull requests. Requires the application UUID.`,
}
previewsCmd.AddCommand(previews.NewDeletePreviewCommand())
cmd.AddCommand(previewsCmd)
return cmd
}
-72
@@ -1,72 +0,0 @@
package previews
import (
"fmt"
"strconv"
"github.com/spf13/cobra"
"github.com/coollabsio/coolify-cli/internal/cli"
"github.com/coollabsio/coolify-cli/internal/service"
)
func NewDeletePreviewCommand() *cobra.Command {
deletePreviewCmd := &cobra.Command{
Use: "delete <app_uuid> <pr_id>",
Short: "Delete a preview deployment",
Long: `Delete a preview deployment for an application. First argument is the application UUID, second is the pull request ID.`,
Args: cli.ExactArgs(2, "<app_uuid> <pr_id>"),
RunE: func(cmd *cobra.Command, args []string) error {
ctx := cmd.Context()
appUUID := args[0]
prID := args[1]
prIDInt, err := strconv.Atoi(prID)
if err != nil {
return fmt.Errorf("invalid pr_id: must be an integer")
}
if prIDInt <= 0 {
return fmt.Errorf("invalid pr_id: must be a positive integer")
}
client, err := cli.GetAPIClient(cmd)
if err != nil {
return fmt.Errorf("failed to get API client: %w", err)
}
if err := cli.CheckMinimumVersion(ctx, client, "4.0.0-beta.474"); err != nil {
return err
}
force, _ := cmd.Flags().GetBool("force")
// Prompt for confirmation unless --force is used
if !force {
var response string
fmt.Printf("Are you sure you want to delete the preview deployment for PR %s? (yes/no): ", prID)
_, err := fmt.Scanln(&response)
if err != nil {
return fmt.Errorf("failed to read confirmation: %w", err)
}
if response != "yes" && response != "y" {
fmt.Println("Delete cancelled.")
return nil
}
}
appSvc := service.NewApplicationService(client)
err = appSvc.DeletePreview(ctx, appUUID, prID)
if err != nil {
return fmt.Errorf("failed to delete preview deployment: %w", err)
}
fmt.Printf("Preview deployment for PR %s deleted successfully.\n", prID)
return nil
},
}
deletePreviewCmd.Flags().Bool("force", false, "Skip confirmation prompt")
return deletePreviewCmd
}
-105
@@ -1,105 +0,0 @@
// Package common holds flag structs and helpers shared between the
// `coolify init` and `coolify firewall` command trees. Kept intentionally
// small: only cross-command plumbing (SSH mesh flags, namespace validation)
// lives here.
//
//nolint:revive // "common" is the conventional sharing point for these cobra subtrees
package common
import (
"fmt"
"regexp"
"github.com/spf13/cobra"
)
// DefaultNamespace is the namespace used when the user does not pass
// --namespaces. It is also always present (implicitly) so existing workflows
// and coold defaults keep working.
const DefaultNamespace = "default"
// PodmanNetworkFor returns the podman bridge network name that backs
// namespace ns on every host. Derived as `coolify-<ns>-mesh` so the
// namespace name is visible in `podman network ls`.
func PodmanNetworkFor(ns string) string {
return "coolify-" + ns + "-mesh"
}
// MeshNetFlags holds the flag set shared between `coolify init` (which creates
// per-namespace podman networks on every host) and `coolify firewall` (which
// talks to coold about per-namespace rules).
//
// `init` binds it as a slice so a single command sets up the entire cluster;
// `firewall` binds it as a single value since each allow/revoke/list call
// operates on one namespace at a time.
type MeshNetFlags struct {
// Namespaces enumerates every namespace the mesh should carry. At least
// one entry is required; the first element is the implicit "default"
// unless the user overrides it.
Namespaces []string
// ContainerPool is the shared address pool every namespace carves its
// per-host /<ContainerPrefix> from. One pool covers all namespaces;
// subnets never overlap.
ContainerPool string
// ContainerPrefix is the prefix length of each per-host, per-namespace
// container subnet (default 24 → 254 container IPs per host per ns).
ContainerPrefix int
}
// BindMeshNetMultiFlags registers --namespaces/--container-pool/--container-prefix
// on cmd (init-style: many namespaces per invocation).
func BindMeshNetMultiFlags(cmd *cobra.Command, f *MeshNetFlags) {
pf := cmd.PersistentFlags()
pf.StringSliceVar(&f.Namespaces, "namespaces", []string{DefaultNamespace},
"Comma-separated list of namespaces to create on each host. Each "+
"namespace is a separate Podman bridge network (coolify-<ns>-mesh) "+
"with its own /<container-prefix> per host")
pf.StringVar(&f.ContainerPool, "container-pool", "10.210.0.0/16",
"Shared container address pool — each (namespace, host) pair gets a "+
"/<container-prefix> from here, owned by that namespace's Podman bridge")
pf.IntVar(&f.ContainerPrefix, "container-prefix", 24,
"Prefix length of each per-host, per-namespace container subnet")
}
// BindMeshNetSingleFlags registers --namespace on cmd (firewall-style: one
// namespace per invocation).
func BindMeshNetSingleFlags(cmd *cobra.Command, ns *string) {
pf := cmd.PersistentFlags()
pf.StringVar(ns, "namespace", DefaultNamespace,
"Namespace the command operates against (must match a namespace created by `coolify init`)")
}
// namespaceRegex matches a valid DNS label (namespace names appear in the
// podman network name, in iptables chain names, and — post-coold-changes —
// as DNS labels like web.<ns>.coolify.internal).
var namespaceRegex = regexp.MustCompile(`^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$`)
// ValidateNamespaces checks that every namespace is a valid DNS label and
// that the list has no duplicates.
func (f *MeshNetFlags) ValidateNamespaces() error {
if len(f.Namespaces) == 0 {
return fmt.Errorf("--namespaces must list at least one namespace")
}
seen := make(map[string]struct{}, len(f.Namespaces))
for _, ns := range f.Namespaces {
if !namespaceRegex.MatchString(ns) {
return fmt.Errorf("invalid namespace %q (must be a DNS label: lowercase alphanumerics + '-', 1-63 chars)", ns)
}
if _, dup := seen[ns]; dup {
return fmt.Errorf("duplicate namespace %q in --namespaces", ns)
}
seen[ns] = struct{}{}
}
return nil
}
// ValidateNamespace validates a single namespace value (used by the firewall
// command's --namespace flag).
func ValidateNamespace(ns string) error {
if !namespaceRegex.MatchString(ns) {
return fmt.Errorf("invalid --namespace %q (must be a DNS label: lowercase alphanumerics + '-', 1-63 chars)", ns)
}
return nil
}
-95
@@ -1,95 +0,0 @@
// Package common hosts flag sets and helpers shared between multiple
// top-level commands that SSH into a list of servers (init, firewall, ...).
package common
import (
"fmt"
"os"
"time"
"github.com/spf13/cobra"
"golang.org/x/term"
internalssh "github.com/coollabsio/coolify-cli/internal/ssh"
)
// SSHMeshFlags holds the flags shared by every command that fans out over
// a list of SSH-reachable servers (coolify init, coolify firewall, ...).
type SSHMeshFlags struct {
Servers []string
SSHKey string
SSHUser string
SSHPort int
SSHPassphrasePrompt bool
Concurrency int
SSHTimeout string
}
// BindSSHMeshFlags registers the shared flags as PersistentFlags on cmd.
func BindSSHMeshFlags(cmd *cobra.Command, f *SSHMeshFlags) {
pf := cmd.PersistentFlags()
pf.StringSliceVar(&f.Servers, "servers", nil,
"Comma-separated server IPs (required)")
pf.StringVar(&f.SSHKey, "ssh-key", "",
"Path to SSH private key used to connect to servers (required)")
pf.StringVar(&f.SSHUser, "ssh-user", "root",
"SSH username")
pf.IntVar(&f.SSHPort, "ssh-port", 22,
"SSH port")
pf.BoolVar(&f.SSHPassphrasePrompt, "ssh-passphrase-prompt", false,
"Prompt for SSH key passphrase (also reads COOLIFY_SSH_PASSPHRASE env var)")
pf.IntVar(&f.Concurrency, "concurrency", 10,
"Maximum number of parallel SSH connections")
pf.StringVar(&f.SSHTimeout, "ssh-timeout", "30s",
"SSH connection timeout (e.g. 30s, 1m)")
}
// ParseSSHTimeout parses SSHTimeout, falling back to 30s on error/zero.
func (f *SSHMeshFlags) ParseSSHTimeout() time.Duration {
d, err := time.ParseDuration(f.SSHTimeout)
if err != nil || d <= 0 {
return 30 * time.Second
}
return d
}
// ResolvePassphrase returns the SSH key passphrase in this priority order:
// 1. COOLIFY_SSH_PASSPHRASE env var
// 2. Interactive prompt when --ssh-passphrase-prompt is set
// 3. nil (no passphrase)
func (f *SSHMeshFlags) ResolvePassphrase() ([]byte, error) {
if env := os.Getenv("COOLIFY_SSH_PASSPHRASE"); env != "" {
return []byte(env), nil
}
if f.SSHPassphrasePrompt {
fmt.Fprint(os.Stderr, "SSH key passphrase: ")
pass, err := term.ReadPassword(int(os.Stdin.Fd()))
fmt.Fprintln(os.Stderr)
if err != nil {
return nil, fmt.Errorf("read passphrase: %w", err)
}
return pass, nil
}
return nil, nil
}
// BuildSSHClient creates an SSH client, resolving any key passphrase first.
func (f *SSHMeshFlags) BuildSSHClient() (*internalssh.Client, error) {
passphrase, err := f.ResolvePassphrase()
if err != nil {
return nil, err
}
return internalssh.NewClient(f.SSHKey, passphrase, f.ParseSSHTimeout())
}
// Validate checks that the required flags are set.
func (f *SSHMeshFlags) Validate() error {
if len(f.Servers) == 0 {
return fmt.Errorf("--servers is required")
}
if f.SSHKey == "" {
return fmt.Errorf("--ssh-key is required")
}
return nil
}
-57
@@ -1,57 +0,0 @@
package common
import (
"testing"
"time"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestSSHMeshFlags_ParseSSHTimeout(t *testing.T) {
tests := []struct {
input string
want time.Duration
}{
{"30s", 30 * time.Second},
{"1m", time.Minute},
{"invalid", 30 * time.Second},
{"0s", 30 * time.Second},
{"", 30 * time.Second},
}
for _, tt := range tests {
f := &SSHMeshFlags{SSHTimeout: tt.input}
assert.Equal(t, tt.want, f.ParseSSHTimeout(), "input: %q", tt.input)
}
}
func TestSSHMeshFlags_Validate(t *testing.T) {
t.Run("missing servers", func(t *testing.T) {
err := (&SSHMeshFlags{SSHKey: "/k"}).Validate()
require.Error(t, err)
assert.Contains(t, err.Error(), "--servers")
})
t.Run("missing ssh key", func(t *testing.T) {
err := (&SSHMeshFlags{Servers: []string{"1.1.1.1"}}).Validate()
require.Error(t, err)
assert.Contains(t, err.Error(), "--ssh-key")
})
t.Run("valid", func(t *testing.T) {
err := (&SSHMeshFlags{Servers: []string{"1.1.1.1"}, SSHKey: "/k"}).Validate()
require.NoError(t, err)
})
}
func TestSSHMeshFlags_ResolvePassphrase_Env(t *testing.T) {
t.Setenv("COOLIFY_SSH_PASSPHRASE", "hunter2")
pass, err := (&SSHMeshFlags{}).ResolvePassphrase()
require.NoError(t, err)
assert.Equal(t, []byte("hunter2"), pass)
}
func TestSSHMeshFlags_ResolvePassphrase_NoPrompt(t *testing.T) {
t.Setenv("COOLIFY_SSH_PASSPHRASE", "")
pass, err := (&SSHMeshFlags{SSHPassphrasePrompt: false}).ResolvePassphrase()
require.NoError(t, err)
assert.Nil(t, pass)
}
+4
@@ -10,6 +10,10 @@ import (
func TestNewConfigCommand(t *testing.T) {
cmd := NewConfigCommand()
if cmd == nil {
t.Fatal("NewConfigCommand() returned nil")
}
if cmd.Use != "config" {
t.Errorf("Expected Use to be 'config', got '%s'", cmd.Use)
}
-255
@@ -1,255 +0,0 @@
package firewall
import (
"context"
"fmt"
"net"
"os"
"strings"
"github.com/spf13/cobra"
"github.com/coollabsio/coolify-cli/cmd/common"
ifw "github.com/coollabsio/coolify-cli/internal/firewall"
"github.com/coollabsio/coolify-cli/internal/models"
"github.com/coollabsio/coolify-cli/internal/output"
"github.com/coollabsio/coolify-cli/internal/ssh"
)
// allowRevokeFlags are the per-subcommand flags for `allow` / `revoke`.
type allowRevokeFlags struct {
From string
To string
Port int
Proto string
Bidirectional bool
}
// newAllowCommand builds `coolify firewall allow`.
func newAllowCommand(parent *Flags) *cobra.Command {
local := &allowRevokeFlags{}
cmd := &cobra.Command{
Use: "allow",
Short: "Add an allow rule (from container → to container:port)",
RunE: func(cmd *cobra.Command, _ []string) error {
return runAllowRevoke(cmd.Context(), cmd, parent, local, false)
},
}
bindAllowRevokeFlags(cmd, local)
return cmd
}
// newRevokeCommand builds `coolify firewall revoke`.
func newRevokeCommand(parent *Flags) *cobra.Command {
local := &allowRevokeFlags{}
cmd := &cobra.Command{
Use: "revoke",
Short: "Remove an allow rule",
RunE: func(cmd *cobra.Command, _ []string) error {
return runAllowRevoke(cmd.Context(), cmd, parent, local, true)
},
}
bindAllowRevokeFlags(cmd, local)
return cmd
}
func bindAllowRevokeFlags(cmd *cobra.Command, f *allowRevokeFlags) {
pf := cmd.Flags()
pf.StringVar(&f.From, "from", "",
"Source container (name, short-id, raw IP, or host:name) — required")
pf.StringVar(&f.To, "to", "",
"Destination container (name, short-id, raw IP, or host:name) — required")
pf.IntVar(&f.Port, "port", 0,
"Destination port (required unless --proto is empty)")
pf.StringVar(&f.Proto, "proto", "tcp",
"Protocol (tcp, udp, or empty for any)")
pf.BoolVar(&f.Bidirectional, "bidirectional", false,
"Also install the reverse rule on the source host (default: one-way; conntrack handles replies)")
}
func validateAllowRevokeFlags(f *allowRevokeFlags) error {
	if f.From == "" {
		return fmt.Errorf("--from is required")
	}
	if f.To == "" {
		return fmt.Errorf("--to is required")
	}
	if f.Proto != "" && f.Proto != "tcp" && f.Proto != "udp" {
		return fmt.Errorf("--proto must be tcp, udp, or empty (got %q)", f.Proto)
	}
	// Reject values outside the valid TCP/UDP port range; 0 means "unset".
	if f.Port < 0 || f.Port > 65535 {
		return fmt.Errorf("--port must be between 1 and 65535 (got %d)", f.Port)
	}
	if f.Proto != "" && f.Port == 0 {
		return fmt.Errorf("--port is required when --proto is set")
	}
	return nil
}
func runAllowRevoke(
ctx context.Context,
cmd *cobra.Command,
parent *Flags,
local *allowRevokeFlags,
revoke bool,
) error {
if err := parent.Validate(); err != nil {
return err
}
if err := common.ValidateNamespace(parent.Namespace); err != nil {
return err
}
if err := validateAllowRevokeFlags(local); err != nil {
return err
}
runner, err := parent.BuildSSHClient()
if err != nil {
return fmt.Errorf("SSH client: %w", err)
}
return emitAllowRevoke(ctx, cmd, parent, local, runner, revoke)
}
// emitAllowRevoke is the core path: discover → resolve → build rule → apply.
// Split from the cobra wrapper so tests can inject a fake ssh.Runner.
func emitAllowRevoke(
ctx context.Context,
cmd *cobra.Command,
parent *Flags,
local *allowRevokeFlags,
runner ssh.Runner,
revoke bool,
) error {
all, results := discoverAllViaPkg(ctx, runner, parent)
for _, r := range results {
if r.Err != nil {
fmt.Fprintf(os.Stderr, "Warning: discover %s: %v\n", r.Host, r.Err)
}
}
from, err := resolveEndpoint(local.From, all)
if err != nil {
return fmt.Errorf("--from: %w", err)
}
to, err := resolveEndpoint(local.To, all)
if err != nil {
return fmt.Errorf("--to: %w", err)
}
if from.IP == nil || to.IP == nil {
return fmt.Errorf("failed to resolve endpoint IPs (from=%s to=%s)", local.From, local.To)
}
// Determine destination host (rule owner). If `to` was resolved from a
// raw IP with no container match, try to map it via discovery first.
dstHost := to.Host
if dstHost == "" {
if h, ok := findHostForIP(to.IP, all); ok {
dstHost = h
}
}
if dstHost == "" {
return fmt.Errorf("cannot determine destination host for IP %s — no container on the mesh owns it", to.IP)
}
srcHost := from.Host
if srcHost == "" {
if h, ok := findHostForIP(from.IP, all); ok {
srcHost = h
}
}
ns := parent.Namespace
primary := ifw.AllowRule{
Host: dstHost,
Namespace: ns,
Src: from.IP,
Dst: to.IP,
Proto: local.Proto,
Port: local.Port,
Comment: "cid:" + ifw.ComputeID(ns, from.IP, to.IP, local.Proto, local.Port),
}
rules := []ifw.AllowRule{primary}
if local.Bidirectional {
if srcHost == "" {
return fmt.Errorf("--bidirectional requires the source endpoint to belong to a mesh host")
}
reverse := ifw.AllowRule{
Host: srcHost,
Namespace: ns,
Src: to.IP,
Dst: from.IP,
Proto: local.Proto,
Port: local.Port,
Comment: "cid:" + ifw.ComputeID(ns, to.IP, from.IP, local.Proto, local.Port),
}
rules = append(rules, reverse)
}
action := "allow"
past := "allowed"
if revoke {
action = "revoke"
past = "revoked"
}
tokenFor := tokenResolver(ctx, runner, parent)
for _, r := range rules {
token, terr := tokenFor(r.Host)
if terr != nil {
return fmt.Errorf("%s on %s: %w", action, r.Host, terr)
}
var rerr error
if revoke {
// Revoke by id — coold is idempotent (204 even on unknown id).
id := strings.TrimPrefix(r.Comment, "cid:")
rerr = ifw.CooldRevoke(ctx, runner, r.Host, parent.SSHUser,
parent.SSHPort, parent.CooldPort, parent.WGInterface, token, id)
} else {
rerr = ifw.CooldApply(ctx, runner, r.Host, parent.SSHUser,
parent.SSHPort, parent.CooldPort, parent.WGInterface, token, r)
}
if rerr != nil {
return fmt.Errorf("%s on %s: %w", action, r.Host, rerr)
}
fmt.Fprintf(os.Stderr, "%s on %s: %s → %s %s/%d\n",
past, r.Host, ipOrAny(r.Src), ipOrAny(r.Dst),
protoOrAny(r.Proto), r.Port)
}
rows := make([]models.AllowRuleRow, 0, len(rules))
for _, r := range rules {
rows = append(rows, models.AllowRuleRow{
Host: r.Host,
Namespace: r.Namespace,
ID: r.Comment,
Src: r.Src.String(),
Dst: r.Dst.String(),
Proto: r.Proto,
Port: r.Port,
Comment: r.Comment,
})
}
format, _ := cmd.Root().PersistentFlags().GetString("format")
if format == "" {
format = output.FormatTable
}
formatter, err := output.NewFormatter(format, output.Options{Writer: os.Stdout})
if err != nil {
return err
}
if format == output.FormatJSON || format == output.FormatPretty {
return formatter.Format(models.FirewallAllowOutput{Rules: rows})
}
return formatter.Format(rows)
}
func ipOrAny(ip net.IP) string {
if ip == nil {
return "any"
}
return ip.String()
}
func protoOrAny(p string) string {
if p == "" {
return "any"
}
return p
}
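The rule construction in `emitAllowRevoke` can be read in isolation: the primary rule is owned by the destination host, and `--bidirectional` adds a mirrored rule owned by the source host. The following is a minimal sketch of that pairing only. The `rule` type, `buildRules`, and `mustIP` are simplified stand-ins introduced for illustration, not the `ifw.AllowRule` API, and the sketch omits the namespace and `cid:` comment fields.

```go
package main

import (
	"fmt"
	"net"
)

// rule is a simplified, hypothetical stand-in for the AllowRule used above.
type rule struct {
	Host     string // host whose firewall enforces the rule
	Src, Dst net.IP
	Proto    string
	Port     int
}

// mustIP parses a literal IP and panics on malformed input (sketch helper).
func mustIP(s string) net.IP {
	ip := net.ParseIP(s)
	if ip == nil {
		panic("invalid IP literal: " + s)
	}
	return ip
}

// buildRules returns the primary rule, owned by the destination host, and,
// when bidirectional is set, the mirrored rule owned by the source host.
func buildRules(srcHost, dstHost string, src, dst net.IP, proto string, port int, bidirectional bool) ([]rule, error) {
	rules := []rule{{Host: dstHost, Src: src, Dst: dst, Proto: proto, Port: port}}
	if !bidirectional {
		return rules, nil
	}
	if srcHost == "" {
		return nil, fmt.Errorf("bidirectional rule needs a known source host")
	}
	// Same proto and port, with source and destination swapped.
	rules = append(rules, rule{Host: srcHost, Src: dst, Dst: src, Proto: proto, Port: port})
	return rules, nil
}

func main() {
	rules, err := buildRules("h2", "h1", mustIP("10.210.1.5"), mustIP("10.210.0.10"), "tcp", 80, true)
	if err != nil {
		panic(err)
	}
	for _, r := range rules {
		fmt.Printf("%s: %s -> %s %s/%d\n", r.Host, r.Src, r.Dst, r.Proto, r.Port)
	}
}
```

As in the original, the reverse rule reuses the forward rule's port rather than picking a separate one.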
-247
@@ -1,247 +0,0 @@
package firewall
import (
"context"
"strings"
"testing"
"github.com/spf13/cobra"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"github.com/coollabsio/coolify-cli/cmd/common"
"github.com/coollabsio/coolify-cli/internal/ssh"
)
func TestValidateAllowRevokeFlags(t *testing.T) {
t.Run("missing from", func(t *testing.T) {
err := validateAllowRevokeFlags(&allowRevokeFlags{To: "x", Port: 80, Proto: "tcp"})
require.Error(t, err)
assert.Contains(t, err.Error(), "--from")
})
t.Run("missing to", func(t *testing.T) {
err := validateAllowRevokeFlags(&allowRevokeFlags{From: "x", Port: 80, Proto: "tcp"})
require.Error(t, err)
assert.Contains(t, err.Error(), "--to")
})
t.Run("missing port with proto", func(t *testing.T) {
err := validateAllowRevokeFlags(&allowRevokeFlags{From: "a", To: "b", Proto: "tcp"})
require.Error(t, err)
assert.Contains(t, err.Error(), "--port")
})
t.Run("bad proto", func(t *testing.T) {
err := validateAllowRevokeFlags(&allowRevokeFlags{From: "a", To: "b", Proto: "icmp", Port: 1})
require.Error(t, err)
})
t.Run("ok tcp", func(t *testing.T) {
err := validateAllowRevokeFlags(&allowRevokeFlags{From: "a", To: "b", Proto: "tcp", Port: 80})
require.NoError(t, err)
})
t.Run("ok no-proto no-port", func(t *testing.T) {
err := validateAllowRevokeFlags(&allowRevokeFlags{From: "a", To: "b", Proto: "", Port: 0})
require.NoError(t, err)
})
}
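// cmdFakeRunner matches a Runner call against the substrings in its response
// map and returns the response for a matching key. Go map iteration order is
// unspecified, so keys must not overlap within a single command.
// Mirrors cmd/init/plan_test.go's pattern.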
type cmdFakeRunner struct {
responses map[string]string
calls []string
}
func (f *cmdFakeRunner) Run(_ context.Context, _, _ string, _ int, cmd string) (string, string, error) {
f.calls = append(f.calls, cmd)
for sub, resp := range f.responses {
if strings.Contains(cmd, sub) {
return resp, "", nil
}
}
return "", "", nil
}
var _ ssh.Runner = (*cmdFakeRunner)(nil)
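Because `cmdFakeRunner` ranges over a map, a command that contains two of its keys could receive either response from run to run. If overlapping keys were ever needed, one option is an ordered list of matchers. This is a minimal sketch under that assumption: `orderedFake`, `response`, and `run` are hypothetical names, and the method deliberately does not implement the real `ssh.Runner` signature.

```go
package main

import (
	"fmt"
	"strings"
)

// response pairs a substring matcher with the canned reply it triggers.
type response struct {
	match string
	reply string
}

// orderedFake is a hypothetical fake that checks matchers in slice order,
// so the most specific matcher can be listed first.
type orderedFake struct {
	responses []response
	calls     []string
}

// run records the command and returns the reply of the first matcher whose
// substring occurs in it, or "" when nothing matches.
func (f *orderedFake) run(cmd string) string {
	f.calls = append(f.calls, cmd)
	for _, r := range f.responses {
		if strings.Contains(cmd, r.match) {
			return r.reply
		}
	}
	return ""
}

func main() {
	f := &orderedFake{responses: []response{
		{match: "podman ps --filter", reply: "specific"},
		{match: "podman ps", reply: "generic"},
	}}
	fmt.Println(f.run("podman ps --filter network=coolify-default-mesh"))
	fmt.Println(f.run("podman ps"))
	fmt.Println(len(f.calls))
}
```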
func rootCmdFor(cmd *cobra.Command) {
root := &cobra.Command{Use: "coolify"}
root.PersistentFlags().String("format", "table", "")
root.AddCommand(cmd)
}
// parentWithToken builds a Flags pre-wired for the REST path:
// single test host, coold port 8443, non-empty bearer token.
func parentWithToken() *Flags {
return &Flags{
SSHMeshFlags: common.SSHMeshFlags{
Servers: []string{"h1"}, SSHUser: "root", SSHPort: 22, Concurrency: 1,
},
Namespace: common.DefaultNamespace,
CooldToken: "test-token",
CooldPort: 8443,
WGInterface: "wg0",
}
}
func TestEmitAllowRevoke_PostsOneAllowToCoold(t *testing.T) {
fr := &cmdFakeRunner{responses: map[string]string{
"podman ps": "aaa111111111|web|10.210.0.10",
}}
parent := parentWithToken()
local := &allowRevokeFlags{
From: "10.210.1.5", To: "web", Proto: "tcp", Port: 80,
}
inner := &cobra.Command{Use: "allow"}
rootCmdFor(inner)
err := emitAllowRevoke(context.Background(), inner, parent, local, fr, false)
require.NoError(t, err)
var posts []string
for _, c := range fr.calls {
if strings.Contains(c, "-X POST") && strings.Contains(c, "/api/v1/firewall/allow") {
posts = append(posts, c)
}
}
assert.Len(t, posts, 1)
// Token carried in Authorization header.
assert.Contains(t, posts[0], "Authorization: Bearer test-token")
// JSON body carries namespace + src/dst/port.
assert.Contains(t, posts[0], `"namespace":"default"`)
assert.Contains(t, posts[0], `"src":"10.210.1.5"`)
assert.Contains(t, posts[0], `"dst":"10.210.0.10"`)
assert.Contains(t, posts[0], `"port":80`)
// Discovers mgmt IP via wg0 before curl.
assert.Contains(t, posts[0], "ip -4 -o addr show wg0")
}
// TestEmitAllowRevoke_CarriesNonDefaultNamespace verifies that the user's
// chosen namespace propagates into the JSON body (and therefore into the
// cid hash coold will compute).
func TestEmitAllowRevoke_CarriesNonDefaultNamespace(t *testing.T) {
fr := &cmdFakeRunner{responses: map[string]string{
"podman ps": "aaa111111111|web|10.220.0.10",
}}
parent := parentWithToken()
parent.Namespace = "alpha"
local := &allowRevokeFlags{
From: "10.220.1.5", To: "web", Proto: "tcp", Port: 80,
}
inner := &cobra.Command{Use: "allow"}
rootCmdFor(inner)
err := emitAllowRevoke(context.Background(), inner, parent, local, fr, false)
require.NoError(t, err)
var post string
for _, c := range fr.calls {
if strings.Contains(c, "-X POST") {
post = c
}
}
assert.NotEmpty(t, post)
assert.Contains(t, post, `"namespace":"alpha"`)
// Discovery targets the alpha-namespace bridge, not the default one.
var psCalls []string
for _, c := range fr.calls {
if strings.Contains(c, "podman ps") {
psCalls = append(psCalls, c)
}
}
assert.NotEmpty(t, psCalls)
assert.Contains(t, psCalls[0], "coolify-alpha-mesh")
}
func TestEmitAllowRevoke_Bidirectional(t *testing.T) {
fr := &cmdFakeRunner{responses: map[string]string{
"podman ps": "aaa111111111|web|10.210.0.10\nbbb222222222|client|10.210.1.5",
}}
parent := parentWithToken()
local := &allowRevokeFlags{
From: "10.210.1.5", To: "10.210.0.10", Proto: "tcp", Port: 80, Bidirectional: true,
}
inner := &cobra.Command{Use: "allow"}
rootCmdFor(inner)
err := emitAllowRevoke(context.Background(), inner, parent, local, fr, false)
require.NoError(t, err)
var posts int
for _, c := range fr.calls {
if strings.Contains(c, "-X POST") && strings.Contains(c, "/api/v1/firewall/allow") {
posts++
}
}
assert.Equal(t, 2, posts)
}
func TestEmitAllowRevoke_RevokeIssuesDelete(t *testing.T) {
fr := &cmdFakeRunner{responses: map[string]string{
"podman ps": "aaa111111111|web|10.210.0.10",
}}
parent := parentWithToken()
local := &allowRevokeFlags{
From: "10.210.1.5", To: "web", Proto: "tcp", Port: 80,
}
inner := &cobra.Command{Use: "revoke"}
rootCmdFor(inner)
err := emitAllowRevoke(context.Background(), inner, parent, local, fr, true)
require.NoError(t, err)
var deletes []string
for _, c := range fr.calls {
if strings.Contains(c, "-X DELETE") && strings.Contains(c, "/api/v1/firewall/allow/") {
deletes = append(deletes, c)
}
}
assert.Len(t, deletes, 1)
assert.Contains(t, deletes[0], "Authorization: Bearer test-token")
}
func TestEmitAllowRevoke_FetchesTokenPerHostWhenOverrideAbsent(t *testing.T) {
// No --coold-token override → CLI SSHes `cat /etc/coolify/api-token`
// on the destination host and uses the result as the bearer.
fr := &cmdFakeRunner{responses: map[string]string{
"podman ps": "aaa111111111|web|10.210.0.10",
"/etc/coolify/api-token": "per-host-token\n",
}}
parent := parentWithToken()
parent.CooldToken = ""
t.Setenv("COOLIFY_COOLD_TOKEN", "")
local := &allowRevokeFlags{
From: "10.210.1.5", To: "web", Proto: "tcp", Port: 80,
}
inner := &cobra.Command{Use: "allow"}
rootCmdFor(inner)
err := emitAllowRevoke(context.Background(), inner, parent, local, fr, false)
require.NoError(t, err)
var post string
for _, c := range fr.calls {
if strings.Contains(c, "-X POST") && strings.Contains(c, "/api/v1/firewall/allow") {
post = c
}
}
assert.NotEmpty(t, post)
assert.Contains(t, post, "Authorization: Bearer per-host-token")
}
func TestEmitAllowRevoke_FetchFailurePropagates(t *testing.T) {
// Empty /etc/coolify/api-token on the host → FetchCooldToken errors,
// and the error surfaces to the caller instead of silently proceeding.
fr := &cmdFakeRunner{responses: map[string]string{
"podman ps": "aaa111111111|web|10.210.0.10",
// No token file → empty stdout → "token is empty" error.
}}
parent := parentWithToken()
parent.CooldToken = ""
t.Setenv("COOLIFY_COOLD_TOKEN", "")
local := &allowRevokeFlags{
From: "10.210.1.5", To: "web", Proto: "tcp", Port: 80,
}
inner := &cobra.Command{Use: "allow"}
rootCmdFor(inner)
err := emitAllowRevoke(context.Background(), inner, parent, local, fr, false)
require.Error(t, err)
assert.Contains(t, err.Error(), "coold token")
}
-105
@@ -1,105 +0,0 @@
package firewall
import (
"context"
"fmt"
"os"
"github.com/spf13/cobra"
ifw "github.com/coollabsio/coolify-cli/internal/firewall"
"github.com/coollabsio/coolify-cli/internal/models"
"github.com/coollabsio/coolify-cli/internal/output"
"github.com/coollabsio/coolify-cli/internal/ssh"
)
// newContainersCommand builds `coolify firewall containers`.
func newContainersCommand(flags *Flags) *cobra.Command {
return &cobra.Command{
Use: "containers",
Short: "List containers on the Coolify mesh bridge across all servers",
RunE: func(cmd *cobra.Command, _ []string) error {
return runContainers(cmd.Context(), cmd, flags)
},
}
}
func runContainers(ctx context.Context, cmd *cobra.Command, flags *Flags) error {
if err := flags.Validate(); err != nil {
return err
}
runner, err := flags.BuildSSHClient()
if err != nil {
return fmt.Errorf("SSH client: %w", err)
}
return emitContainers(ctx, cmd, flags, runner)
}
// emitContainers is factored out so tests can pass a fake ssh.Runner.
func emitContainers(
ctx context.Context,
cmd *cobra.Command,
flags *Flags,
runner ssh.Runner,
) error {
var (
all []ifw.Container
results []ssh.ServerResult[[]ifw.Container]
)
if flags.AllNamespaces {
// Discover across every managed network on each host.
nsList, nsResults := discoverNamespacesOnHosts(ctx, runner, flags)
for _, r := range nsResults {
if r.Err != nil {
results = append(results, ssh.ServerResult[[]ifw.Container]{
Host: r.Host, Err: r.Err,
})
}
}
var containerResults []ssh.ServerResult[[]ifw.Container]
all, containerResults = discoverAcrossNamespaces(ctx, runner, flags, nsList)
results = append(results, containerResults...)
} else {
all, results = discoverAllViaPkg(ctx, runner, flags)
}
rows := make([]models.ContainerRow, 0, len(all))
for _, c := range all {
rows = append(rows, models.ContainerRow{
Host: c.Host, Namespace: c.Namespace, ID: c.ID, Name: c.Name, IP: c.IP.String(),
})
}
var errs []string
for _, r := range results {
if r.Err != nil {
errs = append(errs, fmt.Sprintf("%s: %v", r.Host, r.Err))
}
}
for _, e := range errs {
fmt.Fprintln(os.Stderr, "Warning:", e)
}
format, _ := cmd.Root().PersistentFlags().GetString("format")
if format == "" {
format = output.FormatTable
}
formatter, err := output.NewFormatter(format, output.Options{Writer: os.Stdout})
if err != nil {
return err
}
if format == output.FormatJSON || format == output.FormatPretty {
return formatter.Format(models.FirewallContainersOutput{
Containers: rows, Errors: errs,
})
}
if len(rows) == 0 {
if flags.AllNamespaces {
fmt.Fprintln(os.Stderr, "No containers found on any coolify-<ns>-mesh network.")
} else {
fmt.Fprintf(os.Stderr, "No containers found on %s network.\n", flags.PodmanNetworkName())
}
return nil
}
return formatter.Format(rows)
}
-106
@@ -1,106 +0,0 @@
package firewall
import (
"context"
"testing"
"github.com/spf13/cobra"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"github.com/coollabsio/coolify-cli/cmd/common"
)
func TestEmitContainers_RunsAndFormatsTable(t *testing.T) {
fr := &cmdFakeRunner{responses: map[string]string{
"podman ps": "aaa111111111|web|10.210.0.10",
}}
parent := &Flags{
SSHMeshFlags: common.SSHMeshFlags{
Servers: []string{"h1"}, SSHUser: "root", SSHPort: 22, Concurrency: 1,
},
Namespace: common.DefaultNamespace,
}
inner := &cobra.Command{Use: "containers"}
rootCmdFor(inner)
err := emitContainers(context.Background(), inner, parent, fr)
require.NoError(t, err)
// Discovery command was issued, targeting the default-namespace bridge.
assert.Len(t, fr.calls, 1)
assert.Contains(t, fr.calls[0], "podman ps")
assert.Contains(t, fr.calls[0], "coolify-default-mesh")
}
func TestEmitContainers_EmptyOutput(t *testing.T) {
fr := &cmdFakeRunner{responses: map[string]string{}}
parent := &Flags{
SSHMeshFlags: common.SSHMeshFlags{
Servers: []string{"h1"}, SSHUser: "root", SSHPort: 22, Concurrency: 1,
},
Namespace: common.DefaultNamespace,
}
inner := &cobra.Command{Use: "containers"}
rootCmdFor(inner)
err := emitContainers(context.Background(), inner, parent, fr)
require.NoError(t, err)
}
// TestEmitContainers_AllNamespaces_FansOutAcrossNetworks verifies that with
// --all-namespaces the CLI first enumerates managed networks on every host
// and then issues one podman-ps per namespace it found.
func TestEmitContainers_AllNamespaces_FansOutAcrossNetworks(t *testing.T) {
fr := &cmdFakeRunner{responses: map[string]string{
// Host reports two managed namespaces via label inspection.
"podman network ls": "default\nalpha\n",
// Every subsequent podman-ps returns the same container.
"podman ps": "aaa111111111|web|10.210.0.10",
}}
parent := &Flags{
SSHMeshFlags: common.SSHMeshFlags{
Servers: []string{"h1"}, SSHUser: "root", SSHPort: 22, Concurrency: 1,
},
Namespace: common.DefaultNamespace,
AllNamespaces: true,
}
inner := &cobra.Command{Use: "containers"}
rootCmdFor(inner)
err := emitContainers(context.Background(), inner, parent, fr)
require.NoError(t, err)
// Expect one `podman network ls` discovery call + one `podman ps` per
// discovered namespace (default + alpha = 2).
var ls, ps int
for _, c := range fr.calls {
switch {
case containsAll(c, "podman network ls", "io.coolify.managed=true"):
ls++
case containsAll(c, "podman ps"):
ps++
}
}
assert.Equal(t, 1, ls, "one namespace-discovery call per host")
assert.Equal(t, 2, ps, "one container-discovery call per namespace per host")
}
func containsAll(s string, subs ...string) bool {
for _, sub := range subs {
if !contains(s, sub) {
return false
}
}
return true
}
func contains(s, sub string) bool {
// Hand-rolled substring search, equivalent to strings.Contains, so this
// file does not need to import strings.
for i := 0; i+len(sub) <= len(s); i++ {
if s[i:i+len(sub)] == sub {
return true
}
}
return false
}
-39
@@ -1,39 +0,0 @@
package firewall
import (
"github.com/spf13/cobra"
)
// NewFirewallCommand creates the parent `coolify firewall` command.
// On bare invocation (no subcommand) it prints help.
func NewFirewallCommand() *cobra.Command {
flags := &Flags{}
cmd := &cobra.Command{
Use: "firewall",
Short: "[ALPHA] Manage cross-host container allow rules (Coolify v5)",
Long: `[ALPHA] Manage the COOLIFY-ALLOW iptables chain installed by
"coolify init --podman --default-deny". This is a test harness for the v5
control-plane firewall flow: it SSHes into every server, discovers running
containers on the Coolify mesh bridge (coolify-<namespace>-mesh, selected
with --namespace), and lets you add/remove cross-host allow rules.
Subcommands:
containers List discovered containers across the mesh.
list Show installed allow rules.
allow Add an allow rule (src container → dst container:port).
revoke Remove an allow rule.`,
RunE: func(cmd *cobra.Command, _ []string) error {
return cmd.Help()
},
}
bindFlags(cmd, flags)
cmd.AddCommand(newContainersCommand(flags))
cmd.AddCommand(newListCommand(flags))
cmd.AddCommand(newAllowCommand(flags))
cmd.AddCommand(newRevokeCommand(flags))
return cmd
}
-50
@@ -1,50 +0,0 @@
package firewall
import (
"testing"
"github.com/spf13/cobra"
"github.com/stretchr/testify/assert"
)
func TestNewFirewallCommand_Subcommands(t *testing.T) {
cmd := NewFirewallCommand()
assert.Equal(t, "firewall", cmd.Use)
subs := map[string]*cobra.Command{}
for _, s := range cmd.Commands() {
subs[s.Use] = s
}
assert.Contains(t, subs, "containers")
assert.Contains(t, subs, "list")
assert.Contains(t, subs, "allow")
assert.Contains(t, subs, "revoke")
}
func TestNewFirewallCommand_PersistentFlags(t *testing.T) {
cmd := NewFirewallCommand()
pf := cmd.PersistentFlags()
for _, name := range []string{"servers", "ssh-key", "ssh-user", "ssh-port",
"concurrency", "ssh-timeout", "namespace", "all-namespaces",
"coold-token", "coold-port", "wg-interface"} {
assert.NotNil(t, pf.Lookup(name), "missing --%s", name)
}
// Replaced by --namespace; must be gone.
assert.Nil(t, pf.Lookup("podman-network"))
}
func TestAllowCommand_LocalFlags(t *testing.T) {
cmd := NewFirewallCommand()
var allow *cobra.Command
for _, s := range cmd.Commands() {
if s.Use == "allow" {
allow = s
break
}
}
if allow == nil {
t.Fatal("allow subcommand not found")
}
for _, name := range []string{"from", "to", "port", "proto", "bidirectional"} {
assert.NotNil(t, allow.Flags().Lookup(name), "missing --%s on allow", name)
}
}
-82
@@ -1,82 +0,0 @@
// Package firewall implements the `coolify firewall` command tree. It is a
// thin SSH-bounced client for the coold agent's REST API: `allow` and
// `revoke` issue POST and DELETE requests to coold on the host that owns the
// rule, and `list` issues a GET to coold on every server. `containers` stays
// SSH+podman because coold has no container surface.
// See CONTROL_PLANE.md §3.
package firewall
import (
"os"
"github.com/spf13/cobra"
"github.com/coollabsio/coolify-cli/cmd/common"
ifw "github.com/coollabsio/coolify-cli/internal/firewall"
)
// Flags is the shared flag set for every `coolify firewall`
// subcommand: SSH plumbing (via embed) + namespace selection + coold REST
// endpoint/token. The podman network name is derived from the namespace
// (coolify-<ns>-mesh) so the CLI and `coolify init` stay in sync.
type Flags struct {
common.SSHMeshFlags
// Namespace is the mesh namespace the command operates against. Derives
// the podman network (common.PodmanNetworkFor) and is forwarded to coold
// as part of every rule / list query.
Namespace string
// AllNamespaces, when true, makes namespace-aware subcommands operate
// across every namespace the mesh carries. Each subcommand interprets it
// contextually (list: union across namespaces; containers: discover every
// coolify-<ns>-mesh network on each host).
AllNamespaces bool
// CooldToken is an optional bearer-token override for coold's REST API.
// When unset (and COOLIFY_COOLD_TOKEN env is unset), the CLI SSHes into
// each host and reads /etc/coolify/api-token instead — tokens are
// generated per-host at install time and are not centrally shared.
CooldToken string
// CooldPort is the TCP port coold listens on (bound to the WG mgmt IP).
// Must match COOLD_API_BIND emitted by internal/services/coold.go.
CooldPort int
// WGInterface is the WireGuard interface name used to discover coold's
// bind IP on each host. Must match --wg-interface used at `coolify init`.
WGInterface string
}
// bindFlags registers the persistent flags on the parent command.
func bindFlags(cmd *cobra.Command, f *Flags) {
common.BindSSHMeshFlags(cmd, &f.SSHMeshFlags)
common.BindMeshNetSingleFlags(cmd, &f.Namespace)
pf := cmd.PersistentFlags()
pf.BoolVar(&f.AllNamespaces, "all-namespaces", false,
"Operate across every mesh namespace on each host (list/containers fan out; "+
"allow/revoke still require a specific --namespace)")
pf.StringVar(&f.CooldToken, "coold-token", "",
"Bearer token override for coold REST API (also reads COOLIFY_COOLD_TOKEN env). "+
"When unset, CLI reads /etc/coolify/api-token over SSH per host.")
pf.IntVar(&f.CooldPort, "coold-port", 8443,
"TCP port coold's REST API listens on (bound to the WG mgmt IP)")
pf.StringVar(&f.WGInterface, "wg-interface", ifw.DefaultWGInterface,
"WireGuard interface name on remote hosts (must match --wg-interface at init)")
}
// ResolveCooldToken returns the bearer-token override supplied via flag or
// env, or "" when neither is set. Callers treat an empty string as "no
// override — SSH-fetch the per-host token instead".
func (f *Flags) ResolveCooldToken() (string, error) {
if f.CooldToken != "" {
return f.CooldToken, nil
}
if env := os.Getenv("COOLIFY_COOLD_TOKEN"); env != "" {
return env, nil
}
return "", nil
}
// PodmanNetworkName returns the podman bridge that backs the selected
// namespace on every host. Used by container discovery.
func (f *Flags) PodmanNetworkName() string {
return common.PodmanNetworkFor(f.Namespace)
}
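The `coolify-<ns>-mesh` naming convention described in the `Flags` comments is what keeps the CLI and `coolify init` in agreement on bridge names. A minimal sketch of the derivation follows. `podmanNetworkFor` is a hypothetical stand-in written from that convention and may differ from the real `common.PodmanNetworkFor`.

```go
package main

import "fmt"

// podmanNetworkFor is a hypothetical stand-in for common.PodmanNetworkFor.
// It assumes the bridge name is exactly "coolify-<namespace>-mesh".
func podmanNetworkFor(namespace string) string {
	return "coolify-" + namespace + "-mesh"
}

func main() {
	for _, ns := range []string{"default", "alpha"} {
		fmt.Printf("%s -> %s\n", ns, podmanNetworkFor(ns))
	}
}
```

The two names this produces, `coolify-default-mesh` and `coolify-alpha-mesh`, are the ones the tests in this change assert on.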
-129
@@ -1,129 +0,0 @@
package firewall
import (
"context"
"sort"
"strings"
"sync"
"github.com/coollabsio/coolify-cli/cmd/common"
ifw "github.com/coollabsio/coolify-cli/internal/firewall"
"github.com/coollabsio/coolify-cli/internal/ssh"
)
// discoverAllViaPkg is a thin wrapper around ifw.DiscoverAll that threads
// the Flags in. It discovers containers on the selected namespace's bridge
// only and does not consult AllNamespaces. Used by `containers` (SSH+podman)
// and by `allow` / `revoke` for endpoint resolution; `list` goes straight to
// coold REST.
//
// For --all-namespaces, the caller (the containers subcommand) enumerates the
// namespaces present on the hosts and calls discoverAcrossNamespaces instead.
func discoverAllViaPkg(
ctx context.Context,
runner ssh.Runner,
flags *Flags,
) ([]ifw.Container, []ssh.ServerResult[[]ifw.Container]) {
return ifw.DiscoverAll(ctx, runner, flags.Servers, flags.SSHUser,
flags.SSHPort, flags.Namespace, flags.PodmanNetworkName(),
flags.Concurrency)
}
// discoverAcrossNamespaces runs ifw.DiscoverAllNamespaces over the supplied
// namespaces. Each network name is derived via common.PodmanNetworkFor, so
// the caller only has to supply the namespace list.
func discoverAcrossNamespaces(
ctx context.Context,
runner ssh.Runner,
flags *Flags,
namespaces []string,
) ([]ifw.Container, []ssh.ServerResult[[]ifw.Container]) {
return ifw.DiscoverAllNamespaces(ctx, runner, flags.Servers,
flags.SSHUser, flags.SSHPort, namespaces,
common.PodmanNetworkFor, flags.Concurrency)
}
// discoverNamespacesOnHosts SSHes into every host and lists every podman
// network carrying the io.coolify.managed=true label, collecting the unique
// io.coolify.namespace label values. Used by `containers --all-namespaces`.
// Returns the per-host results so host-level failures surface as warnings
// instead of aborting the fanout.
func discoverNamespacesOnHosts(
ctx context.Context,
runner ssh.Runner,
flags *Flags,
) ([]string, []ssh.ServerResult[[]string]) {
// `podman network ls`'s `{{.Labels}}` renders as a comma-separated `k=v`
// string (not a map, unlike `podman network inspect`), so `index` can't be
// used — pull `io.coolify.namespace=<val>` out with sed instead.
script := `podman network ls --filter label=io.coolify.managed=true ` +
`--format '{{.Labels}}' 2>/dev/null | ` +
`sed -n 's/.*io\.coolify\.namespace=\([^,]*\).*/\1/p' || true`
results := ssh.ForEachServer(ctx, flags.Servers, flags.Concurrency,
func(ctx context.Context, host string) ([]string, error) {
stdout, _, err := runner.Run(ctx, host, flags.SSHUser,
flags.SSHPort, script)
if err != nil {
return nil, err
}
var nss []string
for _, line := range strings.Split(stdout, "\n") {
ns := strings.TrimSpace(line)
if ns != "" {
nss = append(nss, ns)
}
}
return nss, nil
})
seen := map[string]struct{}{}
for _, r := range results {
for _, ns := range r.Result {
seen[ns] = struct{}{}
}
}
// Always probe the selected namespace too — caller may have just created
// it and we haven't seen it on any host yet.
seen[flags.Namespace] = struct{}{}
all := make([]string, 0, len(seen))
for ns := range seen {
all = append(all, ns)
}
sort.Strings(all)
return all, results
}
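The `sed` expression in `discoverNamespacesOnHosts` pulls the `io.coolify.namespace` value out of each comma-separated label line on the remote host. For readers who prefer to see the same extraction in Go, here is a rough local equivalent. The `namespacesFromLabels` helper and the sample label lines are illustrative assumptions, not captured `podman network ls` output.

```go
package main

import (
	"fmt"
	"regexp"
	"sort"
	"strings"
)

// nsLabel captures the value of io.coolify.namespace up to the next comma.
var nsLabel = regexp.MustCompile(`io\.coolify\.namespace=([^,]*)`)

// namespacesFromLabels extracts the unique, sorted namespace values from
// newline-separated label strings of the form "k=v,k=v".
func namespacesFromLabels(output string) []string {
	seen := map[string]struct{}{}
	for _, line := range strings.Split(output, "\n") {
		m := nsLabel.FindStringSubmatch(line)
		if m == nil {
			continue // line carries no namespace label
		}
		if ns := strings.TrimSpace(m[1]); ns != "" {
			seen[ns] = struct{}{}
		}
	}
	out := make([]string, 0, len(seen))
	for ns := range seen {
		out = append(out, ns)
	}
	sort.Strings(out)
	return out
}

func main() {
	sample := "io.coolify.managed=true,io.coolify.namespace=default\n" +
		"io.coolify.namespace=alpha,io.coolify.managed=true\n" +
		"io.coolify.managed=true\n"
	fmt.Println(namespacesFromLabels(sample)) // prints [alpha default]
}
```

Doing the extraction remotely with `sed`, as the original does, keeps the SSH output small. The Go form is easier to unit-test without a fake runner.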
// tokenResolver returns a closure that hands out coold bearer tokens
// per-host. Precedence: an explicit --coold-token (or COOLIFY_COOLD_TOKEN env)
// wins for every host; otherwise the closure SSHes into the host and caches
// the contents of /etc/coolify/api-token. The cache is goroutine-safe so the
// closure can be passed straight into CooldListAll's fanout. The lock is not
// held during the fetch, so concurrent first calls for the same host may each
// fetch the token; failed fetches are not cached.
func tokenResolver(
ctx context.Context,
runner ssh.Runner,
flags *Flags,
) func(host string) (string, error) {
if override, _ := flags.ResolveCooldToken(); override != "" {
return func(string) (string, error) { return override, nil }
}
var (
mu sync.Mutex
cache = map[string]string{}
)
return func(host string) (string, error) {
mu.Lock()
if tok, ok := cache[host]; ok {
mu.Unlock()
return tok, nil
}
mu.Unlock()
tok, err := ifw.FetchCooldToken(ctx, runner, host, flags.SSHUser, flags.SSHPort)
if err != nil {
return "", err
}
mu.Lock()
cache[host] = tok
mu.Unlock()
return tok, nil
}
}
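The override-then-cache behaviour of `tokenResolver` can be exercised without SSH by injecting the fetch step. This is a minimal sketch, assuming a hypothetical `fetchFunc` in place of `ifw.FetchCooldToken`. Unlike the original, it holds the lock across the fetch, which guarantees at most one successful fetch per host at the cost of serializing first-time lookups.

```go
package main

import (
	"fmt"
	"sync"
)

// fetchFunc stands in for the SSH read of the per-host token file.
type fetchFunc func(host string) (string, error)

// newTokenResolver returns a per-host token lookup. A non-empty override
// wins for every host; otherwise tokens are fetched once and cached.
func newTokenResolver(override string, fetch fetchFunc) func(host string) (string, error) {
	if override != "" {
		return func(string) (string, error) { return override, nil }
	}
	var (
		mu    sync.Mutex
		cache = map[string]string{}
	)
	return func(host string) (string, error) {
		mu.Lock()
		defer mu.Unlock()
		if tok, ok := cache[host]; ok {
			return tok, nil
		}
		tok, err := fetch(host)
		if err != nil {
			return "", err // errors are not cached, so a retry fetches again
		}
		cache[host] = tok
		return tok, nil
	}
}

func main() {
	fetches := 0
	resolve := newTokenResolver("", func(host string) (string, error) {
		fetches++
		return "token-for-" + host, nil
	})
	for _, h := range []string{"h1", "h2", "h1"} {
		tok, _ := resolve(h)
		fmt.Println(h, tok)
	}
	fmt.Println("fetches:", fetches) // prints fetches: 2
}
```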
-97
@@ -1,97 +0,0 @@
package firewall
import (
"context"
"fmt"
"os"
"github.com/spf13/cobra"
ifw "github.com/coollabsio/coolify-cli/internal/firewall"
"github.com/coollabsio/coolify-cli/internal/models"
"github.com/coollabsio/coolify-cli/internal/output"
"github.com/coollabsio/coolify-cli/internal/ssh"
)
// newListCommand builds `coolify firewall list`.
func newListCommand(flags *Flags) *cobra.Command {
return &cobra.Command{
Use: "list",
Short: "List installed allow rules across all servers",
RunE: func(cmd *cobra.Command, _ []string) error {
return runList(cmd.Context(), cmd, flags)
},
}
}
func runList(ctx context.Context, cmd *cobra.Command, flags *Flags) error {
if err := flags.Validate(); err != nil {
return err
}
runner, err := flags.BuildSSHClient()
if err != nil {
return fmt.Errorf("SSH client: %w", err)
}
return emitList(ctx, cmd, flags, runner)
}
func emitList(
ctx context.Context,
cmd *cobra.Command,
flags *Flags,
runner ssh.Runner,
) error {
tokenFor := tokenResolver(ctx, runner, flags)
// --all-namespaces → omit the query param so coold returns the union.
ns := flags.Namespace
if flags.AllNamespaces {
ns = ""
}
all, results := ifw.CooldListAll(ctx, runner, flags.Servers, flags.SSHUser,
flags.SSHPort, flags.CooldPort, flags.WGInterface, tokenFor,
flags.Concurrency, ns)
rows := make([]models.AllowRuleRow, 0, len(all))
for _, r := range all {
rows = append(rows, models.AllowRuleRow{
Host: r.Host,
Namespace: r.Namespace,
ID: r.Comment,
Src: r.Src.String(),
Dst: r.Dst.String(),
Proto: r.Proto,
Port: r.Port,
Comment: r.Comment,
})
}
var errs []string
for _, r := range results {
if r.Err != nil {
errs = append(errs, fmt.Sprintf("%s: %v", r.Host, r.Err))
}
}
for _, e := range errs {
fmt.Fprintln(os.Stderr, "Warning:", e)
}
format, _ := cmd.Root().PersistentFlags().GetString("format")
if format == "" {
format = output.FormatTable
}
formatter, err := output.NewFormatter(format, output.Options{Writer: os.Stdout})
if err != nil {
return err
}
if format == output.FormatJSON || format == output.FormatPretty {
return formatter.Format(models.FirewallListOutput{
Rules: rows, Errors: errs,
})
}
if len(rows) == 0 {
fmt.Fprintln(os.Stderr, "No allow rules found. Run `coolify firewall allow ...` to add one.")
return nil
}
return formatter.Format(rows)
}
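`emitList` passes an empty namespace for `--all-namespaces` so that the query parameter is omitted and coold returns the union. The sketch below shows one way such a request URL could be assembled. The `/api/v1/firewall/allow` path and port 8443 come from this change, but the `http` scheme, the `namespace` parameter name, and the `listURL` helper are assumptions made for illustration. The real URL is built inside `ifw.CooldListAll`, which is not shown here.

```go
package main

import (
	"fmt"
	"net"
	"net/url"
	"strconv"
)

// listURL builds a hypothetical coold list URL. An empty namespace omits
// the query parameter entirely, mirroring the --all-namespaces behaviour.
func listURL(mgmtIP string, port int, namespace string) string {
	u := url.URL{
		Scheme: "http", // assumption: the scheme is not visible in this diff
		Host:   net.JoinHostPort(mgmtIP, strconv.Itoa(port)),
		Path:   "/api/v1/firewall/allow",
	}
	if namespace != "" {
		q := url.Values{}
		q.Set("namespace", namespace) // assumption: parameter name
		u.RawQuery = q.Encode()
	}
	return u.String()
}

func main() {
	fmt.Println(listURL("100.64.0.1", 8443, "alpha"))
	fmt.Println(listURL("100.64.0.1", 8443, ""))
}
```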
-65
@@ -1,65 +0,0 @@
package firewall
import (
"context"
"strings"
"testing"
"github.com/spf13/cobra"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestEmitList_CallsCooldGet(t *testing.T) {
fr := &cmdFakeRunner{responses: map[string]string{
"/api/v1/firewall/allow": `[{"src":"10.0.0.1","dst":"10.0.0.2","proto":"tcp","port":80,"id":"abc123def456"}]`,
}}
parent := parentWithToken()
inner := &cobra.Command{Use: "list"}
rootCmdFor(inner)
err := emitList(context.Background(), inner, parent, fr)
require.NoError(t, err)
assert.Len(t, fr.calls, 1)
assert.Contains(t, fr.calls[0], "curl")
assert.Contains(t, fr.calls[0], "/api/v1/firewall/allow")
assert.Contains(t, fr.calls[0], "Authorization: Bearer test-token")
}
func TestEmitList_EmptyCoold(t *testing.T) {
fr := &cmdFakeRunner{responses: map[string]string{}}
parent := parentWithToken()
inner := &cobra.Command{Use: "list"}
rootCmdFor(inner)
err := emitList(context.Background(), inner, parent, fr)
require.NoError(t, err)
}
func TestEmitList_FetchesPerHostTokenWhenOverrideAbsent(t *testing.T) {
// Without --coold-token override, each host's token is read via SSH
// `cat /etc/coolify/api-token` then used as the bearer for GET /allow.
fr := &cmdFakeRunner{responses: map[string]string{
"/etc/coolify/api-token": "per-host-token\n",
"/api/v1/firewall/allow": `[]`,
}}
parent := parentWithToken()
parent.CooldToken = ""
t.Setenv("COOLIFY_COOLD_TOKEN", "")
inner := &cobra.Command{Use: "list"}
rootCmdFor(inner)
err := emitList(context.Background(), inner, parent, fr)
require.NoError(t, err)
var ranTokenFetch, ranGet bool
for _, c := range fr.calls {
if strings.Contains(c, "cat /etc/coolify/api-token") {
ranTokenFetch = true
}
if strings.Contains(c, "curl") && strings.Contains(c, "Authorization: Bearer per-host-token") {
ranGet = true
}
}
assert.True(t, ranTokenFetch, "CLI should SSH-fetch the token")
assert.True(t, ranGet, "bearer should be the fetched token")
}
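The tests above depend on a fake command runner that returns canned output and records every command it sees. The real `cmdFakeRunner` is not shown in this diff, so the following is only a minimal sketch of that pattern; the type name `fakeRunner` and the `Run(cmd string)` signature are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"strings"
)

// fakeRunner is a hypothetical stand-in for the cmdFakeRunner used by the
// tests. It matches commands by substring and records each call.
type fakeRunner struct {
	responses map[string]string // substring of the command -> canned stdout
	calls     []string          // every command passed to Run, in order
}

// Run records cmd and returns the canned output of the first key that is a
// substring of cmd. Unknown commands yield empty output and no error.
func (f *fakeRunner) Run(cmd string) (string, error) {
	f.calls = append(f.calls, cmd)
	for needle, out := range f.responses {
		if strings.Contains(cmd, needle) {
			return out, nil
		}
	}
	return "", nil
}

func main() {
	fr := &fakeRunner{responses: map[string]string{
		"/etc/coolify/api-token": "per-host-token\n",
	}}
	out, _ := fr.Run("cat /etc/coolify/api-token")
	fmt.Printf("output=%q calls=%d\n", out, len(fr.calls))
}
```

Matching by substring keeps the fixtures short, but each command should match only one key, because Go map iteration order is not deterministic.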
-113
@@ -1,113 +0,0 @@
package firewall
import (
"fmt"
"net"
"strings"
ifw "github.com/coollabsio/coolify-cli/internal/firewall"
)
// resolveEndpoint turns a user-supplied reference (name, short-id, raw IP,
// or "host:name") into the container it points at. When ref is a raw IP
// that doesn't match any discovered container, it returns a synthetic
// entry with Host="" — the caller must derive Host some other way.
//
// Ambiguous names across hosts are rejected; the user must disambiguate
// with "host:name" or a short-ID.
func resolveEndpoint(ref string, all []ifw.Container) (ifw.Container, error) {
ref = strings.TrimSpace(ref)
if ref == "" {
return ifw.Container{}, fmt.Errorf("empty container reference")
}
// Raw IP form. Checked before the "host:name" split so that IPv6
// literals, which contain colons, are not mistaken for host:name.
if ip := net.ParseIP(ref); ip != nil {
for _, c := range all {
if c.IP.Equal(ip) {
return c, nil
}
}
// Synthetic: caller must decide on Host.
return ifw.Container{IP: ip}, nil
}
// "host:name" form: exact host disambiguator.
if host, name, ok := splitHostName(ref); ok {
for _, c := range all {
if c.Host == host && c.Name == name {
return c, nil
}
}
return ifw.Container{}, fmt.Errorf("no container named %q on host %q", name, host)
}
// Name / short-id form. Collect matches, error on ambiguity.
var matches []ifw.Container
for _, c := range all {
if c.Name == ref || strings.HasPrefix(c.ID, ref) {
matches = append(matches, c)
}
}
switch len(matches) {
case 0:
return ifw.Container{}, fmt.Errorf("no container matches %q", ref)
case 1:
return matches[0], nil
default:
return ifw.Container{}, fmt.Errorf(
"reference %q is ambiguous across hosts (%s) — use host:name form",
ref, hostList(matches))
}
}
func splitHostName(ref string) (host, name string, ok bool) {
i := strings.IndexByte(ref, ':')
if i <= 0 || i == len(ref)-1 {
return "", "", false
}
// Reject if the part after `:` looks like a port (all digits) — likely
// an IP:port form the user didn't mean.
name = ref[i+1:]
host = ref[:i]
if allDigits(name) {
return "", "", false
}
return host, name, true
}
func allDigits(s string) bool {
if s == "" {
return false
}
for _, r := range s {
if r < '0' || r > '9' {
return false
}
}
return true
}
func hostList(cs []ifw.Container) string {
seen := map[string]bool{}
var hosts []string
for _, c := range cs {
if !seen[c.Host] {
hosts = append(hosts, c.Host)
seen[c.Host] = true
}
}
return strings.Join(hosts, ", ")
}
// findHostForIP returns the SSH host of the entry in all whose IP equals
// ip, and false when no entry matches. Used to derive the host when
// --to/--from is given as a raw IP.
func findHostForIP(ip net.IP, all []ifw.Container) (string, bool) {
for _, c := range all {
if c.IP.Equal(ip) {
return c.Host, true
}
}
return "", false
}
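The reference forms handled by `resolveEndpoint` (raw IP, `host:name`, bare name or short ID) can be told apart without any container list. The sketch below is illustrative only: `refKind` and `classify` are hypothetical names that do not exist in the package. It checks raw IPs first so that IPv6 literals, which contain colons, are classified as IPs and not as `host:name`.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// refKind is a hypothetical label for the three reference forms.
type refKind string

const (
	kindIP       refKind = "ip"
	kindHostName refKind = "host:name"
	kindName     refKind = "name-or-id"
)

// classify reports which form ref takes. Raw IPs win over the colon split.
func classify(ref string) refKind {
	ref = strings.TrimSpace(ref)
	if net.ParseIP(ref) != nil {
		return kindIP
	}
	i := strings.IndexByte(ref, ':')
	// A trailing all-digit part looks like a port, not a container name.
	if i > 0 && i < len(ref)-1 && !allDigits(ref[i+1:]) {
		return kindHostName
	}
	return kindName
}

// allDigits reports whether s is non-empty and consists of ASCII digits only.
func allDigits(s string) bool {
	if s == "" {
		return false
	}
	for _, r := range s {
		if r < '0' || r > '9' {
			return false
		}
	}
	return true
}

func main() {
	for _, ref := range []string{"h3:web", "10.210.1.10", "fd00::1", "web", "h1:8080"} {
		fmt.Printf("%-12s -> %s\n", ref, classify(ref))
	}
}
```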
-76
@@ -1,76 +0,0 @@
package firewall
import (
"net"
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
ifw "github.com/coollabsio/coolify-cli/internal/firewall"
)
func cs() []ifw.Container {
return []ifw.Container{
{Host: "h1", ID: "aaa111111111", Name: "web", IP: net.ParseIP("10.210.0.10")},
{Host: "h2", ID: "bbb222222222", Name: "api", IP: net.ParseIP("10.210.1.10")},
{Host: "h3", ID: "ccc333333333", Name: "web", IP: net.ParseIP("10.210.2.10")},
}
}
func TestResolveEndpoint_ByName_Unique(t *testing.T) {
c, err := resolveEndpoint("api", cs())
require.NoError(t, err)
assert.Equal(t, "h2", c.Host)
assert.Equal(t, "10.210.1.10", c.IP.String())
}
func TestResolveEndpoint_ByName_Ambiguous(t *testing.T) {
_, err := resolveEndpoint("web", cs())
require.Error(t, err)
assert.Contains(t, err.Error(), "ambiguous")
}
func TestResolveEndpoint_ByShortID(t *testing.T) {
c, err := resolveEndpoint("bbb", cs())
require.NoError(t, err)
assert.Equal(t, "h2", c.Host)
}
func TestResolveEndpoint_ByHostName(t *testing.T) {
c, err := resolveEndpoint("h3:web", cs())
require.NoError(t, err)
assert.Equal(t, "h3", c.Host)
assert.Equal(t, "10.210.2.10", c.IP.String())
}
func TestResolveEndpoint_ByRawIP(t *testing.T) {
c, err := resolveEndpoint("10.210.1.10", cs())
require.NoError(t, err)
assert.Equal(t, "h2", c.Host)
}
func TestResolveEndpoint_UnknownRawIP_Synthetic(t *testing.T) {
c, err := resolveEndpoint("10.99.99.99", cs())
require.NoError(t, err)
assert.Empty(t, c.Host)
assert.Equal(t, "10.99.99.99", c.IP.String())
}
func TestResolveEndpoint_NotFound(t *testing.T) {
_, err := resolveEndpoint("nobody", cs())
require.Error(t, err)
}
func TestResolveEndpoint_Empty(t *testing.T) {
_, err := resolveEndpoint("", cs())
require.Error(t, err)
}
func TestFindHostForIP(t *testing.T) {
h, ok := findHostForIP(net.ParseIP("10.210.0.10"), cs())
assert.True(t, ok)
assert.Equal(t, "h1", h)
_, ok = findHostForIP(net.ParseIP("1.2.3.4"), cs())
assert.False(t, ok)
}
-200
@@ -1,200 +0,0 @@
package initcmd
import (
"bufio"
"context"
"fmt"
"os"
"github.com/mattn/go-isatty"
"github.com/spf13/cobra"
"github.com/coollabsio/coolify-cli/internal/models"
"github.com/coollabsio/coolify-cli/internal/output"
internalssh "github.com/coollabsio/coolify-cli/internal/ssh"
"github.com/coollabsio/coolify-cli/internal/wireguard"
)
// Ensure internalssh is used (for *internalssh.Client in signatures).
var _ *internalssh.Client
// applyOptions tweaks runApply per subcommand.
type applyOptions struct {
// SkipAlphaGate, when true, bypasses the interactive "press enter"
// confirmation. upgrade and extend set it because they are invoked by the
// Coolify backend in production, not by a human at a terminal.
SkipAlphaGate bool
// Header is a one-line banner describing the intent (e.g. "extending
// mesh with 1 new host"). Printed to stderr before the plan.
Header string
}
func runApply(ctx context.Context, cmd *cobra.Command, flags *InitFlags, opts applyOptions) error {
fmt.Fprint(os.Stderr, alphaBanner)
if err := validatePlanFlags(flags); err != nil {
return err
}
if !opts.SkipAlphaGate && !shouldSkipGate(flags) {
fmt.Fprintln(os.Stderr, "This command will modify network configuration on the listed servers.")
fmt.Fprint(os.Stderr, "Press Enter to continue, or Ctrl+C to abort... ")
reader := bufio.NewReader(os.Stdin)
if _, err := reader.ReadString('\n'); err != nil {
return fmt.Errorf("read confirmation: %w", err)
}
}
desired, err := buildDesired(flags)
if err != nil {
return err
}
if err := wireguard.ValidateIntent(desired); err != nil {
return err
}
sshClient, err := flags.BuildSSHClient()
if err != nil {
return fmt.Errorf("SSH client: %w", err)
}
if opts.Header != "" {
fmt.Fprintln(os.Stderr, opts.Header)
}
fmt.Fprintf(os.Stderr, "Probing %d server(s)...\n", len(flags.Servers))
current, probeErr := wireguard.Reconstruct(ctx, sshClient, flags.Servers,
flags.SSHUser, flags.SSHPort, flags.WGInterface,
flags.Namespaces, flags.Concurrency)
if probeErr != nil {
fmt.Fprintf(os.Stderr, "Warning: %v\n", probeErr)
}
plan, err := wireguard.BuildPlan(desired, current)
if err != nil {
return fmt.Errorf("build plan: %w", err)
}
for _, w := range plan.Warnings {
fmt.Fprintf(os.Stderr, "Warning [%s]: %s\n", w.Host, w.Reason)
}
format, _ := cmd.Root().PersistentFlags().GetString("format")
if plan.IsEmpty() {
fmt.Fprintln(os.Stderr, "No changes needed. Mesh is already converged.")
} else {
fmt.Fprintln(os.Stderr, "Plan:")
for _, a := range plan.Actions {
fmt.Fprintf(os.Stderr, " [%s] %s %s\n", a.Host, a.Type, a.Detail)
}
fmt.Fprintln(os.Stderr)
}
if len(plan.Skipped) > 0 {
fmt.Fprintln(os.Stderr, "Skipped by intent filter:")
for _, s := range plan.Skipped {
fmt.Fprintf(os.Stderr, " [%s] %s — %s\n", s.Action.Host, s.Action.Type, s.Reason)
}
fmt.Fprintln(os.Stderr)
}
if plan.IsEmpty() {
return runVerify(ctx, sshClient, flags, desired, format)
}
fmt.Fprintln(os.Stderr, "Applying...")
actionResults, applyErr := wireguard.ApplyMesh(ctx, sshClient,
flags.SSHUser, flags.SSHPort, desired, current, flags.Concurrency)
rows := make([]models.ApplyResultRow, len(actionResults))
for i, r := range actionResults {
status := "ok"
detail := r.Action.Detail
if r.Err != nil {
status = "error"
if detail == "" {
detail = r.Err.Error()
}
}
rows[i] = models.ApplyResultRow{
Server: r.Action.Host,
Action: string(r.Action.Type),
Status: status,
Detail: detail,
}
}
if format == output.FormatJSON || format == output.FormatPretty {
verifyRows := collectVerifyRows(ctx, sshClient, flags, desired)
out := models.ApplyOutput{Results: rows, Verified: verifyRows}
formatter, ferr := output.NewFormatter(format, output.Options{Writer: os.Stdout})
if ferr != nil {
return ferr
}
if err := formatter.Format(out); err != nil {
return err
}
return applyErr
}
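if len(rows) > 0 {
// Surface formatter failures instead of discarding them; a nil
// formatter would otherwise panic on Format.
formatter, ferr := output.NewFormatter(output.FormatTable, output.Options{Writer: os.Stdout})
if ferr != nil {
return ferr
}
if err := formatter.Format(rows); err != nil {
return err
}
}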
if err := runVerify(ctx, sshClient, flags, desired, format); err != nil {
return err
}
return applyErr
}
// shouldSkipGate returns true when the interactive alpha gate should be bypassed.
func shouldSkipGate(flags *InitFlags) bool {
if flags.Yes {
return true
}
if os.Getenv("COOLIFY_NON_INTERACTIVE") == "1" {
return true
}
if !isatty.IsTerminal(os.Stdin.Fd()) && !isatty.IsCygwinTerminal(os.Stdin.Fd()) {
return true
}
return false
}
func runVerify(ctx context.Context, sshClient *internalssh.Client, flags *InitFlags, desired *wireguard.DesiredMesh, format string) error {
fmt.Fprintln(os.Stderr, "Verifying...")
vrows := collectVerifyRows(ctx, sshClient, flags, desired)
formatter, err := output.NewFormatter(format, output.Options{Writer: os.Stdout})
if err != nil {
return err
}
return formatter.Format(vrows)
}
func collectVerifyRows(ctx context.Context, sshClient *internalssh.Client, flags *InitFlags, desired *wireguard.DesiredMesh) []models.VerifyResultRow {
vresults := wireguard.Verify(ctx, sshClient,
flags.Servers, flags.SSHUser, flags.SSHPort, desired.Interface, flags.Concurrency)
rows := make([]models.VerifyResultRow, len(vresults))
for i, v := range vresults {
status := "ok"
wgIP := ""
if v.WireGuardIP != nil {
wgIP = v.WireGuardIP.String()
}
if v.Err != nil || !v.Active {
status = "error"
}
rows[i] = models.VerifyResultRow{
Server: v.Host,
WireGuardIP: wgIP,
PeerCount: v.PeerCount,
Status: status,
}
}
return rows
}
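The gate-bypass rule in `shouldSkipGate` (explicit `--yes`, the `COOLIFY_NON_INTERACTIVE=1` environment variable, or a non-terminal stdin) can be isolated as a pure function, which makes the TTY branch testable. This is a sketch under assumptions: `gateBypassed` and `stdinIsCharDevice` are hypothetical names, and the standard-library character-device check is only an approximation of the `go-isatty` calls used in the source (it also reports true for devices such as `/dev/null`).

```go
package main

import (
	"fmt"
	"os"
)

// gateBypassed mirrors the three bypass conditions as a pure function so each
// branch can be exercised without a real terminal.
func gateBypassed(yes bool, nonInteractiveEnv string, stdinIsTTY bool) bool {
	return yes || nonInteractiveEnv == "1" || !stdinIsTTY
}

// stdinIsCharDevice is a rough stdlib stand-in for a TTY check.
func stdinIsCharDevice() bool {
	fi, err := os.Stdin.Stat()
	if err != nil {
		return false
	}
	return fi.Mode()&os.ModeCharDevice != 0
}

func main() {
	skip := gateBypassed(false, os.Getenv("COOLIFY_NON_INTERACTIVE"), stdinIsCharDevice())
	fmt.Println("gate bypassed:", skip)
}
```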
-125
@@ -1,125 +0,0 @@
package initcmd
import (
"testing"
"github.com/spf13/cobra"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
// TestNewInitCommand verifies the command tree structure.
func TestNewInitCommand(t *testing.T) {
cmd := NewInitCommand()
assert.Equal(t, "init", cmd.Use)
assert.NotEmpty(t, cmd.Short)
subCmds := map[string]*cobra.Command{}
for _, sub := range cmd.Commands() {
subCmds[sub.Use] = sub
}
assert.Contains(t, subCmds, "plan")
assert.Contains(t, subCmds, "bootstrap")
assert.Contains(t, subCmds, "extend")
assert.Contains(t, subCmds, "upgrade")
assert.NotContains(t, subCmds, "apply", "apply removed in favor of bootstrap/extend/upgrade")
}
// TestNewInitCommand_PersistentFlags verifies shared flags are registered.
func TestNewInitCommand_PersistentFlags(t *testing.T) {
cmd := NewInitCommand()
pf := cmd.PersistentFlags()
assert.NotNil(t, pf.Lookup("servers"))
assert.NotNil(t, pf.Lookup("ssh-key"))
assert.NotNil(t, pf.Lookup("ssh-user"))
assert.NotNil(t, pf.Lookup("ssh-port"))
assert.NotNil(t, pf.Lookup("wg-mgmt-pool"))
assert.NotNil(t, pf.Lookup("container-pool"))
assert.NotNil(t, pf.Lookup("container-prefix"))
assert.NotNil(t, pf.Lookup("wg-interface"))
assert.NotNil(t, pf.Lookup("wg-listen-port"))
assert.NotNil(t, pf.Lookup("namespaces"))
assert.NotNil(t, pf.Lookup("skip-default-deny"))
assert.NotNil(t, pf.Lookup("concurrency"))
assert.NotNil(t, pf.Lookup("ssh-timeout"))
assert.NotNil(t, pf.Lookup("yes"))
// Old flags removed.
assert.Nil(t, pf.Lookup("wg-pool"))
assert.Nil(t, pf.Lookup("wg-host-prefix"))
assert.Nil(t, pf.Lookup("wg-subnet"))
assert.Nil(t, pf.Lookup("podman"))
assert.Nil(t, pf.Lookup("default-deny"))
assert.Nil(t, pf.Lookup("install-coold"))
// Replaced by --namespaces.
assert.Nil(t, pf.Lookup("podman-network"))
}
// TestNewInitCommand_FlagDefaults verifies default values.
func TestNewInitCommand_FlagDefaults(t *testing.T) {
cmd := NewInitCommand()
pf := cmd.PersistentFlags()
user, err := pf.GetString("ssh-user")
require.NoError(t, err)
assert.Equal(t, "root", user)
port, err := pf.GetInt("ssh-port")
require.NoError(t, err)
assert.Equal(t, 22, port)
mgmtPool, err := pf.GetString("wg-mgmt-pool")
require.NoError(t, err)
assert.Equal(t, "100.64.0.0/16", mgmtPool)
contPool, err := pf.GetString("container-pool")
require.NoError(t, err)
assert.Equal(t, "10.210.0.0/16", contPool)
contPrefix, err := pf.GetInt("container-prefix")
require.NoError(t, err)
assert.Equal(t, 24, contPrefix)
iface, err := pf.GetString("wg-interface")
require.NoError(t, err)
assert.Equal(t, "wg0", iface)
listenPort, err := pf.GetInt("wg-listen-port")
require.NoError(t, err)
assert.Equal(t, 51820, listenPort)
namespaces, err := pf.GetStringSlice("namespaces")
require.NoError(t, err)
assert.Equal(t, []string{"default"}, namespaces)
skipDefaultDeny, err := pf.GetBool("skip-default-deny")
require.NoError(t, err)
assert.False(t, skipDefaultDeny)
concurrency, err := pf.GetInt("concurrency")
require.NoError(t, err)
assert.Equal(t, 10, concurrency)
timeout, err := pf.GetString("ssh-timeout")
require.NoError(t, err)
assert.Equal(t, "30s", timeout)
}
// TestPlanCommand_FlagsInherited verifies that plan inherits parent persistent flags.
func TestPlanCommand_FlagsInherited(t *testing.T) {
// Named initCmd to avoid shadowing confusion with Go's special init function.
initCmd := NewInitCommand()
_ = initCmd.ParseFlags([]string{})
var planCmd *cobra.Command
for _, sub := range initCmd.Commands() {
if sub.Use == "plan" {
planCmd = sub
break
}
}
require.NotNil(t, planCmd)
f := planCmd.InheritedFlags().Lookup("servers")
assert.NotNil(t, f, "plan should inherit --servers from parent")
}
-29
@@ -1,29 +0,0 @@
package initcmd
import (
"github.com/spf13/cobra"
"github.com/coollabsio/coolify-cli/internal/wireguard"
)
// NewBootstrapCommand creates the `coolify init bootstrap` subcommand — the
// first-time mesh install. Runs every applicable action on every host and
// keeps the interactive alpha gate (unless --yes / non-TTY / env override).
func NewBootstrapCommand(flags *InitFlags) *cobra.Command {
return &cobra.Command{
Use: "bootstrap",
Short: "First-time mesh install (all actions allowed)",
Long: `Bootstrap a fresh WireGuard + Podman + coold mesh across every host in
--servers. Idempotent: re-running with no changes produces an empty plan.
Use this for the initial install. For adding hosts later, see
` + "`coolify init extend`" + `; for bumping agent versions, see
` + "`coolify init upgrade`" + `.`,
RunE: func(cmd *cobra.Command, _ []string) error {
flags.Intent = string(wireguard.IntentBootstrap)
return runApply(cmd.Context(), cmd, flags, applyOptions{
Header: "Bootstrapping mesh...",
})
},
}
}
-51
@@ -1,51 +0,0 @@
package initcmd
import (
"fmt"
"net"
"github.com/coollabsio/coolify-cli/internal/wireguard"
)
// buildDesired turns the flag struct into a wireguard.DesiredMesh. Intent is
// pulled from flags.Intent so each subcommand can set it before calling the
// shared plan/apply pipeline.
func buildDesired(flags *InitFlags) (*wireguard.DesiredMesh, error) {
_, mgmtPool, err := net.ParseCIDR(flags.WGMgmtPool)
if err != nil {
return nil, fmt.Errorf("invalid --wg-mgmt-pool %q: %w", flags.WGMgmtPool, err)
}
_, contPool, err := net.ParseCIDR(flags.ContainerPool)
if err != nil {
return nil, fmt.Errorf("invalid --container-pool %q: %w", flags.ContainerPool, err)
}
return &wireguard.DesiredMesh{
Hosts: flags.Servers,
Interface: flags.WGInterface,
MgmtPool: mgmtPool,
ContainerPool: contPool,
ContainerPrefix: flags.ContainerPrefix,
ListenPort: flags.WGListenPort,
InstallPodman: true,
Namespaces: flags.Namespaces,
DefaultDenyContainers: !flags.SkipDefaultDeny,
InstallCoold: true,
CooldVersion: flags.CooldVersion,
CorrosionVersion: flags.CorrosionVersion,
CorrosionGossipPort: flags.CorrosionGossipPort,
CorrosionAPIPort: flags.CorrosionAPIPort,
CentralHost: flags.CentralHost,
SchedulerVersion: flags.SchedulerVersion,
EnableBuilder: flags.EnableBuilder,
BuilderHosts: flags.BuilderHosts,
BuilderCapacity: flags.BuilderCapacity,
BuilderCPUQuota: flags.BuilderCPUQuota,
BuilderMemoryMax: flags.BuilderMemoryMax,
BuilderTimeoutSecs: flags.BuilderTimeoutSecs,
Intent: wireguard.Intent(flags.Intent),
NewHosts: flags.NewHosts,
AllowReplace: flags.AllowReplace,
AllowNightly: flags.AllowNightly,
}, nil
}
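`buildDesired` only parses the two pools; the allocator that carves a per-host container subnet out of `--container-pool` lives in `internal/wireguard` and is not part of this diff. As a rough illustration of what carving `/24` blocks out of `10.210.0.0/16` involves, here is a sketch under the assumption of simple sequential indexing. The function names `nthSubnet` and `subnetFor` are hypothetical, and the real allocation strategy may differ.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
)

// nthSubnet returns the n-th /prefix block inside pool (IPv4 only).
func nthSubnet(pool *net.IPNet, prefix, n int) (*net.IPNet, error) {
	base := pool.IP.To4()
	ones, bits := pool.Mask.Size()
	if base == nil || bits != 32 {
		return nil, fmt.Errorf("pool %s is not IPv4", pool)
	}
	if prefix < ones || prefix > 32 {
		return nil, fmt.Errorf("prefix /%d does not fit inside %s", prefix, pool)
	}
	count := 1 << uint(prefix-ones) // number of /prefix blocks in the pool
	if n < 0 || n >= count {
		return nil, fmt.Errorf("index %d out of range: %s holds %d /%d blocks", n, pool, count, prefix)
	}
	start := binary.BigEndian.Uint32(base) + uint32(n)<<uint(32-prefix)
	ip := make(net.IP, 4)
	binary.BigEndian.PutUint32(ip, start)
	return &net.IPNet{IP: ip, Mask: net.CIDRMask(prefix, 32)}, nil
}

// subnetFor parses poolCIDR and returns the n-th /prefix block as a string.
func subnetFor(poolCIDR string, prefix, n int) (string, error) {
	_, pool, err := net.ParseCIDR(poolCIDR)
	if err != nil {
		return "", fmt.Errorf("invalid pool %q: %w", poolCIDR, err)
	}
	sub, err := nthSubnet(pool, prefix, n)
	if err != nil {
		return "", err
	}
	return sub.String(), nil
}

func main() {
	for i := 0; i < 3; i++ {
		sub, err := subnetFor("10.210.0.0/16", 24, i)
		if err != nil {
			panic(err)
		}
		fmt.Printf("host %d -> %s\n", i, sub)
	}
}
```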
-65
@@ -1,65 +0,0 @@
package initcmd
import (
"fmt"
"github.com/spf13/cobra"
"github.com/coollabsio/coolify-cli/internal/wireguard"
)
// NewExtendCommand creates the `coolify init extend` subcommand. It adds the
// hosts listed in --new-hosts to an existing mesh: new hosts get the full
// first-time install; existing hosts get only peer-refresh actions (WG
// AllowedIPs update, corrosion config refresh, firewall unit reinstall if
// namespace list changed). Destructive actions on existing hosts are blocked
// unless --allow-replace is set.
func NewExtendCommand(flags *InitFlags) *cobra.Command {
cmd := &cobra.Command{
Use: "extend",
Short: "Add new hosts to an existing mesh (existing hosts get peer-refresh only)",
Long: `Extend an existing mesh with brand-new hosts. --new-hosts lists the
subset of --servers that is brand-new; those hosts receive the full
first-time install (install WG, generate keys, install podman, install
coold/corrosion, create bridges, etc.).
Existing hosts in --servers are re-probed and get only the peer-refresh
actions required to route traffic to the new peer: WG config rewrite,
corrosion peer list refresh, firewall unit reinstall when the namespace
list changed. Agent binaries are not re-downloaded on existing hosts —
use ` + "`coolify init upgrade`" + ` for that.
--allow-replace unlocks destructive-replace actions (e.g. recreating a
drifted podman bridge) on existing hosts. Handle with care: containers
on a recreated bridge are disconnected.`,
RunE: func(cmd *cobra.Command, _ []string) error {
if len(flags.NewHosts) == 0 {
return fmt.Errorf("--new-hosts is required: list the subset of --servers that is brand-new")
}
servers := make(map[string]struct{}, len(flags.Servers))
for _, s := range flags.Servers {
servers[s] = struct{}{}
}
for _, nh := range flags.NewHosts {
if _, ok := servers[nh]; !ok {
return fmt.Errorf("--new-hosts: %q is not in --servers", nh)
}
}
flags.Intent = string(wireguard.IntentExtend)
header := fmt.Sprintf("Extending mesh with %d new host(s): %v", len(flags.NewHosts), flags.NewHosts)
return runApply(cmd.Context(), cmd, flags, applyOptions{
SkipAlphaGate: true,
Header: header,
})
},
}
cmd.Flags().StringSliceVar(&flags.NewHosts, "new-hosts", nil,
"Comma-separated subset of --servers that is brand-new this run (required). Only these hosts receive the full first-time install; all other hosts get peer-refresh only.")
cmd.Flags().BoolVar(&flags.AllowReplace, "allow-replace", false,
"Unlock destructive-replace actions on existing hosts (e.g. recreating a drifted podman bridge). Off by default — drifted existing hosts are surfaced as skipped actions instead.")
return cmd
}
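The `--new-hosts` check in `extend` is a subset test against `--servers`. The sketch below repeats that validation and additionally splits the server list into new and existing hosts, which is how the two groups are described in the help text. `partitionHosts` is a hypothetical helper written for illustration; the actual split happens inside the planner, which is not shown here.

```go
package main

import "fmt"

// partitionHosts validates that newHosts is a non-empty subset of servers and
// returns the servers split into fresh (new) and existing hosts, preserving
// the order of servers.
func partitionHosts(servers, newHosts []string) (fresh, existing []string, err error) {
	if len(newHosts) == 0 {
		return nil, nil, fmt.Errorf("--new-hosts is required")
	}
	known := make(map[string]struct{}, len(servers))
	for _, s := range servers {
		known[s] = struct{}{}
	}
	isNew := make(map[string]struct{}, len(newHosts))
	for _, nh := range newHosts {
		if _, ok := known[nh]; !ok {
			return nil, nil, fmt.Errorf("--new-hosts: %q is not in --servers", nh)
		}
		isNew[nh] = struct{}{}
	}
	for _, s := range servers {
		if _, ok := isNew[s]; ok {
			fresh = append(fresh, s)
		} else {
			existing = append(existing, s)
		}
	}
	return fresh, existing, nil
}

func main() {
	fresh, existing, err := partitionHosts(
		[]string{"10.0.0.1", "10.0.0.2", "10.0.0.3"},
		[]string{"10.0.0.3"},
	)
	fmt.Println("full install:", fresh)
	fmt.Println("peer-refresh only:", existing)
	fmt.Println("error:", err)
}
```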
-116
@@ -1,116 +0,0 @@
// Package initcmd implements the `coolify init` alpha WireGuard mesh
// bootstrap command tree (Coolify v5).
package initcmd
import (
"github.com/spf13/cobra"
"github.com/coollabsio/coolify-cli/cmd/common"
)
// InitFlags holds all flags shared between `plan`, `bootstrap`, `extend`,
// and `upgrade`.
type InitFlags struct {
common.SSHMeshFlags
common.MeshNetFlags
WGMgmtPool string
WGInterface string
WGListenPort int
SkipDefaultDeny bool
CooldVersion string
CorrosionVersion string
CorrosionGossipPort int
CorrosionAPIPort int
Yes bool
// CentralHost is the SSH address of the central VM (from --central flag).
// When non-empty, phases 4+5 install the scheduler on that host and push
// per-host JWTs to all other hosts. Default empty = no scheduler setup.
CentralHost string
SchedulerVersion string
// EnableBuilder is a cluster-wide shorthand: when true (and BuilderHosts
// is empty), every host in Servers is enrolled as builder-capable. When
// BuilderHosts is non-empty, EnableBuilder is ignored and only the
// listed subset gets the capability.
EnableBuilder bool
// BuilderHosts is an explicit list of SSH addresses (subset of Servers)
// to enroll with the builder capability. Empty = fall back to
// EnableBuilder semantics. When BuilderHosts is empty and EnableBuilder
// is false, the builder is fully disabled.
BuilderHosts []string
BuilderCapacity int
BuilderCPUQuota string
BuilderMemoryMax string
BuilderTimeoutSecs int
// NewHosts is the extend-subcommand-only list of brand-new hosts. Must
// be a subset of Servers. Existing hosts in Servers get only peer-refresh
// actions; new hosts get the full first-time install.
NewHosts []string
// AllowReplace unlocks destructive-replace actions on existing hosts in
// extend mode (e.g. recreating a podman bridge whose dns_enabled=true
// pre-alpha drift would otherwise be blocked).
AllowReplace bool
// AllowNightly permits the upgrade subcommand to accept "nightly" as a
// version tag. Rejected by default because nightly forces a re-install on
// every run instead of only when the pinned version changes.
AllowNightly bool
// Intent selects the plan filter (bootstrap/extend/upgrade). Set by each
// subcommand before calling runPlan/runApply; not bound to a flag.
Intent string
}
// bindInitFlags registers all shared flags as PersistentFlags on cmd.
func bindInitFlags(cmd *cobra.Command, f *InitFlags) {
common.BindSSHMeshFlags(cmd, &f.SSHMeshFlags)
common.BindMeshNetMultiFlags(cmd, &f.MeshNetFlags)
pf := cmd.PersistentFlags()
pf.StringVar(&f.WGMgmtPool, "wg-mgmt-pool", "100.64.0.0/16",
"WireGuard management address pool — each host gets a /32 from here, assigned to wg0")
pf.StringVar(&f.WGInterface, "wg-interface", "wg0",
"WireGuard interface name on the remote hosts")
pf.IntVar(&f.WGListenPort, "wg-listen-port", 51820,
"WireGuard UDP listen port")
pf.BoolVar(&f.SkipDefaultDeny, "skip-default-deny", false,
"Skip installing the default-deny firewall scaffold. By default, both cross-host and intra-host (same bridge) container traffic is blocked; coold manages the allow list at runtime")
pf.StringVar(&f.CooldVersion, "coold-version", "nightly",
`Release tag to download for coold (e.g. "nightly", "v1.2.3"). "nightly" is re-installed on every run.`)
pf.StringVar(&f.CorrosionVersion, "corrosion-version", "nightly",
`Release tag to download for corrosion (e.g. "nightly", "v1.2.3"). "nightly" is re-installed on every run.`)
pf.IntVar(&f.CorrosionGossipPort, "corrosion-gossip-port", 8787,
"Corrosion SWIM gossip port (bound to the wg0 mgmt IP)")
pf.IntVar(&f.CorrosionAPIPort, "corrosion-api-port", 8080,
"Corrosion HTTP API port (bound to 127.0.0.1)")
pf.BoolVarP(&f.Yes, "yes", "y", false,
"Skip the interactive alpha confirmation prompt")
pf.StringVar(&f.CentralHost, "central", "",
`SSH address of the central VM that will run the scheduler (and later Laravel).
Must be one of the --servers entries. When set, phases 4+5 install the scheduler on that host
and push a per-host JWT to every other server. Leave empty to skip scheduler setup.`)
pf.StringVar(&f.SchedulerVersion, "scheduler-version", "nightly",
`Release tag to download for scheduler (e.g. "nightly", "v1.2.3").`)
pf.BoolVar(&f.EnableBuilder, "enable-builder", true,
`Cluster-wide shorthand: enable the builder capability on every host
(requires --central). Ignored when --builder-hosts is set.`)
pf.StringSliceVar(&f.BuilderHosts, "builder-hosts", nil,
`Explicit subset of --servers to enroll with the builder capability.
Takes precedence over --enable-builder. Empty (default) means fall back to
--enable-builder for the whole cluster.`)
pf.IntVar(&f.BuilderCapacity, "builder-capacity", 2,
"Concurrent builds accepted per host (COOLD_BUILDER_CAPACITY).")
pf.StringVar(&f.BuilderCPUQuota, "builder-cpu-quota", "200%",
`cgroup CPU quota for each build subprocess (COOLD_BUILDER_CPU_QUOTA).
systemd CPUQuota format; "200%" = two full cores.`)
pf.StringVar(&f.BuilderMemoryMax, "builder-memory-max", "2G",
`cgroup memory cap for each build subprocess (COOLD_BUILDER_MEMORY_MAX).
systemd MemoryMax format; e.g. "2G", "512M".`)
pf.IntVar(&f.BuilderTimeoutSecs, "builder-timeout-secs", 1800,
"Hard wall-clock timeout per build in seconds (COOLD_BUILDER_TIMEOUT_SECS).")
}
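The precedence between `--builder-hosts` and `--enable-builder` described in the flag help can be summarised in a few lines. This is a sketch of the documented semantics only: `effectiveBuilderHosts` is a hypothetical name, the code that actually resolves the builder set is not part of this diff, and the `--central` requirement is deliberately left out.

```go
package main

import "fmt"

// effectiveBuilderHosts applies the documented precedence:
//  1. an explicit --builder-hosts list wins,
//  2. otherwise --enable-builder enrolls every server,
//  3. otherwise no host is a builder.
func effectiveBuilderHosts(servers, builderHosts []string, enableBuilder bool) []string {
	if len(builderHosts) > 0 {
		return builderHosts
	}
	if enableBuilder {
		return servers
	}
	return nil
}

func main() {
	servers := []string{"h1", "h2", "h3"}
	fmt.Println(effectiveBuilderHosts(servers, []string{"h2"}, true))
	fmt.Println(effectiveBuilderHosts(servers, nil, true))
	fmt.Println(effectiveBuilderHosts(servers, nil, false))
}
```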
-51
@@ -1,51 +0,0 @@
package initcmd
import (
"fmt"
"os"
"github.com/spf13/cobra"
)
const alphaBanner = `
[ALPHA] coolify init targets Coolify v5 and is experimental.
[ALPHA] WireGuard mesh bootstrap requires root/sudo and modifies network configuration.
[ALPHA] Test in non-production environments first. Stability is not guaranteed.
`
// NewInitCommand creates the parent `coolify init` command.
// On bare invocation (no subcommand) it prints the alpha banner and help.
func NewInitCommand() *cobra.Command {
flags := &InitFlags{}
cmd := &cobra.Command{
Use: "init",
Short: "[ALPHA] Initialize WireGuard mesh for Coolify v5",
Long: `[ALPHA] Bootstrap a WireGuard full-mesh overlay between servers and
provision each host with the Coolify v5 runtime stack: Podman + bridge
network, default-deny iptables scaffold, and the coold/corrosion
control-plane agents.
Subcommands:
plan Show what would change without touching anything (--intent
selects the filter: bootstrap / extend / upgrade).
bootstrap First-time install (all actions allowed).
extend Add new hosts to an existing mesh; existing hosts get only
peer-refresh actions.
upgrade Bump agent versions (coold / corrosion / scheduler / builder);
WG / podman / firewall untouched.`,
RunE: func(cmd *cobra.Command, _ []string) error {
fmt.Fprint(os.Stderr, alphaBanner)
return cmd.Help()
},
}
bindInitFlags(cmd, flags)
cmd.AddCommand(NewPlanCommand(flags))
cmd.AddCommand(NewBootstrapCommand(flags))
cmd.AddCommand(NewExtendCommand(flags))
cmd.AddCommand(NewUpgradeCommand(flags))
return cmd
}
-175
@@ -1,175 +0,0 @@
package initcmd
import (
"context"
"fmt"
"os"
"github.com/spf13/cobra"
"github.com/coollabsio/coolify-cli/internal/models"
"github.com/coollabsio/coolify-cli/internal/output"
"github.com/coollabsio/coolify-cli/internal/wireguard"
)
// NewPlanCommand creates the `coolify init plan` subcommand.
func NewPlanCommand(flags *InitFlags) *cobra.Command {
var intentFlag string
cmd := &cobra.Command{
Use: "plan",
Short: "Show WireGuard mesh changes without applying them",
Long: `Reconstruct the current WireGuard state from each server via SSH and
show the actions that bootstrap, extend, or upgrade would execute. Nothing is changed.
Pass --intent to preview a specific subcommand's behavior (bootstrap, extend,
upgrade). bootstrap is the default and matches the pre-split behavior.`,
RunE: func(cmd *cobra.Command, _ []string) error {
fmt.Fprint(os.Stderr, alphaBanner)
flags.Intent = intentFlag
return runPlan(cmd.Context(), cmd, flags)
},
}
cmd.Flags().StringVar(&intentFlag, "intent", "bootstrap",
`Preview filter: "bootstrap" (all actions), "extend" (treat --new-hosts as fresh, existing hosts peer-refresh only), "upgrade" (version bumps only).`)
return cmd
}
func runPlan(ctx context.Context, cmd *cobra.Command, flags *InitFlags) error {
if err := validatePlanFlags(flags); err != nil {
return err
}
desired, err := buildDesired(flags)
if err != nil {
return err
}
if err := wireguard.ValidateIntent(desired); err != nil {
return err
}
sshClient, err := flags.BuildSSHClient()
if err != nil {
return fmt.Errorf("SSH client: %w", err)
}
fmt.Fprintf(os.Stderr, "Probing %d server(s)...\n", len(flags.Servers))
current, err := wireguard.Reconstruct(ctx, sshClient, flags.Servers,
flags.SSHUser, flags.SSHPort, flags.WGInterface,
flags.Namespaces, flags.Concurrency)
if err != nil {
fmt.Fprintf(os.Stderr, "Warning: %v\n", err)
}
plan, err := wireguard.BuildPlan(desired, current)
if err != nil {
return fmt.Errorf("build plan: %w", err)
}
for _, w := range plan.Warnings {
fmt.Fprintf(os.Stderr, "Warning [%s]: %s\n", w.Host, w.Reason)
}
format, _ := cmd.Root().PersistentFlags().GetString("format")
intent := intentLabel(flags.Intent)
if plan.IsEmpty() && len(plan.Skipped) == 0 {
msg := "No changes needed. Mesh is already converged."
if format == output.FormatJSON {
out := models.PlanOutput{
Servers: flags.Servers,
Intent: intent,
Actions: []models.PlanActionRow{},
Warnings: warningsToStrings(plan.Warnings),
}
formatter, ferr := output.NewFormatter(format, output.Options{Writer: os.Stdout})
if ferr != nil {
return ferr
}
return formatter.Format(out)
}
fmt.Println(msg)
return nil
}
rows := make([]models.PlanActionRow, len(plan.Actions))
for i, a := range plan.Actions {
rows[i] = models.PlanActionRow{
Server: a.Host,
Action: string(a.Type),
Detail: a.Detail,
}
}
skipped := skippedRows(plan.Skipped)
formatter, err := output.NewFormatter(format, output.Options{Writer: os.Stdout})
if err != nil {
return err
}
if format == output.FormatJSON || format == output.FormatPretty {
return formatter.Format(models.PlanOutput{
Servers: flags.Servers,
Intent: intent,
Actions: rows,
Skipped: skipped,
Warnings: warningsToStrings(plan.Warnings),
})
}
if len(rows) > 0 {
if err := formatter.Format(rows); err != nil {
return err
}
} else {
fmt.Println("No actions scheduled.")
}
if len(skipped) > 0 {
fmt.Fprintln(os.Stderr)
fmt.Fprintln(os.Stderr, "Skipped by intent filter:")
for _, s := range skipped {
fmt.Fprintf(os.Stderr, " [%s] %s — %s\n", s.Server, s.Action, s.Reason)
}
}
return nil
}
func validatePlanFlags(f *InitFlags) error {
if err := f.Validate(); err != nil {
return err
}
return f.ValidateNamespaces()
}
// warningsToStrings formats allocator warnings as human-readable strings.
func warningsToStrings(ws []wireguard.Warning) []string {
if len(ws) == 0 {
return nil
}
out := make([]string, len(ws))
for i, w := range ws {
out[i] = fmt.Sprintf("[%s] %s", w.Host, w.Reason)
}
return out
}
// skippedRows converts the plan's intent-filtered actions into render rows.
func skippedRows(ss []wireguard.SkippedAction) []models.PlanSkippedRow {
if len(ss) == 0 {
return nil
}
out := make([]models.PlanSkippedRow, len(ss))
for i, s := range ss {
out[i] = models.PlanSkippedRow{
Server: s.Action.Host,
Action: string(s.Action.Type),
Reason: s.Reason,
}
}
return out
}
// intentLabel normalizes an empty intent to "bootstrap" for display.
func intentLabel(raw string) string {
if raw == "" {
return "bootstrap"
}
return raw
}
-68
@@ -1,68 +0,0 @@
package initcmd
import (
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"github.com/coollabsio/coolify-cli/cmd/common"
)
// TestValidatePlanFlags checks required flag validation.
func TestValidatePlanFlags(t *testing.T) {
t.Run("missing servers", func(t *testing.T) {
err := validatePlanFlags(&InitFlags{
SSHMeshFlags: common.SSHMeshFlags{SSHKey: "/path/to/key"},
})
require.Error(t, err)
assert.Contains(t, err.Error(), "--servers")
})
t.Run("missing ssh key", func(t *testing.T) {
err := validatePlanFlags(&InitFlags{
SSHMeshFlags: common.SSHMeshFlags{Servers: []string{"1.1.1.1"}},
})
require.Error(t, err)
assert.Contains(t, err.Error(), "--ssh-key")
})
t.Run("valid", func(t *testing.T) {
err := validatePlanFlags(&InitFlags{
SSHMeshFlags: common.SSHMeshFlags{
Servers: []string{"1.1.1.1"},
SSHKey: "/path/to/key",
},
MeshNetFlags: common.MeshNetFlags{
Namespaces: []string{common.DefaultNamespace},
},
})
require.NoError(t, err)
})
t.Run("invalid namespace", func(t *testing.T) {
err := validatePlanFlags(&InitFlags{
SSHMeshFlags: common.SSHMeshFlags{
Servers: []string{"1.1.1.1"},
SSHKey: "/path/to/key",
},
MeshNetFlags: common.MeshNetFlags{
Namespaces: []string{"Not Valid"},
},
})
require.Error(t, err)
assert.Contains(t, err.Error(), "invalid namespace")
})
}
// TestShouldSkipGate verifies the alpha gate bypass logic.
func TestShouldSkipGate(t *testing.T) {
// --yes flag
assert.True(t, shouldSkipGate(&InitFlags{Yes: true}))
// Without --yes and without env var, behaviour depends on TTY.
// We can't reliably test the TTY path in unit tests, but we can
// confirm the env-var bypass.
t.Setenv("COOLIFY_NON_INTERACTIVE", "1")
assert.True(t, shouldSkipGate(&InitFlags{}))
}
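The test comment above notes that the TTY path is hard to exercise in a unit test. One way around that, sketched here under assumptions, is to pass the environment lookup and the TTY probe in as functions. `gateInputs`, `skipGate`, and `stdinIsTTY` are hypothetical names, not the CLI's real `shouldSkipGate`. The rule that a non-TTY stdin skips the gate, and that any non-empty `COOLIFY_NON_INTERACTIVE` value counts, are guesses that this diff does not confirm.

```go
package main

import (
	"fmt"
	"os"
)

// gateInputs bundles everything the decision depends on so tests can stub it.
// Hypothetical type: the real InitFlags and shouldSkipGate are not shown here.
type gateInputs struct {
	Yes    bool                // value of --yes
	Getenv func(string) string // os.Getenv in production
	IsTTY  func() bool         // reports whether stdin is a terminal
}

// skipGate returns true when the interactive gate should be bypassed.
// Assumption: a non-TTY stdin bypasses the gate because nobody can answer.
func skipGate(in gateInputs) bool {
	if in.Yes {
		return true
	}
	if in.Getenv("COOLIFY_NON_INTERACTIVE") != "" {
		return true
	}
	return !in.IsTTY()
}

// stdinIsTTY treats a character device on stdin as a terminal. This is only
// an approximation: /dev/null is a character device too.
func stdinIsTTY() bool {
	fi, err := os.Stdin.Stat()
	return err == nil && fi.Mode()&os.ModeCharDevice != 0
}

func main() {
	in := gateInputs{Getenv: os.Getenv, IsTTY: stdinIsTTY}
	fmt.Println("skip gate:", skipGate(in))
}
```

With the probe injected, a test can cover the interactive case by passing `IsTTY: func() bool { return true }` instead of depending on how the test runner wires stdin.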
-38
@@ -1,38 +0,0 @@
package initcmd
import (
"github.com/spf13/cobra"
"github.com/coollabsio/coolify-cli/internal/wireguard"
)
// NewUpgradeCommand creates the `coolify init upgrade` subcommand: bumps
// coold/corrosion/scheduler/builder binaries across every host. Does not touch
// WG config, podman networks, firewall rules, or the corrosion schema. Rejects
// "nightly" version tags unless --allow-nightly is set.
func NewUpgradeCommand(flags *InitFlags) *cobra.Command {
cmd := &cobra.Command{
Use: "upgrade",
Short: "Bump agent binary versions (coold / corrosion / scheduler / builder) on every host",
Long: `Upgrade the agent binaries managed by coolify init across every host in
--servers. Only binary-fetch actions and their follow-up service restarts
run; WG config, podman networks, firewall rules, and the corrosion schema
are left untouched.
Pin each binary with --coold-version / --corrosion-version /
--scheduler-version. "nightly" is rejected by default because it forces a
re-install on every run; pass --allow-nightly to override.`,
RunE: func(cmd *cobra.Command, _ []string) error {
flags.Intent = string(wireguard.IntentUpgrade)
return runApply(cmd.Context(), cmd, flags, applyOptions{
SkipAlphaGate: true,
Header: "Upgrading agent binaries...",
})
},
}
cmd.Flags().BoolVar(&flags.AllowNightly, "allow-nightly", false,
"Permit --coold-version/--corrosion-version/--scheduler-version=nightly. Off by default because nightly re-installs on every run instead of only when the pinned version changes.")
return cmd
}
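The help text above says "nightly" is rejected unless `--allow-nightly` is set, but the check itself lives in `runApply`, which this diff does not show. A minimal sketch of what such a guard could look like follows. `checkPinned` and its error wording are hypothetical, and `v0.3.1` is a placeholder tag rather than a real release.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// checkPinned returns an error naming every version flag still set to
// "nightly", unless allowNightly is true. Hypothetical helper: the real
// validation is not part of this diff.
func checkPinned(versions map[string]string, allowNightly bool) error {
	if allowNightly {
		return nil
	}
	var bad []string
	for flag, v := range versions {
		if v == "nightly" {
			bad = append(bad, flag)
		}
	}
	if len(bad) == 0 {
		return nil
	}
	sort.Strings(bad) // map iteration order is random; sort for a stable message
	return fmt.Errorf("%s set to nightly; pin a release tag or pass --allow-nightly",
		strings.Join(bad, ", "))
}

func main() {
	versions := map[string]string{
		"--coold-version":     "nightly",
		"--corrosion-version": "v0.3.1", // placeholder tag
	}
	fmt.Println(checkPinned(versions, false))
	fmt.Println(checkPinned(versions, true))
}
```

Collecting every offending flag before returning lets the operator fix all pins in one pass instead of discovering them one run at a time.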
-4
@@ -15,9 +15,7 @@ import (
"github.com/coollabsio/coolify-cli/cmd/context"
"github.com/coollabsio/coolify-cli/cmd/database"
"github.com/coollabsio/coolify-cli/cmd/deployment"
-"github.com/coollabsio/coolify-cli/cmd/firewall"
"github.com/coollabsio/coolify-cli/cmd/github"
-initcmd "github.com/coollabsio/coolify-cli/cmd/init"
"github.com/coollabsio/coolify-cli/cmd/privatekeys"
"github.com/coollabsio/coolify-cli/cmd/project"
"github.com/coollabsio/coolify-cli/cmd/resources"
@@ -93,9 +91,7 @@ func init() {
rootCmd.AddCommand(context.NewContextCommand())
rootCmd.AddCommand(database.NewDatabaseCommand())
rootCmd.AddCommand(deployment.NewDeploymentCommand())
-rootCmd.AddCommand(firewall.NewFirewallCommand())
rootCmd.AddCommand(github.NewGitHubCommand())
-rootCmd.AddCommand(initcmd.NewInitCommand())
rootCmd.AddCommand(privatekeys.NewPrivateKeysCommand())
rootCmd.AddCommand(project.NewProjectCommand())
rootCmd.AddCommand(resources.NewResourceCommand())
+5 -1
@@ -33,7 +33,11 @@ func NewGetCommand() *cobra.Command {
}
format, _ := cmd.Flags().GetString("format")
-formatter, err := output.NewFormatter(format, output.Options{})
+showSensitive, _ := cmd.Flags().GetBool("show-sensitive")
+formatter, err := output.NewFormatter(format, output.Options{
+ShowSensitive: showSensitive,
+})
if err != nil {
return fmt.Errorf("failed to create formatter: %w", err)
}
+5 -1
@@ -31,7 +31,11 @@ func NewListCommand() *cobra.Command {
}
format, _ := cmd.Flags().GetString("format")
-formatter, err := output.NewFormatter(format, output.Options{})
+showSensitive, _ := cmd.Flags().GetBool("show-sensitive")
+formatter, err := output.NewFormatter(format, output.Options{
+ShowSensitive: showSensitive,
+})
if err != nil {
return fmt.Errorf("failed to create formatter: %w", err)
}
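Both hunks above only thread the `--show-sensitive` flag into `output.Options`; what the formatter does with it is outside this diff. As a rough illustration of the likely intent, the sketch below masks a value unless the option is set. The `options`, `field`, and `render` names and the `********` mask are assumptions made for the example, not the `output` package's real API.

```go
package main

import "fmt"

// options mirrors only the one field this diff sets on output.Options.
type options struct {
	ShowSensitive bool
}

// field is a hypothetical name/value pair that a formatter might render.
type field struct {
	Name      string
	Value     string
	Sensitive bool
}

// render masks sensitive values unless the caller opted in.
func render(f field, o options) string {
	if f.Sensitive && !o.ShowSensitive {
		return f.Name + ": ********"
	}
	return f.Name + ": " + f.Value
}

func main() {
	token := field{Name: "token", Value: "s3cr3t", Sensitive: true}
	fmt.Println(render(token, options{}))                    // masked by default
	fmt.Println(render(token, options{ShowSensitive: true})) // shown on request
}
```

Defaulting to masked output means a forgotten flag fails safe: secrets appear only when the operator asks for them explicitly.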
+57
@@ -0,0 +1,57 @@
#!/bin/bash
set -e
echo "🔧 Setting up Coolify CLI workspace..."
# Check if Go is installed
if ! command -v go &> /dev/null; then
echo "❌ Error: Go is not installed"
echo "Please install Go 1.24+ from https://go.dev/dl/"
exit 1
fi
# Check Go version
GO_VERSION=$(go version | awk '{print $3}' | sed 's/^go//')
MAJOR_MINOR=$(echo "$GO_VERSION" | cut -d. -f1,2)
# Compare version (must be 1.24 or higher): encode major.minor as major*100 + minor
if [ "$(echo "$MAJOR_MINOR" | awk -F. '{print ($1 * 100) + $2}')" -lt 124 ]; then
echo "❌ Error: Go version 1.24+ is required"
echo "Current version: $GO_VERSION"
echo "Please upgrade Go from https://go.dev/dl/"
exit 1
fi
echo "✅ Go version $GO_VERSION detected"
# Download dependencies
echo "📦 Downloading dependencies..."
if ! go mod download; then
echo "❌ Error: Failed to download dependencies"
exit 1
fi
echo "✅ Dependencies downloaded"
# Install air if not already installed
if ! command -v air &> /dev/null; then
echo "📦 Installing air (Go file watcher)..."
if ! go install github.com/air-verse/air@latest; then
echo "⚠️ Warning: Failed to install air, but continuing..."
else
echo "✅ air installed successfully"
fi
else
echo "✅ air already installed"
fi
# Build the binary
echo "🔨 Building coolify binary..."
if ! go build -o coolify ./coolify; then
echo "❌ Error: Build failed"
exit 1
fi
echo "✅ Binary built successfully: ./coolify/coolify"
echo "🎉 Workspace setup complete!"
echo "🔥 Use the run script for hot reload during development"
+7
@@ -0,0 +1,7 @@
{
"scripts": {
"setup": "./conductor-setup.sh",
"run": "~/go/bin/air"
},
"runScriptMode": "nonconcurrent"
}
+5 -7
@@ -1,20 +1,16 @@
module github.com/coollabsio/coolify-cli
-go 1.25.0
+go 1.24.13
require (
github.com/adrg/xdg v0.5.3
github.com/creativeprojects/go-selfupdate v1.5.1
-github.com/golang-jwt/jwt/v5 v5.3.1
github.com/hashicorp/go-version v1.7.0
-github.com/mattn/go-isatty v0.0.20
github.com/olekukonko/tablewriter v1.1.2
github.com/spf13/cobra v1.10.1
github.com/spf13/pflag v1.0.10
github.com/spf13/viper v1.21.0
github.com/stretchr/testify v1.11.1
-golang.org/x/crypto v0.50.0
-golang.org/x/term v0.42.0
)
require (
@@ -37,6 +33,7 @@ require (
github.com/hashicorp/go-retryablehttp v0.7.8 // indirect
github.com/inconshreveable/mousetrap v1.1.0 // indirect
github.com/mattn/go-colorable v0.1.13 // indirect
+github.com/mattn/go-isatty v0.0.20 // indirect
github.com/mattn/go-runewidth v0.0.19 // indirect
github.com/olekukonko/cat v0.0.0-20250911104152-50322a0618f6 // indirect
github.com/olekukonko/errors v1.1.0 // indirect
@@ -51,9 +48,10 @@ require (
github.com/ulikunitz/xz v0.5.15 // indirect
github.com/xanzy/go-gitlab v0.115.0 // indirect
go.yaml.in/yaml/v3 v3.0.4 // indirect
+golang.org/x/crypto v0.45.0 // indirect
golang.org/x/oauth2 v0.32.0 // indirect
-golang.org/x/sys v0.43.0 // indirect
-golang.org/x/text v0.36.0 // indirect
+golang.org/x/sys v0.38.0 // indirect
+golang.org/x/text v0.31.0 // indirect
golang.org/x/time v0.14.0 // indirect
gopkg.in/check.v1 v1.0.0-20190902080502-41f04d3bba15 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
+8 -10
@@ -31,8 +31,6 @@ github.com/go-fed/httpsig v1.1.0 h1:9M+hb0jkEICD8/cAiNqEB66R87tTINszBRTjwjQzWcI=
github.com/go-fed/httpsig v1.1.0/go.mod h1:RCMrTZvN1bJYtofsG4rd5NaO5obxQ5xBkdiS7xsT7bM=
github.com/go-viper/mapstructure/v2 v2.4.0 h1:EBsztssimR/CONLSZZ04E8qAkxNYq4Qp9LvH92wZUgs=
github.com/go-viper/mapstructure/v2 v2.4.0/go.mod h1:oJDH3BJKyqBA2TXFhDsKDGDTlndYOZ6rGS0BRZIxGhM=
-github.com/golang-jwt/jwt/v5 v5.3.1 h1:kYf81DTWFe7t+1VvL7eS+jKFVWaUnK9cB1qbwn63YCY=
-github.com/golang-jwt/jwt/v5 v5.3.1/go.mod h1:fxCRLWMO43lRc8nhHWY6LGqRcf+1gQWArsqaEUEa5bE=
github.com/golang/protobuf v1.3.2/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U=
github.com/google/go-cmp v0.5.2/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
github.com/google/go-cmp v0.6.0 h1:ofyhxvXcZhMsU5ulbFiLKl/XBFqE1GSq7atu8tAmTRI=
@@ -105,8 +103,8 @@ go.yaml.in/yaml/v3 v3.0.4/go.mod h1:DhzuOOF2ATzADvBadXxruRBLzYTpT36CKvDb3+aBEFg=
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
golang.org/x/crypto v0.0.0-20200622213623-75b288015ac9/go.mod h1:LzIPMQfyMNhhGPhUkYOs5KpL4U8rLKemX1yGLhDgUto=
golang.org/x/crypto v0.0.0-20210513164829-c07d793c2f9a/go.mod h1:P+XmwS30IXTQdn5tA2iutPOUgjI07+tq3H3K9MVA1s8=
-golang.org/x/crypto v0.50.0 h1:zO47/JPrL6vsNkINmLoo/PH1gcxpls50DNogFvB5ZGI=
-golang.org/x/crypto v0.50.0/go.mod h1:3muZ7vA7PBCE6xgPX7nkzzjiUq87kRItoJQM1Yo8S+Q=
+golang.org/x/crypto v0.45.0 h1:jMBrvKuj23MTlT0bQEOBcAE0mjg8mK9RXFhRH6nyF3Q=
+golang.org/x/crypto v0.45.0/go.mod h1:XTGrrkGJve7CYK7J8PEww4aY7gM3qMCElcJQ8n8JdX4=
golang.org/x/net v0.0.0-20190311183353-d8887717615a/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg=
golang.org/x/net v0.0.0-20190404232315-eb5bcb51f2a3/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg=
golang.org/x/net v0.0.0-20210226172049-e18ecbb05110/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg=
@@ -118,15 +116,15 @@ golang.org/x/sys v0.0.0-20190412213103-97732733099d/go.mod h1:h1NjWce9XRLGQEsW7w
golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20220811171246-fbc7d0a398ab/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
-golang.org/x/sys v0.43.0 h1:Rlag2XtaFTxp19wS8MXlJwTvoh8ArU6ezoyFsMyCTNI=
-golang.org/x/sys v0.43.0/go.mod h1:4GL1E5IUh+htKOUEOaiffhrAeqysfVGipDYzABqnCmw=
+golang.org/x/sys v0.38.0 h1:3yZWxaJjBmCWXqhN1qh02AkOnCQ1poK6oF+a7xWL6Gc=
+golang.org/x/sys v0.38.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks=
golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
-golang.org/x/term v0.42.0 h1:UiKe+zDFmJobeJ5ggPwOshJIVt6/Ft0rcfrXZDLWAWY=
-golang.org/x/term v0.42.0/go.mod h1:Dq/D+snpsbazcBG5+F9Q1n2rXV8Ma+71xEjTRufARgY=
+golang.org/x/term v0.37.0 h1:8EGAD0qCmHYZg6J17DvsMy9/wJ7/D/4pV/wfnld5lTU=
+golang.org/x/term v0.37.0/go.mod h1:5pB4lxRNYYVZuTLmy8oR2BH8dflOR+IbTYFD8fi3254=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
-golang.org/x/text v0.36.0 h1:JfKh3XmcRPqZPKevfXVpI1wXPTqbkE5f7JA92a55Yxg=
-golang.org/x/text v0.36.0/go.mod h1:NIdBknypM8iqVmPiuco0Dh6P5Jcdk8lJL0CUebqK164=
+golang.org/x/text v0.31.0 h1:aC8ghyu4JhP8VojJ2lEHBnochRno1sgL6nEi9WGFGMM=
+golang.org/x/text v0.31.0/go.mod h1:tKRAlv61yKIjGGHX/4tP1LTbc13YSec1pxVEWXzfoeM=
golang.org/x/time v0.14.0 h1:MRx4UaLrDotUKUdCIqzPC48t1Y9hANFKIRpNx+Te8PI=
golang.org/x/time v0.14.0/go.mod h1:eL/Oa2bBBK0TkX57Fyni+NgnyQQN4LitPmob2Hjnqw4=
golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
-308
@@ -1,308 +0,0 @@
package firewall
import (
"context"
"encoding/json"
"fmt"
"net"
"sort"
"strings"
"github.com/coollabsio/coolify-cli/internal/ssh"
)
// CooldAPIBasePath is the path prefix the coold REST router serves under.
// Mirrors `src/firewall/api.rs` in the coold repo.
const CooldAPIBasePath = "/api/v1/firewall"
// CooldAPITokenPath is the remote file coold reads its bearer token from.
// Kept in sync with internal/services/coold.go — the CLI falls back to
// reading this file over SSH when the user hasn't supplied --coold-token.
const CooldAPITokenPath = "/etc/coolify/api-token" //nolint:gosec // filesystem path, not a credential
// FetchCooldToken SSHes into host and reads the coold bearer token at
// CooldAPITokenPath. Each host generates its own random token at install
// time (see EnsureCooldAPITokenCommand), so per-host fetch is the default
// path when the user hasn't provided a global --coold-token override.
func FetchCooldToken(
ctx context.Context,
runner ssh.Runner,
host, user string,
sshPort int,
) (string, error) {
cmd := "cat " + CooldAPITokenPath
stdout, stderr, err := runner.Run(ctx, host, user, sshPort, cmd)
if err != nil {
return "", fmt.Errorf("fetch coold token from %s: %w (stderr: %s)",
host, err, strings.TrimSpace(stderr))
}
tok := strings.TrimSpace(stdout)
if tok == "" {
return "", fmt.Errorf("coold token on %s is empty — is coold installed? (expected at %s)",
host, CooldAPITokenPath)
}
return tok, nil
}
// cooldRulePayload mirrors the JSON shape coold's REST API expects on POST
// and returns on GET /allow. Kept aligned with coold/src/firewall/rule.rs:
// namespace is required (defaults to "default" on the wire), src/dst are
// string IPs, proto/port/id are omitted when absent.
type cooldRulePayload struct {
Namespace string `json:"namespace"`
Src string `json:"src"`
Dst string `json:"dst"`
Proto string `json:"proto,omitempty"`
Port uint16 `json:"port,omitempty"`
ID string `json:"id,omitempty"`
}
// toAllowRule converts a payload coming back from coold into the CLI's
// AllowRule. The host field is filled in by the caller (it is the mesh host
// the list came from, not part of the payload).
func (p cooldRulePayload) toAllowRule() (AllowRule, bool) {
src := net.ParseIP(p.Src)
dst := net.ParseIP(p.Dst)
if src == nil || dst == nil {
return AllowRule{}, false
}
ns := p.Namespace
if ns == "" {
ns = "default"
}
r := AllowRule{
Namespace: ns,
Src: src,
Dst: dst,
Proto: p.Proto,
Port: int(p.Port),
}
if p.ID != "" {
r.Comment = "cid:" + p.ID
}
return r, true
}
// allowRulePayload converts an AllowRule into the wire shape coold accepts.
// coold normalizes and computes the id itself, so we send only the tuple.
// Empty namespace is materialized as "default" on the wire so older coold
// builds with a default-only schema keep working.
func allowRulePayload(r AllowRule) cooldRulePayload {
ns := r.Namespace
if ns == "" {
ns = "default"
}
p := cooldRulePayload{
Namespace: ns,
Src: r.Src.String(),
Dst: r.Dst.String(),
Proto: r.Proto,
}
if r.Port > 0 {
p.Port = uint16(r.Port)
}
return p
}
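To make the wire shape concrete, the sketch below marshals a struct with the same fields and tags as `cooldRulePayload` and prints the JSON for a rule with and without `proto`/`port`. `rulePayload` and `encode` are local stand-ins written for this example; only the field tags are taken from the file above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rulePayload copies the field layout and JSON tags of cooldRulePayload.
type rulePayload struct {
	Namespace string `json:"namespace"`
	Src       string `json:"src"`
	Dst       string `json:"dst"`
	Proto     string `json:"proto,omitempty"`
	Port      uint16 `json:"port,omitempty"`
	ID        string `json:"id,omitempty"`
}

// encode returns the JSON text for p. json.Marshal cannot fail for a struct
// made only of strings and integers, so the error is dropped in this sketch.
func encode(p rulePayload) string {
	b, _ := json.Marshal(p)
	return string(b)
}

func main() {
	bare := rulePayload{Namespace: "default", Src: "10.0.0.1", Dst: "10.0.0.2"}
	fmt.Println(encode(bare))
	// {"namespace":"default","src":"10.0.0.1","dst":"10.0.0.2"}

	tcp80 := bare
	tcp80.Proto, tcp80.Port = "tcp", 80
	fmt.Println(encode(tcp80))
	// {"namespace":"default","src":"10.0.0.1","dst":"10.0.0.2","proto":"tcp","port":80}
}
```

Because `namespace` has no `omitempty`, it is always present on the wire, while a zero `port` and empty `proto` vanish entirely rather than being sent as `0` and `""`.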
// CooldApply POSTs r to coold's /allow endpoint on host. coold is reached
// via an SSH bounce: SSH into host, then curl coold on the host's own wg0
// mgmt IP. This is the transport of choice for the alpha because the CLI
// runs on a laptop that isn't a mesh peer, and only hosts inside the wg0
// network can reach coold.
func CooldApply(
ctx context.Context,
runner ssh.Runner,
host, user string,
sshPort, cooldPort int,
iface, token string,
r AllowRule,
) error {
body, err := json.Marshal(allowRulePayload(r))
if err != nil {
return fmt.Errorf("marshal allow rule: %w", err)
}
cmd := buildCurlAllow(iface, token, cooldPort, string(body))
if _, stderr, err := runner.Run(ctx, host, user, sshPort, cmd); err != nil {
return fmt.Errorf("coold apply on %s: %w (stderr: %s)",
host, err, strings.TrimSpace(stderr))
}
return nil
}
// CooldRevoke DELETEs rule id from coold on host. coold returns 204 even
// when the id is unknown, so missing rules are a silent no-op.
func CooldRevoke(
ctx context.Context,
runner ssh.Runner,
host, user string,
sshPort, cooldPort int,
iface, token, id string,
) error {
if id == "" {
return fmt.Errorf("coold revoke: empty id")
}
cmd := buildCurlRevoke(iface, token, cooldPort, id)
if _, stderr, err := runner.Run(ctx, host, user, sshPort, cmd); err != nil {
return fmt.Errorf("coold revoke on %s: %w (stderr: %s)",
host, err, strings.TrimSpace(stderr))
}
return nil
}
// CooldList GETs coold's /allow endpoint on host and returns the parsed
// rules. An empty namespace means "all namespaces"; a non-empty value is
// forwarded to coold as `?namespace=<ns>`. Missing coold (no wg0 interface)
// is treated as an empty slice so a partially-deployed mesh doesn't break
// `firewall list`.
func CooldList(
ctx context.Context,
runner ssh.Runner,
host, user string,
sshPort, cooldPort int,
iface, token, namespace string,
) ([]AllowRule, error) {
cmd := buildCurlList(iface, token, cooldPort, namespace)
stdout, stderr, err := runner.Run(ctx, host, user, sshPort, cmd)
if err != nil {
return nil, fmt.Errorf("coold list on %s: %w (stderr: %s)",
host, err, strings.TrimSpace(stderr))
}
stdout = strings.TrimSpace(stdout)
if stdout == "" {
return nil, nil
}
var payloads []cooldRulePayload
if err := json.Unmarshal([]byte(stdout), &payloads); err != nil {
return nil, fmt.Errorf("parse coold list on %s: %w (body: %s)",
host, err, stdout)
}
out := make([]AllowRule, 0, len(payloads))
for _, p := range payloads {
r, ok := p.toAllowRule()
if !ok {
continue
}
r.Host = host
out = append(out, r)
}
return out, nil
}
// CooldListAll fans CooldList across every host in parallel and returns a
// sorted, flattened slice plus the per-host results. tokenFor is called once
// per host on its worker goroutine; if it fails, the host surfaces as a
// ServerResult.Err instead of polluting the rule slice. An empty namespace
// omits the `?namespace=` query, so coold returns rules for every namespace.
func CooldListAll(
ctx context.Context,
runner ssh.Runner,
hosts []string,
user string,
sshPort, cooldPort int,
iface string,
tokenFor func(host string) (string, error),
concurrency int,
namespace string,
) ([]AllowRule, []ssh.ServerResult[[]AllowRule]) {
results := ssh.ForEachServer(ctx, hosts, concurrency,
func(ctx context.Context, host string) ([]AllowRule, error) {
token, err := tokenFor(host)
if err != nil {
return nil, err
}
return CooldList(ctx, runner, host, user, sshPort, cooldPort, iface, token, namespace)
})
var all []AllowRule
for _, r := range results {
all = append(all, r.Result...)
}
sort.Slice(all, func(i, j int) bool {
if all[i].Host != all[j].Host {
return all[i].Host < all[j].Host
}
if all[i].Namespace != all[j].Namespace {
return all[i].Namespace < all[j].Namespace
}
si, sj := all[i].Src.String(), all[j].Src.String()
if si != sj {
return si < sj
}
di, dj := all[i].Dst.String(), all[j].Dst.String()
if di != dj {
return di < dj
}
return all[i].Port < all[j].Port
})
return all, results
}
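The comparator above orders `Src` and `Dst` by their string form. That is deterministic, but it is lexicographic, so "10.0.0.10" sorts before "10.0.0.2". If numeric order were ever preferred for display, one option is to compare the 16-byte form of each address. The sketch below is illustrative only; `lessIP` and `sortIPs` are hypothetical helpers, not part of the package.

```go
package main

import (
	"bytes"
	"fmt"
	"net"
	"sort"
)

// lessIP orders addresses numerically by comparing their 16-byte forms.
func lessIP(a, b net.IP) bool {
	return bytes.Compare(a.To16(), b.To16()) < 0
}

// sortIPs parses the inputs, sorts them numerically, and returns their
// string forms. Unparseable inputs are skipped.
func sortIPs(in []string) []string {
	var ips []net.IP
	for _, s := range in {
		if ip := net.ParseIP(s); ip != nil {
			ips = append(ips, ip)
		}
	}
	sort.Slice(ips, func(i, j int) bool { return lessIP(ips[i], ips[j]) })
	out := make([]string, len(ips))
	for i, ip := range ips {
		out[i] = ip.String()
	}
	return out
}

func main() {
	for _, s := range sortIPs([]string{"10.0.0.10", "10.0.0.2", "10.0.0.1"}) {
		fmt.Println(s)
	}
}
```

Either ordering keeps the fanout output stable across runs; the numeric form only changes how neighbouring addresses are grouped for a human reader.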
// shellSingleQuote wraps s in POSIX-shell single quotes, escaping any
// embedded single quotes. Used to embed JSON bodies and tokens into shell
// commands without breaking quoting.
func shellSingleQuote(s string) string {
return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}
// DefaultWGInterface is the WireGuard interface name the firewall CLI
// assumes when no override is supplied. Matches the default of
// `coolify init --wg-interface`.
const DefaultWGInterface = "wg0"
// mgmtIPScript discovers coold's bind IP on the remote host by reading the
// IPv4 address on the host's WireGuard interface (the pipeline assumes the
// interface carries exactly one). Emitted as part of every curl command so
// the CLI doesn't need to track per-host mgmt IPs (they are already encoded
// in the host's own WG interface).
func mgmtIPScript(iface string) string {
return fmt.Sprintf(
`MGMT=$(ip -4 -o addr show %[1]s 2>/dev/null | awk '{print $4}' | cut -d/ -f1); `+
`test -n "$MGMT" || { echo "coold mgmt IP (%[1]s) not found on $(hostname) — is coold installed?" >&2; exit 1; }; `,
iface)
}
// mgmtIPScriptSoft is the same as mgmtIPScript but treats a missing WG
// interface as "no rules" rather than a failure. Used by list so a host
// without coold is simply absent from the output instead of aborting the
// whole fanout.
func mgmtIPScriptSoft(iface string) string {
return fmt.Sprintf(
`MGMT=$(ip -4 -o addr show %s 2>/dev/null | awk '{print $4}' | cut -d/ -f1); `+
`if [ -z "$MGMT" ]; then echo '[]'; exit 0; fi; `,
iface)
}
// buildCurlAllow returns the shell one-liner that POSTs body to coold.
// Token is embedded inline in the -H header; on the remote it is briefly
// visible in /proc/<curl-pid>/cmdline (readable by other local users unless
// /proc is mounted with hidepid) for the ~ms lifetime of the curl
// invocation. Acceptable for alpha; TLS + stdin-fed tokens are a follow-up.
func buildCurlAllow(iface, token string, port int, body string) string {
return mgmtIPScript(iface) +
`curl -fsS --max-time 10 ` +
`-H ` + shellSingleQuote("Authorization: Bearer "+token) + ` ` +
`-H 'Content-Type: application/json' ` +
`-X POST -d ` + shellSingleQuote(body) + ` ` +
fmt.Sprintf(`"http://$MGMT:%d%s/allow"`, port, CooldAPIBasePath)
}
// buildCurlRevoke returns the shell one-liner that DELETEs rule id.
func buildCurlRevoke(iface, token string, port int, id string) string {
return mgmtIPScript(iface) +
`curl -fsS --max-time 10 -o /dev/null ` +
`-H ` + shellSingleQuote("Authorization: Bearer "+token) + ` ` +
`-X DELETE ` +
fmt.Sprintf(`"http://$MGMT:%d%s/allow/%s"`, port, CooldAPIBasePath, id)
}
// buildCurlList returns the shell one-liner that GETs /allow. A missing
// WG interface returns an empty JSON array so the caller sees "no rules"
// instead of a transport error. A non-empty namespace is forwarded as
// ?namespace=<ns>.
func buildCurlList(iface, token string, port int, namespace string) string {
query := ""
if namespace != "" {
query = "?namespace=" + namespace
}
return mgmtIPScriptSoft(iface) +
`curl -fsS --max-time 10 ` +
`-H ` + shellSingleQuote("Authorization: Bearer "+token) + ` ` +
fmt.Sprintf(`"http://$MGMT:%d%s/allow%s"`, port, CooldAPIBasePath, query)
}
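`buildCurlList` places the namespace directly inside a double-quoted URL. Namespaces appear to be validated elsewhere (the plan tests reject "Not Valid"), so this may never matter in practice, but a defensive variant could percent-encode the value first. The sketch below shows that idea with `url.QueryEscape` from the standard library. `listPath` is a hypothetical helper, and `basePath` simply repeats the value of `CooldAPIBasePath`.

```go
package main

import (
	"fmt"
	"net/url"
)

// basePath repeats the value of CooldAPIBasePath from the file above.
const basePath = "/api/v1/firewall"

// listPath builds the path and query for GET /allow. The namespace is
// percent-encoded, so the result contains only letters, digits, and
// "-_.~%+", all of which are inert inside a double-quoted shell string.
func listPath(namespace string) string {
	if namespace == "" {
		return basePath + "/allow"
	}
	return basePath + "/allow?namespace=" + url.QueryEscape(namespace)
}

func main() {
	fmt.Println(listPath(""))
	fmt.Println(listPath("alpha"))
	fmt.Println(listPath(`a"b $(x)`))
}
```

For an ordinary namespace such as `alpha` the output is identical to the current code, so the existing `TestBuildCurlList_WithNamespace` expectation would still hold under this variant.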
-200
@@ -1,200 +0,0 @@
package firewall
import (
"context"
"net"
"strings"
"sync"
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"github.com/coollabsio/coolify-cli/internal/ssh"
)
// fakeCooldRunner is a minimal Runner for client-level tests. It captures
// every command and replies based on substring-matched canned responses.
// mu guards calls against concurrent appends from ForEachServer's parallel
// goroutines.
type fakeCooldRunner struct {
mu sync.Mutex
responses map[string]string
calls []string
}
func (f *fakeCooldRunner) Run(_ context.Context, _, _ string, _ int, cmd string) (string, string, error) {
f.mu.Lock()
f.calls = append(f.calls, cmd)
f.mu.Unlock()
for sub, resp := range f.responses {
if strings.Contains(cmd, sub) {
return resp, "", nil
}
}
return "", "", nil
}
var _ ssh.Runner = (*fakeCooldRunner)(nil)
func TestShellSingleQuote_Escapes(t *testing.T) {
assert.Equal(t, `'plain'`, shellSingleQuote("plain"))
assert.Equal(t, `'it'\''s'`, shellSingleQuote("it's"))
}
func TestBuildCurlAllow_Shape(t *testing.T) {
cmd := buildCurlAllow("wg0", "tok-xyz", 8443, `{"src":"10.0.0.1","dst":"10.0.0.2"}`)
assert.Contains(t, cmd, "ip -4 -o addr show wg0")
assert.Contains(t, cmd, "curl -fsS")
assert.Contains(t, cmd, "Authorization: Bearer tok-xyz")
assert.Contains(t, cmd, "Content-Type: application/json")
assert.Contains(t, cmd, "-X POST")
assert.Contains(t, cmd, `{"src":"10.0.0.1","dst":"10.0.0.2"}`)
assert.Contains(t, cmd, `:8443/api/v1/firewall/allow`)
}
func TestBuildCurlRevoke_Shape(t *testing.T) {
cmd := buildCurlRevoke("wg0", "tok-xyz", 8443, "abc123def456")
assert.Contains(t, cmd, "curl -fsS")
assert.Contains(t, cmd, "-X DELETE")
assert.Contains(t, cmd, "Authorization: Bearer tok-xyz")
assert.Contains(t, cmd, `:8443/api/v1/firewall/allow/abc123def456`)
}
func TestBuildCurlList_SoftMgmtIP(t *testing.T) {
cmd := buildCurlList("wg0", "tok-xyz", 8443, "")
// Missing wg0 yields an empty array and success exit.
assert.Contains(t, cmd, `echo '[]'; exit 0`)
assert.Contains(t, cmd, "Authorization: Bearer tok-xyz")
assert.Contains(t, cmd, `:8443/api/v1/firewall/allow`)
// Empty namespace → no query string.
assert.NotContains(t, cmd, "namespace=")
}
// TestBuildCurlList_WithNamespace verifies that a non-empty namespace is
// forwarded as ?namespace=<ns> so coold can filter on its side.
func TestBuildCurlList_WithNamespace(t *testing.T) {
cmd := buildCurlList("wg0", "tok-xyz", 8443, "alpha")
assert.Contains(t, cmd, `:8443/api/v1/firewall/allow?namespace=alpha`)
}
func TestCooldApply_SendsJSONPayload(t *testing.T) {
fr := &fakeCooldRunner{}
r := AllowRule{
Src: net.ParseIP("10.0.0.1"), Dst: net.ParseIP("10.0.0.2"),
Proto: "tcp", Port: 80,
}
err := CooldApply(context.Background(), fr, "h1", "root", 22, 8443, "wg0", "t", r)
require.NoError(t, err)
assert.Len(t, fr.calls, 1)
assert.Contains(t, fr.calls[0], `"src":"10.0.0.1"`)
assert.Contains(t, fr.calls[0], `"dst":"10.0.0.2"`)
assert.Contains(t, fr.calls[0], `"proto":"tcp"`)
assert.Contains(t, fr.calls[0], `"port":80`)
}
func TestCooldApply_OmitsProtoWhenEmpty(t *testing.T) {
fr := &fakeCooldRunner{}
r := AllowRule{
Src: net.ParseIP("10.0.0.1"), Dst: net.ParseIP("10.0.0.2"),
}
err := CooldApply(context.Background(), fr, "h1", "root", 22, 8443, "wg0", "t", r)
require.NoError(t, err)
// omitempty drops zero port and empty proto — avoids tripping coold's
// "port requires proto" validation.
assert.NotContains(t, fr.calls[0], `"proto"`)
assert.NotContains(t, fr.calls[0], `"port"`)
}
func TestCooldRevoke_RejectsEmptyID(t *testing.T) {
fr := &fakeCooldRunner{}
err := CooldRevoke(context.Background(), fr, "h1", "root", 22, 8443, "wg0", "t", "")
require.Error(t, err)
assert.Empty(t, fr.calls, "no SSH call for empty id")
}
func TestCooldList_ParsesJSON(t *testing.T) {
fr := &fakeCooldRunner{responses: map[string]string{
"/api/v1/firewall/allow": `[
{"src":"10.0.0.1","dst":"10.0.0.2","proto":"tcp","port":80,"id":"abc123def456"},
{"src":"10.0.0.3","dst":"10.0.0.4"}
]`,
}}
rules, err := CooldList(context.Background(), fr, "h1", "root", 22, 8443, "wg0", "t", "")
require.NoError(t, err)
assert.Len(t, rules, 2)
assert.Equal(t, "h1", rules[0].Host)
assert.Equal(t, "cid:abc123def456", rules[0].Comment)
assert.Equal(t, "tcp", rules[0].Proto)
assert.Equal(t, 80, rules[0].Port)
// Rule without proto/port/id comes through with zero values, no cid.
assert.Empty(t, rules[1].Proto)
assert.Equal(t, 0, rules[1].Port)
assert.Empty(t, rules[1].Comment)
}
func TestCooldList_EmptyBody(t *testing.T) {
fr := &fakeCooldRunner{}
rules, err := CooldList(context.Background(), fr, "h1", "root", 22, 8443, "wg0", "t", "")
require.NoError(t, err)
assert.Empty(t, rules)
}
func TestCooldListAll_SortsByHost(t *testing.T) {
// Fake returns the same JSON regardless of host; the sort guarantees the
// fanout output is stable across runs.
fr := &fakeCooldRunner{responses: map[string]string{
"/api/v1/firewall/allow": `[{"src":"10.0.0.1","dst":"10.0.0.2","proto":"tcp","port":80,"id":"aaa111111111"}]`,
}}
tokenFor := func(string) (string, error) { return "t", nil }
rules, results := CooldListAll(context.Background(), fr,
[]string{"hB", "hA"}, "root", 22, 8443, "wg0", tokenFor, 2, "")
assert.Len(t, rules, 2)
assert.Equal(t, "hA", rules[0].Host)
assert.Equal(t, "hB", rules[1].Host)
assert.Len(t, results, 2)
}
func TestFetchCooldToken_ReadsFile(t *testing.T) {
fr := &fakeCooldRunner{responses: map[string]string{
"/etc/coolify/api-token": "deadbeefcafe\n",
}}
tok, err := FetchCooldToken(context.Background(), fr, "h1", "root", 22)
require.NoError(t, err)
assert.Equal(t, "deadbeefcafe", tok)
}
func TestFetchCooldToken_EmptyErrors(t *testing.T) {
fr := &fakeCooldRunner{}
_, err := FetchCooldToken(context.Background(), fr, "h1", "root", 22)
require.Error(t, err)
assert.Contains(t, err.Error(), "is empty")
}
func TestCooldListAll_PropagatesTokenFetchError(t *testing.T) {
fr := &fakeCooldRunner{responses: map[string]string{
"/api/v1/firewall/allow": `[]`,
}}
tokenFor := func(h string) (string, error) {
if h == "hBad" {
return "", assertError("no token")
}
return "t", nil
}
_, results := CooldListAll(context.Background(), fr,
[]string{"hOk", "hBad"}, "root", 22, 8443, "wg0", tokenFor, 2, "")
var okCount, errCount int
for _, r := range results {
if r.Err != nil {
errCount++
} else {
okCount++
}
}
assert.Equal(t, 1, okCount)
assert.Equal(t, 1, errCount)
}
type assertError string
func (e assertError) Error() string { return string(e) }
-173
@@ -1,173 +0,0 @@
package firewall
import (
"context"
"fmt"
"net"
"sort"
"strings"
"github.com/coollabsio/coolify-cli/internal/ssh"
)
// Container is a single running podman container on one mesh host and one
// namespace (podman bridge network).
type Container struct {
Host string // SSH host the container runs on
Namespace string // mesh namespace (podman network is coolify-<ns>-mesh)
ID string // short (12-char) podman ID
Name string // podman container name
IP net.IP // IP on the coolify-<ns>-mesh bridge network
}
// discoverScript prints one `id|name|ip` line per running container on the
// target network. Piped through `podman inspect` to resolve the per-network
// IP because `podman ps` doesn't surface that directly. `|| true` keeps the
// script from erroring when podman is absent or the network has no members.
func discoverScript(networkName string) string {
return fmt.Sprintf(
`podman ps --filter network=%[1]s --format '{{.ID}}|{{.Names}}' 2>/dev/null | `+
`while IFS='|' read id name; do `+
` [ -z "$id" ] && continue; `+
` ip=$(podman inspect --format '{{(index .NetworkSettings.Networks %[2]q).IPAddress}}' "$id" 2>/dev/null); `+
` printf '%%s|%%s|%%s\n' "$id" "$name" "$ip"; `+
`done || true`,
networkName, networkName)
}
// ParseDiscoverLine parses one `id|name|ip` line from discoverScript.
// Returns ok == false when the line is blank or malformed; IDs longer than
// 12 characters are truncated to the short form.
func ParseDiscoverLine(line string) (id, name string, ip net.IP, ok bool) {
parts := strings.SplitN(strings.TrimSpace(line), "|", 3)
if len(parts) != 3 {
return "", "", nil, false
}
if parts[0] == "" || parts[1] == "" || parts[2] == "" {
return "", "", nil, false
}
ip = net.ParseIP(parts[2])
if ip == nil {
return "", "", nil, false
}
id = parts[0]
if len(id) > 12 {
id = id[:12]
}
return id, parts[1], ip, true
}
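A short walk-through of the parser on mixed input may help. The block below repeats the parsing logic under the local name `parseLine` so that it runs on its own, then feeds it one good line, one blank line, and two malformed lines; the sample container names and addresses are made up.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseLine repeats the logic of ParseDiscoverLine for a standalone demo.
func parseLine(line string) (id, name string, ip net.IP, ok bool) {
	parts := strings.SplitN(strings.TrimSpace(line), "|", 3)
	if len(parts) != 3 || parts[0] == "" || parts[1] == "" || parts[2] == "" {
		return "", "", nil, false
	}
	ip = net.ParseIP(parts[2])
	if ip == nil {
		return "", "", nil, false
	}
	id = parts[0]
	if len(id) > 12 {
		id = id[:12] // keep the short podman ID
	}
	return id, parts[1], ip, true
}

func main() {
	// One valid line, one blank line, one short line, one bad IP.
	stdout := "abcdef1234567890|web|10.210.0.10\n\nbroken|line\napi9|api|not-an-ip\n"
	for _, line := range strings.Split(stdout, "\n") {
		if id, name, ip, ok := parseLine(line); ok {
			fmt.Printf("%s %s %s\n", id, name, ip)
		}
	}
	// prints: abcdef123456 web 10.210.0.10
}
```

Skipping bad lines instead of returning an error means a single container without an address on the bridge does not hide the others on the same host.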
// DiscoverContainers SSHes into host and returns every container on
// networkName (the podman bridge backing namespace) with its bridge IP.
func DiscoverContainers(
ctx context.Context,
runner ssh.Runner,
host, user string,
port int,
namespace, networkName string,
) ([]Container, error) {
stdout, _, err := runner.Run(ctx, host, user, port, discoverScript(networkName))
if err != nil {
return nil, fmt.Errorf("discover containers on %s: %w", host, err)
}
var out []Container
for _, line := range strings.Split(stdout, "\n") {
id, name, ip, ok := ParseDiscoverLine(line)
if !ok {
continue
}
out = append(out, Container{
Host: host, Namespace: namespace,
ID: id, Name: name, IP: ip,
})
}
sort.Slice(out, func(i, j int) bool {
if out[i].Host != out[j].Host {
return out[i].Host < out[j].Host
}
if out[i].Namespace != out[j].Namespace {
return out[i].Namespace < out[j].Namespace
}
return out[i].Name < out[j].Name
})
return out, nil
}
// DiscoverAll runs DiscoverContainers across every host in parallel.
// Returns a flattened, sort-stable slice plus the per-host results so
// callers can surface partial failures.
func DiscoverAll(
ctx context.Context,
runner ssh.Runner,
hosts []string,
user string,
port int,
namespace, networkName string,
concurrency int,
) ([]Container, []ssh.ServerResult[[]Container]) {
results := ssh.ForEachServer(ctx, hosts, concurrency,
func(ctx context.Context, host string) ([]Container, error) {
return DiscoverContainers(ctx, runner, host, user, port, namespace, networkName)
})
var all []Container
for _, r := range results {
all = append(all, r.Result...)
}
sort.Slice(all, func(i, j int) bool {
if all[i].Host != all[j].Host {
return all[i].Host < all[j].Host
}
if all[i].Namespace != all[j].Namespace {
return all[i].Namespace < all[j].Namespace
}
return all[i].Name < all[j].Name
})
return all, results
}
// DiscoverAllNamespaces runs DiscoverAll for every (namespace, network) pair
// and merges the results. Used by `containers --all-namespaces` and by the
// allow/revoke resolver so references can be matched across every namespace
// the user might have set up on the mesh.
func DiscoverAllNamespaces(
ctx context.Context,
runner ssh.Runner,
hosts []string,
user string,
port int,
namespaces []string,
networkFor func(ns string) string,
concurrency int,
) ([]Container, []ssh.ServerResult[[]Container]) {
var (
all []Container
allResults []ssh.ServerResult[[]Container]
seenHosts = map[string]struct{}{}
)
for _, ns := range namespaces {
nsContainers, results := DiscoverAll(ctx, runner, hosts, user, port,
ns, networkFor(ns), concurrency)
all = append(all, nsContainers...)
for _, r := range results {
// Keep only the first error per host to avoid N-duplicate warnings
// (most errors — SSH failures — are host-level, not per-namespace).
if r.Err == nil {
continue
}
if _, ok := seenHosts[r.Host]; ok {
continue
}
seenHosts[r.Host] = struct{}{}
allResults = append(allResults, r)
}
}
sort.Slice(all, func(i, j int) bool {
if all[i].Host != all[j].Host {
return all[i].Host < all[j].Host
}
if all[i].Namespace != all[j].Namespace {
return all[i].Namespace < all[j].Namespace
}
return all[i].Name < all[j].Name
})
return all, allResults
}
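The loop in `DiscoverAllNamespaces` keeps only the first failed result per host, and successful results are dropped from the returned slice altogether. The sketch below isolates that de-duplication step. `hostResult`, `fail`, and `firstErrorPerHost` are simplified stand-ins invented for the example; the real code works on `ssh.ServerResult[[]Container]`.

```go
package main

import (
	"errors"
	"fmt"
)

// hostResult is a simplified stand-in for ssh.ServerResult.
type hostResult struct {
	Host string
	Err  error
}

// fail builds a failed result; a small helper for the demo.
func fail(host, msg string) hostResult {
	return hostResult{Host: host, Err: errors.New(msg)}
}

// firstErrorPerHost returns the first failed result seen for each host, in
// encounter order. Successful results are not included.
func firstErrorPerHost(batches ...[]hostResult) []hostResult {
	seen := map[string]struct{}{}
	var out []hostResult
	for _, batch := range batches {
		for _, r := range batch {
			if r.Err == nil {
				continue
			}
			if _, dup := seen[r.Host]; dup {
				continue
			}
			seen[r.Host] = struct{}{}
			out = append(out, r)
		}
	}
	return out
}

func main() {
	// The same unreachable host fails once per namespace pass.
	nsA := []hostResult{{Host: "h1"}, fail("h2", "ssh: connection refused")}
	nsB := []hostResult{{Host: "h1"}, fail("h2", "ssh: connection refused")}
	for _, r := range firstErrorPerHost(nsA, nsB) {
		fmt.Printf("%s: %v\n", r.Host, r.Err)
	}
	// prints: h2: ssh: connection refused
}
```

The trade-off is that a host failing for different reasons in different namespaces reports only the first reason, which matches the source comment's observation that most errors are host-level.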
-129
@@ -1,129 +0,0 @@
package firewall
import (
"context"
"strings"
"sync"
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestParseDiscoverLine(t *testing.T) {
tests := []struct {
line string
wantOk bool
wantID string
wantNm string
wantIP string
}{
{"abcdef123456|web|10.210.0.10", true, "abcdef123456", "web", "10.210.0.10"},
{"abcdef1234567890|web|10.210.0.10", true, "abcdef123456", "web", "10.210.0.10"},
{"|name|10.0.0.1", false, "", "", ""},
{"id|name|", false, "", "", ""},
{"id|name|not-an-ip", false, "", "", ""},
{"", false, "", "", ""},
{"a|b", false, "", "", ""},
}
for _, tt := range tests {
t.Run(tt.line, func(t *testing.T) {
id, name, ip, ok := ParseDiscoverLine(tt.line)
assert.Equal(t, tt.wantOk, ok)
if !ok {
return
}
assert.Equal(t, tt.wantID, id)
assert.Equal(t, tt.wantNm, name)
assert.Equal(t, tt.wantIP, ip.String())
})
}
}
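// fakeRunner is a deterministic ssh.Runner for firewall tests. Responses
// map a command substring to its canned stdout; keep the substrings
// non-overlapping, because Go map iteration order is random and the first
// match wins. mu guards calls against concurrent appends from
// ForEachServer's parallel goroutines.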
type fakeRunner struct {
mu sync.Mutex
responses map[string]string
calls []string
}
func (f *fakeRunner) Run(_ context.Context, _, _ string, _ int, cmd string) (string, string, error) {
f.mu.Lock()
f.calls = append(f.calls, cmd)
f.mu.Unlock()
for sub, resp := range f.responses {
if strings.Contains(cmd, sub) {
return resp, "", nil
}
}
return "", "", nil
}
func TestDiscoverContainers(t *testing.T) {
r := &fakeRunner{responses: map[string]string{
"podman ps": "abc111111111|web|10.210.0.10\ndef222222222|api|10.210.0.11\n\n",
}}
got, err := DiscoverContainers(context.Background(), r, "h1", "root", 22,
"default", "coolify-default-mesh")
require.NoError(t, err)
assert.Len(t, got, 2)
assert.Equal(t, "api", got[0].Name) // sorted by name
assert.Equal(t, "web", got[1].Name)
assert.Equal(t, "h1", got[0].Host)
assert.Equal(t, "default", got[0].Namespace)
assert.Equal(t, "10.210.0.11", got[0].IP.String())
}
func TestDiscoverContainers_EmptyOutput(t *testing.T) {
r := &fakeRunner{responses: map[string]string{}}
got, err := DiscoverContainers(context.Background(), r, "h1", "root", 22,
"default", "coolify-default-mesh")
require.NoError(t, err)
assert.Empty(t, got)
}
func TestDiscoverContainers_BadLinesSkipped(t *testing.T) {
r := &fakeRunner{responses: map[string]string{
"podman ps": "abc111111111|web|10.210.0.10\ngarbage\n|noid|1.1.1.1\n",
}}
got, err := DiscoverContainers(context.Background(), r, "h1", "root", 22,
"default", "coolify-default-mesh")
require.NoError(t, err)
assert.Len(t, got, 1)
assert.Equal(t, "web", got[0].Name)
}
func TestDiscoverAll_Sorted(t *testing.T) {
r := &fakeRunner{responses: map[string]string{
"podman ps": "aaa111111111|x|10.210.0.10",
}}
all, perHost := DiscoverAll(context.Background(), r,
[]string{"h2", "h1"}, "root", 22,
"default", "coolify-default-mesh", 2)
assert.Len(t, all, 2)
assert.Equal(t, "h1", all[0].Host)
assert.Equal(t, "h2", all[1].Host)
assert.Equal(t, "default", all[0].Namespace)
assert.Len(t, perHost, 2)
}
// TestDiscoverAllNamespaces_MergesAcrossNamespaces verifies that the
// multi-namespace discover fanout emits containers for every (ns, host)
// pair and stamps them with the correct namespace.
func TestDiscoverAllNamespaces_MergesAcrossNamespaces(t *testing.T) {
r := &fakeRunner{responses: map[string]string{
// Same podman ps response for every namespace — we only care that the
// namespace label is applied correctly after parsing.
"podman ps": "aaa111111111|web|10.210.0.10",
}}
networkFor := func(ns string) string { return "coolify-" + ns + "-mesh" }
all, _ := DiscoverAllNamespaces(context.Background(), r,
[]string{"h1"}, "root", 22,
[]string{"default", "alpha"}, networkFor, 2)
assert.Len(t, all, 2)
// Sorted by host, then namespace — alpha before default.
assert.Equal(t, "alpha", all[0].Namespace)
assert.Equal(t, "default", all[1].Namespace)
}
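ParseDiscoverLine itself is defined outside this section. The sketch below is a hypothetical reimplementation that satisfies the table in TestParseDiscoverLine, assuming the input is `id|name|ip` and that IDs are truncated to 12 characters. Rejecting an empty name is an extra assumption the test table does not cover.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// parseDiscoverLine is a hypothetical parser for one "id|name|ip" line.
// It rejects lines with a missing field or an unparsable IP and shortens
// long container IDs to their first 12 characters.
func parseDiscoverLine(line string) (id, name string, ip net.IP, ok bool) {
	parts := strings.Split(strings.TrimSpace(line), "|")
	if len(parts) != 3 {
		return "", "", nil, false
	}
	id, name = parts[0], parts[1]
	ip = net.ParseIP(parts[2])
	if id == "" || name == "" || ip == nil {
		return "", "", nil, false
	}
	if len(id) > 12 {
		id = id[:12]
	}
	return id, name, ip, true
}

func main() {
	for _, line := range []string{
		"abcdef1234567890|web|10.210.0.10",
		"id|name|not-an-ip",
	} {
		id, name, ip, ok := parseDiscoverLine(line)
		fmt.Println(id, name, ip, ok)
	}
}
```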
-56
@@ -1,56 +0,0 @@
// Package firewall implements the `coolify firewall` command logic: per-host
// container discovery (SSH+podman) and the SSH-bounced REST client that
// drives the coold agent's firewall surface on each mesh host.
//
// Rule-rendering and iptables IO live entirely in coold now (see the coold
// repo, `src/firewall/`). The CLI's job is to resolve endpoints, compute
// stable rule identities, and POST/DELETE/GET against coold over SSH. Rules
// go on the host that owns the destination IP, matching CONTROL_PLANE.md §3.
package firewall
import (
"crypto/sha256"
"encoding/hex"
"fmt"
"net"
"strings"
)
// AllowRule is a single cross-host container allow entry.
//
// The rule lives on the host that owns Dst's container subnet (the default-
// deny jump fires on `-d <subnet> -j COOLIFY-INTRA`). Src may belong to any
// host in the mesh. Proto/Port are optional; zero values mean "any".
//
// Namespace qualifies the tuple so identical src/dst/proto/port pairs in
// different namespaces produce different rule IDs and are managed
// independently. Empty namespace is normalized to "default" at the transport
// boundary for legacy coold peers.
type AllowRule struct {
Host string // host that owns Dst's container subnet
Namespace string // e.g. "default", "alpha"
Src net.IP
Dst net.IP
Proto string // "tcp" | "udp" | ""
Port int // 0 = any
Comment string // "cid:<12-hex>" stable identity for list/revoke
}
// ComputeID returns a 12-hex stable identity hash over
// (namespace, src, dst, proto, port). Used as the rule comment so `list` can
// display it and `revoke --from ... --to ... --port ...` finds the right rule
// without needing to parse.
//
// Byte-compatible with coold's ComputeID_ (src/firewall/rule.rs): namespace
// defaults to "default" when empty, proto lowercased (empty when unset), port
// rendered as 0 when unset. Mixed writers (CLI + coold) produce identical IDs
// for identical tuples.
func ComputeID(namespace string, src, dst net.IP, proto string, port int) string {
if namespace == "" {
namespace = "default"
}
h := sha256.New()
fmt.Fprintf(h, "%s|%s|%s|%s|%d",
namespace, src.String(), dst.String(), strings.ToLower(proto), port)
return hex.EncodeToString(h.Sum(nil))[:12]
}
-45
@@ -1,45 +0,0 @@
package firewall
import (
"net"
"testing"
"github.com/stretchr/testify/assert"
)
func TestComputeID_Stable(t *testing.T) {
a := ComputeID("default", net.ParseIP("10.210.0.10"), net.ParseIP("10.210.1.10"), "tcp", 80)
b := ComputeID("default", net.ParseIP("10.210.0.10"), net.ParseIP("10.210.1.10"), "tcp", 80)
assert.Equal(t, a, b)
assert.Len(t, a, 12)
}
func TestComputeID_CaseInsensitiveProto(t *testing.T) {
a := ComputeID("default", net.ParseIP("1.1.1.1"), net.ParseIP("2.2.2.2"), "TCP", 80)
b := ComputeID("default", net.ParseIP("1.1.1.1"), net.ParseIP("2.2.2.2"), "tcp", 80)
assert.Equal(t, a, b)
}
func TestComputeID_DifferentInputsDifferent(t *testing.T) {
a := ComputeID("default", net.ParseIP("1.1.1.1"), net.ParseIP("2.2.2.2"), "tcp", 80)
b := ComputeID("default", net.ParseIP("1.1.1.1"), net.ParseIP("2.2.2.2"), "tcp", 443)
assert.NotEqual(t, a, b)
}
// TestComputeID_DifferentNamespacesDifferent verifies that identical
// src/dst/proto/port tuples in different namespaces produce different IDs —
// this is the whole point of per-namespace rule identity.
func TestComputeID_DifferentNamespacesDifferent(t *testing.T) {
a := ComputeID("default", net.ParseIP("10.0.0.1"), net.ParseIP("10.0.0.2"), "tcp", 80)
b := ComputeID("alpha", net.ParseIP("10.0.0.1"), net.ParseIP("10.0.0.2"), "tcp", 80)
assert.NotEqual(t, a, b)
}
// TestComputeID_EmptyNamespaceMatchesDefault guards the wire-compat rule:
// an empty namespace must hash the same as "default" so older coold builds
// and newer CLI callers agree on the same ID.
func TestComputeID_EmptyNamespaceMatchesDefault(t *testing.T) {
empty := ComputeID("", net.ParseIP("10.0.0.1"), net.ParseIP("10.0.0.2"), "tcp", 80)
def := ComputeID("default", net.ParseIP("10.0.0.1"), net.ParseIP("10.0.0.2"), "tcp", 80)
assert.Equal(t, empty, def)
}
-39
@@ -1,39 +0,0 @@
package models
// ContainerRow is a table-friendly row for `coolify firewall containers`.
type ContainerRow struct {
Host string `json:"host"`
Namespace string `json:"namespace"`
ID string `json:"id"`
Name string `json:"name"`
IP string `json:"ip"`
}
// AllowRuleRow is a table-friendly row for `coolify firewall list`.
type AllowRuleRow struct {
Host string `json:"host"`
Namespace string `json:"namespace"`
ID string `json:"id"`
Src string `json:"src"`
Dst string `json:"dst"`
Proto string `json:"proto,omitempty"`
Port int `json:"port,omitempty"`
Comment string `json:"comment,omitempty"`
}
// FirewallContainersOutput is the JSON output for `firewall containers`.
type FirewallContainersOutput struct {
Containers []ContainerRow `json:"containers"`
Errors []string `json:"errors,omitempty"`
}
// FirewallListOutput is the JSON output for `firewall list`.
type FirewallListOutput struct {
Rules []AllowRuleRow `json:"rules"`
Errors []string `json:"errors,omitempty"`
}
// FirewallAllowOutput is the JSON output for `firewall allow` / `revoke`.
type FirewallAllowOutput struct {
Rules []AllowRuleRow `json:"rules"`
}
-48
@@ -1,48 +0,0 @@
package models
// PlanActionRow is a table-friendly row for the plan output.
type PlanActionRow struct {
Server string `json:"server"`
Action string `json:"action"`
Detail string `json:"detail"`
}
// PlanSkippedRow is a table-friendly row for actions the intent filter
// suppressed (shown in the plan preview so operators can see what would have
// run and why).
type PlanSkippedRow struct {
Server string `json:"server"`
Action string `json:"action"`
Reason string `json:"reason"`
}
// ApplyResultRow is a table-friendly row for the apply result output.
type ApplyResultRow struct {
Server string `json:"server"`
Action string `json:"action"`
Status string `json:"status"`
Detail string `json:"detail,omitempty"`
}
// VerifyResultRow is a table-friendly row for post-apply verification.
type VerifyResultRow struct {
Server string `json:"server"`
WireGuardIP string `json:"wireguard_ip"`
PeerCount int `json:"peer_count"`
Status string `json:"status"`
}
// PlanOutput is the structured JSON output for the plan command.
type PlanOutput struct {
Servers []string `json:"servers"`
Intent string `json:"intent,omitempty"`
Actions []PlanActionRow `json:"actions"`
Skipped []PlanSkippedRow `json:"skipped,omitempty"`
Warnings []string `json:"warnings,omitempty"`
}
// ApplyOutput is the structured JSON output for the apply command.
type ApplyOutput struct {
Results []ApplyResultRow `json:"results"`
Verified []VerifyResultRow `json:"verified"`
}
-9
@@ -59,15 +59,6 @@ func (s *ApplicationService) Delete(ctx context.Context, uuid string) error {
return nil
}
// DeletePreview deletes a preview deployment for an application
func (s *ApplicationService) DeletePreview(ctx context.Context, appUUID, prID string) error {
err := s.client.Delete(ctx, fmt.Sprintf("applications/%s/previews/%s", appUUID, prID))
if err != nil {
return fmt.Errorf("failed to delete preview %s for application %s: %w", prID, appUUID, err)
}
return nil
}
// Start starts an application (initiates deployment)
func (s *ApplicationService) Start(ctx context.Context, uuid string, force bool, instantDeploy bool) (*models.ApplicationLifecycleResponse, error) {
var resp models.ApplicationLifecycleResponse
-48
@@ -402,54 +402,6 @@ func TestApplicationService_Delete_Error(t *testing.T) {
assert.Contains(t, err.Error(), "failed to delete application")
}
func TestApplicationService_DeletePreview_Success(t *testing.T) {
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
assert.Equal(t, "/api/v1/applications/app-uuid-123/previews/42", r.URL.Path)
assert.Equal(t, "DELETE", r.Method)
assert.Equal(t, "Bearer test-token", r.Header.Get("Authorization"))
w.WriteHeader(http.StatusOK)
_, _ = w.Write([]byte(`{"message":"Preview deletion request queued."}`))
}))
defer server.Close()
client := api.NewClient(server.URL, "test-token")
svc := NewApplicationService(client)
err := svc.DeletePreview(context.Background(), "app-uuid-123", "42")
require.NoError(t, err)
}
func TestApplicationService_DeletePreview_NotFound(t *testing.T) {
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
w.WriteHeader(http.StatusNotFound)
_, _ = w.Write([]byte(`{"message":"Preview not found."}`))
}))
defer server.Close()
client := api.NewClient(server.URL, "test-token")
svc := NewApplicationService(client)
err := svc.DeletePreview(context.Background(), "app-uuid-123", "999")
require.Error(t, err)
assert.Contains(t, err.Error(), "failed to delete preview")
}
func TestApplicationService_DeletePreview_ServerError(t *testing.T) {
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
w.WriteHeader(http.StatusInternalServerError)
_, _ = w.Write([]byte(`{"message":"internal server error"}`))
}))
defer server.Close()
client := api.NewClient(server.URL, "test-token")
svc := NewApplicationService(client)
err := svc.DeletePreview(context.Background(), "app-uuid-123", "42")
require.Error(t, err)
assert.Contains(t, err.Error(), "failed to delete preview")
}
func TestApplicationService_Start(t *testing.T) {
deploymentUUID := "deploy-uuid-123"
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
-42
@@ -1,42 +0,0 @@
package services
import "fmt"
// BuilderWorkDir is the scratch root coold creates per-build subdirectories
// in when it dispatches a `BuildRequest`. Cleaned per-request by coold.
const BuilderWorkDir = "/var/lib/coolify-builder/work"
// BuilderBinaryPath is the path to the builder binary coold spawns as a
// short-lived subprocess under a `systemd-run --scope` transient unit. No
// long-running builder daemon exists on the host.
const BuilderBinaryPath = "/usr/local/bin/builder"
// BuilderInstallCommand returns a shell snippet that installs buildah + git
// (required by the builder pipeline), ensures the work directory exists,
// and downloads the builder binary from the GitHub release for the given
// version tag. The version tag should track the coold release — builder
// and coold ship from the same workspace.
func BuilderInstallCommand(version string) string {
return fmt.Sprintf(`set -e
DEBIAN_FRONTEND=noninteractive apt-get update -qq 2>/dev/null
DEBIAN_FRONTEND=noninteractive apt-get install -y \
-o Dpkg::Options::="--force-confold" \
buildah git ca-certificates 2>&1 >/dev/null
mkdir -p %[1]s
ARCH_RAW=$(uname -m)
case "$ARCH_RAW" in
x86_64) ARCH=amd64 ;;
aarch64) ARCH=arm64 ;;
*) echo "unsupported arch: $ARCH_RAW" >&2; exit 1 ;;
esac
URL="https://github.com/coollabsio/coold/releases/download/%[2]s/builder-linux-${ARCH}.tar.gz"
DLDIR=$(mktemp -d)
trap 'rm -rf "$DLDIR"' EXIT
curl -fsSL --retry 3 --max-time 120 -o "$DLDIR/builder.tar.gz" "$URL"
tar -xzf "$DLDIR/builder.tar.gz" -C "$DLDIR"
test -f "$DLDIR/builder" || { echo "builder binary not found in tarball" >&2; exit 1; }
install -m 0755 "$DLDIR/builder" %[3]s.tmp
mv %[3]s.tmp %[3]s
echo '%[2]s' > %[3]s.version`,
BuilderWorkDir, version, BuilderBinaryPath)
}
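BuilderInstallCommand relies on explicit argument indexes (`%[1]s`, `%[2]s`, `%[3]s`) so that each value is passed once and reused several times in the template. A reduced sketch of that technique, with a hypothetical `renderInstall` helper and a shortened script:

```go
package main

import "fmt"

// renderInstall is a hypothetical, shortened version of the install
// template. %[N]s selects the Nth argument, so binPath can appear several
// times without being repeated in the argument list.
func renderInstall(workDir, version, binPath string) string {
	return fmt.Sprintf(
		"mkdir -p %[1]s\n"+
			"install -m 0755 builder %[3]s.tmp\n"+
			"mv %[3]s.tmp %[3]s\n"+
			"echo '%[2]s' > %[3]s.version",
		workDir, version, binPath)
}

func main() {
	fmt.Println(renderInstall(
		"/var/lib/coolify-builder/work", "nightly", "/usr/local/bin/builder"))
}
```

CooldInstallCommand and CorrosionInstallCommand use plain `%s` and pass `version` twice instead; both styles yield the same kind of output.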
-188
@@ -1,188 +0,0 @@
package services
import (
"fmt"
"net"
"strings"
)
// DefaultCooldDNSZone is the DNS zone served by coold's embedded resolver.
// `.internal` is reserved by ICANN for private use (it is not one of the
// RFC 6761 special-use names), so it is safe from public-TLD collisions.
const DefaultCooldDNSZone = "coolify.internal"
// CooldAPIPort is the TCP port coold's firewall REST API binds on wg0.
const CooldAPIPort = 8443
// CooldAPITokenPath is the on-host path where coold reads the bearer token
// for the firewall REST API. The file is generated once by `coolify init
// apply --install-coold` (random 32-byte hex via `openssl rand`) and kept
// mode 0600.
const CooldAPITokenPath = "/etc/coolify/api-token" //nolint:gosec // filesystem path, not a credential
// CooldNamespace describes one namespace for coold's env var. coold's
// embedded DNS binds <BridgeGateway>:53 per namespace, and its sync loop
// iterates `Network` to discover containers.
type CooldNamespace struct {
Name string // e.g. "default", "alpha"
Network string // e.g. "coolify-default-mesh" — podman bridge name
BridgeGateway net.IP // the .1 of that namespace's per-host container subnet
}
// CooldNamespacesEnvValue renders the COOLD_NAMESPACES env value. Shape:
//
// default:coolify-default-mesh:10.210.0.1,alpha:coolify-alpha-mesh:10.220.0.1
//
// Triples are comma-separated; fields within a triple are colon-separated.
// Empty slice yields empty string so callers can omit the env var entirely.
func CooldNamespacesEnvValue(ns []CooldNamespace) string {
parts := make([]string, 0, len(ns))
for _, n := range ns {
parts = append(parts, fmt.Sprintf("%s:%s:%s", n.Name, n.Network, n.BridgeGateway))
}
return strings.Join(parts, ",")
}
// SchedulerConfig carries optional scheduler connectivity injected into the coold unit
// for non-central hosts. nil means no scheduler env vars are emitted.
type SchedulerConfig struct {
URL string // e.g. "http://100.64.0.1:6443"
JWTPath string // e.g. "/etc/coolify/host-jwt"
}
// BuilderConfig carries the builder-capability env vars coold needs when it
// spawns build subprocesses. nil means the capability is disabled and no
// COOLD_BUILDER_* env vars are emitted.
type BuilderConfig struct {
Capacity int // concurrent builds the host accepts; 0 falls back to 2
CPUQuota string // systemd CPUQuota per build scope; "" falls back to "200%"
MemoryMax string // systemd MemoryMax per build scope; "" falls back to "2G"
TimeoutSecs int // hard per-build timeout in seconds; 0 falls back to 1800
DenyNets []string // extra CIDRs to deny at systemd-run IPAddressDeny level
}
// CooldServiceUnitWithScheduler is like CooldServiceUnit but injects scheduler env
// vars when scheduler is non-nil and builder env vars when builder is non-nil.
// Used for non-central hosts after phase 4.
func CooldServiceUnitWithScheduler(mgmtIP net.IP, namespaces []CooldNamespace, scheduler *SchedulerConfig, builder *BuilderConfig) string {
return cooldServiceUnitInner(mgmtIP, namespaces, scheduler, builder)
}
// CooldServiceUnit renders the coold systemd unit without scheduler or builder
// env (phase-3 first install, before phase 5 rewrites the unit to inject
// scheduler settings).
func CooldServiceUnit(mgmtIP net.IP, namespaces []CooldNamespace) string {
return cooldServiceUnitInner(mgmtIP, namespaces, nil, nil)
}
func cooldServiceUnitInner(mgmtIP net.IP, namespaces []CooldNamespace, scheduler *SchedulerConfig, builder *BuilderConfig) string {
// Wants (not Requires) on corrosion: if corrosion crashes/restarts we want
// coold to stay up and retry — reconcile_once already backs off for 1s on
// error, so it self-heals once corrosion is back. Requires would cascade
// stop coold and leave it down until someone restarted it.
nsEnv := ""
if len(namespaces) > 0 {
nsEnv = fmt.Sprintf(`Environment=COOLD_NAMESPACES=%s
Environment=COOLD_DNS_ZONE=%s
`, CooldNamespacesEnvValue(namespaces), DefaultCooldDNSZone)
}
// Firewall REST API binds wg0-only (never a public interface) and requires
// a bearer token. Plain HTTP for alpha — TLS material is managed by the
// central Coolify control plane and will be wired in a follow-up.
apiEnv := fmt.Sprintf(`Environment=COOLD_API_BIND=%s:%d
Environment=COOLD_API_TOKEN_FILE=%s
`, mgmtIP, CooldAPIPort, CooldAPITokenPath)
schedulerEnv := ""
if scheduler != nil {
schedulerEnv = fmt.Sprintf(`Environment=COOLD_SCHEDULER_URL=%s
Environment=COOLD_HOST_JWT_PATH=%s
`, scheduler.URL, scheduler.JWTPath)
}
builderEnv := ""
builderPre := ""
if builder != nil {
capacity := builder.Capacity
if capacity <= 0 {
capacity = 2
}
cpuQuota := builder.CPUQuota
if cpuQuota == "" {
cpuQuota = "200%"
}
memoryMax := builder.MemoryMax
if memoryMax == "" {
memoryMax = "2G"
}
timeoutSecs := builder.TimeoutSecs
if timeoutSecs <= 0 {
timeoutSecs = 1800
}
denyNets := strings.Join(builder.DenyNets, ",")
builderEnv = fmt.Sprintf(`Environment=COOLD_BUILDER_ENABLED=true
Environment=COOLD_BUILDER_WORK_DIR=%s
Environment=COOLD_BUILDER_CAPACITY=%d
Environment=COOLD_BUILDER_CPU_QUOTA=%s
Environment=COOLD_BUILDER_MEMORY_MAX=%s
Environment=COOLD_BUILDER_TIMEOUT_SECS=%d
Environment=COOLD_BUILDER_BIN=%s
Environment=COOLD_BUILDER_DENY_NETS=%s
`, BuilderWorkDir, capacity, cpuQuota, memoryMax, timeoutSecs, BuilderBinaryPath, denyNets)
builderPre = fmt.Sprintf("ExecStartPre=/bin/mkdir -p %s\n", BuilderWorkDir)
}
return fmt.Sprintf(`[Unit]
Description=Coolify host agent
Wants=corrosion.service
After=corrosion.service network-online.target podman.socket coolify-mesh-fw.service
[Service]
Environment=COOLD_HOST_MGMT_IP=%s
%s%s%s%s%sExecStart=/usr/local/bin/coold
AmbientCapabilities=CAP_NET_BIND_SERVICE CAP_NET_ADMIN CAP_NET_RAW
Restart=on-failure
RestartSec=2s
[Install]
WantedBy=multi-user.target
`, mgmtIP, nsEnv, apiEnv, schedulerEnv, builderEnv, builderPre)
}
// CooldInstallCommand returns a shell snippet that downloads and installs coold
// from the GitHub release for the given version tag (e.g. "nightly", "v1.2.3").
// Architecture is auto-detected on the remote host via uname -m.
// The version tag is written to /usr/local/bin/coold.version after install.
func CooldInstallCommand(version string) string {
return fmt.Sprintf(`set -e
ARCH_RAW=$(uname -m)
case "$ARCH_RAW" in
x86_64) ARCH=amd64 ;;
aarch64) ARCH=arm64 ;;
*) echo "unsupported arch: $ARCH_RAW" >&2; exit 1 ;;
esac
URL="https://github.com/coollabsio/coold/releases/download/%s/coold-linux-${ARCH}.tar.gz"
DLDIR=$(mktemp -d)
trap 'rm -rf "$DLDIR"' EXIT
curl -fsSL --retry 3 --max-time 120 -o "$DLDIR/coold.tar.gz" "$URL"
tar -xzf "$DLDIR/coold.tar.gz" -C "$DLDIR"
test -f "$DLDIR/coold" || { echo "coold binary not found in tarball" >&2; exit 1; }
install -m 0755 "$DLDIR/coold" /usr/local/bin/coold.tmp
mv /usr/local/bin/coold.tmp /usr/local/bin/coold
echo '%s' > /usr/local/bin/coold.version`, version, version)
}
// EnsureCooldAPITokenCommand returns a shell snippet that creates the
// CooldAPITokenPath file with a random 32-byte hex token if it does not
// already exist. Idempotent: repeated runs preserve the existing token so
// clients already trusting it keep working.
func EnsureCooldAPITokenCommand() string {
return fmt.Sprintf(
`mkdir -p /etc/coolify && `+
`if [ ! -s %[1]s ]; then `+
`openssl rand -hex 32 > %[1]s.tmp && `+
`chmod 0600 %[1]s.tmp && `+
`mv %[1]s.tmp %[1]s; `+
`fi`,
CooldAPITokenPath,
)
}
-151
@@ -1,151 +0,0 @@
package services
import (
"net"
"strings"
"testing"
)
func TestCooldInstallCommand_SubstitutesVersion(t *testing.T) {
for _, version := range []string{"nightly", "v1.2.3"} {
cmd := CooldInstallCommand(version)
if !strings.Contains(cmd, version) {
t.Errorf("version %q not found in install command", version)
}
if !strings.Contains(cmd, "coollabsio/coold/releases/download/"+version) {
t.Errorf("release URL missing version %q in:\n%s", version, cmd)
}
if !strings.Contains(cmd, "/usr/local/bin/coold.version") {
t.Errorf("version marker write missing from install command")
}
}
}
func TestCooldInstallCommand_ArchDetection(t *testing.T) {
cmd := CooldInstallCommand("nightly")
for _, want := range []string{
"x86_64) ARCH=amd64",
"aarch64) ARCH=arm64",
"coold-linux-${ARCH}.tar.gz",
"install -m 0755",
} {
if !strings.Contains(cmd, want) {
t.Errorf("expected %q in install command:\n%s", want, cmd)
}
}
}
func TestCooldServiceUnit_EmbedsMgmtIPAndNamespaces(t *testing.T) {
namespaces := []CooldNamespace{
{Name: "default", Network: "coolify-default-mesh", BridgeGateway: net.ParseIP("10.210.7.1")},
{Name: "alpha", Network: "coolify-alpha-mesh", BridgeGateway: net.ParseIP("10.210.8.1")},
}
got := CooldServiceUnit(net.ParseIP("100.64.0.5"), namespaces)
for _, want := range []string{
"Environment=COOLD_HOST_MGMT_IP=100.64.0.5",
"Environment=COOLD_NAMESPACES=default:coolify-default-mesh:10.210.7.1,alpha:coolify-alpha-mesh:10.210.8.1",
"Environment=COOLD_DNS_ZONE=coolify.internal",
"Environment=COOLD_API_BIND=100.64.0.5:8443",
"Environment=COOLD_API_TOKEN_FILE=/etc/coolify/api-token",
"AmbientCapabilities=CAP_NET_BIND_SERVICE CAP_NET_ADMIN CAP_NET_RAW",
"Wants=corrosion.service",
"After=corrosion.service network-online.target podman.socket",
"ExecStart=/usr/local/bin/coold",
} {
if !strings.Contains(got, want) {
t.Errorf("unit missing %q:\n%s", want, got)
}
}
}
func TestCooldServiceUnit_EmptyNamespacesSkipsNamespaceEnv(t *testing.T) {
got := CooldServiceUnit(net.ParseIP("100.64.0.5"), nil)
if strings.Contains(got, "COOLD_NAMESPACES") {
t.Errorf("expected no namespace env when nil, got:\n%s", got)
}
if strings.Contains(got, "COOLD_DNS_ZONE") {
t.Errorf("expected no DNS zone env when nil, got:\n%s", got)
}
if !strings.Contains(got, "Environment=COOLD_HOST_MGMT_IP=100.64.0.5") {
t.Errorf("expected mgmt IP env, got:\n%s", got)
}
}
func TestCooldServiceUnit_EmitsBuilderEnvWhenConfigured(t *testing.T) {
builder := &BuilderConfig{
Capacity: 4,
CPUQuota: "400%",
MemoryMax: "4G",
TimeoutSecs: 900,
DenyNets: []string{"100.64.0.0/16", "10.210.0.0/16"},
}
got := CooldServiceUnitWithScheduler(
net.ParseIP("100.64.0.5"),
nil,
&SchedulerConfig{URL: "http://100.64.0.1:6443", JWTPath: "/etc/coolify/host-jwt"},
builder,
)
for _, want := range []string{
"Environment=COOLD_BUILDER_ENABLED=true",
"Environment=COOLD_BUILDER_CAPACITY=4",
"Environment=COOLD_BUILDER_CPU_QUOTA=400%",
"Environment=COOLD_BUILDER_MEMORY_MAX=4G",
"Environment=COOLD_BUILDER_TIMEOUT_SECS=900",
"Environment=COOLD_BUILDER_DENY_NETS=100.64.0.0/16,10.210.0.0/16",
} {
if !strings.Contains(got, want) {
t.Errorf("unit missing %q:\n%s", want, got)
}
}
}
func TestCooldServiceUnit_BuilderDefaultsWhenZero(t *testing.T) {
builder := &BuilderConfig{} // all zero values
got := CooldServiceUnitWithScheduler(
net.ParseIP("100.64.0.5"),
nil,
&SchedulerConfig{URL: "http://100.64.0.1:6443", JWTPath: "/etc/coolify/host-jwt"},
builder,
)
for _, want := range []string{
"Environment=COOLD_BUILDER_CAPACITY=2",
"Environment=COOLD_BUILDER_CPU_QUOTA=200%",
"Environment=COOLD_BUILDER_MEMORY_MAX=2G",
"Environment=COOLD_BUILDER_TIMEOUT_SECS=1800",
} {
if !strings.Contains(got, want) {
t.Errorf("unit missing default %q:\n%s", want, got)
}
}
}
func TestCooldServiceUnit_OmitsBuilderEnvWhenNil(t *testing.T) {
got := CooldServiceUnitWithScheduler(
net.ParseIP("100.64.0.5"),
nil,
&SchedulerConfig{URL: "http://100.64.0.1:6443", JWTPath: "/etc/coolify/host-jwt"},
nil,
)
if strings.Contains(got, "COOLD_BUILDER_") {
t.Errorf("expected no builder env when nil, got:\n%s", got)
}
}
func TestCooldNamespacesEnvValue_Triples(t *testing.T) {
ns := []CooldNamespace{
{Name: "default", Network: "coolify-default-mesh", BridgeGateway: net.ParseIP("10.210.0.1")},
{Name: "alpha", Network: "coolify-alpha-mesh", BridgeGateway: net.ParseIP("10.220.0.1")},
}
got := CooldNamespacesEnvValue(ns)
want := "default:coolify-default-mesh:10.210.0.1,alpha:coolify-alpha-mesh:10.220.0.1"
if got != want {
t.Errorf("got %q, want %q", got, want)
}
if CooldNamespacesEnvValue(nil) != "" {
t.Errorf("expected empty string for nil slice")
}
}
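For the consuming side of COOLD_NAMESPACES, a parser for the comma- and colon-separated triples might look like the sketch below. coold's real parser is not part of this section, so `namespace` and `parseNamespaces` are hypothetical names, and the strict three-field check is an assumption.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// namespace is a hypothetical mirror of CooldNamespace on the reading side.
type namespace struct {
	Name    string
	Network string
	Gateway net.IP
}

// parseNamespaces splits "name:network:gateway,..." into triples. An empty
// value yields no namespaces, matching the "omit the env var" case.
func parseNamespaces(v string) ([]namespace, error) {
	if v == "" {
		return nil, nil
	}
	var out []namespace
	for _, triple := range strings.Split(v, ",") {
		f := strings.Split(triple, ":")
		if len(f) != 3 {
			return nil, fmt.Errorf("malformed triple %q", triple)
		}
		ip := net.ParseIP(f[2])
		if ip == nil {
			return nil, fmt.Errorf("bad gateway IP in %q", triple)
		}
		out = append(out, namespace{Name: f[0], Network: f[1], Gateway: ip})
	}
	return out, nil
}

func main() {
	ns, err := parseNamespaces(
		"default:coolify-default-mesh:10.210.0.1,alpha:coolify-alpha-mesh:10.220.0.1")
	if err != nil {
		panic(err)
	}
	for _, n := range ns {
		fmt.Printf("%s -> %s (%s)\n", n.Name, n.Network, n.Gateway)
	}
}
```

Because fields are colon-separated, this shape only works unambiguously with IPv4 gateways; an IPv6 gateway address would introduce extra colons into the triple.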
-125
@@ -1,125 +0,0 @@
// Package services generates configuration for the v5 control-plane
// daemons installed by `coolify init` (corrosion + coold). All functions
// are pure: they emit bytes/strings and do no I/O.
package services
import (
"fmt"
"net"
"sort"
"strings"
)
// CoolifySchemaSQL is the Corrosion schema that coold's sync loop writes to.
//
// Every NOT NULL column MUST have a DEFAULT — corrosion's CR-SQLite backend
// rejects schemas missing defaults with "needs a default value for forward
// schema compatibility". Defaults are never surfaced at runtime because
// coold always provides every column on upsert.
//
// Columns:
// - container_name: globally unique DNS label. coold's embedded resolver
// answers <container_name>.coolify.internal → container_ip. Uniqueness
// is Coolify's responsibility at app-create time.
// - namespace: optional app-scoping key reserved for multi-tenant / per-app
// isolation (e.g. one podman network per namespace). Empty string in
// single-tenant deployments. Opaque DNS-safe string owned by Coolify.
// - state: raw podman container status (running, exited, stopped,
// restarting, paused, created, dead, configured, removing). Liveness.
// - health: podman HEALTHCHECK result. One of:
// "healthy", "unhealthy", "starting", "unknown". "unknown" when the
// container has no HEALTHCHECK declared. Readiness.
const CoolifySchemaSQL = `CREATE TABLE service_endpoints (
container_id TEXT NOT NULL DEFAULT '' PRIMARY KEY,
container_name TEXT NOT NULL DEFAULT '',
namespace TEXT NOT NULL DEFAULT '',
host_mgmt_ip TEXT NOT NULL DEFAULT '',
container_ip TEXT NOT NULL DEFAULT '',
state TEXT NOT NULL DEFAULT '',
health TEXT NOT NULL DEFAULT 'unknown',
updated_at INTEGER NOT NULL DEFAULT 0
);
`
// CorrosionConfigBytes renders /etc/corrosion/config.toml for a single host.
//
// bindAddr is this host's wg0 management IP — gossip is confined to the mesh
// (already encrypted by WireGuard, so plaintext=true is safe).
// peers are the mgmt IPs of all OTHER hosts; they are sorted lexically so the
// output is byte-stable across probe orderings (needed for sha256 drift check).
func CorrosionConfigBytes(bindAddr net.IP, gossipPort, apiPort int, peers []net.IP) []byte {
sorted := make([]string, 0, len(peers))
for _, p := range peers {
if p == nil {
continue
}
sorted = append(sorted, p.String())
}
sort.Strings(sorted)
var b strings.Builder
b.WriteString("# generated by coolify init — do not edit\n")
b.WriteString("[db]\n")
b.WriteString(`path = "/var/lib/corrosion/corrosion.db"` + "\n")
b.WriteString(`schema_paths = ["/etc/corrosion/schemas"]` + "\n")
b.WriteString("\n[gossip]\n")
fmt.Fprintf(&b, "addr = \"%s:%d\"\n", bindAddr, gossipPort)
b.WriteString("bootstrap = [")
for i, p := range sorted {
if i > 0 {
b.WriteString(", ")
}
fmt.Fprintf(&b, "\"%s:%d\"", p, gossipPort)
}
b.WriteString("]\n")
b.WriteString("plaintext = true\n")
b.WriteString("\n[api]\n")
fmt.Fprintf(&b, "addr = \"127.0.0.1:%d\"\n", apiPort)
b.WriteString("\n[admin]\n")
b.WriteString(`path = "/var/run/corrosion/admin.sock"` + "\n")
return []byte(b.String())
}
// CorrosionInstallCommand returns a shell snippet that downloads and installs
// corrosion from the GitHub release for the given version tag.
// Architecture is auto-detected on the remote host via uname -m.
// The version tag is written to /usr/local/bin/corrosion.version after install.
func CorrosionInstallCommand(version string) string {
return fmt.Sprintf(`set -e
ARCH_RAW=$(uname -m)
case "$ARCH_RAW" in
x86_64) ARCH=x86_64-unknown-linux-gnu ;;
aarch64) ARCH=aarch64-unknown-linux-gnu ;;
*) echo "unsupported arch: $ARCH_RAW" >&2; exit 1 ;;
esac
URL="https://github.com/coollabsio/corrosion/releases/download/%s/corrosion-${ARCH}.tar.gz"
DLDIR=$(mktemp -d)
trap 'rm -rf "$DLDIR"' EXIT
curl -fsSL --retry 3 --max-time 120 -o "$DLDIR/corrosion.tar.gz" "$URL"
tar -xzf "$DLDIR/corrosion.tar.gz" -C "$DLDIR"
test -f "$DLDIR/corrosion" || { echo "corrosion binary not found in tarball" >&2; exit 1; }
install -m 0755 "$DLDIR/corrosion" /usr/local/bin/corrosion.tmp
mv /usr/local/bin/corrosion.tmp /usr/local/bin/corrosion
echo '%s' > /usr/local/bin/corrosion.version`, version, version)
}
// CorrosionServiceUnit returns the systemd unit text for corrosion.
// Plain .service (not a template unit); iface is baked into the dependency.
func CorrosionServiceUnit(iface string) string {
return fmt.Sprintf(`[Unit]
Description=Corrosion agent
After=network-online.target wg-quick@%[1]s.service
Wants=network-online.target
Requires=wg-quick@%[1]s.service
[Service]
ExecStart=/usr/local/bin/corrosion agent --config /etc/corrosion/config.toml
Restart=on-failure
RestartSec=2s
StateDirectory=corrosion
WorkingDirectory=/var/lib/corrosion
[Install]
WantedBy=multi-user.target
`, iface)
}
-150
@@ -1,150 +0,0 @@
package services
import (
"crypto/sha256"
"encoding/hex"
"net"
"strings"
"testing"
)
func TestCorrosionInstallCommand_SubstitutesVersion(t *testing.T) {
for _, version := range []string{"nightly", "v1.2.3"} {
cmd := CorrosionInstallCommand(version)
if !strings.Contains(cmd, version) {
t.Errorf("version %q not found in install command", version)
}
if !strings.Contains(cmd, "coollabsio/corrosion/releases/download/"+version) {
t.Errorf("release URL missing version %q in:\n%s", version, cmd)
}
if !strings.Contains(cmd, "/usr/local/bin/corrosion.version") {
t.Errorf("version marker write missing from install command")
}
}
}
func TestCorrosionInstallCommand_ArchDetection(t *testing.T) {
cmd := CorrosionInstallCommand("nightly")
for _, want := range []string{
"x86_64) ARCH=x86_64-unknown-linux-gnu",
"aarch64) ARCH=aarch64-unknown-linux-gnu",
"corrosion-${ARCH}.tar.gz",
"install -m 0755",
} {
if !strings.Contains(cmd, want) {
t.Errorf("expected %q in install command:\n%s", want, cmd)
}
}
}
func TestCorrosionConfigBytes_GoldenThreeHost(t *testing.T) {
self := net.ParseIP("100.64.0.1")
peers := []net.IP{
net.ParseIP("100.64.0.3"),
net.ParseIP("100.64.0.2"), // intentionally unsorted
}
got := CorrosionConfigBytes(self, 8787, 8080, peers)
want := `# generated by coolify init do not edit
[db]
path = "/var/lib/corrosion/corrosion.db"
schema_paths = ["/etc/corrosion/schemas"]
[gossip]
addr = "100.64.0.1:8787"
bootstrap = ["100.64.0.2:8787", "100.64.0.3:8787"]
plaintext = true
[api]
addr = "127.0.0.1:8080"
[admin]
path = "/var/run/corrosion/admin.sock"
`
if string(got) != want {
t.Fatalf("config mismatch.\nWANT:\n%s\nGOT:\n%s", want, got)
}
}
func TestCorrosionConfigBytes_StableHashAcrossOrderings(t *testing.T) {
self := net.ParseIP("100.64.0.1")
peersA := []net.IP{net.ParseIP("100.64.0.2"), net.ParseIP("100.64.0.3")}
peersB := []net.IP{net.ParseIP("100.64.0.3"), net.ParseIP("100.64.0.2")}
a := CorrosionConfigBytes(self, 8787, 8080, peersA)
b := CorrosionConfigBytes(self, 8787, 8080, peersB)
hashA := sha256.Sum256(a)
hashB := sha256.Sum256(b)
if hex.EncodeToString(hashA[:]) != hex.EncodeToString(hashB[:]) {
t.Fatalf("hashes differ across peer orderings (sort broken):\nA=%x\nB=%x", hashA, hashB)
}
}
func TestCorrosionConfigBytes_EmptyPeers(t *testing.T) {
got := string(CorrosionConfigBytes(net.ParseIP("100.64.0.1"), 8787, 8080, nil))
if !strings.Contains(got, `bootstrap = []`) {
t.Fatalf("expected empty bootstrap array, got:\n%s", got)
}
}
func TestCoolifySchema_HasLivenessAndReadinessColumns(t *testing.T) {
for _, want := range []string{
"state TEXT NOT NULL DEFAULT ''",
"health TEXT NOT NULL DEFAULT 'unknown'",
} {
if !strings.Contains(CoolifySchemaSQL, want) {
t.Errorf("schema missing %q:\n%s", want, CoolifySchemaSQL)
}
}
if strings.Contains(CoolifySchemaSQL, "healthy") {
t.Errorf("schema still has removed `healthy` column:\n%s", CoolifySchemaSQL)
}
}
func TestCoolifySchema_HasContainerNameColumn(t *testing.T) {
// container_name is the DNS label coold's resolver queries on. Flat
// scheme: <container_name>.coolify.internal → container_ip. Coolify
// enforces global uniqueness.
want := "container_name TEXT NOT NULL DEFAULT ''"
if !strings.Contains(CoolifySchemaSQL, want) {
t.Errorf("schema missing %q:\n%s", want, CoolifySchemaSQL)
}
}
func TestCoolifySchema_HasNamespaceColumn(t *testing.T) {
// namespace is reserved for future per-app isolation / multi-tenant.
// Empty in single-tenant; populated when Coolify wants app scoping.
want := "namespace TEXT NOT NULL DEFAULT ''"
if !strings.Contains(CoolifySchemaSQL, want) {
t.Errorf("schema missing %q:\n%s", want, CoolifySchemaSQL)
}
}
func TestCoolifySchema_AllNotNullColumnsHaveDefault(t *testing.T) {
// CR-SQLite rejects any NOT NULL column missing a DEFAULT with
// "needs a default value for forward schema compatibility".
for _, line := range strings.Split(CoolifySchemaSQL, "\n") {
trimmed := strings.TrimSpace(line)
if !strings.Contains(trimmed, "NOT NULL") {
continue
}
if !strings.Contains(trimmed, "DEFAULT") {
t.Errorf("line missing DEFAULT (CR-SQLite would reject): %q", trimmed)
}
}
}
func TestCorrosionServiceUnit_ContainsInterface(t *testing.T) {
got := CorrosionServiceUnit("wg0")
for _, want := range []string{
"After=network-online.target wg-quick@wg0.service",
"Requires=wg-quick@wg0.service",
"ExecStart=/usr/local/bin/corrosion agent --config /etc/corrosion/config.toml",
} {
if !strings.Contains(got, want) {
t.Errorf("unit missing %q:\n%s", want, got)
}
}
}
-49
View File
@@ -1,49 +0,0 @@
package services
import (
"crypto/ecdsa"
"crypto/x509"
"encoding/pem"
"fmt"
"time"
"github.com/golang-jwt/jwt/v5"
)
// MintHostJWT creates a 1-year ES256 JWT signed with the EC P-256 private key.
//
// privKeyPEM must be PKCS8 EC PEM (produced by `openssl genpkey -algorithm EC
// -pkeyopt ec_paramgen_curve:P-256`). hostID becomes the `sub` claim; the
// scheduler uses it as the key into its host→stream registry.
//
// caps lists the capabilities this host is authorized to advertise in the
// coold Hello frame. Callers should always include "coold" (an empty list
// defaults to just "coold"); hosts that accept builds also carry "builder".
// The scheduler cross-checks the advertised Hello capability set against
// this claim and rejects streams that try to elevate.
func MintHostJWT(privKeyPEM []byte, hostID string, caps []string) (string, error) {
block, _ := pem.Decode(privKeyPEM)
if block == nil {
return "", fmt.Errorf("no PEM block found in private key")
}
raw, err := x509.ParsePKCS8PrivateKey(block.Bytes)
if err != nil {
return "", fmt.Errorf("parse PKCS8 private key: %w", err)
}
ecKey, ok := raw.(*ecdsa.PrivateKey)
if !ok {
return "", fmt.Errorf("expected EC private key, got %T", raw)
}
if len(caps) == 0 {
caps = []string{"coold"}
}
now := time.Now()
claims := jwt.MapClaims{
"sub": hostID,
"aud": "coold",
"caps": caps,
"iat": now.Unix(),
"exp": now.Add(365 * 24 * time.Hour).Unix(),
}
token := jwt.NewWithClaims(jwt.SigningMethodES256, claims)
return token.SignedString(ecKey)
}
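`MintHostJWT` expects a PKCS8-encoded EC P-256 key in PEM form. For tests that should not shell out to `openssl`, a key in the same format can be produced with the standard library. This is a sketch under that assumption; `generateTestKeyPEM` and `parseECKey` are hypothetical names, and `parseECKey` only mirrors the decoding steps shown above.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
)

// generateTestKeyPEM (hypothetical) returns a fresh P-256 key as PKCS8 PEM.
func generateTestKeyPEM() ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, fmt.Errorf("generate key: %w", err)
	}
	der, err := x509.MarshalPKCS8PrivateKey(key)
	if err != nil {
		return nil, fmt.Errorf("marshal PKCS8: %w", err)
	}
	return pem.EncodeToMemory(&pem.Block{Type: "PRIVATE KEY", Bytes: der}), nil
}

// parseECKey (hypothetical) repeats the PEM -> PKCS8 -> *ecdsa.PrivateKey
// steps that a signer needs before it can use the key.
func parseECKey(pemBytes []byte) (*ecdsa.PrivateKey, error) {
	block, _ := pem.Decode(pemBytes)
	if block == nil {
		return nil, errors.New("no PEM block found in private key")
	}
	raw, err := x509.ParsePKCS8PrivateKey(block.Bytes)
	if err != nil {
		return nil, fmt.Errorf("parse PKCS8 private key: %w", err)
	}
	key, ok := raw.(*ecdsa.PrivateKey)
	if !ok {
		return nil, fmt.Errorf("expected EC private key, got %T", raw)
	}
	return key, nil
}

func main() {
	pemBytes, err := generateTestKeyPEM()
	if err != nil {
		panic(err)
	}
	key, err := parseECKey(pemBytes)
	if err != nil {
		panic(err)
	}
	fmt.Println("curve:", key.Curve.Params().Name)
}
```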
-86
View File
@@ -1,86 +0,0 @@
package services
import "fmt"
// SchedulerGRPCPort is the TCP port the scheduler listens on. coold dials
// this port and carries both coold and builder traffic on the same stream;
// there is no longer a separate listener for builds.
const SchedulerGRPCPort = 6443
// SchedulerJWTPubPath is the on-host path where the scheduler reads the ES256 public key.
const SchedulerJWTPubPath = "/etc/coolify/jwt.pub"
// SchedulerJWTPrivPath is the on-central path for the EC private key (chmod 0600).
const SchedulerJWTPrivPath = "/etc/coolify/jwt.priv"
// HostJWTPath is the on-host path where coold reads its bearer JWT.
const HostJWTPath = "/etc/coolify/host-jwt"
// SchedulerUnixSocketPath is the on-host path of the scheduler's HTTP-over-UDS
// listener. The central-plane caller (Laravel) connects here. Access
// control is filesystem perms — see SchedulerServiceUnit.
const SchedulerUnixSocketPath = "/run/coolify/scheduler.sock"
// SchedulerServiceUnit returns the systemd unit text for scheduler.
//
// grpcBind is "ip:port" for the single gRPC listener (e.g. "100.64.0.1:6443").
// It binds on the central host's wg0 mgmt IP so the listener is unreachable
// outside the mesh.
//
// RuntimeDirectory=coolify creates /run/coolify owned by the scheduler user
// at unit start, which is where the UDS gets bound. Laravel group access
// is configured at deploy time via SCHEDULER_UNIX_SOCKET_GROUP once the
// PHP-FPM group is finalized; until then the socket stays 0600.
func SchedulerServiceUnit(grpcBind, jwtPubPath string) string {
return fmt.Sprintf(`[Unit]
Description=Coolify scheduler
After=network-online.target wg-quick@wg0.service
[Service]
RuntimeDirectory=coolify
RuntimeDirectoryMode=0750
Environment=SCHEDULER_GRPC_BIND=%s
Environment=SCHEDULER_UNIX_SOCKET_PATH=%s
Environment=SCHEDULER_JWT_PUBLIC_KEY_PATH=%s
ExecStart=/usr/local/bin/scheduler
Restart=on-failure
RestartSec=2s
[Install]
WantedBy=multi-user.target
`, grpcBind, SchedulerUnixSocketPath, jwtPubPath)
}
// SchedulerInstallCommand returns a shell snippet that downloads and installs
// scheduler from the GitHub release for the given version tag.
func SchedulerInstallCommand(version string) string {
return fmt.Sprintf(`set -e
ARCH_RAW=$(uname -m)
case "$ARCH_RAW" in
x86_64) ARCH=amd64 ;;
aarch64) ARCH=arm64 ;;
*) echo "unsupported arch: $ARCH_RAW" >&2; exit 1 ;;
esac
URL="https://github.com/coollabsio/coold/releases/download/%s/scheduler-linux-${ARCH}.tar.gz"
DLDIR=$(mktemp -d)
trap 'rm -rf "$DLDIR"' EXIT
curl -fsSL --retry 3 --max-time 120 -o "$DLDIR/scheduler.tar.gz" "$URL"
tar -xzf "$DLDIR/scheduler.tar.gz" -C "$DLDIR"
test -f "$DLDIR/scheduler" || { echo "scheduler binary not found in tarball" >&2; exit 1; }
install -m 0755 "$DLDIR/scheduler" /usr/local/bin/scheduler.tmp
mv /usr/local/bin/scheduler.tmp /usr/local/bin/scheduler
echo '%s' > /usr/local/bin/scheduler.version`, version, version)
}
// EnsureJWTKeypairCommand returns a shell snippet that generates an EC P-256
// keypair in PKCS8 format on the central host (idempotent).
func EnsureJWTKeypairCommand() string {
return `mkdir -p /etc/coolify && ` +
`if [ ! -f ` + SchedulerJWTPrivPath + ` ]; then ` +
`openssl genpkey -algorithm EC -pkeyopt ec_paramgen_curve:P-256 ` +
`-out ` + SchedulerJWTPrivPath + `.tmp 2>&1 && ` +
`chmod 0600 ` + SchedulerJWTPrivPath + `.tmp && ` +
`mv ` + SchedulerJWTPrivPath + `.tmp ` + SchedulerJWTPrivPath + ` && ` +
`openssl pkey -in ` + SchedulerJWTPrivPath + ` -pubout -out ` + SchedulerJWTPubPath + ` 2>&1 && ` +
`chmod 0644 ` + SchedulerJWTPubPath + `; fi`
}
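The install snippet maps `uname -m` output to the release asset suffix inside the shell `case` statement. The same mapping can be expressed in Go, for example to preview the download URL in a plan step. This is a sketch only; `schedulerArch` and `schedulerAssetURL` are hypothetical helpers, and the URL pattern is copied from `SchedulerInstallCommand`.

```go
package main

import "fmt"

// schedulerArch (hypothetical) mirrors the shell case statement: it maps
// `uname -m` output to the suffix used in scheduler release assets.
func schedulerArch(unameM string) (string, error) {
	switch unameM {
	case "x86_64":
		return "amd64", nil
	case "aarch64":
		return "arm64", nil
	default:
		return "", fmt.Errorf("unsupported arch: %s", unameM)
	}
}

// schedulerAssetURL (hypothetical) builds the release URL for a version tag.
func schedulerAssetURL(version, unameM string) (string, error) {
	arch, err := schedulerArch(unameM)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf(
		"https://github.com/coollabsio/coold/releases/download/%s/scheduler-linux-%s.tar.gz",
		version, arch), nil
}

func main() {
	url, err := schedulerAssetURL("nightly", "aarch64")
	fmt.Println(url, err)
	_, err = schedulerAssetURL("nightly", "riscv64")
	fmt.Println(err)
}
```

Note that corrosion assets use Rust target triples (`x86_64-unknown-linux-gnu`) while scheduler assets use Go-style names (`amd64`), so the two mappings are not interchangeable.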
-57
View File
@@ -1,57 +0,0 @@
package services
import (
"strings"
"testing"
)
func TestSchedulerInstallCommand_ContainsNewAssetName(t *testing.T) {
cmd := SchedulerInstallCommand("nightly")
for _, want := range []string{
"scheduler-linux-${ARCH}.tar.gz",
"/usr/local/bin/scheduler",
"nightly",
} {
if !strings.Contains(cmd, want) {
t.Errorf("SchedulerInstallCommand missing %q", want)
}
}
if strings.Contains(cmd, "coolify-scheduler") {
t.Error("SchedulerInstallCommand still contains old name 'coolify-scheduler'")
}
}
func TestSchedulerInstallCommand_VersionTagEmbedded(t *testing.T) {
cmd := SchedulerInstallCommand("v1.2.3")
if !strings.Contains(cmd, "v1.2.3") {
t.Error("SchedulerInstallCommand missing version tag in URL and version file write")
}
}
func TestSchedulerServiceUnit_ExecStartPath(t *testing.T) {
unit := SchedulerServiceUnit("100.64.0.1:6443", SchedulerJWTPubPath)
if !strings.Contains(unit, "ExecStart=/usr/local/bin/scheduler") {
t.Error("SchedulerServiceUnit ExecStart does not point to /usr/local/bin/scheduler")
}
if strings.Contains(unit, "coolify-scheduler") {
t.Error("SchedulerServiceUnit still contains old name 'coolify-scheduler'")
}
if strings.Contains(unit, "BUILDER_GRPC_BIND") {
t.Error("SchedulerServiceUnit still emits SCHEDULER_BUILDER_GRPC_BIND; builder port was removed")
}
if strings.Contains(unit, "SCHEDULER_REDIS_URL") || strings.Contains(unit, "redis") {
t.Error("SchedulerServiceUnit still references Redis; UDS migration should have dropped it")
}
for _, want := range []string{
"SCHEDULER_GRPC_BIND=100.64.0.1:6443",
"SCHEDULER_UNIX_SOCKET_PATH=" + SchedulerUnixSocketPath,
"RuntimeDirectory=coolify",
SchedulerJWTPubPath,
} {
if !strings.Contains(unit, want) {
t.Errorf("SchedulerServiceUnit missing %q", want)
}
}
}
-213
View File
@@ -1,213 +0,0 @@
// Package ssh provides a thin SSH client and parallel fanout helper
// for the coolify init mesh-bootstrap commands.
package ssh
import (
"bytes"
"context"
"fmt"
"net"
"os"
"strconv"
"time"
gossh "golang.org/x/crypto/ssh"
)
// Runner executes a shell command on a remote host and returns its
// stdout, stderr, and exit error. It is an interface so tests can
// inject a fake implementation without opening real SSH connections.
type Runner interface {
Run(ctx context.Context, host, user string, port int, cmd string) (stdout, stderr string, err error)
}
// FileUploader streams a local file to a remote path via a single SSH
// session. Kept separate from Runner so existing Runner mocks stay valid.
type FileUploader interface {
UploadFile(ctx context.Context, host, user string, port int, localPath, remotePath string, mode os.FileMode) error
}
// Client implements Runner using the golang.org/x/crypto/ssh library.
// Keys are PEM files; encrypted keys need the passphrase passed to NewClient.
// NOTE: host-key verification is intentionally disabled in v1 (alpha).
// This is acceptable for a bootstrap tool in controlled environments
// and should be improved in a future release.
type Client struct {
signer gossh.Signer
timeout time.Duration
}
// NewClient loads the private key at keyPath and returns a Client ready to
// SSH into hosts. If passphrase is non-nil it is used to decrypt the key;
// pass nil for unencrypted keys.
func NewClient(keyPath string, passphrase []byte, timeout time.Duration) (*Client, error) {
raw, err := os.ReadFile(keyPath)
if err != nil {
return nil, fmt.Errorf("read SSH key %q: %w", keyPath, err)
}
var signer gossh.Signer
if len(passphrase) > 0 {
signer, err = gossh.ParsePrivateKeyWithPassphrase(raw, passphrase)
} else {
signer, err = gossh.ParsePrivateKey(raw)
}
if err != nil {
// Give the user an actionable hint when the key is passphrase-protected.
if isPassphraseError(err) {
return nil, fmt.Errorf("SSH key %q is passphrase-protected — use --ssh-passphrase-prompt or set COOLIFY_SSH_PASSPHRASE: %w", keyPath, err)
}
return nil, fmt.Errorf("parse SSH key %q: %w", keyPath, err)
}
return &Client{
signer: signer,
timeout: timeout,
}, nil
}
// isPassphraseError returns true when err is the "passphrase protected" error
// returned by golang.org/x/crypto/ssh.
func isPassphraseError(err error) bool {
if err == nil {
return false
}
msg := err.Error()
return contains(msg, "passphrase") || contains(msg, "encrypted")
}
// contains reports whether sub occurs in s. Unlike strings.Contains, it
// returns false for an empty sub.
func contains(s, sub string) bool {
if len(sub) == 0 {
return false
}
for i := 0; i+len(sub) <= len(s); i++ {
if s[i:i+len(sub)] == sub {
return true
}
}
return false
}
// dial opens an SSH connection to host:port as user and returns it. Caller
// owns Close(). Shared by Run and UploadFile so host-key/timeout behaviour
// stays identical across commands and file transfers.
func (c *Client) dial(ctx context.Context, host, user string, port int) (*gossh.Client, error) {
cfg := &gossh.ClientConfig{
User: user,
Auth: []gossh.AuthMethod{gossh.PublicKeys(c.signer)},
HostKeyCallback: gossh.InsecureIgnoreHostKey(), //nolint:gosec // alpha v1, documented limitation
Timeout: c.timeout,
}
addr := net.JoinHostPort(host, strconv.Itoa(port))
dialer := &net.Dialer{Timeout: c.timeout}
netConn, err := dialer.DialContext(ctx, "tcp", addr)
if err != nil {
return nil, fmt.Errorf("dial %s: %w", addr, err)
}
sshConn, chans, reqs, err := gossh.NewClientConn(netConn, addr, cfg)
if err != nil {
_ = netConn.Close()
return nil, fmt.Errorf("SSH handshake %s: %w", addr, err)
}
return gossh.NewClient(sshConn, chans, reqs), nil
}
// Run connects to host:port over SSH as user, executes cmd, and returns
// its stdout and stderr (captured separately) plus any error. The connection
// is closed when the command finishes or ctx is cancelled.
func (c *Client) Run(ctx context.Context, host, user string, port int, cmd string) (string, string, error) {
conn, err := c.dial(ctx, host, user, port)
if err != nil {
return "", "", err
}
defer conn.Close()
addr := net.JoinHostPort(host, strconv.Itoa(port))
sess, err := conn.NewSession()
if err != nil {
return "", "", fmt.Errorf("SSH new session on %s: %w", addr, err)
}
defer sess.Close()
var stdout, stderr bytes.Buffer
sess.Stdout = &stdout
sess.Stderr = &stderr
if err := sess.Start(cmd); err != nil {
return "", "", fmt.Errorf("SSH start on %s: %w", addr, err)
}
waitDone := make(chan error, 1)
go func() { waitDone <- sess.Wait() }()
select {
case <-ctx.Done():
// Best-effort signal; ignore error since we're already cancelled.
_ = sess.Signal(gossh.SIGTERM)
return stdout.String(), stderr.String(), ctx.Err()
case runErr := <-waitDone:
return stdout.String(), stderr.String(), runErr
}
}
// uploadShellCmd returns the remote command that atomically writes stdin
// to remotePath with the given mode. Exposed as a function so it can be
// unit-tested without opening an SSH connection.
func uploadShellCmd(remotePath string, mode os.FileMode) string {
return fmt.Sprintf(
`set -e; umask 077; mkdir -p "$(dirname %q)"; `+
`cat > %q.tmp.$$ && chmod %o %q.tmp.$$ && mv -f %q.tmp.$$ %q`,
remotePath, remotePath, mode.Perm(), remotePath, remotePath, remotePath)
}
// UploadFile streams localPath to remotePath on host via a single SSH
// session. The write is atomic: data lands in <remote>.tmp.$PID first and
// is renamed on success.
func (c *Client) UploadFile(ctx context.Context, host, user string, port int, localPath, remotePath string, mode os.FileMode) error {
f, err := os.Open(localPath)
if err != nil {
return fmt.Errorf("open %s: %w", localPath, err)
}
defer f.Close()
conn, err := c.dial(ctx, host, user, port)
if err != nil {
return err
}
defer conn.Close()
addr := net.JoinHostPort(host, strconv.Itoa(port))
sess, err := conn.NewSession()
if err != nil {
return fmt.Errorf("SSH new session on %s: %w", addr, err)
}
defer sess.Close()
var stderr bytes.Buffer
sess.Stdin = f
sess.Stderr = &stderr
if err := sess.Start(uploadShellCmd(remotePath, mode)); err != nil {
return fmt.Errorf("SSH upload start on %s: %w", addr, err)
}
waitDone := make(chan error, 1)
go func() { waitDone <- sess.Wait() }()
select {
case <-ctx.Done():
_ = sess.Signal(gossh.SIGTERM)
return ctx.Err()
case runErr := <-waitDone:
if runErr != nil {
return fmt.Errorf("upload %s -> %s: %w (stderr: %s)",
localPath, remotePath, runErr, bytes.TrimSpace(stderr.Bytes()))
}
return nil
}
}
-30
View File
@@ -1,30 +0,0 @@
package ssh
import (
"strings"
"testing"
)
func TestUploadShellCmd_AtomicWrite(t *testing.T) {
got := uploadShellCmd("/usr/local/bin/coold", 0o755)
for _, want := range []string{
`mkdir -p "$(dirname "/usr/local/bin/coold")"`,
`cat > "/usr/local/bin/coold".tmp.$$`,
`chmod 755 "/usr/local/bin/coold".tmp.$$`,
`mv -f "/usr/local/bin/coold".tmp.$$ "/usr/local/bin/coold"`,
`umask 077`,
`set -e`,
} {
if !strings.Contains(got, want) {
t.Errorf("upload cmd missing %q:\nGOT: %s", want, got)
}
}
}
func TestUploadShellCmd_ModeIsOctal(t *testing.T) {
got := uploadShellCmd("/x", 0o644)
if !strings.Contains(got, "chmod 644") {
t.Errorf("expected octal mode 644, got: %s", got)
}
}
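`uploadShellCmd` quotes paths with Go's `%q`, which produces a double-quoted Go string literal. Inside double quotes a POSIX shell still expands `$` and backticks, which is harmless for the fixed paths tested above. If arbitrary paths ever had to be supported, one alternative is single-quote escaping. `shellQuote` below is a hypothetical helper sketched for illustration, not something the package provides.

```go
package main

import (
	"fmt"
	"strings"
)

// shellQuote (hypothetical) wraps s in single quotes for a POSIX shell.
// An embedded single quote is written as '\'' : close the quoted string,
// emit an escaped quote, then reopen it.
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

func main() {
	fmt.Println(shellQuote("/opt/my app/$HOME"))
	fmt.Println(shellQuote("it's"))
}
```

Single quotes suppress all shell expansion, so the only character that needs special handling is the single quote itself.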
-55
View File
@@ -1,55 +0,0 @@
package ssh
import (
"context"
"sync"
)
// ServerResult holds the return value (or error) from running a function
// against a single server.
type ServerResult[T any] struct {
Host string
Result T
Err error
}
// ForEachServer runs fn concurrently on every host, honouring the
// concurrency limit. It always returns a result for every host (even on
// error) and never returns early — callers inspect each ServerResult.Err.
func ForEachServer[T any](
ctx context.Context,
hosts []string,
concurrency int,
fn func(ctx context.Context, host string) (T, error),
) []ServerResult[T] {
if concurrency <= 0 {
concurrency = 1
}
results := make([]ServerResult[T], len(hosts))
sem := make(chan struct{}, concurrency)
var wg sync.WaitGroup
for i, host := range hosts {
wg.Add(1)
go func(idx int, h string) {
defer wg.Done()
// Acquire semaphore slot.
select {
case sem <- struct{}{}:
defer func() { <-sem }()
case <-ctx.Done():
var zero T
results[idx] = ServerResult[T]{Host: h, Result: zero, Err: ctx.Err()}
return
}
res, err := fn(ctx, h)
results[idx] = ServerResult[T]{Host: h, Result: res, Err: err}
}(i, host)
}
wg.Wait()
return results
}
+1 -1
View File
@@ -15,7 +15,7 @@ import (
// Version variables injected by GoReleaser at build time via ldflags
var (
version = "v1.6.2"
version = "v1.6.0"
)
// GitHubAPIURL is the URL for fetching CLI version tags (exported for testing)
-804
View File
@@ -1,804 +0,0 @@
package wireguard
import (
"context"
"fmt"
"net"
"os"
"strings"
"github.com/coollabsio/coolify-cli/internal/services"
"github.com/coollabsio/coolify-cli/internal/ssh"
)
// ActionResult pairs a PlannedAction with its execution outcome.
type ActionResult struct {
Action PlannedAction
Err error
}
// VerifyResult holds the post-apply verification for one server.
type VerifyResult struct {
Host string
WireGuardIP net.IP
PeerCount int
Active bool
Err error
}
const aptInstallCmd = `DEBIAN_FRONTEND=noninteractive apt-get update -qq 2>/dev/null && ` +
`DEBIAN_FRONTEND=noninteractive apt-get install -y ` +
`-o Dpkg::Options::="--force-confold" ` +
`wireguard wireguard-tools 2>&1`
const podmanInstallCmd = `DEBIAN_FRONTEND=noninteractive apt-get update -qq 2>/dev/null && ` +
`DEBIAN_FRONTEND=noninteractive apt-get install -y ` +
`-o Dpkg::Options::="--force-confold" ` +
`podman 2>&1`
// enablePodmanSocketCmd ensures /run/podman/podman.sock exists via systemd
// socket activation. The socket is NEVER exposed on TCP — it stays a Unix
// socket on the host so the per-host coold agent can bind-mount it and
// proxy a curated REST API over wg0. See CONTROL_PLANE.md §2 + §12.
const enablePodmanSocketCmd = `systemctl enable --now podman.socket 2>&1`
const enableIPForwardCmd = `sysctl -w net.ipv4.ip_forward=1 && ` +
`mkdir -p /etc/sysctl.d && ` +
`echo 'net.ipv4.ip_forward=1' > /etc/sysctl.d/99-coolify-mesh.conf`
// podmanNetCreateCmd creates a per-namespace Podman bridge network. Idempotent:
// skips if the network already exists. The bridge gateway is MachineIP(subnet)
// (the .1 of the subnet).
//
// --disable-dns prevents netavark from starting aardvark-dns on the bridge
// gateway IP:53 — coold owns that socket for cluster-wide service discovery
// (see CONTROL_PLANE.md §5). Labels mark the network as ours + carry its
// namespace so `podman network inspect` drift checks can assert it.
func podmanNetCreateCmd(name, namespace string, subnet *net.IPNet, gateway net.IP) string {
return fmt.Sprintf(
`podman network exists %s 2>/dev/null && echo "network exists, skipping" || `+
`podman network create --driver bridge --disable-dns `+
`--label io.coolify.managed=true --label io.coolify.namespace=%s `+
`--subnet=%s --gateway=%s %s`,
name, namespace, subnet, gateway, name)
}
// podmanNetRecreateCmd drops and recreates a per-namespace Podman bridge
// network to clear drift (dns_enabled=true, subnet mismatch, missing label).
// Uses `rm -f`, which also stops and removes any containers still attached
// to the network.
func podmanNetRecreateCmd(name, namespace string, subnet *net.IPNet, gateway net.IP) string {
return fmt.Sprintf(
`podman network rm -f %s 2>&1 && `+
`podman network create --driver bridge --disable-dns `+
`--label io.coolify.managed=true --label io.coolify.namespace=%s `+
`--subnet=%s --gateway=%s %s`,
name, namespace, subnet, gateway, name)
}
// runStep executes a single shell command on a remote host, appends an
// ActionResult to out, and returns an error if the command failed.
func runStep(
ctx context.Context,
runner ssh.Runner,
host, user string,
port int,
out *[]ActionResult,
atype ActionType,
namespace, cmd, errFmt string,
) error {
stdout, stderr, err := runner.Run(ctx, host, user, port, cmd)
detail := ""
if err != nil {
detail = firstLine(stderr)
if detail == "" {
detail = firstLine(stdout)
}
if detail == "" {
detail = err.Error()
}
}
*out = append(*out, ActionResult{
Action: PlannedAction{Host: host, Namespace: namespace, Type: atype, Detail: detail},
Err: err,
})
if err != nil {
return fmt.Errorf(errFmt+": %w", err)
}
return nil
}
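// ApplyMesh executes the mesh convergence in up to five phases:
//
// - Phase 1 (per-server, parallel): install WG + Podman, generate keypair,
// enable podman socket + IP forwarding.
// - Re-probe to collect fresh public keys.
// - Phase 2 (per-server, parallel): write WG config, enable/reload service,
// create per-namespace Podman networks, install firewall service.
// - Phase 3 (per-server, parallel, only when InstallCoold is set): download +
// enable corrosion/coold.
// - Phase 4 (central host only, when CentralHost is set): install scheduler,
// generate JWT keypair.
// - Phase 5 (per-server, parallel, when CentralHost is set): mint host JWT,
// wire coold to the scheduler.
//
// A phase 1 failure aborts immediately; each later phase runs only if the
// preceding phases succeeded.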
func ApplyMesh(
ctx context.Context,
runner ssh.Runner,
user string,
port int,
desired *DesiredMesh,
current MeshState,
concurrency int,
) ([]ActionResult, error) {
var results []ActionResult
p1 := ssh.ForEachServer(ctx, desired.Hosts, concurrency,
func(ctx context.Context, host string) ([]ActionResult, error) {
return phase1Server(ctx, runner, host, user, port, desired, current)
})
phase1Failed := false
for _, r := range p1 {
results = append(results, r.Result...)
if r.Err != nil {
phase1Failed = true
}
}
if phase1Failed {
return results, fmt.Errorf("phase 1 (install/keygen) failed on one or more servers; aborting")
}
fresh, err := Reconstruct(ctx, runner, desired.Hosts, user, port,
desired.Interface, desired.Namespaces, concurrency)
if err != nil {
return results, fmt.Errorf("re-probe after phase 1: %w", err)
}
mgmtAssignments, _, err := AllocateMgmtIPs(desired.MgmtPool, fresh.AssignedMgmtIPs(), desired.Hosts)
if err != nil {
return results, fmt.Errorf("mgmt IP allocation: %w", err)
}
containerAssignments, _, err := AllocateNamespaced(desired.ContainerPool, desired.ContainerPrefix,
fresh.AssignedContainerSubnets(), desired.Namespaces, desired.Hosts)
if err != nil {
return results, fmt.Errorf("container subnet allocation: %w", err)
}
p2 := ssh.ForEachServer(ctx, desired.Hosts, concurrency,
func(ctx context.Context, host string) ([]ActionResult, error) {
return phase2Server(ctx, runner, host, user, port, desired, fresh, mgmtAssignments, containerAssignments)
})
for _, r := range p2 {
results = append(results, r.Result...)
if r.Err != nil {
err = fmt.Errorf("phase 2 failed on one or more servers")
}
}
if desired.InstallCoold && err == nil {
p3 := ssh.ForEachServer(ctx, desired.Hosts, concurrency,
func(ctx context.Context, host string) ([]ActionResult, error) {
return phase3Server(ctx, runner, host, user, port,
desired, fresh, mgmtAssignments, containerAssignments)
})
for _, r := range p3 {
results = append(results, r.Result...)
if r.Err != nil {
err = fmt.Errorf("phase 3 failed on one or more servers")
}
}
}
// Phase 4: central-only — install scheduler, generate JWT keypair.
if desired.CentralHost != "" && err == nil {
p4 := ssh.ForEachServer(ctx, []string{desired.CentralHost}, 1,
func(ctx context.Context, host string) ([]ActionResult, error) {
return phase4Central(ctx, runner, host, user, port, desired, mgmtAssignments)
})
for _, r := range p4 {
results = append(results, r.Result...)
if r.Err != nil {
err = fmt.Errorf("phase 4 (central scheduler setup) failed: %w", r.Err)
}
}
}
// Phase 5: per host (central included): mint JWT (with caps), update coold
// unit with scheduler env (and builder env when EnableBuilder).
if desired.CentralHost != "" && err == nil {
privKeyPEM, _, keyErr := runner.Run(ctx, desired.CentralHost, user, port,
"cat "+services.SchedulerJWTPrivPath)
if keyErr != nil {
err = fmt.Errorf("read jwt.priv from central %s: %w", desired.CentralHost, keyErr)
} else {
centralMgmtIP := mgmtAssignments[desired.CentralHost]
schedulerURL := fmt.Sprintf("http://%s:%d", centralMgmtIP, services.SchedulerGRPCPort)
// Include central itself: in single-server topology central *is* the coold
// target, and in fleet mode central's own coold still benefits from scheduler
// wiring (uniform dispatch path, no standalone-API exception).
p5 := ssh.ForEachServer(ctx, desired.Hosts, concurrency,
func(ctx context.Context, host string) ([]ActionResult, error) {
return phase5PerHost(ctx, runner, host, user, port,
desired, fresh, mgmtAssignments, containerAssignments,
[]byte(privKeyPEM), schedulerURL)
})
for _, r := range p5 {
results = append(results, r.Result...)
if r.Err != nil {
err = fmt.Errorf("phase 5 failed on one or more servers")
}
}
}
}
return results, err
}
// phase1Server installs WireGuard, generates a keypair, and (if requested)
// installs Podman, enables its socket, and enables IP forwarding.
func phase1Server(
ctx context.Context,
runner ssh.Runner,
host, user string,
port int,
desired *DesiredMesh,
current MeshState,
) ([]ActionResult, error) {
state, ok := current.Servers[host]
if !ok {
state = &ServerState{Host: host}
}
var out []ActionResult
if !state.Installed {
if err := runStep(ctx, runner, host, user, port, &out,
ActionInstallWG, "", aptInstallCmd,
fmt.Sprintf("install WireGuard on %s", host)); err != nil {
return out, err
}
}
if !state.KeysExist {
genCmd := `mkdir -p /etc/wireguard && ` +
`wg genkey | tee /etc/wireguard/privatekey | wg pubkey | tee /etc/wireguard/publickey && ` +
`chmod 600 /etc/wireguard/privatekey`
if err := runStep(ctx, runner, host, user, port, &out,
ActionGenKeyPair, "", genCmd,
fmt.Sprintf("generate keypair on %s", host)); err != nil {
return out, err
}
}
if desired.InstallPodman {
if !state.PodmanInstalled {
if err := runStep(ctx, runner, host, user, port, &out,
ActionInstallPodman, "", podmanInstallCmd,
fmt.Sprintf("install Podman on %s", host)); err != nil {
return out, err
}
}
if !state.PodmanSocketActive {
if err := runStep(ctx, runner, host, user, port, &out,
ActionEnablePodmanSocket, "", enablePodmanSocketCmd,
fmt.Sprintf("enable podman.socket on %s", host)); err != nil {
return out, err
}
}
if !state.IPForwardEnabled {
if err := runStep(ctx, runner, host, user, port, &out,
ActionEnableIPForward, "", enableIPForwardCmd,
fmt.Sprintf("enable IP forwarding on %s", host)); err != nil {
return out, err
}
}
}
return out, nil
}
// phase2Server writes the WireGuard config, enables/reloads the service,
// creates per-namespace Podman bridges, and installs the firewall service.
func phase2Server(
ctx context.Context,
runner ssh.Runner,
host, user string,
port int,
desired *DesiredMesh,
fresh MeshState,
mgmtAssignments map[string]net.IP,
containerAssignments map[string]map[string]*net.IPNet,
) ([]ActionResult, error) {
var out []ActionResult
mgmtIP := mgmtAssignments[host]
nsSorted := desired.SortedNamespaces()
// Build peer list (everyone except self, skip hosts with no pubkey).
// Each peer's AllowedIPs covers every namespace subnet that peer owns.
var peers []PeerConfig
for _, peer := range desired.Hosts {
if peer == host {
continue
}
ps, ok := fresh.Servers[peer]
if !ok || ps.PublicKey == "" {
continue
}
var subnets []*net.IPNet
for _, ns := range nsSorted {
if sn := containerAssignments[ns][peer]; sn != nil {
subnets = append(subnets, sn)
}
}
peers = append(peers, PeerConfig{
Endpoint: peer,
PublicKey: ps.PublicKey,
MgmtIP: mgmtAssignments[peer],
ContainerSubnets: subnets,
})
}
// Write WG config.
configCmd := WriteConfigCommand(desired.Interface, mgmtIP, desired.ListenPort, peers)
if err := runStep(ctx, runner, host, user, port, &out,
ActionWriteConfig, "", configCmd,
fmt.Sprintf("write config on %s", host)); err != nil {
return out, err
}
// Enable or reload wg-quick.
state := fresh.Servers[host]
var serviceCmd string
actionType := ActionEnableService
if state != nil && state.Active {
serviceCmd = fmt.Sprintf(`systemctl restart wg-quick@%s 2>&1 || wg syncconf %s <(wg-quick strip %s) 2>&1`,
desired.Interface, desired.Interface, desired.Interface)
actionType = ActionReloadService
} else {
serviceCmd = fmt.Sprintf(`systemctl enable --now wg-quick@%s 2>&1`, desired.Interface)
}
if err := runStep(ctx, runner, host, user, port, &out,
actionType, "", serviceCmd,
fmt.Sprintf("enable/reload service on %s", host)); err != nil {
return out, err
}
if desired.InstallPodman {
freshState := fresh.Servers[host]
// Per-namespace podman network reconcile.
for _, ns := range nsSorted {
contSubnet := containerAssignments[ns][host]
if contSubnet == nil {
continue
}
netName := PodmanNetworkFor(ns)
gw := MachineIP(contSubnet)
var nss *NamespaceServerState
if freshState != nil {
nss = freshState.Namespaces[ns]
}
if nss == nil || !nss.NetworkExists {
netCmd := podmanNetCreateCmd(netName, ns, contSubnet, gw)
if err := runStep(ctx, runner, host, user, port, &out,
ActionCreatePodmanNet, ns, netCmd,
fmt.Sprintf("create Podman network %s on %s", netName, host)); err != nil {
return out, err
}
continue
}
subnetDrift := nss.ContainerSubnet != nil && nss.ContainerSubnet.String() != contSubnet.String()
if nss.DNSEnabled || subnetDrift || nss.Label != ns {
recreateCmd := podmanNetRecreateCmd(netName, ns, contSubnet, gw)
if err := runStep(ctx, runner, host, user, port, &out,
ActionRecreatePodmanNet, ns, recreateCmd,
fmt.Sprintf("recreate Podman network %s on %s", netName, host)); err != nil {
return out, err
}
}
}
// Firewall service: union of namespace subnets; reinstall when missing,
// default-deny flipped, or unit text drifted (e.g. namespace added).
var subnets []*net.IPNet
for _, ns := range nsSorted {
if sn := containerAssignments[ns][host]; sn != nil {
subnets = append(subnets, sn)
}
}
expectedUnit := FirewallServiceUnit(desired.Interface, desired.SortedNamespaces(), subnets, desired.DefaultDenyContainers)
expectedUnitHash := sha256Hex([]byte(expectedUnit))
unitDrift := freshState != nil && freshState.FirewallUnitSha256 != expectedUnitHash
if freshState == nil || !freshState.FirewallActive ||
freshState.DefaultDenyActive != desired.DefaultDenyContainers ||
unitDrift {
fwCmd := InstallFirewallCommand(desired.Interface, desired.SortedNamespaces(), subnets, desired.DefaultDenyContainers)
if err := runStep(ctx, runner, host, user, port, &out,
ActionInstallFirewall, "", fwCmd,
fmt.Sprintf("install firewall service on %s", host)); err != nil {
return out, err
}
}
}
return out, nil
}
// Verify SSHes into each host and checks that WireGuard is active with the
// expected number of peers.
func Verify(
ctx context.Context,
runner ssh.Runner,
hosts []string,
user string,
port int,
iface string,
concurrency int,
) []VerifyResult {
results := ssh.ForEachServer(ctx, hosts, concurrency,
func(ctx context.Context, host string) (VerifyResult, error) {
return verifyHost(ctx, runner, host, user, port, iface, len(hosts)-1)
})
out := make([]VerifyResult, len(results))
for i, r := range results {
if r.Err != nil {
out[i] = VerifyResult{Host: r.Host, Err: r.Err}
} else {
out[i] = r.Result
}
}
return out
}
func verifyHost(
ctx context.Context,
runner ssh.Runner,
host, user string,
port int,
iface string,
expectedPeers int,
) (VerifyResult, error) {
result := VerifyResult{Host: host}
stdout, _, err := runner.Run(ctx, host, user, port,
fmt.Sprintf(`wg show %s dump 2>/dev/null || true`, iface))
if err != nil {
return result, fmt.Errorf("wg show on %s: %w", host, err)
}
lines := nonEmptyLines(stdout)
if len(lines) == 0 {
result.Err = fmt.Errorf("interface %s not active", iface)
return result, nil
}
result.Active = true
result.PeerCount = len(lines) - 1
stdout2, _, _ := runner.Run(ctx, host, user, port,
fmt.Sprintf(`grep '^Address' /etc/wireguard/%s.conf 2>/dev/null || true`, iface))
if addr := strings.TrimSpace(strings.TrimPrefix(strings.TrimSpace(stdout2), "Address =")); addr != "" {
ip, _, _ := net.ParseCIDR(strings.TrimSpace(addr))
result.WireGuardIP = ip
}
if result.PeerCount < expectedPeers {
result.Err = fmt.Errorf("expected %d peer(s), got %d", expectedPeers, result.PeerCount)
}
return result, nil
}
func firstLine(s string) string {
s = strings.TrimSpace(s)
if i := strings.IndexByte(s, '\n'); i >= 0 {
return s[:i]
}
return s
}
func nonEmptyLines(s string) []string {
var out []string
for _, l := range strings.Split(s, "\n") {
if strings.TrimSpace(l) != "" {
out = append(out, l)
}
}
return out
}
// heredocWrite emits a shell command that atomically writes body to remotePath
// via a single-quoted heredoc. Body is trusted (generated by us) and must end
// with a newline so the closing tag lands on its own line.
// chmod runs before mv so the final rename is atomic with the intended mode.
func heredocWrite(remotePath, body, tag string, mode os.FileMode) string {
return fmt.Sprintf(`cat > %[1]s.tmp <<'%[3]s'
%[2]s%[3]s
chmod %[4]o %[1]s.tmp
mv %[1]s.tmp %[1]s`, remotePath, body, tag, mode)
}
// phase4Central installs scheduler, generates the JWT keypair, and enables
// the scheduler systemd service on the central host.
func phase4Central(
ctx context.Context,
runner ssh.Runner,
host, user string,
port int,
desired *DesiredMesh,
mgmtAssignments map[string]net.IP,
) ([]ActionResult, error) {
var out []ActionResult
// 1. Install scheduler binary.
if err := runStep(ctx, runner, host, user, port, &out,
ActionInstallScheduler, "", services.SchedulerInstallCommand(desired.SchedulerVersion),
fmt.Sprintf("install scheduler on %s", host)); err != nil {
return out, err
}
// 2. Generate JWT keypair (idempotent).
if err := runStep(ctx, runner, host, user, port, &out,
ActionGenerateJWTKeypair, "", services.EnsureJWTKeypairCommand(),
fmt.Sprintf("generate JWT keypair on %s", host)); err != nil {
return out, err
}
// 3. Write scheduler unit + enable service.
mgmtIP := mgmtAssignments[host]
grpcBind := fmt.Sprintf("%s:%d", mgmtIP, services.SchedulerGRPCPort)
schedulerUnit := services.SchedulerServiceUnit(grpcBind, services.SchedulerJWTPubPath)
serviceCmd := heredocWrite("/etc/systemd/system/scheduler.service",
schedulerUnit, "COOLIFY_SCHEDULER_UNIT_EOF", 0o644) +
` && systemctl daemon-reload` +
` && systemctl enable scheduler` +
` && systemctl restart scheduler`
if err := runStep(ctx, runner, host, user, port, &out,
ActionInstallSchedulerService, "", serviceCmd,
fmt.Sprintf("install scheduler service on %s", host)); err != nil {
return out, err
}
return out, nil
}
// phase5PerHost mints a host JWT, writes it to the host, rewrites the coold
// unit with scheduler env vars, and restarts coold.
func phase5PerHost(
ctx context.Context,
runner ssh.Runner,
host, user string,
port int,
desired *DesiredMesh,
_ MeshState,
mgmtAssignments map[string]net.IP,
containerAssignments map[string]map[string]*net.IPNet,
privKeyPEM []byte,
schedulerURL string,
) ([]ActionResult, error) {
var out []ActionResult
mgmtIP := mgmtAssignments[host]
if mgmtIP == nil {
return out, fmt.Errorf("no mgmt IP for %s", host)
}
// Mint JWT with sub = wg0 mgmt IP (stable, scheduler-addressable identifier).
// caps claim must match what coold will advertise in its Hello frame —
// the scheduler cross-checks and rejects a stream whose Hello elevates over
// its JWT. Per-host toggle via desired.HasBuilderCap(host).
hostID := mgmtIP.String()
hasBuilder := desired.HasBuilderCap(host)
caps := []string{"coold"}
if hasBuilder {
caps = append(caps, "builder")
}
jwtToken, err := services.MintHostJWT(privKeyPEM, hostID, caps)
if err != nil {
return out, fmt.Errorf("mint JWT for %s: %w", host, err)
}
// 1. Write JWT to /etc/coolify/host-jwt (mode 0600, idempotent).
writeJWTCmd := fmt.Sprintf(
`mkdir -p /etc/coolify && printf '%%s' '%s' > %s.tmp && chmod 0600 %s.tmp && mv %s.tmp %s`,
jwtToken, services.HostJWTPath, services.HostJWTPath, services.HostJWTPath, services.HostJWTPath)
if err := runStep(ctx, runner, host, user, port, &out,
ActionWriteHostJWT, "", writeJWTCmd,
fmt.Sprintf("write host JWT on %s", host)); err != nil {
return out, err
}
// 2. Install builder binary + buildah/git (only on builder-capable hosts).
if hasBuilder {
if err := runStep(ctx, runner, host, user, port, &out,
ActionInstallBuilder, "",
services.BuilderInstallCommand(desired.CooldVersion),
fmt.Sprintf("install builder on %s", host)); err != nil {
return out, err
}
}
// 3. Rewrite coold unit with scheduler env vars (and builder env when
// enabled) + restart.
nsSorted := desired.SortedNamespaces()
nsConfigs := buildNamespaceConfigs(host, nsSorted, containerAssignments)
scheduler := &services.SchedulerConfig{
URL: schedulerURL,
JWTPath: services.HostJWTPath,
}
var builderCfg *services.BuilderConfig
if hasBuilder {
denyNets := []string{}
if desired.MgmtPool != nil {
denyNets = append(denyNets, desired.MgmtPool.String())
}
if desired.ContainerPool != nil {
denyNets = append(denyNets, desired.ContainerPool.String())
}
builderCfg = &services.BuilderConfig{
Capacity: desired.BuilderCapacity,
CPUQuota: desired.BuilderCPUQuota,
MemoryMax: desired.BuilderMemoryMax,
TimeoutSecs: desired.BuilderTimeoutSecs,
DenyNets: denyNets,
}
}
cooldUnit := services.CooldServiceUnitWithScheduler(mgmtIP, nsConfigs, scheduler, builderCfg)
updateCmd := heredocWrite("/etc/systemd/system/coold.service",
cooldUnit, "COOLIFY_COOLD_SCHEDULER_UNIT_EOF", 0o644) +
` && systemctl daemon-reload` +
` && systemctl restart coold`
if err := runStep(ctx, runner, host, user, port, &out,
ActionUpdateCooldSchedulerEnv, "", updateCmd,
fmt.Sprintf("update coold scheduler env on %s", host)); err != nil {
return out, err
}
return out, nil
}
// phase3Server downloads corrosion + coold from GitHub releases, writes their
// configs/unit files, and enables both services.
// Guarded by desired.InstallCoold at the caller.
func phase3Server(
ctx context.Context,
runner ssh.Runner,
host, user string,
port int,
desired *DesiredMesh,
fresh MeshState,
mgmtAssignments map[string]net.IP,
containerAssignments map[string]map[string]*net.IPNet,
) ([]ActionResult, error) {
var out []ActionResult
mgmtIP := mgmtAssignments[host]
if mgmtIP == nil {
return out, fmt.Errorf("no mgmt IP allocated for %s", host)
}
nsSorted := desired.SortedNamespaces()
nsConfigs := buildNamespaceConfigs(host, nsSorted, containerAssignments)
if len(nsConfigs) == 0 {
return out, fmt.Errorf("no namespace subnets allocated for %s", host)
}
freshState := fresh.Servers[host]
// 1. Download + install corrosion if version drifted.
if binaryVersionDrift(desired.CorrosionVersion,
freshState != nil && freshState.CorrosionInstalled,
func() string {
if freshState != nil {
return freshState.CorrosionVersion
}
return ""
}()) {
if err := runStep(ctx, runner, host, user, port, &out,
ActionInstallCorrosion, "",
services.CorrosionInstallCommand(desired.CorrosionVersion),
fmt.Sprintf("install corrosion on %s", host)); err != nil {
return out, err
}
}
// 2. Download + install coold if version drifted.
if binaryVersionDrift(desired.CooldVersion,
freshState != nil && freshState.CooldInstalled,
func() string {
if freshState != nil {
return freshState.CooldVersion
}
return ""
}()) {
if err := runStep(ctx, runner, host, user, port, &out,
ActionInstallCoold, "",
services.CooldInstallCommand(desired.CooldVersion),
fmt.Sprintf("install coold on %s", host)); err != nil {
return out, err
}
}
// 3. Create dirs for corrosion state/config/admin socket.
if err := runStep(ctx, runner, host, user, port, &out,
ActionWriteCorrosionConfig, "",
`mkdir -p /etc/corrosion/schemas /var/lib/corrosion /var/run/corrosion`,
fmt.Sprintf("mkdir corrosion dirs on %s", host)); err != nil {
return out, err
}
// 4. Write config.toml.
peers := peerMgmtIPs(host, desired.Hosts, mgmtAssignments)
configBody := string(services.CorrosionConfigBytes(mgmtIP,
desired.CorrosionGossipPort, desired.CorrosionAPIPort, peers))
configCmd := heredocWrite("/etc/corrosion/config.toml", configBody, "COOLIFY_CORROSION_EOF", 0o600)
if err := runStep(ctx, runner, host, user, port, &out,
ActionWriteCorrosionConfig, "", configCmd,
fmt.Sprintf("write corrosion config on %s", host)); err != nil {
return out, err
}
// 5. Write schema. When schema content drifts (not first install) the
// CR-SQLite on-disk DB is incompatible — stop corrosion and wipe the DB
// so it re-bootstraps from the new schema. Coold repopulates within ~2s.
expectedSchemaSha := sha256Hex([]byte(services.CoolifySchemaSQL))
schemaDrift := freshState != nil &&
freshState.CorrosionSchemaSha256 != "" &&
freshState.CorrosionSchemaSha256 != expectedSchemaSha
schemaCmd := heredocWrite("/etc/corrosion/schemas/coolify.sql",
services.CoolifySchemaSQL, "COOLIFY_SCHEMA_EOF", 0o600)
if schemaDrift {
schemaCmd = `systemctl stop corrosion 2>/dev/null || true; ` +
`rm -f /var/lib/corrosion/corrosion.db ` +
`/var/lib/corrosion/corrosion.db-shm ` +
`/var/lib/corrosion/corrosion.db-wal && ` +
schemaCmd
}
if err := runStep(ctx, runner, host, user, port, &out,
ActionWriteCorrosionSchema, "", schemaCmd,
fmt.Sprintf("write corrosion schema on %s", host)); err != nil {
return out, err
}
// 6. Write corrosion unit + 7. Write coold unit + 8. daemon-reload + enable.
// Use enable + restart (not enable --now) so an already-active service still
// picks up new unit/config/schema without a separate reload step.
//
// Also ensure the coold API bearer token exists before the unit starts.
// The command is idempotent — reruns keep the existing token so clients
// don't get invalidated on every `apply`.
corrosionUnit := services.CorrosionServiceUnit(desired.Interface)
cooldUnit := services.CooldServiceUnit(mgmtIP, nsConfigs)
serviceCmd := services.EnsureCooldAPITokenCommand() +
" && " +
heredocWrite("/etc/systemd/system/corrosion.service",
corrosionUnit, "COOLIFY_CORROSION_UNIT_EOF", 0o644) +
" && " +
heredocWrite("/etc/systemd/system/coold.service",
cooldUnit, "COOLIFY_COOLD_UNIT_EOF", 0o644) +
` && systemctl daemon-reload` +
` && systemctl enable corrosion coold` +
` && systemctl restart corrosion` +
` && sleep 1` +
` && systemctl restart coold`
if err := runStep(ctx, runner, host, user, port, &out,
ActionInstallCorrosionService, "", serviceCmd,
fmt.Sprintf("install corrosion+coold services on %s", host)); err != nil {
return out, err
}
// Append a trailing coold install result so the rendered table matches
// the planned action list (install-coold-service).
out = append(out, ActionResult{
Action: PlannedAction{
Host: host,
Type: ActionInstallCooldService,
Detail: fmt.Sprintf("coold.service (mgmt=%s, namespaces=%d)", mgmtIP, len(nsConfigs)),
},
})
return out, nil
}
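The firewall-unit and corrosion-schema checks above both detect drift by comparing a stored SHA-256 with the hash of the freshly rendered text. The sketch below illustrates the two comparisons in isolation. `sha256Hex` is not shown in this diff, so its body here is an assumption (hex-encoded SHA-256), and `unitDrift` / `schemaDrift` are hypothetical names for the inline expressions used in the phases.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sha256Hex is an assumed implementation of the helper referenced in the
// diff: the lowercase hex encoding of the SHA-256 digest.
func sha256Hex(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// unitDrift (hypothetical name) reports drift whenever the observed hash
// differs from the hash of the expected unit text, including when no hash
// was observed at all.
func unitDrift(observedHash, expectedUnit string) bool {
	return observedHash != sha256Hex([]byte(expectedUnit))
}

// schemaDrift (hypothetical name) treats an empty observed hash as a first
// install, so only a previously recorded, different hash counts as drift.
func schemaDrift(observedHash, expectedSchema string) bool {
	return observedHash != "" && observedHash != sha256Hex([]byte(expectedSchema))
}

func main() {
	old := sha256Hex([]byte("CREATE TABLE a (id INTEGER);"))
	fmt.Println(unitDrift("", "[Unit]\n"))                          // no stored hash: reinstall
	fmt.Println(schemaDrift("", "CREATE TABLE a (id INTEGER);"))    // first install: no wipe
	fmt.Println(schemaDrift(old, "CREATE TABLE a (id INTEGER);"))   // unchanged schema
	fmt.Println(schemaDrift(old, "CREATE TABLE b (id INTEGER);"))   // changed schema
}
```

The asymmetry matters because the schema path wipes the on-disk database on drift, so a host with no recorded hash should not be treated as drifted.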
-96
@@ -1,96 +0,0 @@
package wireguard
import (
"net"
"strings"
"testing"
"github.com/stretchr/testify/assert"
)
func TestFirstLine(t *testing.T) {
tests := []struct {
input string
want string
}{
{"", ""},
{"single line", "single line"},
{"first\nsecond\nthird", "first"},
{" spaces \nnext", "spaces "},
{"\nleading newline", "leading newline"},
}
for _, tt := range tests {
assert.Equal(t, tt.want, firstLine(tt.input), "input: %q", tt.input)
}
}
func TestPodmanNetCreateCmd_DisablesDNSAndLabels(t *testing.T) {
_, subnet, _ := net.ParseCIDR("10.210.0.0/24")
gw := net.ParseIP("10.210.0.1")
got := podmanNetCreateCmd("coolify-default-mesh", "default", subnet, gw)
// Must pass --disable-dns so aardvark-dns never binds bridge gateway :53
// (coold owns that socket).
assert.Contains(t, got, "--disable-dns", "create must include --disable-dns")
assert.Contains(t, got, "--subnet=10.210.0.0/24")
assert.Contains(t, got, "--gateway=10.210.0.1")
// Labels identify the network as ours + carry the namespace for drift checks.
assert.Contains(t, got, "--label io.coolify.managed=true")
assert.Contains(t, got, "--label io.coolify.namespace=default")
// Idempotency guard must still be present.
assert.Contains(t, got, "podman network exists coolify-default-mesh")
}
func TestPodmanNetRecreateCmd_DropsAndCreatesWithDisableDNS(t *testing.T) {
_, subnet, _ := net.ParseCIDR("10.220.0.0/24")
gw := net.ParseIP("10.220.0.1")
got := podmanNetRecreateCmd("coolify-alpha-mesh", "alpha", subnet, gw)
assert.Contains(t, got, "podman network rm -f coolify-alpha-mesh")
assert.Contains(t, got, "--disable-dns")
assert.Contains(t, got, "--subnet=10.220.0.0/24")
assert.Contains(t, got, "--label io.coolify.namespace=alpha")
// rm must come before create so the ordering is unambiguous.
rmIdx := strings.Index(got, "rm -f")
createIdx := strings.Index(got, "network create")
assert.True(t, rmIdx >= 0 && createIdx > rmIdx, "rm must precede create")
}
func TestHeredocWrite_EmitsChmodBeforeMv(t *testing.T) {
got := heredocWrite("/etc/corrosion/config.toml", "body\n", "TAG", 0o600)
assert.Contains(t, got, "cat > /etc/corrosion/config.toml.tmp <<'TAG'")
assert.Contains(t, got, "\nbody\nTAG\n")
assert.Contains(t, got, "chmod 600 /etc/corrosion/config.toml.tmp")
assert.Contains(t, got, "mv /etc/corrosion/config.toml.tmp /etc/corrosion/config.toml")
chmodIdx := strings.Index(got, "chmod 600")
mvIdx := strings.Index(got, "mv /etc/corrosion")
assert.True(t, chmodIdx > 0 && mvIdx > chmodIdx,
"chmod must precede mv so final rename is atomic with intended mode")
}
func TestHeredocWrite_DifferentModes(t *testing.T) {
unit := heredocWrite("/etc/systemd/system/x.service", "b", "T", 0o644)
assert.Contains(t, unit, "chmod 644 /etc/systemd/system/x.service.tmp")
secret := heredocWrite("/etc/corrosion/schemas/coolify.sql", "b", "T", 0o600)
assert.Contains(t, secret, "chmod 600 /etc/corrosion/schemas/coolify.sql.tmp")
}
func TestNonEmptyLines(t *testing.T) {
tests := []struct {
input string
want []string
}{
{"", nil},
{"line1\nline2", []string{"line1", "line2"}},
{"line1\n\nline2", []string{"line1", "line2"}},
{" \n \nactual", []string{"actual"}},
{"only", []string{"only"}},
}
for _, tt := range tests {
got := nonEmptyLines(tt.input)
assert.Equal(t, tt.want, got, "input: %q", tt.input)
}
}
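`heredocWrite` places the closing tag immediately after the body, so the heredoc only terminates when the body ends with a newline. The following is a minimal sketch of that constraint: it copies the helper from this diff and adds `ensureTrailingNewline`, a hypothetical guard that is not part of the original code.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// heredocWrite mirrors the helper shown in this diff.
func heredocWrite(remotePath, body, tag string, mode os.FileMode) string {
	return fmt.Sprintf(`cat > %[1]s.tmp <<'%[3]s'
%[2]s%[3]s
chmod %[4]o %[1]s.tmp
mv %[1]s.tmp %[1]s`, remotePath, body, tag, mode)
}

// ensureTrailingNewline is a hypothetical guard: it appends a newline only
// when the body does not already end with one.
func ensureTrailingNewline(body string) string {
	if strings.HasSuffix(body, "\n") {
		return body
	}
	return body + "\n"
}

func main() {
	// The path and tag are illustrative values only.
	cmd := heredocWrite("/etc/example.conf",
		ensureTrailingNewline("key = value"), "EXAMPLE_EOF", 0o600)
	fmt.Println(cmd)
}
```

Without the guard, a body such as `key = value` would render as `key = valueEXAMPLE_EOF`, and the shell would keep reading the following `chmod` and `mv` lines as heredoc content.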
-95
@@ -1,95 +0,0 @@
package wireguard
import (
"fmt"
"net"
"strings"
)
// PeerConfig holds the information needed to write a [Peer] block.
type PeerConfig struct {
// Endpoint is the SSH/public IP of the peer (used as WG endpoint).
Endpoint string
// PublicKey is the peer's WireGuard public key.
PublicKey string
// MgmtIP is the peer's /32 wg0 management IP.
MgmtIP net.IP
// ContainerSubnets is the peer's per-namespace container bridge subnets,
// sorted by namespace name for stable output. All of them — along with
// MgmtIP/32 — are listed in AllowedIPs so every namespace's cross-host
// traffic can route via the tunnel.
ContainerSubnets []*net.IPNet
}
// allowedIPsLine joins the mgmt /32 and every container subnet into a single
// comma-separated AllowedIPs value.
func allowedIPsLine(p PeerConfig) string {
parts := make([]string, 0, 1+len(p.ContainerSubnets))
parts = append(parts, fmt.Sprintf("%s/32", p.MgmtIP))
for _, sn := range p.ContainerSubnets {
if sn == nil {
continue
}
parts = append(parts, sn.String())
}
return strings.Join(parts, ", ")
}
// RenderConfig returns the content of wg0.conf for one host.
//
// The host's own Address is the management IP /32 (e.g. 100.64.0.0/32). It
// lives in a separate pool from the container subnets, so the Podman bridges
// can own their per-host /24s without conflict.
//
// The literal string __PRIVKEY__ is used as a placeholder; callers must
// substitute the actual key before (or during) writing to disk.
func RenderConfig(mgmtIP net.IP, listenPort int, peers []PeerConfig) string {
var b strings.Builder
fmt.Fprintf(&b, "[Interface]\n")
fmt.Fprintf(&b, "Address = %s/32\n", mgmtIP)
fmt.Fprintf(&b, "ListenPort = %d\n", listenPort)
fmt.Fprintf(&b, "PrivateKey = __PRIVKEY__\n")
for _, p := range peers {
fmt.Fprintf(&b, "\n[Peer]\n")
fmt.Fprintf(&b, "# %s\n", p.Endpoint)
fmt.Fprintf(&b, "PublicKey = %s\n", p.PublicKey)
fmt.Fprintf(&b, "AllowedIPs = %s\n", allowedIPsLine(p))
fmt.Fprintf(&b, "Endpoint = %s:%d\n", p.Endpoint, listenPort)
fmt.Fprintf(&b, "PersistentKeepalive = 25\n")
}
return b.String()
}
// WriteConfigCommand returns the shell command that atomically writes
// /etc/wireguard/<iface>.conf on the remote host.
//
// The private key is read from /etc/wireguard/privatekey on the remote so it
// never traverses SSH. The config is written to a .tmp file first and then
// moved into place so a killed session cannot leave a torn config.
func WriteConfigCommand(iface string, mgmtIP net.IP, listenPort int, peers []PeerConfig) string {
var b strings.Builder
b.WriteString(`PRIVKEY=$(cat /etc/wireguard/privatekey) && `)
b.WriteString(`mkdir -p /etc/wireguard && `)
b.WriteString(`{ echo "[Interface]"; `)
b.WriteString(fmt.Sprintf(`echo "Address = %s/32"; `, mgmtIP))
b.WriteString(fmt.Sprintf(`echo "ListenPort = %d"; `, listenPort))
b.WriteString(`echo "PrivateKey = $PRIVKEY"; `)
for _, p := range peers {
b.WriteString(`echo ""; `)
b.WriteString(`echo "[Peer]"; `)
b.WriteString(fmt.Sprintf(`echo "# %s"; `, p.Endpoint))
b.WriteString(fmt.Sprintf(`echo "PublicKey = %s"; `, p.PublicKey))
b.WriteString(fmt.Sprintf(`echo "AllowedIPs = %s"; `, allowedIPsLine(p)))
b.WriteString(fmt.Sprintf(`echo "Endpoint = %s:%d"; `, p.Endpoint, listenPort))
b.WriteString(`echo "PersistentKeepalive = 25"; `)
}
b.WriteString(fmt.Sprintf(`} > /etc/wireguard/%s.conf.tmp && `, iface))
b.WriteString(fmt.Sprintf(`chmod 600 /etc/wireguard/%s.conf.tmp && `, iface))
b.WriteString(fmt.Sprintf(`mv /etc/wireguard/%s.conf.tmp /etc/wireguard/%s.conf`, iface, iface))
return b.String()
}
-85
@@ -1,85 +0,0 @@
package wireguard
import (
"net"
"strings"
"testing"
"github.com/stretchr/testify/assert"
)
func TestRenderConfig_NoPeers(t *testing.T) {
mgmtIP := net.ParseIP("100.64.0.1").To4()
got := RenderConfig(mgmtIP, 51820, nil)
assert.Contains(t, got, "[Interface]")
assert.Contains(t, got, "Address = 100.64.0.1/32")
assert.Contains(t, got, "ListenPort = 51820")
assert.Contains(t, got, "PrivateKey = __PRIVKEY__")
assert.NotContains(t, got, "[Peer]")
}
func TestRenderConfig_WithPeers(t *testing.T) {
mgmtIP := net.ParseIP("100.64.0.1").To4()
peers := []PeerConfig{
{
Endpoint: "203.0.113.11",
PublicKey: "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB=",
MgmtIP: net.ParseIP("100.64.0.1").To4(),
ContainerSubnets: []*net.IPNet{mustParseCIDR("10.210.1.0/24")},
},
{
Endpoint: "203.0.113.12",
PublicKey: "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC=",
MgmtIP: net.ParseIP("100.64.0.2").To4(),
ContainerSubnets: []*net.IPNet{
mustParseCIDR("10.210.2.0/24"),
mustParseCIDR("10.220.2.0/24"),
},
},
}
got := RenderConfig(mgmtIP, 51820, peers)
assert.Equal(t, 2, strings.Count(got, "[Peer]"))
assert.Contains(t, got, "PublicKey = BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB=")
assert.Contains(t, got, "Endpoint = 203.0.113.11:51820")
assert.Contains(t, got, "AllowedIPs = 100.64.0.1/32, 10.210.1.0/24")
assert.Contains(t, got, "PersistentKeepalive = 25")
assert.Contains(t, got, "PublicKey = CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC=")
// Multi-namespace peer lists every namespace subnet after the mgmt /32.
assert.Contains(t, got, "AllowedIPs = 100.64.0.2/32, 10.210.2.0/24, 10.220.2.0/24")
}
func TestWriteConfigCommand_ContainsPrivkeyRead(t *testing.T) {
mgmtIP := net.ParseIP("100.64.0.1").To4()
peers := []PeerConfig{
{
Endpoint: "203.0.113.11",
PublicKey: "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB=",
MgmtIP: net.ParseIP("100.64.0.1").To4(),
ContainerSubnets: []*net.IPNet{mustParseCIDR("10.210.1.0/24")},
},
}
cmd := WriteConfigCommand("wg0", mgmtIP, 51820, peers)
assert.Contains(t, cmd, "cat /etc/wireguard/privatekey")
assert.Contains(t, cmd, "$PRIVKEY")
assert.Contains(t, cmd, ".conf.tmp")
assert.Contains(t, cmd, "chmod 600 /etc/wireguard/wg0.conf.tmp")
assert.Contains(t, cmd, "mv /etc/wireguard/wg0.conf.tmp /etc/wireguard/wg0.conf")
// Host Address is the mgmt /32 — outside the container pool.
assert.Contains(t, cmd, "Address = 100.64.0.1/32")
// Peer AllowedIPs lists peer mgmt /32 + peer container /24.
assert.Contains(t, cmd, "100.64.0.1/32, 10.210.1.0/24")
}
func TestWriteConfigCommand_NoPeers(t *testing.T) {
mgmtIP := net.ParseIP("100.64.0.1").To4()
cmd := WriteConfigCommand("wg0", mgmtIP, 51820, nil)
assert.Contains(t, cmd, "PRIVKEY")
assert.Contains(t, cmd, "51820")
assert.NotContains(t, cmd, "[Peer]")
}
-224
@@ -1,224 +0,0 @@
package wireguard
import (
"fmt"
"net"
"sort"
"strings"
)
const firewallUnitPath = "/etc/systemd/system/coolify-mesh-fw.service"
const firewallServiceName = "coolify-mesh-fw.service"
// AllowRulesPath is the on-disk location where coold snapshots the
// COOLIFY-ALLOW chain as an iptables-restore fragment on every rule mutate.
// The firewall unit reads this file at boot/restart to repopulate the chain
// after the kernel tables are cleared.
const AllowRulesPath = "/etc/coolify/allow.rules"
// BridgeTableName is the nftables table name owned by the CLI scaffold.
const BridgeTableName = "coolify_bridge"
// BridgeAllowRulesPath is where coold writes the nft bridge-family allow
// fragment. The firewall unit replays it at start/restart.
const BridgeAllowRulesPath = "/etc/coolify/allow.nft"
// BridgeScaffoldPath is where the CLI writes the static bridge chain
// scaffold (forward + coolify_intra chains). Applied at unit start/restart.
const BridgeScaffoldPath = "/etc/coolify/bridge-fw.nft"
// FirewallServiceUnit returns the systemd unit text that installs the
// idempotent iptables rules required for cross-host container traffic over WG.
//
// containerSubnets is the per-namespace list of subnets on this host (one
// /<prefix> per namespace). Rules are emitted once per subnet so every
// namespace is covered by the same host-global COOLIFY-INTRA / COOLIFY-ALLOW
// chain pair.
//
// Two modes:
//
// - defaultDeny == false (mode A, blanket allow): installs FORWARD ACCEPT
// rules for every subnet. Tears down any default-deny scaffold left over
// from a prior --default-deny run.
//
// - defaultDeny == true (mode B, default deny): removes blanket ACCEPT,
// installs COOLIFY-INTRA + COOLIFY-ALLOW chains, and adds FORWARD jumps
// so any traffic with a container subnet as source OR destination
// traverses the deny chain. Conntrack ESTABLISHED/RELATED is accepted
// early so reply traffic for already-allowed flows bypasses the chain.
//
// Note: the iptables chains only enforce CROSS-HOST container traffic.
// Same-namespace intra-host traffic stays at L2 and bypasses iptables, so
// mode B also installs a bridge-family nft scaffold to cover it.
// Cross-namespace intra-host traffic is blocked at L2 anyway because each
// namespace has its own podman bridge.
//
// Both modes preserve the POSTROUTING RETURN rule that prevents podman's
// MASQUERADE from rewriting container egress to wg0's IP.
func FirewallServiceUnit(iface string, namespaces []string, containerSubnets []*net.IPNet, defaultDeny bool) string {
var b strings.Builder
fmt.Fprintf(&b, `[Unit]
Description=Coolify mesh firewall rules
After=wg-quick@%[1]s.service network-online.target
Wants=network-online.target
[Service]
Type=oneshot
RemainAfterExit=yes
`, iface)
// POSTROUTING RETURN — needed in both modes, once per subnet.
for _, sn := range containerSubnets {
fmt.Fprintf(&b,
`ExecStart=/bin/sh -c "/usr/sbin/iptables -t nat -C POSTROUTING -s %[2]s -o %[1]s -j RETURN 2>/dev/null || /usr/sbin/iptables -t nat -I POSTROUTING -s %[2]s -o %[1]s -j RETURN"
`, iface, sn.String())
}
if !defaultDeny {
fmt.Fprint(&b, `# Tear down default-deny scaffold from prior --default-deny run.
`)
for _, sn := range containerSubnets {
fmt.Fprintf(&b,
`ExecStart=/bin/sh -c "/usr/sbin/iptables -D FORWARD -d %[1]s -j COOLIFY-INTRA 2>/dev/null || true"
ExecStart=/bin/sh -c "/usr/sbin/iptables -D FORWARD -s %[1]s -j COOLIFY-INTRA 2>/dev/null || true"
`, sn.String())
}
fmt.Fprintf(&b, `ExecStart=/bin/sh -c "/usr/sbin/iptables -F COOLIFY-INTRA 2>/dev/null || true"
ExecStart=/bin/sh -c "/usr/sbin/iptables -X COOLIFY-INTRA 2>/dev/null || true"
# COOLIFY-ALLOW intentionally NOT removed; this preserves runtime allows for re-enable.
# Remove bridge-family scaffold (permissive mode) before installing blanket ACCEPT.
ExecStart=/bin/sh -c "nft delete table bridge %[1]s 2>/dev/null || true"
# Blanket ACCEPT: allow all traffic to/from every namespace's container subnet.
`, BridgeTableName)
for _, sn := range containerSubnets {
fmt.Fprintf(&b,
`ExecStart=/bin/sh -c "/usr/sbin/iptables -C FORWARD -s %[1]s -j ACCEPT 2>/dev/null || /usr/sbin/iptables -I FORWARD -s %[1]s -j ACCEPT"
ExecStart=/bin/sh -c "/usr/sbin/iptables -C FORWARD -d %[1]s -j ACCEPT 2>/dev/null || /usr/sbin/iptables -I FORWARD -d %[1]s -j ACCEPT"
`, sn.String())
}
} else {
fmt.Fprint(&b, `# Remove blanket ACCEPT from prior mode-A run.
`)
for _, sn := range containerSubnets {
fmt.Fprintf(&b,
`ExecStart=/bin/sh -c "/usr/sbin/iptables -D FORWARD -s %[1]s -j ACCEPT 2>/dev/null || true"
ExecStart=/bin/sh -c "/usr/sbin/iptables -D FORWARD -d %[1]s -j ACCEPT 2>/dev/null || true"
`, sn.String())
}
fmt.Fprintf(&b, `
# Create chains (idempotent).
ExecStart=/bin/sh -c "/usr/sbin/iptables -N COOLIFY-ALLOW 2>/dev/null || true"
ExecStart=/bin/sh -c "/usr/sbin/iptables -N COOLIFY-INTRA 2>/dev/null || true"
# Flush COOLIFY-INTRA so order is deterministic on every restart.
ExecStart=/usr/sbin/iptables -F COOLIFY-INTRA
ExecStart=/usr/sbin/iptables -A COOLIFY-INTRA -j COOLIFY-ALLOW
ExecStart=/usr/sbin/iptables -A COOLIFY-INTRA -j DROP
# Repopulate COOLIFY-ALLOW from coold's canonical snapshot. File is rewritten
# by coold on every rule mutate, so it is the source of truth across reboots
# and service restarts. Flush first because 'iptables-restore --noflush'
# leaves existing chain contents in place and would otherwise duplicate every
# rule on re-run.
ExecStart=/bin/sh -c "[ -s %[1]s ] && /usr/sbin/iptables -F COOLIFY-ALLOW && /usr/sbin/iptables-restore --noflush < %[1]s || true"
# Conntrack early-accept at top of FORWARD (idempotent).
ExecStart=/bin/sh -c "/usr/sbin/iptables -C FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT 2>/dev/null || /usr/sbin/iptables -I FORWARD 1 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT"
# Top-level FORWARD jumps for every namespace's subnet (both directions).
`, AllowRulesPath)
for _, sn := range containerSubnets {
fmt.Fprintf(&b,
`ExecStart=/bin/sh -c "/usr/sbin/iptables -C FORWARD -d %[1]s -j COOLIFY-INTRA 2>/dev/null || /usr/sbin/iptables -A FORWARD -d %[1]s -j COOLIFY-INTRA"
ExecStart=/bin/sh -c "/usr/sbin/iptables -C FORWARD -s %[1]s -j COOLIFY-INTRA 2>/dev/null || /usr/sbin/iptables -A FORWARD -s %[1]s -j COOLIFY-INTRA"
`, sn.String())
}
fmt.Fprintf(&b, `# Bridge-family nft scaffold: intra-namespace default-deny.
ExecStart=/bin/sh -c "nft list table bridge %[1]s >/dev/null 2>&1 || nft add table bridge %[1]s"
ExecStart=/bin/sh -c "nft add chain bridge %[1]s coolify_allow '{ }' 2>/dev/null || true"
ExecStart=/bin/sh -c "nft delete chain bridge %[1]s forward 2>/dev/null || true"
ExecStart=/bin/sh -c "nft delete chain bridge %[1]s coolify_intra 2>/dev/null || true"
ExecStart=/bin/sh -c "nft -f %[2]s"
ExecStart=/bin/sh -c "[ -s %[3]s ] && nft -f %[3]s || true"
`, BridgeTableName, BridgeScaffoldPath, BridgeAllowRulesPath)
}
_ = namespaces // kept on signature for future per-namespace dispatch; scaffold now keys off subnets (bridge ifnames exceed IFNAMSIZ=16).
b.WriteString(`
[Install]
WantedBy=multi-user.target
`)
return b.String()
}
// InstallFirewallCommand returns a shell command that atomically writes the
// service unit, reloads systemd, and enables/starts (or restarts) it.
func InstallFirewallCommand(iface string, namespaces []string, containerSubnets []*net.IPNet, defaultDeny bool) string {
unit := FirewallServiceUnit(iface, namespaces, containerSubnets, defaultDeny)
var b strings.Builder
b.WriteString(fmt.Sprintf(`cat > %s.tmp <<'COOLIFY_FW_EOF'
%sCOOLIFY_FW_EOF
mv %s.tmp %s && `, firewallUnitPath, unit, firewallUnitPath, firewallUnitPath))
// /etc/coolify may not exist yet on a fresh host (coold's token-gen is the
// only other writer and runs later in phase 2). Create it before the
// bridge scaffold write so `cat > .tmp` doesn't ENOENT.
b.WriteString("mkdir -p /etc/coolify && ")
if defaultDeny {
scaffold := renderBridgeScaffold(containerSubnets)
b.WriteString(fmt.Sprintf(`cat > %s.tmp <<'COOLIFY_BR_EOF'
%sCOOLIFY_BR_EOF
mv %s.tmp %s && `, BridgeScaffoldPath, scaffold, BridgeScaffoldPath, BridgeScaffoldPath))
} else {
b.WriteString(fmt.Sprintf("rm -f %s && ", BridgeScaffoldPath))
}
b.WriteString(`systemctl daemon-reload && `)
// Use restart so a flag flip re-runs ExecStart= even if the unit is
// already active (Type=oneshot with RemainAfterExit=yes blocks plain
// "start" from running again).
b.WriteString(fmt.Sprintf(`systemctl enable %s && systemctl restart %s`, firewallServiceName, firewallServiceName))
return b.String()
}
// renderBridgeScaffold builds the nft file-format content for the bridge
// scaffold. Uses `add table` + `add chain` (idempotent) then `flush chain` +
// `add rule` so forward and coolify_intra are atomically replaced on every
// apply without touching coolify_allow (owned by coold).
//
// Dispatch to coolify_intra is keyed on container subnet (ip saddr / ip daddr)
// rather than bridge interface name: podman auto-names bridges (e.g. podman2),
// and the CLI-level "coolify-<ns>-mesh" network name would exceed Linux
// IFNAMSIZ=16 as an interface name anyway. Subnets are disjoint per namespace,
// so this still confines deny to coolify-managed traffic and leaves foreign
// bridges untouched.
func renderBridgeScaffold(subnets []*net.IPNet) string {
sortedSubnets := make([]string, 0, len(subnets))
for _, sn := range subnets {
sortedSubnets = append(sortedSubnets, sn.String())
}
sort.Strings(sortedSubnets)
subnetSet := "{ " + strings.Join(sortedSubnets, ", ") + " }"
var b strings.Builder
b.WriteString("# Managed by coolify init — do not edit manually.\n")
b.WriteString("# Replaces forward + coolify_intra chains on restart; never touches coolify_allow.\n")
// Order matters: chains referenced by `jump` must exist before the rule
// is added (nft validates the target at add-rule time). coolify_allow is
// created by the preceding ExecStart line; declare coolify_intra here
// before the forward-chain rules jump to it.
fmt.Fprintf(&b, "add table bridge %s\n", BridgeTableName)
fmt.Fprintf(&b, "add chain bridge %s coolify_intra\n", BridgeTableName)
fmt.Fprintf(&b, "flush chain bridge %s coolify_intra\n", BridgeTableName)
fmt.Fprintf(&b, "add rule bridge %s coolify_intra jump coolify_allow\n", BridgeTableName)
fmt.Fprintf(&b, "add rule bridge %s coolify_intra drop\n", BridgeTableName)
fmt.Fprintf(&b, "add chain bridge %s forward { type filter hook forward priority -200; policy accept; }\n", BridgeTableName)
fmt.Fprintf(&b, "flush chain bridge %s forward\n", BridgeTableName)
fmt.Fprintf(&b, "add rule bridge %s forward meta protocol != ip accept\n", BridgeTableName)
fmt.Fprintf(&b, "add rule bridge %s forward ct state established,related accept\n", BridgeTableName)
fmt.Fprintf(&b, "add rule bridge %s forward ip saddr %s jump coolify_intra\n", BridgeTableName, subnetSet)
fmt.Fprintf(&b, "add rule bridge %s forward ip daddr %s jump coolify_intra\n", BridgeTableName, subnetSet)
return b.String()
}
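The sorted subnet set that drives the two `jump coolify_intra` rules can be reproduced on its own. This is a minimal sketch: `subnetSet` is a hypothetical helper that takes CIDR strings rather than `[]*net.IPNet`, and the table name `coolify_bridge` is taken from the test expectations rather than from the `BridgeTableName` constant itself.

```go
package main

import (
	"fmt"
	"net"
	"sort"
	"strings"
)

// subnetSet renders CIDRs as an nft anonymous set literal. Entries are sorted
// as strings so the output is stable regardless of input order.
// Hypothetical helper for illustration only.
func subnetSet(cidrs []string) (string, error) {
	sorted := make([]string, 0, len(cidrs))
	for _, c := range cidrs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			return "", fmt.Errorf("parse %q: %w", c, err)
		}
		sorted = append(sorted, n.String())
	}
	sort.Strings(sorted)
	return "{ " + strings.Join(sorted, ", ") + " }", nil
}

func main() {
	set, err := subnetSet([]string{"10.220.1.0/24", "10.210.1.0/24"})
	if err != nil {
		panic(err)
	}
	fmt.Printf("add rule bridge coolify_bridge forward ip saddr %s jump coolify_intra\n", set)
	fmt.Printf("add rule bridge coolify_bridge forward ip daddr %s jump coolify_intra\n", set)
}
```

The sort is lexicographic, not numeric, so `10.210.10.0/24` orders before `10.210.2.0/24`. That is enough for deterministic output, which is all the scaffold and its golden fixture appear to need.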
-228
@@ -1,228 +0,0 @@
package wireguard
import (
"net"
"os"
"path/filepath"
"strings"
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestFirewallServiceUnit_DefaultDenyOff(t *testing.T) {
subnets := []*net.IPNet{mustParseCIDR("10.210.0.0/24")}
got := FirewallServiceUnit("wg0", []string{"default"}, subnets, false)
assert.Contains(t, got, "[Unit]")
assert.Contains(t, got, "Description=Coolify mesh firewall rules")
assert.Contains(t, got, "After=wg-quick@wg0.service")
assert.Contains(t, got, "Type=oneshot")
assert.Contains(t, got, "RemainAfterExit=yes")
// Blanket allow rules present.
assert.Contains(t, got, "/usr/sbin/iptables -I FORWARD -s 10.210.0.0/24 -j ACCEPT")
assert.Contains(t, got, "/usr/sbin/iptables -I FORWARD -d 10.210.0.0/24 -j ACCEPT")
// Teardown of default-deny scaffold present (idempotent cleanup).
assert.Contains(t, got, "/usr/sbin/iptables -X COOLIFY-INTRA")
assert.Contains(t, got, "/usr/sbin/iptables -D FORWARD -s 10.210.0.0/24 -j COOLIFY-INTRA")
assert.Contains(t, got, "/usr/sbin/iptables -D FORWARD -d 10.210.0.0/24 -j COOLIFY-INTRA")
// Default-deny chain rules MUST NOT be present.
assert.NotContains(t, got, "-A COOLIFY-INTRA -j COOLIFY-ALLOW")
assert.NotContains(t, got, "-A COOLIFY-INTRA -j DROP")
// COOLIFY-ALLOW chain is never destroyed.
assert.NotContains(t, got, "-X COOLIFY-ALLOW")
// POSTROUTING RETURN preserved (needed in both modes).
assert.Contains(t, got, "/usr/sbin/iptables -t nat -I POSTROUTING -s 10.210.0.0/24 -o wg0 -j RETURN")
assert.Contains(t, got, "WantedBy=multi-user.target")
}
func TestFirewallServiceUnit_DefaultDenyOn(t *testing.T) {
subnets := []*net.IPNet{mustParseCIDR("10.210.0.0/24")}
got := FirewallServiceUnit("wg0", []string{"default"}, subnets, true)
// Chains created.
assert.Contains(t, got, "/usr/sbin/iptables -N COOLIFY-ALLOW")
assert.Contains(t, got, "/usr/sbin/iptables -N COOLIFY-INTRA")
// Conntrack early-accept.
assert.Contains(t, got, "-m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT")
// COOLIFY-INTRA flush + jump to ALLOW + DROP.
assert.Contains(t, got, "/usr/sbin/iptables -F COOLIFY-INTRA")
assert.Contains(t, got, "/usr/sbin/iptables -A COOLIFY-INTRA -j COOLIFY-ALLOW")
assert.Contains(t, got, "/usr/sbin/iptables -A COOLIFY-INTRA -j DROP")
// FORWARD jumps for both directions of container subnet traffic.
assert.Contains(t, got, "/usr/sbin/iptables -A FORWARD -d 10.210.0.0/24 -j COOLIFY-INTRA")
assert.Contains(t, got, "/usr/sbin/iptables -A FORWARD -s 10.210.0.0/24 -j COOLIFY-INTRA")
// Teardown of blanket ACCEPT from prior mode-A run.
assert.Contains(t, got, "/usr/sbin/iptables -D FORWARD -s 10.210.0.0/24 -j ACCEPT")
assert.Contains(t, got, "/usr/sbin/iptables -D FORWARD -d 10.210.0.0/24 -j ACCEPT")
// Blanket ACCEPT rules MUST NOT be installed in default-deny mode.
assert.NotContains(t, got, "/usr/sbin/iptables -I FORWARD -s 10.210.0.0/24 -j ACCEPT")
assert.NotContains(t, got, "/usr/sbin/iptables -I FORWARD -d 10.210.0.0/24 -j ACCEPT")
// COOLIFY-ALLOW chain is never destroyed. It IS flushed-and-restored at
// boot/restart from the canonical snapshot — that's how runtime allow
// rules survive reboots.
assert.NotContains(t, got, "-X COOLIFY-ALLOW")
assert.Contains(t, got, "/usr/sbin/iptables -F COOLIFY-ALLOW")
assert.Contains(t, got, "/usr/sbin/iptables-restore --noflush < "+AllowRulesPath)
assert.Contains(t, got, "[ -s "+AllowRulesPath+" ]")
// POSTROUTING RETURN preserved.
assert.Contains(t, got, "/usr/sbin/iptables -t nat -I POSTROUTING -s 10.210.0.0/24 -o wg0 -j RETURN")
}
func TestFirewallServiceUnit_DefaultDenyOff_NoAllowRestore(t *testing.T) {
subnets := []*net.IPNet{mustParseCIDR("10.210.0.0/24")}
got := FirewallServiceUnit("wg0", []string{"default"}, subnets, false)
// Blanket-allow mode bypasses COOLIFY-ALLOW entirely — no restore.
assert.NotContains(t, got, "iptables-restore")
assert.NotContains(t, got, AllowRulesPath)
}
func TestInstallFirewallCommand_AtomicWriteAndEnable(t *testing.T) {
subnets := []*net.IPNet{mustParseCIDR("10.210.5.0/24")}
cmd := InstallFirewallCommand("wg0", []string{"default"}, subnets, false)
// Atomic write via .tmp + mv.
assert.Contains(t, cmd, "/etc/systemd/system/coolify-mesh-fw.service.tmp")
assert.Contains(t, cmd, "mv /etc/systemd/system/coolify-mesh-fw.service.tmp /etc/systemd/system/coolify-mesh-fw.service")
// systemd reload + enable + restart (so a flag flip re-runs ExecStart).
assert.Contains(t, cmd, "systemctl daemon-reload")
assert.Contains(t, cmd, "systemctl enable coolify-mesh-fw.service")
assert.Contains(t, cmd, "systemctl restart coolify-mesh-fw.service")
// Subnet baked into command.
assert.Contains(t, cmd, "10.210.5.0/24")
}
func TestInstallFirewallCommand_DefaultDenyEmbedded(t *testing.T) {
subnets := []*net.IPNet{mustParseCIDR("10.210.5.0/24")}
cmd := InstallFirewallCommand("wg0", []string{"default"}, subnets, true)
// Default-deny variant of unit must be embedded in the heredoc.
assert.Contains(t, cmd, "-A COOLIFY-INTRA -j DROP")
}
func TestFirewallServiceUnit_BridgeScaffold_DefaultDenyOn(t *testing.T) {
subnets := []*net.IPNet{mustParseCIDR("10.210.0.0/24")}
got := FirewallServiceUnit("wg0", []string{"default"}, subnets, true)
assert.Contains(t, got, "nft list table bridge coolify_bridge")
assert.Contains(t, got, "nft add table bridge coolify_bridge")
assert.Contains(t, got, "nft add chain bridge coolify_bridge coolify_allow")
assert.Contains(t, got, "nft delete chain bridge coolify_bridge forward")
assert.Contains(t, got, "nft delete chain bridge coolify_bridge coolify_intra")
assert.Contains(t, got, "nft -f /etc/coolify/bridge-fw.nft")
assert.Contains(t, got, "/etc/coolify/allow.nft")
assert.NotContains(t, got, "-X COOLIFY-ALLOW")
}
func TestFirewallServiceUnit_BridgeScaffold_DefaultDenyOff(t *testing.T) {
subnets := []*net.IPNet{mustParseCIDR("10.210.0.0/24")}
got := FirewallServiceUnit("wg0", []string{"default"}, subnets, false)
assert.Contains(t, got, "nft delete table bridge coolify_bridge")
assert.NotContains(t, got, "nft add table bridge coolify_bridge")
assert.NotContains(t, got, "nft -f /etc/coolify/bridge-fw.nft")
}
func TestFirewallServiceUnit_BridgeSetStableSortedSubnets(t *testing.T) {
// Pass subnets in reverse-sorted order — scaffold must sort them.
subnets := []*net.IPNet{
mustParseCIDR("10.220.1.0/24"),
mustParseCIDR("10.210.1.0/24"),
}
// renderBridgeScaffold is embedded in InstallFirewallCommand, so check that.
cmd := InstallFirewallCommand("wg0", []string{"alpha", "default"}, subnets, true)
// Assert the nft scaffold set contains both, sorted:
// `ip saddr { 10.210.1.0/24, 10.220.1.0/24 } jump coolify_intra`
assert.Contains(t, cmd, "ip saddr { 10.210.1.0/24, 10.220.1.0/24 } jump coolify_intra")
assert.Contains(t, cmd, "ip daddr { 10.210.1.0/24, 10.220.1.0/24 } jump coolify_intra")
}
func TestFirewallServiceUnit_BridgeScaffold_UsesIPSaddrNotIifname(t *testing.T) {
subnets := []*net.IPNet{mustParseCIDR("10.210.0.0/24")}
cmd := InstallFirewallCommand("wg0", []string{"default"}, subnets, true)
// Podman bridge names exceed IFNAMSIZ=16 (e.g. "coolify-default-mesh" = 20
// chars). Scaffold MUST key dispatch on ip saddr/daddr, never iifname.
assert.Contains(t, cmd, "ip saddr")
assert.Contains(t, cmd, "ip daddr")
assert.NotContains(t, cmd, "iifname")
assert.NotContains(t, cmd, "oifname")
assert.NotContains(t, cmd, "coolify-default-mesh\"")
}
func TestInstallFirewallCommand_WritesBridgeScaffoldFile(t *testing.T) {
subnets := []*net.IPNet{mustParseCIDR("10.210.0.0/24")}
cmd := InstallFirewallCommand("wg0", []string{"default"}, subnets, true)
assert.Contains(t, cmd, "/etc/coolify/bridge-fw.nft")
assert.Contains(t, cmd, "COOLIFY_BR_EOF")
assert.Contains(t, cmd, "bridge-fw.nft.tmp")
// /etc/coolify must be created before bridge-fw.nft.tmp is written —
// without it, `cat > .tmp` fails on fresh hosts.
mkdirIdx := strings.Index(cmd, "mkdir -p /etc/coolify")
tmpIdx := strings.Index(cmd, "bridge-fw.nft.tmp")
assert.GreaterOrEqual(t, mkdirIdx, 0, "mkdir -p /etc/coolify must be present")
assert.Less(t, mkdirIdx, tmpIdx, "mkdir must run before bridge-fw.nft.tmp write")
}
func TestInstallFirewallCommand_DefaultDenyOff_RemovesBridgeScaffold(t *testing.T) {
subnets := []*net.IPNet{mustParseCIDR("10.210.0.0/24")}
cmd := InstallFirewallCommand("wg0", []string{"default"}, subnets, false)
assert.Contains(t, cmd, "rm -f /etc/coolify/bridge-fw.nft")
assert.NotContains(t, cmd, "COOLIFY_BR_EOF")
}
func TestFirewallServiceUnit_GoldenFixture_TwoNamespaces(t *testing.T) {
subnets := []*net.IPNet{
mustParseCIDR("10.210.0.0/24"),
mustParseCIDR("10.220.0.0/24"),
}
got := FirewallServiceUnit("wg0", []string{"alpha", "default"}, subnets, true)
fixturePath := filepath.Join("..", "..", "test", "fixtures", "firewall_unit_deny_two_ns.txt")
if os.Getenv("UPDATE_GOLDEN") == "1" {
err := os.WriteFile(fixturePath, []byte(got), 0o600)
require.NoError(t, err, "failed to write golden fixture")
t.Logf("golden fixture updated: %s", fixturePath)
return
}
b, err := os.ReadFile(fixturePath)
require.NoError(t, err, "golden fixture missing — run with UPDATE_GOLDEN=1 to create it")
assert.Equal(t, string(b), got)
}
func TestFirewallServiceUnit_MultipleNamespacesEmitPerSubnetRules(t *testing.T) {
subnets := []*net.IPNet{
mustParseCIDR("10.210.1.0/24"),
mustParseCIDR("10.220.1.0/24"),
}
got := FirewallServiceUnit("wg0", []string{"default"}, subnets, true)
// Each namespace subnet gets its own POSTROUTING RETURN + FORWARD jumps.
for _, sub := range []string{"10.210.1.0/24", "10.220.1.0/24"} {
assert.Contains(t, got, "/usr/sbin/iptables -t nat -I POSTROUTING -s "+sub+" -o wg0 -j RETURN")
assert.Contains(t, got, "/usr/sbin/iptables -A FORWARD -d "+sub+" -j COOLIFY-INTRA")
assert.Contains(t, got, "/usr/sbin/iptables -A FORWARD -s "+sub+" -j COOLIFY-INTRA")
}
}
-214
@@ -1,214 +0,0 @@
package wireguard
import (
"fmt"
"strings"
)
// actionCategory classifies every ActionType so the intent filter can decide
// whether to allow, skip, or block it per host.
type actionCategory int
const (
// catSafeAlways: pure add/first-time install. Idempotent, no runtime
// disruption on re-run. Included in every intent.
catSafeAlways actionCategory = iota
// catPeerRefresh: rewrites a config or restarts a service as part of
// keeping peer/namespace state in sync. Idempotent, short service blip
// at worst. Allowed in every intent (extend needs it on existing hosts
// to pick up the new peer's AllowedIPs; upgrade needs it for the
// post-install service restart).
catPeerRefresh
// catDestructiveReplace: recreates a resource that may currently be in
// use (running containers on a podman bridge). Blocked on existing
// hosts in extend mode unless --allow-replace is set. Always blocked
// in upgrade mode.
catDestructiveReplace
// catVersionBump: re-downloads an agent binary (coold, corrosion,
// scheduler, builder). Runs on new hosts in extend mode (first install)
// but not on existing hosts. Always allowed in upgrade mode.
catVersionBump
// catWipeDB: special-case for ActionWriteCorrosionSchema when the
// schema drift branch fires (pre-existing sqlite DB gets deleted).
// Only allowed in bootstrap mode and on brand-new hosts in extend
// mode. Never allowed on existing hosts, even with --allow-replace.
catWipeDB
// catCorrosionSchemaFirstWrite: ActionWriteCorrosionSchema when no
// prior schema is present (CorrosionSchemaSha256 is empty). Safe
// everywhere because nothing gets wiped.
catCorrosionSchemaFirstWrite
)
// categorize returns the category for a planned action. The schema action is
// looked up contextually (plan fills Detail with "[schema drift — DB will be
// reset]" when the wipe branch applies).
func categorize(a PlannedAction) actionCategory {
switch a.Type {
case ActionInstallWG,
ActionGenKeyPair,
ActionAllocateMgmtIP,
ActionAllocateContainerSubnet,
ActionEnableService,
ActionInstallPodman,
ActionEnablePodmanSocket,
ActionEnableIPForward,
ActionCreatePodmanNet,
ActionGenerateJWTKeypair,
ActionAddPeer,
ActionRemovePeer:
return catSafeAlways
case ActionWriteConfig,
ActionReloadService,
ActionInstallFirewall,
ActionWriteCorrosionConfig,
ActionInstallCorrosionService,
ActionInstallCooldService,
ActionInstallSchedulerService,
ActionWriteHostJWT,
ActionUpdateCooldSchedulerEnv:
return catPeerRefresh
case ActionRecreatePodmanNet:
return catDestructiveReplace
case ActionInstallCorrosion,
ActionInstallCoold,
ActionInstallScheduler,
ActionInstallBuilder:
return catVersionBump
case ActionWriteCorrosionSchema:
if strings.Contains(a.Detail, "DB will be reset") {
return catWipeDB
}
return catCorrosionSchemaFirstWrite
}
return catSafeAlways
}
// ValidateIntent enforces pre-plan invariants the filter itself can't express.
func ValidateIntent(d *DesiredMesh) error {
switch d.Intent {
case IntentBootstrap:
return nil
case IntentExtend:
if len(d.NewHosts) == 0 {
return fmt.Errorf("extend mode requires at least one host in NewHosts")
}
hostSet := make(map[string]struct{}, len(d.Hosts))
for _, h := range d.Hosts {
hostSet[h] = struct{}{}
}
for _, nh := range d.NewHosts {
if _, ok := hostSet[nh]; !ok {
return fmt.Errorf("extend mode: new host %q not in --servers list", nh)
}
}
return nil
case IntentUpgrade:
if !d.AllowNightly {
for _, pair := range [][2]string{
{"--coold-version", d.CooldVersion},
{"--corrosion-version", d.CorrosionVersion},
{"--scheduler-version", d.SchedulerVersion},
} {
if pair[1] == "nightly" {
return fmt.Errorf(
"upgrade mode rejects %s=nightly (moving target forces re-install every run); pin a version or pass --allow-nightly",
pair[0],
)
}
}
}
return nil
default:
return fmt.Errorf("unknown intent %q", d.Intent)
}
}
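The extend-mode precondition amounts to a subset check: every entry in `NewHosts` must also appear in `Hosts`. A minimal sketch under that reading follows; `notInServers` is a hypothetical helper, not part of the package, and it reports all offending hosts where `ValidateIntent` returns on the first one.

```go
package main

import "fmt"

// notInServers returns the entries of newHosts that are absent from hosts.
// Hypothetical helper for illustration only.
func notInServers(hosts, newHosts []string) []string {
	set := make(map[string]struct{}, len(hosts))
	for _, h := range hosts {
		set[h] = struct{}{}
	}
	var missing []string
	for _, nh := range newHosts {
		if _, ok := set[nh]; !ok {
			missing = append(missing, nh)
		}
	}
	return missing
}

func main() {
	// "C" is flagged as new but was never listed as a server.
	fmt.Println(notInServers([]string{"A", "B"}, []string{"C"}))
	// "C" is both listed and flagged as new, so nothing is missing.
	fmt.Println(notInServers([]string{"A", "B", "C"}, []string{"C"}))
}
```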
// filterByIntent mutates plan.Actions in place, moving blocked/skipped actions
// into plan.Skipped with a reason. For IntentBootstrap (default) it is a no-op.
func filterByIntent(plan *Plan, d *DesiredMesh) {
if d.Intent == IntentBootstrap {
return
}
newHostSet := make(map[string]struct{}, len(d.NewHosts))
for _, h := range d.NewHosts {
newHostSet[h] = struct{}{}
}
kept := plan.Actions[:0]
for _, a := range plan.Actions {
reason := decide(a, d, newHostSet)
if reason == "" {
kept = append(kept, a)
continue
}
plan.Skipped = append(plan.Skipped, SkippedAction{Action: a, Reason: reason})
}
plan.Actions = kept
}
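The `plan.Actions[:0]` idiom filters a slice in place by reusing its backing array. Below is a minimal sketch of the same partitioning with simplified stand-in types; `action`, `skippedAction`, and `partition` are illustrative names, not the package's own.

```go
package main

import "fmt"

type action struct {
	Host string
	Type string
}

type skippedAction struct {
	Action action
	Reason string
}

// partition keeps the actions for which reason returns "" and collects the
// rest with their reason. kept shares the backing array of actions, which is
// safe because the write position never overtakes the read position.
func partition(actions []action, reason func(action) string) ([]action, []skippedAction) {
	kept := actions[:0]
	var skipped []skippedAction
	for _, a := range actions {
		if r := reason(a); r != "" {
			skipped = append(skipped, skippedAction{Action: a, Reason: r})
			continue
		}
		kept = append(kept, a)
	}
	return kept, skipped
}

func main() {
	actions := []action{
		{Host: "A-old", Type: "install-coold"},
		{Host: "A-new", Type: "install-coold"},
	}
	kept, skipped := partition(actions, func(a action) string {
		if a.Host == "A-old" {
			return "version-bump on existing host skipped"
		}
		return ""
	})
	fmt.Println(len(kept), len(skipped))
}
```

Because the result aliases the input, callers should stop using the original slice once it has been filtered.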
// decide returns an empty string when the action should run, or a short
// human-readable reason when it should be skipped.
func decide(a PlannedAction, d *DesiredMesh, newHostSet map[string]struct{}) string {
cat := categorize(a)
_, isNewHost := newHostSet[a.Host]
switch d.Intent {
case IntentExtend:
if isNewHost {
// Everything runs on a brand-new host — it needs the full install.
return ""
}
// Existing host in extend mode: safe-always actions (whose guards
// prevent re-runs on converged hosts), peer-refresh actions, and a
// first-time schema write run; destructive replaces need --allow-replace.
switch cat {
case catSafeAlways, catPeerRefresh:
return ""
case catDestructiveReplace:
if d.AllowReplace {
return ""
}
return "extend: destructive-replace on existing host blocked; pass --allow-replace to override"
case catVersionBump:
return "extend: version-bump on existing host skipped; use `coolify init upgrade` to bump versions"
case catWipeDB:
return "extend: corrosion DB wipe on existing host is never allowed; resolve schema drift with `coolify init upgrade` on a fresh schema"
case catCorrosionSchemaFirstWrite:
return ""
}
case IntentUpgrade:
switch cat {
case catVersionBump:
return ""
case catPeerRefresh:
if isUpgradeServiceRestart(a.Type) {
return ""
}
return "upgrade: peer-refresh skipped; use `coolify init extend` for mesh topology changes"
case catSafeAlways, catDestructiveReplace, catWipeDB, catCorrosionSchemaFirstWrite:
return "upgrade: non-version-bump action skipped"
}
default:
// IntentBootstrap (and unknown intents) keep every action.
}
return ""
}
// isUpgradeServiceRestart returns true when a peer-refresh action is the
// follow-up systemctl restart after a binary install and must run in upgrade
// mode to pick up the new binary.
func isUpgradeServiceRestart(t ActionType) bool {
switch t {
case ActionInstallCorrosionService,
ActionInstallCooldService,
ActionInstallSchedulerService:
return true
default:
return false
}
}
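Read together, `decide` and `isUpgradeServiceRestart` form a small decision matrix over intent, category, and host status. The sketch below restates that matrix as a boolean function with simplified stand-in types. All names in it (`intent`, `category`, `gate`, `runs`) are hypothetical, and it omits the skip reasons the real code returns.

```go
package main

import "fmt"

type intent string
type category string

const (
	bootstrap intent = "bootstrap"
	extend    intent = "extend"
	upgrade   intent = "upgrade"
)

const (
	safeAlways       category = "safe-always"
	peerRefresh      category = "peer-refresh"
	destructive      category = "destructive-replace"
	versionBump      category = "version-bump"
	wipeDB           category = "wipe-db"
	schemaFirstWrite category = "schema-first-write"
)

// gate carries the per-action inputs. serviceRestart marks the peer-refresh
// actions that reinstall a service unit after a binary install.
type gate struct {
	newHost, allowReplace, serviceRestart bool
}

// runs reports whether an action of category c would be kept in the plan.
func runs(i intent, c category, g gate) bool {
	switch i {
	case extend:
		if g.newHost {
			return true // a brand-new host needs the full install
		}
		switch c {
		case safeAlways, peerRefresh, schemaFirstWrite:
			return true
		case destructive:
			return g.allowReplace
		default: // versionBump and wipeDB never run on existing hosts
			return false
		}
	case upgrade:
		return c == versionBump || (c == peerRefresh && g.serviceRestart)
	default: // bootstrap keeps everything
		return true
	}
}

func main() {
	cats := []category{safeAlways, peerRefresh, destructive, versionBump, wipeDB, schemaFirstWrite}
	for _, c := range cats {
		fmt.Printf("%-20s extend(existing)=%-5v upgrade=%v\n",
			c, runs(extend, c, gate{}), runs(upgrade, c, gate{}))
	}
}
```

One consequence worth noting: `allowReplace` unlocks only the destructive-replace category, so a database wipe on an existing host stays blocked in extend mode whatever flags are set.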
-291
@@ -1,291 +0,0 @@
package wireguard
import (
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestValidateIntent_Bootstrap(t *testing.T) {
d := &DesiredMesh{Intent: IntentBootstrap}
require.NoError(t, ValidateIntent(d))
}
func TestValidateIntent_ExtendRequiresNewHosts(t *testing.T) {
d := &DesiredMesh{
Intent: IntentExtend,
Hosts: []string{"A", "B"},
}
err := ValidateIntent(d)
require.Error(t, err)
assert.Contains(t, err.Error(), "NewHosts")
}
func TestValidateIntent_ExtendNewHostMustBeInServers(t *testing.T) {
d := &DesiredMesh{
Intent: IntentExtend,
Hosts: []string{"A", "B"},
NewHosts: []string{"C"},
}
err := ValidateIntent(d)
require.Error(t, err)
assert.Contains(t, err.Error(), `"C"`)
assert.Contains(t, err.Error(), "--servers")
}
func TestValidateIntent_ExtendHappy(t *testing.T) {
d := &DesiredMesh{
Intent: IntentExtend,
Hosts: []string{"A", "B", "C"},
NewHosts: []string{"C"},
}
require.NoError(t, ValidateIntent(d))
}
func TestValidateIntent_UpgradeRejectsNightlyByDefault(t *testing.T) {
for _, tc := range []struct {
name string
d DesiredMesh
}{
{"coold", DesiredMesh{Intent: IntentUpgrade, CooldVersion: "nightly", CorrosionVersion: "v1", SchedulerVersion: "v1"}},
{"corrosion", DesiredMesh{Intent: IntentUpgrade, CooldVersion: "v1", CorrosionVersion: "nightly", SchedulerVersion: "v1"}},
{"scheduler", DesiredMesh{Intent: IntentUpgrade, CooldVersion: "v1", CorrosionVersion: "v1", SchedulerVersion: "nightly"}},
} {
t.Run(tc.name, func(t *testing.T) {
err := ValidateIntent(&tc.d)
require.Error(t, err)
assert.Contains(t, err.Error(), "nightly")
})
}
}
func TestValidateIntent_UpgradeAllowsNightlyWhenOpted(t *testing.T) {
d := &DesiredMesh{
Intent: IntentUpgrade,
CooldVersion: "nightly",
CorrosionVersion: "nightly",
SchedulerVersion: "nightly",
AllowNightly: true,
}
require.NoError(t, ValidateIntent(d))
}
func TestValidateIntent_UpgradeAllowsPinned(t *testing.T) {
d := &DesiredMesh{
Intent: IntentUpgrade,
CooldVersion: "v1.2.3",
CorrosionVersion: "v0.9.0",
SchedulerVersion: "v0.3.0",
}
require.NoError(t, ValidateIntent(d))
}
func TestValidateIntent_UnknownIntent(t *testing.T) {
d := &DesiredMesh{Intent: Intent("bogus")}
err := ValidateIntent(d)
require.Error(t, err)
assert.Contains(t, err.Error(), "bogus")
}
func TestCategorize(t *testing.T) {
cases := []struct {
t ActionType
want actionCategory
}{
{ActionInstallWG, catSafeAlways},
{ActionGenKeyPair, catSafeAlways},
{ActionAllocateMgmtIP, catSafeAlways},
{ActionAllocateContainerSubnet, catSafeAlways},
{ActionEnableService, catSafeAlways},
{ActionInstallPodman, catSafeAlways},
{ActionEnablePodmanSocket, catSafeAlways},
{ActionEnableIPForward, catSafeAlways},
{ActionCreatePodmanNet, catSafeAlways},
{ActionGenerateJWTKeypair, catSafeAlways},
{ActionAddPeer, catSafeAlways},
{ActionRemovePeer, catSafeAlways},
{ActionWriteConfig, catPeerRefresh},
{ActionReloadService, catPeerRefresh},
{ActionInstallFirewall, catPeerRefresh},
{ActionWriteCorrosionConfig, catPeerRefresh},
{ActionInstallCorrosionService, catPeerRefresh},
{ActionInstallCooldService, catPeerRefresh},
{ActionInstallSchedulerService, catPeerRefresh},
{ActionWriteHostJWT, catPeerRefresh},
{ActionUpdateCooldSchedulerEnv, catPeerRefresh},
{ActionRecreatePodmanNet, catDestructiveReplace},
{ActionInstallCorrosion, catVersionBump},
{ActionInstallCoold, catVersionBump},
{ActionInstallScheduler, catVersionBump},
{ActionInstallBuilder, catVersionBump},
}
for _, tc := range cases {
t.Run(string(tc.t), func(t *testing.T) {
assert.Equal(t, tc.want, categorize(PlannedAction{Type: tc.t}))
})
}
}
func TestCategorize_SchemaWipeVsFirstWrite(t *testing.T) {
firstWrite := PlannedAction{
Type: ActionWriteCorrosionSchema,
Detail: "/etc/corrosion/schemas/coolify.sql",
}
wipe := PlannedAction{
Type: ActionWriteCorrosionSchema,
Detail: "/etc/corrosion/schemas/coolify.sql [schema drift — DB will be reset]",
}
assert.Equal(t, catCorrosionSchemaFirstWrite, categorize(firstWrite))
assert.Equal(t, catWipeDB, categorize(wipe))
}
func TestFilterByIntent_BootstrapNoop(t *testing.T) {
plan := &Plan{Actions: []PlannedAction{
{Host: "A", Type: ActionInstallCoold},
{Host: "B", Type: ActionRecreatePodmanNet},
{Host: "B", Type: ActionWriteCorrosionSchema, Detail: "DB will be reset"},
}}
filterByIntent(plan, &DesiredMesh{Intent: IntentBootstrap})
assert.Len(t, plan.Actions, 3)
assert.Empty(t, plan.Skipped)
}
func TestFilterByIntent_ExtendNewHostRunsEverything(t *testing.T) {
plan := &Plan{Actions: []PlannedAction{
{Host: "A-new", Type: ActionInstallCoold},
{Host: "A-new", Type: ActionInstallCorrosion},
{Host: "A-new", Type: ActionCreatePodmanNet},
{Host: "A-new", Type: ActionWriteCorrosionSchema, Detail: "first write"},
}}
filterByIntent(plan, &DesiredMesh{
Intent: IntentExtend,
NewHosts: []string{"A-new"},
})
assert.Len(t, plan.Actions, 4)
assert.Empty(t, plan.Skipped)
}
func TestFilterByIntent_ExtendExistingHostPeerRefreshOnly(t *testing.T) {
plan := &Plan{Actions: []PlannedAction{
{Host: "A-old", Type: ActionWriteConfig},
{Host: "A-old", Type: ActionReloadService},
{Host: "A-old", Type: ActionWriteCorrosionConfig},
{Host: "A-old", Type: ActionInstallFirewall},
{Host: "A-old", Type: ActionInstallCoold}, // version bump: skipped
{Host: "A-old", Type: ActionInstallBuilder}, // version bump: skipped
{Host: "A-new", Type: ActionInstallCoold}, // new host: kept
}}
filterByIntent(plan, &DesiredMesh{
Intent: IntentExtend,
NewHosts: []string{"A-new"},
})
kept := map[ActionType]bool{}
for _, a := range plan.Actions {
kept[a.Type] = true
}
assert.True(t, kept[ActionWriteConfig])
assert.True(t, kept[ActionReloadService])
assert.True(t, kept[ActionWriteCorrosionConfig])
assert.True(t, kept[ActionInstallFirewall])
skippedTypes := map[ActionType]int{}
for _, s := range plan.Skipped {
skippedTypes[s.Action.Type]++
}
// InstallCoold is skipped once (existing host) and kept once (new host);
// InstallBuilder is planned only for the existing host, so it is skipped once.
assert.Equal(t, 1, skippedTypes[ActionInstallCoold])
assert.Equal(t, 1, skippedTypes[ActionInstallBuilder])
// Exactly one InstallCoold survived — the one targeting the new host.
var survivors []string
for _, a := range plan.Actions {
if a.Type == ActionInstallCoold {
survivors = append(survivors, a.Host)
}
}
assert.Equal(t, []string{"A-new"}, survivors)
}
func TestFilterByIntent_ExtendBlocksDestructiveOnExistingWithoutAllowReplace(t *testing.T) {
plan := &Plan{Actions: []PlannedAction{
{Host: "A-old", Type: ActionRecreatePodmanNet, Detail: "coolify-default-mesh — dns_enabled=true"},
}}
filterByIntent(plan, &DesiredMesh{
Intent: IntentExtend,
NewHosts: []string{"A-new"},
})
assert.Empty(t, plan.Actions)
require.Len(t, plan.Skipped, 1)
assert.Contains(t, plan.Skipped[0].Reason, "--allow-replace")
}
func TestFilterByIntent_ExtendAllowReplaceUnlocksDestructive(t *testing.T) {
plan := &Plan{Actions: []PlannedAction{
{Host: "A-old", Type: ActionRecreatePodmanNet},
}}
filterByIntent(plan, &DesiredMesh{
Intent: IntentExtend,
NewHosts: []string{"A-new"},
AllowReplace: true,
})
assert.Len(t, plan.Actions, 1)
assert.Empty(t, plan.Skipped)
}
func TestFilterByIntent_ExtendAllowReplaceDoesNotUnlockWipeDB(t *testing.T) {
plan := &Plan{Actions: []PlannedAction{
{Host: "A-old", Type: ActionWriteCorrosionSchema, Detail: "schema drift — DB will be reset"},
}}
filterByIntent(plan, &DesiredMesh{
Intent: IntentExtend,
NewHosts: []string{"A-new"},
AllowReplace: true,
})
assert.Empty(t, plan.Actions)
require.Len(t, plan.Skipped, 1)
assert.Contains(t, plan.Skipped[0].Reason, "never allowed")
}
func TestFilterByIntent_UpgradeOnlyKeepsVersionBumpsAndServiceRestarts(t *testing.T) {
plan := &Plan{Actions: []PlannedAction{
{Host: "A", Type: ActionInstallCoold},
{Host: "A", Type: ActionInstallCorrosion},
{Host: "A", Type: ActionInstallScheduler},
{Host: "A", Type: ActionInstallBuilder},
{Host: "A", Type: ActionInstallCooldService},
{Host: "A", Type: ActionInstallCorrosionService},
{Host: "A", Type: ActionInstallSchedulerService},
{Host: "A", Type: ActionWriteConfig}, // skipped
{Host: "A", Type: ActionReloadService}, // skipped
{Host: "A", Type: ActionCreatePodmanNet}, // skipped
{Host: "A", Type: ActionRecreatePodmanNet}, // skipped
{Host: "A", Type: ActionInstallFirewall}, // skipped (non-restart peer-refresh)
}}
filterByIntent(plan, &DesiredMesh{Intent: IntentUpgrade})
kept := map[ActionType]bool{}
for _, a := range plan.Actions {
kept[a.Type] = true
}
for _, want := range []ActionType{
ActionInstallCoold, ActionInstallCorrosion, ActionInstallScheduler, ActionInstallBuilder,
ActionInstallCooldService, ActionInstallCorrosionService, ActionInstallSchedulerService,
} {
assert.True(t, kept[want], "expected %s kept in upgrade", want)
}
skippedTypes := map[ActionType]bool{}
for _, s := range plan.Skipped {
skippedTypes[s.Action.Type] = true
}
for _, want := range []ActionType{
ActionWriteConfig, ActionReloadService, ActionCreatePodmanNet, ActionRecreatePodmanNet, ActionInstallFirewall,
} {
assert.True(t, skippedTypes[want], "expected %s skipped in upgrade", want)
}
}
-627
@@ -1,627 +0,0 @@
package wireguard
import (
"crypto/sha256"
"encoding/hex"
"fmt"
"net"
"strings"
"github.com/coollabsio/coolify-cli/internal/services"
)
// ActionType identifies the kind of change required.
type ActionType string
const (
ActionInstallWG ActionType = "install-wg"
ActionGenKeyPair ActionType = "gen-keypair"
ActionAllocateMgmtIP ActionType = "allocate-mgmt-ip"
ActionAllocateContainerSubnet ActionType = "allocate-container-subnet"
ActionWriteConfig ActionType = "write-config"
ActionEnableService ActionType = "enable-service"
ActionReloadService ActionType = "reload-service"
ActionAddPeer ActionType = "add-peer"
ActionRemovePeer ActionType = "remove-peer"
ActionInstallPodman ActionType = "install-podman"
ActionEnablePodmanSocket ActionType = "enable-podman-socket"
ActionEnableIPForward ActionType = "enable-ip-forward"
ActionCreatePodmanNet ActionType = "create-podman-network"
ActionRecreatePodmanNet ActionType = "recreate-podman-network"
ActionInstallFirewall ActionType = "install-firewall"
ActionInstallCorrosion ActionType = "install-corrosion"
ActionInstallCoold ActionType = "install-coold"
ActionWriteCorrosionConfig ActionType = "write-corrosion-config"
ActionWriteCorrosionSchema ActionType = "write-corrosion-schema"
ActionInstallCorrosionService ActionType = "install-corrosion-service"
ActionInstallCooldService ActionType = "install-coold-service"
ActionInstallScheduler ActionType = "install-scheduler"
ActionGenerateJWTKeypair ActionType = "generate-jwt-keypair"
ActionInstallSchedulerService ActionType = "install-scheduler-service"
ActionWriteHostJWT ActionType = "write-host-jwt"
ActionUpdateCooldSchedulerEnv ActionType = "update-coold-scheduler-env"
ActionInstallBuilder ActionType = "install-builder"
)
// PlannedAction is one step that apply must execute on a host.
type PlannedAction struct {
Host string
Namespace string // empty for host-global actions
Type ActionType
Detail string
}
// Plan is the list of actions needed to converge the mesh to the desired state.
type Plan struct {
Actions []PlannedAction
// MgmtAssignments maps host → planned WG management /32 IP.
MgmtAssignments map[string]net.IP
// SubnetAssignments maps namespace → host → planned container subnet.
SubnetAssignments map[string]map[string]*net.IPNet
// Warnings contains non-fatal conflict messages from the IP allocator.
Warnings []Warning
// Skipped lists actions that were filtered out by the Intent gate
// (e.g. destructive-replace on an existing host in extend mode). Exposed
// so the plan preview can show operators what would have fired and why.
Skipped []SkippedAction
}
// SkippedAction is a PlannedAction that BuildPlan would have emitted but the
// Intent filter suppressed. Reason is a short human-readable message.
type SkippedAction struct {
Action PlannedAction
Reason string
}
// IsEmpty returns true when the mesh is already converged (no changes needed).
func (p *Plan) IsEmpty() bool { return len(p.Actions) == 0 }
// BuildPlan computes the actions required to bring current into alignment
// with desired. It is a pure function: no SSH, no I/O.
func BuildPlan(desired *DesiredMesh, current MeshState) (*Plan, error) {
if desired.DefaultDenyContainers && !desired.InstallPodman {
return nil, fmt.Errorf("--default-deny requires --podman")
}
if desired.InstallCoold && !desired.InstallPodman {
return nil, fmt.Errorf("--install-coold requires --podman")
}
if desired.InstallPodman && len(desired.Namespaces) == 0 {
return nil, fmt.Errorf("at least one namespace is required")
}
if err := ValidateIntent(desired); err != nil {
return nil, err
}
// Validate per-host preconditions before computing actions.
for _, host := range desired.Hosts {
if state, ok := current.Servers[host]; ok && desired.DefaultDenyContainers {
if !state.NftAvailable {
return nil, fmt.Errorf(
"host %s: nft binary not available; install nftables or pass --skip-default-deny",
host,
)
}
}
}
mgmtAssignments, mgmtWarns, err := AllocateMgmtIPs(desired.MgmtPool, current.AssignedMgmtIPs(), desired.Hosts)
if err != nil {
return nil, fmt.Errorf("mgmt IP allocation: %w", err)
}
containerAssignments, contWarns, err := AllocateNamespaced(
desired.ContainerPool, desired.ContainerPrefix,
current.AssignedContainerSubnets(), desired.Namespaces, desired.Hosts)
if err != nil {
return nil, fmt.Errorf("container subnet allocation: %w", err)
}
plan := &Plan{
MgmtAssignments: mgmtAssignments,
SubnetAssignments: containerAssignments,
Warnings: append(mgmtWarns, contWarns...),
}
nsSorted := desired.SortedNamespaces()
for _, host := range desired.Hosts {
state, ok := current.Servers[host]
if !ok {
state = &ServerState{Host: host, Namespaces: map[string]*NamespaceServerState{}}
}
if state.Namespaces == nil {
state.Namespaces = map[string]*NamespaceServerState{}
}
// --- WireGuard installation ---
if !state.Installed {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionInstallWG,
Detail: "wireguard not installed",
})
}
// --- Key generation ---
if !state.KeysExist {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionGenKeyPair,
Detail: "no keys at /etc/wireguard/privatekey",
})
}
// --- Mgmt IP allocation ---
mgmtIP := mgmtAssignments[host]
if state.WireGuardMgmtIP == nil ||
!state.WireGuardMgmtIP.Equal(mgmtIP) {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionAllocateMgmtIP,
Detail: fmt.Sprintf("%s/32", mgmtIP),
})
}
// --- Container subnet allocation (one per namespace) ---
if desired.InstallPodman {
for _, ns := range nsSorted {
contSubnet := containerAssignments[ns][host]
// Named nsState so it does not shadow the current MeshState parameter.
nsState := state.Namespaces[ns]
if nsState == nil || nsState.ContainerSubnet == nil ||
nsState.ContainerSubnet.String() != contSubnet.String() {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Namespace: ns,
Type: ActionAllocateContainerSubnet,
Detail: contSubnet.String(),
})
}
}
}
// --- Peer diff ---
desiredPeerKeys := make(map[string]bool)
for _, peer := range desired.Hosts {
if peer == host {
continue
}
if ps, ok2 := current.Servers[peer]; ok2 && ps.PublicKey != "" {
desiredPeerKeys[ps.PublicKey] = true
}
}
currentPeerKeys := make(map[string]bool)
for _, p := range state.Peers {
currentPeerKeys[p.PublicKey] = true
}
for key := range desiredPeerKeys {
if !currentPeerKeys[key] {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionAddPeer,
Detail: truncateKey(key),
})
}
}
for key := range currentPeerKeys {
if !desiredPeerKeys[key] {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionRemovePeer,
Detail: truncateKey(key),
})
}
}
// --- Config write ---
mgmtMismatch := state.WireGuardMgmtIP == nil || !state.WireGuardMgmtIP.Equal(mgmtIP)
allowedIPsDrift := allowedIPsNeedsRewrite(host, desired, current, containerAssignments, mgmtAssignments, state)
needsConfig := mgmtMismatch ||
allowedIPsDrift ||
len(plan.actionsForHost(host, ActionAddPeer)) > 0 ||
len(plan.actionsForHost(host, ActionRemovePeer)) > 0 ||
!state.KeysExist ||
!state.Installed ||
(len(desired.Hosts) > 1 && state.ListenPort != desired.ListenPort)
if needsConfig {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionWriteConfig,
Detail: fmt.Sprintf("%s.conf (%d peer(s))", desired.Interface, len(desired.Hosts)-1),
})
}
// --- WG service ---
if !state.Active {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionEnableService,
Detail: fmt.Sprintf("systemctl enable --now wg-quick@%s", desired.Interface),
})
} else if needsConfig {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionReloadService,
Detail: fmt.Sprintf("systemctl reload wg-quick@%s (config changed)", desired.Interface),
})
}
// --- Podman stack ---
if desired.InstallPodman {
if !state.PodmanInstalled {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionInstallPodman,
Detail: "podman not installed",
})
}
if !state.PodmanSocketActive {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionEnablePodmanSocket,
Detail: "systemctl enable --now podman.socket",
})
}
if !state.IPForwardEnabled {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionEnableIPForward,
Detail: "net.ipv4.ip_forward=1",
})
}
for _, ns := range nsSorted {
contSubnet := containerAssignments[ns][host]
netName := PodmanNetworkFor(ns)
nss := state.Namespaces[ns]
gw := MachineIP(contSubnet)
if nss == nil || !nss.NetworkExists {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Namespace: ns,
Type: ActionCreatePodmanNet,
Detail: fmt.Sprintf("%s subnet=%s gateway=%s", netName, contSubnet, gw),
})
continue
}
if nss.DNSEnabled ||
(nss.ContainerSubnet != nil && nss.ContainerSubnet.String() != contSubnet.String()) ||
nss.Label != ns {
reasons := []string{}
if nss.DNSEnabled {
reasons = append(reasons, "dns_enabled=true")
}
if nss.ContainerSubnet != nil && nss.ContainerSubnet.String() != contSubnet.String() {
reasons = append(reasons, fmt.Sprintf("subnet drift (have %s, want %s)", nss.ContainerSubnet, contSubnet))
}
if nss.Label != ns {
reasons = append(reasons, fmt.Sprintf("label=%q mismatch", nss.Label))
}
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Namespace: ns,
Type: ActionRecreatePodmanNet,
Detail: fmt.Sprintf("%s — %s", netName, strings.Join(reasons, "; ")),
})
}
}
// Expected firewall unit text — hash it and compare against the
// remote unit so adding/removing a namespace reinstalls the unit.
var subnets []*net.IPNet
for _, ns := range nsSorted {
subnets = append(subnets, containerAssignments[ns][host])
}
expectedUnit := FirewallServiceUnit(desired.Interface, nsSorted, subnets, desired.DefaultDenyContainers)
expectedUnitHash := sha256Hex([]byte(expectedUnit))
unitDrift := state.FirewallUnitSha256 != expectedUnitHash
if !state.FirewallActive ||
state.DefaultDenyActive != desired.DefaultDenyContainers ||
unitDrift {
detail := fmt.Sprintf("coolify-mesh-fw.service (%s, %d namespace(s), default-deny=%v)",
desired.Interface, len(subnets), desired.DefaultDenyContainers)
if unitDrift && state.FirewallUnitSha256 != "" {
detail += " [unit drift]"
}
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionInstallFirewall,
Detail: detail,
})
}
}
// --- Corrosion + coold stack (v5 control plane) ---
if desired.InstallCoold {
corrosionDrift := binaryVersionDrift(desired.CorrosionVersion, state.CorrosionInstalled, state.CorrosionVersion)
cooldDrift := binaryVersionDrift(desired.CooldVersion, state.CooldInstalled, state.CooldVersion)
if corrosionDrift {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionInstallCorrosion,
Detail: fmt.Sprintf("corrosion %s → /usr/local/bin/corrosion", desired.CorrosionVersion),
})
}
if cooldDrift {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionInstallCoold,
Detail: fmt.Sprintf("coold %s → /usr/local/bin/coold", desired.CooldVersion),
})
}
peers := peerMgmtIPs(host, desired.Hosts, mgmtAssignments)
expectedConfig := services.CorrosionConfigBytes(mgmtIP,
desired.CorrosionGossipPort, desired.CorrosionAPIPort, peers)
expectedHash := sha256Hex(expectedConfig)
configDrift := state.CorrosionConfigHash != expectedHash
if configDrift {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionWriteCorrosionConfig,
Detail: fmt.Sprintf("/etc/corrosion/config.toml (peers=%d)", len(peers)),
})
}
expectedSchemaSha := sha256Hex([]byte(services.CoolifySchemaSQL))
schemaDrift := state.CorrosionSchemaSha256 != expectedSchemaSha
if !state.CorrosionSchemaExists || schemaDrift {
detail := "/etc/corrosion/schemas/coolify.sql"
if schemaDrift && state.CorrosionSchemaSha256 != "" {
detail += " [schema drift — DB will be reset]"
}
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionWriteCorrosionSchema,
Detail: detail,
})
}
nsConfigs := buildNamespaceConfigs(host, nsSorted, containerAssignments)
expectedCooldUnit := services.CooldServiceUnit(mgmtIP, nsConfigs)
cooldUnitDrift := state.CooldUnitSha256 != sha256Hex([]byte(expectedCooldUnit))
if !state.CorrosionActive || configDrift || corrosionDrift || schemaDrift {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionInstallCorrosionService,
Detail: "systemctl enable --now corrosion",
})
}
if !state.CooldActive || configDrift || cooldDrift || cooldUnitDrift {
detail := fmt.Sprintf("systemctl enable --now coold (mgmt=%s, namespaces=%d)", mgmtIP, len(nsConfigs))
if cooldUnitDrift && state.CooldUnitSha256 != "" {
detail += " [unit drift]"
}
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionInstallCooldService,
Detail: detail,
})
}
}
}
// --- Scheduler + JWT stack (central-only) ---
if desired.CentralHost != "" {
plan.Actions = append(plan.Actions,
PlannedAction{
Host: desired.CentralHost,
Type: ActionInstallScheduler,
Detail: fmt.Sprintf("scheduler %s → /usr/local/bin/scheduler", desired.SchedulerVersion),
},
PlannedAction{
Host: desired.CentralHost,
Type: ActionGenerateJWTKeypair,
Detail: "ES256 EC P-256 keypair at /etc/coolify/jwt.{priv,pub}",
},
PlannedAction{
Host: desired.CentralHost,
Type: ActionInstallSchedulerService,
Detail: fmt.Sprintf("scheduler.service (:%d)",
services.SchedulerGRPCPort),
},
)
// Per-host: JWT + coold unit rewrite (inject scheduler env).
for _, host := range desired.Hosts {
plan.Actions = append(plan.Actions,
PlannedAction{
Host: host,
Type: ActionWriteHostJWT,
Detail: services.HostJWTPath,
},
PlannedAction{
Host: host,
Type: ActionUpdateCooldSchedulerEnv,
Detail: "coold.service += SCHEDULER_URL + HOST_JWT_PATH",
},
)
}
}
// --- Builder capability (per-host, requires scheduler) ---
//
// No separate systemd unit and no second JWT — the builder binary is a
// short-lived subprocess coold spawns under a `systemd-run --pipe`
// transient unit. All we install at provisioning time is the binary plus
// its tool deps (buildah, git); coold advertises the capability via its
// Hello frame, and the JWT `caps` claim (handled by the host-JWT action
// above) authorizes it. Only hosts in the desired builder set get the
// binary install; others stay coold-only.
if desired.CentralHost != "" {
for _, host := range desired.Hosts {
if !desired.HasBuilderCap(host) {
continue
}
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionInstallBuilder,
Detail: fmt.Sprintf("builder %s → %s (+ buildah, git; capacity=%d)", desired.CooldVersion, services.BuilderBinaryPath, maxCapacity(desired.BuilderCapacity)),
})
}
}
filterByIntent(plan, desired)
return plan, nil
}
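// Illustration only, not part of the original file: the peer diff step in
// BuildPlan reduces to a two-way set difference over public keys. The sketch
// below isolates that idea. The name diffKeys and the sorted output are
// assumptions made for the example; the planner above ranges over maps
// directly, so the relative order of its add/remove actions is not fixed.

```go
package main

import (
	"fmt"
	"sort"
)

// diffKeys returns the keys that must be added (present in desired, absent
// from current) and removed (present in current, absent from desired).
// Both results are sorted so the output is deterministic.
func diffKeys(desired, current []string) (add, remove []string) {
	want := make(map[string]bool, len(desired))
	for _, k := range desired {
		want[k] = true
	}
	have := make(map[string]bool, len(current))
	for _, k := range current {
		have[k] = true
	}
	for k := range want {
		if !have[k] {
			add = append(add, k)
		}
	}
	for k := range have {
		if !want[k] {
			remove = append(remove, k)
		}
	}
	sort.Strings(add)
	sort.Strings(remove)
	return add, remove
}

func main() {
	add, remove := diffKeys(
		[]string{"BBBBBBBB=", "CCCCCCCC="}, // keys the mesh should contain
		[]string{"BBBBBBBB=", "STALEKEY="}, // keys found in the current config
	)
	fmt.Println("add:", add)
	fmt.Println("remove:", remove)
}
```

// Sorting is a choice made for this sketch; it keeps repeated plan output
// stable when several peers change at once.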
// maxCapacity returns c, falling back to a default of 2 when c is unset or
// non-positive.
func maxCapacity(c int) int {
if c <= 0 {
return 2
}
return c
}
// buildNamespaceConfigs builds the per-namespace CooldNamespace slice for this
// host, in namespace name order. Gateway IP for each namespace is the .1 of
// that namespace's per-host container subnet.
func buildNamespaceConfigs(host string, nsSorted []string, assignments map[string]map[string]*net.IPNet) []services.CooldNamespace {
out := make([]services.CooldNamespace, 0, len(nsSorted))
for _, ns := range nsSorted {
subnet := assignments[ns][host]
if subnet == nil {
continue
}
out = append(out, services.CooldNamespace{
Name: ns,
Network: PodmanNetworkFor(ns),
BridgeGateway: MachineIP(subnet),
})
}
return out
}
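// Illustration only, not part of the original file: the comment above says
// the gateway is the ".1" of the per-host container subnet. MachineIP is not
// shown in this diff, so firstHostIP below is a hypothetical stand-in that
// assumes IPv4 and a prefix of /30 or shorter; gatewayFor is a helper added
// purely for the example.

```go
package main

import (
	"fmt"
	"net"
)

// firstHostIP returns the network address of subnet plus one, which is the
// ".1" address for a /24. It returns nil for non-IPv4 subnets.
func firstHostIP(subnet *net.IPNet) net.IP {
	base := subnet.IP.Mask(subnet.Mask).To4()
	if base == nil {
		return nil
	}
	ip := make(net.IP, net.IPv4len)
	copy(ip, base)
	ip[3]++ // safe for prefixes of /30 or shorter: the last octet of the network address is below 255
	return ip
}

// gatewayFor parses cidr and returns the first host address as a string,
// or "" when cidr is invalid or not IPv4.
func gatewayFor(cidr string) string {
	_, subnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return ""
	}
	ip := firstHostIP(subnet)
	if ip == nil {
		return ""
	}
	return ip.String()
}

func main() {
	for _, cidr := range []string{"10.210.0.0/24", "10.210.1.0/24"} {
		fmt.Printf("%s -> gateway %s\n", cidr, gatewayFor(cidr))
	}
}
```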
// binaryVersionDrift returns true when a binary needs (re-)installation.
// Rules:
// - not installed → always drift
// - marker absent (empty haveVersion) → treat as drift (first-migration case)
// - "nightly" tag → always re-install (moving target)
// - pinned tag → drift only when marker differs from desired
func binaryVersionDrift(desiredVersion string, installed bool, haveVersion string) bool {
if !installed || haveVersion == "" {
return true
}
if desiredVersion == "nightly" {
return true
}
return haveVersion != desiredVersion
}
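// Illustration only, not part of the original file: the four rules listed
// for binaryVersionDrift can be exercised in isolation. needsReinstall is a
// renamed copy of the same logic so the example compiles on its own.

```go
package main

import "fmt"

// needsReinstall mirrors the documented rules: a missing binary or missing
// version marker always drifts, the moving "nightly" tag always drifts, and
// a pinned tag drifts only when the recorded version differs.
func needsReinstall(desired string, installed bool, have string) bool {
	if !installed || have == "" {
		return true
	}
	if desired == "nightly" {
		return true
	}
	return have != desired
}

func main() {
	cases := []struct {
		desired   string
		installed bool
		have      string
	}{
		{"nightly", false, ""},
		{"nightly", true, "nightly"},
		{"v1.2.3", true, "v1.2.3"},
		{"v1.2.4", true, "v1.2.3"},
	}
	for _, c := range cases {
		fmt.Printf("desired=%q installed=%v have=%q -> drift=%v\n",
			c.desired, c.installed, c.have,
			needsReinstall(c.desired, c.installed, c.have))
	}
}
```

// One consequence worth noting: with the "nightly" tag a plan is never empty
// for these binaries, so idempotence checks only hold for pinned versions.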
// allowedIPsNeedsRewrite returns true when any [Peer] block on host does not
// have the expected AllowedIPs (peer mgmt /32 + every namespace subnet).
func allowedIPsNeedsRewrite(
host string,
desired *DesiredMesh,
current MeshState,
containerAssignments map[string]map[string]*net.IPNet,
mgmtAssignments map[string]net.IP,
state *ServerState,
) bool {
if state == nil {
return false
}
nsSorted := desired.SortedNamespaces()
// Build pub-key → expected AllowedIPs set for every peer we should have.
want := map[string]map[string]struct{}{}
for _, peer := range desired.Hosts {
if peer == host {
continue
}
ps, ok := current.Servers[peer]
if !ok || ps.PublicKey == "" {
continue
}
mgmtIP := mgmtAssignments[peer]
if mgmtIP == nil {
continue
}
entries := map[string]struct{}{fmt.Sprintf("%s/32", mgmtIP): {}}
for _, ns := range nsSorted {
if sn := containerAssignments[ns][peer]; sn != nil {
entries[sn.String()] = struct{}{}
}
}
want[ps.PublicKey] = entries
}
// Compare against parsed peers in the current config. If any desired peer
// has different AllowedIPs (missing or extra), we need to rewrite.
have := map[string]map[string]struct{}{}
for _, p := range state.Peers {
s := map[string]struct{}{}
for _, a := range p.AllowedIPs {
s[strings.TrimSpace(a)] = struct{}{}
}
have[p.PublicKey] = s
}
for pk, wantSet := range want {
haveSet, ok := have[pk]
if !ok {
return true
}
if !sameStringSet(wantSet, haveSet) {
return true
}
}
return false
}
// sameStringSet reports whether a and b contain exactly the same keys.
func sameStringSet(a, b map[string]struct{}) bool {
if len(a) != len(b) {
return false
}
for k := range a {
if _, ok := b[k]; !ok {
return false
}
}
return true
}
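// Illustration only, not part of the original file: the AllowedIPs check
// above compares sets, so entry order and surrounding whitespace do not
// count as drift, while a missing or extra entry does. The helper names
// toSet, sameSet and allowedIPsMatch are made up for this sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// toSet converts a list of CIDR strings into a set, trimming whitespace.
func toSet(items []string) map[string]struct{} {
	set := make(map[string]struct{}, len(items))
	for _, it := range items {
		set[strings.TrimSpace(it)] = struct{}{}
	}
	return set
}

// sameSet reports whether a and b hold exactly the same keys.
func sameSet(a, b map[string]struct{}) bool {
	if len(a) != len(b) {
		return false
	}
	for k := range a {
		if _, ok := b[k]; !ok {
			return false
		}
	}
	return true
}

// allowedIPsMatch reports whether have lists exactly the entries in want.
func allowedIPsMatch(want, have []string) bool {
	return sameSet(toSet(want), toSet(have))
}

func main() {
	want := []string{"100.64.0.2/32", "10.210.1.0/24"}

	reordered := []string{" 10.210.1.0/24", "100.64.0.2/32"}
	missing := []string{"100.64.0.2/32"}

	fmt.Println("reordered matches:", allowedIPsMatch(want, reordered))
	fmt.Println("missing subnet matches:", allowedIPsMatch(want, missing))
}
```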
// peerMgmtIPs returns the mgmt IPs of all hosts except self, drawn from the
// planned assignments so the result is stable even before any host has been
// probed.
func peerMgmtIPs(self string, hosts []string, assignments map[string]net.IP) []net.IP {
out := make([]net.IP, 0, len(hosts))
for _, h := range hosts {
if h == self {
continue
}
if ip, ok := assignments[h]; ok && ip != nil {
out = append(out, ip)
}
}
return out
}
// sha256Hex returns the lowercase hex encoding of the SHA-256 digest of b.
func sha256Hex(b []byte) string {
sum := sha256.Sum256(b)
return hex.EncodeToString(sum[:])
}
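// Illustration only, not part of the original file: the planner detects
// unit and config drift by hashing the text it expects and comparing it to
// a hash recorded for the host. The sketch below shows that comparison on
// its own; digest and unitDrifted are hypothetical names, and the unit text
// is a placeholder rather than the real rendered unit.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// digest returns the hex-encoded SHA-256 of s.
func digest(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])
}

// unitDrifted reports whether the freshly rendered text differs from the
// text that produced recordedHash.
func unitDrifted(rendered, recordedHash string) bool {
	return digest(rendered) != recordedHash
}

func main() {
	oldUnit := "[Unit]\nDescription=example (1 namespace)\n"
	newUnit := "[Unit]\nDescription=example (2 namespaces)\n"

	recorded := digest(oldUnit) // what a probe would have read from the host

	fmt.Println("unchanged unit drifted:", unitDrifted(oldUnit, recorded))
	fmt.Println("changed unit drifted:", unitDrifted(newUnit, recorded))
}
```

// Comparing hashes keeps the probe cheap: only a fixed-size digest has to be
// carried per host, at the cost of not knowing which line changed.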
// actionsForHost returns the subset of plan.Actions matching host and atype.
func (p *Plan) actionsForHost(host string, atype ActionType) []PlannedAction {
var out []PlannedAction
for _, a := range p.Actions {
if a.Host == host && a.Type == atype {
out = append(out, a)
}
}
return out
}
// truncateKey shortens a base64 key to its first 8 characters followed by
// "..." for display. Keys of 8 characters or fewer are returned unchanged.
func truncateKey(key string) string {
if len(key) <= 8 {
return key
}
return key[:8] + "..."
}
@@ -1,658 +0,0 @@
package wireguard
import (
"net"
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
var (
defaultMgmtPool = mustParseCIDR("100.64.0.0/16")
defaultContainerPool = mustParseCIDR("10.210.0.0/16")
)
func desiredTwoHosts() *DesiredMesh {
return &DesiredMesh{
Hosts: []string{"1.1.1.1", "2.2.2.2"},
Interface: "wg0",
MgmtPool: defaultMgmtPool,
ContainerPool: defaultContainerPool,
ContainerPrefix: 24,
ListenPort: 51820,
}
}
func desiredWithPodman() *DesiredMesh {
d := desiredTwoHosts()
d.InstallPodman = true
d.Namespaces = []string{DefaultNamespace}
return d
}
// convergedServer returns a ServerState fully reconciled for the single
// `default` namespace with the supplied subnet.
func convergedServer(host, pubkey, peerKey, mgmtIP, contSubnet string) *ServerState {
sn := mustParseCIDR(contSubnet)
firewallHash := sha256Hex([]byte(FirewallServiceUnit("wg0", []string{"default"}, []*net.IPNet{sn}, false)))
return &ServerState{
Host: host,
Installed: true,
KeysExist: true,
PublicKey: pubkey,
WireGuardMgmtIP: net.ParseIP(mgmtIP).To4(),
ListenPort: 51820,
Active: true,
Peers: []Peer{{
PublicKey: peerKey,
AllowedIPs: []string{peerMgmtForPub(peerKey), peerSubnetForPub(peerKey)},
}},
PodmanInstalled: true,
PodmanSocketActive: true,
IPForwardEnabled: true,
FirewallActive: true,
FirewallUnitSha256: firewallHash,
Namespaces: map[string]*NamespaceServerState{
DefaultNamespace: {
Namespace: DefaultNamespace,
NetworkExists: true,
ContainerSubnet: sn,
DNSEnabled: false,
Label: DefaultNamespace,
},
},
}
}
// peerMgmtForPub / peerSubnetForPub map the well-known test public keys to
// the mgmt /32 and /24 each peer is expected to own in the two-host fixture.
func peerMgmtForPub(pub string) string {
switch pub {
case "AAAAAAAA=":
return "100.64.0.1/32"
case "BBBBBBBB=":
return "100.64.0.2/32"
}
return ""
}
func peerSubnetForPub(pub string) string {
switch pub {
case "AAAAAAAA=":
return "10.210.0.0/24"
case "BBBBBBBB=":
return "10.210.1.0/24"
}
return ""
}
func TestBuildPlan_AlreadyConverged_NoPodman(t *testing.T) {
desired := desiredTwoHosts()
current := MeshState{
Servers: map[string]*ServerState{
"1.1.1.1": {
Host: "1.1.1.1",
Installed: true,
KeysExist: true,
PublicKey: "AAAAAAAA=",
WireGuardMgmtIP: net.ParseIP("100.64.0.1").To4(),
ListenPort: 51820,
Active: true,
Peers: []Peer{{PublicKey: "BBBBBBBB=", AllowedIPs: []string{"100.64.0.2/32"}}},
},
"2.2.2.2": {
Host: "2.2.2.2",
Installed: true,
KeysExist: true,
PublicKey: "BBBBBBBB=",
WireGuardMgmtIP: net.ParseIP("100.64.0.2").To4(),
ListenPort: 51820,
Active: true,
Peers: []Peer{{PublicKey: "AAAAAAAA=", AllowedIPs: []string{"100.64.0.1/32"}}},
},
},
}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
assert.True(t, plan.IsEmpty(), "expected empty plan, got: %+v", plan.Actions)
}
func TestBuildPlan_FreshBootstrap(t *testing.T) {
desired := desiredTwoHosts()
current := MeshState{Servers: map[string]*ServerState{}}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
assert.False(t, plan.IsEmpty())
actionTypes := func(host string) []ActionType {
var out []ActionType
for _, a := range plan.Actions {
if a.Host == host {
out = append(out, a.Type)
}
}
return out
}
for _, host := range []string{"1.1.1.1", "2.2.2.2"} {
types := actionTypes(host)
assert.Contains(t, types, ActionInstallWG, host)
assert.Contains(t, types, ActionGenKeyPair, host)
assert.Contains(t, types, ActionAllocateMgmtIP, host)
assert.Contains(t, types, ActionWriteConfig, host)
assert.Contains(t, types, ActionEnableService, host)
}
}
func TestBuildPlan_MgmtIPMismatchTriggersRewrite(t *testing.T) {
desired := desiredTwoHosts()
current := MeshState{
Servers: map[string]*ServerState{
"1.1.1.1": {
Host: "1.1.1.1",
Installed: true,
KeysExist: true,
PublicKey: "AAAAAAAA=",
WireGuardMgmtIP: net.ParseIP("10.210.0.1").To4(), // outside 100.64/16
ListenPort: 51820,
Active: true,
Peers: []Peer{{PublicKey: "BBBBBBBB="}},
},
"2.2.2.2": {
Host: "2.2.2.2",
Installed: true,
KeysExist: true,
PublicKey: "BBBBBBBB=",
WireGuardMgmtIP: net.ParseIP("100.64.0.2").To4(),
ListenPort: 51820,
Active: true,
Peers: []Peer{{PublicKey: "AAAAAAAA="}},
},
},
}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
assert.NotEmpty(t, plan.Warnings)
var aTypes []ActionType
for _, a := range plan.Actions {
if a.Host == "1.1.1.1" {
aTypes = append(aTypes, a.Type)
}
}
assert.Contains(t, aTypes, ActionAllocateMgmtIP)
assert.Contains(t, aTypes, ActionWriteConfig)
}
func TestBuildPlan_AddPeer(t *testing.T) {
desired := desiredTwoHosts()
current := MeshState{
Servers: map[string]*ServerState{
"1.1.1.1": {
Host: "1.1.1.1",
Installed: true,
KeysExist: true,
PublicKey: "AAAAAAAA=",
WireGuardMgmtIP: net.ParseIP("100.64.0.1").To4(),
ListenPort: 51820,
Active: true,
Peers: []Peer{},
},
"2.2.2.2": {
Host: "2.2.2.2",
Installed: true,
KeysExist: true,
PublicKey: "BBBBBBBB=",
WireGuardMgmtIP: net.ParseIP("100.64.0.2").To4(),
ListenPort: 51820,
Active: true,
Peers: []Peer{{PublicKey: "AAAAAAAA="}},
},
},
}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
var types []ActionType
for _, a := range plan.Actions {
if a.Host == "1.1.1.1" {
types = append(types, a.Type)
}
}
assert.Contains(t, types, ActionAddPeer)
assert.Contains(t, types, ActionWriteConfig)
assert.Contains(t, types, ActionReloadService)
}
func TestBuildPlan_RemovePeer(t *testing.T) {
desired := &DesiredMesh{
Hosts: []string{"1.1.1.1"},
Interface: "wg0",
MgmtPool: defaultMgmtPool,
ContainerPool: defaultContainerPool,
ContainerPrefix: 24,
ListenPort: 51820,
}
current := MeshState{
Servers: map[string]*ServerState{
"1.1.1.1": {
Host: "1.1.1.1",
Installed: true,
KeysExist: true,
PublicKey: "AAAAAAAA=",
WireGuardMgmtIP: net.ParseIP("100.64.0.1").To4(),
ListenPort: 51820,
Active: true,
Peers: []Peer{{PublicKey: "STALEKEY="}},
},
},
}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
var types []ActionType
for _, a := range plan.Actions {
if a.Host == "1.1.1.1" {
types = append(types, a.Type)
}
}
assert.Contains(t, types, ActionRemovePeer)
assert.Contains(t, types, ActionWriteConfig)
}
func TestBuildPlan_StableMgmtAndContainerAssignments(t *testing.T) {
desired := desiredWithPodman()
current := MeshState{
Servers: map[string]*ServerState{
"1.1.1.1": {
Host: "1.1.1.1",
WireGuardMgmtIP: net.ParseIP("100.64.0.7").To4(),
Namespaces: map[string]*NamespaceServerState{
DefaultNamespace: {
Namespace: DefaultNamespace,
NetworkExists: true,
ContainerSubnet: mustParseCIDR("10.210.5.0/24"),
Label: DefaultNamespace,
},
},
},
"2.2.2.2": {
Host: "2.2.2.2",
WireGuardMgmtIP: nil,
},
},
}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
assert.Equal(t, "100.64.0.7", plan.MgmtAssignments["1.1.1.1"].String())
assert.Equal(t, "10.210.5.0/24", plan.SubnetAssignments[DefaultNamespace]["1.1.1.1"].String())
}
func TestBuildPlan_PodmanFullStack(t *testing.T) {
desired := desiredWithPodman()
current := MeshState{Servers: map[string]*ServerState{}}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
collect := func(host string) []ActionType {
var out []ActionType
for _, a := range plan.Actions {
if a.Host == host {
out = append(out, a.Type)
}
}
return out
}
for _, h := range []string{"1.1.1.1", "2.2.2.2"} {
types := collect(h)
assert.Contains(t, types, ActionInstallPodman, h)
assert.Contains(t, types, ActionEnablePodmanSocket, h)
assert.Contains(t, types, ActionEnableIPForward, h)
assert.Contains(t, types, ActionCreatePodmanNet, h)
assert.Contains(t, types, ActionInstallFirewall, h)
assert.Contains(t, types, ActionAllocateContainerSubnet, h)
}
}
func TestBuildPlan_PodmanIdempotent(t *testing.T) {
desired := desiredWithPodman()
current := MeshState{
Servers: map[string]*ServerState{
"1.1.1.1": convergedServer("1.1.1.1", "AAAAAAAA=", "BBBBBBBB=", "100.64.0.1", "10.210.0.0/24"),
"2.2.2.2": convergedServer("2.2.2.2", "BBBBBBBB=", "AAAAAAAA=", "100.64.0.2", "10.210.1.0/24"),
},
}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
assert.True(t, plan.IsEmpty(), "expected empty plan, got: %+v", plan.Actions)
}
func TestBuildPlan_PodmanNotRequested(t *testing.T) {
desired := desiredTwoHosts() // InstallPodman == false
current := MeshState{
Servers: map[string]*ServerState{
"1.1.1.1": {
Host: "1.1.1.1",
Installed: true,
KeysExist: true,
PublicKey: "AAAAAAAA=",
WireGuardMgmtIP: net.ParseIP("100.64.0.1").To4(),
ListenPort: 51820,
Active: true,
Peers: []Peer{{PublicKey: "BBBBBBBB="}},
},
"2.2.2.2": {
Host: "2.2.2.2",
Installed: true,
KeysExist: true,
PublicKey: "BBBBBBBB=",
WireGuardMgmtIP: net.ParseIP("100.64.0.2").To4(),
ListenPort: 51820,
Active: true,
Peers: []Peer{{PublicKey: "AAAAAAAA="}},
},
},
}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
for _, a := range plan.Actions {
assert.NotEqual(t, ActionInstallPodman, a.Type)
assert.NotEqual(t, ActionEnablePodmanSocket, a.Type)
assert.NotEqual(t, ActionEnableIPForward, a.Type)
assert.NotEqual(t, ActionCreatePodmanNet, a.Type)
assert.NotEqual(t, ActionInstallFirewall, a.Type)
assert.NotEqual(t, ActionAllocateContainerSubnet, a.Type)
}
}
func TestBuildPlan_PodmanDNSEnabledTriggersRecreate(t *testing.T) {
desired := desiredWithPodman()
srvA := convergedServer("1.1.1.1", "AAAAAAAA=", "BBBBBBBB=", "100.64.0.1", "10.210.0.0/24")
srvA.Namespaces[DefaultNamespace].DNSEnabled = true // drift: aardvark-dns would squat :53
srvB := convergedServer("2.2.2.2", "BBBBBBBB=", "AAAAAAAA=", "100.64.0.2", "10.210.1.0/24")
current := MeshState{Servers: map[string]*ServerState{"1.1.1.1": srvA, "2.2.2.2": srvB}}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
var aTypes, bTypes []ActionType
for _, a := range plan.Actions {
if a.Host == "1.1.1.1" {
aTypes = append(aTypes, a.Type)
}
if a.Host == "2.2.2.2" {
bTypes = append(bTypes, a.Type)
}
}
assert.Contains(t, aTypes, ActionRecreatePodmanNet, "host A must recreate (dns_enabled=true)")
assert.NotContains(t, aTypes, ActionCreatePodmanNet, "host A already exists — only recreate")
assert.NotContains(t, bTypes, ActionRecreatePodmanNet, "host B fine, no recreate")
assert.NotContains(t, bTypes, ActionCreatePodmanNet, "host B fine, no create")
}
func TestBuildPlan_FirewallMissing(t *testing.T) {
desired := desiredWithPodman()
srvA := convergedServer("1.1.1.1", "AAAAAAAA=", "BBBBBBBB=", "100.64.0.1", "10.210.0.0/24")
srvA.FirewallActive = false
current := MeshState{
Servers: map[string]*ServerState{
"1.1.1.1": srvA,
"2.2.2.2": convergedServer("2.2.2.2", "BBBBBBBB=", "AAAAAAAA=", "100.64.0.2", "10.210.1.0/24"),
},
}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
var aTypes []ActionType
for _, a := range plan.Actions {
if a.Host == "1.1.1.1" {
aTypes = append(aTypes, a.Type)
}
}
assert.Equal(t, []ActionType{ActionInstallFirewall}, aTypes)
}
func TestBuildPlan_NftUnavailable_ReturnsError(t *testing.T) {
desired := desiredWithPodman()
desired.DefaultDenyContainers = true
current := MeshState{
Servers: map[string]*ServerState{
"1.1.1.1": {
Host: "1.1.1.1",
NftAvailable: false,
},
"2.2.2.2": {
Host: "2.2.2.2",
NftAvailable: false,
},
},
}
_, err := BuildPlan(desired, current)
require.Error(t, err)
assert.Contains(t, err.Error(), "nft binary not available")
}
func TestBuildPlan_DefaultDenyRequiresPodman(t *testing.T) {
desired := desiredTwoHosts()
desired.DefaultDenyContainers = true // InstallPodman left false
_, err := BuildPlan(desired, MeshState{Servers: map[string]*ServerState{}})
require.Error(t, err)
assert.Contains(t, err.Error(), "--default-deny requires --podman")
}
func TestBuildPlan_DefaultDenyDriftReinstalls(t *testing.T) {
desired := desiredWithPodman()
desired.DefaultDenyContainers = true
// Both hosts converged in mode A (default-deny OFF) — must reinstall to flip on.
srvA := convergedServer("1.1.1.1", "AAAAAAAA=", "BBBBBBBB=", "100.64.0.1", "10.210.0.0/24")
srvA.DefaultDenyActive = false
srvA.NftAvailable = true
srvB := convergedServer("2.2.2.2", "BBBBBBBB=", "AAAAAAAA=", "100.64.0.2", "10.210.1.0/24")
srvB.DefaultDenyActive = false
srvB.NftAvailable = true
current := MeshState{Servers: map[string]*ServerState{"1.1.1.1": srvA, "2.2.2.2": srvB}}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
for _, h := range []string{"1.1.1.1", "2.2.2.2"} {
var found bool
for _, a := range plan.Actions {
if a.Host == h && a.Type == ActionInstallFirewall {
found = true
break
}
}
assert.True(t, found, "expected ActionInstallFirewall for %s", h)
}
}
func TestBuildPlan_DefaultDenyConverged(t *testing.T) {
desired := desiredWithPodman()
desired.DefaultDenyContainers = true
srvA := convergedServer("1.1.1.1", "AAAAAAAA=", "BBBBBBBB=", "100.64.0.1", "10.210.0.0/24")
srvA.DefaultDenyActive = true
srvA.NftAvailable = true
srvA.FirewallUnitSha256 = sha256Hex([]byte(FirewallServiceUnit("wg0",
[]string{"default"}, []*net.IPNet{mustParseCIDR("10.210.0.0/24")}, true)))
srvB := convergedServer("2.2.2.2", "BBBBBBBB=", "AAAAAAAA=", "100.64.0.2", "10.210.1.0/24")
srvB.DefaultDenyActive = true
srvB.NftAvailable = true
srvB.FirewallUnitSha256 = sha256Hex([]byte(FirewallServiceUnit("wg0",
[]string{"default"}, []*net.IPNet{mustParseCIDR("10.210.1.0/24")}, true)))
current := MeshState{Servers: map[string]*ServerState{"1.1.1.1": srvA, "2.2.2.2": srvB}}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
assert.True(t, plan.IsEmpty(), "expected empty plan, got: %+v", plan.Actions)
}
func TestBuildPlan_SurfacesWarnings(t *testing.T) {
desired := desiredTwoHosts()
current := MeshState{
Servers: map[string]*ServerState{
"1.1.1.1": {Host: "1.1.1.1", WireGuardMgmtIP: net.ParseIP("100.64.0.5").To4()},
"2.2.2.2": {Host: "2.2.2.2", WireGuardMgmtIP: net.ParseIP("100.64.0.5").To4()},
},
}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
assert.NotEmpty(t, plan.Warnings, "expected warning for duplicate mgmt IP")
}
func TestBuildPlan_MultiNamespacePlansPerNamespace(t *testing.T) {
desired := desiredWithPodman()
desired.Namespaces = []string{DefaultNamespace, "alpha"}
current := MeshState{Servers: map[string]*ServerState{}}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
// Two hosts × two namespaces = four create-podman-net actions.
var creates []PlannedAction
for _, a := range plan.Actions {
if a.Type == ActionCreatePodmanNet {
creates = append(creates, a)
}
}
assert.Len(t, creates, 4)
namespaces := map[string]bool{}
for _, a := range creates {
namespaces[a.Namespace] = true
}
assert.True(t, namespaces[DefaultNamespace])
assert.True(t, namespaces["alpha"])
// SubnetAssignments is namespace → host → subnet.
assert.NotNil(t, plan.SubnetAssignments[DefaultNamespace])
assert.NotNil(t, plan.SubnetAssignments["alpha"])
assert.NotEqual(t, plan.SubnetAssignments[DefaultNamespace]["1.1.1.1"].String(),
plan.SubnetAssignments["alpha"]["1.1.1.1"].String(),
"namespaces must carve disjoint subnets")
}
func TestBuildPlan_PodmanRequiresNamespace(t *testing.T) {
desired := desiredTwoHosts()
desired.InstallPodman = true
// no namespaces set
_, err := BuildPlan(desired, MeshState{Servers: map[string]*ServerState{}})
require.Error(t, err)
assert.Contains(t, err.Error(), "namespace")
}
func TestBinaryVersionDrift(t *testing.T) {
tests := []struct {
name string
desiredVersion string
installed bool
haveVersion string
wantDrift bool
}{
{"not installed", "nightly", false, "", true},
{"installed no marker", "nightly", true, "", true},
{"nightly always drifts", "nightly", true, "nightly", true},
{"pinned matches", "v1.2.3", true, "v1.2.3", false},
{"pinned mismatch", "v1.2.4", true, "v1.2.3", true},
{"pinned no marker", "v1.2.3", true, "", true},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got := binaryVersionDrift(tt.desiredVersion, tt.installed, tt.haveVersion)
assert.Equal(t, tt.wantDrift, got)
})
}
}
func TestBuildPlan_CooldVersionDrift(t *testing.T) {
desired := desiredWithPodman()
desired.InstallCoold = true
desired.CooldVersion = "v1.2.3"
desired.CorrosionVersion = "v1.2.3"
desired.CorrosionGossipPort = 8787
desired.CorrosionAPIPort = 8080
host := "1.1.1.1"
sn := mustParseCIDR("10.210.0.0/24")
fwHash := sha256Hex([]byte(FirewallServiceUnit("wg0", []string{"default"}, []*net.IPNet{sn}, false)))
state := &ServerState{
Host: host, Installed: true, KeysExist: true, Active: true,
PodmanInstalled: true, PodmanSocketActive: true, IPForwardEnabled: true,
FirewallActive: true, DefaultDenyActive: false, FirewallUnitSha256: fwHash,
CorrosionInstalled: true, CooldInstalled: true,
CorrosionVersion: "v1.2.3", CooldVersion: "v1.2.2", // coold is stale
CorrosionActive: true, CooldActive: true,
Namespaces: map[string]*NamespaceServerState{
DefaultNamespace: {Namespace: DefaultNamespace, NetworkExists: true, ContainerSubnet: sn, Label: DefaultNamespace},
},
}
plan, err := BuildPlan(desired, MeshState{Servers: map[string]*ServerState{host: state}})
require.NoError(t, err)
types := make(map[ActionType]bool)
for _, a := range plan.Actions {
if a.Host == host {
types[a.Type] = true
}
}
assert.True(t, types[ActionInstallCoold], "stale coold version should trigger install-coold")
assert.False(t, types[ActionInstallCorrosion], "matching corrosion version should not trigger install")
}
func TestBuildPlan_CooldNightlyAlwaysDrifts(t *testing.T) {
desired := desiredWithPodman()
desired.InstallCoold = true
desired.CooldVersion = "nightly"
desired.CorrosionVersion = "nightly"
desired.CorrosionGossipPort = 8787
desired.CorrosionAPIPort = 8080
host := "1.1.1.1"
sn := mustParseCIDR("10.210.0.0/24")
fwHash := sha256Hex([]byte(FirewallServiceUnit("wg0", []string{"default"}, []*net.IPNet{sn}, false)))
state := &ServerState{
Host: host, Installed: true, KeysExist: true, Active: true,
PodmanInstalled: true, PodmanSocketActive: true, IPForwardEnabled: true,
FirewallActive: true, DefaultDenyActive: false, FirewallUnitSha256: fwHash,
CorrosionInstalled: true, CooldInstalled: true,
CorrosionVersion: "nightly", CooldVersion: "nightly",
CorrosionActive: true, CooldActive: true,
Namespaces: map[string]*NamespaceServerState{
DefaultNamespace: {Namespace: DefaultNamespace, NetworkExists: true, ContainerSubnet: sn, Label: DefaultNamespace},
},
}
plan, err := BuildPlan(desired, MeshState{Servers: map[string]*ServerState{host: state}})
require.NoError(t, err)
types := make(map[ActionType]bool)
for _, a := range plan.Actions {
if a.Host == host {
types[a.Type] = true
}
}
assert.True(t, types[ActionInstallCoold], "nightly tag always triggers install-coold")
assert.True(t, types[ActionInstallCorrosion], "nightly tag always triggers install-corrosion")
}
@@ -1,318 +0,0 @@
package wireguard
import (
"context"
"fmt"
"net"
"strconv"
"strings"
"github.com/coollabsio/coolify-cli/internal/ssh"
)
// Probe SSHes into host and reads its current WireGuard + Podman state.
// Every command ends in a shell fallback (`|| true`, `|| echo 0`, or
// `|| echo no`) so a missing package or interface never causes a non-zero
// exit that would abort the probe.
func Probe(ctx context.Context, runner ssh.Runner, host, user string, port int, iface string, namespaces []string) (*ServerState, error) {
state := &ServerState{
Host: host,
Interface: iface,
Namespaces: map[string]*NamespaceServerState{},
}
// 1. Check if WireGuard is installed.
stdout, _, _ := runner.Run(ctx, host, user, port,
`dpkg-query -W -f='${Status}' wireguard 2>/dev/null | grep -c 'install ok installed' || echo 0`)
if strings.TrimSpace(stdout) == "1" {
state.Installed = true
}
// 2. Read public key.
stdout, _, _ = runner.Run(ctx, host, user, port,
`cat /etc/wireguard/publickey 2>/dev/null || true`)
if pk := strings.TrimSpace(stdout); pk != "" {
state.PublicKey = pk
state.KeysExist = true
}
// 3. Parse the config file for management IP and peer list.
stdout, _, _ = runner.Run(ctx, host, user, port,
fmt.Sprintf(`cat /etc/wireguard/%s.conf 2>/dev/null || true`, iface))
if strings.TrimSpace(stdout) != "" {
parseConfigFile(state, stdout)
}
// 4. Check if WG interface is currently up.
stdout, _, _ = runner.Run(ctx, host, user, port,
fmt.Sprintf(`wg show %s dump 2>/dev/null || true`, iface))
if strings.TrimSpace(stdout) != "" {
state.Active = true
}
// 5. Podman package installed.
stdout, _, _ = runner.Run(ctx, host, user, port,
`dpkg-query -W -f='${Status}' podman 2>/dev/null | grep -c 'install ok installed' || echo 0`)
if strings.TrimSpace(stdout) == "1" {
state.PodmanInstalled = true
}
// 6. podman.socket active.
stdout, _, _ = runner.Run(ctx, host, user, port,
`systemctl is-active podman.socket 2>/dev/null || true`)
if strings.TrimSpace(stdout) == "active" {
state.PodmanSocketActive = true
}
// 7. Per-namespace podman network state.
if state.PodmanInstalled {
for _, ns := range namespaces {
nss := &NamespaceServerState{Namespace: ns}
netName := PodmanNetworkFor(ns)
stdout, _, _ = runner.Run(ctx, host, user, port,
fmt.Sprintf(`podman network exists %s 2>/dev/null && echo yes || echo no`, netName))
if strings.TrimSpace(stdout) == "yes" {
nss.NetworkExists = true
stdout, _, _ = runner.Run(ctx, host, user, port,
fmt.Sprintf(`podman network inspect %s -f '{{(index .Subnets 0).Subnet}}' 2>/dev/null || true`, netName))
if s := strings.TrimSpace(stdout); s != "" {
if _, n, err := net.ParseCIDR(s); err == nil {
nss.ContainerSubnet = n
}
}
stdout, _, _ = runner.Run(ctx, host, user, port,
fmt.Sprintf(`podman network inspect %s -f '{{.DNSEnabled}}' 2>/dev/null || true`, netName))
if strings.TrimSpace(stdout) == "true" {
nss.DNSEnabled = true
}
stdout, _, _ = runner.Run(ctx, host, user, port,
fmt.Sprintf(`podman network inspect %s -f '{{index .Labels "io.coolify.namespace"}}' 2>/dev/null || true`, netName))
nss.Label = strings.TrimSpace(stdout)
}
state.Namespaces[ns] = nss
}
}
// 8. IP forwarding enabled.
stdout, _, _ = runner.Run(ctx, host, user, port,
`sysctl -n net.ipv4.ip_forward 2>/dev/null || echo 0`)
if strings.TrimSpace(stdout) == "1" {
state.IPForwardEnabled = true
}
// 9. coolify-mesh-fw.service active.
stdout, _, _ = runner.Run(ctx, host, user, port,
`systemctl is-active coolify-mesh-fw.service 2>/dev/null || true`)
if strings.TrimSpace(stdout) == "active" {
state.FirewallActive = true
}
// 9a. Firewall unit hash — detects drift when the desired namespace set
// changes (FORWARD jumps gain/lose subnets).
stdout, _, _ = runner.Run(ctx, host, user, port,
`sha256sum /etc/systemd/system/coolify-mesh-fw.service 2>/dev/null | awk '{print $1}' || true`)
if h := strings.TrimSpace(stdout); h != "" {
state.FirewallUnitSha256 = h
}
// 10. Default-deny scaffold present (COOLIFY-INTRA chain ends in DROP).
stdout, _, _ = runner.Run(ctx, host, user, port,
`iptables -nL COOLIFY-INTRA 2>/dev/null | grep -q DROP && echo yes || echo no`)
if strings.TrimSpace(stdout) == "yes" {
state.DefaultDenyActive = true // will be AND-ed with BridgeTableExists below
}
// 10a. nft binary available.
stdout, _, _ = runner.Run(ctx, host, user, port,
`command -v nft >/dev/null 2>&1 && echo yes || echo no`)
if strings.TrimSpace(stdout) == "yes" {
state.NftAvailable = true
}
// 10b. nft bridge table for intra-namespace default-deny present.
stdout, _, _ = runner.Run(ctx, host, user, port,
`nft list table bridge coolify_bridge >/dev/null 2>&1 && echo yes || echo no`)
if strings.TrimSpace(stdout) == "yes" {
state.BridgeTableExists = true
}
state.DefaultDenyActive = state.DefaultDenyActive && state.BridgeTableExists
// 11. Corrosion binary installed.
stdout, _, _ = runner.Run(ctx, host, user, port,
`test -x /usr/local/bin/corrosion && echo yes || echo no`)
if strings.TrimSpace(stdout) == "yes" {
state.CorrosionInstalled = true
}
// 12. Corrosion systemd service active.
stdout, _, _ = runner.Run(ctx, host, user, port,
`systemctl is-active corrosion 2>/dev/null || true`)
if strings.TrimSpace(stdout) == "active" {
state.CorrosionActive = true
}
// 13. Corrosion config hash (empty when missing).
stdout, _, _ = runner.Run(ctx, host, user, port,
`sha256sum /etc/corrosion/config.toml 2>/dev/null | awk '{print $1}' || true`)
if h := strings.TrimSpace(stdout); h != "" {
state.CorrosionConfigHash = h
}
// 14. Corrosion schema file present.
stdout, _, _ = runner.Run(ctx, host, user, port,
`test -f /etc/corrosion/schemas/coolify.sql && echo yes || echo no`)
if strings.TrimSpace(stdout) == "yes" {
state.CorrosionSchemaExists = true
}
// 14a. sha256 of remote schema file (empty when absent). Used to detect
// schema revisions so a new schema triggers re-write + DB reset.
stdout, _, _ = runner.Run(ctx, host, user, port,
`sha256sum /etc/corrosion/schemas/coolify.sql 2>/dev/null | awk '{print $1}' || true`)
if h := strings.TrimSpace(stdout); h != "" {
state.CorrosionSchemaSha256 = h
}
// 15. Coold binary installed.
stdout, _, _ = runner.Run(ctx, host, user, port,
`test -x /usr/local/bin/coold && echo yes || echo no`)
if strings.TrimSpace(stdout) == "yes" {
state.CooldInstalled = true
}
// 15a. version marker for corrosion (empty when absent / pre-migration).
stdout, _, _ = runner.Run(ctx, host, user, port,
`cat /usr/local/bin/corrosion.version 2>/dev/null || true`)
state.CorrosionVersion = strings.TrimSpace(stdout)
// 15b. version marker for coold (empty when absent / pre-migration).
stdout, _, _ = runner.Run(ctx, host, user, port,
`cat /usr/local/bin/coold.version 2>/dev/null || true`)
state.CooldVersion = strings.TrimSpace(stdout)
// 15c. sha256 of remote coold.service unit (empty when absent).
stdout, _, _ = runner.Run(ctx, host, user, port,
`sha256sum /etc/systemd/system/coold.service 2>/dev/null | awk '{print $1}' || true`)
if h := strings.TrimSpace(stdout); h != "" {
state.CooldUnitSha256 = h
}
// 16. Coold systemd service active.
stdout, _, _ = runner.Run(ctx, host, user, port,
`systemctl is-active coold 2>/dev/null || true`)
if strings.TrimSpace(stdout) == "active" {
state.CooldActive = true
}
return state, nil
}
// Reconstruct runs Probe on every host in parallel and assembles a MeshState.
func Reconstruct(
ctx context.Context,
runner ssh.Runner,
hosts []string,
user string,
port int,
iface string,
namespaces []string,
concurrency int,
) (MeshState, error) {
results := ssh.ForEachServer(ctx, hosts, concurrency, func(ctx context.Context, host string) (*ServerState, error) {
return Probe(ctx, runner, host, user, port, iface, namespaces)
})
mesh := MeshState{Servers: make(map[string]*ServerState, len(hosts))}
var errs []string
for _, r := range results {
if r.Err != nil {
errs = append(errs, fmt.Sprintf("%s: %v", r.Host, r.Err))
mesh.Servers[r.Host] = &ServerState{Host: r.Host, Interface: iface, Namespaces: map[string]*NamespaceServerState{}}
continue
}
mesh.Servers[r.Host] = r.Result
}
if len(errs) > 0 {
return mesh, fmt.Errorf("probe errors:\n %s", strings.Join(errs, "\n "))
}
return mesh, nil
}
// parseConfigFile extracts WireGuard management IP, listen port, and peer list
// from the text content of /etc/wireguard/<iface>.conf.
func parseConfigFile(state *ServerState, content string) {
var (
inInterface bool
inPeer bool
currentPeer Peer
)
for _, line := range strings.Split(content, "\n") {
line = strings.TrimSpace(line)
if line == "" || strings.HasPrefix(line, "#") {
continue
}
switch strings.ToLower(line) {
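case "[interface]":
// Flush a peer that precedes this section so it is not dropped.
if inPeer && currentPeer.PublicKey != "" {
state.Peers = append(state.Peers, currentPeer)
}
currentPeer = Peer{}
inInterface = true
inPeer = false
continue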
case "[peer]":
if inPeer {
state.Peers = append(state.Peers, currentPeer)
currentPeer = Peer{}
}
inInterface = false
inPeer = true
continue
}
key, value, ok := strings.Cut(line, "=")
if !ok {
continue
}
key = strings.TrimSpace(key)
value = strings.TrimSpace(value)
if inInterface {
switch strings.ToLower(key) {
case "address":
// Parse the host portion of "<ip>/<prefix>"; this is the
// actual management IP, not the network address.
ip, _, err := net.ParseCIDR(value)
if err == nil {
state.WireGuardMgmtIP = ip.To4()
}
case "listenport":
if p, err := strconv.Atoi(value); err == nil {
state.ListenPort = p
}
}
}
if inPeer {
switch strings.ToLower(key) {
case "publickey":
currentPeer.PublicKey = value
case "endpoint":
currentPeer.Endpoint = value
case "allowedips":
for _, a := range strings.Split(value, ",") {
currentPeer.AllowedIPs = append(currentPeer.AllowedIPs, strings.TrimSpace(a))
}
case "presharedkey":
currentPeer.PresharedKey = value
case "persistentkeepalive":
if n, err := strconv.Atoi(value); err == nil {
currentPeer.PersistentKeepalive = n
}
}
}
}
if inPeer && currentPeer.PublicKey != "" {
state.Peers = append(state.Peers, currentPeer)
}
}
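The section handling in parseConfigFile can be reduced to a single rule: whenever a new section header appears, or the input ends, emit the peer collected so far. The sketch below isolates that rule under simplified assumptions; `peer` and `parsePeers` are hypothetical names and only two keys are parsed.

```go
package main

import (
	"fmt"
	"strings"
)

// peer is a trimmed-down stand-in for the Peer type (hypothetical).
type peer struct {
	PublicKey string
	Endpoint  string
}

// parsePeers collects [Peer] sections from wg-quick style config text.
func parsePeers(content string) []peer {
	var (
		peers  []peer
		cur    peer
		inPeer bool
	)
	// flush emits the peer being built, if any, and resets the buffer.
	flush := func() {
		if inPeer && cur.PublicKey != "" {
			peers = append(peers, cur)
		}
		cur = peer{}
	}
	for _, line := range strings.Split(content, "\n") {
		line = strings.TrimSpace(line)
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		if strings.HasPrefix(line, "[") {
			flush() // any section header ends the previous section
			inPeer = strings.EqualFold(line, "[peer]")
			continue
		}
		key, value, ok := strings.Cut(line, "=")
		if !ok || !inPeer {
			continue
		}
		switch strings.ToLower(strings.TrimSpace(key)) {
		case "publickey":
			cur.PublicKey = strings.TrimSpace(value)
		case "endpoint":
			cur.Endpoint = strings.TrimSpace(value)
		}
	}
	flush() // the last section has no following header
	return peers
}

func main() {
	conf := "[Peer]\nPublicKey = BBB=\n[Interface]\nListenPort = 51820\n" +
		"[Peer]\nPublicKey = CCC=\nEndpoint = 1.2.3.5:51820\n"
	for _, p := range parsePeers(conf) {
		fmt.Println(p.PublicKey, p.Endpoint)
	}
}
```

`strings.Cut` splits at the first `=`, so base64 keys that end in `=` padding survive intact in the value.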
-219
@@ -1,219 +0,0 @@
package wireguard
import (
"context"
"os"
"path/filepath"
"strings"
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
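// fakeReconRunner is a canned-response ssh.Runner for reconstruct unit tests.
// It returns the response of the first key found as a substring of cmd. Go
// map iteration order is random, so keys that can match the same command
// must map to the same response for the fake to stay deterministic.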
type fakeReconRunner struct {
responses map[string]string
}
func (f *fakeReconRunner) Run(_ context.Context, _, _ string, _ int, cmd string) (string, string, error) {
for substr, resp := range f.responses {
if strings.Contains(cmd, substr) {
return resp, "", nil
}
}
return "", "", nil
}
func readFixture(t *testing.T, name string) string {
t.Helper()
path := filepath.Join("..", "..", "test", "fixtures", "wg", name)
b, err := os.ReadFile(path)
require.NoError(t, err, "missing fixture %s", name)
return string(b)
}
func TestParseConfigFile_Full(t *testing.T) {
content := readFixture(t, "wg0.conf")
state := &ServerState{}
parseConfigFile(state, content)
require.NotNil(t, state.WireGuardMgmtIP)
assert.Equal(t, "100.64.0.1", state.WireGuardMgmtIP.String())
assert.Equal(t, 51820, state.ListenPort)
require.Len(t, state.Peers, 1)
assert.Equal(t, "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBK=", state.Peers[0].PublicKey)
assert.Equal(t, "203.0.113.11:51820", state.Peers[0].Endpoint)
assert.Equal(t, 25, state.Peers[0].PersistentKeepalive)
}
func TestParseConfigFile_Empty(t *testing.T) {
state := &ServerState{}
parseConfigFile(state, "")
assert.Nil(t, state.WireGuardMgmtIP)
assert.Empty(t, state.Peers)
}
func TestParseConfigFile_MultiplePeers(t *testing.T) {
content := `[Interface]
Address = 100.64.0.1/32
ListenPort = 51820
PrivateKey = aaa
[Peer]
PublicKey = BBB=
AllowedIPs = 100.64.0.2/32, 10.210.1.0/24
Endpoint = 1.2.3.4:51820
PersistentKeepalive = 25
[Peer]
PublicKey = CCC=
AllowedIPs = 100.64.0.3/32, 10.210.2.0/24
Endpoint = 1.2.3.5:51820
PersistentKeepalive = 25
`
state := &ServerState{}
parseConfigFile(state, content)
require.Len(t, state.Peers, 2)
assert.Equal(t, "BBB=", state.Peers[0].PublicKey)
assert.Equal(t, "CCC=", state.Peers[1].PublicKey)
}
func TestParseConfigFile_IgnoresComments(t *testing.T) {
content := `# This is a comment
[Interface]
# Another comment
Address = 100.64.0.5/32
ListenPort = 51820
PrivateKey = xxx
`
state := &ServerState{}
parseConfigFile(state, content)
require.NotNil(t, state.WireGuardMgmtIP)
assert.Equal(t, "100.64.0.5", state.WireGuardMgmtIP.String())
assert.Empty(t, state.Peers)
}
func TestParseConfigFile_CaseInsensitiveKeys(t *testing.T) {
content := `[interface]
address = 100.64.0.10/32
listenport = 12345
privatekey = xxx
`
state := &ServerState{}
parseConfigFile(state, content)
require.NotNil(t, state.WireGuardMgmtIP)
assert.Equal(t, "100.64.0.10", state.WireGuardMgmtIP.String())
assert.Equal(t, 12345, state.ListenPort)
}
func TestMeshState_AssignedMgmtIPs(t *testing.T) {
mesh := MeshState{
Servers: map[string]*ServerState{
"a": {Host: "a", WireGuardMgmtIP: []byte{100, 64, 0, 1}},
"b": {Host: "b", WireGuardMgmtIP: nil},
"c": {Host: "c", WireGuardMgmtIP: []byte{100, 64, 0, 3}},
},
}
ips := mesh.AssignedMgmtIPs()
assert.Len(t, ips, 2)
assert.Contains(t, ips, "a")
assert.NotContains(t, ips, "b")
assert.Contains(t, ips, "c")
}
func TestMeshState_AssignedContainerSubnets(t *testing.T) {
mesh := MeshState{
Servers: map[string]*ServerState{
"a": {Host: "a", Namespaces: map[string]*NamespaceServerState{
DefaultNamespace: {Namespace: DefaultNamespace, ContainerSubnet: mustParseCIDR("10.210.0.0/24")},
"alpha": {Namespace: "alpha", ContainerSubnet: mustParseCIDR("10.220.0.0/24")},
}},
"b": {Host: "b", Namespaces: map[string]*NamespaceServerState{
DefaultNamespace: {Namespace: DefaultNamespace}, // ContainerSubnet nil
}},
"c": {Host: "c", Namespaces: map[string]*NamespaceServerState{
DefaultNamespace: {Namespace: DefaultNamespace, ContainerSubnet: mustParseCIDR("10.210.2.0/24")},
}},
},
}
subs := mesh.AssignedContainerSubnets()
// Nested: namespace → host → subnet.
assert.Contains(t, subs[DefaultNamespace], "a")
assert.NotContains(t, subs[DefaultNamespace], "b")
assert.Contains(t, subs[DefaultNamespace], "c")
assert.Contains(t, subs["alpha"], "a")
}
func TestTruncateKey(t *testing.T) {
tests := []struct {
input string
want string
}{
{"", ""},
{"short", "short"},
{"12345678", "12345678"},
{"123456789", "12345678..."},
{"AAAAAAAABBBBBBBB", "AAAAAAAA..."},
}
for _, tt := range tests {
assert.Equal(t, tt.want, truncateKey(tt.input), "input: %q", tt.input)
}
}
func TestProbe_NftAvailableAndBridgeTableExists_True(t *testing.T) {
runner := &fakeReconRunner{
responses: map[string]string{
"dpkg-query": "1\n",
"wg show": "",
"cat /etc/wireguard/": "",
"wg pubkey": "",
"ip -4 -o addr show": "",
"systemctl is-active wg-quick": "active\n",
"podman --version": "podman version 4.9.0\n",
"systemctl is-active podman.socket": "active\n",
// Probe runs `sysctl -n`, which prints only the value.
"sysctl -n net.ipv4.ip_forward": "1\n",
"podman network inspect": `[{"name":"coolify-default-mesh","subnets":[{"subnet":"10.210.0.0/24","gateway":"10.210.0.1"}],"dns_enabled":false,"labels":{"io.coolify.managed":"true","io.coolify.namespace":"default"}}]` + "\n",
"systemctl is-active coolify-mesh-fw": "active\n",
"sha256sum /etc/systemd/system/coolify-mesh-fw.service": "",
"iptables -nL COOLIFY-INTRA": "yes\n",
"command -v nft": "yes\n",
"nft list table bridge coolify_bridge": "yes\n",
"test -x /usr/local/bin/corrosion": "yes\n",
"systemctl is-active corrosion": "active\n",
"sha256sum /etc/corrosion/config.toml": "",
"test -x /usr/local/bin/coold": "yes\n",
"systemctl is-active coold": "active\n",
// Version markers live next to the binaries (see Probe steps 15a/15b).
"cat /usr/local/bin/coold.version": "",
"cat /usr/local/bin/corrosion.version": "",
},
}
state, err := Probe(context.Background(), runner, "1.1.1.1", "root", 22, "wg0", []string{"default"})
require.NoError(t, err)
assert.True(t, state.NftAvailable, "NftAvailable should be true")
assert.True(t, state.BridgeTableExists, "BridgeTableExists should be true")
// DefaultDenyActive = COOLIFY-INTRA DROP && BridgeTableExists
assert.True(t, state.DefaultDenyActive, "DefaultDenyActive should be true when both conditions met")
}
func TestProbe_NftNotAvailable_BridgeTableAbsent(t *testing.T) {
runner := &fakeReconRunner{
responses: map[string]string{
"dpkg-query": "1\n",
"iptables -nL COOLIFY-INTRA": "yes\n",
"command -v nft": "no\n",
"nft list table bridge coolify_bridge": "no\n",
},
}
state, err := Probe(context.Background(), runner, "1.1.1.1", "root", 22, "wg0", []string{"default"})
require.NoError(t, err)
assert.False(t, state.NftAvailable, "NftAvailable should be false")
assert.False(t, state.BridgeTableExists, "BridgeTableExists should be false")
// DefaultDenyActive must be false even though COOLIFY-INTRA has DROP
assert.False(t, state.DefaultDenyActive, "DefaultDenyActive should be false when BridgeTableExists is false")
}
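The fake runner above picks whichever matching key the map iteration yields first. One possible way to make overlapping keys safe is to prefer the longest matching key; the sketch below shows that variant. `fakeRunner` and `run` are hypothetical names, and the method signature is simplified compared with `ssh.Runner`.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// fakeRunner is a hypothetical variant of the test double above.
type fakeRunner struct {
	responses map[string]string
}

// run returns the response for the longest key contained in cmd, so a
// specific key always beats a shorter, more general one.
func (f *fakeRunner) run(cmd string) string {
	matches := make([]string, 0, len(f.responses))
	for k := range f.responses {
		if strings.Contains(cmd, k) {
			matches = append(matches, k)
		}
	}
	if len(matches) == 0 {
		return ""
	}
	sort.Slice(matches, func(i, j int) bool {
		if len(matches[i]) != len(matches[j]) {
			return len(matches[i]) > len(matches[j]) // longest first
		}
		return matches[i] < matches[j] // stable tie-break
	})
	return f.responses[matches[0]]
}

func main() {
	r := &fakeRunner{responses: map[string]string{
		"systemctl is-active":               "inactive\n",
		"systemctl is-active podman.socket": "active\n",
	}}
	fmt.Print(r.run("systemctl is-active podman.socket 2>/dev/null || true"))
	fmt.Print(r.run("systemctl is-active coold 2>/dev/null || true"))
}
```

With this rule a test can register a broad default and override it for one unit without depending on map order.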
-400
@@ -1,400 +0,0 @@
// Package wireguard implements the WireGuard mesh bootstrap logic for
// the coolify init command (alpha, Coolify v5).
package wireguard
import (
"net"
"sort"
)
// DefaultNamespace is the namespace used when the user does not pass
// --namespaces. It is also always present even in a multi-namespace setup —
// coold's config assumes a `default` entry.
const DefaultNamespace = "default"
// PodmanNetworkFor returns the podman bridge name backing namespace ns on
// every host. Derived as `coolify-<ns>-mesh` so the namespace is visible
// directly in `podman network ls`.
func PodmanNetworkFor(ns string) string {
return "coolify-" + ns + "-mesh"
}
// Peer represents a single WireGuard peer as seen in the config or
// from `wg show <iface> dump`.
type Peer struct {
PublicKey string
PresharedKey string // "(none)" in `wg show dump` output; empty when absent from the config file
Endpoint string // "ip:port" or empty
AllowedIPs []string
LatestHandshake int64 // Unix timestamp; 0 means no handshake yet
PersistentKeepalive int // seconds; 0 means disabled
}
// NamespaceServerState captures per-namespace podman state on one host. A
// ServerState carries one entry per namespace in the desired set.
type NamespaceServerState struct {
// Namespace is the logical namespace name (e.g. "default", "alpha").
Namespace string
// NetworkExists is true when the per-namespace podman bridge
// (coolify-<ns>-mesh) already exists on this host.
NetworkExists bool
// ContainerSubnet is the /<prefix> owned by the per-namespace bridge
// (read from `podman network inspect`). nil when not yet created.
ContainerSubnet *net.IPNet
// DNSEnabled is true when the per-namespace network has `dns_enabled=true`
// (netavark auto-starts aardvark-dns on the bridge gateway:53). coold owns
// that socket, so drift triggers ActionRecreatePodmanNet.
DNSEnabled bool
// Label is the `io.coolify.namespace` label on the network. Used only as
// an assertion that the network was created by us — label mismatch is
// treated like "the network exists but is not ours" and triggers recreate.
Label string
}
// ServerState holds the reconstructed WireGuard + Podman state for one server.
// It is built from live SSH probes and never cached to disk.
type ServerState struct {
// Host is the SSH address used to reach this server.
// It also serves as the WireGuard Endpoint value for peer configs.
Host string
// Installed is true when the wireguard package is present.
Installed bool
// KeysExist is true when /etc/wireguard/publickey exists and is non-empty
// (Probe reads the public key; it never touches the private key).
KeysExist bool
// PublicKey is the content of /etc/wireguard/publickey (trimmed).
// Empty when KeysExist is false.
PublicKey string
// WireGuardMgmtIP is the /32 management IP assigned to wg0 (parsed from
// the [Interface] Address line). Lives outside the container pool so the
// Podman bridge can own the full per-host /24 without conflict.
// nil when not yet assigned.
WireGuardMgmtIP net.IP
// ListenPort is the WireGuard listen port from the config.
ListenPort int
// Interface is the WireGuard interface name (e.g., "wg0").
Interface string
// Active is true when `wg show <iface> dump` returns output (interface up).
Active bool
// Peers lists the peers currently present in the config file.
Peers []Peer
// PodmanInstalled is true when the podman package is present.
PodmanInstalled bool
// PodmanSocketActive is true when podman.socket systemd unit is active.
PodmanSocketActive bool
// Namespaces maps namespace name → per-namespace podman state on this
// host. Populated by Probe for every namespace in the desired set.
Namespaces map[string]*NamespaceServerState
// IPForwardEnabled is true when net.ipv4.ip_forward == 1.
IPForwardEnabled bool
// FirewallActive is true when coolify-mesh-fw.service is active.
FirewallActive bool
// DefaultDenyActive is true when the COOLIFY-INTRA chain contains a DROP
// rule AND the nft bridge table exists (see BridgeTableExists). Either
// half alone does not count as the default-deny scaffold being in place.
DefaultDenyActive bool
// FirewallUnitSha256 is the sha256 of /etc/systemd/system/coolify-mesh-fw.service
// (hex), or empty when absent. Used to detect unit drift when the desired
// set of namespace subnets changes.
FirewallUnitSha256 string
// BridgeTableExists is true when `nft list table bridge coolify_bridge`
// succeeds on this host (nft bridge-family deny scaffold is in place).
BridgeTableExists bool
// NftAvailable is true when `command -v nft` finds the binary on this host.
NftAvailable bool
// CorrosionInstalled is true when /usr/local/bin/corrosion exists and is executable.
CorrosionInstalled bool
// CorrosionActive is true when the corrosion systemd service is active.
CorrosionActive bool
// CorrosionConfigHash is the sha256 of /etc/corrosion/config.toml, or empty
// when the file is absent. Used to detect drift when peer list changes.
CorrosionConfigHash string
// CorrosionSchemaExists is true when /etc/corrosion/schemas/coolify.sql exists.
CorrosionSchemaExists bool
// CorrosionSchemaSha256 is the sha256 of /etc/corrosion/schemas/coolify.sql
// (hex), or empty when absent. Used by BuildPlan to detect schema drift so
// a new schema revision triggers re-write + corrosion restart + DB reset.
CorrosionSchemaSha256 string
// CooldInstalled is true when /usr/local/bin/coold exists and is executable.
CooldInstalled bool
// CooldActive is true when the coold systemd service is active.
CooldActive bool
// CorrosionVersion is the content of /usr/local/bin/corrosion.version
// (trimmed), or empty when absent. Matches the version tag passed to
// CorrosionInstallCommand (e.g. "nightly", "v1.2.3").
CorrosionVersion string
// CooldVersion is the content of /usr/local/bin/coold.version (trimmed),
// or empty when absent.
CooldVersion string
// CooldUnitSha256 is the sha256 of /etc/systemd/system/coold.service (hex),
// or empty when absent. Used by BuildPlan to detect generator changes
// (e.g. Requires→Wants) that would otherwise be invisible.
CooldUnitSha256 string
}
// MeshState is the reconstructed state across all servers in the mesh.
type MeshState struct {
// Servers maps host → *ServerState.
Servers map[string]*ServerState
}
// AssignedMgmtIPs returns a map of host → net.IP for all servers that
// already have a WG management IP assigned.
func (m *MeshState) AssignedMgmtIPs() map[string]net.IP {
out := make(map[string]net.IP, len(m.Servers))
for host, s := range m.Servers {
if s != nil && s.WireGuardMgmtIP != nil {
out[host] = s.WireGuardMgmtIP
}
}
return out
}
// AssignedContainerSubnets returns the per-(namespace, host) subnets that are
// already assigned on remote podman networks. The result is nested:
// `out[namespace][host] = subnet`.
func (m *MeshState) AssignedContainerSubnets() map[string]map[string]*net.IPNet {
out := map[string]map[string]*net.IPNet{}
for host, s := range m.Servers {
if s == nil {
continue
}
for ns, nss := range s.Namespaces {
if nss == nil || nss.ContainerSubnet == nil {
continue
}
if out[ns] == nil {
out[ns] = map[string]*net.IPNet{}
}
out[ns][host] = nss.ContainerSubnet
}
}
return out
}
// FirewallSubnets returns the sorted-by-namespace list of this host's
// container subnets across all namespaces (one /prefix per namespace). Used
// by the firewall service unit generator.
func (s *ServerState) FirewallSubnets() []*net.IPNet {
var out []*net.IPNet
names := make([]string, 0, len(s.Namespaces))
for n := range s.Namespaces {
names = append(names, n)
}
sort.Strings(names)
for _, n := range names {
if ns := s.Namespaces[n]; ns != nil && ns.ContainerSubnet != nil {
out = append(out, ns.ContainerSubnet)
}
}
return out
}
// DesiredMesh describes the target WireGuard + Podman configuration.
type DesiredMesh struct {
// Hosts lists the SSH addresses of all servers (also used as WG endpoints).
Hosts []string
// Interface is the WireGuard interface name (default "wg0").
Interface string
// MgmtPool is the address pool from which per-host /32 management IPs
// are carved and assigned to wg0 (default 100.64.0.0/16 — RFC 6598 CGNAT).
MgmtPool *net.IPNet
// ContainerPool is the address pool from which per-(namespace, host)
// container subnets are carved (default 10.210.0.0/16). One pool is
// shared across all namespaces so subnets cannot overlap.
ContainerPool *net.IPNet
// ContainerPrefix is the prefix length of each per-host, per-namespace
// container subnet (default 24, giving each host 254 usable container IPs
// per namespace).
ContainerPrefix int
// ListenPort is the WireGuard UDP listen port (default 51820).
ListenPort int
// InstallPodman, when true, installs Podman, enables its socket, creates
// the per-namespace bridge networks, installs firewall rules, and enables
// IP forwarding on each server.
InstallPodman bool
// Namespaces lists every namespace the mesh should carry. Ordered —
// deterministic iteration produces stable subnet assignments. At least
// one entry (typically "default") is always expected.
Namespaces []string
// DefaultDenyContainers, when true (and InstallPodman is true), installs
// default-deny iptables rules for ALL container traffic on the host's
// container subnets (intra-host AND cross-host via wg0). The v5 control
// plane manages the explicit allow-list in the COOLIFY-ALLOW chain.
DefaultDenyContainers bool
// InstallCoold, when true, downloads corrosion + coold from GitHub releases
// to each host, writes their configs/unit files, and enables both services.
// Requires InstallPodman (coold depends on podman.socket).
InstallCoold bool
// CooldVersion is the release tag to download (e.g. "nightly", "v1.2.3").
CooldVersion string
// CorrosionVersion is the release tag to download for corrosion.
CorrosionVersion string
// CorrosionGossipPort is the SWIM gossip UDP port (default 8787).
CorrosionGossipPort int
// CorrosionAPIPort is the corrosion HTTP API port bound to 127.0.0.1 (default 8080).
CorrosionAPIPort int
// CentralHost is the SSH address of the central VM that runs scheduler.
// Empty string disables phases 4+5 (scheduler setup).
// Must be an element of Hosts.
CentralHost string
// SchedulerVersion is the release tag for scheduler (e.g. "nightly").
SchedulerVersion string
// EnableBuilder, when true and BuilderHosts is empty, installs buildah/
// git and the builder binary on every host in Hosts and advertises
// "builder" in each host's JWT caps claim. When BuilderHosts is non-
// empty it wins and EnableBuilder is ignored. Requires a non-empty
// CentralHost (scheduler issues the JWT) and InstallPodman (buildah needs
// podman's containers-storage).
EnableBuilder bool
// BuilderHosts is the explicit list of hosts that should carry the
// builder capability. Empty slice means "fall back to EnableBuilder".
// Hosts not present in this set get `caps:["coold"]` only and the
// builder binary is not installed on them.
BuilderHosts []string
// BuilderCapacity caps concurrent builds per host. 0 falls back to 2 (the
// coold builder adapter's own default).
BuilderCapacity int
// BuilderCPUQuota is the systemd CPUQuota applied to each build subprocess
// (e.g. "200%" for two full cores). Empty string falls back to coold's
// own default ("200%").
BuilderCPUQuota string
// BuilderMemoryMax is the systemd MemoryMax applied to each build
// subprocess (e.g. "2G"). Empty string falls back to coold's own default
// ("2G").
BuilderMemoryMax string
// BuilderTimeoutSecs is the hard per-build wall-clock timeout in seconds.
// 0 falls back to coold's own default (1800).
BuilderTimeoutSecs int
// Intent selects the action filter applied after BuildPlan computes the
// raw action list. IntentBootstrap (default, zero value) emits every
// applicable action (today's behavior). IntentExtend limits destructive
// and version-bump actions to NewHosts only; existing hosts get just the
// peer-refresh actions required to route traffic to the new peer.
// IntentUpgrade emits only binary-fetch + service-restart actions
// cluster-wide.
Intent Intent
// NewHosts is the subset of Hosts that are brand-new to the mesh on this
// run. Only meaningful when Intent == IntentExtend. Empty = treat every
// host as existing (no-op safe mode).
NewHosts []string
// AllowReplace unlocks destructive-replace actions on existing hosts in
// extend mode (e.g. ActionRecreatePodmanNet). Never unlocks the wipe-DB
// branch of ActionWriteCorrosionSchema.
AllowReplace bool
// AllowNightly lets the upgrade intent accept version tag "nightly".
// Upgrade mode otherwise rejects nightly because it forces a re-install
// on every run instead of only when the pinned version changes.
AllowNightly bool
}
// Intent selects the action filter applied by BuildPlan to match the caller's
// operation (first-time bootstrap vs. adding servers vs. bumping agent
// versions). See DesiredMesh.Intent.
type Intent string
const (
// IntentBootstrap allows every action. Matches pre-subcommand-split
// behavior and is the default for DesiredMesh (zero value).
IntentBootstrap Intent = ""
// IntentExtend runs the full install on hosts in NewHosts and limits
// existing hosts to peer-refresh actions (WG config rewrite + service
// reload + corrosion config rewrite + firewall unit reinstall on drift).
IntentExtend Intent = "extend"
// IntentUpgrade only emits binary-fetch actions + the service-restart
// actions that follow them.
IntentUpgrade Intent = "upgrade"
)
// BuilderHostSet returns the set of hosts that should carry the builder
// capability given EnableBuilder + BuilderHosts. Hosts in the result are a
// subset of Hosts.
func (d *DesiredMesh) BuilderHostSet() map[string]bool {
set := make(map[string]bool, len(d.Hosts))
if len(d.BuilderHosts) > 0 {
allow := make(map[string]struct{}, len(d.BuilderHosts))
for _, h := range d.BuilderHosts {
allow[h] = struct{}{}
}
for _, h := range d.Hosts {
if _, ok := allow[h]; ok {
set[h] = true
}
}
return set
}
if d.EnableBuilder {
for _, h := range d.Hosts {
set[h] = true
}
}
return set
}
// HasBuilderCap reports whether host should advertise the builder capability.
func (d *DesiredMesh) HasBuilderCap(host string) bool {
return d.BuilderHostSet()[host]
}
// SortedNamespaces returns the desired namespaces in deterministic order.
func (d *DesiredMesh) SortedNamespaces() []string {
out := append([]string(nil), d.Namespaces...)
sort.Strings(out)
return out
}
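The namespace helpers in this file feed the `COOLD_NAMESPACES=<ns>:<network>:<gateway-ip>,...` value that coold receives. The sketch below shows how such a value could be assembled; `cooldNamespaces` and its `gateways` input are hypothetical, and sorting by namespace name is an assumption made here to keep the output stable.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// podmanNetworkFor mirrors PodmanNetworkFor from the file above.
func podmanNetworkFor(ns string) string {
	return "coolify-" + ns + "-mesh"
}

// cooldNamespaces joins one <ns>:<network>:<gateway-ip> triple per
// namespace. gateways maps namespace name to its bridge gateway address
// (hypothetical input shape).
func cooldNamespaces(gateways map[string]string) string {
	names := make([]string, 0, len(gateways))
	for ns := range gateways {
		names = append(names, ns)
	}
	sort.Strings(names) // assumed ordering; keeps the value stable across runs
	parts := make([]string, 0, len(names))
	for _, ns := range names {
		parts = append(parts, fmt.Sprintf("%s:%s:%s", ns, podmanNetworkFor(ns), gateways[ns]))
	}
	return strings.Join(parts, ",")
}

func main() {
	fmt.Println("COOLD_NAMESPACES=" + cooldNamespaces(map[string]string{
		"default": "10.210.0.1",
		"alpha":   "10.210.1.1",
	}))
}
```

A stable ordering matters if the rendered value ends up in a unit file whose hash is compared for drift, as `CooldUnitSha256` suggests.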
-88
@@ -1,88 +0,0 @@
package wireguard
import (
"reflect"
"sort"
"testing"
)
func TestBuilderHostSet_EnableBuilderAppliesToAll(t *testing.T) {
d := &DesiredMesh{
Hosts: []string{"a", "b", "c"},
EnableBuilder: true,
}
got := d.BuilderHostSet()
want := map[string]bool{"a": true, "b": true, "c": true}
if !reflect.DeepEqual(got, want) {
t.Fatalf("want %v, got %v", want, got)
}
}
func TestBuilderHostSet_ExplicitListWinsOverEnable(t *testing.T) {
d := &DesiredMesh{
Hosts: []string{"a", "b", "c"},
EnableBuilder: true,
BuilderHosts: []string{"b"},
}
got := d.BuilderHostSet()
want := map[string]bool{"b": true}
if !reflect.DeepEqual(got, want) {
t.Fatalf("want %v, got %v", want, got)
}
}
func TestBuilderHostSet_FiltersToServersOnly(t *testing.T) {
// A --builder-hosts entry not present in --servers is dropped.
d := &DesiredMesh{
Hosts: []string{"a", "b"},
BuilderHosts: []string{"a", "z"},
}
got := d.BuilderHostSet()
want := map[string]bool{"a": true}
if !reflect.DeepEqual(got, want) {
t.Fatalf("want %v, got %v", want, got)
}
}
func TestBuilderHostSet_DefaultDisabled(t *testing.T) {
d := &DesiredMesh{Hosts: []string{"a"}}
if len(d.BuilderHostSet()) != 0 {
t.Fatalf("want empty set, got %v", d.BuilderHostSet())
}
if d.HasBuilderCap("a") {
t.Fatalf("HasBuilderCap should be false by default")
}
}
func TestBuilderHostSet_EnableBuilderFalse_NoBuilderHosts(t *testing.T) {
d := &DesiredMesh{
Hosts: []string{"a"},
EnableBuilder: false,
}
if len(d.BuilderHostSet()) != 0 {
t.Fatalf("want empty set, got %v", d.BuilderHostSet())
}
}
func TestBuilderHostSet_Stable(t *testing.T) {
// Test that calling twice produces the same set (sanity — no side effects).
d := &DesiredMesh{
Hosts: []string{"a", "b"},
BuilderHosts: []string{"a"},
}
a := keys(d.BuilderHostSet())
b := keys(d.BuilderHostSet())
sort.Strings(a)
sort.Strings(b)
if !reflect.DeepEqual(a, b) {
t.Fatalf("unstable: %v vs %v", a, b)
}
}
func keys(m map[string]bool) []string {
out := make([]string, 0, len(m))
for k := range m {
out = append(out, k)
}
return out
}
-341
@@ -1,341 +0,0 @@
package wireguard
import (
"encoding/binary"
"fmt"
"net"
"sort"
)
// Warning describes a non-fatal conflict discovered during IP allocation.
type Warning struct {
Host string
Reason string
}
// MachineIP returns the host address within a per-host subnet — the first
// usable IP (network address + 1). For example, 10.210.5.0/24 → 10.210.5.1.
//
// Used for the Podman bridge gateway. WireGuard does NOT use this — wg0
// gets a separate /32 from the management pool (see AllocateMgmtIPs).
func MachineIP(subnet *net.IPNet) net.IP {
return uint32ToIP(ipToUint32(subnet.IP.To4()) + 1)
}
// Allocate assigns a per-host subnet (of size hostPrefix) to every host in
// hosts, carving them from pool.
//
// Rules:
// - Duplicate host names in hosts → error (user input bug).
// - Existing subnet within pool with correct prefix → kept unchanged (stable).
// - Existing subnet outside pool or wrong prefix → warning, reassign.
// - Two existing hosts with the same subnet → first (alphabetical) kept,
// second gets a warning and is reassigned.
// - New hosts receive the lowest free subnet in pool.
//
// Returns (assignments, warnings, error).
func Allocate(
pool *net.IPNet,
hostPrefix int,
existing map[string]*net.IPNet,
hosts []string,
) (map[string]*net.IPNet, []Warning, error) {
// 1. Reject duplicate host names (they are an input error, not merged).
hostCount := make(map[string]int, len(hosts))
for _, h := range hosts {
hostCount[h]++
}
for h, n := range hostCount {
if n > 1 {
return nil, nil, fmt.Errorf("duplicate host in --servers: %s", h)
}
}
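pool4 := pool.IP.To4()
if pool4 == nil {
return nil, nil, fmt.Errorf("only IPv4 pools are supported")
}
// Guard against prefixes that cannot be carved from the pool: a prefix
// shorter than the pool's would yield subnets larger than the pool, and a
// prefix above 32 would make the iteration step zero. Mask.Size reports
// bits=128 for a 16-byte mask, so normalize to an IPv4 prefix length.
if ones, bits := pool.Mask.Size(); hostPrefix < ones-(bits-32) || hostPrefix > 32 {
return nil, nil, fmt.Errorf("host prefix /%d does not fit inside pool %s", hostPrefix, pool)
}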
result := make(map[string]*net.IPNet, len(hosts))
usedNetworks := make(map[uint32]bool)
var warnings []Warning
subnetClaim := make(map[uint32]string)
// 2. Seed from existing — sorted for deterministic conflict resolution.
existingHosts := make([]string, 0, len(existing))
for h := range existing {
existingHosts = append(existingHosts, h)
}
sort.Strings(existingHosts)
// Pool bounds (used for both validation and iteration).
pool4Network := ipToUint32(pool4)
poolOnes, poolBits := pool.Mask.Size()
poolHostBits := poolBits - poolOnes
pool4Broadcast := pool4Network | (uint32(1)<<uint(poolHostBits) - 1)
for _, host := range existingHosts {
subnet := existing[host]
if subnet == nil {
continue
}
subnet4 := subnet.IP.To4()
ones, _ := subnet.Mask.Size()
if subnet4 == nil || !pool.Contains(subnet4) || ones != hostPrefix {
warnings = append(warnings, Warning{
Host: host,
Reason: fmt.Sprintf("existing subnet %s is not a /%d inside pool %s, reassigning", subnet, hostPrefix, pool),
})
continue
}
networkU32 := ipToUint32(subnet4)
// For /32 mgmt IPs, reject the pool's network and broadcast addresses
// (100.64.0.0 and 100.64.255.255 in the default pool); many tools refuse
// them as host addresses.
if hostPrefix == 32 && (networkU32 == pool4Network || networkU32 == pool4Broadcast) {
warnings = append(warnings, Warning{
Host: host,
Reason: fmt.Sprintf("existing mgmt IP %s is the pool network or broadcast address, reassigning", subnet4),
})
continue
}
if claimant, exists := subnetClaim[networkU32]; exists {
warnings = append(warnings, Warning{
Host: host,
Reason: fmt.Sprintf("duplicate subnet %s (already claimed by %s), reassigning", subnet, claimant),
})
continue
}
subnetClaim[networkU32] = host
usedNetworks[networkU32] = true
result[host] = cloneIPNet(subnet)
}
// 3. Iterate the pool to assign new hosts.
hostSubnetSize := 32 - hostPrefix
step := uint32(1) << uint(hostSubnetSize)
nextFreeSubnet := func() (*net.IPNet, error) {
// For /32 allocations (mgmt IPs), skip both the pool's network address
// and its broadcast address (100.64.0.0 and 100.64.255.255 in the default
// pool), since many tools refuse them as host IPs. For larger subnets
// (e.g. /24), the bridge inside each subnet handles its own network and
// broadcast addresses, so iteration starts at the pool's network address.
start := pool4Network
end := pool4Broadcast
if hostPrefix == 32 {
start = pool4Network + 1
// end stays at broadcast; loop is u < end so broadcast is excluded.
}
for u := start; u < end; u += step {
if !usedNetworks[u] {
mask := net.CIDRMask(hostPrefix, 32)
return &net.IPNet{IP: uint32ToIP(u), Mask: mask}, nil
}
}
return nil, fmt.Errorf("pool %s is exhausted (no free /%d subnets)", pool, hostPrefix)
}
for _, host := range hosts {
if _, already := result[host]; already {
continue
}
subnet, err := nextFreeSubnet()
if err != nil {
return nil, warnings, fmt.Errorf("allocating subnet for %s: %w", host, err)
}
usedNetworks[ipToUint32(subnet.IP.To4())] = true
result[host] = subnet
}
return result, warnings, nil
}
// AllocateNamespaced assigns a per-host /<hostPrefix> subnet for every
// (namespace, host) pair in `namespaces × hosts`, carving them from a single
// shared pool. Stable: existing valid assignments are preserved so re-runs
// reproduce the same subnets. Invalid or duplicate existing assignments
// produce a warning and get reassigned to the next free block.
//
// Iteration order is deterministic (namespaces then hosts as passed in),
// which keeps warnings and subnet layout reproducible for tests.
//
// Returns nested map[namespace][host] = *net.IPNet.
func AllocateNamespaced(
pool *net.IPNet,
hostPrefix int,
existing map[string]map[string]*net.IPNet,
namespaces []string,
hosts []string,
) (map[string]map[string]*net.IPNet, []Warning, error) {
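pool4 := pool.IP.To4()
if pool4 == nil {
return nil, nil, fmt.Errorf("only IPv4 pools are supported")
}
// Guard against prefixes that cannot be carved from the pool: a prefix
// shorter than the pool's would yield subnets larger than the pool, and a
// prefix above 32 would make the iteration step zero. Mask.Size reports
// bits=128 for a 16-byte mask, so normalize to an IPv4 prefix length.
if ones, bits := pool.Mask.Size(); hostPrefix < ones-(bits-32) || hostPrefix > 32 {
return nil, nil, fmt.Errorf("host prefix /%d does not fit inside pool %s", hostPrefix, pool)
}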
// Reject duplicate hosts (user input bug).
hostCount := make(map[string]int, len(hosts))
for _, h := range hosts {
hostCount[h]++
}
for h, n := range hostCount {
if n > 1 {
return nil, nil, fmt.Errorf("duplicate host in --servers: %s", h)
}
}
// Reject duplicate namespaces.
nsCount := make(map[string]int, len(namespaces))
for _, ns := range namespaces {
nsCount[ns]++
}
for ns, n := range nsCount {
if n > 1 {
return nil, nil, fmt.Errorf("duplicate namespace in --namespaces: %s", ns)
}
}
pool4Network := ipToUint32(pool4)
poolOnes, poolBits := pool.Mask.Size()
poolHostBits := poolBits - poolOnes
pool4Broadcast := pool4Network | (uint32(1)<<uint(poolHostBits) - 1)
result := make(map[string]map[string]*net.IPNet, len(namespaces))
for _, ns := range namespaces {
result[ns] = make(map[string]*net.IPNet, len(hosts))
}
usedNetworks := make(map[uint32]bool)
subnetClaim := make(map[uint32]string) // "ns/host" for conflict messages
var warnings []Warning
// 1. Seed from existing assignments in deterministic order.
nsSorted := append([]string(nil), namespaces...)
sort.Strings(nsSorted)
for _, ns := range nsSorted {
hostMap, ok := existing[ns]
if !ok {
continue
}
hostKeys := make([]string, 0, len(hostMap))
for h := range hostMap {
hostKeys = append(hostKeys, h)
}
sort.Strings(hostKeys)
for _, host := range hostKeys {
subnet := hostMap[host]
if subnet == nil {
continue
}
subnet4 := subnet.IP.To4()
ones, _ := subnet.Mask.Size()
if subnet4 == nil || !pool.Contains(subnet4) || ones != hostPrefix {
warnings = append(warnings, Warning{
Host: host,
Reason: fmt.Sprintf("existing subnet %s in namespace %q is not a /%d inside pool %s, reassigning", subnet, ns, hostPrefix, pool),
})
continue
}
networkU32 := ipToUint32(subnet4)
if claimant, dup := subnetClaim[networkU32]; dup {
warnings = append(warnings, Warning{
Host: host,
Reason: fmt.Sprintf("duplicate subnet %s in namespace %q (already claimed by %s), reassigning", subnet, ns, claimant),
})
continue
}
subnetClaim[networkU32] = ns + "/" + host
usedNetworks[networkU32] = true
result[ns][host] = cloneIPNet(subnet)
}
}
// 2. Assign remaining (ns, host) pairs in input order.
hostSubnetSize := 32 - hostPrefix
step := uint32(1) << uint(hostSubnetSize)
nextFree := func() (*net.IPNet, error) {
for u := pool4Network; u < pool4Broadcast; u += step {
if !usedNetworks[u] {
return &net.IPNet{IP: uint32ToIP(u), Mask: net.CIDRMask(hostPrefix, 32)}, nil
}
}
return nil, fmt.Errorf("pool %s is exhausted (no free /%d subnets)", pool, hostPrefix)
}
for _, ns := range namespaces {
for _, host := range hosts {
if _, ok := result[ns][host]; ok {
continue
}
subnet, err := nextFree()
if err != nil {
return nil, warnings, fmt.Errorf("allocating subnet for %s/%s: %w", ns, host, err)
}
u := ipToUint32(subnet.IP.To4())
usedNetworks[u] = true
subnetClaim[u] = ns + "/" + host
result[ns][host] = subnet
}
}
return result, warnings, nil
}
// AllocateMgmtIPs assigns a /32 management IP to every host in hosts from pool.
// Wraps Allocate by promoting/demoting between net.IP and *net.IPNet.
func AllocateMgmtIPs(
pool *net.IPNet,
existing map[string]net.IP,
hosts []string,
) (map[string]net.IP, []Warning, error) {
wrapped := make(map[string]*net.IPNet, len(existing))
for h, ip := range existing {
ip4 := ip.To4()
if ip4 == nil {
continue
}
wrapped[h] = &net.IPNet{IP: ip4, Mask: net.CIDRMask(32, 32)}
}
subnets, warns, err := Allocate(pool, 32, wrapped, hosts)
if err != nil {
return nil, warns, err
}
out := make(map[string]net.IP, len(subnets))
for h, n := range subnets {
out[h] = cloneIP(n.IP.To4())
}
return out, warns, nil
}
// ipToUint32 converts a 4-byte IP to a uint32 for arithmetic.
func ipToUint32(ip net.IP) uint32 {
return binary.BigEndian.Uint32(ip.To4())
}
// uint32ToIP converts a uint32 back to a net.IP.
func uint32ToIP(u uint32) net.IP {
ip := make(net.IP, 4)
binary.BigEndian.PutUint32(ip, u)
return ip
}
// cloneIP returns a copy of ip so that mutations don't affect the caller.
func cloneIP(ip net.IP) net.IP {
c := make(net.IP, len(ip))
copy(c, ip)
return c
}
// cloneIPNet returns a deep copy of n.
func cloneIPNet(n *net.IPNet) *net.IPNet {
return &net.IPNet{
IP: cloneIP(n.IP),
Mask: append(net.IPMask(nil), n.Mask...),
}
}
-223
@@ -1,223 +0,0 @@
package wireguard
import (
"net"
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func mustParseCIDR(s string) *net.IPNet {
_, n, err := net.ParseCIDR(s)
if err != nil {
panic(err)
}
return n
}
func TestMachineIP(t *testing.T) {
tests := []struct {
subnet string
want string
}{
{"10.210.0.0/24", "10.210.0.1"},
{"10.210.5.0/24", "10.210.5.1"},
{"10.210.255.0/24", "10.210.255.1"},
{"192.168.0.0/24", "192.168.0.1"},
}
for _, tt := range tests {
n := mustParseCIDR(tt.subnet)
got := MachineIP(n)
assert.Equal(t, tt.want, got.String(), "subnet=%s", tt.subnet)
}
}
func TestAllocateMgmtIPs_Basic(t *testing.T) {
pool := mustParseCIDR("100.64.0.0/16")
hosts := []string{"h1", "h2", "h3"}
got, warns, err := AllocateMgmtIPs(pool, nil, hosts)
require.NoError(t, err)
assert.Empty(t, warns)
// Allocation skips pool network (.0.0) — starts at .0.1.
assert.Equal(t, "100.64.0.1", got["h1"].String())
assert.Equal(t, "100.64.0.2", got["h2"].String())
assert.Equal(t, "100.64.0.3", got["h3"].String())
}
func TestAllocateMgmtIPs_StableReuse(t *testing.T) {
pool := mustParseCIDR("100.64.0.0/16")
existing := map[string]net.IP{
"h1": net.ParseIP("100.64.0.42"),
}
hosts := []string{"h1", "h2"}
got, warns, err := AllocateMgmtIPs(pool, existing, hosts)
require.NoError(t, err)
assert.Empty(t, warns)
assert.Equal(t, "100.64.0.42", got["h1"].String())
assert.Equal(t, "100.64.0.1", got["h2"].String())
}
func TestAllocateMgmtIPs_RejectsPoolNetworkAndBroadcast(t *testing.T) {
pool := mustParseCIDR("100.64.0.0/16")
existing := map[string]net.IP{
"hN": net.ParseIP("100.64.0.0"), // pool network
"hB": net.ParseIP("100.64.255.255"), // pool broadcast
}
hosts := []string{"hN", "hB"}
got, warns, err := AllocateMgmtIPs(pool, existing, hosts)
require.NoError(t, err)
assert.Len(t, warns, 2)
for _, h := range hosts {
ip := got[h].String()
assert.NotEqual(t, "100.64.0.0", ip, h)
assert.NotEqual(t, "100.64.255.255", ip, h)
}
}
func TestAllocateMgmtIPs_OutOfPool_Warns(t *testing.T) {
pool := mustParseCIDR("100.64.0.0/16")
existing := map[string]net.IP{
"h1": net.ParseIP("10.210.0.1"), // outside pool
}
hosts := []string{"h1"}
got, warns, err := AllocateMgmtIPs(pool, existing, hosts)
require.NoError(t, err)
require.Len(t, warns, 1)
assert.True(t, pool.Contains(got["h1"]), "reassigned IP must be inside pool")
}
func TestAllocate_PerHostSubnets(t *testing.T) {
pool := mustParseCIDR("10.210.0.0/16")
hosts := []string{"h1", "h2", "h3"}
got, warns, err := Allocate(pool, 24, nil, hosts)
require.NoError(t, err)
assert.Empty(t, warns)
assert.Equal(t, "10.210.0.0/24", got["h1"].String())
assert.Equal(t, "10.210.1.0/24", got["h2"].String())
assert.Equal(t, "10.210.2.0/24", got["h3"].String())
}
func TestAllocate_StableReuse(t *testing.T) {
pool := mustParseCIDR("10.210.0.0/16")
existing := map[string]*net.IPNet{
"h1": mustParseCIDR("10.210.5.0/24"),
}
hosts := []string{"h1", "h2"}
got, warns, err := Allocate(pool, 24, existing, hosts)
require.NoError(t, err)
assert.Empty(t, warns)
// h1 keeps its existing subnet.
assert.Equal(t, "10.210.5.0/24", got["h1"].String())
// h2 gets the lowest free subnet (10.210.0.0/24; only 10.210.5.0/24 is taken).
assert.Equal(t, "10.210.0.0/24", got["h2"].String())
}
func TestAllocate_FillsGaps(t *testing.T) {
pool := mustParseCIDR("10.210.0.0/16")
existing := map[string]*net.IPNet{
"h1": mustParseCIDR("10.210.0.0/24"),
"h2": mustParseCIDR("10.210.2.0/24"),
}
hosts := []string{"h1", "h2", "h3"}
got, warns, err := Allocate(pool, 24, existing, hosts)
require.NoError(t, err)
assert.Empty(t, warns)
assert.Equal(t, "10.210.0.0/24", got["h1"].String())
assert.Equal(t, "10.210.2.0/24", got["h2"].String())
// Gap at .1 is filled.
assert.Equal(t, "10.210.1.0/24", got["h3"].String())
}
func TestAllocate_DuplicateSubnet_Warns(t *testing.T) {
pool := mustParseCIDR("10.210.0.0/16")
// Both ha and hb claim 10.210.5.0/24; ha wins (alphabetical).
existing := map[string]*net.IPNet{
"ha": mustParseCIDR("10.210.5.0/24"),
"hb": mustParseCIDR("10.210.5.0/24"),
}
hosts := []string{"ha", "hb"}
got, warns, err := Allocate(pool, 24, existing, hosts)
require.NoError(t, err)
require.Len(t, warns, 1)
assert.Equal(t, "hb", warns[0].Host)
assert.Contains(t, warns[0].Reason, "duplicate subnet")
// ha keeps 10.210.5.0/24; hb is reassigned.
assert.Equal(t, "10.210.5.0/24", got["ha"].String())
assert.NotEqual(t, "10.210.5.0/24", got["hb"].String())
}
func TestAllocate_OutOfPool_Warns(t *testing.T) {
pool := mustParseCIDR("10.210.0.0/16")
existing := map[string]*net.IPNet{
"h1": mustParseCIDR("192.168.0.0/24"), // outside pool
}
hosts := []string{"h1"}
got, warns, err := Allocate(pool, 24, existing, hosts)
require.NoError(t, err)
require.Len(t, warns, 1)
assert.Equal(t, "h1", warns[0].Host)
assert.Contains(t, warns[0].Reason, "not a /24 inside pool")
// h1 is reassigned to a pool address.
assert.True(t, pool.Contains(got["h1"].IP), "reassigned IP must be inside pool")
}
func TestAllocate_WrongPrefix_Warns(t *testing.T) {
pool := mustParseCIDR("10.210.0.0/16")
existing := map[string]*net.IPNet{
"h1": mustParseCIDR("10.210.0.0/16"), // wrong prefix (/16 instead of /24)
}
hosts := []string{"h1"}
got, warns, err := Allocate(pool, 24, existing, hosts)
require.NoError(t, err)
require.Len(t, warns, 1)
assert.Contains(t, warns[0].Reason, "not a /24 inside pool")
ones, _ := got["h1"].Mask.Size()
assert.Equal(t, 24, ones, "reassigned subnet must have /24 prefix")
}
func TestAllocate_DuplicateHost_Errors(t *testing.T) {
pool := mustParseCIDR("10.210.0.0/16")
hosts := []string{"1.1.1.1", "1.1.1.1"}
_, _, err := Allocate(pool, 24, nil, hosts)
require.Error(t, err)
assert.Contains(t, err.Error(), "duplicate host")
}
func TestAllocate_PoolExhaustion(t *testing.T) {
// /28 pool with /28 subnets — only one slot.
pool := mustParseCIDR("10.0.0.0/28")
hosts := []string{"h1", "h2"}
_, _, err := Allocate(pool, 28, nil, hosts)
require.Error(t, err)
assert.Contains(t, err.Error(), "exhausted")
}
func TestAllocate_EmptyHosts(t *testing.T) {
pool := mustParseCIDR("10.210.0.0/16")
got, warns, err := Allocate(pool, 24, nil, nil)
require.NoError(t, err)
assert.Empty(t, warns)
assert.Empty(t, got)
}
-310
@@ -51,7 +51,6 @@ All commands support `--format` flag:
Aliases are derived from the CLI command tree:
- `coolify app env` | `coolify app envs` | `coolify app environment`
- `coolify app previews` | `coolify app preview`
- `coolify app start` | `coolify app deploy`
- `coolify app storage` | `coolify app storages`
- `coolify app` | `coolify apps` | `coolify application` | `coolify applications`
@@ -852,15 +851,6 @@ Parameters:
required: false
default: 100
Command: coolify app previews delete <app_uuid> <pr_id>
Description: Delete a preview deployment for an application. First argument is the application UUID, second is the pull request ID.
Parameters:
- name: --force
type: boolean
description: Skip confirmation prompt
required: false
default: false
Command: coolify app restart <uuid>
Description: Restart a running application.
Parameters: (None)
@@ -1724,129 +1714,6 @@ Parameters:
required: false
default: 0
Command: coolify firewall
Description: [ALPHA] Manage cross-host container allow rules (Coolify v5)
Parameters:
- name: --all-namespaces
type: boolean
description: Operate across every mesh namespace on each host (list/containers fan out; allow/revoke still require a specific --namespace)
required: false
default: false
- name: --concurrency
type: integer
description: Maximum number of parallel SSH connections
required: false
default: 10
- name: --coold-port
type: integer
description: TCP port coold's REST API listens on (bound to the WG mgmt IP)
required: false
default: 8443
- name: --coold-token
type: string
description: Bearer token override for coold REST API (also reads COOLIFY_COOLD_TOKEN env). When unset, CLI reads /etc/coolify/api-token over SSH per host.
required: false
- name: --namespace
type: string
description: Namespace the command operates against (must match a namespace created by `coolify init`)
required: false
default: default
- name: --servers
type: stringSlice
description: Comma-separated server IPs (required)
required: true
- name: --ssh-key
type: string
description: Path to SSH private key used to connect to servers (required)
required: true
- name: --ssh-passphrase-prompt
type: boolean
description: Prompt for SSH key passphrase (also reads COOLIFY_SSH_PASSPHRASE env var)
required: false
default: false
- name: --ssh-port
type: integer
description: SSH port
required: false
default: 22
- name: --ssh-timeout
type: string
description: SSH connection timeout (e.g. 30s, 1m)
required: false
default: 30s
- name: --ssh-user
type: string
description: SSH username
required: false
default: root
- name: --wg-interface
type: string
description: WireGuard interface name on remote hosts (must match --wg-interface at init)
required: false
default: wg0
Command: coolify firewall allow
Description: Add an allow rule (from container → to container:port)
Parameters:
- name: --bidirectional
type: boolean
description: Also install the reverse rule on the source host (default: one-way; conntrack handles replies)
required: false
default: false
- name: --from
type: string
description: Source container (name, short-id, raw IP, or host:name) — required
required: false
- name: --port
type: integer
description: Destination port (required unless --proto is empty)
required: false
default: 0
- name: --proto
type: string
description: Protocol (tcp, udp, or empty for any)
required: false
default: tcp
- name: --to
type: string
description: Destination container (name, short-id, raw IP, or host:name) — required
required: false
Command: coolify firewall containers
Description: List containers on the Coolify mesh bridge across all servers
Parameters: (None)
Command: coolify firewall list
Description: List installed allow rules across all servers
Parameters: (None)
Command: coolify firewall revoke
Description: Remove an allow rule
Parameters:
- name: --bidirectional
type: boolean
description: Also remove the reverse rule on the source host (default: one-way)
required: false
default: false
- name: --from
type: string
description: Source container (name, short-id, raw IP, or host:name) — required
required: false
- name: --port
type: integer
description: Destination port (required unless --proto is empty)
required: false
default: 0
- name: --proto
type: string
description: Protocol (tcp, udp, or empty for any)
required: false
default: tcp
- name: --to
type: string
description: Destination container (name, short-id, raw IP, or host:name) — required
required: false
Command: coolify github branches <app_uuid> <owner/repo>
Description: List branches for a repository
Parameters: (None)
@@ -1992,183 +1859,6 @@ Parameters:
description: GitHub Webhook Secret
required: false
Command: coolify init
Description: [ALPHA] Initialize WireGuard mesh for Coolify v5
Parameters:
- name: --builder-capacity
type: integer
description: Concurrent builds accepted per host (COOLD_BUILDER_CAPACITY).
required: false
default: 2
- name: --builder-cpu-quota
type: string
description: cgroup CPU quota for each build subprocess (COOLD_BUILDER_CPU_QUOTA).
systemd CPUQuota format; "200%" = two full cores.
required: false
default: 200%
- name: --builder-hosts
type: stringSlice
description: Explicit subset of --servers to enroll with the builder capability.
Takes precedence over --enable-builder. Empty (default) means fall back to
--enable-builder for the whole cluster.
required: false
- name: --builder-memory-max
type: string
description: cgroup memory cap for each build subprocess (COOLD_BUILDER_MEMORY_MAX).
systemd MemoryMax format; e.g. "2G", "512M".
required: false
default: 2G
- name: --builder-timeout-secs
type: integer
description: Hard wall-clock timeout per build in seconds (COOLD_BUILDER_TIMEOUT_SECS).
required: false
default: 1800
- name: --central
type: string
description: SSH address of the central VM that will run the scheduler (and later Laravel).
Must be one of the --servers entries. When set, phases 4+5 install the scheduler on that host
and push a per-host JWT to every other server. Leave empty to skip scheduler setup.
required: false
- name: --concurrency
type: integer
description: Maximum number of parallel SSH connections
required: false
default: 10
- name: --container-pool
type: string
description: Shared container address pool — each (namespace, host) pair gets a /<container-prefix> from here, owned by that namespace's Podman bridge
required: false
default: 10.210.0.0/16
- name: --container-prefix
type: integer
description: Prefix length of each per-host, per-namespace container subnet
required: false
default: 24
- name: --coold-version
type: string
description: Release tag to download for coold (e.g. "nightly", "v1.2.3"). nightly always re-installs on every apply.
required: false
default: nightly
- name: --corrosion-api-port
type: integer
description: Corrosion HTTP API port (bound to 127.0.0.1)
required: false
default: 8080
- name: --corrosion-gossip-port
type: integer
description: Corrosion SWIM gossip port (bound to the wg0 mgmt IP)
required: false
default: 8787
- name: --corrosion-version
type: string
description: Release tag to download for corrosion (e.g. "nightly", "v1.2.3"). nightly always re-installs on every apply.
required: false
default: nightly
- name: --enable-builder
type: boolean
description: Cluster-wide shorthand: enable the builder capability on every host
(requires --central). Ignored when --builder-hosts is set.
required: false
default: true
- name: --namespaces
type: stringSlice
description: Comma-separated list of namespaces to create on each host. Each namespace is a separate Podman bridge network (coolify-<ns>-mesh) with its own /<container-prefix> per host
required: false
default: [default]
- name: --scheduler-version
type: string
description: Release tag to download for scheduler (e.g. "nightly", "v1.2.3").
required: false
default: nightly
- name: --servers
type: stringSlice
description: Comma-separated server IPs (required)
required: true
- name: --skip-default-deny
type: boolean
description: Skip installing the default-deny firewall scaffold. By default, both cross-host and intra-host (same bridge) container traffic is blocked; coold manages the allow list at runtime
required: false
default: false
- name: --ssh-key
type: string
description: Path to SSH private key used to connect to servers (required)
required: true
- name: --ssh-passphrase-prompt
type: boolean
description: Prompt for SSH key passphrase (also reads COOLIFY_SSH_PASSPHRASE env var)
required: false
default: false
- name: --ssh-port
type: integer
description: SSH port
required: false
default: 22
- name: --ssh-timeout
type: string
description: SSH connection timeout (e.g. 30s, 1m)
required: false
default: 30s
- name: --ssh-user
type: string
description: SSH username
required: false
default: root
- name: --wg-interface
type: string
description: WireGuard interface name on the remote hosts
required: false
default: wg0
- name: --wg-listen-port
type: integer
description: WireGuard UDP listen port
required: false
default: 51820
- name: --wg-mgmt-pool
type: string
description: WireGuard management address pool — each host gets a /32 from here, assigned to wg0
required: false
default: 100.64.0.0/16
- name: --yes (-y)
type: boolean
description: Skip the interactive alpha confirmation prompt
required: false
default: false
Command: coolify init bootstrap
Description: First-time mesh install (all actions allowed)
Parameters: (None)
Command: coolify init extend
Description: Add new hosts to an existing mesh (existing hosts stay untouched)
Parameters:
- name: --allow-replace
type: boolean
description: Unlock destructive-replace actions on existing hosts (e.g. recreating a drifted podman bridge). Off by default — drifted existing hosts are surfaced as skipped actions instead.
required: false
default: false
- name: --new-hosts
type: stringSlice
description: Comma-separated subset of --servers that is brand-new this run (required). Only these hosts receive the full first-time install; all other hosts get peer-refresh only.
required: true
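
The split that `--new-hosts` describes can be sketched in Go as a simple classification of `--servers`. The type `hostAction`, the function `planExtend`, and the rejection of a new host that is missing from `--servers` are assumptions for illustration; the description only states that `--new-hosts` is a subset of `--servers`.

```go
package main

import "fmt"

// hostAction is a hypothetical label for what `extend` does to a host.
type hostAction string

const (
	fullInstall hostAction = "full-install" // brand-new host
	peerRefresh hostAction = "peer-refresh" // existing host, peers only
)

// planExtend classifies every server: hosts named in newHosts get the full
// first-time install, all others get a peer refresh.
func planExtend(servers, newHosts []string) (map[string]hostAction, error) {
	if len(newHosts) == 0 {
		return nil, fmt.Errorf("--new-hosts is required")
	}
	plan := make(map[string]hostAction, len(servers))
	for _, s := range servers {
		plan[s] = peerRefresh
	}
	for _, h := range newHosts {
		if _, ok := plan[h]; !ok {
			// Assumed behavior: a new host must also appear in --servers.
			return nil, fmt.Errorf("new host %s is not listed in --servers", h)
		}
		plan[h] = fullInstall
	}
	return plan, nil
}

func main() {
	servers := []string{"10.0.0.1", "10.0.0.2", "10.0.0.3"}
	plan, err := planExtend(servers, []string{"10.0.0.3"})
	if err != nil {
		panic(err)
	}
	for _, s := range servers {
		fmt.Printf("%s: %s\n", s, plan[s])
	}
}
```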
Command: coolify init plan
Description: Show WireGuard mesh changes without applying them
Parameters:
- name: --intent
type: string
description: Preview filter: "bootstrap" (all actions), "extend" (treat --new-hosts as fresh, existing hosts peer-refresh only), "upgrade" (version bumps only).
required: false
default: bootstrap
Command: coolify init upgrade
Description: Bump agent binary versions (coold / corrosion / scheduler / builder) on every host
Parameters:
- name: --allow-nightly
type: boolean
description: Permit --coold-version/--corrosion-version/--scheduler-version=nightly. Off by default because nightly re-installs on every run instead of only when the pinned version changes.
required: false
default: false
Command: coolify private-key add <key_name> <private_key_or_file>
Description: Add a private key
Parameters: (None)
-283
@@ -1,283 +0,0 @@
#!/usr/bin/env bash
# End-to-end sanity test for the coolify mesh + firewall stack.
#
# 1. `coolify init apply` on two servers with two namespaces (default, alpha).
# 2. Start one nginx ("web-*") on SERVER_A and one alpine client ("client-*")
# on SERVER_B inside each namespace — static --ip, --dns <bridge-gw>,
# --restart=always so they survive reboot.
# Also start client2-default on SERVER_A (same bridge as web-default) to
# test intra-host nft bridge-family deny.
# 3. Verify cross-host traffic is DROPped by default (wget times out).
# 4. Verify intra-host same-bridge traffic is DROPped by default (nft plane).
# 5. Verify nft bridge table coolify_bridge present on both hosts.
# 6. `coolify firewall allow` per namespace (cross-host + intra-host).
# 7. Verify wget succeeds in both planes.
# 8. Re-run init apply to verify nft scaffold idempotency.
#
# Usage:
# SERVERS=1.2.3.4,5.6.7.8 scripts/e2e-mesh.sh
#
# Required env:
# SERVERS — exactly two SSH-reachable IPs, comma-separated.
# First = "host A" (web-* containers).
# Second = "host B" (client-* containers).
# Optional env:
# SSH_KEY — default ~/.ssh/id_ed25519-no-pass (no passphrase)
# SSH_USER — default root
# COOLIFY_SSH_PASSPHRASE — only if SSH_KEY is passphrase-protected;
# requires `sshpass` on PATH
#
# The script assumes `--container-pool` defaults (10.210.0.0/16, /24). With two
# hosts + two namespaces the allocator hands out 10.210.{0,1,2,3}.0/24; gateway
# is always .1, container IPs below are pinned to .10.
set -euo pipefail
SSH_KEY="${SSH_KEY:-$HOME/.ssh/id_ed25519-no-pass}"
SSH_USER="${SSH_USER:-root}"
SERVERS="${SERVERS:?set SERVERS=<host-a>,<host-b>}"
IFS=',' read -r SERVER_A SERVER_B EXTRA <<<"$SERVERS"
SERVER_A="${SERVER_A// /}"
SERVER_B="${SERVER_B// /}"
if [[ -z "$SERVER_A" || -z "$SERVER_B" || -n "${EXTRA:-}" ]]; then
echo "SERVERS must contain exactly two comma-separated IPs (got: $SERVERS)" >&2
exit 1
fi
: "${COOLIFY_SSH_PASSPHRASE:=}"
export COOLIFY_SSH_PASSPHRASE
REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
cd "$REPO_ROOT"
# Namespace → gateway IP on each host (matches allocator output).
GW_A_DEFAULT=10.210.0.1
GW_B_DEFAULT=10.210.1.1
GW_A_ALPHA=10.210.2.1
GW_B_ALPHA=10.210.3.1
# Container IPs (all pinned to .10 in each /24).
IP_WEB_DEFAULT=10.210.0.10 # host A, namespace default
IP_CLIENT_DEFAULT=10.210.1.10 # host B, namespace default
IP_WEB_ALPHA=10.210.2.10 # host A, namespace alpha
IP_CLIENT_ALPHA=10.210.3.10 # host B, namespace alpha
# Intra-host client on same bridge as web-default (host A, namespace default).
IP_CLIENT2_DEFAULT=10.210.0.11 # host A, namespace default
NGINX_IMAGE=docker.io/library/nginx:alpine
ALPINE_IMAGE=docker.io/library/alpine
SSH_OPTS=(-i "$SSH_KEY" -o StrictHostKeyChecking=accept-new -o ConnectTimeout=10 -o BatchMode=yes)
say() { printf '\n\033[1;36m==> %s\033[0m\n' "$*"; }
warn() { printf '\033[1;33m%s\033[0m\n' "$*" >&2; }
fail() { printf '\033[1;31m%s\033[0m\n' "$*" >&2; exit 1; }
# Use sshpass if passphrase was supplied; otherwise lean on ssh-agent / keyless.
ssh_exec() {
local host="$1"; shift
if [[ -n "$COOLIFY_SSH_PASSPHRASE" ]]; then
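# BatchMode=yes (from SSH_OPTS) suppresses the passphrase prompt sshpass
# waits for; ssh honors the first value given, so override it up front.
SSHPASS="$COOLIFY_SSH_PASSPHRASE" sshpass -P "passphrase" -e \
ssh -o BatchMode=no "${SSH_OPTS[@]}" "$SSH_USER@$host" "$@"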
else
ssh "${SSH_OPTS[@]}" "$SSH_USER@$host" "$@"
fi
}
# The CLI reads COOLIFY_SSH_PASSPHRASE from the environment (exported above),
# so the same invocation works with and without a passphrase.
cli() {
go run ./coolify "$@" --ssh-key "$SSH_KEY" --ssh-user "$SSH_USER"
}
# assert_blocked <host> <container> <target-ip-or-hostname>
assert_blocked() {
local host="$1" client="$2" target="$3"
if ssh_exec "$host" "podman exec $client wget -T 4 -qO- http://$target" >/dev/null 2>&1; then
fail "expected timeout for $client@$host → $target but request succeeded"
fi
printf ' blocked: %s@%s → %s ✓\n' "$client" "$host" "$target"
}
# assert_flows <host> <container> <target-ip-or-hostname>
assert_flows() {
local host="$1" client="$2" target="$3"
if ! ssh_exec "$host" "podman exec $client wget -T 5 -qO- http://$target" | grep -q 'nginx'; then
fail "$client@$host → $target failed to reach nginx"
fi
printf ' OK: %s@%s → %s ✓\n' "$client" "$host" "$target"
}
# ─── 1. init apply ────────────────────────────────────────────────────────────
say "1/8 coolify init apply on $SERVERS (namespaces: default, alpha)"
cli init apply \
--servers "$SERVERS" \
--namespaces default,alpha \
--yes
# ─── 2. containers ────────────────────────────────────────────────────────────
say "2/8 creating containers with --ip / --dns / --restart=always"
run_container() {
local host="$1" name="$2" network="$3" ip="$4" gw="$5" image="$6"; shift 6
ssh_exec "$host" "podman rm -f $name >/dev/null 2>&1 || true"
ssh_exec "$host" "podman run -d --name $name \
--network $network --ip $ip --dns $gw --restart=always \
$image $*"
}
# host A: nginx servers
run_container "$SERVER_A" web-default coolify-default-mesh "$IP_WEB_DEFAULT" "$GW_A_DEFAULT" "$NGINX_IMAGE"
run_container "$SERVER_A" web-alpha coolify-alpha-mesh "$IP_WEB_ALPHA" "$GW_A_ALPHA" "$NGINX_IMAGE"
# host B: alpine clients (sleep forever so we can exec into them)
run_container "$SERVER_B" client-default coolify-default-mesh "$IP_CLIENT_DEFAULT" "$GW_B_DEFAULT" "$ALPINE_IMAGE" sleep infinity
run_container "$SERVER_B" client-alpha coolify-alpha-mesh "$IP_CLIENT_ALPHA" "$GW_B_ALPHA" "$ALPINE_IMAGE" sleep infinity
# host A: 2nd client on same bridge as web-default — tests intra-host nft plane
run_container "$SERVER_A" client2-default coolify-default-mesh "$IP_CLIENT2_DEFAULT" "$GW_A_DEFAULT" "$ALPINE_IMAGE" sleep infinity
# ─── 3. cross-host default-deny ───────────────────────────────────────────────
say "3/10 confirming default-deny blocks cross-host traffic (expect timeouts)"
assert_blocked "$SERVER_B" client-default web-default.default.coolify.internal
assert_blocked "$SERVER_B" client-alpha web-alpha.alpha.coolify.internal
# ─── 4. intra-host same-bridge default-deny (nft bridge plane) ────────────────
say "4/10 confirming intra-host same-bridge traffic blocked (nft bridge plane)"
# Raw IP intentional — DNS via bridge gateway also crosses the nft bridge hook;
# using raw IP isolates the firewall check from DNS-path correctness.
assert_blocked "$SERVER_A" client2-default "$IP_WEB_DEFAULT"
# ─── 5. nft table present on both hosts ───────────────────────────────────────
say "5/10 verifying nft bridge table coolify_bridge present on both hosts"
for host in "$SERVER_A" "$SERVER_B"; do
ssh_exec "$host" "nft list table bridge coolify_bridge" >/dev/null \
|| fail "nft table coolify_bridge missing on $host"
printf ' present: %s ✓\n' "$host"
done
# ─── 6. allow rules ───────────────────────────────────────────────────────────
say "6/10 adding allow rules (cross-host + intra-host)"
cli firewall allow \
--servers "$SERVERS" \
--namespace default \
--from client-default --to web-default --port 80
cli firewall allow \
--servers "$SERVERS" \
--namespace alpha \
--from client-alpha --to web-alpha --port 80
# Intra-host allow: client2-default → web-default on host A.
# Rule lands on host A (destination-host ownership); passing both servers is
# idempotent on the non-owner side.
cli firewall allow \
--servers "$SERVERS" \
--namespace default \
--from client2-default --to web-default --port 80
# ─── 7. verify flow ───────────────────────────────────────────────────────────
say "7/10 verifying HTTP flows in both planes"
# Cross-host (iptables FORWARD plane)
assert_flows "$SERVER_B" client-default web-default.default.coolify.internal
assert_flows "$SERVER_B" client-alpha web-alpha.alpha.coolify.internal
# Intra-host (nft bridge plane) — raw IP, same rationale as step 4
assert_flows "$SERVER_A" client2-default "$IP_WEB_DEFAULT"
# ─── 8. re-apply idempotency ──────────────────────────────────────────────────
say "8/10 re-running init apply — verifies nft scaffold idempotency (chain already exists regression)"
cli init apply \
--servers "$SERVERS" \
--namespaces default,alpha \
--yes
# ─── 9. builder smoke test (static build) ─────────────────────────────────────
# Requires --central to have been passed to init apply. This script does not
# pass --central, so builder capability may be disabled; steps 9 and 10 run
# only when /etc/coolify/jwt.priv exists on host A and are skipped otherwise.
if ssh_exec "$SERVER_A" "test -f /etc/coolify/jwt.priv" >/dev/null 2>&1; then
say "9/10 builder smoke test — POST /v1/build/dispatch, expect localhost image on central"
# Scheduler UDS; central runs scheduler as root so the default 0600 socket is
# reachable for ssh-exec'd curl without group setup.
SCHEDULER_SOCK="/run/coolify/scheduler.sock"
UDS_CURL="curl -sS --unix-socket $SCHEDULER_SOCK"
REQ_ID="e2e-$(date +%s)"
BUILD_PAYLOAD="{\"request_id\":\"$REQ_ID\",\"command\":{\"type\":\"static_build\",\"repo_url\":\"https://github.com/coollabsio/static-test-site\",\"git_ref\":\"main\",\"target_image\":\"localhost/e2e-$REQ_ID\"}}"
ACK=$(ssh_exec "$SERVER_A" "$UDS_CURL -w '\\n%{http_code}' -X POST -H 'Content-Type: application/json' --data '$BUILD_PAYLOAD' http://localhost/v1/build/dispatch")
echo "$ACK" | tail -n1 | grep -q '^202$' || fail "dispatch did not return 202: $ACK"
DEADLINE=$(($(date +%s)+180))
RESP=""
while :; do
OUT=$(ssh_exec "$SERVER_A" "$UDS_CURL -w '\\n%{http_code}' 'http://localhost/v1/build/result/$REQ_ID?timeout_ms=25000'")
CODE=$(echo "$OUT" | tail -n1)
RESP=$(echo "$OUT" | sed '$d')
[[ "$CODE" == "200" ]] && break
[[ "$CODE" != "408" && "$CODE" != "404" ]] && fail "build result unexpected $CODE: $RESP"
[[ $(date +%s) -ge $DEADLINE ]] && fail "builder smoke timed out after 180s"
done
echo "$RESP" | grep -q '"status":"ok"' || fail "builder smoke returned error: $RESP"
IMG_HOST=""
for host in "$SERVER_A" "$SERVER_B"; do
if ssh_exec "$host" "buildah images 2>/dev/null | grep -q localhost/e2e-$REQ_ID"; then
IMG_HOST="$host"; break
fi
done
[[ -n "$IMG_HOST" ]] || fail "image localhost/e2e-$REQ_ID not found on any host"
printf ' OK: build succeeded; image on %s ✓\n' "$IMG_HOST"
# ─── 10. cancel test ────────────────────────────────────────────────────────
say "10/10 cancel test — dispatch then POST /v1/build/:id/cancel; expect scope killed and cancel response"
CAN_ID="e2e-cancel-$(date +%s)"
CAN_BUILD="{\"request_id\":\"$CAN_ID\",\"command\":{\"type\":\"static_build\",\"repo_url\":\"https://github.com/torvalds/linux\",\"git_ref\":\"master\",\"target_image\":\"localhost/$CAN_ID\"}}"
ACK=$(ssh_exec "$SERVER_A" "$UDS_CURL -w '\\n%{http_code}' -X POST -H 'Content-Type: application/json' --data '$CAN_BUILD' http://localhost/v1/build/dispatch")
echo "$ACK" | tail -n1 | grep -q '^202$' || fail "cancel-test dispatch did not return 202: $ACK"
SCOPE_HOST=""
for _ in 1 2 3 4 5 6 7 8 9 10; do
sleep 2
for host in "$SERVER_A" "$SERVER_B"; do
if ssh_exec "$host" "systemctl list-units --no-legend --plain 'coolify-build-*.service' 2>/dev/null | grep -q $CAN_ID"; then
SCOPE_HOST="$host"; break 2
fi
done
done
[[ -n "$SCOPE_HOST" ]] || fail "scope coolify-build-$CAN_ID.service never appeared"
printf ' scope running on %s ✓\n' "$SCOPE_HOST"
ssh_exec "$SERVER_A" "$UDS_CURL -X POST http://localhost/v1/build/$CAN_ID/cancel" >/dev/null
DEADLINE=$(($(date +%s)+30))
RESP=""
while :; do
OUT=$(ssh_exec "$SERVER_A" "$UDS_CURL -w '\\n%{http_code}' 'http://localhost/v1/build/result/$CAN_ID?timeout_ms=10000'")
CODE=$(echo "$OUT" | tail -n1)
RESP=$(echo "$OUT" | sed '$d')
[[ "$CODE" == "200" ]] && break
[[ "$CODE" != "408" && "$CODE" != "404" ]] && fail "cancel result unexpected $CODE: $RESP"
[[ $(date +%s) -ge $DEADLINE ]] && fail "cancel response timed out"
done
echo "$RESP" | grep -q '"stage":"cancel"' || fail "expected stage=cancel in response, got: $RESP"
if ssh_exec "$SCOPE_HOST" "systemctl is-active coolify-build-$CAN_ID.service >/dev/null 2>&1"; then
fail "scope still active after cancel: coolify-build-$CAN_ID.service"
fi
printf ' OK: cancel SIGTERM killed cgroup; stage=cancel ✓\n'
else
warn "skipping steps 9 and 10 (builder smoke + cancel): --central was not passed to init apply, so builder capability is not enabled"
fi
say "all checks passed"
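Steps 9 and 10 above repeat the same long-poll loop: fetch, split the body from the trailing HTTP code, retry on 408/404, and give up at a deadline. A minimal sketch of how that loop could be factored into one helper follows. The name `poll_until_200` and its return codes are hypothetical and not part of the script; the fetch command is assumed to print the body followed by the HTTP code on the last line, as `curl -w '\n%{http_code}'` does, and to block server-side while waiting, so the loop itself does not sleep.

```shell
# poll_until_200 <timeout-seconds> <fetch-command...>   (hypothetical helper)
# Prints the response body and returns 0 once the fetch reports HTTP 200.
# Returns 1 when the deadline passes, 2 on any code other than 200/408/404.
poll_until_200() {
  local timeout="$1"; shift
  local deadline=$(( $(date +%s) + timeout ))
  local out code body
  while :; do
    out=$("$@") || true
    code=$(printf '%s\n' "$out" | tail -n1)   # last line: HTTP status code
    body=$(printf '%s\n' "$out" | sed '$d')   # everything before it: body
    if [ "$code" = "200" ]; then
      printf '%s\n' "$body"
      return 0
    fi
    # 408 and 404 are treated as "not ready yet", as in the loops above.
    if [ "$code" != "408" ] && [ "$code" != "404" ]; then
      printf 'unexpected %s: %s\n' "$code" "$body" >&2
      return 2
    fi
    if [ "$(date +%s)" -ge "$deadline" ]; then
      printf 'timed out after %ss\n' "$timeout" >&2
      return 1
    fi
  done
}
```

With a wrapper function around the `ssh_exec ... $UDS_CURL` call, step 9 could then read `RESP=$(poll_until_200 180 fetch_build_result) || fail "builder smoke failed"`, where `fetch_build_result` is likewise a name made up for illustration.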
+2 -2
@@ -6,7 +6,7 @@
set -e # Exit on error
# Configuration
REPO="IranAccess/coolify-cli/"
REPO="coollabsio/coolify-cli"
BINARY_NAME="coolify"
GLOBAL_INSTALL_DIR="/usr/local/bin"
USER_INSTALL_DIR="$HOME/.local/bin"
@@ -125,7 +125,7 @@ detect_platform() {
get_latest_version() {
echo "Fetching latest release version..." >&2
local latest_version
latest_version=$(curl -sSf "https://api.gitamin.ir/repos/${REPO}/releases/latest" | grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/')
latest_version=$(curl -sSf "https://api.github.com/repos/${REPO}/releases/latest" | grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/')
if [ -z "$latest_version" ]; then
error_exit "Failed to fetch latest release version from GitHub"
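The `grep`/`sed` pipeline in `get_latest_version` can be tried offline against a canned payload. The sketch below wraps the same two commands in a function named `extract_tag` (a hypothetical name) and feeds it a made-up sample; it relies on `tag_name` sitting on its own line, which is an assumption about how the API formats its response.

```shell
# extract_tag reads a releases/latest JSON document on stdin and prints the
# value of tag_name, using the same grep + sed pipeline as the install script.
extract_tag() {
  grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/'
}

# Sample payload, invented for illustration only.
sample='{
  "id": 1,
  "tag_name": "v1.2.3",
  "name": "Release v1.2.3"
}'

printf '%s\n' "$sample" | extract_tag
```

Because the `sed` expression keeps the last quoted string on the matching line, a minified single-line response would yield the wrong value; a JSON-aware tool would be more robust if one is available on the target hosts.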
-51
@@ -1,51 +0,0 @@
[Unit]
Description=Coolify mesh firewall rules
After=wg-quick@wg0.service network-online.target
Wants=network-online.target
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/sh -c "/usr/sbin/iptables -t nat -C POSTROUTING -s 10.210.0.0/24 -o wg0 -j RETURN 2>/dev/null || /usr/sbin/iptables -t nat -I POSTROUTING -s 10.210.0.0/24 -o wg0 -j RETURN"
ExecStart=/bin/sh -c "/usr/sbin/iptables -t nat -C POSTROUTING -s 10.220.0.0/24 -o wg0 -j RETURN 2>/dev/null || /usr/sbin/iptables -t nat -I POSTROUTING -s 10.220.0.0/24 -o wg0 -j RETURN"
# Remove blanket ACCEPT from prior mode-A run.
ExecStart=/bin/sh -c "/usr/sbin/iptables -D FORWARD -s 10.210.0.0/24 -j ACCEPT 2>/dev/null || true"
ExecStart=/bin/sh -c "/usr/sbin/iptables -D FORWARD -d 10.210.0.0/24 -j ACCEPT 2>/dev/null || true"
ExecStart=/bin/sh -c "/usr/sbin/iptables -D FORWARD -s 10.220.0.0/24 -j ACCEPT 2>/dev/null || true"
ExecStart=/bin/sh -c "/usr/sbin/iptables -D FORWARD -d 10.220.0.0/24 -j ACCEPT 2>/dev/null || true"
# Create chains (idempotent).
ExecStart=/bin/sh -c "/usr/sbin/iptables -N COOLIFY-ALLOW 2>/dev/null || true"
ExecStart=/bin/sh -c "/usr/sbin/iptables -N COOLIFY-INTRA 2>/dev/null || true"
# Flush COOLIFY-INTRA so order is deterministic on every restart.
ExecStart=/usr/sbin/iptables -F COOLIFY-INTRA
ExecStart=/usr/sbin/iptables -A COOLIFY-INTRA -j COOLIFY-ALLOW
ExecStart=/usr/sbin/iptables -A COOLIFY-INTRA -j DROP
# Repopulate COOLIFY-ALLOW from coold's canonical snapshot. File is rewritten
# by coold on every rule mutate, so it is the source of truth across reboots
# and service restarts. Flush first because 'iptables-restore --noflush'
# leaves existing chain contents in place and would otherwise duplicate every
# rule on re-run.
ExecStart=/bin/sh -c "[ -s /etc/coolify/allow.rules ] && /usr/sbin/iptables -F COOLIFY-ALLOW && /usr/sbin/iptables-restore --noflush < /etc/coolify/allow.rules || true"
# Conntrack early-accept at top of FORWARD (idempotent).
ExecStart=/bin/sh -c "/usr/sbin/iptables -C FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT 2>/dev/null || /usr/sbin/iptables -I FORWARD 1 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT"
# Top-level FORWARD jumps for every namespace's subnet (both directions).
ExecStart=/bin/sh -c "/usr/sbin/iptables -C FORWARD -d 10.210.0.0/24 -j COOLIFY-INTRA 2>/dev/null || /usr/sbin/iptables -A FORWARD -d 10.210.0.0/24 -j COOLIFY-INTRA"
ExecStart=/bin/sh -c "/usr/sbin/iptables -C FORWARD -s 10.210.0.0/24 -j COOLIFY-INTRA 2>/dev/null || /usr/sbin/iptables -A FORWARD -s 10.210.0.0/24 -j COOLIFY-INTRA"
ExecStart=/bin/sh -c "/usr/sbin/iptables -C FORWARD -d 10.220.0.0/24 -j COOLIFY-INTRA 2>/dev/null || /usr/sbin/iptables -A FORWARD -d 10.220.0.0/24 -j COOLIFY-INTRA"
ExecStart=/bin/sh -c "/usr/sbin/iptables -C FORWARD -s 10.220.0.0/24 -j COOLIFY-INTRA 2>/dev/null || /usr/sbin/iptables -A FORWARD -s 10.220.0.0/24 -j COOLIFY-INTRA"
# Bridge-family nft scaffold — intra-namespace default-deny.
ExecStart=/bin/sh -c "nft list table bridge coolify_bridge >/dev/null 2>&1 || nft add table bridge coolify_bridge"
ExecStart=/bin/sh -c "nft add chain bridge coolify_bridge coolify_allow '{ }' 2>/dev/null || true"
ExecStart=/bin/sh -c "nft delete chain bridge coolify_bridge forward 2>/dev/null || true"
ExecStart=/bin/sh -c "nft delete chain bridge coolify_bridge coolify_intra 2>/dev/null || true"
ExecStart=/bin/sh -c "nft -f /etc/coolify/bridge-fw.nft"
ExecStart=/bin/sh -c "[ -s /etc/coolify/allow.nft ] && nft -f /etc/coolify/allow.nft || true"
[Install]
WantedBy=multi-user.target
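Most `ExecStart` lines in the unit above follow one idiom: check for the rule with `iptables -C`, and add it only when the check fails, so restarting the service never duplicates a rule. A minimal sketch of that idiom as a reusable function follows. The name `ensure_rule`, its argument order, and the overridable `IPTABLES` variable are assumptions made for illustration, not part of the unit.

```shell
# IPTABLES can be overridden so the sketch can be exercised without root
# (an assumption for illustration; the unit hard-codes /usr/sbin/iptables).
IPTABLES="${IPTABLES:-/usr/sbin/iptables}"

# ensure_rule <table> <insert|append> <chain> <rule-spec...>   (hypothetical)
# Adds the rule only if `iptables -C` reports that it is not present yet.
ensure_rule() {
  local table="$1" mode="$2"; shift 2
  if "$IPTABLES" -t "$table" -C "$@" 2>/dev/null; then
    return 0   # rule already present; nothing to do
  fi
  case "$mode" in
    insert) "$IPTABLES" -t "$table" -I "$@" ;;
    append) "$IPTABLES" -t "$table" -A "$@" ;;
    *) printf 'unknown mode: %s\n' "$mode" >&2; return 64 ;;
  esac
}
```

For example, `ensure_rule nat insert POSTROUTING -s 10.210.0.0/24 -o wg0 -j RETURN` would correspond to the first `ExecStart` line, and calling it twice leaves a single rule in place.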
-11
@@ -1,11 +0,0 @@
[Interface]
Address = 100.64.0.1/32
ListenPort = 51820
PrivateKey = aBcDeFgHiJkLmNoPqRsTuVwXyZ0123456789abcde=
[Peer]
# 203.0.113.11
PublicKey = BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBK=
AllowedIPs = 100.64.0.2/32, 10.210.1.0/24
Endpoint = 203.0.113.11:51820
PersistentKeepalive = 25
-2
@@ -1,2 +0,0 @@
aBcDeFgHiJkLmNoPqRsTuVwXyZ0123456789abcde= AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAK= 51820 off
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBK= (none) 203.0.113.11:51820 10.8.0.2/32 1700000000 92 180 25
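The two-line fixture above appears to use the layout that `wg show <interface> dump` prints: one interface line (private key, public key, listen port, fwmark) followed by one line per peer (public key, preshared key, endpoint, allowed IPs, latest handshake, rx bytes, tx bytes, persistent keepalive). Assuming that layout, a short sketch of pulling the per-peer handshake time out of such text follows; `peer_handshakes` is a hypothetical name.

```shell
# peer_handshakes reads `wg show <iface> dump`-style text on stdin and prints
# "<public-key> <endpoint> <latest-handshake>" for every peer line.
peer_handshakes() {
  # Line 1 describes the interface (4 fields); peer lines carry 8 fields.
  awk 'NR > 1 && NF == 8 { print $1, $3, $5 }'
}
```

Fed the fixture, this would print the peer's public key, `203.0.113.11:51820`, and `1700000000`, which a test could compare against an expected handshake age.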