Compare commits

51 Commits

Author SHA1 Message Date
smx 35a2015aa3 replace
gitamin
IranAccess
2026-05-10 16:27:55 +00:00
smx 678099af8d replace gitamin.ir 2026-05-10 16:23:43 +00:00
Andras Bacsai 1bac524008 test: replace assert.Error with require.Error across test files
Follows same pattern as da3479c. require.Error halts test immediately
on failure, preventing nil dereference in subsequent assert.Contains calls.
2026-05-02 20:21:31 +02:00
Andras Bacsai da3479c65a test: replace assert.NoError with require.NoError across test files
Aligns remaining test files with the pattern established in 6e80c95.
Using require halts the test immediately on fatal errors instead of
continuing with invalid state.
2026-05-02 18:59:38 +02:00
Andras Bacsai 6e80c95183 test(firewall): use require.NoError to halt on fatal errors 2026-05-02 18:17:28 +02:00
Andras Bacsai a896d5f991 chore(lint): add revive config and fix exhaustive switch defaults
Add revive var-naming rule with skipPackageNameChecks to suppress
package-name lint violations. Add explicit default cases to switch
statements in wireguard/intent.go for exhaustiveness. Upgrade
assert.NoError to require.NoError in firewall tests to halt on error.
2026-05-02 18:14:04 +02:00
Andras Bacsai c6445f9c80 docs(init): update llms-full.txt for intent-scoped subcommands
Reflect bootstrap/extend/upgrade split (replacing apply) and new
--intent flag on plan. Fix trailing-whitespace alignment in intent
tests.
2026-05-02 18:08:31 +02:00
Andras Bacsai d3b6ebffd9 refactor(init): replace apply with intent-scoped bootstrap/extend/upgrade
Split the monolithic `apply` subcommand into three purpose-built commands:
- `bootstrap`: first-time mesh install, keeps interactive alpha gate
- `extend`: adds new hosts to an existing mesh, peer-refresh only on existing hosts
- `upgrade`: bumps agent binaries across fleet, leaves mesh config untouched

Intent filtering lives in `internal/wireguard/intent.go` (ValidateIntent +
filterByIntent). Suppressed actions surface on plan.Skipped so operators see
what would have fired and why.

Also renames broker → scheduler (service + tests) to match its actual role.
2026-04-30 19:57:50 +02:00
Andras Bacsai 483fa075f7 refactor(broker): replace Redis with HTTP-over-UDS transport
Drop Redis as a broker dependency. Broker now exposes an HTTP listener
on a Unix domain socket at /run/coolify/broker.sock instead of reading
from Redis streams.

- Remove RedisInstallCommand and redis.go entirely
- Remove ActionInstallRedis from plan and apply phases
- Drop redisURL param from BrokerServiceUnit; add BrokerUnixSocketPath
  constant; systemd unit gains RuntimeDirectory=coolify (creates socket dir)
- e2e smoke tests switch from redis-cli XADD/LPOP to curl --unix-socket
  against /v1/build/dispatch, /v1/build/result/:id, /v1/build/:id/cancel
2026-04-22 21:10:44 +02:00
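A minimal sketch of what an HTTP-over-UDS client for this transport might look like in Go. The socket path and the `/v1/build/result/:id` route come from the commit message above; the helper names (`newUDSClient`, `get`), the use of GET, the placeholder host `broker`, and the id `example-id` are assumptions for illustration only.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"time"
)

// brokerSocketPath is the path named in the commit message.
const brokerSocketPath = "/run/coolify/broker.sock"

// newUDSClient returns an http.Client that ignores the URL host and always
// dials the given Unix domain socket. (Hypothetical helper.)
func newUDSClient(socketPath string) *http.Client {
	return &http.Client{
		Timeout: 10 * time.Second,
		Transport: &http.Transport{
			DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
				var d net.Dialer
				return d.DialContext(ctx, "unix", socketPath)
			},
		},
	}
}

// get fetches a path over the socket and returns the status code and body.
func get(c *http.Client, path string) (int, string, error) {
	// The host part is a placeholder; only the path reaches the server.
	resp, err := c.Get("http://broker" + path)
	if err != nil {
		return 0, "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return resp.StatusCode, string(body), err
}

func main() {
	c := newUDSClient(brokerSocketPath)
	// "example-id" is a made-up build id; GET is an assumed method.
	status, body, err := get(c, "/v1/build/result/example-id")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	fmt.Println(status, body)
}
```

This is the programmatic counterpart of the `curl --unix-socket` calls the e2e smoke tests switched to.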
Andras Bacsai 92a45c6b0d test(e2e): move live builder tests to coold repo
These tests exercise coold/broker/builder internals over Redis+SSH and
don't touch any coolify-cli code. Moving them to the coold workspace
keeps the test code next to the binaries it validates; coolify-cli's
responsibility stays on provisioning.

Replacement lives under coold/e2e-tests/ as a Rust integration test
crate gated by #[ignore] so default cargo test skips it.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 13:42:30 +02:00
Andras Bacsai dea323aa5e test(e2e): Go-native live builder test suite against real servers
Replaces the ad-hoc smoke additions in scripts/e2e-mesh.sh with a
proper Go test harness under ./test/e2e, gated behind the `e2e` build
tag so normal `go test ./...` skips it.

Covers the routing and survival guarantees exercised by hand so far:

  TestPinToBuilderHost                 - pinned dispatch to a
                                         builder-capable host
  TestPinToCooldOnlyHostReturns503     - cap-missing negative
  TestUnknownHostIdReturns503          - unknown host negative
  TestLoadBalancePicksBuilderHost      - host_id=none picks the
                                         builder-capable host
  TestBuildCancelEmitsStageCancel      - cancel via build:cmd delivers
                                         code=499 stage=cancel
  TestCooldRestartAdoptsInFlightBuild  - systemctl restart coold
                                         mid-build; unit survives;
                                         new coold adopts; cancel
                                         flows through the adopted
                                         stream; workdir cleaned

Tests drive Redis via ssh + redis-cli on the central host and assert
on-host state via `buildah images` and `systemctl is-active`. No
broker/coold code is imported — the harness exercises the black-box
contract.

Run:

  BUILDER_HOST=... COOLD_ONLY_HOST=... \
  BUILDER_MGMT=100.64.0.1 COOLD_ONLY_MGMT=100.64.0.2 \
  CENTRAL_HOST=... SSH_KEY=... \
  go test -tags e2e -v -timeout 15m ./test/e2e/...

Live run: 6 tests pass in ~32s against 78.47.80.33 + 159.69.186.231.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 12:49:40 +02:00
Andras Bacsai c71f5ef491 Merge pull request #72 from coollabsio/coolify-init-wireguard-mesh
feat(init,firewall): add WireGuard mesh bootstrap and coold firewall client (alpha, v5)
2026-04-22 12:29:51 +02:00
Andras Bacsai eb854da7c8 feat(init): per-host builder enrollment via --builder-hosts
Replace the cluster-wide --enable-builder bool with a per-host subset
controlled by --builder-hosts=<ip>,<ip>. Semantics:
  * --builder-hosts empty + --enable-builder=true: every host in
    --servers gets the builder capability (previous behavior)
  * --builder-hosts non-empty: only listed hosts get the capability;
    --enable-builder is ignored
  * --builder-hosts entries not in --servers are dropped

DesiredMesh.BuilderHostSet() + HasBuilderCap(host) compute the final
set and are used by:
  * phase 3 (install-builder): only on builder-capable hosts
  * phase 5 (JWT caps + coold BuilderConfig): per-host caps claim,
    COOLD_BUILDER_* env only when enabled
  * plan.go (ActionInstallBuilder): planned only for enrolled hosts

Adds 6 unit tests for BuilderHostSet covering empty/all/subset cases
and regenerates llms-full.txt for the new flag.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 11:44:52 +02:00
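The enrollment semantics listed in this commit can be sketched as a small pure function. This is not the repository's `DesiredMesh.BuilderHostSet()`; the function signature, the `sortedKeys` helper, and the sample addresses are hypothetical, and the "empty list with `--enable-builder=false` yields no builders" case is inferred rather than stated.

```go
package main

import (
	"fmt"
	"sort"
)

// builderHostSet is a hypothetical stand-in for DesiredMesh.BuilderHostSet().
// It returns the set of hosts that should receive the builder capability.
func builderHostSet(servers, builderHosts []string, enableBuilder bool) map[string]bool {
	set := make(map[string]bool)
	if len(builderHosts) == 0 {
		// Empty --builder-hosts: fall back to the cluster-wide toggle.
		if enableBuilder {
			for _, s := range servers {
				set[s] = true
			}
		}
		return set
	}
	// Non-empty --builder-hosts: --enable-builder is ignored, and entries
	// that are not in --servers are dropped.
	known := make(map[string]bool, len(servers))
	for _, s := range servers {
		known[s] = true
	}
	for _, h := range builderHosts {
		if known[h] {
			set[h] = true
		}
	}
	return set
}

// sortedKeys returns the set members in a stable order for printing.
func sortedKeys(set map[string]bool) []string {
	keys := make([]string, 0, len(set))
	for k := range set {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	servers := []string{"10.0.0.1", "10.0.0.2", "10.0.0.3"}
	fmt.Println(sortedKeys(builderHostSet(servers, nil, true)))
	fmt.Println(sortedKeys(builderHostSet(servers, []string{"10.0.0.2", "10.0.0.9"}, false)))
}
```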
Andras Bacsai 286917cd95 feat(init): pass mgmt + container pools to coold as builder deny nets
When --enable-builder is set, populate BuilderConfig.DenyNets with
the mesh management pool (default 100.64.0.0/16) and the container
pool (default 10.210.0.0/16). coold emits these as
COOLD_BUILDER_DENY_NETS, which the builder adapter expands into
systemd IPAddressDeny entries for every build subprocess.

This keeps the policy in sync with the operator's actual --wg-mgmt-pool
and --container-pool choices without hard-coding RFC1918 defaults.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 10:56:35 +02:00
Andras Bacsai 93e3e626e3 feat(init): collapse builder into coold capability, drop phase 6
Mirrors the coold-side refactor that merges builder traffic onto coold's
gRPC stream. The provisioner no longer installs a separate builder
systemd unit, mints a builder JWT, or exposes a second broker listener:

- --install-builder → --enable-builder (capability toggle, not a daemon
  install). --builder-version removed; the builder binary tracks the
  coold release.
- Phase 6 (builder service + builder JWT) deleted. Phase 5 now mints
  the host JWT with a `caps` claim ("coold" always; "builder" when
  enabled) and rewrites the coold unit with COOLD_BUILDER_* env.
- Phase 3 picks up a single extra step when EnableBuilder is true:
  install buildah/git and drop the builder binary at
  /usr/local/bin/builder (short-lived subprocess, no unit file).
- internal/services: BrokerServiceUnit drops the builder bind arg;
  CooldServiceUnit gains an optional *BuilderConfig; builder.go keeps
  only install + workdir constants; jwt.go has a single MintHostJWT.
- e2e-mesh.sh adds steps 9+10 — push build:cmd through Redis and
  assert the resulting image, then dispatch and cancel a slow build
  and assert the scope is killed with stage=cancel.
- llms-full.txt regenerated to reflect the flag rename.

Breaking: pairs with the coold commit that deletes :6444 and
builder.proto. Deploy in lockstep.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-22 09:58:24 +02:00
Andras Bacsai bfb07a5f04 test: strengthen assertions and drop redundant nil check
Switch error-path assertions in resolve_test.go from assert to require
so test stops immediately on unexpected error/success. Remove nil check
in config_test.go — cobra never returns nil from NewConfigCommand.
2026-04-21 22:27:34 +02:00
Andras Bacsai c9b6df3171 refactor(firewall): rename FirewallFlags→Flags, drop error from discoverNamespacesOnHosts
Rename `FirewallFlags` to `Flags` and `bindFirewallFlags` to `bindFlags`
within the firewall package — the `Firewall` prefix is redundant inside
the `firewall` package.

Drop the unused error return from `discoverNamespacesOnHosts`; the
function accumulates per-host errors into `ServerResult` slices and has
no package-level error path, so the third return value was always nil.

Also switches test assertions from `assert.Error/NoError` to
`require.Error/NoError` where the test cannot continue meaningfully on
failure, and adds broker service tests.
2026-04-21 22:01:58 +02:00
Andras Bacsai 346320504c style: align struct literals and promote deps to direct
Promote golang-jwt/jwt/v5, mattn/go-isatty, golang.org/x/crypto, and
golang.org/x/term from indirect to direct dependencies in go.mod.

Fix data races in firewall test fakes by guarding calls slice with sync.Mutex.

Reformat struct literals and map literals across cmd, internal/wireguard,
and internal/firewall for consistent column alignment.
2026-04-21 21:24:21 +02:00
Andras Bacsai 8341802c88 fix(init): wire central host through phase 5 too
Phase 5 was filtering central out via hostsExcluding(), leaving the
coold instance on the central VM without broker env vars and without a
host-jwt. That breaks single-server deploys (only one host, which is
also central) and leaves central's own coold as a standalone API-only
process in fleet mode.

Run phase 5 on desired.Hosts directly so central also receives a JWT
and gets COOLD_BROKER_URL/COOLD_HOST_JWT_PATH injected. Drop
hostsExcluding() since it has no other callers.

Verified end-to-end on a single-server bed: `coolify init apply
--servers X --central X` now produces a working broker <-> coold dispatch
path on the same box.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-21 19:30:32 +02:00
Andras Bacsai 298bd28cd1 feat(init): add --central flag for broker + Redis setup on a host
Adds phases 4 + 5 to `coolify init apply` for bootstrapping the v5 central
transport plane without Laravel:

- Phase 4 (central-only): apt-install Redis, download coolify-broker from
  GitHub releases, generate an EC P-256 JWT keypair under /etc/coolify/, and
  enable coolify-broker.service bound to the wg0 mgmt IP:6443.
- Phase 5 (per non-central host): read jwt.priv from central, mint a 1-year
  ES256 JWT (sub = host wg0 IP), write it to /etc/coolify/host-jwt, rewrite
  coold.service with COOLD_BROKER_URL + COOLD_HOST_JWT_PATH, restart coold.

New service generators under internal/services:
- broker.go — unit, install command, JWT keypair setup
- redis.go  — apt install + enable
- jwt.go    — golang-jwt/jwt/v5 ES256 minting

coold.go gains CooldServiceUnitWithBroker + BrokerConfig so the unit can
carry broker env vars on non-central hosts. DesiredMesh gains CentralHost +
BrokerVersion; empty CentralHost skips phases 4+5 (existing behavior).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
2026-04-21 17:35:47 +02:00
Andras Bacsai 0380aac05b fix(firewall): key bridge dispatch on ip saddr/daddr, not iifname
Bridge interface names (e.g. "coolify-default-mesh") exceed Linux
IFNAMSIZ=16, so iifname/oifname matching silently fails at the kernel
level. Switch renderBridgeScaffold to accept []*net.IPNet and emit
`ip saddr`/`ip daddr` set rules instead.

Also fix nft chain-declaration order: coolify_intra must be declared
before the forward chain's jump rules reference it, as nft validates
jump targets at add-rule time.

Add `mkdir -p /etc/coolify` before bridge scaffold write so `cat >.tmp`
doesn't ENOENT on fresh hosts where coold hasn't run yet.
2026-04-21 13:27:43 +02:00
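To illustrate the length problem described in this commit: IFNAMSIZ is 16 bytes including the trailing NUL, so at most 15 characters are usable. The check below is a sketch; `fitsIfName` is a hypothetical helper and not part of the repository.

```go
package main

import "fmt"

// ifNameSize mirrors Linux IFNAMSIZ (16 bytes, including the trailing NUL).
const ifNameSize = 16

// fitsIfName reports whether name can be used as a kernel interface name.
func fitsIfName(name string) bool {
	return len(name) > 0 && len(name) < ifNameSize
}

func main() {
	for _, name := range []string{"wg0", "coolify-default-mesh"} {
		fmt.Printf("%-22s len=%2d fits=%v\n", name, len(name), fitsIfName(name))
	}
}
```

`coolify-default-mesh` is 20 characters long, which is why matching on subnets (`ip saddr`/`ip daddr`) is the safer key.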
Andras Bacsai 8d30292fb6 test(e2e): extend mesh e2e script with intra-host nft bridge checks
Add 3 new test steps that exercise the nft bridge-family deny plane
introduced alongside the iptables FORWARD scaffold:

- Step 4: assert intra-host same-bridge traffic is blocked by default
  (client2-default on host A cannot reach web-default on same bridge).
  Uses raw IP to isolate from DNS-path — DNS to bridge gateway also
  crosses the nft hook.
- Step 5: assert `nft list table bridge coolify_bridge` succeeds on
  both hosts after init apply.
- Step 7: assert intra-host flow opens after coold dual-write (both
  iptables and nft coolify_allow planes receive the rule).
- Step 8: re-run init apply and assert exit 0 — catches
  "chain already exists" regression from non-idempotent nft scaffold.

Also add intra-host allow rule in step 6 (client2-default → web-default
:80 via coold REST), and spawn client2-default on host A in step 2.

Refactor: parameterize assert_blocked/assert_flows with host as first
arg (was hardcoded to SERVER_B); update all existing callsites.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 12:59:22 +02:00
Andras Bacsai 0980f1e363 test(wireguard): add nft bridge scaffold tests and golden fixture
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
2026-04-21 12:49:26 +02:00
Andras Bacsai 3a500014b2 feat(init): add nft/bridge-table probes and nft precondition check
Probe 10a/10b detect nft binary availability and the coolify_bridge nft
table on each host. DefaultDenyActive is now gated on BridgeTableExists
so a host with only the iptables chain (no bridge table) triggers
reinstall. BuildPlan validates NftAvailable per host before computing
actions, surfacing a clear error instead of a silent nft shell failure.
Update --skip-default-deny help text to reflect the full scope of the
default-deny scaffold (cross-host and intra-host).
2026-04-21 12:42:52 +02:00
Andras Bacsai 6dc6e0770a fix(firewall): fix nft idempotency, move table delete, use constants, pre-alloc slice
- Pre-delete forward + coolify_intra chains before nft -f to prevent
  "chain already exists" error on second apply (Fix 1)
- Move nft delete table before blanket iptables ACCEPT in mode-A to
  close the window where bridge traffic could be dropped (Fix 2)
- Replace hardcoded nft path/table strings with BridgeTableName,
  BridgeScaffoldPath, BridgeAllowRulesPath constants (Fix 3)
- Pre-allocate ifNames slice with make([]string, 0, len(sorted)) (Fix 4)
2026-04-21 12:40:50 +02:00
Andras Bacsai f70d779d0b feat(firewall): add nft bridge-family scaffold for intra-namespace default-deny
- Add BridgeTableName, BridgeAllowRulesPath, BridgeScaffoldPath consts
- Add namespaces []string param to FirewallServiceUnit and InstallFirewallCommand
- Emit nft bridge table/chain scaffold in default-deny mode; tear it down in permissive mode
- Write /etc/coolify/bridge-fw.nft atomically on apply (delete it in permissive mode)
- Add BridgeTableExists and NftAvailable fields to ServerState
- Order coold after coolify-mesh-fw.service so the bridge scaffold is in place before coold starts
2026-04-21 12:37:09 +02:00
Andras Bacsai 901097e541 feat(init): replace binary upload with GitHub release download
Switch coold/corrosion installation from uploading local binaries via
SSH to downloading from GitHub releases on each remote host.

- Remove --coold-binary / --corrosion-binary flags and elfcheck
- Add --coold-version / --corrosion-version flags (default: nightly)
- Add CooldInstallCommand / CorrosionInstallCommand with arch detection
- nightly tag always re-downloads; pinned tags skip if already installed
- Drop FileSha256 pre-flight checks (no longer needed)
- Add tests for version substitution and arch detection in install cmds
2026-04-21 12:05:51 +02:00
Andras Bacsai ef8e740476 feat(mesh): add multi-namespace support to WireGuard overlay
Introduce per-namespace Podman bridges (`coolify-<ns>-mesh`) and
subnet allocation so a single mesh cluster can carry multiple isolated
container networks carved from a shared `--container-pool`.

- Add `cmd/common/meshnet.go`: shared `MeshNetFlags`, `PodmanNetworkFor`,
  `ValidateNamespace`, and flag-binding helpers used by both `init` and
  `firewall` sub-commands.
- Replace flat `PodmanNetworkName` field on `FirewallFlags` with
  `Namespace` + `AllNamespaces`; `--all-namespaces` fans out discovery
  across every `io.coolify.managed=true` bridge on each host.
- Thread `Namespace` into `AllowRule`, `ComputeID`, and coold REST
  payloads so rules are scoped per namespace.
- Extend WireGuard planner (`internal/wireguard/plan.go`,
  `subnet.go`) to allocate one deterministically-ordered subnet per
  host per namespace; `AllowedIPs` now lists every peer's namespace
  subnets, keeping `wg0.conf` stable across re-runs.
- Pass `COOLD_NAMESPACES=<ns>:<network>:<gw>,...` env to coold so it
  can bind DNS and track rules per namespace.
- Add `scripts/e2e-mesh.sh` for end-to-end multi-namespace smoke test.
- Update CLAUDE.md architecture docs to reflect namespace layout.
2026-04-21 09:15:49 +02:00
Andras Bacsai 95250d32a0 feat(firewall): add --wg-interface flag and thread iface through coold client
Add WGInterface field to FirewallFlags with --wg-interface flag (default
from DefaultWGInterface). Thread iface parameter through CooldApply,
CooldRevoke, CooldList, and CooldListAll so the WireGuard interface name
is configurable instead of hardcoded to wg0.

Also replace hardcoded "coolify-mesh" strings with PodmanNetworkName
where applicable.
2026-04-20 21:14:36 +02:00
Andras Bacsai 7c89c3a6c8 refactor(init): make podman/coold/firewall unconditional, add --skip-default-deny
Remove --podman, --default-deny, --install-coold flags. Podman stack,
coold/corrosion agents, and default-deny iptables scaffold now always
install. --skip-default-deny opts out of firewall scaffold for testing.
2026-04-20 20:51:51 +02:00
Andras Bacsai e5e33b46ae refactor(firewall): replace SSH/iptables with coold REST client
Drop direct iptables manipulation over SSH. Firewall allow/revoke/list
now POST/DELETE/GET against coold's REST API via SSH-bounced curl.

- Add internal/firewall/coold_client.go with CooldApply, CooldRevoke,
  CooldList and per-host bearer-token resolution (reads
  /etc/coolify/api-token over SSH when no override given)
- Delete apply.go, list.go, persist.go — coold owns kernel rules,
  persistence (allow.rules snapshot), and the systemd unit
- Add --coold-port (default 8443) and --coold-token persistent flags
- Update CLAUDE.md and CONTROL_PLANE.md to reflect coold-owned surface
  and outbound WSS/gRPC dial architecture
2026-04-20 17:35:27 +02:00
Andras Bacsai 8f358b3115 docs(firewall): document coolify firewall subcommand and control plane split
Add comprehensive docs for the `coolify firewall` cross-host allow-rule
test harness (alpha, v5) in CLAUDE.md: subcommands, flags, rule lifecycle,
reboot persistence via coolify-mesh-allow.service, and testing patterns.

Update CONTROL_PLANE.md to clarify the three-layer ownership model
(central DB for metadata/audit, coold/CLI for raw kernel rules, init for
chain scaffold), document /etc/coolify/allow.rules file format and the
pre→post-coold handoff strategy (same file format, coold takes over as
writer with no migration step).
2026-04-20 14:26:02 +02:00
Andras Bacsai 84fec60a60 feat(firewall): add cross-host container allow-rule command
Add `coolify firewall` command tree (alpha) for managing iptables
COOLIFY-ALLOW rules across SSH-reachable servers in the coolify-mesh
Podman network.

New subcommands:
- containers: discover running containers across all servers
- list: show installed allow rules
- allow: add src→dst:port allow rule
- revoke: remove an allow rule

Extract shared SSH-mesh flags (--servers, --ssh-key, --ssh-user,
--ssh-port, --concurrency, --ssh-timeout) into cmd/common.SSHMeshFlags
so both `init` and `firewall` reuse the same flag set. Trim duplicated
flag definitions from cmd/init/flags.go accordingly.

Internal packages added:
- internal/firewall/rule.go: AllowRule model + iptables rule rendering
- internal/firewall/discover.go: fan-out container discovery via podman ps
- internal/firewall/list.go: fan-out rule listing via iptables-save
- internal/firewall/apply.go: apply/revoke rules over SSH
- internal/firewall/persist.go: rule persistence helpers
- internal/models/firewall.go: ContainerRow / AllowRuleRow display models

Full unit-test coverage added for all new packages.
2026-04-20 13:53:36 +02:00
Andras Bacsai 0df8f401e1 Merge remote-tracking branch 'origin/v4.x' into coolify-init-wireguard-mesh 2026-04-20 13:36:31 +02:00
github-actions[bot] 594e274b6b chore: bump version to v1.6.2 2026-04-20 10:54:34 +00:00
Andras Bacsai b126ed52c4 docs: update llms-full.txt with preview delete command and alias 2026-04-20 12:50:14 +02:00
Andras Bacsai 4d9b21a662 feat(application): add preview deployment delete command
Add `application previews delete` subcommand to delete PR preview
deployments. Includes service method, CLI command with confirmation
prompt and --force flag, and full test coverage.
2026-04-19 14:55:42 +02:00
Andras Bacsai 6f1b38cf84 feat(dns): bind coold DNS to bridge gateway, disable aardvark-dns
Add --disable-dns to podman network create so netavark never starts
aardvark-dns on the bridge gateway IP:53 — coold owns that socket for
cluster-wide service discovery (CONTROL_PLANE.md §5).

- CooldServiceUnit takes bridgeGatewayIP param; injects
  COOLD_BRIDGE_GATEWAY_IP and COOLD_DNS_ZONE env vars into systemd unit
- podmanNetRecreateCmd drops and recreates network to fix pre-alpha
  drift where dns_enabled=true; phase2 detects via PodmanDNSEnabled
- Add namespace column to service_endpoints schema (reserved for
  per-app isolation / multi-tenant scoping)
- Pass containerAssignments to phase3Server
- Document port 53 conflict handling layers in CONTROL_PLANE.md
2026-04-17 16:24:57 +02:00
Andras Bacsai 1e67e5e3f5 feat(corrosion): detect schema drift and auto-reset DB on schema change
Replace boolean `healthy` column with `state` (liveness) and `health`
(readiness) columns in the CR-SQLite schema.

Add sha256-based schema drift detection: Probe reads the remote schema
file hash into CorrosionSchemaSha256; BuildPlan triggers
ActionWriteCorrosionSchema when hash mismatches; phase3Server stops
corrosion and wipes the DB before writing the new schema so CR-SQLite
can re-bootstrap cleanly.

Fix systemd activation: use `enable` + `restart` instead of
`enable --now` so already-active services pick up new config without a
separate reload step.
2026-04-17 15:07:26 +02:00
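The drift check described here boils down to comparing two SHA-256 digests. A minimal sketch, assuming the probe returns the remote file's hex digest as a string; the function names and the sample schema text are hypothetical and only stand in for the real `CoolifySchemaSQL`.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// schemaSHA256 returns the hex-encoded SHA-256 of the schema text.
func schemaSHA256(schema string) string {
	sum := sha256.Sum256([]byte(schema))
	return hex.EncodeToString(sum[:])
}

// needsSchemaWrite reports whether the remote schema differs from the desired
// one. An empty remote hash (no schema file yet) also counts as drift.
func needsSchemaWrite(desiredSchema, remoteSHA string) bool {
	return schemaSHA256(desiredSchema) != remoteSHA
}

func main() {
	// Made-up schema strings; the real schema lives in internal/services.
	oldSchema := "CREATE TABLE hosts (id TEXT PRIMARY KEY, healthy INTEGER);"
	newSchema := "CREATE TABLE hosts (id TEXT PRIMARY KEY, state TEXT, health TEXT);"

	remote := schemaSHA256(oldSchema) // what a probe might report
	fmt.Println("drift:", needsSchemaWrite(newSchema, remote))
	fmt.Println("drift after rewrite:", needsSchemaWrite(newSchema, schemaSHA256(newSchema)))
}
```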
Andras Bacsai 67e53195bb feat(init): add corrosion + coold install support
Add --install-coold flag to `coolify init` that uploads and installs the
corrosion (gossip/CRSQLite) and coold (host agent) binaries on each node.

- New internal/services package: pure config generators for corrosion
  TOML, CoolifySchemaSQL, and coold systemd unit; ELF64/aarch64 validator
- wireguard.DesiredMesh gains InstallCoold, binary paths/shas, and port fields
- apply/plan wire pre-flight checks: ELF arch validation + SHA-256 hashing
  before any SSH connection is opened
- SSH client gains helpers used by the wireguard apply layer
2026-04-17 13:56:36 +02:00
Andras Bacsai 76ce28e65f docs(init): document WireGuard mesh bootstrap in CLAUDE.md
Add comprehensive developer reference for `coolify init` — covers
architecture, flags, code layout, key invariants, firewall modes,
cross-host vs intra-host isolation, and future coold boundary.
2026-04-17 12:07:35 +02:00
Andras Bacsai ab44a5a107 docs(control-plane): design embedded DNS service discovery via Corrosion
Replace env-injection placeholder with full DNS-via-coold design.
Covers Corrosion schema, embedded DNS server pseudocode, resolution
flow, backend movement, health/staleness, failure modes, and REST API
surface for the service discovery subsystem.
2026-04-16 19:34:29 +02:00
Andras Bacsai b38f6178b5 docs(control-plane): clarify coold as sole allow-rule owner and persistence model
Replace per-rule systemd dropin approach with coold-owned DB + batch apply.
Adds division of labour table, updated API surface (/v1), nftables set optimization
note, and scale comparison (iptables-restore vs nft vs per-rule).
2026-04-16 15:44:05 +02:00
Andras Bacsai f4d8049867 docs(control-plane): introduce coold as per-host agent boundary
Document coold as the security/audit layer between Coolify control
plane and the podman socket. Add architecture diagram showing the
communication flow. Update all references from direct podman socket
access to coold REST API over wg0.

Also add comment to enablePodmanSocketCmd clarifying the socket stays
Unix-only and is never exposed on TCP.
2026-04-16 15:31:50 +02:00
Andras Bacsai 1dfbc8cb7b feat(init): add WireGuard mesh bootstrap command
Add `coolify init plan` and `coolify init apply` commands for
bootstrapping a WireGuard full-mesh overlay between servers.

- SSH fanout to reconstruct current WireGuard state per host
- Plan engine diffs desired vs actual mesh (peers, IPs, firewall)
- Apply executes plan idempotently over SSH with concurrency control
- Podman install + coolify-mesh bridge network setup
- iptables firewall rules with optional default-deny container policy
- Subnet allocators for mgmt pool (100.64.0.0/16) and container pool (10.210.0.0/16)
- CONTROL_PLANE.md spec for v5 control plane responsibilities
2026-04-16 15:25:44 +02:00
github-actions[bot] ab951a561c chore: bump version to v1.6.1 2026-04-16 10:01:33 +00:00
Andras Bacsai bc36a44f2c docs: add Homebrew tap details to release guide
Document the GoReleaser-managed Homebrew tap, update verification
steps to be non-destructive, and tighten the post-release checklist.
2026-04-16 11:49:42 +02:00
Andras Bacsai d3489a49ce chore: remove conductor setup script and config
Deletes conductor-setup.sh and conductor.json, which were used for
workspace bootstrapping and hot-reload tooling via the conductor runner.
2026-04-16 11:44:58 +02:00
Andras Bacsai e2f0b47579 docs: update release guide for automated version bumping
Document that version is injected at build time via ldflags and that
the post-release update-version CI job handles committing the bump to
internal/version/checker.go. Remove the manual version bump step from
the pre-release checklist.
2026-04-16 11:39:27 +02:00
Andras Bacsai 8e35e61aa0 Merge pull request #70 from YaRissi/fix/format-flag
fix: json format for service commands
2026-04-16 11:34:52 +02:00
YaRissi 0197333e41 fix: json format for service commands 2026-04-12 22:54:12 +02:00
85 changed files with 11537 additions and 110 deletions
+8
@@ -42,6 +42,14 @@ linters:
    exhaustive:
      default-signifies-exhaustive: true
    revive:
      rules:
        - name: var-naming
          arguments:
            - []
            - []
            - - skipPackageNameChecks: true
    staticcheck:
      checks: ["all", "-ST1005", "-S1016"]
+326
@@ -174,6 +174,332 @@ type Resource struct {
- UUIDs are more secure (don't expose database sequencing)
- Coolify API uses UUIDs as the primary resource identifier
## `coolify init` — WireGuard mesh + Podman bootstrap (alpha, v5)
**This subcommand is an outlier**: it does NOT talk to the Coolify API. It SSHes into remote hosts and installs/configures WireGuard, Podman, the bridge network, and a firewall scaffold. It's the fleet-provisioning command tree consumed by the v5 control plane (coold), split into three intent-scoped subcommands — `bootstrap`, `extend`, `upgrade` — plus a read-only `plan`. Coolify's backend calls `extend` when the operator adds a server and `upgrade` when agent versions move; direct-CLI operators run `bootstrap` for the initial install.
### What it does
- Establishes a full-mesh WireGuard overlay across N hosts.
- Each host gets a mgmt IP `/32` from `--wg-mgmt-pool` (default `100.64.0.0/16`, RFC 6598 CGNAT) on `wg0`.
- For every namespace (see **Namespaces** below; default: just `default`), each host gets a container subnet `/<container-prefix>` carved from the shared `--container-pool` (default `10.210.0.0/16`, default prefix `/24`). Each namespace is owned by its own Podman bridge named `coolify-<namespace>-mesh` (default → `coolify-default-mesh`).
- Installs Podman + enables `podman.socket` + creates every namespace bridge + installs `coolify-mesh-fw.service` (always; required for v5 runtime).
- Downloads and installs coold + corrosion (v5 control-plane agents; always) from GitHub releases on each remote host. Release tag controlled by `--coold-version` / `--corrosion-version` (default `nightly`). coold receives the full namespace list via `COOLD_NAMESPACES=<ns>:<network>:<gateway-ip>,...` so it can bind DNS and track rules per namespace.
- Installs default-deny firewall scaffold by default — host-global `COOLIFY-INTRA` + empty `COOLIFY-ALLOW` chains, with FORWARD jumps for every namespace subnet. Use `--skip-default-deny` to fall back to blanket-allow (mode A) for testing.
### Architecture (why this layout)
The mgmt pool and container pool are **separate** so the Podman bridge can own the full container `/24` without conflicting with `wg0`. Pattern adopted from uncloud (psviderski/uncloud).
WG config per host (e.g. host A with two namespaces `default` + `alpha`):
```
[Interface]
Address = 100.64.0.1/32 # mgmt IP, NOT in container pool
ListenPort = 51820
PrivateKey = <gen on host>
[Peer] # one per other host
PublicKey = <peer pubkey>
AllowedIPs = 100.64.0.2/32, 10.210.1.0/24, 10.220.1.0/24 # mgmt + every namespace subnet
Endpoint = <peer SSH ip>:51820
```
Critical: `AllowedIPs` lists the peer's full per-namespace `/24`s so the kernel routes each namespace subnet via `wg0`. Namespace order is deterministic (sorted) so `wg0.conf` is stable across re-runs.
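One way to get a stable `AllowedIPs` line is to sort before rendering. The sketch below assumes the sort key is the namespace name, which the text does not confirm; `allowedIPs` is a hypothetical helper and the subnet values are made up for the example.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// allowedIPs renders a peer's AllowedIPs line: the mgmt /32 first, then one
// subnet per namespace, ordered by namespace name so repeated runs produce
// byte-identical output regardless of map iteration order.
func allowedIPs(mgmtIP string, nsSubnets map[string]string) string {
	names := make([]string, 0, len(nsSubnets))
	for ns := range nsSubnets {
		names = append(names, ns)
	}
	sort.Strings(names)

	parts := []string{mgmtIP + "/32"}
	for _, ns := range names {
		parts = append(parts, nsSubnets[ns])
	}
	return "AllowedIPs = " + strings.Join(parts, ", ")
}

func main() {
	// Illustrative values only; real subnets come from the planner.
	peer := map[string]string{
		"default": "10.210.2.0/24",
		"alpha":   "10.210.1.0/24",
	}
	fmt.Println(allowedIPs("100.64.0.2", peer))
}
```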
Every namespace bridge `coolify-<ns>-mesh` is created with `--disable-dns --label io.coolify.managed=true --label io.coolify.namespace=<ns>` — the bridge gateway `:53` is reserved for coold's embedded cluster DNS (see CONTROL_PLANE.md §5). Pre-alpha networks with `dns_enabled=true` are detected on re-run and recreated.
The firewall service (`coolify-mesh-fw.service`) is installed unconditionally and stays host-global:
- POSTROUTING `RETURN` rule per namespace subnet prevents Podman MASQUERADE from rewriting container egress source on `wg0`.
- Mode A (`--skip-default-deny`): blanket FORWARD ACCEPT for every namespace subnet.
- Mode B (default): `COOLIFY-INTRA` chain (ESTABLISHED accept → `COOLIFY-ALLOW` → DROP), FORWARD jumps for `-s/-d <ns-subnet>` per namespace. v5 control plane (coold) fills `COOLIFY-ALLOW`.
### Cross-host vs intra-host firewall
- **Cross-host default-deny WORKS** — those packets cross interfaces (wg0 ↔ bridge) and traverse iptables FORWARD. Empirically verified.
- **Intra-host (same bridge) is NOT enforced** — Linux + netavark + Ubuntu 24.04 quirk: bridge L2 traffic bypasses iptables FORWARD even with `bridge-nf-call-iptables=1`. v5 control plane handles intra-host isolation via per-app podman networks (`--opt isolate=true`), not iptables.
### Subcommands
Three intent-scoped subcommands. Each runs the same probe → plan → filter → apply → verify pipeline; what differs is the filter applied to the action list. The filter lives in `internal/wireguard/intent.go` (`ValidateIntent` + `filterByIntent`). Suppressed actions surface on `plan.Skipped` so the preview shows operators what would have fired and why.
```bash
coolify init plan --servers IP1,IP2,IP3 --ssh-key KEY [--intent bootstrap|extend|upgrade]
coolify init bootstrap --servers IP1,IP2,IP3 --ssh-key KEY [--yes]
coolify init extend --servers IP1,IP2,IP3,IP4 --new-hosts IP4 --ssh-key KEY [--allow-replace]
coolify init upgrade --servers IP1,IP2,IP3 --ssh-key KEY --coold-version v1.7.0 [--allow-nightly]
```
- `plan` is read-only: probes, reconstructs, shows what the selected intent would execute. Default intent is `bootstrap` (broadest preview).
- `bootstrap` is the first-time install — every applicable action on every host. Keeps the interactive alpha gate (unless `--yes`, `COOLIFY_NON_INTERACTIVE=1`, or non-TTY). 2-phase parallel: phase 1 = install + keygen + podman + socket + IP forward. Re-probe. Phase 2 = write WG config + enable/reload service + create podman networks + install firewall + install coold/corrosion (+ scheduler on `--central` + builder on `--builder-hosts`).
- `extend` adds the hosts listed in `--new-hosts` (required subset of `--servers`) to an existing mesh. Brand-new hosts get the full first-time install. Existing hosts get **only peer-refresh** actions (WG config rewrite picks up the new peer's mgmt `/32` + namespace `/24`s in `AllowedIPs`, corrosion peer list refreshed, firewall unit reinstalled only when the namespace list changed). Agent binaries are not re-downloaded on existing hosts. Destructive-replace actions (podman network recreate because of `dns_enabled=true` drift or a subnet/label mismatch) are **blocked on existing hosts** unless `--allow-replace` is passed. The corrosion-schema wipe-DB branch is never unlocked — resolve schema drift with `upgrade` on a fresh schema.
- `upgrade` bumps agent binaries across every host. Only binary-fetch actions (`install-coold`, `install-corrosion`, `install-scheduler`, `install-builder`) and their follow-up service restarts (`install-coold-service`, `install-corrosion-service`, `install-scheduler-service`) run. WG config, podman networks, firewall rules, and the corrosion schema stay untouched. `nightly` tags are rejected by default (they force a re-install every run); pin a version with `--coold-version=v1.7.0` etc. or pass `--allow-nightly`.
`extend` and `upgrade` skip the interactive alpha gate because they are the paths the Coolify backend calls in production. `bootstrap` keeps the gate for direct-CLI runs.
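The per-intent filtering described above can be sketched as a pure function over a categorized action list. This is a simplified illustration under stated assumptions: the type shapes, the helper `skipReason`, and the reason strings are hypothetical, and the real `filterByIntent` in `internal/wireguard/intent.go` may differ (for example, it mutates the plan in place).

```go
package main

// Intent and category names mirror identifiers mentioned in this document;
// the struct shapes and rules below are a hypothetical sketch.
type Intent int

const (
	IntentBootstrap Intent = iota // zero value
	IntentExtend
	IntentUpgrade
)

type category int

const (
	catSafeAlways category = iota
	catPeerRefresh
	catDestructiveReplace
	catVersionBump
	catWipeDB
)

type Action struct {
	Host string
	Name string
	Cat  category
}

type SkippedAction struct {
	Action Action
	Reason string
}

// filterByIntent splits actions into those that run and those that are
// suppressed, keeping a reason so a preview can explain each skip.
func filterByIntent(actions []Action, intent Intent, newHosts map[string]bool, allowReplace bool) (kept []Action, skipped []SkippedAction) {
	for _, a := range actions {
		if reason := skipReason(a, intent, newHosts[a.Host], allowReplace); reason != "" {
			skipped = append(skipped, SkippedAction{Action: a, Reason: reason})
			continue
		}
		kept = append(kept, a)
	}
	return kept, skipped
}

func skipReason(a Action, intent Intent, isNew, allowReplace bool) string {
	switch intent {
	case IntentExtend:
		switch {
		case a.Cat == catWipeDB:
			return "extend never wipes the corrosion DB"
		case isNew:
			return "" // brand-new hosts get the full install
		case a.Cat == catVersionBump:
			return "agents are not re-downloaded on existing hosts"
		case a.Cat == catDestructiveReplace && !allowReplace:
			return "destructive replace needs --allow-replace"
		}
	case IntentUpgrade:
		if a.Cat != catVersionBump {
			return "upgrade only bumps agent binaries"
		}
	}
	return "" // IntentBootstrap keeps everything
}
```

Returning the skipped entries alongside the kept ones, rather than dropping them silently, is what lets `plan` show operators what would have fired and why.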
### Flags (defined in `cmd/init/flags.go`)
Persistent (inherited by `plan`, `bootstrap`, `extend`, `upgrade`):
| Flag | Default | Purpose |
|---|---|---|
| `--servers` | required | comma-separated SSH IPs (full list of every host in the mesh, including already-converged ones on extend/upgrade) |
| `--ssh-key` | required | path to SSH private key |
| `--ssh-passphrase-prompt` | false | prompt for key passphrase (also reads `COOLIFY_SSH_PASSPHRASE` env) |
| `--ssh-user` | `root` | SSH user |
| `--ssh-port` | `22` | SSH port |
| `--wg-mgmt-pool` | `100.64.0.0/16` | mgmt IP pool, /32 per host on wg0 |
| `--container-pool` | `10.210.0.0/16` | container pool, carved per host |
| `--container-prefix` | `24` | per-host container subnet prefix |
| `--wg-interface` | `wg0` | WG iface name on remote |
| `--wg-listen-port` | `51820` | WG UDP port |
| `--namespaces` | `default` | comma-separated list of namespaces. Each creates its own `coolify-<ns>-mesh` bridge with its own per-host `/24` carved from `--container-pool` |
| `--skip-default-deny` | false | skip the default-deny firewall scaffold. Default installs COOLIFY-INTRA + empty COOLIFY-ALLOW chains for cross-host deny |
| `--coold-version` | `nightly` | release tag to download for coold (e.g. `nightly`, `v1.2.3`). `nightly` always re-downloads on every run; pinned tags skip when the on-host version marker matches. Fetched from `coollabsio/coold` GitHub releases on the remote host. |
| `--corrosion-version` | `nightly` | release tag to download for corrosion. Same drift semantics as `--coold-version`. Fetched from `coollabsio/corrosion` GitHub releases. |
| `--scheduler-version` | `nightly` | release tag for scheduler (only fetched when `--central` is set). |
| `--corrosion-gossip-port` | `8787` | corrosion SWIM gossip port (bound to wg0 mgmt IP) |
| `--corrosion-api-port` | `8080` | corrosion HTTP API port (bound to 127.0.0.1) |
| `--central` | `""` | SSH address of the central VM (must be in `--servers`). When set, scheduler installs there and per-host JWTs are pushed to every peer. Empty = skip scheduler setup. |
| `--enable-builder` | true | cluster-wide shorthand: enable the builder capability on every host (requires `--central`). Ignored when `--builder-hosts` is set. |
| `--builder-hosts` | `[]` | explicit subset of `--servers` to enroll with the builder capability. Takes precedence over `--enable-builder`. |
| `--builder-capacity` | `2` | concurrent builds per host (`COOLD_BUILDER_CAPACITY`) |
| `--builder-cpu-quota` | `200%` | systemd CPUQuota per build subprocess |
| `--builder-memory-max` | `2G` | systemd MemoryMax per build subprocess |
| `--builder-timeout-secs` | `1800` | wall-clock cap per build |
| `--concurrency` | `10` | parallel SSH connections |
| `--ssh-timeout` | `30s` | SSH connect timeout |
| `--yes`, `-y` | false | skip alpha confirmation prompt (honored by `bootstrap`; `extend` and `upgrade` always skip it) |
Subcommand-local:
| Flag | Subcommand | Default | Purpose |
|---|---|---|---|
| `--intent` | `plan` | `bootstrap` | preview filter: `bootstrap` (all actions), `extend` (treat `--new-hosts` as fresh, existing hosts peer-refresh only), `upgrade` (version bumps only) |
| `--new-hosts` | `extend` | required | comma-separated subset of `--servers` that is brand-new this run. Only these hosts receive the full install; all other hosts get peer-refresh only. |
| `--allow-replace` | `extend` | false | unlock destructive-replace actions on existing hosts (e.g. recreating a drifted podman bridge). Off by default — drifted existing hosts surface as skipped actions. |
| `--allow-nightly` | `upgrade` | false | permit `nightly` as a version tag. Off by default because `nightly` re-installs every run instead of only when the pinned version changes. |
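The flag constraints in the two tables above (required `--new-hosts` subset, `nightly` rejected on `upgrade`) amount to a small pre-plan check. A minimal sketch, assuming a hypothetical `desired` struct and error wording; the repository's `ValidateIntent` operates on `wireguard.DesiredMesh` and may check more.

```go
package main

import (
	"errors"
	"fmt"
)

// desired is a hypothetical, trimmed-down stand-in for the real desired-state struct.
type desired struct {
	Intent       string            // "bootstrap", "extend", or "upgrade"
	Servers      []string          // full --servers list
	NewHosts     []string          // --new-hosts (extend only)
	Versions     map[string]string // component name to release tag
	AllowNightly bool              // --allow-nightly (upgrade only)
}

// validateIntent rejects flag combinations before any host is probed.
func validateIntent(d desired) error {
	switch d.Intent {
	case "bootstrap":
		return nil
	case "extend":
		if len(d.NewHosts) == 0 {
			return errors.New("extend requires --new-hosts")
		}
		known := make(map[string]bool, len(d.Servers))
		for _, s := range d.Servers {
			known[s] = true
		}
		for _, h := range d.NewHosts {
			if !known[h] {
				return fmt.Errorf("--new-hosts entry %q is not in --servers", h)
			}
		}
		return nil
	case "upgrade":
		if d.AllowNightly {
			return nil
		}
		for name, tag := range d.Versions {
			if tag == "nightly" {
				return fmt.Errorf("%s is set to nightly; pin a version or pass --allow-nightly", name)
			}
		}
		return nil
	default:
		return fmt.Errorf("unknown intent %q", d.Intent)
	}
}
```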
### Namespaces
Namespaces are the tenancy unit the mesh carries. A namespace is:
- **A podman bridge network** on every host, named `coolify-<ns>-mesh` (default → `coolify-default-mesh`), labelled `io.coolify.managed=true` + `io.coolify.namespace=<ns>`.
- **A per-host `/<container-prefix>` subnet** carved from the shared `--container-pool`. Allocation is deterministic across `(namespace, host)` pairs so re-runs reproduce the same layout.
- **A DNS view** coold serves on that bridge's gateway: records take the shape `<container>.<namespace>.coolify.internal`. Bare `<container>.coolify.internal` is deliberately NXDOMAIN — callers must fully qualify.
- **A firewall tenant**: allow-rule cids hash the namespace in, so identical src/dst/proto/port tuples in different namespaces are distinct rules. iptables chains stay host-global (`COOLIFY-INTRA` / `COOLIFY-ALLOW`) for alpha; namespace isolation comes from separate podman bridges + namespace-qualified allow rules.
Config knobs:
- `coolify init bootstrap --namespaces default,alpha,beta` provisions every namespace on every host in one pass. Re-running `bootstrap` (or running `extend` with the new namespace in `--namespaces`) installs only the new per-namespace assets (bridge + FORWARD jumps + WG `AllowedIPs` refresh + firewall unit reinstall because of unit-hash drift). Removing a namespace is **not** idempotent today — destroy/rebuild is the documented path for alpha.
- `coolify firewall --namespace <ns>` (default `default`) scopes allow/revoke/list/containers to one namespace. `list` and `containers` also accept `--all-namespaces` for cross-namespace observability.
- coold receives the full namespace list via `COOLD_NAMESPACES=<ns>:<network>:<gateway-ip>,…` (see `internal/services/coold.go`). DNS binds and rule storage derive from that.
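The naming rules in this section (bridge name, `COOLD_NAMESPACES` entry shape, fully qualified DNS name) compose mechanically. A small sketch, assuming hypothetical helper names and assuming the entries are emitted in sorted namespace order; the source only states the `<ns>:<network>:<gateway-ip>` shape.

```go
package main

import (
	"sort"
	"strings"
)

// podmanNetworkFor derives the bridge name for a namespace.
func podmanNetworkFor(ns string) string { return "coolify-" + ns + "-mesh" }

// cooldNamespacesEnv renders COOLD_NAMESPACES from a namespace-to-gateway map.
// Sorted output is an assumption made here for reproducibility.
func cooldNamespacesEnv(gateways map[string]string) string {
	names := make([]string, 0, len(gateways))
	for ns := range gateways {
		names = append(names, ns)
	}
	sort.Strings(names)
	parts := make([]string, 0, len(names))
	for _, ns := range names {
		parts = append(parts, ns+":"+podmanNetworkFor(ns)+":"+gateways[ns])
	}
	return "COOLD_NAMESPACES=" + strings.Join(parts, ",")
}

// fqdn builds the only name shape the cluster DNS answers for.
func fqdn(container, ns string) string {
	return container + "." + ns + ".coolify.internal"
}
```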
Deliberately deferred (tracked in the active plan):
- Per-namespace iptables chains. Host-global keeps kernel state simple; revisit when a user asks for kernel-enforced per-namespace default-deny.
- Cross-namespace L2 bridging. Different namespaces = different podman bridges = no intra-host connectivity. Cross-namespace flows require explicit allow rules + dual-attach containers.
- Wildcard / DNS search domain. Start strict; loosen once real workloads push back.
### Code layout
- `cmd/common/` — flag structs shared between `init` and `firewall`.
  - `sshmesh.go`: `SSHMeshFlags` + `BindSSHMeshFlags`, `BuildSSHClient`, `ParseSSHTimeout`, `ResolvePassphrase`, `Validate`.
  - `meshnet.go`: `MeshNetFlags` (namespaces + container pool/prefix) + `BindMeshNetMultiFlags` (init-style: many namespaces) + `BindMeshNetSingleFlags` (firewall-style: one namespace) + `PodmanNetworkFor(ns)` + `ValidateNamespaces` / `ValidateNamespace` (DNS-label check).
- `cmd/init/` — Cobra subcommands (`init`, `init plan`, `init bootstrap`, `init extend`, `init upgrade`).
  - `flags.go`: `InitFlags` struct (embeds `common.SSHMeshFlags` + `common.MeshNetFlags`) + bindings + SSH client builder. Carries subcommand-scoped knobs: `NewHosts`, `AllowReplace`, `AllowNightly`, `Intent`.
  - `desired.go`: `buildDesired(flags)`: flag → `wireguard.DesiredMesh`. One source of truth so every subcommand produces the same struct modulo `Intent`.
  - `plan.go`: `runPlan`: validate, `buildDesired`, `ValidateIntent`, build SSH client, probe, `BuildPlan`, render actions + skipped rows. `--intent` flag selects the filter for preview.
  - `apply.go`: `runApply(ctx, cmd, flags, applyOptions)`: shared pipeline for all three executing subcommands. `applyOptions{SkipAlphaGate, Header}` differentiates them.
  - `bootstrap.go`: `NewBootstrapCommand`: sets `flags.Intent = "bootstrap"`, keeps alpha gate.
  - `extend.go`: `NewExtendCommand`: binds `--new-hosts` + `--allow-replace`, validates subset, sets `flags.Intent = "extend"`, skips alpha gate.
  - `upgrade.go`: `NewUpgradeCommand`: binds `--allow-nightly`, sets `flags.Intent = "upgrade"`, skips alpha gate.
  - `init.go`: registers the four subcommands; the package is named `initcmd` (not `init`, which Go reserves for package initializer functions).
- `internal/wireguard/` — pure Go logic (no SSH, no I/O — `apply.go` is the SSH boundary).
  - `state.go`: `ServerState` (with `Namespaces map[string]*NamespaceServerState`), `MeshState`, `DesiredMesh` (with `Intent`, `NewHosts`, `AllowReplace`, `AllowNightly`). `Intent` enum: `IntentBootstrap` (zero value), `IntentExtend`, `IntentUpgrade`.
  - `intent.go`: `ValidateIntent` (pre-plan invariants: extend needs `NewHosts ⊆ Hosts`; upgrade rejects nightly unless opted-in), `filterByIntent` (mutates `plan.Actions` + `plan.Skipped`), `categorize` (action → `catSafeAlways` / `catPeerRefresh` / `catDestructiveReplace` / `catVersionBump` / `catWipeDB` / `catCorrosionSchemaFirstWrite`).
  - `subnet.go`: `Allocate` (per `(namespace, host)` pair: `map[ns]map[host]*net.IPNet`) + `AllocateMgmtIPs` (per-host /32) + conflict detection. Provably stable: adding host D never shifts A/B/C.
  - `config.go`: `RenderConfig` + `WriteConfigCommand` for `wg0.conf` (Address /32, AllowedIPs = mgmt /32 + every peer namespace subnet, deterministic order).
  - `reconstruct.go`: `Probe` (per-namespace podman network inspect + label read) + `Reconstruct` (parallel) + `parseConfigFile`.
  - `plan.go`: `BuildPlan` (pure: desired - actual = actions, then `ValidateIntent` + `filterByIntent`). `Plan.Skipped []SkippedAction` carries intent-filtered entries with reasons. Podman actions carry a `Namespace` field; one create/recreate action per namespace per host.
  - `apply.go`: `ApplyMesh` (2-phase fanout via `internal/ssh/fanout.go`). Phase 2 loops over namespaces per host; firewall unit takes the union of every namespace subnet.
  - `firewall.go`: `coolify-mesh-fw.service` unit generator (two-mode: blanket allow vs default-deny, one FORWARD/POSTROUTING pair per namespace subnet).
- `internal/ssh/` — generic SSH runner + parallel `ForEachServer[T]`.
- `test/fixtures/wg/wg0.conf` — fixture for parser tests.
### Key invariants
- **Reconstructed-only state**: no local state file. Every run re-probes via SSH. State lives on the hosts.
- **Idempotent**: re-running with no changes produces an empty plan. State drift triggers re-converge (e.g. flipping `--skip-default-deny` reinstalls the firewall service; bumping `--coold-version` re-fetches the binary).
- **Intent gates destruction**: `extend` on an existing host never re-downloads agents, never wipes the corrosion DB, and never recreates a drifted podman bridge without `--allow-replace`. Suppressed actions surface on `plan.Skipped` with a reason. `upgrade` never touches WG / podman / firewall / schema.
- **Private key never leaves host**: WG private key generated on remote via `wg genkey`; config written using `PRIVKEY=$(cat /etc/wireguard/privatekey)` shell expansion, so the key is read on the host itself and never travels back over SSH.
- **Atomic config writes**: write to `.conf.tmp`, `mv` to `.conf`.
- **Non-disruptive WG reload**: service-restart uses `systemctl restart wg-quick@wg0 || wg syncconf wg0 <(wg-quick strip wg0)` — the fallback updates peers in kernel without tearing the tunnel.
- **Stable subnet assignment**: existing valid assignments are preserved across re-runs; adding a host never shifts existing `(namespace, host)` `/24`s. Only invalid (out-of-pool, wrong prefix, duplicate, network/broadcast IP) trigger reassignment with a warning.
- **Firewall reinstall is content-hashed**: `coolify-mesh-fw.service` is only rewritten when its expected unit text differs from the on-host sha256, so noisy restarts don't happen on converged re-runs.
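The content-hashed reinstall check from the last invariant reduces to comparing digests. A minimal sketch with hypothetical helper names; how the on-host digest is obtained (for example a `sha256sum` over the installed unit during the probe) is an assumption and is left out.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
)

// unitDigest returns the hex sha256 of the rendered unit text.
func unitDigest(unitText string) string {
	sum := sha256.Sum256([]byte(unitText))
	return hex.EncodeToString(sum[:])
}

// needsReinstall reports whether the unit the planner expects differs from
// what the host reported during the probe.
func needsReinstall(expectedUnit, onHostDigest string) bool {
	return unitDigest(expectedUnit) != onHostDigest
}
```

Because the digest covers the full unit text, any change that alters the rendered rules (a new namespace subnet, a flipped `--skip-default-deny`) shows up as drift, while a converged re-run compares equal and plans nothing.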
### Future control plane (v5 / coold)
`coolify init` owns **fleet provisioning**: first-time bootstrap, adding hosts, and bumping agent versions — each via its own intent-scoped subcommand. Day-to-day container/firewall ops are the v5 control plane's job. See `CONTROL_PLANE.md` for the full spec, including:
- coold per-host agent (REST API on wg0, bind-mounts `/run/podman/podman.sock`, NEVER exposes socket on TCP).
- Service discovery via embedded DNS in coold + Corrosion-replicated sqlite (no env injection, no container restart on backend movement).
- Allow-rule persistence via coold's own DB + `iptables-restore --noflush` or `nft -f` batch (NOT systemd dropins per rule — doesn't scale).
- Cross-host allow rules go on the **destination host** (where DROP would otherwise fire).
When extending `coolify init`, defer dynamic responsibilities to coold. Bootstrap stays narrow: scaffold the mesh, install runtime, prep firewall chains. `extend` and `upgrade` stay narrower still: add peers and bump binaries, nothing else. coold owns everything that changes at runtime.
### Testing init
Tests live in `internal/wireguard/*_test.go` and `cmd/init/*_test.go`:
```bash
go test ./internal/wireguard/... ./cmd/init/... -v
```
Use the SSH `Runner` interface for mocking — never open real SSH connections in unit tests. `internal/ssh/fanout.go` is generic; reuse for any per-server fanout.
## `coolify firewall` — cross-host allow-rule client (alpha, v5)
**This subcommand is the second outlier** (alongside `coolify init`): it does NOT talk to the Coolify API. It is a thin REST client of the **coold** per-host agent installed by `coolify init` (coold install is unconditional as of v1.6.3). `allow` / `revoke` / `list` all go through coold's REST API (`/api/v1/firewall/allow`). `containers` stays SSH+podman because coold has no container surface yet. Transport is **SSH-bounce**: the laptop running the CLI is not a mesh peer, so it SSHes into the target host and the shell there runs `curl "http://$(wg0-mgmt-ip):8443/api/v1/firewall/..."` against coold on localhost.
coold owns all kernel-rule + persistence logic (iptables/nft backend detection, `/etc/coolify/allow.rules` snapshot, `coolify-mesh-allow.service`). The CLI never writes iptables or systemd units directly.
### What it does
- Discovers containers on the selected namespace's `coolify-<ns>-mesh` bridge (default `coolify-default-mesh`) across all listed hosts (SSH + `podman ps`). `--all-namespaces` fans out across every managed namespace.
- `POST /api/v1/firewall/allow` / `DELETE /api/v1/firewall/allow/{id}` / `GET /api/v1/firewall/allow` against coold on the host that **owns the destination IP** (per `CONTROL_PLANE.md §3`: rules go on dst host).
- Per-host bearer tokens fetched on demand from `/etc/coolify/api-token` (see `EnsureCooldAPITokenCommand` in `internal/services/coold.go` — each host generates its own random 32-byte hex token at install time).
- Idempotent at the coold level: POST of an identical tuple returns the existing id; DELETE of an unknown id returns 204.
### Subcommands
```bash
coolify firewall containers [--namespace <ns>] [--all-namespaces] # discover containers on coolify-<ns>-mesh (SSH+podman)
coolify firewall list [--namespace <ns>] [--all-namespaces] # GET /allow on every host and merge
coolify firewall allow --namespace <ns> --from <ref> --to <ref> [--port N] [--proto tcp|udp] [--bidirectional]
coolify firewall revoke --namespace <ns> --from <ref> --to <ref> [--port N] [--proto tcp|udp] [--bidirectional]
```
`<ref>` accepts: container name (unique across mesh), `host:name`, short 12-char podman ID, or raw IP.
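The four accepted `<ref>` forms can be resolved in a fixed order: raw IP first, then `host:name`, then bare name or short ID. The sketch below is an assumption about how such a resolver could look, not the code in `cmd/firewall/resolve.go`; the `Container` fields and error messages are hypothetical.

```go
package main

import (
	"fmt"
	"net/netip"
	"strings"
)

// Container is a hypothetical, trimmed-down discovery row.
type Container struct {
	Host, Name, ID, IP string
}

// resolveEndpoint maps a user-supplied ref to a container IP.
func resolveEndpoint(ref string, all []Container) (string, error) {
	// Raw IPs pass through untouched.
	if addr, err := netip.ParseAddr(ref); err == nil {
		return addr.String(), nil
	}
	host, name, qualified := strings.Cut(ref, ":")
	var matches []Container
	for _, c := range all {
		switch {
		case qualified && c.Host == host && c.Name == name:
			matches = append(matches, c)
		case !qualified && (c.Name == ref || (len(ref) == 12 && strings.HasPrefix(c.ID, ref))):
			matches = append(matches, c)
		}
	}
	switch len(matches) {
	case 0:
		return "", fmt.Errorf("no container matches %q", ref)
	case 1:
		return matches[0].IP, nil
	default:
		return "", fmt.Errorf("%q is ambiguous (%d matches); qualify it as host:name", ref, len(matches))
	}
}
```

Treating a bare name that matches on several hosts as an error, rather than picking one, follows the "unique across mesh" requirement stated above.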
### Flags
Persistent (inherited from `cmd/common/sshmesh.go` — shared with `coolify init`):
| Flag | Default | Purpose |
|---|---|---|
| `--servers` | required | comma-separated SSH IPs |
| `--ssh-key` | required | SSH private key path |
| `--ssh-passphrase-prompt` | false | prompt for passphrase (also `COOLIFY_SSH_PASSPHRASE` env) |
| `--ssh-user` | `root` | SSH user |
| `--ssh-port` | `22` | SSH port |
| `--concurrency` | `10` | parallel SSH connections |
| `--ssh-timeout` | `30s` | SSH connect timeout |
Firewall-specific persistent:
| Flag | Default | Purpose |
|---|---|---|
| `--namespace` | `default` | mesh namespace the command operates on. Derives podman network `coolify-<ns>-mesh` for container discovery and is sent to coold as part of every rule payload / list query |
| `--all-namespaces` | false | applies to `list` + `containers` only — fans out across every namespace the mesh carries (`allow` / `revoke` still require a specific `--namespace`) |
| `--coold-port` | `8443` | TCP port coold's REST API listens on (wg0 mgmt IP). Must match `COOLD_API_BIND` emitted by `internal/services/coold.go` |
| `--coold-token` | `""` | **optional** bearer-token override (also reads `COOLIFY_COOLD_TOKEN` env). When empty (the default), the CLI SSHes each host and reads `/etc/coolify/api-token` — tokens are per-host, not centrally shared |
Allow/revoke local:
| Flag | Default | Purpose |
|---|---|---|
| `--from` | required | source container ref or raw IP |
| `--to` | required | destination container ref or raw IP |
| `--port` | `0` | dst port (0 = any) |
| `--proto` | `tcp` | `tcp`, `udp`, or `""` (any — requires `--port=0`) |
| `--bidirectional` | false | also install reverse rule on src host (needed for server-initiated flows; conntrack ESTABLISHED handles client-initiated replies) |
### Rule identity
`cid = sha256(namespace|src|dst|proto|port)[:12]`. Namespace defaults to `"default"` on the wire when empty so legacy coold peers keep working. coold computes the cid server-side on POST and returns it in the body; the CLI surfaces it as the user-facing rule ID in `firewall list` output and uses it for DELETE. Stable across calls: `revoke --namespace … --from … --to …` rebuilds the same cid and matches. Identical src/dst/proto/port tuples in different namespaces produce different cids and are managed independently.
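The cid formula can be written out directly. In this sketch the lowercase function name `computeID`, the decimal rendering of the port, and taking the first 12 characters of the hex digest (rather than 12 raw bytes) are assumptions; the source only gives the formula and the `"default"` normalization.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"strconv"
	"strings"
)

// computeID derives a stable rule id from the namespace-qualified tuple.
func computeID(namespace, src, dst, proto string, port int) string {
	if namespace == "" {
		namespace = "default" // empty namespace normalizes to "default"
	}
	key := strings.Join([]string{namespace, src, dst, proto, strconv.Itoa(port)}, "|")
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:])[:12]
}
```

Since the id is a pure function of the tuple, `revoke` can recompute it from the same flags without first listing rules.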
### SSH-bounce transport
Every coold call is wrapped in a single SSH command that first discovers the host's own wg0 mgmt IP and then curls coold at that address from the host itself:
```sh
# emitted for POST / DELETE (hard-fails if wg0 missing — no coold means nothing to apply to)
MGMT=$(ip -4 -o addr show wg0 2>/dev/null | awk '{print $4}' | cut -d/ -f1)
test -n "$MGMT" || { echo "coold mgmt IP (wg0) not found on $(hostname)" >&2; exit 1; }
curl -fsS --max-time 10 \
  -H 'Authorization: Bearer <token>' \
  -H 'Content-Type: application/json' \
  -X POST -d '{"namespace":"default","src":"...","dst":"...","proto":"tcp","port":80}' \
  "http://$MGMT:8443/api/v1/firewall/allow"
```
`list` uses the **soft** variant: missing wg0 emits `[]` and exits 0 so a partially-deployed mesh doesn't abort the whole fanout.
### Per-host token resolution
`cmd/firewall/helpers.go::tokenResolver` hands out tokens per host with a sync.Mutex-guarded cache:
- `--coold-token` (or `COOLIFY_COOLD_TOKEN` env) set → closure returns the override for every host; no SSH fetch.
- Otherwise → first access per host SSHes `cat /etc/coolify/api-token`, caches the result for the rest of the run. Token-fetch failures surface as a `ServerResult.Err` on the owning host (won't poison others).
The cache is scoped to one CLI invocation — no on-disk caching.
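The resolver described above is essentially a closure over a mutex-guarded map. A minimal sketch, assuming a hypothetical signature in which the SSH fetch is injected as a function; the real `tokenResolver` in `cmd/firewall/helpers.go` may be shaped differently.

```go
package main

import "sync"

// tokenResolver returns a per-host token lookup. With an override set it
// never fetches; otherwise it fetches once per host and caches the result
// for the lifetime of the returned closure.
func tokenResolver(override string, fetch func(host string) (string, error)) func(host string) (string, error) {
	if override != "" {
		return func(string) (string, error) { return override, nil }
	}
	var mu sync.Mutex
	cache := map[string]string{}
	return func(host string) (string, error) {
		mu.Lock()
		defer mu.Unlock()
		if tok, ok := cache[host]; ok {
			return tok, nil
		}
		tok, err := fetch(host)
		if err != nil {
			return "", err // failures are not cached and stay scoped to this host
		}
		cache[host] = tok
		return tok, nil
	}
}
```

Holding the lock across `fetch` keeps the sketch short but serializes first-time fetches; a per-host lock or `sync.Once` per entry would avoid that if fetch latency matters.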
### Persistence across reboots
**coold owns this now.** On every API mutate, coold regenerates `/etc/coolify/allow.rules` (flat `iptables-save` fragment) and the companion `coolify-mesh-allow.service` restores it on boot via `iptables-restore --noflush`. Pre-coold persistence scaffolding was removed from the CLI when it migrated to REST — same file format, different writer.
### Code layout
- `cmd/common/sshmesh.go` — shared SSH/mesh flag struct `SSHMeshFlags` (+ `BindSSHMeshFlags`, `BuildSSHClient`, `ParseSSHTimeout`, `ResolvePassphrase`, `Validate`).
- `cmd/common/meshnet.go` — shared namespace plumbing: `MeshNetFlags` (namespaces + container pool/prefix), `BindMeshNetMultiFlags` (init: many), `BindMeshNetSingleFlags` (firewall: one), `PodmanNetworkFor(ns)`, `ValidateNamespaces` / `ValidateNamespace`.
- `cmd/firewall/` — Cobra layer.
  - `firewall.go`: `NewFirewallCommand()` parent + subcommand registration.
  - `flags.go`: `FirewallFlags` embeds `common.SSHMeshFlags` + `Namespace` + `AllNamespaces` + `CooldToken` + `CooldPort` + `WGInterface`. `PodmanNetworkName()` derives the bridge name from `Namespace`. `ResolveCooldToken()` returns the override or `""` (meaning "fetch per host").
  - `allow.go`: `allowRevokeFlags`, `emitAllowRevoke` (discover → resolve → build rule with namespace → coold POST/DELETE per rule, resolving token per host).
  - `list.go`: `emitList` fans out `CooldList` via `CooldListAll`, forwarding the namespace query param (or omitting it under `--all-namespaces`).
  - `containers.go`: `containers` subcommand (still SSH+podman). Without `--all-namespaces`: single bridge. With `--all-namespaces`: SSH per host for `podman network ls --filter label=io.coolify.managed=true`, then per-namespace fanout.
  - `resolve.go`: `resolveEndpoint(ref, []Container)` (name / host:name / short-id / raw IP).
  - `helpers.go`: `discoverAllViaPkg`, `discoverAcrossNamespaces`, `discoverNamespacesOnHosts`, `tokenResolver` (per-host cached bearer-token closure).
- `internal/firewall/` — REST client + discovery.
  - `coold_client.go`: `FetchCooldToken`, `CooldApply`, `CooldRevoke`, `CooldList(… , namespace)`, `CooldListAll(… , namespace)`. `buildCurlAllow/Revoke/List`, `shellSingleQuote`, `mgmtIPScript` / `mgmtIPScriptSoft`. `cooldRulePayload` carries `namespace` (required on wire; empty normalized to `"default"`).
  - `discover.go`: `Container` (with `Namespace`), `discoverScript`, `DiscoverContainers(… , namespace, network)`, `DiscoverAll`, `DiscoverAllNamespaces` (fan-out over a `networkFor(ns)` mapper).
  - `rule.go`: `AllowRule` (with `Namespace`), `ComputeID(namespace, src, dst, proto, port)`.
- `internal/models/firewall.go` — table/JSON row types (`ContainerRow`, `AllowRuleRow`) both now carry a `Namespace` column.
- `internal/services/coold.go`: `EnsureCooldAPITokenCommand` (installer writes `/etc/coolify/api-token`, mode 0600), `CooldServiceUnit` emits `COOLD_API_BIND=<mgmt-ip>:8443` + `COOLD_API_TOKEN_FILE=/etc/coolify/api-token` + `COOLD_NAMESPACES=<ns>:<network>:<gateway-ip>,…`.
### Key invariants
- **Destination-host ownership**: every rule lives on exactly one host — the one whose `/24` contains the destination IP. `--bidirectional` adds the reverse rule on the src host.
- **coold is the only kernel writer**: the CLI never runs `iptables` or touches `/etc/coolify/allow.rules` directly. Everything flows through coold's REST API.
- **Per-host tokens by default**: each coold generates its own random token at install. `--coold-token` is an escape hatch for homogeneous test / CI environments, not the common path.
- **Bidirectional is opt-in**: conntrack ESTABLISHED accept (installed by `coolify-mesh-fw.service`) handles reply packets for client-initiated flows. Only set `--bidirectional` for protocols that actually open new connections in both directions.
- **Rule identity is hash, not UUID**: coold computes it server-side so CLI and any future writer agree on the same id for the same tuple.
- **Namespace is part of identity**: `cid = sha256(namespace|src|dst|proto|port)[:12]`. Same tuple in two namespaces = two distinct rules. Empty-string namespace normalizes to `"default"` on the wire so legacy coold peers keep working.
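- **Transient token exposure on remote `/proc`**: `curl -H "Authorization: Bearer $TOKEN"` is visible in `/proc/<curl-pid>/cmdline` for the roughly millisecond lifetime of the call. `cmdline` is readable by any local user unless `/proc` is mounted with `hidepid`, so this assumes hosts where only root has a shell. Acceptable for alpha; TLS + stdin-fed tokens are a follow-up.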
### Testing firewall
```bash
go test ./internal/firewall/... ./cmd/firewall/... ./cmd/common/... -v
```
Uses `fakeCooldRunner` / `cmdFakeRunner` pattern (substring → canned stdout map) — same as `cmd/init/plan_test.go`. All SSH calls mocked at the `ssh.Runner` boundary; no real SSH in unit tests. Token-fetch, mgmt-IP script, curl shape, JSON payload, and error propagation are all covered.
### End-to-end flow (verified on real hosts)
After `coolify init bootstrap --servers A,B --namespaces default,alpha ...` ran (coold must be up):
1. Baseline cross-host traffic DROPped by `COOLIFY-INTRA` in every namespace.
2. `coolify firewall containers --servers A,B --ssh-key KEY --all-namespaces` → discovery table columned by namespace.
3. `coolify firewall allow --servers A,B --ssh-key KEY --namespace default --from client --to web --port 80` → CLI SSH-fetches each host's token, POSTs to coold (body includes `"namespace":"default"`), traffic flows in the `default` namespace only.
4. Same tuple with `--namespace alpha` → separate cid, separate rule; doesn't affect `default`.
5. `coolify firewall list --servers A,B --ssh-key KEY --all-namespaces` → merged rules across every namespace on every host with their coold-assigned `cid:…` IDs.
6. `coolify firewall revoke --namespace <ns> …` → coold DELETE, rule gone, traffic DROPped again.
7. Reboot → `coolify-mesh-allow.service` (installed by coold) restores from `/etc/coolify/allow.rules`.
Add `--coold-token <hex>` only when every host was bootstrapped with the same token (CI fixtures, homogeneous test clusters).
## Testing Requirements
**CRITICAL: All code changes MUST include tests. This is non-negotiable.**
@@ -0,0 +1,759 @@
# Coolify v5 Control Plane — Server Management Spec
This document lists everything the Coolify v5 control plane must implement on top of the host provisioning performed by the `coolify init` subcommand tree (`bootstrap` for first install, `extend` for adding hosts, `upgrade` for bumping agent versions) to fully manage a fleet of mesh-connected hosts.
## Architecture overview
```
┌─────────────────────────────────────┐
│  Coolify central UI / API           │
│  - Multi-tenant (cloud) or 1-tenant │
│    (self-hosted); same binary       │
│  - WSS / gRPC bidi stream listener  │
│    on :443 (public)                 │
│  - Routes commands by host_id       │
└────────────────────▲────────────────┘
                     │ outbound TLS :443 (WSS / gRPC bidi)
                     │ long-lived, resumable, jittered reconnect
                     │ per-host JWT (issued at enroll)
   ┌─────────────────┴──────────────────┐
   │  (per-customer gateway,            │
   │   OPTIONAL — one mesh host         │
   │   proxies N coolds → 1 stream)     │
   └─────────────────▲──────────────────┘
                     │ same stream protocol, over wg0
┌────────────────────┴────────────────┐  ┌─────────────────────────┐
│  coold (per-host agent)             │  │  /run/podman/podman.sock│
│  - Dials central (or gateway) out   │──┤  bind-mount, host-only  │
│  - Local REST on wg0 :8443          │  │  (NEVER on network)     │
│    (intra-mesh callers: CLI, peers) │  └─────────────┬───────────┘
│  - Bearer-token authn (both paths)  │                │
│  - Talks ONLY to local podman sock  │                ▼
└─────────────────────────────────────┘ ┌─────────────────────────────┐
                                        │  podmand (containers, nets) │
                                        └─────────────────────────────┘
```
**Key principles**:
1. **`/run/podman/podman.sock` is never exposed on TCP.** coold bind-mounts it and proxies a curated API. Central Coolify never touches the raw podman socket directly.
2. **coold always dials outbound — never accepts inbound from central or public internet.** One topology for self-hosted and cloud SaaS. Works through any NAT/corp firewall, scales to thousands of hosts per central region (10k+ idle streams are cheap). No "add central to every customer's wg0" — central never joins any mesh.
3. **coold still exposes a local REST API on wg0 mgmt IP** for intra-mesh callers only (the `coolify firewall` CLI via SSH-bounce, other coolds in the same mesh, a per-customer gateway if deployed). Never reachable from public internet; wg0 is the only L3 boundary that can hit it.
4. **Per-customer gateway (optional)**: for large customers, one host in the mesh runs a stream aggregator that dials central once and proxies commands to the other coolds over wg0. Reduces stream fan-out at central from N-per-customer to 1-per-customer; adds one hop of latency. Transparent to both ends — same protocol each side.
## What `coolify init bootstrap` already provides
| Layer | Component | State |
|---|---|---|
| L3 mesh | WireGuard `wg0` per host with mgmt `/32` from `--wg-mgmt-pool` (default `100.64.0.0/16`) | Installed, configured, active |
| L3 mesh | Peer `AllowedIPs = <peer-mgmt>/32, <peer-container>/24` | Configured |
| Container runtime | Podman (distro apt) | Installed |
| Container runtime | `podman.socket` (rootful, `/run/podman/podman.sock`) | Enabled, active |
| Container network | `coolify-<ns>-mesh` bridge per namespace per host with `/24` from `--container-pool` (default `10.210.0.0/16`), gateway `.1` | Created |
| Routing | `net.ipv4.ip_forward=1` (persisted via `/etc/sysctl.d/99-coolify-mesh.conf`) | Enabled |
| Firewall (mode A, `--skip-default-deny`) | `coolify-mesh-fw.service` with FORWARD ACCEPT for container subnet + POSTROUTING RETURN to skip podman MASQUERADE on wg0 | Active when set |
| Firewall (mode B, default) | `COOLIFY-INTRA` chain (ESTABLISHED/RELATED accept → COOLIFY-ALLOW → DROP), FORWARD jumps for `-s/-d <container-subnet>`, blanket ACCEPT removed | Active by default |
| Allow chain | `COOLIFY-ALLOW` (empty filter chain) | Created, ready for runtime rules |
Each host has a stable `(mgmt-ip, container-subnet)` pair. The bootstrap is idempotent: re-running `bootstrap` only changes what drifted.
---
## What v5 control plane MUST implement
### 1. Inventory & state sync
- **Discovery**: query each host's `podman.socket` (over wg0 mgmt IP) for: containers, networks, volumes, images, system stats.
- **Drift detection**: periodically reconcile desired state (Coolify DB) against actual (podman API). Re-converge or alert.
- **Mesh join/leave**: when a host is added or removed from the cluster:
- Add → invoke `coolify init extend --servers <full list> --new-hosts <new host>` (installs the new host end-to-end, regenerates wg0 config on every existing peer with the new mgmt IP + namespace `/24`s, leaves agent binaries on existing hosts untouched).
- Remove → not supported by a first-class subcommand today. Documented workaround for alpha: tear the host down out-of-band (stop services, drop it from DNS) and re-run `coolify init bootstrap` with the reduced `--servers` list during a maintenance window; a dedicated `remove-host` flow is a follow-up.
### 2. Container lifecycle
Every container op is a command sent over coold's outbound stream (central → coold) or a local REST call on coold's wg0 listener (intra-mesh → coold). coold executes the command against the local `/run/podman/podman.sock` Unix socket and streams results back.
- Create container with `--network coolify-mesh` and explicit `--ip` from the host's `/24`.
- Reserve container IPs in the control plane DB. Allocator skips `.1` (bridge gateway), reserves `.2` for coold itself, `.3-.254` for app containers.
- Start, stop, restart, remove.
- Stream logs via `/containers/{id}/logs?follow=true` (coold relays podman API frames over the open control stream).
- Health checks via `/containers/{id}/healthcheck/run`.
- Resource limits, env vars, mounts, volumes, secrets — all standard podman API surfaced through coold.
#### coold is a primitive proxy, not an app brain
coold follows the **kubelet analogue**: it knows containers, images, volumes, networks, iptables, and Corrosion writes. It does **not** know apps, compose, Dockerfiles, buildpacks, or Nixpacks. Central Coolify is the apiserver+controllers: it parses app-level config and compiles it into a sequence of primitive ops streamed to coold.
Test for "should this live in coold?": could a second orchestrator (a Nomad-style competitor) reuse this coold with a different app model? If yes → coold. If no → central.
#### Wire surface (enumerable)
Same endpoint set on both transports (outbound stream from central, local REST on wg0 for intra-mesh callers). New verbs require a coold release — there is no `/podman/raw` passthrough.
```
# Images
POST /api/v1/images/pull {ref, auth?} -> {digest}
GET /api/v1/images -> [{ref, digest, size}]
DELETE /api/v1/images/{ref}
# Containers (filtered podman surface)
POST /api/v1/containers <create spec> -> {id}
POST /api/v1/containers/{id}/start
POST /api/v1/containers/{id}/stop {timeout?}
POST /api/v1/containers/{id}/restart
DELETE /api/v1/containers/{id} {force?}
GET /api/v1/containers/{id} (inspect)
GET /api/v1/containers/{id}/logs?follow=true (streamed)
POST /api/v1/containers/{id}/exec {cmd, tty?} (streamed)
POST /api/v1/containers/{id}/healthcheck/run
# Volumes
POST /api/v1/volumes {name, driver, labels}
DELETE /api/v1/volumes/{name}
GET /api/v1/volumes/{name}
# Networks (bootstrap creates coolify-mesh; extra per-app nets created here)
POST /api/v1/networks {name, driver, options, labels}
DELETE /api/v1/networks/{name}
GET /api/v1/networks
# Firewall (coold = sole writer)
POST /api/v1/firewall/allow {src, dst, proto?, port?} -> {id}
DELETE /api/v1/firewall/allow/{id}
GET /api/v1/firewall/allow
# Service endpoints (Corrosion writer; used by central to register deploys)
POST /api/v1/services/register
DELETE /api/v1/services/{id}/endpoints/{container_id}
GET /api/v1/services/{id}/endpoints
# DNS (diagnostics)
GET /api/v1/dns/lookup/{name}
GET /api/v1/dns/stats
# Host facts (read-only; central scrapes these for observability + scheduling)
GET /api/v1/host/info (podman info, kernel, wg state, load)
GET /api/v1/host/containers (podman ps -a)
GET /api/v1/host/stats (podman stats snapshot)
```
**Deny filter on `POST /containers`** (defense-in-depth even though central is trusted):
- Block `--privileged`, `--cap-add=SYS_ADMIN/NET_ADMIN` unless host is marked `allow_privileged=true`.
- Block host-path bind mounts outside a configurable allowlist (default: none).
- Block host netns (`--net=host`) unless the container is coold itself.
Anything not above is not coold's job. No `/apps`, `/deployments`, `/compose`, `/build`, `/podman/raw`. coold does not parse compose, Dockerfiles, buildpacks, or any app-level config — central compiles these into sequences of the primitive ops above and streams them down.
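The deny filter above can be sketched as a pure function that runs before coold ever touches `podman.sock`. This is a minimal sketch, not coold's actual code: `CreateSpec`, `HostPolicy`, `checkCreate`, and the idea of recognising coold by container name are all assumptions made for illustration.

```go
package main

import (
	"errors"
	"fmt"
	"path"
	"strings"
)

// CreateSpec is a hypothetical, trimmed-down container create request.
type CreateSpec struct {
	Name       string
	Privileged bool
	CapAdd     []string // e.g. "SYS_ADMIN" or "CAP_SYS_ADMIN"
	HostNetNS  bool     // true when the caller asks for --net=host
	BindMounts []string // host paths requested as bind mounts
}

// HostPolicy holds the per-host knobs the deny filter consults.
type HostPolicy struct {
	AllowPrivileged bool     // mirrors allow_privileged=true on the host
	BindAllowlist   []string // permitted host-path prefixes; empty means none
}

// checkCreate returns the first deny-filter violation, or nil if the spec passes.
func checkCreate(spec CreateSpec, pol HostPolicy) error {
	if !pol.AllowPrivileged {
		if spec.Privileged {
			return errors.New("privileged containers are not allowed on this host")
		}
		for _, c := range spec.CapAdd {
			switch strings.TrimPrefix(strings.ToUpper(c), "CAP_") {
			case "SYS_ADMIN", "NET_ADMIN":
				return fmt.Errorf("capability %s is not allowed on this host", c)
			}
		}
	}
	// Assumption: coold is recognised by its container name.
	if spec.HostNetNS && spec.Name != "coold" {
		return errors.New("host network namespace is reserved for coold")
	}
	for _, p := range spec.BindMounts {
		if !underAllowlist(p, pol.BindAllowlist) {
			return fmt.Errorf("bind mount %s is outside the allowlist", p)
		}
	}
	return nil
}

// underAllowlist reports whether the cleaned host path equals or sits below an allowed prefix.
func underAllowlist(hostPath string, allow []string) bool {
	clean := path.Clean(hostPath)
	for _, a := range allow {
		a = path.Clean(a)
		if clean == a || strings.HasPrefix(clean, strings.TrimSuffix(a, "/")+"/") {
			return true
		}
	}
	return false
}

func main() {
	spec := CreateSpec{Name: "web", BindMounts: []string{"/srv/data/../../etc"}}
	fmt.Println(checkCreate(spec, HostPolicy{BindAllowlist: []string{"/srv/data"}}))
}
```

The path check is purely lexical (`path.Clean`), so a real implementation would also need to decide how to treat symlinks on the host.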
#### Networks
Default = shared `coolify-mesh` bridge. Containers get `.coolify.internal` DNS + flat L3 across the mesh. Users may define extra podman networks per app (docker-compose `networks:` style) via `POST /networks` + container attach on create. Central compiles compose into network-create + container-attach primitives.
#### coold deployment
coold runs as a privileged container on each host (or as a host systemd service). `coolify init bootstrap` puts it in place at install time (and `coolify init upgrade` bumps its version later): binary, systemd unit with `COOLD_API_BIND=<wg0-mgmt-ip>:8443`, random per-host bearer token at `/etc/coolify/api-token` (mode 0600), outbound stream config written atomically to `/etc/coolify/coold.env`.
Reference container spec (equivalent to systemd-service deployment):
```bash
podman run -d --name coold --restart=always \
--network coolify-mesh --ip 10.210.X.2 \
-v /run/podman/podman.sock:/run/podman/podman.sock \
-v /etc/coolify/coold:/etc/coolify/coold:ro \
--security-opt label=disable \
-p 100.64.0.X:8443:8443 \
ghcr.io/coollabs/coold:latest
```
- **Outbound stream**: coold dials `wss://<central-host>/v1/agent` (or gRPC bidi) on start, presenting its per-host JWT. Central routes commands to it by host id over the open stream. Stream is the primary control channel for both self-hosted and cloud SaaS — same code path, same binary.
- **Local REST on wg0 mgmt IP (`100.64.0.X:8443`)**: accepts intra-mesh callers only (the `coolify firewall` CLI via SSH-bounce, other coolds in the same mesh, a per-customer gateway). Not reachable from public internet — wg0 is the L3 boundary. Bearer-token auth on every request.
- **No inbound from central**: central never dials coold. All mutations arrive over the coold-initiated stream; no `COOLIFY-ALLOW` rule for "central → host:8443" needed. Works through NAT/corp firewalls.
#### Control channel transport (stream)
Two candidates; spec-time decision, not per-host:
| Option | Pros | Cons |
|---|---|---|
| **gRPC bidi stream over HTTP/2** *(chosen)* | typed Protobuf schemas, native server-streaming for logs/exec, versionable wire | stricter proxy requirements (some corp proxies still mangle HTTP/2); larger runtime |
| WebSocket (WSS over :443) *(fallback)* | traverses every proxy, tiny overhead, libs everywhere | framing is custom-on-top; manual request/response correlation |
**Decision: gRPC bidi + Protobuf.** Typed schemas + native server-streaming for logs and exec outweigh the proxy risk; WSS remains the documented fallback if gRPC-through-proxy issues show up in the field. Both run on :443, so customer-side egress rules stay unchanged either way.
#### Enrollment
coold registers once at install using a one-time token from central:
```bash
coolify init bootstrap \
--central-url https://cloud.coolify.io \
--enroll-token <one-time-hex>
```
1. coold POSTs `(host_id, wg0_mgmt_ip, container_subnet, enroll_token)` to `https://<central>/v1/enroll`.
2. Central validates the enroll token (scoped to a tenant, single-use, short TTL) and issues a long-lived per-host JWT + TLS-pinned central cert. Response stored in `/etc/coolify/coold.env` (mode 0600).
3. coold burns the enroll token and switches to JWT for the persistent stream.
4. Central revokes by invalidating the JWT in its own DB; next stream reconnect fails auth and the host is quarantined until re-enrolled.
#### Reconnect + fleet-restart storms
A single central restart would otherwise trigger simultaneous reconnects from every host. Mitigations:
- **Jittered backoff**: exponential from 1s up to 60s with full jitter, so 10k reconnecting hosts spread out over minutes rather than seconds.
- **Resumable streams**: stream carries a monotonic `last_seq` per host so central can replay missed commands after reconnect without central-side queueing beyond an in-memory ring buffer.
- **Region sharding**: DNS round-robin or geo-steering across multiple central stream gateways; each gateway holds O(10k) streams. Stateful routing via consistent-hashing on host_id so a host lands on the same gateway across reconnects (cache affinity).
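The jittered backoff in the first bullet can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and it assumes "full jitter" means a uniform pick between zero and the exponential ceiling.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

const (
	baseDelay = 1 * time.Second
	maxDelay  = 60 * time.Second
)

// backoffCeiling returns the un-jittered cap for a 0-based attempt:
// 1s, 2s, 4s, ... clamped to 60s.
func backoffCeiling(attempt int) time.Duration {
	d := baseDelay
	for i := 0; i < attempt && d < maxDelay; i++ {
		d *= 2
	}
	if d > maxDelay {
		d = maxDelay
	}
	return d
}

// fullJitter picks a uniformly random delay in [0, ceiling] for the attempt.
func fullJitter(attempt int) time.Duration {
	return time.Duration(rand.Int63n(int64(backoffCeiling(attempt)) + 1))
}

func main() {
	for attempt := 0; attempt < 8; attempt++ {
		fmt.Printf("attempt %d: ceiling=%v delay=%v\n",
			attempt, backoffCeiling(attempt), fullJitter(attempt))
	}
}
```

Because each host draws its delay independently, reconnects after a central restart are smeared across the whole window instead of arriving in synchronized waves.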
#### Per-customer gateway (optional)
For customers with 50+ hosts, one designated mesh host runs a **gateway mode coold** (same binary, different role):
- Dials central like any other coold.
- Accepts incoming streams from its peer coolds over wg0 (they dial `wss://<gateway-mgmt-ip>:8443/v1/agent-peer` instead of central).
- Relays commands down, responses up. Maintains O(hosts-in-mesh) inbound streams + 1 outbound to central.
Saves N-1 WAN streams at central per customer; costs one hop of latency + one more thing to keep alive. Opt-in via `coolify init bootstrap --gateway-for-mesh` on the chosen host; peers get `--via-gateway <gateway-mgmt-ip>` at install.
### 3. Network policy (firewall)
When a host has `--default-deny` enabled, **all cross-host container traffic is dropped by default**. The control plane decides who talks to whom.
#### Division of labour: bootstrap vs coold vs central
| Layer | Owner | Responsibility |
|---|---|---|
| Chain scaffold (COOLIFY-INTRA, COOLIFY-ALLOW, FORWARD jumps, conntrack early-accept, POSTROUTING RETURN) | `coolify init bootstrap` (also reconverges on `extend`) | Install + idempotently re-converge on flag change. Never touches individual allow rules. |
| Rule metadata (who/when/why, audit log, RBAC, tenant scoping, app→rule mapping) | **Coolify central DB** | Authoritative store. All rich queries, audit trails, and access control live here. |
| Raw rule tuples `(src, dst, proto, port)` on the host | **coold** (single writer) | Apply to kernel + snapshot to `/etc/coolify/allow.rules` for reboot. Stateless-ish — just a cache of what the caller (central Coolify or `coolify firewall` CLI) told it to apply. No metadata, no DB. |
**Key split**: central Coolify owns rich state (metadata, audit, RBAC). Per-host coold owns only the raw rules needed to program the kernel + survive reboot. This keeps coold small and lets a single central DB be the source of truth for all cross-cutting concerns.
**App-topology compilation happens in central.** coold applies the rule tuples it is told to apply; it does not generate rules from app intent (e.g. "allow service `web` → `db`"). Central compiles that from the app model and sends individual `POST /firewall/allow` frames.
**`coolify init` is intentionally not the rule store.** Bootstrap creates the empty allow chain. coold is the sole writer into it. Callers reach coold via two paths: (a) central Coolify over the coold-initiated outbound stream, (b) intra-mesh callers (`coolify firewall` CLI via SSH-bounce, other coolds, optional per-customer gateway) via coold's local REST API on wg0 mgmt IP.
#### Reboot persistence
Works the same pre- and post-coold because both use the same file format:
- `/etc/coolify/allow.rules` — filter-table fragment, `:COOLIFY-ALLOW` + `-A COOLIFY-ALLOW` lines only. Written atomically (`.tmp` + `mv`) on every rule change.
- `/etc/systemd/system/coolify-mesh-allow.service`: `Type=oneshot`, `After=coolify-mesh-fw.service`, `Wants=coolify-mesh-fw.service`. `ExecStart=iptables-restore --noflush /etc/coolify/allow.rules`. `--noflush` means only `COOLIFY-ALLOW` is populated; nothing else is disturbed.
coold owns the file: it rewrites `/etc/coolify/allow.rules` on every successful API mutate, keeping it in sync with the live kernel. The `coolify firewall` CLI never touches the file — it POSTs/DELETEs through coold and coold handles persistence + systemd unit install. One writer, one format.
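A minimal sketch of the snapshot write described above: render the rule tuples into an `iptables-restore` fragment and replace the file atomically. The `Rule` type and function names are hypothetical. The sketch assumes the fragment is wrapped in `*filter` and `COMMIT`, which `iptables-restore` expects around the chain and rule lines, and the `0600` file mode is likewise an assumption.

```go
package main

import (
	"fmt"
	"os"
	"path/filepath"
	"strings"
)

// Rule is a hypothetical in-memory form of one allow tuple.
type Rule struct {
	Src, Dst string
	Proto    string // optional, e.g. "tcp"
	Port     int    // optional; only emitted together with Proto
}

// renderAllowRules builds the COOLIFY-ALLOW fragment for iptables-restore.
func renderAllowRules(rules []Rule) string {
	var b strings.Builder
	b.WriteString("*filter\n:COOLIFY-ALLOW - [0:0]\n")
	for _, r := range rules {
		fmt.Fprintf(&b, "-A COOLIFY-ALLOW -s %s -d %s", r.Src, r.Dst)
		if r.Proto != "" {
			fmt.Fprintf(&b, " -p %s", r.Proto)
			if r.Port != 0 {
				fmt.Fprintf(&b, " --dport %d", r.Port)
			}
		}
		b.WriteString(" -j ACCEPT\n")
	}
	b.WriteString("COMMIT\n")
	return b.String()
}

// writeAtomic writes data to target+".tmp", syncs it, then renames it over target.
func writeAtomic(target string, data []byte) error {
	tmp := target + ".tmp"
	f, err := os.OpenFile(tmp, os.O_WRONLY|os.O_CREATE|os.O_TRUNC, 0o600)
	if err != nil {
		return err
	}
	if _, err := f.Write(data); err != nil {
		f.Close()
		return err
	}
	if err := f.Sync(); err != nil {
		f.Close()
		return err
	}
	if err := f.Close(); err != nil {
		return err
	}
	return os.Rename(tmp, target)
}

func main() {
	out := renderAllowRules([]Rule{{Src: "10.210.0.10", Dst: "10.210.5.42", Proto: "tcp", Port: 5432}})
	fmt.Print(out)

	dir, err := os.MkdirTemp("", "coolify")
	if err != nil {
		fmt.Println(err)
		return
	}
	defer os.RemoveAll(dir)
	if err := writeAtomic(filepath.Join(dir, "allow.rules"), []byte(out)); err != nil {
		fmt.Println(err)
	}
}
```

A rename within the same directory is what makes the swap atomic: a reader, including the boot-time unit, sees either the old file or the new one, never a half-written one.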
#### Allow-rule lifecycle
For an allow `(srcIP, dstIP)`:
- Add ACCEPT to `COOLIFY-ALLOW` on the host that **owns dstIP** (where DROP would otherwise fire).
- For bidirectional traffic (e.g. TCP, ICMP echo+reply), add the reverse `(dstIP, srcIP)` on the host that owns srcIP. (Reply packets traverse THAT host's FORWARD chain when arriving back, and dst-side check fires there.)
- **One unidirectional allow = one rule on one host. One bidirectional allow = two rules on two hosts.**
- Conntrack ESTABLISHED early-accept (installed by bootstrap) handles in-flow follow-up packets — no need to add per-packet rules.
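The placement rule above (the rule lives on the host that owns the destination, and a bidirectional allow adds the mirrored rule on the source's host) can be sketched as a small function central might run before emitting `POST /firewall/allow` frames. The names `Placement`, `ownerOf`, and `placeAllow` are hypothetical, and the host-to-subnet map is assumed to come from central's inventory.

```go
package main

import (
	"fmt"
	"net/netip"
)

// Placement means: install ACCEPT for Src -> Dst in COOLIFY-ALLOW on Host.
type Placement struct {
	Host, Src, Dst string
}

// ownerOf returns the host whose container subnet contains ip.
func ownerOf(ip string, subnets map[string]string) (string, error) {
	addr, err := netip.ParseAddr(ip)
	if err != nil {
		return "", err
	}
	for host, cidr := range subnets {
		pfx, err := netip.ParsePrefix(cidr)
		if err != nil {
			return "", err
		}
		if pfx.Contains(addr) {
			return host, nil
		}
	}
	return "", fmt.Errorf("no host owns %s", ip)
}

// placeAllow expands one logical allow into the per-host rules it needs:
// one rule on the dst owner, plus the mirrored rule on the src owner if bidirectional.
func placeAllow(src, dst string, bidirectional bool, subnets map[string]string) ([]Placement, error) {
	dstHost, err := ownerOf(dst, subnets)
	if err != nil {
		return nil, err
	}
	out := []Placement{{Host: dstHost, Src: src, Dst: dst}}
	if bidirectional {
		srcHost, err := ownerOf(src, subnets)
		if err != nil {
			return nil, err
		}
		out = append(out, Placement{Host: srcHost, Src: dst, Dst: src})
	}
	return out, nil
}

func main() {
	subnets := map[string]string{"host-1": "10.210.0.0/24", "host-3": "10.210.5.0/24"}
	rules, err := placeAllow("10.210.0.10", "10.210.5.42", true, subnets)
	fmt.Println(rules, err)
}
```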
#### Persistence + scale model
Per-rule systemd dropins do NOT scale (1000 rules × `daemon-reload` + restart = minutes, fs clutter, audit nightmare). Instead, coold is a thin rule-applier backed by central:
```
coold service (per host)
├─ Snapshot file: /etc/coolify/allow.rules (flat iptables-save fragment)
├─ Boot: systemd unit runs iptables-restore --noflush from file
├─ API mutate: apply iptables -A/-D → regen snapshot via iptables-save
└─ Reconcile: central periodically diffs its DB vs coold's live
`iptables -S COOLIFY-ALLOW`; pushes deltas to re-converge
```
Source of truth for **the set of rules that should exist** = central Coolify DB. Source of truth for **what's programmed in the kernel right now** = kernel itself, mirrored to `/etc/coolify/allow.rules` for reboot. coold does not keep its own DB.
#### Write ordering (crash/reboot safety)
Every mutating call from central → coold follows this sequence:
1. **Central writes to its own DB first** (with its own audit/tenant metadata). Durable with the rest of Coolify's state.
2. **Central sends command over the open stream** to coold with just `(src, dst, proto, port)`. No inbound connection to coold — the stream was already established by coold at boot.
3. **coold applies `iptables -A/-D`** to kernel.
4. **coold regenerates `/etc/coolify/allow.rules`** via `iptables-save` (atomic `.tmp` + `mv`).
5. **coold returns success to central** over the same stream (response carries the request id).
6. **On any failure in 3-5**, central marks the row "pending" in its DB and retries / surfaces to operator. Nothing is lost because step 1 is already durable.
Consequences:
- **Crash between steps 3 and 4** → kernel has the rule, file doesn't. Reboot loses the rule. Central's reconcile loop detects divergence (its DB has the rule, live kernel doesn't after boot) and re-pushes. Safe, with a small drift window bounded by reconcile cadence.
- **Crash between steps 4 and 5** → kernel + file both updated, but central didn't get the ack. Central retries; `iptables -C` guard makes the retry a no-op. Safe.
- **coold down when central wants to mutate** → central queues the change and retries on reconnect. No state loss on either side.
- **Central DB is authoritative** — a reboot can only *shrink* the live rule set compared to central's view, never grow it.
Bulk ops (`/bulk`) ship the whole batch in one REST call. coold applies via `iptables-restore --noflush` / `nft -f` (atomic transaction), then regens snapshot once.
Apply paths:
| Backend | Bulk apply (1000 rules) | Atomicity |
|---|---|---|
| `iptables -A` per rule | ~5s | per-rule |
| `iptables-restore --noflush` (preferred for iptables-legacy) | ~50ms | per-batch |
| `nft -f /tmp/rules.nft` (preferred when host uses nftables backend) | ~10ms | atomic transaction |
coold detects backend (`iptables --version` or presence of nftables socket) and picks. Bootstrap doesn't care.
For **systemctl restart coolify-mesh-fw.service** (e.g. a `coolify init bootstrap` re-run after a flag flip, or `coolify init extend` reinstalling the unit because the namespace list changed): the unit flushes COOLIFY-INTRA but **never flushes COOLIFY-ALLOW** — existing rules survive. If somehow lost (manual `iptables -F COOLIFY-ALLOW`, crash mid-write), central's reconcile loop compares its own DB against `iptables -S COOLIFY-ALLOW` from each host and re-pushes any missing tuples within the reconcile interval.
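The reconcile comparison can be sketched as a set difference between central's desired tuples and the tuples parsed from a host's `iptables -S COOLIFY-ALLOW`. This is a sketch under assumptions: `Tuple` and `diff` are hypothetical names, parsing the `iptables -S` output is left out, and whether central removes unexpected extra rules (rather than only re-pushing missing ones) is not stated in this document.

```go
package main

import "fmt"

// Tuple is one raw allow rule as central and coold exchange it.
type Tuple struct {
	Src, Dst, Proto string
	Port            int
}

// diff returns the tuples central should re-push (in desired but not live)
// and the tuples present on the host that central does not know about.
func diff(desired, live []Tuple) (missing, extra []Tuple) {
	have := make(map[Tuple]bool, len(live))
	for _, t := range live {
		have[t] = true
	}
	want := make(map[Tuple]bool, len(desired))
	for _, t := range desired {
		if !want[t] && !have[t] {
			missing = append(missing, t)
		}
		want[t] = true
	}
	seen := make(map[Tuple]bool, len(live))
	for _, t := range live {
		if !seen[t] && !want[t] {
			extra = append(extra, t)
		}
		seen[t] = true
	}
	return missing, extra
}

func main() {
	db := []Tuple{
		{Src: "10.210.0.10", Dst: "10.210.5.42", Proto: "tcp", Port: 5432},
		{Src: "10.210.0.11", Dst: "10.210.5.42", Proto: "tcp", Port: 5432},
	}
	live := []Tuple{{Src: "10.210.0.10", Dst: "10.210.5.42", Proto: "tcp", Port: 5432}}
	missing, extra := diff(db, live)
	fmt.Println("re-push:", missing, "unexpected:", extra)
}
```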
#### Allow API surface
Same method/path set is served on both transports — stream (central → coold) and local REST (intra-mesh → coold). Stream = JSON-RPC frames carrying the same `(method, path, body)` tuple; REST = plain HTTP on wg0 mgmt IP :8443.
```
POST /api/v1/firewall/allow {src, dst, proto?, port?, comment?} → returns id
DELETE /api/v1/firewall/allow/{id}
GET /api/v1/firewall/allow list
GET /api/v1/firewall/allow/{id} show + match counters
POST /api/v1/firewall/allow/bulk {add: [...], remove: [...]} atomic batch
POST /api/v1/firewall/reconcile force full reload
```
coold translates each row into the right iptables/nft fragment. Per-port: `-p tcp --dport <N>`. Source/dest IP, CIDR, or set reference (for grouping like "all-frontend-ips").
For very large rule sets: use **nftables sets** so a rule references a set name, and the set membership changes are O(1):
```
nft add element ip filter coolify_allowed_pairs { 10.210.0.10 . 10.210.1.10 }
```
One static rule like `ct state new ip saddr . ip daddr @coolify_allowed_pairs accept` evaluates as a single set lookup instead of a linear walk over n rules, so per-packet cost stays close to flat as the set grows. coold maintains the set rather than thousands of rules. Optional optimization for v5+.
#### Intra-host isolation (NOT enforced by `--default-deny`)
Linux + netavark + Ubuntu 24.04: bridge L2 traffic bypasses iptables FORWARD even with `bridge-nf-call-iptables=1`. **Containers on the same host's `coolify-mesh` bridge can always reach each other.**
Two paths for v5 to enforce intra-host isolation:
- **(Recommended) Per-app podman networks**: each Coolify service = own podman network with `--opt isolate=true`. Different networks can't talk by default; use `podman network connect` for cross-app.
  - Trade-off: each network needs its own `/24` from container pool → wastes pool. Or carve `/27`s (allocator extension needed).
- **(Alternative) ebtables L2 filter**: `ebtables --logical-in podman1 --logical-out podman1 --ip-src X --ip-dst Y -j ACCEPT/DROP`. Independent toolchain, separate persistence. Bridge name discovery needed.
v1 ships without intra-host enforcement. v5 picks one path.
### 4. Container IP allocation per host
The bootstrap gives each host a `/24` (e.g. `10.210.0.0/24`). The control plane:
- Reserves `.1` (bridge gateway, skip).
- Allocates `.2-.254` for containers, deduplicated against running `podman ps` IPs.
- Pins IP via `podman run --ip <IP>` so DNS/firewall rules stay stable.
- Detects exhaustion early; alerts user to grow `--container-pool` or `--container-prefix`.
For `/24` per host: 253 usable addresses (`.2`-`.254`), which leaves 252 for app containers when `.2` is reserved for coold (see §2). For higher density: re-bootstrap with `--container-prefix 23` or a larger pool.
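A minimal allocator sketch for the rules in this section. It follows the §2 split (skip `.1` for the gateway, `.2` for coold, hand out `.3`-`.254`); the function name, the `inUse` map (assumed to be the union of the control plane DB reservations and running `podman ps` IPs), and the lowest-free-address policy are assumptions.

```go
package main

import (
	"errors"
	"fmt"
	"net/netip"
)

// ErrExhausted signals that the host /24 has no free app-container address left.
var ErrExhausted = errors.New("container subnet exhausted")

// nextFreeIP returns the lowest free app-container address in a host /24.
// It never returns .0, .1 (bridge gateway), .2 (coold), or .255.
func nextFreeIP(subnet string, inUse map[string]bool) (string, error) {
	pfx, err := netip.ParsePrefix(subnet)
	if err != nil {
		return "", err
	}
	if !pfx.Addr().Is4() || pfx.Bits() != 24 {
		return "", fmt.Errorf("expected an IPv4 /24, got %s", subnet)
	}
	base := pfx.Masked().Addr().As4()
	for last := 3; last <= 254; last++ {
		ip := netip.AddrFrom4([4]byte{base[0], base[1], base[2], byte(last)}).String()
		if !inUse[ip] {
			return ip, nil
		}
	}
	return "", ErrExhausted
}

func main() {
	inUse := map[string]bool{"10.210.0.3": true, "10.210.0.4": true}
	ip, err := nextFreeIP("10.210.0.0/24", inUse)
	fmt.Println(ip, err)
}
```

Returning a dedicated `ErrExhausted` gives the control plane a clear signal for the "grow `--container-pool` or `--container-prefix`" alert.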
### 5. Service discovery
**Pattern**: embedded DNS server in coold, backed by [Corrosion](https://github.com/superfly/corrosion) (CRDT sqlite gossiped via SWIM across the mesh). No env injection. No container restarts on backend movement.
#### Why DNS-via-coold over alternatives
| Approach | Stable target? | Backend move = restart? | Complexity |
|---|---|---|---|
| Env injection (`DB_HOST=10.210.5.42`) | no — IP changes | yes (rolling redeploy on every change) | medium (template engine + dep graph) |
| **Embedded DNS in coold** | **yes (hostname)** | **no** | **low (~200 LoC)** |
| VIP per service | yes (IP) | no | high (keepalived/BGP/IPVS) |
| Per-host HTTP/TCP proxy | yes (port) | no | medium (proxy config) |
DNS chosen: smallest moving parts, works for any protocol, standard `getaddrinfo()` path, ubiquitous client support.
#### Corrosion schema (replicated sqlite)
```sql
CREATE TABLE services (
id TEXT PRIMARY KEY, -- "myapp.db"
coolify_app_id TEXT NOT NULL,
name TEXT NOT NULL, -- "db"
namespace TEXT NOT NULL, -- "myapp"
port INTEGER, -- canonical port (informational)
updated_at INTEGER NOT NULL -- ms epoch (CRDT clock)
);
CREATE TABLE service_endpoints (
service_id TEXT NOT NULL,
container_id TEXT NOT NULL,
host_mgmt_ip TEXT NOT NULL, -- 100.64.0.X (host running the container)
container_ip TEXT NOT NULL, -- 10.210.X.Y
healthy INTEGER NOT NULL,
updated_at INTEGER NOT NULL,
PRIMARY KEY (service_id, container_id)
);
```
Each coold writes its own host's container facts. Reads are local sqlite (sub-ms). Gossip handles distribution; convergence ~1s in small clusters.
#### Embedded DNS server
```go
// pseudocode, ~200 LoC total. The dns.* helpers, c.upstream and c.corrosion
// are placeholders for whatever DNS and Corrosion client libraries coold uses.
func (c *Coold) serveDNS() error {
	pc, err := net.ListenPacket("udp", "10.210.X.1:53") // bridge gateway IP
	if err != nil {
		return err
	}
	defer pc.Close()
	for {
		buf := make([]byte, 512) // classic UDP DNS message size
		n, addr, err := pc.ReadFrom(buf)
		if err != nil {
			return err
		}
		go c.handle(buf[:n], addr, pc)
	}
}

func (c *Coold) handle(query []byte, src net.Addr, pc net.PacketConn) {
	msg, err := dns.Unpack(query)
	if err != nil || len(msg.Questions) == 0 {
		return // drop malformed or empty queries instead of panicking
	}
	name := msg.Questions[0].Name // "myapp.db.coolify.internal."
	if !strings.HasSuffix(name, ".coolify.internal.") {
		// Forward to upstream (configurable; default 1.1.1.1).
		pc.WriteTo(c.upstream.Query(msg), src)
		return
	}
	serviceID := strings.TrimSuffix(name, ".coolify.internal.")
	var ips []string
	c.corrosion.Query(`
		SELECT container_ip FROM service_endpoints
		WHERE service_id = ? AND healthy = 1
	`, serviceID).Scan(&ips)
	if len(ips) == 0 {
		pc.WriteTo(dns.NXDOMAIN(msg), src)
		return
	}
	pc.WriteTo(dns.AnswerA(msg, ips, 5), src) // TTL = 5s
}
```
Listens on **bridge gateway IP** (`10.210.X.1:53`) of the host's `coolify-mesh` bridge — reachable from every container in the host's `/24` via standard kernel routing.
#### Container creation hook
Every container coold creates gets:
```
podman run --dns 10.210.X.1 --dns-search coolify.internal ...
```
App code uses short names: `getaddrinfo("myapp.db", ...)` → libc appends search suffix → `myapp.db.coolify.internal` → coold answers from local Corrosion.
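The name handling on coold's side can be sketched in isolation: take the fully qualified name from the query, decide whether it belongs to the cluster zone, and derive the `service_id` used for the Corrosion lookup. `serviceIDFor` is a hypothetical helper; the lower-casing reflects that DNS names compare case-insensitively.

```go
package main

import (
	"fmt"
	"strings"
)

// clusterSuffix is the zone coold answers for; the text notes it is configurable per cluster.
const clusterSuffix = ".coolify.internal."

// serviceIDFor maps a query name to a service_id.
// ok is false when the name is outside the cluster zone and should be forwarded upstream.
func serviceIDFor(qname string) (id string, ok bool) {
	name := strings.ToLower(qname)
	if !strings.HasSuffix(name, ".") {
		name += "." // normalise to a fully qualified name
	}
	if !strings.HasSuffix(name, clusterSuffix) {
		return "", false
	}
	id = strings.TrimSuffix(name, clusterSuffix)
	if id == "" {
		return "", false
	}
	return id, true
}

func main() {
	for _, q := range []string{"myapp.db.coolify.internal.", "example.com."} {
		id, ok := serviceIDFor(q)
		fmt.Printf("%q -> id=%q cluster=%v\n", q, id, ok)
	}
}
```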
#### Resolution flow
```
1. App in container A on host-1 (10.210.0.10) calls getaddrinfo("myapp.db")
2. libc reads /etc/resolv.conf:
nameserver 10.210.0.1
search coolify.internal
3. UDP query "myapp.db.coolify.internal" → 10.210.0.1:53
4. coold@host-1 reads local Corrosion → 10.210.5.42 (running on host-3)
5. Reply: A 10.210.5.42, TTL=5
6. App opens TCP to 10.210.5.42:5432
7. Routed via wg0 (peer host-3's AllowedIPs covers 10.210.5.0/24)
→ bridge → container
8. (If --default-deny is on, COOLIFY-ALLOW on host-3 must permit
10.210.0.10 → 10.210.5.42.)
```
#### Backend movement (zero restart on dependents)
```
T+0: myapp.db @ 10.210.5.42 on host-3. Endpoint row gossiped.
T+10s: User redeploys myapp.db on host-3.
coold@host-3:
- new container at 10.210.5.43
- INSERT new endpoint row (10.210.5.43)
- DELETE old endpoint row (10.210.5.42)
- kill old container
Corrosion gossips delta.
T+11s: All hosts have updated state.
T+15s: App on host-1 has stale TCP to 10.210.5.42 — broken when old container died.
App's reconnect logic re-resolves myapp.db → 10.210.5.43 → reconnects.
App container NEVER restarted, env NEVER changed.
```
App must have reconnect logic (every reasonable DB/cache client does). DNS provides the new IP transparently.
#### TTL
5s. Trade-off:
- Lower = faster failover, more queries.
- Higher = quieter DNS, slower failover.
Apps with infinite-cache resolvers (Java's `networkaddress.cache.ttl=-1`) won't see updates. Document for users; not coold's problem.
#### Multi-replica services
Resolver returns ALL healthy A records. Apps with proper conn pools (postgres, redis clients) handle multi-target naturally. No client-side LB protocol needed.
#### Health & staleness
- coold marks `healthy=0` on healthcheck fail. DNS stops returning that IP within next query.
- Stale-row TTL: rows older than 60s without heartbeat are pruned (owning coold heartbeats every 15s).
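The two rules above combine into one filter on the read path: an endpoint is answerable only if it is marked healthy and its row was refreshed within the stale-row window. A minimal sketch, assuming `updated_at` is the millisecond epoch from the schema and that the 60s cutoff is applied at query time as well as by the pruner; `Endpoint` and `answerable` are hypothetical names.

```go
package main

import (
	"fmt"
	"time"
)

// staleAfterMs is the stale-row TTL from the text (60s), in milliseconds.
const staleAfterMs = 60_000

// Endpoint mirrors the columns of service_endpoints that matter for answering.
type Endpoint struct {
	ContainerIP string
	Healthy     bool
	UpdatedAtMs int64 // ms epoch, as in the Corrosion schema
}

// answerable returns the IPs DNS may return at nowMs:
// healthy endpoints whose last heartbeat is no older than staleAfterMs.
func answerable(eps []Endpoint, nowMs int64) []string {
	var ips []string
	for _, ep := range eps {
		if ep.Healthy && nowMs-ep.UpdatedAtMs <= staleAfterMs {
			ips = append(ips, ep.ContainerIP)
		}
	}
	return ips
}

func main() {
	now := time.Now().UnixMilli()
	eps := []Endpoint{
		{ContainerIP: "10.210.5.42", Healthy: true, UpdatedAtMs: now - 15_000},
		{ContainerIP: "10.210.5.44", Healthy: true, UpdatedAtMs: now - 61_000},
	}
	fmt.Println(answerable(eps, now))
}
```

With a 15s heartbeat, a row has to miss several consecutive heartbeats before it ages out, which tolerates a dropped gossip round without flapping.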
#### TLD
`.coolify.internal`. The `.internal` TLD is reserved for private use (by ICANN, in 2024; it is not part of the RFC 6761 special-use list), so it won't collide with public TLDs. Configurable per-cluster.
#### Failure modes
| Failure | Behaviour |
|---|---|
| coold dies | Cluster DNS resolution stops. systemd restarts coold (~3s). Existing connections survive. Same profile as k8s losing CoreDNS. |
| Corrosion split-brain | Each partition serves local view; CRDT merges cleanly when partition heals. May serve stale IPs during partition. |
| Backend healthy in DB but unreachable | DNS returns IP → app connection fails → app retries. If multi-replica, may pick different one on retry. |
| Container has no `--dns` (created outside coold) | No cluster resolution. Document: only coold-managed containers get discovery. |
| Cross-region high latency | Slower convergence; stale DNS for 10-30s. Acceptable for v1. |
#### API surface
Same dual-transport model as the firewall API — stream from central, REST from intra-mesh callers.
```
POST /api/v1/services/register {service_id, app_id, name, namespace, port, container_id, container_ip, host_mgmt_ip}
DELETE /api/v1/services/{service_id}/endpoints/{container_id}
GET /api/v1/services/{service_id}/endpoints
GET /api/v1/services?namespace=myapp
GET /api/v1/dns/lookup/{name} (debug — what coold would answer)
GET /api/v1/dns/stats (qps, hit/miss/forward counts)
```
Most ops are automatic side effects of deploy/scale/health-check. Central rarely calls `/services/register` directly — coold registers on container create, deregisters on remove.
coold writes Corrosion rows on behalf of central (explicit `POST /services/register` frames); it does not infer service identity from container labels. Central supplies `service_id` explicitly so naming policy stays in one place.
#### Bootstrap impact
Minimal. `coolify init bootstrap` creates every `coolify-<ns>-mesh` Podman network with `--disable-dns` so netavark never starts aardvark-dns on the bridge gateway `:53`. coold owns that socket. Bridge gateway IP was always reserved by `MachineIP()`.
Pre-alpha deployments that created the network without `--disable-dns` are detected at plan-time (probe reads `podman network inspect .DNSEnabled`). A `recreate-podman-network` action drops and recreates the network: same subnet, same gateway, but with DNS disabled. Because the drop uses `podman network rm -f`, any containers still attached to the network are stopped and removed, not merely disconnected.
#### Port 53 conflict handling
Three layers protect coold's `10.210.X.1:53` socket:
| Layer | Mechanism | Covers |
|---|---|---|
| 1. Bootstrap | `podman network create --disable-dns` (+ drift recreate) | aardvark-dns squat |
| 2. Bind target | coold binds **bridge gateway IP only**, not `0.0.0.0` and not wg0 mgmt IP | host wildcard DNS daemons (dnsmasq/pihole on `0.0.0.0:53`) and wg0 bloat |
| 3. Preflight | `net.Listen("tcp", gateway+":53")` probe before `ListenPacket` | clear actionable error + systemd `Restart=on-failure` retry |
systemd-resolved on Ubuntu binds `127.0.0.53:53` — no conflict with bridge gateway.
Bind rule: coold DNS is container-facing only (listen on bridge gateway IP). coold REST API is operator-facing (listen on wg0 mgmt IP, port 8443). Separate concerns, separate sockets.
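The layer-3 preflight can be sketched as a bind-and-release probe that turns a raw bind failure into an actionable message before coold opens its real socket. This is a sketch only: `preflight` is a hypothetical name, and the wording of the error is an assumption.

```go
package main

import (
	"fmt"
	"net"
)

// preflight tries to bind TCP on addr and releases the socket immediately.
// A failure here means the later DNS listener would fail too.
func preflight(addr string) error {
	ln, err := net.Listen("tcp", addr)
	if err != nil {
		return fmt.Errorf("cannot bind %s (is another DNS daemon listening there?): %w", addr, err)
	}
	return ln.Close()
}

func main() {
	// The gateway address is a placeholder for the host's 10.210.X.1.
	if err := preflight("10.210.0.1:53"); err != nil {
		fmt.Println("preflight failed:", err)
		return
	}
	fmt.Println("port 53 on the bridge gateway is free")
}
```

Exiting non-zero on a failed preflight lets systemd's `Restart=on-failure` retry once the conflicting daemon is gone.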
### 6. Ingress (public traffic → containers)
`coolify init` doesn't manage public ingress. v5 deploys a reverse proxy (Traefik/Caddy) per host or HA pair:
- Listens on host public IP `:80/:443`.
- Routes `Host: app.example.com` → container IP (over container bridge or wg0 if cross-host).
- Cert management via ACME.
- Coolify generates proxy config from app routing rules.
Important: ingress proxy needs its own podman network OR can share `coolify-mesh`. Sharing means proxy can reach all containers — fine since it's the entrypoint.
### 7. Deployment workflows
Deploy is a **central-side state machine** that compiles app intent (compose / Dockerfile / buildpack / Nixpacks / raw image) into a sequence of coold primitives (see §2 wire surface). coold does not participate in planning — it executes one primitive per frame.
#### Build pipeline (not in coold)
```
git push
Central receives webhook
Builder (BuildKit / Buildpacks / Nixpacks) ← coold NOT involved
- Self-hosted: first mesh host by default;
central may pin via target_host_id per build.
- Cloud: central-run.
Push to registry (registry.coolify.io or customer's) ← coold NOT involved
Central deploy controller → primitive op stream → coold on target host
```
coold's only role in the build path: `POST /images/pull` once the tag exists in the registry.
#### Deploy flow (T0-T10; every frame = one §2 primitive)
```
T0 Central builder clones source, invokes BuildKit / buildpack / nixpacks.
Output: OCI image @ registry.coolify.io/tenant/web:v2.
T1 Central deploy controller picks target host H (scheduler = least-loaded / pin).
T2 Frame: POST /images/pull {ref: "registry.coolify.io/tenant/web:v2"}
coold@H calls podman.sock /images/create, streams progress back.
T3 Frame: POST /volumes {name: "web-data", driver: "local"}
coold@H idempotent; no-op if exists.
T4 Frame: POST /containers (central templates from compose + resolved secrets)
body:
{
"image": "registry.coolify.io/tenant/web:v2",
"name": "web-v2-a3f91",
"network": "coolify-mesh",
"ip": "10.210.H.42",
"dns": ["10.210.H.1"],
"dns_search": ["coolify.internal"],
"env": {"DATABASE_URL": "postgres://…"},
"mounts": [{"volume": "web-data", "target": "/data"}],
"healthcheck": {"test": ["CMD","curl","-f","http://localhost/"], "interval": "5s"},
"labels": {"coolify.app": "web", "coolify.version": "v2"}
}
coold checks deny filter → calls podman.sock /containers/create → returns id.
T5 Frame: POST /containers/{id}/start
coold starts container.
T6 Central polls GET /containers/{id} or subscribes to events.
Wait for healthy; abort + rollback on timeout.
T7 Frame: POST /services/register
coold writes Corrosion row. Gossip distributes; DNS now answers new IP.
T8 Frame: POST /firewall/allow (on dst host — coold = sole kernel writer)
{src: proxy-ip, dst: 10.210.H.42, proto: "tcp", port: 80}
T9 Central ingress controller regenerates proxy config (Caddy/Traefik/nginx)
→ upstreams point to new container IP.
Frame: POST /containers/{proxy-id}/exec (reload) or proxy-specific reload.
T10 Cutover complete. Central retires the old container:
POST /containers/{old-id}/stop {timeout: 10}
DELETE /containers/{old-id}
DELETE /services/web/endpoints/{old-container-id}
DELETE /firewall/allow/{old-rule-id}
```
Every T-frame is one of the narrow primitives in §2. coold never runs compose, never builds, never picks hosts, never reads app config. If a future verb is needed, it gets added to §2 and the coold release, not smuggled through a passthrough.
**coold non-goals for deploy**: no compose parser, no buildpacks, no Dockerfile handler, no Nixpacks, no scheduler, no ingress templating, no rollback orchestration, no secrets store.
### 8. Storage & volumes
- Local podman volumes per host (`/var/lib/containers/storage/volumes`).
- Cross-host: distributed FS (out of scope) OR pin stateful services to a host (anti-affinity rules in scheduler).
- Backup: `podman volume export` + scp to backup target. Coolify orchestrates schedule.
- **v5 alpha decision**: stateful services **pin to host**. Cross-host volume movement / distributed FS is post-alpha.
### 9. Scheduling
**Placement lives in central.** coold provides facts (`GET /host/info`, `/host/stats`, `/host/containers`); central consumes them, picks the target host, and sends the resulting primitives. coold has no placement logic.
When user creates an app, central decides which host runs it:
- Round-robin / least-loaded / explicit pin.
- Pinned services (DB, persistent volumes) tracked in central DB.
- Re-schedule on host failure (wg0 down, last-handshake stale).
Failure detection: central polls `wg show wg0 latest-handshakes` via `GET /host/info` on every host, parses seconds-since-handshake; alerts if > N seconds.
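A minimal sketch of the handshake check, assuming central receives the raw output of `wg show wg0 latest-handshakes`, which prints one peer public key and one Unix timestamp per line, with `0` meaning the peer has never completed a handshake. `stalePeers` is a hypothetical name and the 180s threshold in the usage example is borrowed from §13.

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// stalePeers returns the public keys whose last handshake is older than maxAge
// seconds at time now (Unix seconds). Peers that never handshaked count as stale.
func stalePeers(output string, now, maxAge int64) ([]string, error) {
	var stale []string
	sc := bufio.NewScanner(strings.NewReader(output))
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) == 0 {
			continue // skip blank lines
		}
		if len(fields) != 2 {
			return nil, fmt.Errorf("unexpected line %q", sc.Text())
		}
		ts, err := strconv.ParseInt(fields[1], 10, 64)
		if err != nil {
			return nil, fmt.Errorf("bad timestamp in %q: %w", sc.Text(), err)
		}
		if ts == 0 || now-ts > maxAge {
			stale = append(stale, fields[0])
		}
	}
	return stale, sc.Err()
}

func main() {
	out := "peerA=\t1000\npeerB=\t900\npeerC=\t0\n"
	stale, err := stalePeers(out, 1100, 180)
	fmt.Println(stale, err)
}
```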
### 10. Observability
coold exposes read-only `/host/*` endpoints surfacing the facts below. Central (or a central-side scraper) pulls from each host and feeds Prometheus / VictoriaMetrics. coold does **not** push metrics.
Per host metrics (over wg0 via coold endpoints):
- `GET /host/info` → podman info (version, storage driver, free space), kernel, wg state, load.
- `GET /host/containers` → `podman ps -a --format json` state.
- `GET /host/stats` → `podman stats --no-stream --format json` CPU/mem per container.
- Wg handshake + transfer bytes via `GET /host/info` (`wg show wg0 dump` internally).
- `iptables -nvL COOLIFY-ALLOW` match counters (for audit) exposed through `GET /firewall/allow` with counters.
Stream into central time-series store (Prometheus / VictoriaMetrics).
### 11. Updates
- Coolify runtime image self-updates (container restart with new image).
- WireGuard / Podman package updates: `coolify init bootstrap` re-runs idempotently and picks up newer packages from apt. Agent (coold/corrosion/scheduler/builder) bumps go through `coolify init upgrade --coold-version vX.Y.Z` etc. Schedule periodic re-apply (weekly?).
- Mesh config changes (new host, removed host) trigger re-apply on all hosts; control plane orchestrates.
### 12. Security posture
- **Private keys never leave hosts**: WG private key generated on remote, never transits SSH (already done by bootstrap).
- **Podman socket access**: `/run/podman/podman.sock` stays as a rootful Unix socket on each host — **NEVER exposed on TCP**. Only **coold** (per-host agent, see §2) has access via bind-mount. coold surfaces a curated REST API over wg0 with TLS + bearer auth. This means:
  - Compromise of a non-coold container does NOT grant podman API access.
  - coold enforces bearer-token authn and can deny dangerous flags (e.g. `--privileged`) at the API surface. RBAC, per-user/tenant scoping, and business audit live **only** in central Coolify (see §3 split).
  - No `podman system service tcp://...` listener; no need for socket-level TLS.
  - Central Coolify only knows the coold endpoint, not the underlying socket.
- **SSH access**: bootstrap uses key-based SSH. Control plane should rotate SSH keys per agent install, store in encrypted DB. After bootstrap, day-to-day ops go via coold REST — SSH is for re-bootstrap only.
- **Host firewall (iptables INPUT chain)**: bootstrap doesn't lock down INPUT. v5 should drop public access to ports other than `:51820/udp` (WG), `:22/tcp` (SSH), `:80/:443` (ingress). coold's `:8443` binds to the wg0 IP only, so it's already not on the public interface.
- **coold port reachability**: central never dials in — coold's outbound stream is the control path — so no `COOLIFY-ALLOW` rule for central is needed. coold's local REST on wg0 mgmt IP (`:8443`) is reachable only from inside the mesh, and is used by (a) the `coolify firewall` CLI via SSH-bounce, (b) other coolds in the same mesh, (c) an optional per-customer gateway. Nothing on the public internet reaches coold. Outbound TLS :443 to central must be permitted by the customer's egress firewall — standard for any SaaS agent.
- **Audit**: central Coolify is the sole authoritative audit log — who-when-why metadata for every COOLIFY-ALLOW change. coold writes only an ops/debug request log (request id, endpoint, status, duration) for troubleshooting; it never sees the identity of the human caller, only the bearer token used to reach it.
### 13. Failure modes & recovery
| Failure | Detection | Recovery |
|---|---|---|
| Host SSH unreachable | bootstrap apply error | Manual investigation; node marked unhealthy in DB |
| WG peer offline (`latest_handshake > 180s`) | `wg show` poll | Mark unhealthy; re-schedule containers if pinning permits |
| Podman socket unreachable | API call timeout | Restart `podman.socket`; if persistent, re-bootstrap |
| Firewall service failed | `systemctl is-active != active` | Re-run `coolify init bootstrap`; service is idempotent |
| Container OOM/crash | `podman events` watcher | Restart per restart policy; alert after N crashes |
| Container subnet exhausted | allocator returns error | Alert; offer apply with bigger `--container-prefix` |
| Mgmt IP exhausted | allocator returns error | Alert; rare for /16 |
| `coolify-mesh` bridge missing | probe `podman network exists` returns no | Re-run apply |
| User manually deletes COOLIFY-ALLOW chain | runtime check | Re-run apply (recreates chain via service restart) |
### 14. Multi-tenancy (deferred)
If Coolify ever supports tenant isolation:
- Tenant = own podman network namespace per host.
- Allows always scoped within tenant; cross-tenant requires explicit allow.
- Pool subdivided per tenant. Allocator extension.
Not in v1 or v5 initial.
---
## Out of scope (now and likely v5)
- Rootless containers (would need user namespace mapping, separate sockets per user).
- IPv6 mesh (`fdcc::` style, ip6tables mirror).
- Hardware-level isolation (SELinux profiles, AppArmor).
- Live migration (qemu/criu).
- Distributed storage (Ceph/Longhorn).
- macvlan / SR-IOV networking.
- Autoscaling.
- BGP / external network announcements.
---
## Quick reference — operations the agent CLI should expose
(Future `coolify-cli` subcommands beyond `init`)
```
coolify deploy <app> # build + push + run
coolify scale <app> --replicas N
coolify firewall containers --servers A,B ... # discover mesh containers (SSH+podman)
coolify firewall list --servers A,B ... # list allow rules across hosts (coold GET /allow, SSH-bounced)
coolify firewall allow --from <ref> --to <ref> --port N # add allow rule (coold POST /allow, SSH-bounced)
coolify firewall revoke --from <ref> --to <ref> --port N # remove allow rule (coold DELETE /allow/{id})
coolify host list # show mesh state, last-handshake, container count
coolify host add <ip> --ssh-key K
coolify host remove <ip>
coolify logs <container>
coolify exec <container> -- sh
```
`coolify firewall` is implemented today as a thin SSH-bounced REST client of coold (§3 above). The laptop running the CLI isn't a mesh peer, so every call SSHes into the target host and runs `curl "http://<wg0-mgmt-ip>:8443/api/v1/firewall/..."` against coold locally. Per-host bearer tokens are fetched from `/etc/coolify/api-token` on demand (with `--coold-token` as an override for homogeneous test clusters).
Everything else on the roadmap (`coolify deploy`, `coolify scale`, `coolify logs`, `coolify exec`) targets the **central** API (SaaS or self-hosted central), not coold directly. Central compiles the request into the primitive-op sequence in §7 and streams it to coold. Only `coolify firewall` currently bypasses central and hits coold directly — legacy + test harness until central wires up `/firewall/*` itself.
---
## Summary
`coolify init bootstrap` does the **first-time host install**: WG mesh, podman runtime, bridge network, default-deny scaffold, coold/corrosion/scheduler/builder agents. `coolify init extend` adds hosts to an existing mesh without disturbing converged ones; `coolify init upgrade` bumps agent versions across the fleet. After that, **everything dynamic is the v5 control plane's job**: container lifecycle, allow rules in COOLIFY-ALLOW (via systemd dropins for persistence), scheduling, observability, ingress, updates.
The pieces communicate via:
1. **SSH** for host provisioning + re-converge (idempotent `coolify init bootstrap` / `extend` / `upgrade` re-runs). SSH is the installer channel only, not a steady-state control path.
2. **coold → central outbound stream** (WSS / gRPC bidi on :443) for day-to-day runtime ops from central. One topology for self-hosted and cloud SaaS; central never dials coold, never joins any mesh. Per-customer gateway (optional) collapses N streams into 1 per mesh.
3. **coold local REST API** on wg0 mgmt IP (`http://100.64.0.X:8443`) for intra-mesh callers: the `coolify firewall` CLI via SSH-bounce, other coolds, the per-customer gateway. Never reachable from the public internet.
coold is the *only* process with access to the local podman socket AND the sole writer of allow rules in COOLIFY-ALLOW. Both transports hit the same API surface.
Persistence model:
- Bootstrap state (chains, jumps, conntrack accept) → idempotent `coolify init bootstrap` re-runs (and `extend` when a namespace is added).
- Rule metadata (who/when/why, audit, RBAC, tenant scoping) → central Coolify DB only. coold does not duplicate this.
- Kernel rules → programmed by coold on every API call (from either central Coolify or the `coolify firewall` CLI); mirrored to `/etc/coolify/allow.rules` for reboot via `coolify-mesh-allow.service` (oneshot `iptables-restore --noflush`).
- Today the `coolify firewall` CLI is the primary caller of coold (SSH-bounced REST client with per-host `/etc/coolify/api-token` resolution). Central Coolify will call the same API once wired.
The podman socket is host-local. There is no TCP podman API. coold is the **authn + privilege boundary** between any caller (central Coolify over the outbound stream, or the `coolify firewall` CLI via SSH-bounced local REST) and the host, AND the kernel-rule applier. Central Coolify owns RBAC, tenant scoping, and the business audit log (who/when/why). coold only verifies a bearer token (per-host static for local REST; per-host JWT for the stream), applies the rule, and keeps an ops/debug request log. `coolify firewall` exercises the local REST surface today; central will exercise the stream surface — same code path end-to-end, different transport.
**coold stays small.** All app-aware logic (compose, Dockerfile, buildpacks, Nixpacks, scheduling, rollback, ingress templating, RBAC, audit) lives in central. coold's wire surface is enumerable (§2); new verbs require a coold release, not a `/podman/raw` passthrough. If coold ever grows a `/apps` or `/compose` endpoint, that is the wrong layer.
+24 -23
@@ -44,30 +44,27 @@ Once you publish the release:
- **Linux**: amd64, arm64
- **macOS (Darwin)**: amd64, arm64
- **Windows**: amd64, arm64
3. Goreleaser injects the version from the tag into the binaries
3. Goreleaser injects the version from the tag into the binaries via ldflags (into `internal/version.version`)
4. Binaries are automatically uploaded to the release
5. The release becomes available at:
5. A follow-up `update-version` job then:
- Updates the `version` constant in `internal/version/checker.go` to the new tag
- Commits the bump to `v4.x` as `chore: bump version to vX.Y.Z`
- Force-moves the release tag to point at that new commit
6. GoReleaser also publishes a Homebrew formula to the tap at [`coollabsio/homebrew-coolify-cli`](https://github.com/coollabsio/homebrew-coolify-cli) (under `Formula/coolify-cli.rb`), using the `HOMEBREW_TAP_GITHUB_TOKEN` secret
7. The release becomes available at:
- GitHub: `https://github.com/coollabsio/coolify-cli/releases/tag/v1.x.x`
- Install script: `curl -fsSL https://cdn.coollabs.io/coolify/install.sh | bash`
- Homebrew: `brew install coollabsio/coolify-cli/coolify-cli`
- `go install`: `go install github.com/coollabsio/coolify-cli/coolify@v1.x.x`
### 3. Verify the Release
After the workflow completes (usually 2-5 minutes):
After the workflow completes (usually 2-5 minutes), verify without touching your local install:
1. Check the release page has all platform binaries
2. Test the install script:
```bash
curl -fsSL https://cdn.coollabs.io/coolify/install.sh | bash
coolify version
```
3. Test the auto-update functionality:
```bash
# If you have an older version installed
coolify update
coolify version # Should show the new version
```
4. Verify the version matches your release
1. Check the release page has all platform binaries (Linux/macOS/Windows × amd64/arm64)
2. Confirm the `update-version` job committed the bump on `v4.x` (look for `chore: bump version to vX.Y.Z`) and that the tag now points at that commit
3. Confirm `internal/version/checker.go` on `v4.x` has the new version
4. Confirm the Homebrew tap has a new `Formula/coolify-cli.rb` commit for this version at https://github.com/coollabsio/homebrew-coolify-cli
## Troubleshooting
@@ -79,9 +76,10 @@ After the workflow completes (usually 2-5 minutes):
- GoReleaser configuration issues
### Version Not Updating
- Ensure you committed the version change in `cmd/root.go`
- The version is injected at build time via ldflags into `internal/version.version` — you do **not** need to edit it manually before releasing. The post-release `update-version` job also rewrites `internal/version/checker.go` on `v4.x`.
- If the hardcoded fallback in `internal/version/checker.go` is stale, check that the `update-version` job ran successfully after the release.
- The tag must start with `v` (e.g., `v1.2.3`, not `1.2.3`)
- Check that the workflow has write permissions
- Check that the workflow has write permissions (`contents: write` in `release-cli.yml`)
### Install Script Not Finding New Version
- Wait a few minutes for GitHub's CDN to update
@@ -94,30 +92,33 @@ Before creating a release:
- [ ] All tests pass: `go test ./internal/...`
- [ ] Code is formatted: `go fmt ./...`
- [ ] Version updated in `cmd/root.go`
- [ ] Changes merged to `v4.x` branch
- [ ] Release notes prepared
> Note: You do **not** need to bump the version manually. GoReleaser injects the tag version via ldflags, and the `update-version` CI job commits the bump to `internal/version/checker.go` after the release.
After creating a release:
- [ ] GitHub Actions workflow completed successfully
- [ ] GitHub Actions workflow completed successfully (both `release-cli` and `update-version` jobs)
- [ ] All platform binaries are present on the release page
- [ ] Install script downloads the new version
- [ ] `coolify version` returns the correct version
- [ ] `internal/version/checker.go` on `v4.x` shows the new version
- [ ] Homebrew tap has a fresh `Formula/coolify-cli.rb` commit
## Configuration Files
The release process uses these configuration files:
- `.goreleaser.yml` - GoReleaser configuration (build matrix, archives, etc.) - points to `/coolify` as entry point
- `.goreleaser.yml` - GoReleaser configuration (build matrix, archives, Homebrew tap) - entry point is `./coolify/main.go`
- `.github/workflows/release-cli.yml` - GitHub Actions workflow
- `scripts/install.sh` - User-facing install script
- `internal/version/checker.go` - Contains `GetVersion()` function that returns the current version
- `coolify/main.go` - Binary entry point for `go install` support
- [`coollabsio/homebrew-coolify-cli`](https://github.com/coollabsio/homebrew-coolify-cli) - External Homebrew tap updated automatically on each release
## Notes
- The CLI has auto-update checking built-in (checks every 10 minutes)
- Users can manually update with `coolify update`
- Install script supports version pinning: `bash install.sh v1.2.3`
- Homebrew users can install via `brew install coollabsio/coolify-cli/coolify-cli` (the tap at https://github.com/coollabsio/homebrew-coolify-cli is auto-updated by GoReleaser)
- Releases are immutable - if you need to fix something, create a new patch version
+1 -1
@@ -7,7 +7,7 @@
#### Linux/macOS
```bash
curl -fsSL https://raw.githubusercontent.com/coollabsio/coolify-cli/main/scripts/install.sh | bash
curl -fsSL https://gitamin.ir/IranAccess/coolify-cli/raw/branch/v4.x/scripts/install.sh | bash
```
It will install the CLI in `/usr/local/bin/coolify` and the configuration file in `~/.config/coolify/config.json`
+11
@@ -5,6 +5,7 @@ import (
"github.com/coollabsio/coolify-cli/cmd/application/create"
"github.com/coollabsio/coolify-cli/cmd/application/env"
"github.com/coollabsio/coolify-cli/cmd/application/previews"
"github.com/coollabsio/coolify-cli/cmd/application/storage"
)
@@ -57,5 +58,15 @@ func NewAppCommand() *cobra.Command {
storageCmd.AddCommand(storage.NewDeleteCommand())
cmd.AddCommand(storageCmd)
// Add previews subcommand with its children
previewsCmd := &cobra.Command{
Use: "previews",
Aliases: []string{"preview"},
Short: "Manage application preview deployments",
Long: `Manage preview deployments created from pull requests. Requires the application UUID.`,
}
previewsCmd.AddCommand(previews.NewDeletePreviewCommand())
cmd.AddCommand(previewsCmd)
return cmd
}
+72
@@ -0,0 +1,72 @@
package previews
import (
"fmt"
"strconv"
"github.com/spf13/cobra"
"github.com/coollabsio/coolify-cli/internal/cli"
"github.com/coollabsio/coolify-cli/internal/service"
)
func NewDeletePreviewCommand() *cobra.Command {
deletePreviewCmd := &cobra.Command{
Use: "delete <app_uuid> <pr_id>",
Short: "Delete a preview deployment",
Long: `Delete a preview deployment for an application. First argument is the application UUID, second is the pull request ID.`,
Args: cli.ExactArgs(2, "<app_uuid> <pr_id>"),
RunE: func(cmd *cobra.Command, args []string) error {
ctx := cmd.Context()
appUUID := args[0]
prID := args[1]
prIDInt, err := strconv.Atoi(prID)
if err != nil {
return fmt.Errorf("invalid pr_id: must be an integer")
}
if prIDInt <= 0 {
return fmt.Errorf("invalid pr_id: must be a positive integer")
}
client, err := cli.GetAPIClient(cmd)
if err != nil {
return fmt.Errorf("failed to get API client: %w", err)
}
if err := cli.CheckMinimumVersion(ctx, client, "4.0.0-beta.474"); err != nil {
return err
}
force, _ := cmd.Flags().GetBool("force")
// Prompt for confirmation unless --force is used
if !force {
var response string
fmt.Printf("Are you sure you want to delete the preview deployment for PR %s? (yes/no): ", prID)
_, err := fmt.Scanln(&response)
if err != nil {
return fmt.Errorf("failed to read confirmation: %w", err)
}
if response != "yes" && response != "y" {
fmt.Println("Delete cancelled.")
return nil
}
}
appSvc := service.NewApplicationService(client)
err = appSvc.DeletePreview(ctx, appUUID, prID)
if err != nil {
return fmt.Errorf("failed to delete preview deployment: %w", err)
}
fmt.Printf("Preview deployment for PR %s deleted successfully.\n", prID)
return nil
},
}
deletePreviewCmd.Flags().Bool("force", false, "Skip confirmation prompt")
return deletePreviewCmd
}
+105
@@ -0,0 +1,105 @@
// Package common holds flag structs and helpers shared between the
// `coolify init` and `coolify firewall` command trees. Kept intentionally
// small: only cross-command plumbing (SSH mesh flags, namespace validation)
// lives here.
//
//nolint:revive // "common" is the conventional sharing point for these cobra subtrees
package common
import (
"fmt"
"regexp"
"github.com/spf13/cobra"
)
// DefaultNamespace is the namespace used when the user does not pass
// --namespaces. It is also always present (implicitly) so existing workflows
// and coold defaults keep working.
const DefaultNamespace = "default"
// PodmanNetworkFor returns the podman bridge network name that backs
// namespace ns on every host. Derived as `coolify-<ns>-mesh` so the
// namespace name is visible in `podman network ls`.
func PodmanNetworkFor(ns string) string {
return "coolify-" + ns + "-mesh"
}
// MeshNetFlags holds the flag set shared between `coolify init` (which creates
// per-namespace podman networks on every host) and `coolify firewall` (which
// talks to coold about per-namespace rules).
//
// `init` binds it as a slice so a single command sets up the entire cluster;
// `firewall` binds it as a single value since each allow/revoke/list call
// operates on one namespace at a time.
type MeshNetFlags struct {
// Namespaces enumerates every namespace the mesh should carry. At least
// one entry is required; the first element is the implicit "default"
// unless the user overrides it.
Namespaces []string
// ContainerPool is the shared address pool every namespace carves its
// per-host /<ContainerPrefix> from. One pool covers all namespaces;
// subnets never overlap.
ContainerPool string
// ContainerPrefix is the prefix length of each per-host, per-namespace
// container subnet (default 24 → 254 container IPs per host per ns).
ContainerPrefix int
}
// BindMeshNetMultiFlags registers --namespaces/--container-pool/--container-prefix
// on cmd (init-style: many namespaces per invocation).
func BindMeshNetMultiFlags(cmd *cobra.Command, f *MeshNetFlags) {
pf := cmd.PersistentFlags()
pf.StringSliceVar(&f.Namespaces, "namespaces", []string{DefaultNamespace},
"Comma-separated list of namespaces to create on each host. Each "+
"namespace is a separate Podman bridge network (coolify-<ns>-mesh) "+
"with its own /<container-prefix> per host")
pf.StringVar(&f.ContainerPool, "container-pool", "10.210.0.0/16",
"Shared container address pool — each (namespace, host) pair gets a "+
"/<container-prefix> from here, owned by that namespace's Podman bridge")
pf.IntVar(&f.ContainerPrefix, "container-prefix", 24,
"Prefix length of each per-host, per-namespace container subnet")
}
// BindMeshNetSingleFlags registers --namespace on cmd (firewall-style: one
// namespace per invocation).
func BindMeshNetSingleFlags(cmd *cobra.Command, ns *string) {
pf := cmd.PersistentFlags()
pf.StringVar(ns, "namespace", DefaultNamespace,
"Namespace the command operates against (must match a namespace created by `coolify init`)")
}
// namespaceRegex matches a valid DNS label (namespace names appear in the
// podman network name, in iptables chain names, and — post-coold-changes —
// as DNS labels like web.<ns>.coolify.internal).
var namespaceRegex = regexp.MustCompile(`^[a-z0-9]([a-z0-9-]{0,61}[a-z0-9])?$`)
// ValidateNamespaces checks that every namespace is a valid DNS label and
// that the list has no duplicates.
func (f *MeshNetFlags) ValidateNamespaces() error {
if len(f.Namespaces) == 0 {
return fmt.Errorf("--namespaces must list at least one namespace")
}
seen := make(map[string]struct{}, len(f.Namespaces))
for _, ns := range f.Namespaces {
if !namespaceRegex.MatchString(ns) {
return fmt.Errorf("invalid namespace %q (must be a DNS label: lowercase alphanumerics + '-', 1-63 chars)", ns)
}
if _, dup := seen[ns]; dup {
return fmt.Errorf("duplicate namespace %q in --namespaces", ns)
}
seen[ns] = struct{}{}
}
return nil
}
// ValidateNamespace validates a single namespace value (used by the firewall
// command's --namespace flag).
func ValidateNamespace(ns string) error {
if !namespaceRegex.MatchString(ns) {
return fmt.Errorf("invalid --namespace %q (must be a DNS label: lowercase alphanumerics + '-', 1-63 chars)", ns)
}
return nil
}
+95
@@ -0,0 +1,95 @@
// Package common hosts flag sets and helpers shared between multiple
// top-level commands that SSH into a list of servers (init, firewall, ...).
package common
import (
"fmt"
"os"
"time"
"github.com/spf13/cobra"
"golang.org/x/term"
internalssh "github.com/coollabsio/coolify-cli/internal/ssh"
)
// SSHMeshFlags holds the flags shared by every command that fans out over
// a list of SSH-reachable servers (coolify init, coolify firewall, ...).
type SSHMeshFlags struct {
Servers []string
SSHKey string
SSHUser string
SSHPort int
SSHPassphrasePrompt bool
Concurrency int
SSHTimeout string
}
// BindSSHMeshFlags registers the shared flags as PersistentFlags on cmd.
func BindSSHMeshFlags(cmd *cobra.Command, f *SSHMeshFlags) {
pf := cmd.PersistentFlags()
pf.StringSliceVar(&f.Servers, "servers", nil,
"Comma-separated server IPs (required)")
pf.StringVar(&f.SSHKey, "ssh-key", "",
"Path to SSH private key used to connect to servers (required)")
pf.StringVar(&f.SSHUser, "ssh-user", "root",
"SSH username")
pf.IntVar(&f.SSHPort, "ssh-port", 22,
"SSH port")
pf.BoolVar(&f.SSHPassphrasePrompt, "ssh-passphrase-prompt", false,
"Prompt for SSH key passphrase (also reads COOLIFY_SSH_PASSPHRASE env var)")
pf.IntVar(&f.Concurrency, "concurrency", 10,
"Maximum number of parallel SSH connections")
pf.StringVar(&f.SSHTimeout, "ssh-timeout", "30s",
"SSH connection timeout (e.g. 30s, 1m)")
}
// ParseSSHTimeout parses SSHTimeout, falling back to 30s when the value is unparsable, zero, or negative.
func (f *SSHMeshFlags) ParseSSHTimeout() time.Duration {
d, err := time.ParseDuration(f.SSHTimeout)
if err != nil || d <= 0 {
return 30 * time.Second
}
return d
}
// ResolvePassphrase returns the SSH key passphrase in this priority order:
// 1. COOLIFY_SSH_PASSPHRASE env var
// 2. Interactive prompt when --ssh-passphrase-prompt is set
// 3. nil (no passphrase)
func (f *SSHMeshFlags) ResolvePassphrase() ([]byte, error) {
if env := os.Getenv("COOLIFY_SSH_PASSPHRASE"); env != "" {
return []byte(env), nil
}
if f.SSHPassphrasePrompt {
fmt.Fprint(os.Stderr, "SSH key passphrase: ")
pass, err := term.ReadPassword(int(os.Stdin.Fd()))
fmt.Fprintln(os.Stderr)
if err != nil {
return nil, fmt.Errorf("read passphrase: %w", err)
}
return pass, nil
}
return nil, nil
}
// BuildSSHClient creates an SSH client, resolving any key passphrase first.
func (f *SSHMeshFlags) BuildSSHClient() (*internalssh.Client, error) {
passphrase, err := f.ResolvePassphrase()
if err != nil {
return nil, err
}
return internalssh.NewClient(f.SSHKey, passphrase, f.ParseSSHTimeout())
}
// Validate checks that the required flags are set.
func (f *SSHMeshFlags) Validate() error {
if len(f.Servers) == 0 {
return fmt.Errorf("--servers is required")
}
if f.SSHKey == "" {
return fmt.Errorf("--ssh-key is required")
}
return nil
}
+57
@@ -0,0 +1,57 @@
package common
import (
"testing"
"time"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestSSHMeshFlags_ParseSSHTimeout(t *testing.T) {
tests := []struct {
input string
want time.Duration
}{
{"30s", 30 * time.Second},
{"1m", time.Minute},
{"invalid", 30 * time.Second},
{"0s", 30 * time.Second},
{"", 30 * time.Second},
}
for _, tt := range tests {
f := &SSHMeshFlags{SSHTimeout: tt.input}
assert.Equal(t, tt.want, f.ParseSSHTimeout(), "input: %q", tt.input)
}
}
func TestSSHMeshFlags_Validate(t *testing.T) {
t.Run("missing servers", func(t *testing.T) {
err := (&SSHMeshFlags{SSHKey: "/k"}).Validate()
require.Error(t, err)
assert.Contains(t, err.Error(), "--servers")
})
t.Run("missing ssh key", func(t *testing.T) {
err := (&SSHMeshFlags{Servers: []string{"1.1.1.1"}}).Validate()
require.Error(t, err)
assert.Contains(t, err.Error(), "--ssh-key")
})
t.Run("valid", func(t *testing.T) {
err := (&SSHMeshFlags{Servers: []string{"1.1.1.1"}, SSHKey: "/k"}).Validate()
require.NoError(t, err)
})
}
func TestSSHMeshFlags_ResolvePassphrase_Env(t *testing.T) {
t.Setenv("COOLIFY_SSH_PASSPHRASE", "hunter2")
pass, err := (&SSHMeshFlags{}).ResolvePassphrase()
require.NoError(t, err)
assert.Equal(t, []byte("hunter2"), pass)
}
func TestSSHMeshFlags_ResolvePassphrase_NoPrompt(t *testing.T) {
t.Setenv("COOLIFY_SSH_PASSPHRASE", "")
pass, err := (&SSHMeshFlags{SSHPassphrasePrompt: false}).ResolvePassphrase()
require.NoError(t, err)
assert.Nil(t, pass)
}
-4
@@ -10,10 +10,6 @@ import (
func TestNewConfigCommand(t *testing.T) {
cmd := NewConfigCommand()
if cmd == nil {
t.Fatal("NewConfigCommand() returned nil")
}
if cmd.Use != "config" {
t.Errorf("Expected Use to be 'config', got '%s'", cmd.Use)
}
+255
@@ -0,0 +1,255 @@
package firewall
import (
"context"
"fmt"
"net"
"os"
"strings"
"github.com/spf13/cobra"
"github.com/coollabsio/coolify-cli/cmd/common"
ifw "github.com/coollabsio/coolify-cli/internal/firewall"
"github.com/coollabsio/coolify-cli/internal/models"
"github.com/coollabsio/coolify-cli/internal/output"
"github.com/coollabsio/coolify-cli/internal/ssh"
)
// allowRevokeFlags are the per-subcommand flags for `allow` / `revoke`.
type allowRevokeFlags struct {
From string
To string
Port int
Proto string
Bidirectional bool
}
// newAllowCommand builds `coolify firewall allow`.
func newAllowCommand(parent *Flags) *cobra.Command {
local := &allowRevokeFlags{}
cmd := &cobra.Command{
Use: "allow",
Short: "Add an allow rule (from container → to container:port)",
RunE: func(cmd *cobra.Command, _ []string) error {
return runAllowRevoke(cmd.Context(), cmd, parent, local, false)
},
}
bindAllowRevokeFlags(cmd, local)
return cmd
}
// newRevokeCommand builds `coolify firewall revoke`.
func newRevokeCommand(parent *Flags) *cobra.Command {
local := &allowRevokeFlags{}
cmd := &cobra.Command{
Use: "revoke",
Short: "Remove an allow rule",
RunE: func(cmd *cobra.Command, _ []string) error {
return runAllowRevoke(cmd.Context(), cmd, parent, local, true)
},
}
bindAllowRevokeFlags(cmd, local)
return cmd
}
func bindAllowRevokeFlags(cmd *cobra.Command, f *allowRevokeFlags) {
pf := cmd.Flags()
pf.StringVar(&f.From, "from", "",
"Source container (name, short-id, raw IP, or host:name) — required")
pf.StringVar(&f.To, "to", "",
"Destination container (name, short-id, raw IP, or host:name) — required")
pf.IntVar(&f.Port, "port", 0,
"Destination port (required unless --proto is empty)")
pf.StringVar(&f.Proto, "proto", "tcp",
"Protocol (tcp, udp, or empty for any)")
pf.BoolVar(&f.Bidirectional, "bidirectional", false,
"Also install the reverse rule on the source host (default: one-way; conntrack handles replies)")
}
func validateAllowRevokeFlags(f *allowRevokeFlags) error {
if f.From == "" {
return fmt.Errorf("--from is required")
}
if f.To == "" {
return fmt.Errorf("--to is required")
}
if f.Proto != "" && f.Proto != "tcp" && f.Proto != "udp" {
return fmt.Errorf("--proto must be tcp, udp, or empty (got %q)", f.Proto)
}
if f.Proto != "" && f.Port <= 0 {
return fmt.Errorf("--port is required when --proto is set")
}
return nil
}
func runAllowRevoke(
ctx context.Context,
cmd *cobra.Command,
parent *Flags,
local *allowRevokeFlags,
revoke bool,
) error {
if err := parent.Validate(); err != nil {
return err
}
if err := common.ValidateNamespace(parent.Namespace); err != nil {
return err
}
if err := validateAllowRevokeFlags(local); err != nil {
return err
}
runner, err := parent.BuildSSHClient()
if err != nil {
return fmt.Errorf("SSH client: %w", err)
}
return emitAllowRevoke(ctx, cmd, parent, local, runner, revoke)
}
// emitAllowRevoke is the core path: discover → resolve → build rule → apply.
// Split from the cobra wrapper so tests inject a fake ssh.Runner.
func emitAllowRevoke(
ctx context.Context,
cmd *cobra.Command,
parent *Flags,
local *allowRevokeFlags,
runner ssh.Runner,
revoke bool,
) error {
all, results := discoverAllViaPkg(ctx, runner, parent)
for _, r := range results {
if r.Err != nil {
fmt.Fprintf(os.Stderr, "Warning: discover %s: %v\n", r.Host, r.Err)
}
}
from, err := resolveEndpoint(local.From, all)
if err != nil {
return fmt.Errorf("--from: %w", err)
}
to, err := resolveEndpoint(local.To, all)
if err != nil {
return fmt.Errorf("--to: %w", err)
}
if from.IP == nil || to.IP == nil {
return fmt.Errorf("failed to resolve endpoint IPs (from=%s to=%s)", local.From, local.To)
}
// Determine destination host (rule owner). If `to` was resolved from a
// raw IP with no container match, try to map it via discovery first.
dstHost := to.Host
if dstHost == "" {
if h, ok := findHostForIP(to.IP, all); ok {
dstHost = h
}
}
if dstHost == "" {
return fmt.Errorf("cannot determine destination host for IP %s — no container on the mesh owns it", to.IP)
}
srcHost := from.Host
if srcHost == "" {
if h, ok := findHostForIP(from.IP, all); ok {
srcHost = h
}
}
ns := parent.Namespace
primary := ifw.AllowRule{
Host: dstHost,
Namespace: ns,
Src: from.IP,
Dst: to.IP,
Proto: local.Proto,
Port: local.Port,
Comment: "cid:" + ifw.ComputeID(ns, from.IP, to.IP, local.Proto, local.Port),
}
rules := []ifw.AllowRule{primary}
if local.Bidirectional {
if srcHost == "" {
return fmt.Errorf("--bidirectional requires the source endpoint to belong to a mesh host")
}
reverse := ifw.AllowRule{
Host: srcHost,
Namespace: ns,
Src: to.IP,
Dst: from.IP,
Proto: local.Proto,
Port: local.Port,
Comment: "cid:" + ifw.ComputeID(ns, to.IP, from.IP, local.Proto, local.Port),
}
rules = append(rules, reverse)
}
action := "allow"
past := "allowed"
if revoke {
action = "revoke"
past = "revoked"
}
tokenFor := tokenResolver(ctx, runner, parent)
for _, r := range rules {
token, terr := tokenFor(r.Host)
if terr != nil {
return fmt.Errorf("%s on %s: %w", action, r.Host, terr)
}
var rerr error
if revoke {
// Revoke by id — coold is idempotent (204 even on unknown id).
id := strings.TrimPrefix(r.Comment, "cid:")
rerr = ifw.CooldRevoke(ctx, runner, r.Host, parent.SSHUser,
parent.SSHPort, parent.CooldPort, parent.WGInterface, token, id)
} else {
rerr = ifw.CooldApply(ctx, runner, r.Host, parent.SSHUser,
parent.SSHPort, parent.CooldPort, parent.WGInterface, token, r)
}
if rerr != nil {
return fmt.Errorf("%s on %s: %w", action, r.Host, rerr)
}
fmt.Fprintf(os.Stderr, "%s on %s: %s → %s %s/%d\n",
past, r.Host, ipOrAny(r.Src), ipOrAny(r.Dst),
protoOrAny(r.Proto), r.Port)
}
rows := make([]models.AllowRuleRow, 0, len(rules))
for _, r := range rules {
rows = append(rows, models.AllowRuleRow{
Host: r.Host,
Namespace: r.Namespace,
ID: r.Comment,
Src: r.Src.String(),
Dst: r.Dst.String(),
Proto: r.Proto,
Port: r.Port,
Comment: r.Comment,
})
}
format, _ := cmd.Root().PersistentFlags().GetString("format")
if format == "" {
format = output.FormatTable
}
formatter, err := output.NewFormatter(format, output.Options{Writer: os.Stdout})
if err != nil {
return err
}
if format == output.FormatJSON || format == output.FormatPretty {
return formatter.Format(models.FirewallAllowOutput{Rules: rows})
}
return formatter.Format(rows)
}
func ipOrAny(ip net.IP) string {
if ip == nil {
return "any"
}
return ip.String()
}
func protoOrAny(p string) string {
if p == "" {
return "any"
}
return p
}
+247
@@ -0,0 +1,247 @@
package firewall
import (
"context"
"strings"
"testing"
"github.com/spf13/cobra"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"github.com/coollabsio/coolify-cli/cmd/common"
"github.com/coollabsio/coolify-cli/internal/ssh"
)
func TestValidateAllowRevokeFlags(t *testing.T) {
t.Run("missing from", func(t *testing.T) {
err := validateAllowRevokeFlags(&allowRevokeFlags{To: "x", Port: 80, Proto: "tcp"})
require.Error(t, err)
assert.Contains(t, err.Error(), "--from")
})
t.Run("missing to", func(t *testing.T) {
err := validateAllowRevokeFlags(&allowRevokeFlags{From: "x", Port: 80, Proto: "tcp"})
require.Error(t, err)
assert.Contains(t, err.Error(), "--to")
})
t.Run("missing port with proto", func(t *testing.T) {
err := validateAllowRevokeFlags(&allowRevokeFlags{From: "a", To: "b", Proto: "tcp"})
require.Error(t, err)
assert.Contains(t, err.Error(), "--port")
})
t.Run("bad proto", func(t *testing.T) {
err := validateAllowRevokeFlags(&allowRevokeFlags{From: "a", To: "b", Proto: "icmp", Port: 1})
require.Error(t, err)
})
t.Run("ok tcp", func(t *testing.T) {
err := validateAllowRevokeFlags(&allowRevokeFlags{From: "a", To: "b", Proto: "tcp", Port: 80})
require.NoError(t, err)
})
t.Run("ok no-proto no-port", func(t *testing.T) {
err := validateAllowRevokeFlags(&allowRevokeFlags{From: "a", To: "b", Proto: "", Port: 0})
require.NoError(t, err)
})
}
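// cmdFakeRunner matches a Runner call against substrings in its response map
// and returns a matching response. Map iteration order is unspecified, so the
// keys used in a single test must not overlap. Mirrors cmd/init/plan_test.go's
// pattern.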
type cmdFakeRunner struct {
responses map[string]string
calls []string
}
func (f *cmdFakeRunner) Run(_ context.Context, _, _ string, _ int, cmd string) (string, string, error) {
f.calls = append(f.calls, cmd)
for sub, resp := range f.responses {
if strings.Contains(cmd, sub) {
return resp, "", nil
}
}
return "", "", nil
}
var _ ssh.Runner = (*cmdFakeRunner)(nil)
func rootCmdFor(cmd *cobra.Command) {
root := &cobra.Command{Use: "coolify"}
root.PersistentFlags().String("format", "table", "")
root.AddCommand(cmd)
}
// parentWithToken builds a Flags pre-wired for the REST path:
// single test host, coold port 8443, non-empty bearer token.
func parentWithToken() *Flags {
return &Flags{
SSHMeshFlags: common.SSHMeshFlags{
Servers: []string{"h1"}, SSHUser: "root", SSHPort: 22, Concurrency: 1,
},
Namespace: common.DefaultNamespace,
CooldToken: "test-token",
CooldPort: 8443,
WGInterface: "wg0",
}
}
func TestEmitAllowRevoke_PostsOneAllowToCoold(t *testing.T) {
fr := &cmdFakeRunner{responses: map[string]string{
"podman ps": "aaa111111111|web|10.210.0.10",
}}
parent := parentWithToken()
local := &allowRevokeFlags{
From: "10.210.1.5", To: "web", Proto: "tcp", Port: 80,
}
inner := &cobra.Command{Use: "allow"}
rootCmdFor(inner)
err := emitAllowRevoke(context.Background(), inner, parent, local, fr, false)
require.NoError(t, err)
var posts []string
for _, c := range fr.calls {
if strings.Contains(c, "-X POST") && strings.Contains(c, "/api/v1/firewall/allow") {
posts = append(posts, c)
}
}
// require halts here so posts[0] below cannot panic on an empty slice.
require.Len(t, posts, 1)
// Token carried in Authorization header.
assert.Contains(t, posts[0], "Authorization: Bearer test-token")
// JSON body carries namespace + src/dst/port.
assert.Contains(t, posts[0], `"namespace":"default"`)
assert.Contains(t, posts[0], `"src":"10.210.1.5"`)
assert.Contains(t, posts[0], `"dst":"10.210.0.10"`)
assert.Contains(t, posts[0], `"port":80`)
// Discovers mgmt IP via wg0 before curl.
assert.Contains(t, posts[0], "ip -4 -o addr show wg0")
}
// TestEmitAllowRevoke_CarriesNonDefaultNamespace verifies that the user's
// chosen namespace propagates into the JSON body (and therefore into the
// cid hash coold will compute).
func TestEmitAllowRevoke_CarriesNonDefaultNamespace(t *testing.T) {
fr := &cmdFakeRunner{responses: map[string]string{
"podman ps": "aaa111111111|web|10.220.0.10",
}}
parent := parentWithToken()
parent.Namespace = "alpha"
local := &allowRevokeFlags{
From: "10.220.1.5", To: "web", Proto: "tcp", Port: 80,
}
inner := &cobra.Command{Use: "allow"}
rootCmdFor(inner)
err := emitAllowRevoke(context.Background(), inner, parent, local, fr, false)
require.NoError(t, err)
var post string
for _, c := range fr.calls {
if strings.Contains(c, "-X POST") {
post = c
}
}
assert.NotEmpty(t, post)
assert.Contains(t, post, `"namespace":"alpha"`)
// Discovery targets the alpha-namespace bridge, not the default one.
var psCalls []string
for _, c := range fr.calls {
if strings.Contains(c, "podman ps") {
psCalls = append(psCalls, c)
}
}
// require halts here so psCalls[0] below cannot panic on an empty slice.
require.NotEmpty(t, psCalls)
assert.Contains(t, psCalls[0], "coolify-alpha-mesh")
}
func TestEmitAllowRevoke_Bidirectional(t *testing.T) {
fr := &cmdFakeRunner{responses: map[string]string{
"podman ps": "aaa111111111|web|10.210.0.10\nbbb222222222|client|10.210.1.5",
}}
parent := parentWithToken()
local := &allowRevokeFlags{
From: "10.210.1.5", To: "10.210.0.10", Proto: "tcp", Port: 80, Bidirectional: true,
}
inner := &cobra.Command{Use: "allow"}
rootCmdFor(inner)
err := emitAllowRevoke(context.Background(), inner, parent, local, fr, false)
require.NoError(t, err)
var posts int
for _, c := range fr.calls {
if strings.Contains(c, "-X POST") && strings.Contains(c, "/api/v1/firewall/allow") {
posts++
}
}
assert.Equal(t, 2, posts)
}
func TestEmitAllowRevoke_RevokeIssuesDelete(t *testing.T) {
fr := &cmdFakeRunner{responses: map[string]string{
"podman ps": "aaa111111111|web|10.210.0.10",
}}
parent := parentWithToken()
local := &allowRevokeFlags{
From: "10.210.1.5", To: "web", Proto: "tcp", Port: 80,
}
inner := &cobra.Command{Use: "revoke"}
rootCmdFor(inner)
err := emitAllowRevoke(context.Background(), inner, parent, local, fr, true)
require.NoError(t, err)
var deletes []string
for _, c := range fr.calls {
if strings.Contains(c, "-X DELETE") && strings.Contains(c, "/api/v1/firewall/allow/") {
deletes = append(deletes, c)
}
}
// require halts here so deletes[0] below cannot panic on an empty slice.
require.Len(t, deletes, 1)
assert.Contains(t, deletes[0], "Authorization: Bearer test-token")
}
func TestEmitAllowRevoke_FetchesTokenPerHostWhenOverrideAbsent(t *testing.T) {
// No --coold-token override → CLI SSHes `cat /etc/coolify/api-token`
// on the destination host and uses the result as the bearer.
fr := &cmdFakeRunner{responses: map[string]string{
"podman ps": "aaa111111111|web|10.210.0.10",
"/etc/coolify/api-token": "per-host-token\n",
}}
parent := parentWithToken()
parent.CooldToken = ""
t.Setenv("COOLIFY_COOLD_TOKEN", "")
local := &allowRevokeFlags{
From: "10.210.1.5", To: "web", Proto: "tcp", Port: 80,
}
inner := &cobra.Command{Use: "allow"}
rootCmdFor(inner)
err := emitAllowRevoke(context.Background(), inner, parent, local, fr, false)
require.NoError(t, err)
var post string
for _, c := range fr.calls {
if strings.Contains(c, "-X POST") && strings.Contains(c, "/api/v1/firewall/allow") {
post = c
}
}
assert.NotEmpty(t, post)
assert.Contains(t, post, "Authorization: Bearer per-host-token")
}
func TestEmitAllowRevoke_FetchFailurePropagates(t *testing.T) {
// Empty /etc/coolify/api-token on the host → FetchCooldToken errors,
// and the error surfaces to the caller instead of silently proceeding.
fr := &cmdFakeRunner{responses: map[string]string{
"podman ps": "aaa111111111|web|10.210.0.10",
// No token file → empty stdout → "token is empty" error.
}}
parent := parentWithToken()
parent.CooldToken = ""
t.Setenv("COOLIFY_COOLD_TOKEN", "")
local := &allowRevokeFlags{
From: "10.210.1.5", To: "web", Proto: "tcp", Port: 80,
}
inner := &cobra.Command{Use: "allow"}
rootCmdFor(inner)
err := emitAllowRevoke(context.Background(), inner, parent, local, fr, false)
require.Error(t, err)
assert.Contains(t, err.Error(), "coold token")
}
@@ -0,0 +1,105 @@
package firewall
import (
"context"
"fmt"
"os"
"github.com/spf13/cobra"
ifw "github.com/coollabsio/coolify-cli/internal/firewall"
"github.com/coollabsio/coolify-cli/internal/models"
"github.com/coollabsio/coolify-cli/internal/output"
"github.com/coollabsio/coolify-cli/internal/ssh"
)
// newContainersCommand builds `coolify firewall containers`.
func newContainersCommand(flags *Flags) *cobra.Command {
return &cobra.Command{
Use: "containers",
Short: "List containers on the Coolify mesh bridge across all servers",
RunE: func(cmd *cobra.Command, _ []string) error {
return runContainers(cmd.Context(), cmd, flags)
},
}
}
func runContainers(ctx context.Context, cmd *cobra.Command, flags *Flags) error {
if err := flags.Validate(); err != nil {
return err
}
runner, err := flags.BuildSSHClient()
if err != nil {
return fmt.Errorf("SSH client: %w", err)
}
return emitContainers(ctx, cmd, flags, runner)
}
// emitContainers is factored out so tests can pass a fake ssh.Runner.
func emitContainers(
ctx context.Context,
cmd *cobra.Command,
flags *Flags,
runner ssh.Runner,
) error {
var (
all []ifw.Container
results []ssh.ServerResult[[]ifw.Container]
)
if flags.AllNamespaces {
// Discover across every managed network on each host.
nsList, nsResults := discoverNamespacesOnHosts(ctx, runner, flags)
for _, r := range nsResults {
if r.Err != nil {
results = append(results, ssh.ServerResult[[]ifw.Container]{
Host: r.Host, Err: r.Err,
})
}
}
var containerResults []ssh.ServerResult[[]ifw.Container]
all, containerResults = discoverAcrossNamespaces(ctx, runner, flags, nsList)
results = append(results, containerResults...)
} else {
all, results = discoverAllViaPkg(ctx, runner, flags)
}
rows := make([]models.ContainerRow, 0, len(all))
for _, c := range all {
rows = append(rows, models.ContainerRow{
Host: c.Host, Namespace: c.Namespace, ID: c.ID, Name: c.Name, IP: c.IP.String(),
})
}
var errs []string
for _, r := range results {
if r.Err != nil {
errs = append(errs, fmt.Sprintf("%s: %v", r.Host, r.Err))
}
}
for _, e := range errs {
fmt.Fprintln(os.Stderr, "Warning:", e)
}
format, _ := cmd.Root().PersistentFlags().GetString("format")
if format == "" {
format = output.FormatTable
}
formatter, err := output.NewFormatter(format, output.Options{Writer: os.Stdout})
if err != nil {
return err
}
if format == output.FormatJSON || format == output.FormatPretty {
return formatter.Format(models.FirewallContainersOutput{
Containers: rows, Errors: errs,
})
}
if len(rows) == 0 {
if flags.AllNamespaces {
fmt.Fprintln(os.Stderr, "No containers found on any coolify-<ns>-mesh network.")
} else {
fmt.Fprintf(os.Stderr, "No containers found on %s network.\n", flags.PodmanNetworkName())
}
return nil
}
return formatter.Format(rows)
}
@@ -0,0 +1,106 @@
package firewall
import (
"context"
"testing"
"github.com/spf13/cobra"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"github.com/coollabsio/coolify-cli/cmd/common"
)
func TestEmitContainers_RunsAndFormatsTable(t *testing.T) {
fr := &cmdFakeRunner{responses: map[string]string{
"podman ps": "aaa111111111|web|10.210.0.10",
}}
parent := &Flags{
SSHMeshFlags: common.SSHMeshFlags{
Servers: []string{"h1"}, SSHUser: "root", SSHPort: 22, Concurrency: 1,
},
Namespace: common.DefaultNamespace,
}
inner := &cobra.Command{Use: "containers"}
rootCmdFor(inner)
err := emitContainers(context.Background(), inner, parent, fr)
require.NoError(t, err)
// Discovery command was issued, targeting the default-namespace bridge.
// require halts here so fr.calls[0] below cannot panic on an empty slice.
require.Len(t, fr.calls, 1)
assert.Contains(t, fr.calls[0], "podman ps")
assert.Contains(t, fr.calls[0], "coolify-default-mesh")
}
func TestEmitContainers_EmptyOutput(t *testing.T) {
fr := &cmdFakeRunner{responses: map[string]string{}}
parent := &Flags{
SSHMeshFlags: common.SSHMeshFlags{
Servers: []string{"h1"}, SSHUser: "root", SSHPort: 22, Concurrency: 1,
},
Namespace: common.DefaultNamespace,
}
inner := &cobra.Command{Use: "containers"}
rootCmdFor(inner)
err := emitContainers(context.Background(), inner, parent, fr)
require.NoError(t, err)
}
// TestEmitContainers_AllNamespaces_FansOutAcrossNetworks verifies that with
// --all-namespaces the CLI first enumerates managed networks on every host
// and then issues one podman-ps per namespace it found.
func TestEmitContainers_AllNamespaces_FansOutAcrossNetworks(t *testing.T) {
fr := &cmdFakeRunner{responses: map[string]string{
// Host reports two managed namespaces via label inspection.
"podman network ls": "default\nalpha\n",
// Every subsequent podman-ps returns the same container.
"podman ps": "aaa111111111|web|10.210.0.10",
}}
parent := &Flags{
SSHMeshFlags: common.SSHMeshFlags{
Servers: []string{"h1"}, SSHUser: "root", SSHPort: 22, Concurrency: 1,
},
Namespace: common.DefaultNamespace,
AllNamespaces: true,
}
inner := &cobra.Command{Use: "containers"}
rootCmdFor(inner)
err := emitContainers(context.Background(), inner, parent, fr)
require.NoError(t, err)
// Expect one `podman network ls` discovery call + one `podman ps` per
// discovered namespace (default + alpha = 2).
var ls, ps int
for _, c := range fr.calls {
switch {
case containsAll(c, "podman network ls", "io.coolify.managed=true"):
ls++
case containsAll(c, "podman ps"):
ps++
}
}
assert.Equal(t, 1, ls, "one namespace-discovery call per host")
assert.Equal(t, 2, ps, "one container-discovery call per namespace per host")
}
func containsAll(s string, subs ...string) bool {
for _, sub := range subs {
if !contains(s, sub) {
return false
}
}
return true
}
func contains(s, sub string) bool {
// Tiny local substring check so this file does not need to import
// strings; behaves like strings.Contains.
for i := 0; i+len(sub) <= len(s); i++ {
if s[i:i+len(sub)] == sub {
return true
}
}
return false
}
@@ -0,0 +1,39 @@
package firewall
import (
"github.com/spf13/cobra"
)
// NewFirewallCommand creates the parent `coolify firewall` command.
// On bare invocation (no subcommand) it prints help.
func NewFirewallCommand() *cobra.Command {
flags := &Flags{}
cmd := &cobra.Command{
Use: "firewall",
Short: "[ALPHA] Manage cross-host container allow rules (Coolify v5)",
Long: `[ALPHA] Manage the COOLIFY-ALLOW iptables chain installed by
"coolify init --podman --default-deny". This is a test harness for the v5
control-plane firewall flow: it SSHes into every server, discovers running
containers on the Coolify mesh bridge (selected with --namespace), and
lets you add/remove cross-host allow rules.
Subcommands:
containers List discovered containers across the mesh.
list Show installed allow rules.
allow Add an allow rule (src container → dst container:port).
revoke Remove an allow rule.`,
RunE: func(cmd *cobra.Command, _ []string) error {
return cmd.Help()
},
}
bindFlags(cmd, flags)
cmd.AddCommand(newContainersCommand(flags))
cmd.AddCommand(newListCommand(flags))
cmd.AddCommand(newAllowCommand(flags))
cmd.AddCommand(newRevokeCommand(flags))
return cmd
}
@@ -0,0 +1,50 @@
package firewall
import (
"testing"
"github.com/spf13/cobra"
"github.com/stretchr/testify/assert"
)
func TestNewFirewallCommand_Subcommands(t *testing.T) {
cmd := NewFirewallCommand()
assert.Equal(t, "firewall", cmd.Use)
subs := map[string]*cobra.Command{}
for _, s := range cmd.Commands() {
subs[s.Use] = s
}
assert.Contains(t, subs, "containers")
assert.Contains(t, subs, "list")
assert.Contains(t, subs, "allow")
assert.Contains(t, subs, "revoke")
}
func TestNewFirewallCommand_PersistentFlags(t *testing.T) {
cmd := NewFirewallCommand()
pf := cmd.PersistentFlags()
for _, name := range []string{"servers", "ssh-key", "ssh-user", "ssh-port",
"concurrency", "ssh-timeout", "namespace", "all-namespaces",
"coold-token", "coold-port", "wg-interface"} {
assert.NotNil(t, pf.Lookup(name), "missing --%s", name)
}
// Replaced by --namespace; must be gone.
assert.Nil(t, pf.Lookup("podman-network"))
}
func TestAllowCommand_LocalFlags(t *testing.T) {
cmd := NewFirewallCommand()
var allow *cobra.Command
for _, s := range cmd.Commands() {
if s.Use == "allow" {
allow = s
break
}
}
if allow == nil {
t.Fatal("allow subcommand not found")
}
for _, name := range []string{"from", "to", "port", "proto", "bidirectional"} {
assert.NotNil(t, allow.Flags().Lookup(name), "missing --%s on allow", name)
}
}
@@ -0,0 +1,82 @@
// Package firewall implements the `coolify firewall` command tree. It is a
// thin SSH-bounced client for the coold agent's REST API: `allow` / `revoke`
// / `list` POST/DELETE/GET against coold on the destination host, while
// `containers` stays SSH+podman because coold has no container surface.
// See CONTROL_PLANE.md §3.
package firewall
import (
"os"
"github.com/spf13/cobra"
"github.com/coollabsio/coolify-cli/cmd/common"
ifw "github.com/coollabsio/coolify-cli/internal/firewall"
)
// Flags is the shared flag set for every `coolify firewall`
// subcommand: SSH plumbing (via embed) + namespace selection + coold REST
// endpoint/token. The podman network name is derived from the namespace
// (coolify-<ns>-mesh) so the CLI and `coolify init` stay in sync.
type Flags struct {
common.SSHMeshFlags
// Namespace is the mesh namespace the command operates against. Derives
// the podman network (common.PodmanNetworkFor) and is forwarded to coold
// as part of every rule / list query.
Namespace string
// AllNamespaces, when true, makes namespace-aware subcommands operate
// across every namespace the mesh carries. Each subcommand interprets it
// contextually (list: union across namespaces; containers: discover every
// coolify-<ns>-mesh network on each host).
AllNamespaces bool
// CooldToken is an optional bearer-token override for coold's REST API.
// When unset (and COOLIFY_COOLD_TOKEN env is unset), the CLI SSHes into
// each host and reads /etc/coolify/api-token instead — tokens are
// generated per-host at install time and are not centrally shared.
CooldToken string
// CooldPort is the TCP port coold listens on (bound to the WG mgmt IP).
// Must match COOLD_API_BIND emitted by internal/services/coold.go.
CooldPort int
// WGInterface is the WireGuard interface name used to discover coold's
// bind IP on each host. Must match --wg-interface used at `coolify init`.
WGInterface string
}
// bindFlags registers the persistent flags on the parent command.
func bindFlags(cmd *cobra.Command, f *Flags) {
common.BindSSHMeshFlags(cmd, &f.SSHMeshFlags)
common.BindMeshNetSingleFlags(cmd, &f.Namespace)
pf := cmd.PersistentFlags()
pf.BoolVar(&f.AllNamespaces, "all-namespaces", false,
"Operate across every mesh namespace on each host (list/containers fan out; "+
"allow/revoke still require a specific --namespace)")
pf.StringVar(&f.CooldToken, "coold-token", "",
"Bearer token override for coold REST API (also reads COOLIFY_COOLD_TOKEN env). "+
"When unset, CLI reads /etc/coolify/api-token over SSH per host.")
pf.IntVar(&f.CooldPort, "coold-port", 8443,
"TCP port coold's REST API listens on (bound to the WG mgmt IP)")
pf.StringVar(&f.WGInterface, "wg-interface", ifw.DefaultWGInterface,
"WireGuard interface name on remote hosts (must match --wg-interface at init)")
}
// ResolveCooldToken returns the bearer-token override supplied via flag or
// env, or "" when neither is set. Callers treat an empty string as "no
// override — SSH-fetch the per-host token instead".
func (f *Flags) ResolveCooldToken() (string, error) {
if f.CooldToken != "" {
return f.CooldToken, nil
}
if env := os.Getenv("COOLIFY_COOLD_TOKEN"); env != "" {
return env, nil
}
return "", nil
}
// PodmanNetworkName returns the podman bridge that backs the selected
// namespace on every host. Used by container discovery.
func (f *Flags) PodmanNetworkName() string {
return common.PodmanNetworkFor(f.Namespace)
}
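Review note on token precedence: the flag and environment override described above, followed by the per-host fallback, can be sketched in one standalone function. This is a sketch under assumptions, not the repository's code. `resolveToken` and its `getenv` and `fetch` parameters are hypothetical; only the `COOLIFY_COOLD_TOKEN` variable name comes from the source.

```go
package main

import (
	"fmt"
	"os"
)

// resolveToken is a hypothetical helper that applies the precedence
// described above: explicit flag value, then COOLIFY_COOLD_TOKEN, then a
// per-host fetch. getenv and fetch are injected so the order is testable.
func resolveToken(
	flagValue string,
	getenv func(string) string,
	fetch func() (string, error),
) (string, error) {
	if flagValue != "" {
		return flagValue, nil
	}
	if env := getenv("COOLIFY_COOLD_TOKEN"); env != "" {
		return env, nil
	}
	// No override: fall back to the per-host source (in the CLI this is an
	// SSH read of the token file; here it is whatever fetch returns).
	return fetch()
}

func main() {
	fetch := func() (string, error) { return "per-host-token", nil }
	tok, err := resolveToken("", os.Getenv, fetch)
	if err != nil {
		fmt.Fprintln(os.Stderr, "resolve token:", err)
		os.Exit(1)
	}
	fmt.Println("using token of length", len(tok))
}
```

Injecting `getenv` avoids mutating the process environment in tests, which is the same concern the tests above address with `t.Setenv`.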
@@ -0,0 +1,129 @@
package firewall
import (
"context"
"sort"
"strings"
"sync"
"github.com/coollabsio/coolify-cli/cmd/common"
ifw "github.com/coollabsio/coolify-cli/internal/firewall"
"github.com/coollabsio/coolify-cli/internal/ssh"
)
// discoverAllViaPkg is a thin wrapper around ifw.DiscoverAll that threads
// the Flags in. It always discovers the single selected namespace
// (flags.Namespace); the --all-namespaces fanout is handled separately by
// discoverAcrossNamespaces. Used by `containers` (SSH+podman) and by
// `allow` / `revoke` for endpoint resolution; `list` goes straight to
// coold REST.
func discoverAllViaPkg(
ctx context.Context,
runner ssh.Runner,
flags *Flags,
) ([]ifw.Container, []ssh.ServerResult[[]ifw.Container]) {
return ifw.DiscoverAll(ctx, runner, flags.Servers, flags.SSHUser,
flags.SSHPort, flags.Namespace, flags.PodmanNetworkName(),
flags.Concurrency)
}
// discoverAcrossNamespaces calls ifw.DiscoverAllNamespaces once with the
// supplied namespace list. The network name for each namespace is derived
// via common.PodmanNetworkFor, so the caller only has to supply the list.
func discoverAcrossNamespaces(
ctx context.Context,
runner ssh.Runner,
flags *Flags,
namespaces []string,
) ([]ifw.Container, []ssh.ServerResult[[]ifw.Container]) {
return ifw.DiscoverAllNamespaces(ctx, runner, flags.Servers,
flags.SSHUser, flags.SSHPort, namespaces,
common.PodmanNetworkFor, flags.Concurrency)
}
// discoverNamespacesOnHosts SSHes into every host and lists every podman
// network carrying the io.coolify.managed=true label, collecting the unique
// io.coolify.namespace label values. Used by `containers --all-namespaces`.
// Returns the per-host results so host-level failures surface as warnings
// instead of aborting the fanout.
func discoverNamespacesOnHosts(
ctx context.Context,
runner ssh.Runner,
flags *Flags,
) ([]string, []ssh.ServerResult[[]string]) {
// `podman network ls`'s `{{.Labels}}` renders as a comma-separated `k=v`
// string (not a map, unlike `podman network inspect`), so `index` can't be
// used — pull `io.coolify.namespace=<val>` out with sed instead.
script := `podman network ls --filter label=io.coolify.managed=true ` +
`--format '{{.Labels}}' 2>/dev/null | ` +
`sed -n 's/.*io\.coolify\.namespace=\([^,]*\).*/\1/p' || true`
results := ssh.ForEachServer(ctx, flags.Servers, flags.Concurrency,
func(ctx context.Context, host string) ([]string, error) {
stdout, _, err := runner.Run(ctx, host, flags.SSHUser,
flags.SSHPort, script)
if err != nil {
return nil, err
}
var nss []string
for _, line := range strings.Split(stdout, "\n") {
ns := strings.TrimSpace(line)
if ns != "" {
nss = append(nss, ns)
}
}
return nss, nil
})
seen := map[string]struct{}{}
for _, r := range results {
for _, ns := range r.Result {
seen[ns] = struct{}{}
}
}
// Always probe the selected namespace too — caller may have just created
// it and we haven't seen it on any host yet.
seen[flags.Namespace] = struct{}{}
all := make([]string, 0, len(seen))
for ns := range seen {
all = append(all, ns)
}
sort.Strings(all)
return all, results
}
// tokenResolver returns a closure that hands out coold bearer tokens
// per-host. Precedence: explicit --coold-token (or COOLIFY_COOLD_TOKEN env)
// wins for every host; otherwise SSH into the host and cache the contents
// of /etc/coolify/api-token. The cache is goroutine-safe so the closure can
// be passed straight into CooldListAll's fanout. The fetch runs outside the
// lock, so concurrent first calls for the same host may each fetch once;
// failed fetches are not cached.
func tokenResolver(
ctx context.Context,
runner ssh.Runner,
flags *Flags,
) func(host string) (string, error) {
if override, _ := flags.ResolveCooldToken(); override != "" {
return func(string) (string, error) { return override, nil }
}
var (
mu sync.Mutex
cache = map[string]string{}
)
return func(host string) (string, error) {
mu.Lock()
if tok, ok := cache[host]; ok {
mu.Unlock()
return tok, nil
}
mu.Unlock()
tok, err := ifw.FetchCooldToken(ctx, runner, host, flags.SSHUser, flags.SSHPort)
if err != nil {
return "", err
}
mu.Lock()
cache[host] = tok
mu.Unlock()
return tok, nil
}
}
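Review note on the per-host cache: the locking pattern in `tokenResolver` can be exercised without SSH by injecting the fetch function. The sketch below is a standalone illustration; `cachedPerHost` is a hypothetical name, and the fetch callback stands in for `ifw.FetchCooldToken`.

```go
package main

import (
	"fmt"
	"sync"
)

// cachedPerHost wraps fetch with a mutex-guarded per-host cache.
// Hypothetical helper mirroring the shape of tokenResolver.
func cachedPerHost(fetch func(host string) (string, error)) func(host string) (string, error) {
	var (
		mu    sync.Mutex
		cache = map[string]string{}
	)
	return func(host string) (string, error) {
		mu.Lock()
		tok, ok := cache[host]
		mu.Unlock()
		if ok {
			return tok, nil
		}
		// Fetch outside the lock so a slow round trip to one host does not
		// block lookups for other hosts.
		tok, err := fetch(host)
		if err != nil {
			return "", err // errors are not cached; the next call retries
		}
		mu.Lock()
		cache[host] = tok
		mu.Unlock()
		return tok, nil
	}
}

func main() {
	fetches := 0
	tokenFor := cachedPerHost(func(host string) (string, error) {
		fetches++
		return "token-for-" + host, nil
	})
	for _, h := range []string{"h1", "h1", "h2"} {
		tok, err := tokenFor(h)
		fmt.Println(h, tok, err)
	}
	fmt.Println("fetches:", fetches)
}
```

Holding the lock only around map access trades a possible duplicate fetch under contention for better parallelism across hosts.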
@@ -0,0 +1,97 @@
package firewall
import (
"context"
"fmt"
"os"
"github.com/spf13/cobra"
ifw "github.com/coollabsio/coolify-cli/internal/firewall"
"github.com/coollabsio/coolify-cli/internal/models"
"github.com/coollabsio/coolify-cli/internal/output"
"github.com/coollabsio/coolify-cli/internal/ssh"
)
// newListCommand builds `coolify firewall list`.
func newListCommand(flags *Flags) *cobra.Command {
return &cobra.Command{
Use: "list",
Short: "List installed allow rules across all servers",
RunE: func(cmd *cobra.Command, _ []string) error {
return runList(cmd.Context(), cmd, flags)
},
}
}
func runList(ctx context.Context, cmd *cobra.Command, flags *Flags) error {
if err := flags.Validate(); err != nil {
return err
}
runner, err := flags.BuildSSHClient()
if err != nil {
return fmt.Errorf("SSH client: %w", err)
}
return emitList(ctx, cmd, flags, runner)
}
func emitList(
ctx context.Context,
cmd *cobra.Command,
flags *Flags,
runner ssh.Runner,
) error {
tokenFor := tokenResolver(ctx, runner, flags)
// --all-namespaces → omit the query param so coold returns the union.
ns := flags.Namespace
if flags.AllNamespaces {
ns = ""
}
all, results := ifw.CooldListAll(ctx, runner, flags.Servers, flags.SSHUser,
flags.SSHPort, flags.CooldPort, flags.WGInterface, tokenFor,
flags.Concurrency, ns)
rows := make([]models.AllowRuleRow, 0, len(all))
for _, r := range all {
rows = append(rows, models.AllowRuleRow{
Host: r.Host,
Namespace: r.Namespace,
ID: r.Comment,
Src: r.Src.String(),
Dst: r.Dst.String(),
Proto: r.Proto,
Port: r.Port,
Comment: r.Comment,
})
}
var errs []string
for _, r := range results {
if r.Err != nil {
errs = append(errs, fmt.Sprintf("%s: %v", r.Host, r.Err))
}
}
for _, e := range errs {
fmt.Fprintln(os.Stderr, "Warning:", e)
}
format, _ := cmd.Root().PersistentFlags().GetString("format")
if format == "" {
format = output.FormatTable
}
formatter, err := output.NewFormatter(format, output.Options{Writer: os.Stdout})
if err != nil {
return err
}
if format == output.FormatJSON || format == output.FormatPretty {
return formatter.Format(models.FirewallListOutput{
Rules: rows, Errors: errs,
})
}
if len(rows) == 0 {
fmt.Fprintln(os.Stderr, "No allow rules found. Run `coolify firewall allow ...` to add one.")
return nil
}
return formatter.Format(rows)
}
@@ -0,0 +1,65 @@
package firewall
import (
"context"
"strings"
"testing"
"github.com/spf13/cobra"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestEmitList_CallsCooldGet(t *testing.T) {
fr := &cmdFakeRunner{responses: map[string]string{
"/api/v1/firewall/allow": `[{"src":"10.0.0.1","dst":"10.0.0.2","proto":"tcp","port":80,"id":"abc123def456"}]`,
}}
parent := parentWithToken()
inner := &cobra.Command{Use: "list"}
rootCmdFor(inner)
err := emitList(context.Background(), inner, parent, fr)
require.NoError(t, err)
// require halts here so fr.calls[0] below cannot panic on an empty slice.
require.Len(t, fr.calls, 1)
assert.Contains(t, fr.calls[0], "curl")
assert.Contains(t, fr.calls[0], "/api/v1/firewall/allow")
assert.Contains(t, fr.calls[0], "Authorization: Bearer test-token")
}
func TestEmitList_EmptyCoold(t *testing.T) {
fr := &cmdFakeRunner{responses: map[string]string{}}
parent := parentWithToken()
inner := &cobra.Command{Use: "list"}
rootCmdFor(inner)
err := emitList(context.Background(), inner, parent, fr)
require.NoError(t, err)
}
func TestEmitList_FetchesPerHostTokenWhenOverrideAbsent(t *testing.T) {
// Without --coold-token override, each host's token is read via SSH
// `cat /etc/coolify/api-token` then used as the bearer for GET /allow.
fr := &cmdFakeRunner{responses: map[string]string{
"/etc/coolify/api-token": "per-host-token\n",
"/api/v1/firewall/allow": `[]`,
}}
parent := parentWithToken()
parent.CooldToken = ""
t.Setenv("COOLIFY_COOLD_TOKEN", "")
inner := &cobra.Command{Use: "list"}
rootCmdFor(inner)
err := emitList(context.Background(), inner, parent, fr)
require.NoError(t, err)
var ranTokenFetch, ranGet bool
for _, c := range fr.calls {
if strings.Contains(c, "cat /etc/coolify/api-token") {
ranTokenFetch = true
}
if strings.Contains(c, "curl") && strings.Contains(c, "Authorization: Bearer per-host-token") {
ranGet = true
}
}
assert.True(t, ranTokenFetch, "CLI should SSH-fetch the token")
assert.True(t, ranGet, "bearer should be the fetched token")
}
@@ -0,0 +1,113 @@
package firewall
import (
"fmt"
"net"
"strings"
ifw "github.com/coollabsio/coolify-cli/internal/firewall"
)
// resolveEndpoint turns a user-supplied reference (name, short-id, raw IP,
// or "host:name") into the container it points at. When ref is a raw IP
// that doesn't match any discovered container, it returns a synthetic
// entry with Host="" — the caller must derive Host some other way.
//
// Ambiguous names across hosts are rejected; the user must disambiguate
// with "host:name" or a short-ID.
func resolveEndpoint(ref string, all []ifw.Container) (ifw.Container, error) {
ref = strings.TrimSpace(ref)
if ref == "" {
return ifw.Container{}, fmt.Errorf("empty container reference")
}
// Raw IP form. Checked first so IPv6 literals, which contain colons,
// are not mistaken for the "host:name" form.
if ip := net.ParseIP(ref); ip != nil {
for _, c := range all {
if c.IP.Equal(ip) {
return c, nil
}
}
// Synthetic: caller must decide on Host.
return ifw.Container{IP: ip}, nil
}
// "host:name" form: exact host disambiguator.
if host, name, ok := splitHostName(ref); ok {
for _, c := range all {
if c.Host == host && c.Name == name {
return c, nil
}
}
return ifw.Container{}, fmt.Errorf("no container named %q on host %q", name, host)
}
// Name / short-id form. Collect matches, error on ambiguity.
var matches []ifw.Container
for _, c := range all {
if c.Name == ref || strings.HasPrefix(c.ID, ref) {
matches = append(matches, c)
}
}
switch len(matches) {
case 0:
return ifw.Container{}, fmt.Errorf("no container matches %q", ref)
case 1:
return matches[0], nil
default:
return ifw.Container{}, fmt.Errorf(
"reference %q is ambiguous across hosts (%s) — use host:name form",
ref, hostList(matches))
}
}
func splitHostName(ref string) (host, name string, ok bool) {
i := strings.IndexByte(ref, ':')
if i <= 0 || i == len(ref)-1 {
return "", "", false
}
// Reject if the part after `:` looks like a port (all digits) — likely
// an IP:port form the user didn't mean.
name = ref[i+1:]
host = ref[:i]
if allDigits(name) {
return "", "", false
}
return host, name, true
}
func allDigits(s string) bool {
if s == "" {
return false
}
for _, r := range s {
if r < '0' || r > '9' {
return false
}
}
return true
}
func hostList(cs []ifw.Container) string {
seen := map[string]bool{}
var hosts []string
for _, c := range cs {
if !seen[c.Host] {
hosts = append(hosts, c.Host)
seen[c.Host] = true
}
}
return strings.Join(hosts, ", ")
}
// findHostForIP returns the SSH host of the discovered container whose
// mesh IP equals ip, or ("", false) when no discovered container has that
// address. Used when --to/--from is given as a raw IP.
func findHostForIP(ip net.IP, all []ifw.Container) (string, bool) {
for _, c := range all {
if c.IP.Equal(ip) {
return c.Host, true
}
}
return "", false
}
@@ -0,0 +1,76 @@
package firewall
import (
"net"
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
ifw "github.com/coollabsio/coolify-cli/internal/firewall"
)
func cs() []ifw.Container {
return []ifw.Container{
{Host: "h1", ID: "aaa111111111", Name: "web", IP: net.ParseIP("10.210.0.10")},
{Host: "h2", ID: "bbb222222222", Name: "api", IP: net.ParseIP("10.210.1.10")},
{Host: "h3", ID: "ccc333333333", Name: "web", IP: net.ParseIP("10.210.2.10")},
}
}
func TestResolveEndpoint_ByName_Unique(t *testing.T) {
c, err := resolveEndpoint("api", cs())
require.NoError(t, err)
assert.Equal(t, "h2", c.Host)
assert.Equal(t, "10.210.1.10", c.IP.String())
}
func TestResolveEndpoint_ByName_Ambiguous(t *testing.T) {
_, err := resolveEndpoint("web", cs())
require.Error(t, err)
assert.Contains(t, err.Error(), "ambiguous")
}
func TestResolveEndpoint_ByShortID(t *testing.T) {
c, err := resolveEndpoint("bbb", cs())
require.NoError(t, err)
assert.Equal(t, "h2", c.Host)
}
func TestResolveEndpoint_ByHostName(t *testing.T) {
c, err := resolveEndpoint("h3:web", cs())
require.NoError(t, err)
assert.Equal(t, "h3", c.Host)
assert.Equal(t, "10.210.2.10", c.IP.String())
}
func TestResolveEndpoint_ByRawIP(t *testing.T) {
c, err := resolveEndpoint("10.210.1.10", cs())
require.NoError(t, err)
assert.Equal(t, "h2", c.Host)
}
func TestResolveEndpoint_UnknownRawIP_Synthetic(t *testing.T) {
c, err := resolveEndpoint("10.99.99.99", cs())
require.NoError(t, err)
assert.Empty(t, c.Host)
assert.Equal(t, "10.99.99.99", c.IP.String())
}
func TestResolveEndpoint_NotFound(t *testing.T) {
_, err := resolveEndpoint("nobody", cs())
require.Error(t, err)
}
func TestResolveEndpoint_Empty(t *testing.T) {
_, err := resolveEndpoint("", cs())
require.Error(t, err)
}
func TestFindHostForIP(t *testing.T) {
h, ok := findHostForIP(net.ParseIP("10.210.0.10"), cs())
assert.True(t, ok)
assert.Equal(t, "h1", h)
_, ok = findHostForIP(net.ParseIP("1.2.3.4"), cs())
assert.False(t, ok)
}
+200
@@ -0,0 +1,200 @@
package initcmd
import (
"bufio"
"context"
"fmt"
"os"
"github.com/mattn/go-isatty"
"github.com/spf13/cobra"
"github.com/coollabsio/coolify-cli/internal/models"
"github.com/coollabsio/coolify-cli/internal/output"
internalssh "github.com/coollabsio/coolify-cli/internal/ssh"
"github.com/coollabsio/coolify-cli/internal/wireguard"
)
// Ensure internalssh is used (for *internalssh.Client in signatures).
var _ *internalssh.Client
// applyOptions tweaks runApply per subcommand.
type applyOptions struct {
// SkipAlphaGate, when true, bypasses the interactive "press enter"
// confirmation. upgrade/extend set it because those are invoked by the
// Coolify backend in production rather than by a human at a terminal.
SkipAlphaGate bool
// Header is a one-line banner describing the intent (e.g. "extending
// mesh with 1 new host"). Printed to stderr before the plan.
Header string
}
func runApply(ctx context.Context, cmd *cobra.Command, flags *InitFlags, opts applyOptions) error {
fmt.Fprint(os.Stderr, alphaBanner)
if err := validatePlanFlags(flags); err != nil {
return err
}
if !opts.SkipAlphaGate && !shouldSkipGate(flags) {
fmt.Fprintln(os.Stderr, "This command will modify network configuration on the listed servers.")
fmt.Fprint(os.Stderr, "Press Enter to continue, or Ctrl+C to abort... ")
reader := bufio.NewReader(os.Stdin)
if _, err := reader.ReadString('\n'); err != nil {
return fmt.Errorf("read confirmation: %w", err)
}
}
desired, err := buildDesired(flags)
if err != nil {
return err
}
if err := wireguard.ValidateIntent(desired); err != nil {
return err
}
sshClient, err := flags.BuildSSHClient()
if err != nil {
return fmt.Errorf("SSH client: %w", err)
}
if opts.Header != "" {
fmt.Fprintln(os.Stderr, opts.Header)
}
fmt.Fprintf(os.Stderr, "Probing %d server(s)...\n", len(flags.Servers))
current, probeErr := wireguard.Reconstruct(ctx, sshClient, flags.Servers,
flags.SSHUser, flags.SSHPort, flags.WGInterface,
flags.Namespaces, flags.Concurrency)
if probeErr != nil {
fmt.Fprintf(os.Stderr, "Warning: %v\n", probeErr)
}
plan, err := wireguard.BuildPlan(desired, current)
if err != nil {
return fmt.Errorf("build plan: %w", err)
}
for _, w := range plan.Warnings {
fmt.Fprintf(os.Stderr, "Warning [%s]: %s\n", w.Host, w.Reason)
}
format, _ := cmd.Root().PersistentFlags().GetString("format")
if plan.IsEmpty() {
fmt.Fprintln(os.Stderr, "No changes needed. Mesh is already converged.")
} else {
fmt.Fprintln(os.Stderr, "Plan:")
for _, a := range plan.Actions {
fmt.Fprintf(os.Stderr, " [%s] %s %s\n", a.Host, a.Type, a.Detail)
}
fmt.Fprintln(os.Stderr)
}
if len(plan.Skipped) > 0 {
fmt.Fprintln(os.Stderr, "Skipped by intent filter:")
for _, s := range plan.Skipped {
fmt.Fprintf(os.Stderr, " [%s] %s — %s\n", s.Action.Host, s.Action.Type, s.Reason)
}
fmt.Fprintln(os.Stderr)
}
if plan.IsEmpty() {
return runVerify(ctx, sshClient, flags, desired, format)
}
fmt.Fprintln(os.Stderr, "Applying...")
actionResults, applyErr := wireguard.ApplyMesh(ctx, sshClient,
flags.SSHUser, flags.SSHPort, desired, current, flags.Concurrency)
rows := make([]models.ApplyResultRow, len(actionResults))
for i, r := range actionResults {
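status := "ok"
// Detail defaults to the action's own description; the error text is
// substituted only when the action carries no detail.
detail := r.Action.Detail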
if r.Err != nil {
status = "error"
if detail == "" {
detail = r.Err.Error()
}
}
rows[i] = models.ApplyResultRow{
Server: r.Action.Host,
Action: string(r.Action.Type),
Status: status,
Detail: detail,
}
}
if format == output.FormatJSON || format == output.FormatPretty {
verifyRows := collectVerifyRows(ctx, sshClient, flags, desired)
out := models.ApplyOutput{Results: rows, Verified: verifyRows}
formatter, ferr := output.NewFormatter(format, output.Options{Writer: os.Stdout})
if ferr != nil {
return ferr
}
if err := formatter.Format(out); err != nil {
return err
}
return applyErr
}
if len(rows) > 0 {
formatter, _ := output.NewFormatter(output.FormatTable, output.Options{Writer: os.Stdout})
_ = formatter.Format(rows)
}
if err := runVerify(ctx, sshClient, flags, desired, format); err != nil {
return err
}
return applyErr
}
// shouldSkipGate returns true when the interactive alpha gate should be bypassed.
func shouldSkipGate(flags *InitFlags) bool {
if flags.Yes {
return true
}
if os.Getenv("COOLIFY_NON_INTERACTIVE") == "1" {
return true
}
if !isatty.IsTerminal(os.Stdin.Fd()) && !isatty.IsCygwinTerminal(os.Stdin.Fd()) {
return true
}
return false
}
func runVerify(ctx context.Context, sshClient *internalssh.Client, flags *InitFlags, desired *wireguard.DesiredMesh, format string) error {
fmt.Fprintln(os.Stderr, "Verifying...")
vrows := collectVerifyRows(ctx, sshClient, flags, desired)
formatter, err := output.NewFormatter(format, output.Options{Writer: os.Stdout})
if err != nil {
return err
}
return formatter.Format(vrows)
}
func collectVerifyRows(ctx context.Context, sshClient *internalssh.Client, flags *InitFlags, desired *wireguard.DesiredMesh) []models.VerifyResultRow {
vresults := wireguard.Verify(ctx, sshClient,
flags.Servers, flags.SSHUser, flags.SSHPort, desired.Interface, flags.Concurrency)
rows := make([]models.VerifyResultRow, len(vresults))
for i, v := range vresults {
status := "ok"
wgIP := ""
if v.WireGuardIP != nil {
wgIP = v.WireGuardIP.String()
}
if v.Err != nil || !v.Active {
status = "error"
}
rows[i] = models.VerifyResultRow{
Server: v.Host,
WireGuardIP: wgIP,
PeerCount: v.PeerCount,
Status: status,
}
}
return rows
}
+125
@@ -0,0 +1,125 @@
package initcmd
import (
"testing"
"github.com/spf13/cobra"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
// TestNewInitCommand verifies the command tree structure.
func TestNewInitCommand(t *testing.T) {
cmd := NewInitCommand()
assert.Equal(t, "init", cmd.Use)
assert.NotEmpty(t, cmd.Short)
subCmds := map[string]*cobra.Command{}
for _, sub := range cmd.Commands() {
subCmds[sub.Use] = sub
}
assert.Contains(t, subCmds, "plan")
assert.Contains(t, subCmds, "bootstrap")
assert.Contains(t, subCmds, "extend")
assert.Contains(t, subCmds, "upgrade")
assert.NotContains(t, subCmds, "apply", "apply removed in favor of bootstrap/extend/upgrade")
}
// TestNewInitCommand_PersistentFlags verifies shared flags are registered.
func TestNewInitCommand_PersistentFlags(t *testing.T) {
cmd := NewInitCommand()
pf := cmd.PersistentFlags()
assert.NotNil(t, pf.Lookup("servers"))
assert.NotNil(t, pf.Lookup("ssh-key"))
assert.NotNil(t, pf.Lookup("ssh-user"))
assert.NotNil(t, pf.Lookup("ssh-port"))
assert.NotNil(t, pf.Lookup("wg-mgmt-pool"))
assert.NotNil(t, pf.Lookup("container-pool"))
assert.NotNil(t, pf.Lookup("container-prefix"))
assert.NotNil(t, pf.Lookup("wg-interface"))
assert.NotNil(t, pf.Lookup("wg-listen-port"))
assert.NotNil(t, pf.Lookup("namespaces"))
assert.NotNil(t, pf.Lookup("skip-default-deny"))
assert.NotNil(t, pf.Lookup("concurrency"))
assert.NotNil(t, pf.Lookup("ssh-timeout"))
assert.NotNil(t, pf.Lookup("yes"))
// Old flags removed.
assert.Nil(t, pf.Lookup("wg-pool"))
assert.Nil(t, pf.Lookup("wg-host-prefix"))
assert.Nil(t, pf.Lookup("wg-subnet"))
assert.Nil(t, pf.Lookup("podman"))
assert.Nil(t, pf.Lookup("default-deny"))
assert.Nil(t, pf.Lookup("install-coold"))
// Replaced by --namespaces.
assert.Nil(t, pf.Lookup("podman-network"))
}
// TestNewInitCommand_FlagDefaults verifies default values.
func TestNewInitCommand_FlagDefaults(t *testing.T) {
cmd := NewInitCommand()
pf := cmd.PersistentFlags()
user, err := pf.GetString("ssh-user")
require.NoError(t, err)
assert.Equal(t, "root", user)
port, err := pf.GetInt("ssh-port")
require.NoError(t, err)
assert.Equal(t, 22, port)
mgmtPool, err := pf.GetString("wg-mgmt-pool")
require.NoError(t, err)
assert.Equal(t, "100.64.0.0/16", mgmtPool)
contPool, err := pf.GetString("container-pool")
require.NoError(t, err)
assert.Equal(t, "10.210.0.0/16", contPool)
contPrefix, err := pf.GetInt("container-prefix")
require.NoError(t, err)
assert.Equal(t, 24, contPrefix)
iface, err := pf.GetString("wg-interface")
require.NoError(t, err)
assert.Equal(t, "wg0", iface)
listenPort, err := pf.GetInt("wg-listen-port")
require.NoError(t, err)
assert.Equal(t, 51820, listenPort)
namespaces, err := pf.GetStringSlice("namespaces")
require.NoError(t, err)
assert.Equal(t, []string{"default"}, namespaces)
skipDefaultDeny, err := pf.GetBool("skip-default-deny")
require.NoError(t, err)
assert.False(t, skipDefaultDeny)
concurrency, err := pf.GetInt("concurrency")
require.NoError(t, err)
assert.Equal(t, 10, concurrency)
timeout, err := pf.GetString("ssh-timeout")
require.NoError(t, err)
assert.Equal(t, "30s", timeout)
}
// TestPlanCommand_FlagsInherited verifies that plan inherits parent persistent flags.
func TestPlanCommand_FlagsInherited(t *testing.T) {
initCmd := NewInitCommand()
_ = initCmd.ParseFlags([]string{})
var planCmd *cobra.Command
for _, sub := range initCmd.Commands() {
if sub.Use == "plan" {
planCmd = sub
break
}
}
require.NotNil(t, planCmd)
f := planCmd.InheritedFlags().Lookup("servers")
assert.NotNil(t, f, "plan should inherit --servers from parent")
}
+29
@@ -0,0 +1,29 @@
package initcmd
import (
"github.com/spf13/cobra"
"github.com/coollabsio/coolify-cli/internal/wireguard"
)
// NewBootstrapCommand creates the `coolify init bootstrap` subcommand — the
// first-time mesh install. Runs every applicable action on every host and
// keeps the interactive alpha gate (unless --yes / non-TTY / env override).
func NewBootstrapCommand(flags *InitFlags) *cobra.Command {
return &cobra.Command{
Use: "bootstrap",
Short: "First-time mesh install (all actions allowed)",
Long: `Bootstrap a fresh WireGuard + Podman + coold mesh across every host in
--servers. Idempotent: re-running with no changes produces an empty plan.
Use this for the initial install. For adding hosts later, see
` + "`coolify init extend`" + `; for bumping agent versions, see
` + "`coolify init upgrade`" + `.`,
RunE: func(cmd *cobra.Command, _ []string) error {
flags.Intent = string(wireguard.IntentBootstrap)
return runApply(cmd.Context(), cmd, flags, applyOptions{
Header: "Bootstrapping mesh...",
})
},
}
}
+51
@@ -0,0 +1,51 @@
package initcmd
import (
"fmt"
"net"
"github.com/coollabsio/coolify-cli/internal/wireguard"
)
// buildDesired turns the flag struct into a wireguard.DesiredMesh. Intent is
// pulled from flags.Intent so each subcommand can set it before calling the
// shared plan/apply pipeline.
func buildDesired(flags *InitFlags) (*wireguard.DesiredMesh, error) {
_, mgmtPool, err := net.ParseCIDR(flags.WGMgmtPool)
if err != nil {
return nil, fmt.Errorf("invalid --wg-mgmt-pool %q: %w", flags.WGMgmtPool, err)
}
_, contPool, err := net.ParseCIDR(flags.ContainerPool)
if err != nil {
return nil, fmt.Errorf("invalid --container-pool %q: %w", flags.ContainerPool, err)
}
return &wireguard.DesiredMesh{
Hosts: flags.Servers,
Interface: flags.WGInterface,
MgmtPool: mgmtPool,
ContainerPool: contPool,
ContainerPrefix: flags.ContainerPrefix,
ListenPort: flags.WGListenPort,
InstallPodman: true,
Namespaces: flags.Namespaces,
DefaultDenyContainers: !flags.SkipDefaultDeny,
InstallCoold: true,
CooldVersion: flags.CooldVersion,
CorrosionVersion: flags.CorrosionVersion,
CorrosionGossipPort: flags.CorrosionGossipPort,
CorrosionAPIPort: flags.CorrosionAPIPort,
CentralHost: flags.CentralHost,
SchedulerVersion: flags.SchedulerVersion,
EnableBuilder: flags.EnableBuilder,
BuilderHosts: flags.BuilderHosts,
BuilderCapacity: flags.BuilderCapacity,
BuilderCPUQuota: flags.BuilderCPUQuota,
BuilderMemoryMax: flags.BuilderMemoryMax,
BuilderTimeoutSecs: flags.BuilderTimeoutSecs,
Intent: wireguard.Intent(flags.Intent),
NewHosts: flags.NewHosts,
AllowReplace: flags.AllowReplace,
AllowNightly: flags.AllowNightly,
}, nil
}
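A note on the pool parsing in `buildDesired`: `net.ParseCIDR` returns both an address and a network, and the function keeps only the network, so any host bits in `--wg-mgmt-pool` or `--container-pool` are masked off. The sketch below illustrates this with a hypothetical `parsePool` helper; the helper name and the sample values are assumptions for illustration and are not part of this change.

```go
package main

import (
	"fmt"
	"net"
)

// parsePool is a hypothetical helper, not part of the diff. It mirrors how
// buildDesired treats the pool flags: the address half of net.ParseCIDR's
// result is discarded and only the network is kept.
func parsePool(flagName, cidr string) (*net.IPNet, error) {
	_, pool, err := net.ParseCIDR(cidr)
	if err != nil {
		return nil, fmt.Errorf("invalid %s %q: %w", flagName, cidr, err)
	}
	return pool, nil
}

func main() {
	// Host bits are masked: the .5 below does not survive parsing.
	pool, err := parsePool("--container-pool", "10.210.0.5/16")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(pool.String()) // prints "10.210.0.0/16"

	// A bare address without a prefix length is rejected.
	if _, err := parsePool("--container-pool", "10.210.0.0"); err != nil {
		fmt.Println("error:", err)
	}
}
```

If a caller ever needs to reject pools with non-zero host bits rather than silently masking them, comparing the returned address against `pool.IP` would be one way to do it.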
+65
@@ -0,0 +1,65 @@
package initcmd
import (
"fmt"
"github.com/spf13/cobra"
"github.com/coollabsio/coolify-cli/internal/wireguard"
)
// NewExtendCommand creates the `coolify init extend` subcommand. It adds the
// hosts listed in --new-hosts to an existing mesh: new hosts get the full
// first-time install; existing hosts get only peer-refresh actions (WG
// AllowedIPs update, corrosion config refresh, firewall unit reinstall if
// namespace list changed). Destructive actions on existing hosts are blocked
// unless --allow-replace is set.
func NewExtendCommand(flags *InitFlags) *cobra.Command {
cmd := &cobra.Command{
Use: "extend",
Short: "Add new hosts to an existing mesh (existing hosts get peer-refresh only)",
Long: `Extend an existing mesh with brand-new hosts. --new-hosts lists the
subset of --servers that is brand-new; those hosts receive the full
first-time install (install WG, generate keys, install podman, install
coold/corrosion, create bridges, etc.).
Existing hosts in --servers are re-probed and get only the peer-refresh
actions required to route traffic to the new peer: WG config rewrite,
corrosion peer list refresh, firewall unit reinstall when the namespace
list changed. Agent binaries are not re-downloaded on existing hosts —
use ` + "`coolify init upgrade`" + ` for that.
--allow-replace unlocks destructive-replace actions (e.g. recreating a
drifted podman bridge) on existing hosts. Handle with care: containers
on a recreated bridge are disconnected.`,
RunE: func(cmd *cobra.Command, _ []string) error {
if len(flags.NewHosts) == 0 {
return fmt.Errorf("--new-hosts is required: list the subset of --servers that is brand-new")
}
servers := make(map[string]struct{}, len(flags.Servers))
for _, s := range flags.Servers {
servers[s] = struct{}{}
}
for _, nh := range flags.NewHosts {
if _, ok := servers[nh]; !ok {
return fmt.Errorf("--new-hosts: %q is not in --servers", nh)
}
}
flags.Intent = string(wireguard.IntentExtend)
header := fmt.Sprintf("Extending mesh with %d new host(s): %v", len(flags.NewHosts), flags.NewHosts)
return runApply(cmd.Context(), cmd, flags, applyOptions{
SkipAlphaGate: true,
Header: header,
})
},
}
cmd.Flags().StringSliceVar(&flags.NewHosts, "new-hosts", nil,
"Comma-separated subset of --servers that is brand-new this run (required). Only these hosts receive the full first-time install; all other hosts get peer-refresh only.")
cmd.Flags().BoolVar(&flags.AllowReplace, "allow-replace", false,
"Unlock destructive-replace actions on existing hosts (e.g. recreating a drifted podman bridge). Off by default — drifted existing hosts are surfaced as skipped actions instead.")
return cmd
}
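The `--new-hosts` checks in the `extend` RunE can be read as a small set-membership test. Below is a minimal sketch that pulls the same logic into a standalone function so it can be exercised without cobra; `validateNewHosts` is a hypothetical name and the host addresses are made up for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// validateNewHosts is a hypothetical extraction of the checks done inline in
// the extend subcommand: the list must be non-empty and every entry must
// also appear in servers.
func validateNewHosts(servers, newHosts []string) error {
	if len(newHosts) == 0 {
		return errors.New("--new-hosts is required: list the subset of --servers that is brand-new")
	}
	known := make(map[string]struct{}, len(servers))
	for _, s := range servers {
		known[s] = struct{}{}
	}
	for _, nh := range newHosts {
		if _, ok := known[nh]; !ok {
			return fmt.Errorf("--new-hosts: %q is not in --servers", nh)
		}
	}
	return nil
}

func main() {
	servers := []string{"10.0.0.1", "10.0.0.2", "10.0.0.3"}

	// A proper subset passes.
	fmt.Println(validateNewHosts(servers, []string{"10.0.0.3"}))

	// An address missing from servers is rejected.
	fmt.Println(validateNewHosts(servers, []string{"10.0.0.9"}))
}
```

The comparison is an exact string match, so the same host written two ways (for example with and without a port) would not be recognized as the same entry.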
+116
@@ -0,0 +1,116 @@
// Package initcmd implements the `coolify init` alpha WireGuard mesh
// bootstrap command tree (Coolify v5).
package initcmd
import (
"github.com/spf13/cobra"
"github.com/coollabsio/coolify-cli/cmd/common"
)
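// InitFlags holds all flags for the `coolify init` command tree: the
// persistent flags shared by plan, bootstrap, extend, and upgrade, plus the
// subcommand-specific ones (NewHosts, AllowReplace, AllowNightly).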
type InitFlags struct {
common.SSHMeshFlags
common.MeshNetFlags
WGMgmtPool string
WGInterface string
WGListenPort int
SkipDefaultDeny bool
CooldVersion string
CorrosionVersion string
CorrosionGossipPort int
CorrosionAPIPort int
Yes bool
// CentralHost is the SSH address of the central VM (from --central flag).
// When non-empty, phases 4+5 install the scheduler on that host and push
// per-host JWTs to all other hosts. Default empty = no scheduler setup.
CentralHost string
SchedulerVersion string
// EnableBuilder is a cluster-wide shorthand: when true (and BuilderHosts
// is empty), every host in Servers is enrolled as builder-capable. When
// BuilderHosts is non-empty, EnableBuilder is ignored and only the
// listed subset gets the capability.
EnableBuilder bool
// BuilderHosts is an explicit list of SSH addresses (subset of Servers)
// to enroll with the builder capability. Empty = fall back to
// EnableBuilder semantics; when BuilderHosts is empty and EnableBuilder
// is false, the builder is fully disabled.
BuilderHosts []string
BuilderCapacity int
BuilderCPUQuota string
BuilderMemoryMax string
BuilderTimeoutSecs int
// NewHosts is the extend-subcommand-only list of brand-new hosts. Must
// be a subset of Servers. Existing hosts in Servers get only peer-refresh
// actions; new hosts get the full first-time install.
NewHosts []string
// AllowReplace unlocks destructive-replace actions on existing hosts in
// extend mode (e.g. recreating a podman bridge whose dns_enabled=true
// pre-alpha drift would otherwise be blocked).
AllowReplace bool
// AllowNightly permits the upgrade subcommand to accept "nightly" as a
// version tag. Rejected by default because nightly forces a re-install on
// every run instead of only when the pinned version changes.
AllowNightly bool
// Intent selects the plan filter (bootstrap/extend/upgrade). Set by each
// subcommand before calling runPlan/runApply; not bound to a flag.
Intent string
}
// bindInitFlags registers all shared flags as PersistentFlags on cmd.
func bindInitFlags(cmd *cobra.Command, f *InitFlags) {
common.BindSSHMeshFlags(cmd, &f.SSHMeshFlags)
common.BindMeshNetMultiFlags(cmd, &f.MeshNetFlags)
pf := cmd.PersistentFlags()
pf.StringVar(&f.WGMgmtPool, "wg-mgmt-pool", "100.64.0.0/16",
"WireGuard management address pool — each host gets a /32 from here, assigned to wg0")
pf.StringVar(&f.WGInterface, "wg-interface", "wg0",
"WireGuard interface name on the remote hosts")
pf.IntVar(&f.WGListenPort, "wg-listen-port", 51820,
"WireGuard UDP listen port")
pf.BoolVar(&f.SkipDefaultDeny, "skip-default-deny", false,
"Skip installing the default-deny firewall scaffold. By default, both cross-host and intra-host (same bridge) container traffic is blocked; coold manages the allow list at runtime")
pf.StringVar(&f.CooldVersion, "coold-version", "nightly",
`Release tag to download for coold (e.g. "nightly", "v1.2.3"). nightly always re-installs on every run.`)
pf.StringVar(&f.CorrosionVersion, "corrosion-version", "nightly",
`Release tag to download for corrosion (e.g. "nightly", "v1.2.3"). nightly always re-installs on every run.`)
pf.IntVar(&f.CorrosionGossipPort, "corrosion-gossip-port", 8787,
"Corrosion SWIM gossip port (bound to the wg0 mgmt IP)")
pf.IntVar(&f.CorrosionAPIPort, "corrosion-api-port", 8080,
"Corrosion HTTP API port (bound to 127.0.0.1)")
pf.BoolVarP(&f.Yes, "yes", "y", false,
"Skip the interactive alpha confirmation prompt")
pf.StringVar(&f.CentralHost, "central", "",
`SSH address of the central VM that will run the scheduler (and later Laravel).
Must be one of the --servers entries. When set, phases 4+5 install the scheduler on that host
and push a per-host JWT to every other server. Leave empty to skip scheduler setup.`)
pf.StringVar(&f.SchedulerVersion, "scheduler-version", "nightly",
`Release tag to download for scheduler (e.g. "nightly", "v1.2.3").`)
pf.BoolVar(&f.EnableBuilder, "enable-builder", true,
`Cluster-wide shorthand: enable the builder capability on every host
(requires --central). Ignored when --builder-hosts is set.`)
pf.StringSliceVar(&f.BuilderHosts, "builder-hosts", nil,
`Explicit subset of --servers to enroll with the builder capability.
Takes precedence over --enable-builder. Empty (default) means fall back to
--enable-builder for the whole cluster.`)
pf.IntVar(&f.BuilderCapacity, "builder-capacity", 2,
"Concurrent builds accepted per host (COOLD_BUILDER_CAPACITY).")
pf.StringVar(&f.BuilderCPUQuota, "builder-cpu-quota", "200%",
`cgroup CPU quota for each build subprocess (COOLD_BUILDER_CPU_QUOTA).
systemd CPUQuota format; "200%" = two full cores.`)
pf.StringVar(&f.BuilderMemoryMax, "builder-memory-max", "2G",
`cgroup memory cap for each build subprocess (COOLD_BUILDER_MEMORY_MAX).
systemd MemoryMax format; e.g. "2G", "512M".`)
pf.IntVar(&f.BuilderTimeoutSecs, "builder-timeout-secs", 1800,
"Hard wall-clock timeout per build in seconds (COOLD_BUILDER_TIMEOUT_SECS).")
}
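The precedence between `--builder-hosts` and `--enable-builder` described in the `InitFlags` comments can be sketched as a single resolution function. The code that actually resolves builder hosts is not shown in this diff, so `effectiveBuilderHosts` below is a hypothetical illustration of the documented rules, not the real implementation.

```go
package main

import "fmt"

// effectiveBuilderHosts is a hypothetical helper illustrating the documented
// precedence: an explicit BuilderHosts list wins; otherwise EnableBuilder
// enrolls every server; otherwise no host is builder-capable.
func effectiveBuilderHosts(servers, builderHosts []string, enableBuilder bool) []string {
	if len(builderHosts) > 0 {
		// Explicit subset: EnableBuilder is ignored.
		return append([]string(nil), builderHosts...)
	}
	if enableBuilder {
		// Cluster-wide shorthand: every server is enrolled.
		return append([]string(nil), servers...)
	}
	// Builder fully disabled.
	return nil
}

func main() {
	servers := []string{"h1", "h2", "h3"}

	fmt.Println(effectiveBuilderHosts(servers, []string{"h2"}, false)) // explicit subset
	fmt.Println(effectiveBuilderHosts(servers, nil, true))             // whole cluster
	fmt.Println(effectiveBuilderHosts(servers, nil, false))            // disabled
}
```

Returning copies rather than the input slices keeps later mutation of the flag values from leaking into the resolved list.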
+51
@@ -0,0 +1,51 @@
package initcmd
import (
"fmt"
"os"
"github.com/spf13/cobra"
)
const alphaBanner = `
[ALPHA] coolify init targets Coolify v5 and is experimental.
[ALPHA] WireGuard mesh bootstrap requires root/sudo and modifies network configuration.
[ALPHA] Test in non-production environments first. Stability is not guaranteed.
`
// NewInitCommand creates the parent `coolify init` command.
// On bare invocation (no subcommand) it prints the alpha banner and help.
func NewInitCommand() *cobra.Command {
flags := &InitFlags{}
cmd := &cobra.Command{
Use: "init",
Short: "[ALPHA] Initialize WireGuard mesh for Coolify v5",
Long: `[ALPHA] Bootstrap a WireGuard full-mesh overlay between servers and
provision each host with the Coolify v5 runtime stack: Podman + bridge
network, default-deny iptables scaffold, and the coold/corrosion
control-plane agents.
Subcommands:
plan Show what would change without touching anything (--intent
selects the filter: bootstrap / extend / upgrade).
bootstrap First-time install (all actions allowed).
extend Add new hosts to an existing mesh; existing hosts get only
peer-refresh actions.
upgrade Bump agent versions (coold / corrosion / scheduler / builder);
WG / podman / firewall untouched.`,
RunE: func(cmd *cobra.Command, _ []string) error {
fmt.Fprint(os.Stderr, alphaBanner)
return cmd.Help()
},
}
bindInitFlags(cmd, flags)
cmd.AddCommand(NewPlanCommand(flags))
cmd.AddCommand(NewBootstrapCommand(flags))
cmd.AddCommand(NewExtendCommand(flags))
cmd.AddCommand(NewUpgradeCommand(flags))
return cmd
}
+175
@@ -0,0 +1,175 @@
package initcmd
import (
"context"
"fmt"
"os"
"github.com/spf13/cobra"
"github.com/coollabsio/coolify-cli/internal/models"
"github.com/coollabsio/coolify-cli/internal/output"
"github.com/coollabsio/coolify-cli/internal/wireguard"
)
// NewPlanCommand creates the `coolify init plan` subcommand.
func NewPlanCommand(flags *InitFlags) *cobra.Command {
var intentFlag string
cmd := &cobra.Command{
Use: "plan",
Short: "Show WireGuard mesh changes without applying them",
Long: `Reconstruct the current WireGuard state from each server via SSH and
show the actions that bootstrap, extend, or upgrade would execute. Nothing is changed.
Pass --intent to preview a specific subcommand's behavior (bootstrap, extend,
upgrade). bootstrap is the default and matches the pre-split behavior.`,
RunE: func(cmd *cobra.Command, _ []string) error {
fmt.Fprint(os.Stderr, alphaBanner)
flags.Intent = intentFlag
return runPlan(cmd.Context(), cmd, flags)
},
}
cmd.Flags().StringVar(&intentFlag, "intent", "bootstrap",
`Preview filter: "bootstrap" (all actions), "extend" (treat --new-hosts as fresh, existing hosts peer-refresh only), "upgrade" (version bumps only).`)
return cmd
}
func runPlan(ctx context.Context, cmd *cobra.Command, flags *InitFlags) error {
if err := validatePlanFlags(flags); err != nil {
return err
}
desired, err := buildDesired(flags)
if err != nil {
return err
}
if err := wireguard.ValidateIntent(desired); err != nil {
return err
}
sshClient, err := flags.BuildSSHClient()
if err != nil {
return fmt.Errorf("SSH client: %w", err)
}
fmt.Fprintf(os.Stderr, "Probing %d server(s)...\n", len(flags.Servers))
current, err := wireguard.Reconstruct(ctx, sshClient, flags.Servers,
flags.SSHUser, flags.SSHPort, flags.WGInterface,
flags.Namespaces, flags.Concurrency)
if err != nil {
fmt.Fprintf(os.Stderr, "Warning: %v\n", err)
}
plan, err := wireguard.BuildPlan(desired, current)
if err != nil {
return fmt.Errorf("build plan: %w", err)
}
for _, w := range plan.Warnings {
fmt.Fprintf(os.Stderr, "Warning [%s]: %s\n", w.Host, w.Reason)
}
format, _ := cmd.Root().PersistentFlags().GetString("format")
intent := intentLabel(flags.Intent)
if plan.IsEmpty() && len(plan.Skipped) == 0 {
msg := "No changes needed. Mesh is already converged."
if format == output.FormatJSON {
out := models.PlanOutput{
Servers: flags.Servers,
Intent: intent,
Actions: []models.PlanActionRow{},
Warnings: warningsToStrings(plan.Warnings),
}
formatter, ferr := output.NewFormatter(format, output.Options{Writer: os.Stdout})
if ferr != nil {
return ferr
}
return formatter.Format(out)
}
fmt.Println(msg)
return nil
}
rows := make([]models.PlanActionRow, len(plan.Actions))
for i, a := range plan.Actions {
rows[i] = models.PlanActionRow{
Server: a.Host,
Action: string(a.Type),
Detail: a.Detail,
}
}
skipped := skippedRows(plan.Skipped)
formatter, err := output.NewFormatter(format, output.Options{Writer: os.Stdout})
if err != nil {
return err
}
if format == output.FormatJSON || format == output.FormatPretty {
return formatter.Format(models.PlanOutput{
Servers: flags.Servers,
Intent: intent,
Actions: rows,
Skipped: skipped,
Warnings: warningsToStrings(plan.Warnings),
})
}
if len(rows) > 0 {
if err := formatter.Format(rows); err != nil {
return err
}
} else {
fmt.Println("No actions scheduled.")
}
if len(skipped) > 0 {
fmt.Fprintln(os.Stderr)
fmt.Fprintln(os.Stderr, "Skipped by intent filter:")
for _, s := range skipped {
fmt.Fprintf(os.Stderr, " [%s] %s — %s\n", s.Server, s.Action, s.Reason)
}
}
return nil
}
func validatePlanFlags(f *InitFlags) error {
if err := f.Validate(); err != nil {
return err
}
return f.ValidateNamespaces()
}
// warningsToStrings formats allocator warnings as human-readable strings.
func warningsToStrings(ws []wireguard.Warning) []string {
if len(ws) == 0 {
return nil
}
out := make([]string, len(ws))
for i, w := range ws {
out[i] = fmt.Sprintf("[%s] %s", w.Host, w.Reason)
}
return out
}
// skippedRows converts the plan's intent-filtered actions into render rows.
func skippedRows(ss []wireguard.SkippedAction) []models.PlanSkippedRow {
if len(ss) == 0 {
return nil
}
out := make([]models.PlanSkippedRow, len(ss))
for i, s := range ss {
out[i] = models.PlanSkippedRow{
Server: s.Action.Host,
Action: string(s.Action.Type),
Reason: s.Reason,
}
}
return out
}
// intentLabel normalizes an empty intent to "bootstrap" for display.
func intentLabel(raw string) string {
if raw == "" {
return "bootstrap"
}
return raw
}
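One detail worth keeping in mind when reading `runPlan`: the empty-plan branch sets `Actions` to an explicit empty slice, while `warningsToStrings` and `skippedRows` return `nil` for empty input. With Go's `encoding/json`, those two cases encode differently. The sketch below shows the difference using a hypothetical `planOutput` struct; the real `models.PlanOutput` and its JSON tags are not shown in this diff, so the field names and tags here are assumptions.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// planOutput is a hypothetical stand-in for models.PlanOutput. The tags are
// assumed for illustration; in particular there is no omitempty.
type planOutput struct {
	Actions  []string `json:"actions"`
	Warnings []string `json:"warnings"`
}

// encode marshals p and returns the JSON text, or the error text on failure.
func encode(p planOutput) string {
	b, err := json.Marshal(p)
	if err != nil {
		return err.Error()
	}
	return string(b)
}

func main() {
	// An empty but non-nil slice encodes as [], a nil slice as null.
	fmt.Println(encode(planOutput{Actions: []string{}, Warnings: nil}))
	// prints {"actions":[],"warnings":null}
}
```

Whether consumers of the JSON output see `null` or a missing key for warnings therefore depends on the struct tags, which is something to check against the actual model definition.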
+68
@@ -0,0 +1,68 @@
package initcmd
import (
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"github.com/coollabsio/coolify-cli/cmd/common"
)
// TestValidatePlanFlags checks required flag validation.
func TestValidatePlanFlags(t *testing.T) {
t.Run("missing servers", func(t *testing.T) {
err := validatePlanFlags(&InitFlags{
SSHMeshFlags: common.SSHMeshFlags{SSHKey: "/path/to/key"},
})
require.Error(t, err)
assert.Contains(t, err.Error(), "--servers")
})
t.Run("missing ssh key", func(t *testing.T) {
err := validatePlanFlags(&InitFlags{
SSHMeshFlags: common.SSHMeshFlags{Servers: []string{"1.1.1.1"}},
})
require.Error(t, err)
assert.Contains(t, err.Error(), "--ssh-key")
})
t.Run("valid", func(t *testing.T) {
err := validatePlanFlags(&InitFlags{
SSHMeshFlags: common.SSHMeshFlags{
Servers: []string{"1.1.1.1"},
SSHKey: "/path/to/key",
},
MeshNetFlags: common.MeshNetFlags{
Namespaces: []string{common.DefaultNamespace},
},
})
require.NoError(t, err)
})
t.Run("invalid namespace", func(t *testing.T) {
err := validatePlanFlags(&InitFlags{
SSHMeshFlags: common.SSHMeshFlags{
Servers: []string{"1.1.1.1"},
SSHKey: "/path/to/key",
},
MeshNetFlags: common.MeshNetFlags{
Namespaces: []string{"Not Valid"},
},
})
require.Error(t, err)
assert.Contains(t, err.Error(), "invalid namespace")
})
}
// TestShouldSkipGate verifies the alpha gate bypass logic.
func TestShouldSkipGate(t *testing.T) {
// --yes flag
assert.True(t, shouldSkipGate(&InitFlags{Yes: true}))
// Without --yes and without env var, behaviour depends on TTY.
// We can't reliably test the TTY path in unit tests, but we can
// confirm the env-var bypass.
t.Setenv("COOLIFY_NON_INTERACTIVE", "1")
assert.True(t, shouldSkipGate(&InitFlags{}))
}
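`TestShouldSkipGate` notes that the TTY path cannot be tested reliably. One possible way to cover it, sketched here as a hypothetical refactor that is not part of this change, is to separate the decision from the environment lookups so the terminal state can be injected. The `gateInputs` type and `skipGate` function are assumptions for illustration.

```go
package main

import "fmt"

// gateInputs is a hypothetical bundle of everything shouldSkipGate reads:
// the --yes flag, the COOLIFY_NON_INTERACTIVE value, and whether stdin is a
// terminal (what the isatty checks would report).
type gateInputs struct {
	Yes            bool
	NonInteractive string
	StdinIsTTY     bool
}

// skipGate applies the same rules, in the same order, as shouldSkipGate,
// but on plain values so each branch can be unit tested.
func skipGate(in gateInputs) bool {
	switch {
	case in.Yes:
		return true
	case in.NonInteractive == "1":
		return true
	case !in.StdinIsTTY:
		return true
	default:
		return false
	}
}

func main() {
	// Interactive terminal, no overrides: the prompt is shown.
	fmt.Println(skipGate(gateInputs{StdinIsTTY: true})) // prints false

	// Piped stdin: the prompt is skipped.
	fmt.Println(skipGate(gateInputs{StdinIsTTY: false})) // prints true
}
```

With this split, the real `shouldSkipGate` would only gather the three inputs and delegate, leaving the branch logic fully testable.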
+38
@@ -0,0 +1,38 @@
package initcmd
import (
"github.com/spf13/cobra"
"github.com/coollabsio/coolify-cli/internal/wireguard"
)
// NewUpgradeCommand creates the `coolify init upgrade` subcommand: bumps
// coold/corrosion/scheduler/builder binaries across every host. Does not touch
// WG config, podman networks, firewall rules, or the corrosion schema. Rejects
// "nightly" version tags unless --allow-nightly is set.
func NewUpgradeCommand(flags *InitFlags) *cobra.Command {
cmd := &cobra.Command{
Use: "upgrade",
Short: "Bump agent binary versions (coold / corrosion / scheduler / builder) on every host",
Long: `Upgrade the agent binaries managed by coolify init across every host in
--servers. Only binary-fetch actions and their follow-up service restarts
run; WG config, podman networks, firewall rules, and the corrosion schema
are left untouched.
Pin each binary with --coold-version / --corrosion-version /
--scheduler-version. "nightly" is rejected by default because it forces a
re-install on every run; pass --allow-nightly to override.`,
RunE: func(cmd *cobra.Command, _ []string) error {
flags.Intent = string(wireguard.IntentUpgrade)
return runApply(cmd.Context(), cmd, flags, applyOptions{
SkipAlphaGate: true,
Header: "Upgrading agent binaries...",
})
},
}
cmd.Flags().BoolVar(&flags.AllowNightly, "allow-nightly", false,
"Permit --coold-version/--corrosion-version/--scheduler-version=nightly. Off by default because nightly re-installs on every run instead of only when the pinned version changes.")
return cmd
}
+4
@@ -15,7 +15,9 @@ import (
"github.com/coollabsio/coolify-cli/cmd/context"
"github.com/coollabsio/coolify-cli/cmd/database"
"github.com/coollabsio/coolify-cli/cmd/deployment"
"github.com/coollabsio/coolify-cli/cmd/firewall"
"github.com/coollabsio/coolify-cli/cmd/github"
initcmd "github.com/coollabsio/coolify-cli/cmd/init"
"github.com/coollabsio/coolify-cli/cmd/privatekeys"
"github.com/coollabsio/coolify-cli/cmd/project"
"github.com/coollabsio/coolify-cli/cmd/resources"
@@ -91,7 +93,9 @@ func init() {
rootCmd.AddCommand(context.NewContextCommand())
rootCmd.AddCommand(database.NewDatabaseCommand())
rootCmd.AddCommand(deployment.NewDeploymentCommand())
rootCmd.AddCommand(firewall.NewFirewallCommand())
rootCmd.AddCommand(github.NewGitHubCommand())
rootCmd.AddCommand(initcmd.NewInitCommand())
rootCmd.AddCommand(privatekeys.NewPrivateKeysCommand())
rootCmd.AddCommand(project.NewProjectCommand())
rootCmd.AddCommand(resources.NewResourceCommand())
+2 -1
@@ -32,7 +32,8 @@ func NewGetCommand() *cobra.Command {
return fmt.Errorf("failed to get service: %w", err)
}
formatter, err := output.NewFormatter("table", output.Options{})
format, _ := cmd.Flags().GetString("format")
formatter, err := output.NewFormatter(format, output.Options{})
if err != nil {
return fmt.Errorf("failed to create formatter: %w", err)
}
+2 -1
@@ -30,7 +30,8 @@ func NewListCommand() *cobra.Command {
return fmt.Errorf("failed to list services: %w", err)
}
formatter, err := output.NewFormatter("table", output.Options{})
format, _ := cmd.Flags().GetString("format")
formatter, err := output.NewFormatter(format, output.Options{})
if err != nil {
return fmt.Errorf("failed to create formatter: %w", err)
}
-57
@@ -1,57 +0,0 @@
#!/bin/bash
set -e
echo "🔧 Setting up Coolify CLI workspace..."
# Check if Go is installed
if ! command -v go &> /dev/null; then
echo "❌ Error: Go is not installed"
echo "Please install Go 1.24+ from https://go.dev/dl/"
exit 1
fi
# Check Go version
GO_VERSION=$(go version | awk '{print $3}' | sed 's/go//')
MAJOR_MINOR=$(echo $GO_VERSION | cut -d. -f1,2)
# Compare version (must be 1.24 or higher)
if [ $(echo "$MAJOR_MINOR" | awk -F. '{print ($1 * 100) + $2}') -lt 124 ]; then
echo "❌ Error: Go version 1.24+ is required"
echo "Current version: $GO_VERSION"
echo "Please upgrade Go from https://go.dev/dl/"
exit 1
fi
echo "✅ Go version $GO_VERSION detected"
# Download dependencies
echo "📦 Downloading dependencies..."
if ! go mod download; then
echo "❌ Error: Failed to download dependencies"
exit 1
fi
echo "✅ Dependencies downloaded"
# Install air if not already installed
if ! command -v air &> /dev/null; then
echo "📦 Installing air (Go file watcher)..."
if ! go install github.com/air-verse/air@latest; then
echo "⚠️ Warning: Failed to install air, but continuing..."
else
echo "✅ air installed successfully"
fi
else
echo "✅ air already installed"
fi
# Build the binary
echo "🔨 Building coolify binary..."
if ! go build -o coolify ./coolify; then
echo "❌ Error: Build failed"
exit 1
fi
echo "✅ Binary built successfully: ./coolify/coolify"
echo "🎉 Workspace setup complete!"
echo "🔥 Use the run script for hot reload during development"
-7
@@ -1,7 +0,0 @@
{
"scripts": {
"setup": "./conductor-setup.sh",
"run": "~/go/bin/air"
},
"runScriptMode": "nonconcurrent"
}
+7 -5
@@ -1,16 +1,20 @@
module github.com/coollabsio/coolify-cli
go 1.24.13
go 1.25.0
require (
github.com/adrg/xdg v0.5.3
github.com/creativeprojects/go-selfupdate v1.5.1
github.com/golang-jwt/jwt/v5 v5.3.1
github.com/hashicorp/go-version v1.7.0
github.com/mattn/go-isatty v0.0.20
github.com/olekukonko/tablewriter v1.1.2
github.com/spf13/cobra v1.10.1
github.com/spf13/pflag v1.0.10
github.com/spf13/viper v1.21.0
github.com/stretchr/testify v1.11.1
golang.org/x/crypto v0.50.0
golang.org/x/term v0.42.0
)
require (
@@ -33,7 +37,6 @@ require (
github.com/hashicorp/go-retryablehttp v0.7.8 // indirect
github.com/inconshreveable/mousetrap v1.1.0 // indirect
github.com/mattn/go-colorable v0.1.13 // indirect
github.com/mattn/go-isatty v0.0.20 // indirect
github.com/mattn/go-runewidth v0.0.19 // indirect
github.com/olekukonko/cat v0.0.0-20250911104152-50322a0618f6 // indirect
github.com/olekukonko/errors v1.1.0 // indirect
@@ -48,10 +51,9 @@ require (
github.com/ulikunitz/xz v0.5.15 // indirect
github.com/xanzy/go-gitlab v0.115.0 // indirect
go.yaml.in/yaml/v3 v3.0.4 // indirect
golang.org/x/crypto v0.45.0 // indirect
golang.org/x/oauth2 v0.32.0 // indirect
golang.org/x/sys v0.38.0 // indirect
golang.org/x/text v0.31.0 // indirect
golang.org/x/sys v0.43.0 // indirect
golang.org/x/text v0.36.0 // indirect
golang.org/x/time v0.14.0 // indirect
gopkg.in/check.v1 v1.0.0-20190902080502-41f04d3bba15 // indirect
gopkg.in/yaml.v3 v3.0.1 // indirect
+10 -8
@@ -31,6 +31,8 @@ github.com/go-fed/httpsig v1.1.0 h1:9M+hb0jkEICD8/cAiNqEB66R87tTINszBRTjwjQzWcI=
github.com/go-fed/httpsig v1.1.0/go.mod h1:RCMrTZvN1bJYtofsG4rd5NaO5obxQ5xBkdiS7xsT7bM=
github.com/go-viper/mapstructure/v2 v2.4.0 h1:EBsztssimR/CONLSZZ04E8qAkxNYq4Qp9LvH92wZUgs=
github.com/go-viper/mapstructure/v2 v2.4.0/go.mod h1:oJDH3BJKyqBA2TXFhDsKDGDTlndYOZ6rGS0BRZIxGhM=
github.com/golang-jwt/jwt/v5 v5.3.1 h1:kYf81DTWFe7t+1VvL7eS+jKFVWaUnK9cB1qbwn63YCY=
github.com/golang-jwt/jwt/v5 v5.3.1/go.mod h1:fxCRLWMO43lRc8nhHWY6LGqRcf+1gQWArsqaEUEa5bE=
github.com/golang/protobuf v1.3.2/go.mod h1:6lQm79b+lXiMfvg/cZm0SGofjICqVBUtrP5yJMmIC1U=
github.com/google/go-cmp v0.5.2/go.mod h1:v8dTdLbMG2kIc/vJvl+f65V22dbkXbowE6jgT/gNBxE=
github.com/google/go-cmp v0.6.0 h1:ofyhxvXcZhMsU5ulbFiLKl/XBFqE1GSq7atu8tAmTRI=
@@ -103,8 +105,8 @@ go.yaml.in/yaml/v3 v3.0.4/go.mod h1:DhzuOOF2ATzADvBadXxruRBLzYTpT36CKvDb3+aBEFg=
golang.org/x/crypto v0.0.0-20190308221718-c2843e01d9a2/go.mod h1:djNgcEr1/C05ACkg1iLfiJU5Ep61QUkGW8qpdssI0+w=
golang.org/x/crypto v0.0.0-20200622213623-75b288015ac9/go.mod h1:LzIPMQfyMNhhGPhUkYOs5KpL4U8rLKemX1yGLhDgUto=
golang.org/x/crypto v0.0.0-20210513164829-c07d793c2f9a/go.mod h1:P+XmwS30IXTQdn5tA2iutPOUgjI07+tq3H3K9MVA1s8=
golang.org/x/crypto v0.45.0 h1:jMBrvKuj23MTlT0bQEOBcAE0mjg8mK9RXFhRH6nyF3Q=
golang.org/x/crypto v0.45.0/go.mod h1:XTGrrkGJve7CYK7J8PEww4aY7gM3qMCElcJQ8n8JdX4=
golang.org/x/crypto v0.50.0 h1:zO47/JPrL6vsNkINmLoo/PH1gcxpls50DNogFvB5ZGI=
golang.org/x/crypto v0.50.0/go.mod h1:3muZ7vA7PBCE6xgPX7nkzzjiUq87kRItoJQM1Yo8S+Q=
golang.org/x/net v0.0.0-20190311183353-d8887717615a/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg=
golang.org/x/net v0.0.0-20190404232315-eb5bcb51f2a3/go.mod h1:t9HGtf8HONx5eT2rtn7q6eTqICYqUVnKs3thJo3Qplg=
golang.org/x/net v0.0.0-20210226172049-e18ecbb05110/go.mod h1:m0MpNAwzfU5UDzcl9v0D8zg8gWTRqZa9RBIspLL5mdg=
@@ -116,15 +118,15 @@ golang.org/x/sys v0.0.0-20190412213103-97732733099d/go.mod h1:h1NjWce9XRLGQEsW7w
golang.org/x/sys v0.0.0-20201119102817-f84b799fce68/go.mod h1:h1NjWce9XRLGQEsW7wpKNCjG9DtNlClVuFLEZdDNbEs=
golang.org/x/sys v0.0.0-20220811171246-fbc7d0a398ab/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.6.0/go.mod h1:oPkhp1MJrh7nUepCBck5+mAzfO9JrbApNNgaTdGDITg=
golang.org/x/sys v0.38.0 h1:3yZWxaJjBmCWXqhN1qh02AkOnCQ1poK6oF+a7xWL6Gc=
golang.org/x/sys v0.38.0/go.mod h1:OgkHotnGiDImocRcuBABYBEXf8A9a87e/uXjp9XT3ks=
golang.org/x/sys v0.43.0 h1:Rlag2XtaFTxp19wS8MXlJwTvoh8ArU6ezoyFsMyCTNI=
golang.org/x/sys v0.43.0/go.mod h1:4GL1E5IUh+htKOUEOaiffhrAeqysfVGipDYzABqnCmw=
golang.org/x/term v0.0.0-20201126162022-7de9c90e9dd1/go.mod h1:bj7SfCRtBDWHUb9snDiAeCFNEtKQo2Wmx5Cou7ajbmo=
golang.org/x/term v0.37.0 h1:8EGAD0qCmHYZg6J17DvsMy9/wJ7/D/4pV/wfnld5lTU=
golang.org/x/term v0.37.0/go.mod h1:5pB4lxRNYYVZuTLmy8oR2BH8dflOR+IbTYFD8fi3254=
golang.org/x/term v0.42.0 h1:UiKe+zDFmJobeJ5ggPwOshJIVt6/Ft0rcfrXZDLWAWY=
golang.org/x/term v0.42.0/go.mod h1:Dq/D+snpsbazcBG5+F9Q1n2rXV8Ma+71xEjTRufARgY=
golang.org/x/text v0.3.0/go.mod h1:NqM8EUOU14njkJ3fqMW+pc6Ldnwhi/IjpwHt7yyuwOQ=
golang.org/x/text v0.3.3/go.mod h1:5Zoc/QRtKVWzQhOtBMvqHzDpF6irO9z98xDceosuGiQ=
golang.org/x/text v0.31.0 h1:aC8ghyu4JhP8VojJ2lEHBnochRno1sgL6nEi9WGFGMM=
golang.org/x/text v0.31.0/go.mod h1:tKRAlv61yKIjGGHX/4tP1LTbc13YSec1pxVEWXzfoeM=
golang.org/x/text v0.36.0 h1:JfKh3XmcRPqZPKevfXVpI1wXPTqbkE5f7JA92a55Yxg=
golang.org/x/text v0.36.0/go.mod h1:NIdBknypM8iqVmPiuco0Dh6P5Jcdk8lJL0CUebqK164=
golang.org/x/time v0.14.0 h1:MRx4UaLrDotUKUdCIqzPC48t1Y9hANFKIRpNx+Te8PI=
golang.org/x/time v0.14.0/go.mod h1:eL/Oa2bBBK0TkX57Fyni+NgnyQQN4LitPmob2Hjnqw4=
golang.org/x/tools v0.0.0-20180917221912-90fa682c2a6e/go.mod h1:n7NCudcB/nEzxVGmLbDWY5pfWTLqBcC2KZ6jyYvM4mQ=
+308
@@ -0,0 +1,308 @@
package firewall
import (
"context"
"encoding/json"
"fmt"
"net"
"sort"
"strings"
"github.com/coollabsio/coolify-cli/internal/ssh"
)
// CooldAPIBasePath is the path prefix the coold REST router serves under.
// Mirrors `src/firewall/api.rs` in the coold repo.
const CooldAPIBasePath = "/api/v1/firewall"
// CooldAPITokenPath is the remote file coold reads its bearer token from.
// Kept in sync with internal/services/coold.go — the CLI falls back to
// reading this file over SSH when the user hasn't supplied --coold-token.
const CooldAPITokenPath = "/etc/coolify/api-token" //nolint:gosec // filesystem path, not a credential
// FetchCooldToken SSHes into host and reads the coold bearer token at
// CooldAPITokenPath. Each host generates its own random token at install
// time (see EnsureCooldAPITokenCommand), so per-host fetch is the default
// path when the user hasn't provided a global --coold-token override.
func FetchCooldToken(
ctx context.Context,
runner ssh.Runner,
host, user string,
sshPort int,
) (string, error) {
cmd := "cat " + CooldAPITokenPath
stdout, stderr, err := runner.Run(ctx, host, user, sshPort, cmd)
if err != nil {
return "", fmt.Errorf("fetch coold token from %s: %w (stderr: %s)",
host, err, strings.TrimSpace(stderr))
}
tok := strings.TrimSpace(stdout)
if tok == "" {
return "", fmt.Errorf("coold token on %s is empty — is coold installed? (expected at %s)",
host, CooldAPITokenPath)
}
return tok, nil
}
// cooldRulePayload mirrors the JSON shape coold's REST API expects on POST
// and returns on GET /allow. Kept aligned with coold/src/firewall/rule.rs:
// namespace is required (defaults to "default" on the wire), src/dst are
// string IPs, proto/port/id are omitted when absent.
type cooldRulePayload struct {
Namespace string `json:"namespace"`
Src string `json:"src"`
Dst string `json:"dst"`
Proto string `json:"proto,omitempty"`
Port uint16 `json:"port,omitempty"`
ID string `json:"id,omitempty"`
}
// toAllowRule converts a payload coming back from coold into the CLI's
// AllowRule. The host field is filled in by the caller (it is the mesh host
// the list came from, not part of the payload).
func (p cooldRulePayload) toAllowRule() (AllowRule, bool) {
src := net.ParseIP(p.Src)
dst := net.ParseIP(p.Dst)
if src == nil || dst == nil {
return AllowRule{}, false
}
ns := p.Namespace
if ns == "" {
ns = "default"
}
r := AllowRule{
Namespace: ns,
Src: src,
Dst: dst,
Proto: p.Proto,
Port: int(p.Port),
}
if p.ID != "" {
r.Comment = "cid:" + p.ID
}
return r, true
}
// allowRulePayload converts an AllowRule into the wire shape coold accepts.
// coold normalizes and computes the id itself, so we send only the tuple.
// Empty namespace is materialized as "default" on the wire so older coold
// builds with a default-only schema keep working.
func allowRulePayload(r AllowRule) cooldRulePayload {
ns := r.Namespace
if ns == "" {
ns = "default"
}
p := cooldRulePayload{
Namespace: ns,
Src: r.Src.String(),
Dst: r.Dst.String(),
Proto: r.Proto,
}
if r.Port > 0 {
p.Port = uint16(r.Port)
}
return p
}
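As a rough illustration of the wire shape described above, the sketch below marshals a struct carrying the same JSON tags as cooldRulePayload. The type name `rulePayload` and the helper `encode` are hypothetical and exist only for this example; the point is that `omitempty` drops an empty proto and a zero port, so a rule with neither set sends only the namespace, src, and dst fields.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// rulePayload mirrors the JSON tags of cooldRulePayload; the name is
// hypothetical and used only in this sketch.
type rulePayload struct {
	Namespace string `json:"namespace"`
	Src       string `json:"src"`
	Dst       string `json:"dst"`
	Proto     string `json:"proto,omitempty"`
	Port      uint16 `json:"port,omitempty"`
	ID        string `json:"id,omitempty"`
}

// encode marshals p and returns the JSON text (hypothetical helper).
func encode(p rulePayload) string {
	b, err := json.Marshal(p)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// No proto/port: both keys are omitted from the body.
	fmt.Println(encode(rulePayload{Namespace: "default", Src: "10.0.0.1", Dst: "10.0.0.2"}))
	// With proto/port: both keys appear, id is still omitted.
	fmt.Println(encode(rulePayload{Namespace: "default", Src: "10.0.0.1", Dst: "10.0.0.2", Proto: "tcp", Port: 80}))
}
```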
// CooldApply POSTs r to coold's /allow endpoint on host. coold is reached
// via an SSH bounce: the CLI SSHes into host and runs curl there against
// the host's own wg0 management IP. This is the transport of choice for
// the alpha because the CLI runs on a laptop that isn't a mesh peer; only
// hosts inside the wg0 network can reach coold.
func CooldApply(
ctx context.Context,
runner ssh.Runner,
host, user string,
sshPort, cooldPort int,
iface, token string,
r AllowRule,
) error {
body, err := json.Marshal(allowRulePayload(r))
if err != nil {
return fmt.Errorf("marshal allow rule: %w", err)
}
cmd := buildCurlAllow(iface, token, cooldPort, string(body))
if _, stderr, err := runner.Run(ctx, host, user, sshPort, cmd); err != nil {
return fmt.Errorf("coold apply on %s: %w (stderr: %s)",
host, err, strings.TrimSpace(stderr))
}
return nil
}
// CooldRevoke DELETEs rule id from coold on host. coold returns 204 even
// when the id is unknown, so missing rules are a silent no-op.
func CooldRevoke(
ctx context.Context,
runner ssh.Runner,
host, user string,
sshPort, cooldPort int,
iface, token, id string,
) error {
if id == "" {
return fmt.Errorf("coold revoke: empty id")
}
cmd := buildCurlRevoke(iface, token, cooldPort, id)
if _, stderr, err := runner.Run(ctx, host, user, sshPort, cmd); err != nil {
return fmt.Errorf("coold revoke on %s: %w (stderr: %s)",
host, err, strings.TrimSpace(stderr))
}
return nil
}
// CooldList GETs coold's /allow endpoint on host and returns the parsed
// rules. An empty namespace means "all namespaces"; a non-empty value is
// forwarded to coold as `?namespace=<ns>`. Missing coold (no wg0 interface)
// is treated as an empty slice so a partially-deployed mesh doesn't break
// `firewall list`.
func CooldList(
ctx context.Context,
runner ssh.Runner,
host, user string,
sshPort, cooldPort int,
iface, token, namespace string,
) ([]AllowRule, error) {
cmd := buildCurlList(iface, token, cooldPort, namespace)
stdout, stderr, err := runner.Run(ctx, host, user, sshPort, cmd)
if err != nil {
return nil, fmt.Errorf("coold list on %s: %w (stderr: %s)",
host, err, strings.TrimSpace(stderr))
}
stdout = strings.TrimSpace(stdout)
if stdout == "" {
return nil, nil
}
var payloads []cooldRulePayload
if err := json.Unmarshal([]byte(stdout), &payloads); err != nil {
return nil, fmt.Errorf("parse coold list on %s: %w (body: %s)",
host, err, stdout)
}
out := make([]AllowRule, 0, len(payloads))
for _, p := range payloads {
r, ok := p.toAllowRule()
if !ok {
continue
}
r.Host = host
out = append(out, r)
}
return out, nil
}
// CooldListAll fans CooldList across every host in parallel and returns a
// stably-sorted flattened slice plus the per-host results. tokenFor is
// called once per host on its worker goroutine; if it fails, the host
// surfaces as a ServerResult.Err instead of polluting the rule slice. An
// empty namespace omits the `?namespace=` query, so coold returns the rules
// for every namespace.
func CooldListAll(
ctx context.Context,
runner ssh.Runner,
hosts []string,
user string,
sshPort, cooldPort int,
iface string,
tokenFor func(host string) (string, error),
concurrency int,
namespace string,
) ([]AllowRule, []ssh.ServerResult[[]AllowRule]) {
results := ssh.ForEachServer(ctx, hosts, concurrency,
func(ctx context.Context, host string) ([]AllowRule, error) {
token, err := tokenFor(host)
if err != nil {
return nil, err
}
return CooldList(ctx, runner, host, user, sshPort, cooldPort, iface, token, namespace)
})
var all []AllowRule
for _, r := range results {
all = append(all, r.Result...)
}
sort.Slice(all, func(i, j int) bool {
if all[i].Host != all[j].Host {
return all[i].Host < all[j].Host
}
if all[i].Namespace != all[j].Namespace {
return all[i].Namespace < all[j].Namespace
}
si, sj := all[i].Src.String(), all[j].Src.String()
if si != sj {
return si < sj
}
di, dj := all[i].Dst.String(), all[j].Dst.String()
if di != dj {
return di < dj
}
return all[i].Port < all[j].Port
})
return all, results
}
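The fan-out pattern CooldListAll relies on can be sketched as follows. The diff does not show `ssh.ForEachServer` or `ssh.ServerResult`, so `forEachHost` and `hostResult` below are hypothetical stand-ins, written under the assumption that the real helper bounds concurrency and returns one result per host. The sketch shows how a failing per-host token lookup ends up as an error on that host's result rather than aborting the whole run.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"sync"
)

// hostResult is a hypothetical stand-in for ssh.ServerResult.
type hostResult[T any] struct {
	Host   string
	Result T
	Err    error
}

// forEachHost is a hypothetical stand-in for ssh.ForEachServer: it runs fn
// once per host with at most `concurrency` calls in flight and returns the
// results in input order.
func forEachHost[T any](
	ctx context.Context,
	hosts []string,
	concurrency int,
	fn func(ctx context.Context, host string) (T, error),
) []hostResult[T] {
	if concurrency < 1 {
		concurrency = 1
	}
	out := make([]hostResult[T], len(hosts))
	sem := make(chan struct{}, concurrency) // counting semaphore
	var wg sync.WaitGroup
	for i, h := range hosts {
		wg.Add(1)
		go func(i int, h string) {
			defer wg.Done()
			sem <- struct{}{}        // acquire a slot
			defer func() { <-sem }() // release it on return
			v, err := fn(ctx, h)
			// Each goroutine writes only its own index, so no lock is needed.
			out[i] = hostResult[T]{Host: h, Result: v, Err: err}
		}(i, h)
	}
	wg.Wait()
	return out
}

// countOutcomes fans out over two fake hosts, one of which fails its token
// lookup, and reports how many hosts succeeded and how many failed.
func countOutcomes() (ok, failed int) {
	tokenFor := func(h string) (string, error) {
		if h == "hBad" {
			return "", errors.New("no token")
		}
		return "t", nil
	}
	results := forEachHost(context.Background(), []string{"hOk", "hBad"}, 2,
		func(_ context.Context, h string) ([]string, error) {
			if _, err := tokenFor(h); err != nil {
				return nil, err
			}
			return []string{"rule-on-" + h}, nil
		})
	for _, r := range results {
		if r.Err != nil {
			failed++
		} else {
			ok++
		}
	}
	return ok, failed
}

func main() {
	ok, failed := countOutcomes()
	fmt.Printf("ok=%d failed=%d\n", ok, failed)
}
```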
// shellSingleQuote wraps s in POSIX-shell single quotes, escaping any
// embedded single quotes. Used to embed JSON bodies and tokens into shell
// commands without breaking quoting.
func shellSingleQuote(s string) string {
return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}
// DefaultWGInterface is the WireGuard interface name the firewall CLI
// assumes when no override is supplied. Matches the default of
// `coolify init --wg-interface`.
const DefaultWGInterface = "wg0"
// mgmtIPScript discovers coold's bind IP on the remote host by reading the
// first IPv4 address on the host's WireGuard interface. Emitted as part of
// every curl command so the CLI doesn't need to track per-host mgmt IPs
// (they are already encoded in the host's own WG interface).
func mgmtIPScript(iface string) string {
return fmt.Sprintf(
`MGMT=$(ip -4 -o addr show %[1]s 2>/dev/null | awk '{print $4}' | cut -d/ -f1 | head -n1); `+
`test -n "$MGMT" || { echo "coold mgmt IP (%[1]s) not found on $(hostname) — is coold installed?" >&2; exit 1; }; `,
iface)
}
// mgmtIPScriptSoft is the same as mgmtIPScript but treats a missing WG
// interface as "no rules" rather than a failure. Used by list so a host
// without coold is simply absent from the output instead of aborting the
// whole fanout.
func mgmtIPScriptSoft(iface string) string {
return fmt.Sprintf(
`MGMT=$(ip -4 -o addr show %s 2>/dev/null | awk '{print $4}' | cut -d/ -f1 | head -n1); `+
`if [ -z "$MGMT" ]; then echo '[]'; exit 0; fi; `,
iface)
}
// buildCurlAllow returns the shell one-liner that POSTs body to coold.
// Token is embedded inline in the -H header; on the remote it is briefly
// visible in /proc/<curl-pid>/cmdline for the ~ms lifetime of the curl
// invocation (readable by any local user unless procfs is mounted with
// hidepid). Acceptable for alpha; TLS + stdin-fed tokens are a follow-up.
func buildCurlAllow(iface, token string, port int, body string) string {
return mgmtIPScript(iface) +
`curl -fsS --max-time 10 ` +
`-H ` + shellSingleQuote("Authorization: Bearer "+token) + ` ` +
`-H 'Content-Type: application/json' ` +
`-X POST -d ` + shellSingleQuote(body) + ` ` +
fmt.Sprintf(`"http://$MGMT:%d%s/allow"`, port, CooldAPIBasePath)
}
// buildCurlRevoke returns the shell one-liner that DELETEs rule id.
func buildCurlRevoke(iface, token string, port int, id string) string {
return mgmtIPScript(iface) +
`curl -fsS --max-time 10 -o /dev/null ` +
`-H ` + shellSingleQuote("Authorization: Bearer "+token) + ` ` +
`-X DELETE ` +
fmt.Sprintf(`"http://$MGMT:%d%s/allow/%s"`, port, CooldAPIBasePath, id)
}
// buildCurlList returns the shell one-liner that GETs /allow. A missing
// WG interface returns an empty JSON array so the caller sees "no rules"
// instead of a transport error. A non-empty namespace is forwarded as
// ?namespace=<ns>.
func buildCurlList(iface, token string, port int, namespace string) string {
query := ""
if namespace != "" {
query = "?namespace=" + namespace
}
return mgmtIPScriptSoft(iface) +
`curl -fsS --max-time 10 ` +
`-H ` + shellSingleQuote("Authorization: Bearer "+token) + ` ` +
fmt.Sprintf(`"http://$MGMT:%d%s/allow%s"`, port, CooldAPIBasePath, query)
}
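buildCurlList places the namespace inside a double-quoted shell string without escaping it. Whether namespaces are validated before they reach this function is not visible in this diff, so the following is only a sketch of one possible hardening: percent-encode the value with `net/url` before building the query. The helper name `listQuery` is hypothetical. `url.QueryEscape` leaves only letters, digits, `-`, `_`, `.`, `~`, `%`, and `+` in its output, none of which are special inside shell double quotes.

```go
package main

import (
	"fmt"
	"net/url"
)

// listQuery is a hypothetical helper: it returns the query-string suffix
// for the /allow list endpoint, percent-encoding the namespace so it is
// safe both as a URL component and inside a double-quoted shell string.
func listQuery(namespace string) string {
	if namespace == "" {
		return "" // empty namespace: no filter, same as the original
	}
	return "?namespace=" + url.QueryEscape(namespace)
}

func main() {
	fmt.Println(listQuery("alpha"))    // plain names pass through unchanged
	fmt.Println(listQuery("a b$(id)")) // shell metacharacters are encoded
}
```

A plain name such as `alpha` is unchanged by the encoding, so the existing TestBuildCurlList_WithNamespace expectation would still hold under this variant.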
+200
@@ -0,0 +1,200 @@
package firewall
import (
"context"
"net"
"strings"
"sync"
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"github.com/coollabsio/coolify-cli/internal/ssh"
)
// fakeCooldRunner is a minimal Runner for client-level tests. It captures
// every command and replies based on substring-matched canned responses.
// mu guards calls against concurrent appends from ForEachServer's parallel
// goroutines.
type fakeCooldRunner struct {
mu sync.Mutex
responses map[string]string
calls []string
}
func (f *fakeCooldRunner) Run(_ context.Context, _, _ string, _ int, cmd string) (string, string, error) {
f.mu.Lock()
f.calls = append(f.calls, cmd)
f.mu.Unlock()
for sub, resp := range f.responses {
if strings.Contains(cmd, sub) {
return resp, "", nil
}
}
return "", "", nil
}
var _ ssh.Runner = (*fakeCooldRunner)(nil)
func TestShellSingleQuote_Escapes(t *testing.T) {
assert.Equal(t, `'plain'`, shellSingleQuote("plain"))
assert.Equal(t, `'it'\''s'`, shellSingleQuote("it's"))
}
func TestBuildCurlAllow_Shape(t *testing.T) {
cmd := buildCurlAllow("wg0", "tok-xyz", 8443, `{"src":"10.0.0.1","dst":"10.0.0.2"}`)
assert.Contains(t, cmd, "ip -4 -o addr show wg0")
assert.Contains(t, cmd, "curl -fsS")
assert.Contains(t, cmd, "Authorization: Bearer tok-xyz")
assert.Contains(t, cmd, "Content-Type: application/json")
assert.Contains(t, cmd, "-X POST")
assert.Contains(t, cmd, `{"src":"10.0.0.1","dst":"10.0.0.2"}`)
assert.Contains(t, cmd, `:8443/api/v1/firewall/allow`)
}
func TestBuildCurlRevoke_Shape(t *testing.T) {
cmd := buildCurlRevoke("wg0", "tok-xyz", 8443, "abc123def456")
assert.Contains(t, cmd, "curl -fsS")
assert.Contains(t, cmd, "-X DELETE")
assert.Contains(t, cmd, "Authorization: Bearer tok-xyz")
assert.Contains(t, cmd, `:8443/api/v1/firewall/allow/abc123def456`)
}
func TestBuildCurlList_SoftMgmtIP(t *testing.T) {
cmd := buildCurlList("wg0", "tok-xyz", 8443, "")
// Missing wg0 yields an empty array and success exit.
assert.Contains(t, cmd, `echo '[]'; exit 0`)
assert.Contains(t, cmd, "Authorization: Bearer tok-xyz")
assert.Contains(t, cmd, `:8443/api/v1/firewall/allow`)
// Empty namespace → no query string.
assert.NotContains(t, cmd, "namespace=")
}
// TestBuildCurlList_WithNamespace verifies that a non-empty namespace is
// forwarded as ?namespace=<ns> so coold can filter on its side.
func TestBuildCurlList_WithNamespace(t *testing.T) {
cmd := buildCurlList("wg0", "tok-xyz", 8443, "alpha")
assert.Contains(t, cmd, `:8443/api/v1/firewall/allow?namespace=alpha`)
}
func TestCooldApply_SendsJSONPayload(t *testing.T) {
fr := &fakeCooldRunner{}
r := AllowRule{
Src: net.ParseIP("10.0.0.1"), Dst: net.ParseIP("10.0.0.2"),
Proto: "tcp", Port: 80,
}
err := CooldApply(context.Background(), fr, "h1", "root", 22, 8443, "wg0", "t", r)
require.NoError(t, err)
require.Len(t, fr.calls, 1)
assert.Contains(t, fr.calls[0], `"src":"10.0.0.1"`)
assert.Contains(t, fr.calls[0], `"dst":"10.0.0.2"`)
assert.Contains(t, fr.calls[0], `"proto":"tcp"`)
assert.Contains(t, fr.calls[0], `"port":80`)
}
func TestCooldApply_OmitsProtoWhenEmpty(t *testing.T) {
fr := &fakeCooldRunner{}
r := AllowRule{
Src: net.ParseIP("10.0.0.1"), Dst: net.ParseIP("10.0.0.2"),
}
err := CooldApply(context.Background(), fr, "h1", "root", 22, 8443, "wg0", "t", r)
require.NoError(t, err)
// omitempty drops zero port and empty proto — avoids tripping coold's
// "port requires proto" validation.
assert.NotContains(t, fr.calls[0], `"proto"`)
assert.NotContains(t, fr.calls[0], `"port"`)
}
func TestCooldRevoke_RejectsEmptyID(t *testing.T) {
fr := &fakeCooldRunner{}
err := CooldRevoke(context.Background(), fr, "h1", "root", 22, 8443, "wg0", "t", "")
require.Error(t, err)
assert.Empty(t, fr.calls, "no SSH call for empty id")
}
func TestCooldList_ParsesJSON(t *testing.T) {
fr := &fakeCooldRunner{responses: map[string]string{
"/api/v1/firewall/allow": `[
{"src":"10.0.0.1","dst":"10.0.0.2","proto":"tcp","port":80,"id":"abc123def456"},
{"src":"10.0.0.3","dst":"10.0.0.4"}
]`,
}}
rules, err := CooldList(context.Background(), fr, "h1", "root", 22, 8443, "wg0", "t", "")
require.NoError(t, err)
require.Len(t, rules, 2)
assert.Equal(t, "h1", rules[0].Host)
assert.Equal(t, "cid:abc123def456", rules[0].Comment)
assert.Equal(t, "tcp", rules[0].Proto)
assert.Equal(t, 80, rules[0].Port)
// Rule without proto/port/id comes through with zero values, no cid.
assert.Empty(t, rules[1].Proto)
assert.Equal(t, 0, rules[1].Port)
assert.Empty(t, rules[1].Comment)
}
func TestCooldList_EmptyBody(t *testing.T) {
fr := &fakeCooldRunner{}
rules, err := CooldList(context.Background(), fr, "h1", "root", 22, 8443, "wg0", "t", "")
require.NoError(t, err)
assert.Empty(t, rules)
}
func TestCooldListAll_SortsByHost(t *testing.T) {
// Fake returns the same JSON regardless of host; the sort guarantees the
// fanout output is stable across runs.
fr := &fakeCooldRunner{responses: map[string]string{
"/api/v1/firewall/allow": `[{"src":"10.0.0.1","dst":"10.0.0.2","proto":"tcp","port":80,"id":"aaa111111111"}]`,
}}
tokenFor := func(string) (string, error) { return "t", nil }
rules, results := CooldListAll(context.Background(), fr,
[]string{"hB", "hA"}, "root", 22, 8443, "wg0", tokenFor, 2, "")
require.Len(t, rules, 2)
assert.Equal(t, "hA", rules[0].Host)
assert.Equal(t, "hB", rules[1].Host)
assert.Len(t, results, 2)
}
func TestFetchCooldToken_ReadsFile(t *testing.T) {
fr := &fakeCooldRunner{responses: map[string]string{
"/etc/coolify/api-token": "deadbeefcafe\n",
}}
tok, err := FetchCooldToken(context.Background(), fr, "h1", "root", 22)
require.NoError(t, err)
assert.Equal(t, "deadbeefcafe", tok)
}
func TestFetchCooldToken_EmptyErrors(t *testing.T) {
fr := &fakeCooldRunner{}
_, err := FetchCooldToken(context.Background(), fr, "h1", "root", 22)
require.Error(t, err)
assert.Contains(t, err.Error(), "is empty")
}
func TestCooldListAll_PropagatesTokenFetchError(t *testing.T) {
fr := &fakeCooldRunner{responses: map[string]string{
"/api/v1/firewall/allow": `[]`,
}}
tokenFor := func(h string) (string, error) {
if h == "hBad" {
return "", assertError("no token")
}
return "t", nil
}
_, results := CooldListAll(context.Background(), fr,
[]string{"hOk", "hBad"}, "root", 22, 8443, "wg0", tokenFor, 2, "")
var okCount, errCount int
for _, r := range results {
if r.Err != nil {
errCount++
} else {
okCount++
}
}
assert.Equal(t, 1, okCount)
assert.Equal(t, 1, errCount)
}
type assertError string
func (e assertError) Error() string { return string(e) }
+173
@@ -0,0 +1,173 @@
package firewall
import (
"context"
"fmt"
"net"
"sort"
"strings"
"github.com/coollabsio/coolify-cli/internal/ssh"
)
// Container is a single running podman container on one mesh host and one
// namespace (podman bridge network).
type Container struct {
Host string // SSH host the container runs on
Namespace string // mesh namespace (podman network is coolify-<ns>-mesh)
ID string // short (12-char) podman ID
Name string // podman container name
IP net.IP // IP on the coolify-<ns>-mesh bridge network
}
// discoverScript prints one `id|name|ip` line per running container on the
// target network. Each ID is passed to `podman inspect` to resolve the
// per-network IP because `podman ps` doesn't surface that directly. `|| true`
// keeps the script from erroring when podman is absent or the network has no
// members.
func discoverScript(networkName string) string {
return fmt.Sprintf(
`podman ps --filter network=%[1]s --format '{{.ID}}|{{.Names}}' 2>/dev/null | `+
`while IFS='|' read id name; do `+
` [ -z "$id" ] && continue; `+
` ip=$(podman inspect --format '{{(index .NetworkSettings.Networks %[2]q).IPAddress}}' "$id" 2>/dev/null); `+
` printf '%%s|%%s|%%s\n' "$id" "$name" "$ip"; `+
`done || true`,
networkName, networkName)
}
// ParseDiscoverLine parses one `id|name|ip` line from discoverScript.
// Returns (_, false) when the line is blank or malformed.
func ParseDiscoverLine(line string) (id, name string, ip net.IP, ok bool) {
parts := strings.SplitN(strings.TrimSpace(line), "|", 3)
if len(parts) != 3 {
return "", "", nil, false
}
if parts[0] == "" || parts[1] == "" || parts[2] == "" {
return "", "", nil, false
}
ip = net.ParseIP(parts[2])
if ip == nil {
return "", "", nil, false
}
id = parts[0]
if len(id) > 12 {
id = id[:12]
}
return id, parts[1], ip, true
}
// DiscoverContainers SSHes into host and returns every container on
// networkName (the podman bridge backing namespace) with its bridge IP.
func DiscoverContainers(
ctx context.Context,
runner ssh.Runner,
host, user string,
port int,
namespace, networkName string,
) ([]Container, error) {
stdout, _, err := runner.Run(ctx, host, user, port, discoverScript(networkName))
if err != nil {
return nil, fmt.Errorf("discover containers on %s: %w", host, err)
}
var out []Container
for _, line := range strings.Split(stdout, "\n") {
id, name, ip, ok := ParseDiscoverLine(line)
if !ok {
continue
}
out = append(out, Container{
Host: host, Namespace: namespace,
ID: id, Name: name, IP: ip,
})
}
sort.Slice(out, func(i, j int) bool {
if out[i].Host != out[j].Host {
return out[i].Host < out[j].Host
}
if out[i].Namespace != out[j].Namespace {
return out[i].Namespace < out[j].Namespace
}
return out[i].Name < out[j].Name
})
return out, nil
}
// DiscoverAll runs DiscoverContainers across every host in parallel.
// Returns a flattened, sort-stable slice plus the per-host results so
// callers can surface partial failures.
func DiscoverAll(
ctx context.Context,
runner ssh.Runner,
hosts []string,
user string,
port int,
namespace, networkName string,
concurrency int,
) ([]Container, []ssh.ServerResult[[]Container]) {
results := ssh.ForEachServer(ctx, hosts, concurrency,
func(ctx context.Context, host string) ([]Container, error) {
return DiscoverContainers(ctx, runner, host, user, port, namespace, networkName)
})
var all []Container
for _, r := range results {
all = append(all, r.Result...)
}
sort.Slice(all, func(i, j int) bool {
if all[i].Host != all[j].Host {
return all[i].Host < all[j].Host
}
if all[i].Namespace != all[j].Namespace {
return all[i].Namespace < all[j].Namespace
}
return all[i].Name < all[j].Name
})
return all, results
}
// DiscoverAllNamespaces runs DiscoverAll for every (namespace, network) pair
// and merges the results. Used by `containers --all-namespaces` and by the
// allow/revoke resolver so references can be matched across every namespace
// the user might have set up on the mesh.
func DiscoverAllNamespaces(
ctx context.Context,
runner ssh.Runner,
hosts []string,
user string,
port int,
namespaces []string,
networkFor func(ns string) string,
concurrency int,
) ([]Container, []ssh.ServerResult[[]Container]) {
var (
all []Container
allResults []ssh.ServerResult[[]Container]
seenHosts = map[string]struct{}{}
)
for _, ns := range namespaces {
nsContainers, results := DiscoverAll(ctx, runner, hosts, user, port,
ns, networkFor(ns), concurrency)
all = append(all, nsContainers...)
for _, r := range results {
// Keep only the first error per host to avoid N-duplicate warnings
// (most errors — SSH failures — are host-level, not per-namespace).
if r.Err == nil {
continue
}
if _, ok := seenHosts[r.Host]; ok {
continue
}
seenHosts[r.Host] = struct{}{}
allResults = append(allResults, r)
}
}
sort.Slice(all, func(i, j int) bool {
if all[i].Host != all[j].Host {
return all[i].Host < all[j].Host
}
if all[i].Namespace != all[j].Namespace {
return all[i].Namespace < all[j].Namespace
}
return all[i].Name < all[j].Name
})
return all, allResults
}
+129
@@ -0,0 +1,129 @@
package firewall
import (
"context"
"strings"
"sync"
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestParseDiscoverLine(t *testing.T) {
tests := []struct {
line string
wantOk bool
wantID string
wantNm string
wantIP string
}{
{"abcdef123456|web|10.210.0.10", true, "abcdef123456", "web", "10.210.0.10"},
{"abcdef1234567890|web|10.210.0.10", true, "abcdef123456", "web", "10.210.0.10"},
{"|name|10.0.0.1", false, "", "", ""},
{"id|name|", false, "", "", ""},
{"id|name|not-an-ip", false, "", "", ""},
{"", false, "", "", ""},
{"a|b", false, "", "", ""},
}
for _, tt := range tests {
t.Run(tt.line, func(t *testing.T) {
id, name, ip, ok := ParseDiscoverLine(tt.line)
assert.Equal(t, tt.wantOk, ok)
if !ok {
return
}
assert.Equal(t, tt.wantID, id)
assert.Equal(t, tt.wantNm, name)
assert.Equal(t, tt.wantIP, ip.String())
})
}
}
// fakeRunner is a deterministic ssh.Runner for firewall tests. Responses
// map a command substring to its canned stdout. mu guards calls against
// concurrent appends from ForEachServer's parallel goroutines.
type fakeRunner struct {
mu sync.Mutex
responses map[string]string
calls []string
}
func (f *fakeRunner) Run(_ context.Context, _, _ string, _ int, cmd string) (string, string, error) {
f.mu.Lock()
f.calls = append(f.calls, cmd)
f.mu.Unlock()
for sub, resp := range f.responses {
if strings.Contains(cmd, sub) {
return resp, "", nil
}
}
return "", "", nil
}
func TestDiscoverContainers(t *testing.T) {
r := &fakeRunner{responses: map[string]string{
"podman ps": "abc111111111|web|10.210.0.10\ndef222222222|api|10.210.0.11\n\n",
}}
got, err := DiscoverContainers(context.Background(), r, "h1", "root", 22,
"default", "coolify-default-mesh")
require.NoError(t, err)
require.Len(t, got, 2)
assert.Equal(t, "api", got[0].Name) // sorted by name
assert.Equal(t, "web", got[1].Name)
assert.Equal(t, "h1", got[0].Host)
assert.Equal(t, "default", got[0].Namespace)
assert.Equal(t, "10.210.0.11", got[0].IP.String())
}
func TestDiscoverContainers_EmptyOutput(t *testing.T) {
r := &fakeRunner{responses: map[string]string{}}
got, err := DiscoverContainers(context.Background(), r, "h1", "root", 22,
"default", "coolify-default-mesh")
require.NoError(t, err)
assert.Empty(t, got)
}
func TestDiscoverContainers_BadLinesSkipped(t *testing.T) {
r := &fakeRunner{responses: map[string]string{
"podman ps": "abc111111111|web|10.210.0.10\ngarbage\n|noid|1.1.1.1\n",
}}
got, err := DiscoverContainers(context.Background(), r, "h1", "root", 22,
"default", "coolify-default-mesh")
require.NoError(t, err)
require.Len(t, got, 1)
assert.Equal(t, "web", got[0].Name)
}
func TestDiscoverAll_Sorted(t *testing.T) {
r := &fakeRunner{responses: map[string]string{
"podman ps": "aaa111111111|x|10.210.0.10",
}}
all, perHost := DiscoverAll(context.Background(), r,
[]string{"h2", "h1"}, "root", 22,
"default", "coolify-default-mesh", 2)
require.Len(t, all, 2)
assert.Equal(t, "h1", all[0].Host)
assert.Equal(t, "h2", all[1].Host)
assert.Equal(t, "default", all[0].Namespace)
assert.Len(t, perHost, 2)
}
// TestDiscoverAllNamespaces_MergesAcrossNamespaces verifies that the
// multi-namespace discover fanout emits containers for every (ns, host)
// pair and stamps them with the correct namespace.
func TestDiscoverAllNamespaces_MergesAcrossNamespaces(t *testing.T) {
r := &fakeRunner{responses: map[string]string{
// Same podman ps response for every namespace — we only care that the
// namespace label is applied correctly after parsing.
"podman ps": "aaa111111111|web|10.210.0.10",
}}
networkFor := func(ns string) string { return "coolify-" + ns + "-mesh" }
all, _ := DiscoverAllNamespaces(context.Background(), r,
[]string{"h1"}, "root", 22,
[]string{"default", "alpha"}, networkFor, 2)
require.Len(t, all, 2)
// Sorted by host, then namespace — alpha before default.
assert.Equal(t, "alpha", all[0].Namespace)
assert.Equal(t, "default", all[1].Namespace)
}
+56
@@ -0,0 +1,56 @@
// Package firewall implements the `coolify firewall` command logic: per-host
// container discovery (SSH+podman) and the SSH-bounced REST client that
// drives the coold agent's firewall surface on each mesh host.
//
// Rule-rendering and iptables IO live entirely in coold now (see the coold
// repo, `src/firewall/`). The CLI's job is to resolve endpoints, compute
// stable rule identities, and POST/DELETE/GET against coold over SSH. Rules
// go on the host that owns the destination IP, matching CONTROL_PLANE.md §3.
package firewall
import (
"crypto/sha256"
"encoding/hex"
"fmt"
"net"
"strings"
)
// AllowRule is a single cross-host container allow entry.
//
// The rule lives on the host that owns Dst's container subnet (the default-
// deny jump fires on `-d <subnet> -j COOLIFY-INTRA`). Src may belong to any
// host in the mesh. Proto/Port are optional; zero values mean "any".
//
// Namespace qualifies the tuple so identical src/dst/proto/port pairs in
// different namespaces produce different rule IDs and are managed
// independently. Empty namespace is normalized to "default" at the transport
// boundary for legacy coold peers.
type AllowRule struct {
Host string // host that owns Dst's container subnet
Namespace string // e.g. "default", "alpha"
Src net.IP
Dst net.IP
Proto string // "tcp" | "udp" | ""
Port int // 0 = any
Comment string // "cid:<12-hex>" stable identity for list/revoke
}
// ComputeID returns a 12-hex stable identity hash over
// (namespace, src, dst, proto, port). Used as the rule comment so `list` can
// display it and `revoke --from ... --to ... --port ...` finds the right rule
// without needing to parse.
//
// Byte-compatible with coold's ComputeID_ (src/firewall/rule.rs): namespace
// defaults to "default" when empty, proto lowercased (empty when unset), port
// rendered as 0 when unset. Mixed writers (CLI + coold) produce identical IDs
// for identical tuples.
func ComputeID(namespace string, src, dst net.IP, proto string, port int) string {
if namespace == "" {
namespace = "default"
}
h := sha256.New()
fmt.Fprintf(h, "%s|%s|%s|%s|%d",
namespace, src.String(), dst.String(), strings.ToLower(proto), port)
return hex.EncodeToString(h.Sum(nil))[:12]
}
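A minimal sketch of the identity scheme above, under stated assumptions: `computeID` is a local copy of the hashing logic shown in this file (not an import of the real package), and `idFor` is a hypothetical wrapper that accepts textual IPs. It illustrates the two normalizations the doc comment describes, namely that an empty namespace hashes like "default" and that the protocol is lowercased before hashing.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"net"
	"strings"
)

// computeID is a local copy of the hashing scheme shown above, kept here so
// the example compiles on its own.
func computeID(namespace string, src, dst net.IP, proto string, port int) string {
	if namespace == "" {
		namespace = "default"
	}
	h := sha256.New()
	fmt.Fprintf(h, "%s|%s|%s|%s|%d",
		namespace, src.String(), dst.String(), strings.ToLower(proto), port)
	return hex.EncodeToString(h.Sum(nil))[:12]
}

// idFor is a hypothetical convenience wrapper (not part of the source) that
// parses textual IPs before hashing.
func idFor(namespace, src, dst, proto string, port int) string {
	return computeID(namespace, net.ParseIP(src), net.ParseIP(dst), proto, port)
}

func main() {
	legacy := idFor("", "10.210.0.10", "10.210.1.10", "TCP", 80)
	current := idFor("default", "10.210.0.10", "10.210.1.10", "tcp", 80)
	fmt.Println(len(legacy))       // 12
	fmt.Println(legacy == current) // true
}
```

Because the ID is a 48-bit prefix of the digest, it is suited to naming rules on a host rather than to any security purpose.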
+45
@@ -0,0 +1,45 @@
package firewall
import (
"net"
"testing"
"github.com/stretchr/testify/assert"
)
func TestComputeID_Stable(t *testing.T) {
a := ComputeID("default", net.ParseIP("10.210.0.10"), net.ParseIP("10.210.1.10"), "tcp", 80)
b := ComputeID("default", net.ParseIP("10.210.0.10"), net.ParseIP("10.210.1.10"), "tcp", 80)
assert.Equal(t, a, b)
assert.Len(t, a, 12)
}
func TestComputeID_CaseInsensitiveProto(t *testing.T) {
a := ComputeID("default", net.ParseIP("1.1.1.1"), net.ParseIP("2.2.2.2"), "TCP", 80)
b := ComputeID("default", net.ParseIP("1.1.1.1"), net.ParseIP("2.2.2.2"), "tcp", 80)
assert.Equal(t, a, b)
}
func TestComputeID_DifferentInputsDifferent(t *testing.T) {
a := ComputeID("default", net.ParseIP("1.1.1.1"), net.ParseIP("2.2.2.2"), "tcp", 80)
b := ComputeID("default", net.ParseIP("1.1.1.1"), net.ParseIP("2.2.2.2"), "tcp", 443)
assert.NotEqual(t, a, b)
}
// TestComputeID_DifferentNamespacesDifferent verifies that identical
// src/dst/proto/port tuples in different namespaces produce different IDs —
// this is the whole point of per-namespace rule identity.
func TestComputeID_DifferentNamespacesDifferent(t *testing.T) {
a := ComputeID("default", net.ParseIP("10.0.0.1"), net.ParseIP("10.0.0.2"), "tcp", 80)
b := ComputeID("alpha", net.ParseIP("10.0.0.1"), net.ParseIP("10.0.0.2"), "tcp", 80)
assert.NotEqual(t, a, b)
}
// TestComputeID_EmptyNamespaceMatchesDefault guards the wire-compat rule:
// an empty namespace must hash the same as "default" so older coold builds
// and newer CLI callers agree on the same ID.
func TestComputeID_EmptyNamespaceMatchesDefault(t *testing.T) {
empty := ComputeID("", net.ParseIP("10.0.0.1"), net.ParseIP("10.0.0.2"), "tcp", 80)
def := ComputeID("default", net.ParseIP("10.0.0.1"), net.ParseIP("10.0.0.2"), "tcp", 80)
assert.Equal(t, empty, def)
}
+39
@@ -0,0 +1,39 @@
package models
// ContainerRow is a table-friendly row for `coolify firewall containers`.
type ContainerRow struct {
Host string `json:"host"`
Namespace string `json:"namespace"`
ID string `json:"id"`
Name string `json:"name"`
IP string `json:"ip"`
}
// AllowRuleRow is a table-friendly row for `coolify firewall list`.
type AllowRuleRow struct {
Host string `json:"host"`
Namespace string `json:"namespace"`
ID string `json:"id"`
Src string `json:"src"`
Dst string `json:"dst"`
Proto string `json:"proto,omitempty"`
Port int `json:"port,omitempty"`
Comment string `json:"comment,omitempty"`
}
// FirewallContainersOutput is the JSON output for `firewall containers`.
type FirewallContainersOutput struct {
Containers []ContainerRow `json:"containers"`
Errors []string `json:"errors,omitempty"`
}
// FirewallListOutput is the JSON output for `firewall list`.
type FirewallListOutput struct {
Rules []AllowRuleRow `json:"rules"`
Errors []string `json:"errors,omitempty"`
}
// FirewallAllowOutput is the JSON output for `firewall allow` / `revoke`.
type FirewallAllowOutput struct {
Rules []AllowRuleRow `json:"rules"`
}
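The `omitempty` tags on `Proto`, `Port`, and `Comment` mean that an "any protocol, any port" rule serializes without those keys. The sketch below copies the `AllowRuleRow` tags into a local type (`allowRuleRow`) and adds a hypothetical `render` helper to show the resulting JSON; the sample values are made up for illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// allowRuleRow copies the AllowRuleRow field tags shown above so the example
// is self-contained.
type allowRuleRow struct {
	Host      string `json:"host"`
	Namespace string `json:"namespace"`
	ID        string `json:"id"`
	Src       string `json:"src"`
	Dst       string `json:"dst"`
	Proto     string `json:"proto,omitempty"`
	Port      int    `json:"port,omitempty"`
	Comment   string `json:"comment,omitempty"`
}

// render is a hypothetical helper that marshals one row to a JSON string.
func render(r allowRuleRow) string {
	b, err := json.Marshal(r)
	if err != nil {
		return "error: " + err.Error()
	}
	return string(b)
}

func main() {
	// Zero Proto/Port/Comment are dropped from the output entirely.
	fmt.Println(render(allowRuleRow{Host: "h1", Namespace: "default"}))
	// Non-zero values appear as usual.
	fmt.Println(render(allowRuleRow{Host: "h1", Proto: "tcp", Port: 80}))
}
```

Consumers of the JSON output therefore need to treat a missing `port` key as "any", not as an error.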
+48
@@ -0,0 +1,48 @@
package models
// PlanActionRow is a table-friendly row for the plan output.
type PlanActionRow struct {
Server string `json:"server"`
Action string `json:"action"`
Detail string `json:"detail"`
}
// PlanSkippedRow is a table-friendly row for actions the intent filter
// suppressed (shown in the plan preview so operators can see what would have
// run and why).
type PlanSkippedRow struct {
Server string `json:"server"`
Action string `json:"action"`
Reason string `json:"reason"`
}
// ApplyResultRow is a table-friendly row for the apply result output.
type ApplyResultRow struct {
Server string `json:"server"`
Action string `json:"action"`
Status string `json:"status"`
Detail string `json:"detail,omitempty"`
}
// VerifyResultRow is a table-friendly row for post-apply verification.
type VerifyResultRow struct {
Server string `json:"server"`
WireGuardIP string `json:"wireguard_ip"`
PeerCount int `json:"peer_count"`
Status string `json:"status"`
}
// PlanOutput is the structured JSON output for the plan command.
type PlanOutput struct {
Servers []string `json:"servers"`
Intent string `json:"intent,omitempty"`
Actions []PlanActionRow `json:"actions"`
Skipped []PlanSkippedRow `json:"skipped,omitempty"`
Warnings []string `json:"warnings,omitempty"`
}
// ApplyOutput is the structured JSON output for the apply command.
type ApplyOutput struct {
Results []ApplyResultRow `json:"results"`
Verified []VerifyResultRow `json:"verified"`
}
+9
@@ -59,6 +59,15 @@ func (s *ApplicationService) Delete(ctx context.Context, uuid string) error {
return nil
}
// DeletePreview deletes a preview deployment for an application
func (s *ApplicationService) DeletePreview(ctx context.Context, appUUID, prID string) error {
err := s.client.Delete(ctx, fmt.Sprintf("applications/%s/previews/%s", appUUID, prID))
if err != nil {
return fmt.Errorf("failed to delete preview %s for application %s: %w", prID, appUUID, err)
}
return nil
}
// Start starts an application (initiates deployment)
func (s *ApplicationService) Start(ctx context.Context, uuid string, force bool, instantDeploy bool) (*models.ApplicationLifecycleResponse, error) {
var resp models.ApplicationLifecycleResponse
+48
@@ -402,6 +402,54 @@ func TestApplicationService_Delete_Error(t *testing.T) {
assert.Contains(t, err.Error(), "failed to delete application")
}
func TestApplicationService_DeletePreview_Success(t *testing.T) {
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
assert.Equal(t, "/api/v1/applications/app-uuid-123/previews/42", r.URL.Path)
assert.Equal(t, "DELETE", r.Method)
assert.Equal(t, "Bearer test-token", r.Header.Get("Authorization"))
w.WriteHeader(http.StatusOK)
_, _ = w.Write([]byte(`{"message":"Preview deletion request queued."}`))
}))
defer server.Close()
client := api.NewClient(server.URL, "test-token")
svc := NewApplicationService(client)
err := svc.DeletePreview(context.Background(), "app-uuid-123", "42")
require.NoError(t, err)
}
func TestApplicationService_DeletePreview_NotFound(t *testing.T) {
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
w.WriteHeader(http.StatusNotFound)
_, _ = w.Write([]byte(`{"message":"Preview not found."}`))
}))
defer server.Close()
client := api.NewClient(server.URL, "test-token")
svc := NewApplicationService(client)
err := svc.DeletePreview(context.Background(), "app-uuid-123", "999")
require.Error(t, err)
assert.Contains(t, err.Error(), "failed to delete preview")
}
func TestApplicationService_DeletePreview_ServerError(t *testing.T) {
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, _ *http.Request) {
w.WriteHeader(http.StatusInternalServerError)
_, _ = w.Write([]byte(`{"message":"internal server error"}`))
}))
defer server.Close()
client := api.NewClient(server.URL, "test-token")
svc := NewApplicationService(client)
err := svc.DeletePreview(context.Background(), "app-uuid-123", "42")
require.Error(t, err)
assert.Contains(t, err.Error(), "failed to delete preview")
}
func TestApplicationService_Start(t *testing.T) {
deploymentUUID := "deploy-uuid-123"
server := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
+42
@@ -0,0 +1,42 @@
package services
import "fmt"
// BuilderWorkDir is the scratch root under which coold creates a per-build
// subdirectory when it dispatches a `BuildRequest`. The scratch space is
// cleaned per request by coold.
const BuilderWorkDir = "/var/lib/coolify-builder/work"
// BuilderBinaryPath is the path to the builder binary coold spawns as a
// short-lived subprocess under a `systemd-run --scope` transient unit. No
// long-running builder daemon exists on the host.
const BuilderBinaryPath = "/usr/local/bin/builder"
// BuilderInstallCommand returns a shell snippet that installs buildah and git
// (required by the builder pipeline) plus ca-certificates, ensures the work
// directory exists, and downloads the builder binary from the GitHub release
// for the given version tag. The version tag should track the coold release,
// since builder and coold ship from the same workspace.
func BuilderInstallCommand(version string) string {
return fmt.Sprintf(`set -e
DEBIAN_FRONTEND=noninteractive apt-get update -qq 2>/dev/null
DEBIAN_FRONTEND=noninteractive apt-get install -y \
-o Dpkg::Options::="--force-confold" \
buildah git ca-certificates 2>&1 >/dev/null
mkdir -p %[1]s
ARCH_RAW=$(uname -m)
case "$ARCH_RAW" in
x86_64) ARCH=amd64 ;;
aarch64) ARCH=arm64 ;;
*) echo "unsupported arch: $ARCH_RAW" >&2; exit 1 ;;
esac
URL="https://github.com/coollabsio/coold/releases/download/%[2]s/builder-linux-${ARCH}.tar.gz"
DLDIR=$(mktemp -d)
trap 'rm -rf "$DLDIR"' EXIT
curl -fsSL --retry 3 --max-time 120 -o "$DLDIR/builder.tar.gz" "$URL"
tar -xzf "$DLDIR/builder.tar.gz" -C "$DLDIR"
test -f "$DLDIR/builder" || { echo "builder binary not found in tarball" >&2; exit 1; }
install -m 0755 "$DLDIR/builder" %[3]s.tmp
mv %[3]s.tmp %[3]s
echo '%[2]s' > %[3]s.version`,
BuilderWorkDir, version, BuilderBinaryPath)
}
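The snippet above relies on explicit argument indexes (`%[1]s`, `%[2]s`, `%[3]s`) so that each argument is passed once and reused several times in the template. A minimal sketch of that technique follows; `installTail` is a hypothetical, shortened stand-in for the last three lines of the install script, not the real function.

```go
package main

import "fmt"

// installTail is a hypothetical, shortened stand-in for the tail of the
// install snippet. %[1]s always refers to version and %[2]s to binPath, no
// matter how often or in which order they appear in the template.
func installTail(version, binPath string) string {
	return fmt.Sprintf(`install -m 0755 "$DLDIR/builder" %[2]s.tmp
mv %[2]s.tmp %[2]s
echo '%[1]s' > %[2]s.version`, version, binPath)
}

func main() {
	fmt.Println(installTail("v1.2.3", "/usr/local/bin/builder"))
}
```

Since the version string is interpolated directly into shell text, this pattern presumably expects trusted release tags only; a tag containing a single quote would break the final `echo` line.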
+188
@@ -0,0 +1,188 @@
package services
import (
"fmt"
"net"
"strings"
)
// DefaultCooldDNSZone is the DNS zone served by coold's embedded resolver.
// `.internal` is reserved by ICANN for private use (it is not one of the RFC
// 6761 special-use names), so it is safe from public-TLD collisions.
const DefaultCooldDNSZone = "coolify.internal"
// CooldAPIPort is the TCP port coold's firewall REST API binds on wg0.
const CooldAPIPort = 8443
// CooldAPITokenPath is the on-host path where coold reads the bearer token
// for the firewall REST API. The file is generated once by `coolify init
// apply --install-coold` (32 random bytes, hex-encoded, via `openssl rand
// -hex 32`) and kept mode 0600.
const CooldAPITokenPath = "/etc/coolify/api-token" //nolint:gosec // filesystem path, not a credential
// CooldNamespace describes one namespace for coold's env var. coold's
// embedded DNS binds <BridgeGateway>:53 per namespace, and its sync loop
// iterates `Network` to discover containers.
type CooldNamespace struct {
Name string // e.g. "default", "alpha"
Network string // e.g. "coolify-default-mesh" — podman bridge name
BridgeGateway net.IP // the .1 of that namespace's per-host container subnet
}
// CooldNamespacesEnvValue renders the COOLD_NAMESPACES env value. Shape:
//
// default:coolify-default-mesh:10.210.0.1,alpha:coolify-alpha-mesh:10.220.0.1
//
// Triples are comma-separated; fields within a triple are colon-separated.
// Empty slice yields empty string so callers can omit the env var entirely.
func CooldNamespacesEnvValue(ns []CooldNamespace) string {
parts := make([]string, 0, len(ns))
for _, n := range ns {
parts = append(parts, fmt.Sprintf("%s:%s:%s", n.Name, n.Network, n.BridgeGateway))
}
return strings.Join(parts, ",")
}
// SchedulerConfig carries optional scheduler connectivity injected into the coold unit
// for non-central hosts. nil means no scheduler env vars are emitted.
type SchedulerConfig struct {
URL string // e.g. "http://100.64.0.1:6443"
JWTPath string // e.g. "/etc/coolify/host-jwt"
}
// BuilderConfig carries the builder-capability env vars coold needs when it
// spawns build subprocesses. nil means the capability is disabled and no
// COOLD_BUILDER_* env vars are emitted.
type BuilderConfig struct {
Capacity int // concurrent builds the host accepts; values <= 0 fall back to 2
CPUQuota string // systemd CPUQuota per build scope; "" falls back to "200%"
MemoryMax string // systemd MemoryMax per build scope; "" falls back to "2G"
TimeoutSecs int // hard per-build timeout in seconds; values <= 0 fall back to 1800
DenyNets []string // extra CIDRs to deny at systemd-run IPAddressDeny level
}
// CooldServiceUnitWithScheduler is like CooldServiceUnit but injects scheduler env
// vars when scheduler is non-nil and builder env vars when builder is non-nil.
// Used for non-central hosts after phase 4.
func CooldServiceUnitWithScheduler(mgmtIP net.IP, namespaces []CooldNamespace, scheduler *SchedulerConfig, builder *BuilderConfig) string {
return cooldServiceUnitInner(mgmtIP, namespaces, scheduler, builder)
}
// CooldServiceUnit renders the coold systemd unit without scheduler or builder
// env (phase-3 first install, before phase 5 rewrites the unit to inject
// scheduler settings).
func CooldServiceUnit(mgmtIP net.IP, namespaces []CooldNamespace) string {
return cooldServiceUnitInner(mgmtIP, namespaces, nil, nil)
}
func cooldServiceUnitInner(mgmtIP net.IP, namespaces []CooldNamespace, scheduler *SchedulerConfig, builder *BuilderConfig) string {
// Wants (not Requires) on corrosion: if corrosion crashes/restarts we want
// coold to stay up and retry — reconcile_once already backs off for 1s on
// error, so it self-heals once corrosion is back. Requires would cascade
// stop coold and leave it down until someone restarted it.
nsEnv := ""
if len(namespaces) > 0 {
nsEnv = fmt.Sprintf(`Environment=COOLD_NAMESPACES=%s
Environment=COOLD_DNS_ZONE=%s
`, CooldNamespacesEnvValue(namespaces), DefaultCooldDNSZone)
}
// Firewall REST API binds wg0-only (never a public interface) and requires
// a bearer token. Plain HTTP for alpha — TLS material is managed by the
// central Coolify control plane and will be wired in a follow-up.
apiEnv := fmt.Sprintf(`Environment=COOLD_API_BIND=%s:%d
Environment=COOLD_API_TOKEN_FILE=%s
`, mgmtIP, CooldAPIPort, CooldAPITokenPath)
schedulerEnv := ""
if scheduler != nil {
schedulerEnv = fmt.Sprintf(`Environment=COOLD_SCHEDULER_URL=%s
Environment=COOLD_HOST_JWT_PATH=%s
`, scheduler.URL, scheduler.JWTPath)
}
builderEnv := ""
builderPre := ""
if builder != nil {
capacity := builder.Capacity
if capacity <= 0 {
capacity = 2
}
cpuQuota := builder.CPUQuota
if cpuQuota == "" {
cpuQuota = "200%"
}
memoryMax := builder.MemoryMax
if memoryMax == "" {
memoryMax = "2G"
}
timeoutSecs := builder.TimeoutSecs
if timeoutSecs <= 0 {
timeoutSecs = 1800
}
denyNets := strings.Join(builder.DenyNets, ",")
builderEnv = fmt.Sprintf(`Environment=COOLD_BUILDER_ENABLED=true
Environment=COOLD_BUILDER_WORK_DIR=%s
Environment=COOLD_BUILDER_CAPACITY=%d
Environment=COOLD_BUILDER_CPU_QUOTA=%s
Environment=COOLD_BUILDER_MEMORY_MAX=%s
Environment=COOLD_BUILDER_TIMEOUT_SECS=%d
Environment=COOLD_BUILDER_BIN=%s
Environment=COOLD_BUILDER_DENY_NETS=%s
`, BuilderWorkDir, capacity, cpuQuota, memoryMax, timeoutSecs, BuilderBinaryPath, denyNets)
builderPre = fmt.Sprintf("ExecStartPre=/bin/mkdir -p %s\n", BuilderWorkDir)
}
return fmt.Sprintf(`[Unit]
Description=Coolify host agent
Wants=corrosion.service
After=corrosion.service network-online.target podman.socket coolify-mesh-fw.service
[Service]
Environment=COOLD_HOST_MGMT_IP=%s
%s%s%s%s%sExecStart=/usr/local/bin/coold
AmbientCapabilities=CAP_NET_BIND_SERVICE CAP_NET_ADMIN CAP_NET_RAW
Restart=on-failure
RestartSec=2s
[Install]
WantedBy=multi-user.target
`, mgmtIP, nsEnv, apiEnv, schedulerEnv, builderEnv, builderPre)
}
// CooldInstallCommand returns a shell snippet that downloads and installs coold
// from the GitHub release for the given version tag (e.g. "nightly", "v1.2.3").
// Architecture is auto-detected on the remote host via uname -m.
// The version tag is written to /usr/local/bin/coold.version after install.
func CooldInstallCommand(version string) string {
return fmt.Sprintf(`set -e
ARCH_RAW=$(uname -m)
case "$ARCH_RAW" in
x86_64) ARCH=amd64 ;;
aarch64) ARCH=arm64 ;;
*) echo "unsupported arch: $ARCH_RAW" >&2; exit 1 ;;
esac
URL="https://github.com/coollabsio/coold/releases/download/%s/coold-linux-${ARCH}.tar.gz"
DLDIR=$(mktemp -d)
trap 'rm -rf "$DLDIR"' EXIT
curl -fsSL --retry 3 --max-time 120 -o "$DLDIR/coold.tar.gz" "$URL"
tar -xzf "$DLDIR/coold.tar.gz" -C "$DLDIR"
test -f "$DLDIR/coold" || { echo "coold binary not found in tarball" >&2; exit 1; }
install -m 0755 "$DLDIR/coold" /usr/local/bin/coold.tmp
mv /usr/local/bin/coold.tmp /usr/local/bin/coold
echo '%s' > /usr/local/bin/coold.version`, version, version)
}
// EnsureCooldAPITokenCommand returns a shell snippet that creates the
// CooldAPITokenPath file with a hex-encoded token of 32 random bytes if the
// file is missing or empty. Idempotent: repeated runs preserve the existing
// token so clients already trusting it keep working.
func EnsureCooldAPITokenCommand() string {
return fmt.Sprintf(
`mkdir -p /etc/coolify && `+
`if [ ! -s %[1]s ]; then `+
`openssl rand -hex 32 > %[1]s.tmp && `+
`chmod 0600 %[1]s.tmp && `+
`mv %[1]s.tmp %[1]s; `+
`fi`,
CooldAPITokenPath,
)
}
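The `COOLD_NAMESPACES` shape documented on `CooldNamespacesEnvValue` (comma-separated triples, colon-separated fields) can be read back with a few lines of string handling. The sketch below is a hypothetical inverse written for illustration; coold's real parser lives in another repository and may differ. The `namespace` type and `parseNamespaces` function are assumptions, not names from the source.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// namespace mirrors the three fields of CooldNamespace for this example.
type namespace struct {
	Name          string
	Network       string
	BridgeGateway net.IP
}

// parseNamespaces is a hypothetical inverse of CooldNamespacesEnvValue.
// An empty value yields no namespaces, matching the "omit the env var" case.
func parseNamespaces(v string) ([]namespace, error) {
	if v == "" {
		return nil, nil
	}
	var out []namespace
	for _, triple := range strings.Split(v, ",") {
		// SplitN with a limit of 3 keeps any extra colons in the last field,
		// so an IPv6 gateway would not be cut apart.
		f := strings.SplitN(triple, ":", 3)
		if len(f) != 3 {
			return nil, fmt.Errorf("malformed triple %q", triple)
		}
		ip := net.ParseIP(f[2])
		if ip == nil {
			return nil, fmt.Errorf("invalid gateway %q in %q", f[2], triple)
		}
		out = append(out, namespace{Name: f[0], Network: f[1], BridgeGateway: ip})
	}
	return out, nil
}

func main() {
	ns, err := parseNamespaces("default:coolify-default-mesh:10.210.0.1,alpha:coolify-alpha-mesh:10.220.0.1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	for _, n := range ns {
		fmt.Println(n.Name, n.Network, n.BridgeGateway)
	}
}
```

The format has no escaping, so it only round-trips while namespace and network names contain neither commas nor colons.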
+151
@@ -0,0 +1,151 @@
package services
import (
"net"
"strings"
"testing"
)
func TestCooldInstallCommand_SubstitutesVersion(t *testing.T) {
for _, version := range []string{"nightly", "v1.2.3"} {
cmd := CooldInstallCommand(version)
if !strings.Contains(cmd, version) {
t.Errorf("version %q not found in install command", version)
}
if !strings.Contains(cmd, "coollabsio/coold/releases/download/"+version) {
t.Errorf("release URL missing version %q in:\n%s", version, cmd)
}
if !strings.Contains(cmd, "/usr/local/bin/coold.version") {
t.Errorf("version marker write missing from install command")
}
}
}
func TestCooldInstallCommand_ArchDetection(t *testing.T) {
cmd := CooldInstallCommand("nightly")
for _, want := range []string{
"x86_64) ARCH=amd64",
"aarch64) ARCH=arm64",
"coold-linux-${ARCH}.tar.gz",
"install -m 0755",
} {
if !strings.Contains(cmd, want) {
t.Errorf("expected %q in install command:\n%s", want, cmd)
}
}
}
func TestCooldServiceUnit_EmbedsMgmtIPAndNamespaces(t *testing.T) {
namespaces := []CooldNamespace{
{Name: "default", Network: "coolify-default-mesh", BridgeGateway: net.ParseIP("10.210.7.1")},
{Name: "alpha", Network: "coolify-alpha-mesh", BridgeGateway: net.ParseIP("10.210.8.1")},
}
got := CooldServiceUnit(net.ParseIP("100.64.0.5"), namespaces)
for _, want := range []string{
"Environment=COOLD_HOST_MGMT_IP=100.64.0.5",
"Environment=COOLD_NAMESPACES=default:coolify-default-mesh:10.210.7.1,alpha:coolify-alpha-mesh:10.210.8.1",
"Environment=COOLD_DNS_ZONE=coolify.internal",
"Environment=COOLD_API_BIND=100.64.0.5:8443",
"Environment=COOLD_API_TOKEN_FILE=/etc/coolify/api-token",
"AmbientCapabilities=CAP_NET_BIND_SERVICE CAP_NET_ADMIN CAP_NET_RAW",
"Wants=corrosion.service",
"After=corrosion.service network-online.target podman.socket",
"ExecStart=/usr/local/bin/coold",
} {
if !strings.Contains(got, want) {
t.Errorf("unit missing %q:\n%s", want, got)
}
}
}
func TestCooldServiceUnit_EmptyNamespacesSkipsNamespaceEnv(t *testing.T) {
got := CooldServiceUnit(net.ParseIP("100.64.0.5"), nil)
if strings.Contains(got, "COOLD_NAMESPACES") {
t.Errorf("expected no namespace env when nil, got:\n%s", got)
}
if strings.Contains(got, "COOLD_DNS_ZONE") {
t.Errorf("expected no DNS zone env when nil, got:\n%s", got)
}
if !strings.Contains(got, "Environment=COOLD_HOST_MGMT_IP=100.64.0.5") {
t.Errorf("expected mgmt IP env, got:\n%s", got)
}
}
func TestCooldServiceUnit_EmitsBuilderEnvWhenConfigured(t *testing.T) {
builder := &BuilderConfig{
Capacity: 4,
CPUQuota: "400%",
MemoryMax: "4G",
TimeoutSecs: 900,
DenyNets: []string{"100.64.0.0/16", "10.210.0.0/16"},
}
got := CooldServiceUnitWithScheduler(
net.ParseIP("100.64.0.5"),
nil,
&SchedulerConfig{URL: "http://100.64.0.1:6443", JWTPath: "/etc/coolify/host-jwt"},
builder,
)
for _, want := range []string{
"Environment=COOLD_BUILDER_ENABLED=true",
"Environment=COOLD_BUILDER_CAPACITY=4",
"Environment=COOLD_BUILDER_CPU_QUOTA=400%",
"Environment=COOLD_BUILDER_MEMORY_MAX=4G",
"Environment=COOLD_BUILDER_TIMEOUT_SECS=900",
"Environment=COOLD_BUILDER_DENY_NETS=100.64.0.0/16,10.210.0.0/16",
} {
if !strings.Contains(got, want) {
t.Errorf("unit missing %q:\n%s", want, got)
}
}
}
func TestCooldServiceUnit_BuilderDefaultsWhenZero(t *testing.T) {
builder := &BuilderConfig{} // all zero values
got := CooldServiceUnitWithScheduler(
net.ParseIP("100.64.0.5"),
nil,
&SchedulerConfig{URL: "http://100.64.0.1:6443", JWTPath: "/etc/coolify/host-jwt"},
builder,
)
for _, want := range []string{
"Environment=COOLD_BUILDER_CAPACITY=2",
"Environment=COOLD_BUILDER_CPU_QUOTA=200%",
"Environment=COOLD_BUILDER_MEMORY_MAX=2G",
"Environment=COOLD_BUILDER_TIMEOUT_SECS=1800",
} {
if !strings.Contains(got, want) {
t.Errorf("unit missing default %q:\n%s", want, got)
}
}
}
func TestCooldServiceUnit_OmitsBuilderEnvWhenNil(t *testing.T) {
got := CooldServiceUnitWithScheduler(
net.ParseIP("100.64.0.5"),
nil,
&SchedulerConfig{URL: "http://100.64.0.1:6443", JWTPath: "/etc/coolify/host-jwt"},
nil,
)
if strings.Contains(got, "COOLD_BUILDER_") {
t.Errorf("expected no builder env when nil, got:\n%s", got)
}
}
func TestCooldNamespacesEnvValue_Triples(t *testing.T) {
ns := []CooldNamespace{
{Name: "default", Network: "coolify-default-mesh", BridgeGateway: net.ParseIP("10.210.0.1")},
{Name: "alpha", Network: "coolify-alpha-mesh", BridgeGateway: net.ParseIP("10.220.0.1")},
}
got := CooldNamespacesEnvValue(ns)
want := "default:coolify-default-mesh:10.210.0.1,alpha:coolify-alpha-mesh:10.220.0.1"
if got != want {
t.Errorf("got %q, want %q", got, want)
}
if CooldNamespacesEnvValue(nil) != "" {
t.Errorf("expected empty string for nil slice")
}
}
+125
@@ -0,0 +1,125 @@
// Package services generates configuration for the v5 control-plane
// daemons installed by `coolify init` (corrosion + coold). All functions
// are pure: they emit bytes/strings and do no I/O.
package services
import (
"fmt"
"net"
"sort"
"strings"
)
// CoolifySchemaSQL is the Corrosion schema that coold's sync loop writes to.
//
// Every NOT NULL column MUST have a DEFAULT — corrosion's CR-SQLite backend
// rejects schemas missing defaults with "needs a default value for forward
// schema compatibility". Defaults are never surfaced at runtime because
// coold always provides every column on upsert.
//
// Columns:
// - container_name: globally unique DNS label. coold's embedded resolver
// answers <container_name>.coolify.internal → container_ip. Uniqueness
// is Coolify's responsibility at app-create time.
// - namespace: optional app-scoping key reserved for multi-tenant / per-app
// isolation (e.g. one podman network per namespace). Empty string in
// single-tenant deployments. Opaque DNS-safe string owned by Coolify.
// - state: raw podman container status (running, exited, stopped,
// restarting, paused, created, dead, configured, removing). Liveness.
// - health: podman HEALTHCHECK result. One of:
// "healthy", "unhealthy", "starting", "unknown". "unknown" when the
// container has no HEALTHCHECK declared. Readiness.
const CoolifySchemaSQL = `CREATE TABLE service_endpoints (
container_id TEXT NOT NULL DEFAULT '' PRIMARY KEY,
container_name TEXT NOT NULL DEFAULT '',
namespace TEXT NOT NULL DEFAULT '',
host_mgmt_ip TEXT NOT NULL DEFAULT '',
container_ip TEXT NOT NULL DEFAULT '',
state TEXT NOT NULL DEFAULT '',
health TEXT NOT NULL DEFAULT 'unknown',
updated_at INTEGER NOT NULL DEFAULT 0
);
`
// CorrosionConfigBytes renders /etc/corrosion/config.toml for a single host.
//
// bindAddr is this host's wg0 management IP — gossip is confined to the mesh
// (already encrypted by WireGuard, so plaintext=true is safe).
// peers are the mgmt IPs of all OTHER hosts; they are sorted lexically so the
// output is byte-stable across probe orderings (needed for sha256 drift check).
func CorrosionConfigBytes(bindAddr net.IP, gossipPort, apiPort int, peers []net.IP) []byte {
sorted := make([]string, 0, len(peers))
for _, p := range peers {
if p == nil {
continue
}
sorted = append(sorted, p.String())
}
sort.Strings(sorted)
var b strings.Builder
b.WriteString("# generated by coolify init — do not edit\n")
b.WriteString("[db]\n")
b.WriteString(`path = "/var/lib/corrosion/corrosion.db"` + "\n")
b.WriteString(`schema_paths = ["/etc/corrosion/schemas"]` + "\n")
b.WriteString("\n[gossip]\n")
fmt.Fprintf(&b, "addr = \"%s:%d\"\n", bindAddr, gossipPort)
b.WriteString("bootstrap = [")
for i, p := range sorted {
if i > 0 {
b.WriteString(", ")
}
fmt.Fprintf(&b, "\"%s:%d\"", p, gossipPort)
}
b.WriteString("]\n")
b.WriteString("plaintext = true\n")
b.WriteString("\n[api]\n")
fmt.Fprintf(&b, "addr = \"127.0.0.1:%d\"\n", apiPort)
b.WriteString("\n[admin]\n")
b.WriteString(`path = "/var/run/corrosion/admin.sock"` + "\n")
return []byte(b.String())
}
// CorrosionInstallCommand returns a shell snippet that downloads and installs
// corrosion from the GitHub release for the given version tag.
// Architecture is auto-detected on the remote host via uname -m.
// The version tag is written to /usr/local/bin/corrosion.version after install.
func CorrosionInstallCommand(version string) string {
return fmt.Sprintf(`set -e
ARCH_RAW=$(uname -m)
case "$ARCH_RAW" in
x86_64) ARCH=x86_64-unknown-linux-gnu ;;
aarch64) ARCH=aarch64-unknown-linux-gnu ;;
*) echo "unsupported arch: $ARCH_RAW" >&2; exit 1 ;;
esac
URL="https://github.com/coollabsio/corrosion/releases/download/%s/corrosion-${ARCH}.tar.gz"
DLDIR=$(mktemp -d)
trap 'rm -rf "$DLDIR"' EXIT
curl -fsSL --retry 3 --max-time 120 -o "$DLDIR/corrosion.tar.gz" "$URL"
tar -xzf "$DLDIR/corrosion.tar.gz" -C "$DLDIR"
test -f "$DLDIR/corrosion" || { echo "corrosion binary not found in tarball" >&2; exit 1; }
install -m 0755 "$DLDIR/corrosion" /usr/local/bin/corrosion.tmp
mv /usr/local/bin/corrosion.tmp /usr/local/bin/corrosion
echo '%s' > /usr/local/bin/corrosion.version`, version, version)
}
// CorrosionServiceUnit returns the systemd unit text for corrosion.
// Plain .service (not a template unit); iface is baked into the dependency.
func CorrosionServiceUnit(iface string) string {
return fmt.Sprintf(`[Unit]
Description=Corrosion agent
After=network-online.target wg-quick@%[1]s.service
Wants=network-online.target
Requires=wg-quick@%[1]s.service
[Service]
ExecStart=/usr/local/bin/corrosion agent --config /etc/corrosion/config.toml
Restart=on-failure
RestartSec=2s
StateDirectory=corrosion
WorkingDirectory=/var/lib/corrosion
[Install]
WantedBy=multi-user.target
`, iface)
}
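The `CorrosionConfigBytes` comment says peers are sorted so the output is byte-stable for a sha256 drift check. The sketch below shows why that matters, under stated assumptions: `bootstrapLine` is a reduced stand-in for the bootstrap rendering, and `digest` and `drifted` are hypothetical helpers; the real drift check is not shown in this diff.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
	"strings"
)

// bootstrapLine is a reduced stand-in for the bootstrap rendering in
// CorrosionConfigBytes: it copies the peers, sorts them lexically, and
// renders one TOML array line.
func bootstrapLine(peers []string, gossipPort int) string {
	sorted := append([]string(nil), peers...) // copy so the caller's slice is untouched
	sort.Strings(sorted)
	quoted := make([]string, 0, len(sorted))
	for _, p := range sorted {
		quoted = append(quoted, fmt.Sprintf("\"%s:%d\"", p, gossipPort))
	}
	return "bootstrap = [" + strings.Join(quoted, ", ") + "]\n"
}

// digest returns the hex-encoded SHA-256 of s.
func digest(s string) string {
	sum := sha256.Sum256([]byte(s))
	return hex.EncodeToString(sum[:])
}

// drifted is a hypothetical check: it reports whether the freshly rendered
// config differs from the bytes currently on the host.
func drifted(rendered, onHost string) bool {
	return digest(rendered) != digest(onHost)
}

func main() {
	a := bootstrapLine([]string{"100.64.0.3", "100.64.0.2"}, 8787)
	b := bootstrapLine([]string{"100.64.0.2", "100.64.0.3"}, 8787)
	fmt.Print(a)               // bootstrap = ["100.64.0.2:8787", "100.64.0.3:8787"]
	fmt.Println(drifted(a, b)) // false
}
```

The sort is lexical, so "100.64.0.10" orders before "100.64.0.2". That is not numeric order, but it is deterministic, which is all a hash comparison needs.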
+150
@@ -0,0 +1,150 @@
package services
import (
"crypto/sha256"
"encoding/hex"
"net"
"strings"
"testing"
)
func TestCorrosionInstallCommand_SubstitutesVersion(t *testing.T) {
for _, version := range []string{"nightly", "v1.2.3"} {
cmd := CorrosionInstallCommand(version)
if !strings.Contains(cmd, version) {
t.Errorf("version %q not found in install command", version)
}
if !strings.Contains(cmd, "coollabsio/corrosion/releases/download/"+version) {
t.Errorf("release URL missing version %q in:\n%s", version, cmd)
}
if !strings.Contains(cmd, "/usr/local/bin/corrosion.version") {
t.Errorf("version marker write missing from install command")
}
}
}
func TestCorrosionInstallCommand_ArchDetection(t *testing.T) {
cmd := CorrosionInstallCommand("nightly")
for _, want := range []string{
"x86_64) ARCH=x86_64-unknown-linux-gnu",
"aarch64) ARCH=aarch64-unknown-linux-gnu",
"corrosion-${ARCH}.tar.gz",
"install -m 0755",
} {
if !strings.Contains(cmd, want) {
t.Errorf("expected %q in install command:\n%s", want, cmd)
}
}
}
func TestCorrosionConfigBytes_GoldenThreeHost(t *testing.T) {
self := net.ParseIP("100.64.0.1")
peers := []net.IP{
net.ParseIP("100.64.0.3"),
net.ParseIP("100.64.0.2"), // intentionally unsorted
}
got := CorrosionConfigBytes(self, 8787, 8080, peers)
want := `# generated by coolify init — do not edit
[db]
path = "/var/lib/corrosion/corrosion.db"
schema_paths = ["/etc/corrosion/schemas"]

[gossip]
addr = "100.64.0.1:8787"
bootstrap = ["100.64.0.2:8787", "100.64.0.3:8787"]
plaintext = true

[api]
addr = "127.0.0.1:8080"

[admin]
path = "/var/run/corrosion/admin.sock"
`
if string(got) != want {
t.Fatalf("config mismatch.\nWANT:\n%s\nGOT:\n%s", want, got)
}
}
func TestCorrosionConfigBytes_StableHashAcrossOrderings(t *testing.T) {
self := net.ParseIP("100.64.0.1")
peersA := []net.IP{net.ParseIP("100.64.0.2"), net.ParseIP("100.64.0.3")}
peersB := []net.IP{net.ParseIP("100.64.0.3"), net.ParseIP("100.64.0.2")}
a := CorrosionConfigBytes(self, 8787, 8080, peersA)
b := CorrosionConfigBytes(self, 8787, 8080, peersB)
hashA := sha256.Sum256(a)
hashB := sha256.Sum256(b)
if hex.EncodeToString(hashA[:]) != hex.EncodeToString(hashB[:]) {
t.Fatalf("hashes differ across peer orderings (sort broken):\nA=%x\nB=%x", hashA, hashB)
}
}
func TestCorrosionConfigBytes_EmptyPeers(t *testing.T) {
got := string(CorrosionConfigBytes(net.ParseIP("100.64.0.1"), 8787, 8080, nil))
if !strings.Contains(got, `bootstrap = []`) {
t.Fatalf("expected empty bootstrap array, got:\n%s", got)
}
}
func TestCoolifySchema_HasLivenessAndReadinessColumns(t *testing.T) {
for _, want := range []string{
"state TEXT NOT NULL DEFAULT ''",
"health TEXT NOT NULL DEFAULT 'unknown'",
} {
if !strings.Contains(CoolifySchemaSQL, want) {
t.Errorf("schema missing %q:\n%s", want, CoolifySchemaSQL)
}
}
if strings.Contains(CoolifySchemaSQL, "healthy") {
t.Errorf("schema still has removed `healthy` column:\n%s", CoolifySchemaSQL)
}
}
func TestCoolifySchema_HasContainerNameColumn(t *testing.T) {
// container_name is the DNS label coold's resolver queries on. Flat
// scheme: <container_name>.coolify.internal → container_ip. Coolify
// enforces global uniqueness.
want := "container_name TEXT NOT NULL DEFAULT ''"
if !strings.Contains(CoolifySchemaSQL, want) {
t.Errorf("schema missing %q:\n%s", want, CoolifySchemaSQL)
}
}
func TestCoolifySchema_HasNamespaceColumn(t *testing.T) {
// namespace is reserved for future per-app isolation / multi-tenant.
// Empty in single-tenant; populated when Coolify wants app scoping.
want := "namespace TEXT NOT NULL DEFAULT ''"
if !strings.Contains(CoolifySchemaSQL, want) {
t.Errorf("schema missing %q:\n%s", want, CoolifySchemaSQL)
}
}
func TestCoolifySchema_AllNotNullColumnsHaveDefault(t *testing.T) {
// CR-SQLite rejects any NOT NULL column missing a DEFAULT with
// "needs a default value for forward schema compatibility".
for _, line := range strings.Split(CoolifySchemaSQL, "\n") {
trimmed := strings.TrimSpace(line)
if !strings.Contains(trimmed, "NOT NULL") {
continue
}
if !strings.Contains(trimmed, "DEFAULT") {
t.Errorf("line missing DEFAULT (CR-SQLite would reject): %q", trimmed)
}
}
}
func TestCorrosionServiceUnit_ContainsInterface(t *testing.T) {
got := CorrosionServiceUnit("wg0")
for _, want := range []string{
"After=network-online.target wg-quick@wg0.service",
"Requires=wg-quick@wg0.service",
"ExecStart=/usr/local/bin/corrosion agent --config /etc/corrosion/config.toml",
} {
if !strings.Contains(got, want) {
t.Errorf("unit missing %q:\n%s", want, got)
}
}
}
+49
@@ -0,0 +1,49 @@
package services
import (
"crypto/ecdsa"
"crypto/x509"
"encoding/pem"
"fmt"
"time"
"github.com/golang-jwt/jwt/v5"
)
// MintHostJWT creates a 1-year ES256 JWT signed with the EC P-256 private key.
//
// privKeyPEM must be PKCS8 EC PEM (produced by `openssl genpkey -algorithm EC
// -pkeyopt ec_paramgen_curve:P-256`). hostID becomes the `sub` claim; the
// scheduler uses it as the key into its host→stream registry.
//
// caps lists the capabilities this host is authorized to advertise in the
// coold Hello frame. Callers should always include "coold"; hosts that accept
// builds also carry "builder". An empty caps defaults to ["coold"]; a non-empty
// list is used as given. The scheduler cross-checks the advertised Hello
// capability set against this claim and rejects streams that try to elevate.
func MintHostJWT(privKeyPEM []byte, hostID string, caps []string) (string, error) {
block, _ := pem.Decode(privKeyPEM)
if block == nil {
return "", fmt.Errorf("no PEM block found in private key")
}
raw, err := x509.ParsePKCS8PrivateKey(block.Bytes)
if err != nil {
return "", fmt.Errorf("parse PKCS8 private key: %w", err)
}
ecKey, ok := raw.(*ecdsa.PrivateKey)
if !ok {
return "", fmt.Errorf("expected EC private key, got %T", raw)
}
if len(caps) == 0 {
caps = []string{"coold"}
}
now := time.Now()
claims := jwt.MapClaims{
"sub": hostID,
"aud": "coold",
"caps": caps,
"iat": now.Unix(),
"exp": now.Add(365 * 24 * time.Hour).Unix(),
}
token := jwt.NewWithClaims(jwt.SigningMethodES256, claims)
return token.SignedString(ecKey)
}
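The cross-check described in the MintHostJWT comment amounts to a subset test: every capability a host advertises must appear in its JWT `caps` claim. The scheduler's real implementation is not part of this diff, so the sketch below is only an illustration, and `capsAllowed` is a hypothetical name.

```go
package main

import "fmt"

// capsAllowed reports whether every advertised capability appears in the
// granted set. Hypothetical helper: the scheduler's real check is not shown
// in this diff.
func capsAllowed(granted, advertised []string) bool {
	allowed := make(map[string]struct{}, len(granted))
	for _, c := range granted {
		allowed[c] = struct{}{}
	}
	for _, c := range advertised {
		if _, ok := allowed[c]; !ok {
			return false // advertised capability was never granted
		}
	}
	return true
}

func main() {
	granted := []string{"coold"}
	fmt.Println(capsAllowed(granted, []string{"coold"}))
	fmt.Println(capsAllowed(granted, []string{"coold", "builder"}))
}
```

With `granted` limited to "coold", the second call models a stream that tries to elevate to "builder" and would be rejected.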
+86
@@ -0,0 +1,86 @@
package services
import "fmt"
// SchedulerGRPCPort is the TCP port the scheduler listens on. coold dials this
// port and carries both coold and builder traffic on the same stream; there
// is no longer a separate listener for builds.
const SchedulerGRPCPort = 6443
// SchedulerJWTPubPath is the on-host path where the scheduler reads the ES256 public key.
const SchedulerJWTPubPath = "/etc/coolify/jwt.pub"
// SchedulerJWTPrivPath is the on-central path for the EC private key (chmod 0600).
const SchedulerJWTPrivPath = "/etc/coolify/jwt.priv"
// HostJWTPath is the on-host path where coold reads its bearer JWT.
const HostJWTPath = "/etc/coolify/host-jwt"
// SchedulerUnixSocketPath is the on-host path of the scheduler's HTTP-over-UDS
// listener. The central-plane caller (Laravel) connects here. Access
// control is filesystem perms — see SchedulerServiceUnit.
const SchedulerUnixSocketPath = "/run/coolify/scheduler.sock"
// SchedulerServiceUnit returns the systemd unit text for scheduler.
//
// grpcBind is "ip:port" for the single gRPC listener (e.g. "100.64.0.1:6443").
// It binds on the central host's wg0 mgmt IP so the listener is unreachable
// outside the mesh.
//
// RuntimeDirectory=coolify creates /run/coolify owned by the scheduler user
// at unit start, which is where the UDS gets bound. Laravel group access
// is configured at deploy time via SCHEDULER_UNIX_SOCKET_GROUP once the
// PHP-FPM group is finalized; until then the socket stays 0600.
func SchedulerServiceUnit(grpcBind, jwtPubPath string) string {
return fmt.Sprintf(`[Unit]
Description=Coolify scheduler
After=network-online.target wg-quick@wg0.service
[Service]
RuntimeDirectory=coolify
RuntimeDirectoryMode=0750
Environment=SCHEDULER_GRPC_BIND=%s
Environment=SCHEDULER_UNIX_SOCKET_PATH=%s
Environment=SCHEDULER_JWT_PUBLIC_KEY_PATH=%s
ExecStart=/usr/local/bin/scheduler
Restart=on-failure
RestartSec=2s
[Install]
WantedBy=multi-user.target
`, grpcBind, SchedulerUnixSocketPath, jwtPubPath)
}
// SchedulerInstallCommand returns a shell snippet that downloads and installs
// scheduler from the GitHub release for the given version tag.
func SchedulerInstallCommand(version string) string {
return fmt.Sprintf(`set -e
ARCH_RAW=$(uname -m)
case "$ARCH_RAW" in
x86_64) ARCH=amd64 ;;
aarch64) ARCH=arm64 ;;
*) echo "unsupported arch: $ARCH_RAW" >&2; exit 1 ;;
esac
URL="https://github.com/coollabsio/coold/releases/download/%s/scheduler-linux-${ARCH}.tar.gz"
DLDIR=$(mktemp -d)
trap 'rm -rf "$DLDIR"' EXIT
curl -fsSL --retry 3 --max-time 120 -o "$DLDIR/scheduler.tar.gz" "$URL"
tar -xzf "$DLDIR/scheduler.tar.gz" -C "$DLDIR"
test -f "$DLDIR/scheduler" || { echo "scheduler binary not found in tarball" >&2; exit 1; }
install -m 0755 "$DLDIR/scheduler" /usr/local/bin/scheduler.tmp
mv /usr/local/bin/scheduler.tmp /usr/local/bin/scheduler
echo '%s' > /usr/local/bin/scheduler.version`, version, version)
}
// EnsureJWTKeypairCommand returns a shell snippet that generates an EC P-256
// keypair in PKCS8 format on the central host (idempotent).
func EnsureJWTKeypairCommand() string {
return `mkdir -p /etc/coolify && ` +
`if [ ! -f ` + SchedulerJWTPrivPath + ` ]; then ` +
`openssl genpkey -algorithm EC -pkeyopt ec_paramgen_curve:P-256 ` +
`-out ` + SchedulerJWTPrivPath + `.tmp 2>&1 && ` +
`chmod 0600 ` + SchedulerJWTPrivPath + `.tmp && ` +
`mv ` + SchedulerJWTPrivPath + `.tmp ` + SchedulerJWTPrivPath + ` && ` +
`openssl pkey -in ` + SchedulerJWTPrivPath + ` -pubout -out ` + SchedulerJWTPubPath + ` 2>&1 && ` +
`chmod 0644 ` + SchedulerJWTPubPath + `; fi`
}
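For local testing without openssl, a key in the same format (PKCS8 PEM, EC P-256) can be produced with the Go standard library. This is a minimal sketch, not part of the diff: `newP256PKCS8PEM` and `parseP256PKCS8PEM` are hypothetical helpers, and the parse function follows the same decode, parse, type-assert steps that MintHostJWT performs on its input.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/x509"
	"encoding/pem"
	"errors"
	"fmt"
)

// newP256PKCS8PEM generates a fresh P-256 key and encodes it as PKCS8 PEM.
func newP256PKCS8PEM() ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, fmt.Errorf("generate key: %w", err)
	}
	der, err := x509.MarshalPKCS8PrivateKey(key)
	if err != nil {
		return nil, fmt.Errorf("marshal PKCS8: %w", err)
	}
	return pem.EncodeToMemory(&pem.Block{Type: "PRIVATE KEY", Bytes: der}), nil
}

// parseP256PKCS8PEM decodes the PEM block, parses PKCS8, and asserts EC.
func parseP256PKCS8PEM(pemBytes []byte) (*ecdsa.PrivateKey, error) {
	block, _ := pem.Decode(pemBytes)
	if block == nil {
		return nil, errors.New("no PEM block found in private key")
	}
	raw, err := x509.ParsePKCS8PrivateKey(block.Bytes)
	if err != nil {
		return nil, fmt.Errorf("parse PKCS8 private key: %w", err)
	}
	key, ok := raw.(*ecdsa.PrivateKey)
	if !ok {
		return nil, fmt.Errorf("expected EC private key, got %T", raw)
	}
	return key, nil
}

func main() {
	pemBytes, err := newP256PKCS8PEM()
	if err != nil {
		panic(err)
	}
	key, err := parseP256PKCS8PEM(pemBytes)
	if err != nil {
		panic(err)
	}
	fmt.Println(key.Curve.Params().Name)
}
```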
+57
@@ -0,0 +1,57 @@
package services
import (
"strings"
"testing"
)
func TestSchedulerInstallCommand_ContainsNewAssetName(t *testing.T) {
cmd := SchedulerInstallCommand("nightly")
for _, want := range []string{
"scheduler-linux-${ARCH}.tar.gz",
"/usr/local/bin/scheduler",
"nightly",
} {
if !strings.Contains(cmd, want) {
t.Errorf("SchedulerInstallCommand missing %q", want)
}
}
if strings.Contains(cmd, "coolify-scheduler") {
t.Error("SchedulerInstallCommand still contains old name 'coolify-scheduler'")
}
}
func TestSchedulerInstallCommand_VersionTagEmbedded(t *testing.T) {
cmd := SchedulerInstallCommand("v1.2.3")
if !strings.Contains(cmd, "v1.2.3") {
t.Error("SchedulerInstallCommand missing version tag in URL and version file write")
}
}
func TestSchedulerServiceUnit_ExecStartPath(t *testing.T) {
unit := SchedulerServiceUnit("100.64.0.1:6443", SchedulerJWTPubPath)
if !strings.Contains(unit, "ExecStart=/usr/local/bin/scheduler") {
t.Error("SchedulerServiceUnit ExecStart does not point to /usr/local/bin/scheduler")
}
if strings.Contains(unit, "coolify-scheduler") {
t.Error("SchedulerServiceUnit still contains old name 'coolify-scheduler'")
}
if strings.Contains(unit, "BUILDER_GRPC_BIND") {
t.Error("SchedulerServiceUnit still emits SCHEDULER_BUILDER_GRPC_BIND; builder port was removed")
}
if strings.Contains(unit, "SCHEDULER_REDIS_URL") || strings.Contains(unit, "redis") {
t.Error("SchedulerServiceUnit still references Redis; UDS migration should have dropped it")
}
for _, want := range []string{
"SCHEDULER_GRPC_BIND=100.64.0.1:6443",
"SCHEDULER_UNIX_SOCKET_PATH=" + SchedulerUnixSocketPath,
"RuntimeDirectory=coolify",
SchedulerJWTPubPath,
} {
if !strings.Contains(unit, want) {
t.Errorf("SchedulerServiceUnit missing %q", want)
}
}
}
+213
@@ -0,0 +1,213 @@
// Package ssh provides a thin SSH client and parallel fanout helper
// for the coolify init mesh-bootstrap commands.
package ssh
import (
"bytes"
"context"
"fmt"
"net"
"os"
"strconv"
"time"
gossh "golang.org/x/crypto/ssh"
)
// Runner executes a shell command on a remote host and returns its
// stdout, stderr, and exit error. It is an interface so tests can
// inject a fake implementation without opening real SSH connections.
type Runner interface {
Run(ctx context.Context, host, user string, port int, cmd string) (stdout, stderr string, err error)
}
// FileUploader streams a local file to a remote path via a single SSH
// session. Kept separate from Runner so existing Runner mocks stay valid.
type FileUploader interface {
UploadFile(ctx context.Context, host, user string, port int, localPath, remotePath string, mode os.FileMode) error
}
// Client implements Runner and FileUploader using the golang.org/x/crypto/ssh
// library. Keys are PEM files; encrypted keys need the passphrase passed to
// NewClient.
// NOTE: host-key verification is intentionally disabled in v1 (alpha).
// This is acceptable for a bootstrap tool in controlled environments
// and should be improved in a future release.
type Client struct {
signer gossh.Signer
timeout time.Duration
}
// NewClient loads the private key at keyPath and returns a Client ready to
// SSH into hosts. If passphrase is non-nil it is used to decrypt the key;
// pass nil for unencrypted keys.
func NewClient(keyPath string, passphrase []byte, timeout time.Duration) (*Client, error) {
raw, err := os.ReadFile(keyPath)
if err != nil {
return nil, fmt.Errorf("read SSH key %q: %w", keyPath, err)
}
var signer gossh.Signer
if len(passphrase) > 0 {
signer, err = gossh.ParsePrivateKeyWithPassphrase(raw, passphrase)
} else {
signer, err = gossh.ParsePrivateKey(raw)
}
if err != nil {
// Give the user an actionable hint when the key is passphrase-protected.
if isPassphraseError(err) {
return nil, fmt.Errorf("SSH key %q is passphrase-protected — use --ssh-passphrase-prompt or set COOLIFY_SSH_PASSPHRASE: %w", keyPath, err)
}
return nil, fmt.Errorf("parse SSH key %q: %w", keyPath, err)
}
return &Client{
signer: signer,
timeout: timeout,
}, nil
}
// isPassphraseError returns true when err is the "passphrase protected" error
// returned by golang.org/x/crypto/ssh.
func isPassphraseError(err error) bool {
if err == nil {
return false
}
msg := err.Error()
return contains(msg, "passphrase") || contains(msg, "encrypted")
}
// contains reports whether sub is a non-empty substring of s.
func contains(s, sub string) bool {
if len(sub) == 0 {
return false
}
for i := 0; i+len(sub) <= len(s); i++ {
if s[i:i+len(sub)] == sub {
return true
}
}
return false
}
// dial opens an SSH connection to host:port as user and returns it. Caller
// owns Close(). Shared by Run and UploadFile so host-key/timeout behaviour
// stays identical across commands and file transfers.
func (c *Client) dial(ctx context.Context, host, user string, port int) (*gossh.Client, error) {
cfg := &gossh.ClientConfig{
User: user,
Auth: []gossh.AuthMethod{gossh.PublicKeys(c.signer)},
HostKeyCallback: gossh.InsecureIgnoreHostKey(), //nolint:gosec // alpha v1, documented limitation
Timeout: c.timeout,
}
addr := net.JoinHostPort(host, strconv.Itoa(port))
dialer := &net.Dialer{Timeout: c.timeout}
netConn, err := dialer.DialContext(ctx, "tcp", addr)
if err != nil {
return nil, fmt.Errorf("dial %s: %w", addr, err)
}
sshConn, chans, reqs, err := gossh.NewClientConn(netConn, addr, cfg)
if err != nil {
_ = netConn.Close()
return nil, fmt.Errorf("SSH handshake %s: %w", addr, err)
}
return gossh.NewClient(sshConn, chans, reqs), nil
}
// Run connects to host:port over SSH as user, executes cmd, and returns
// its stdout and stderr separately, plus any error. The connection is
// closed when the command finishes or ctx is cancelled.
func (c *Client) Run(ctx context.Context, host, user string, port int, cmd string) (string, string, error) {
conn, err := c.dial(ctx, host, user, port)
if err != nil {
return "", "", err
}
defer conn.Close()
addr := net.JoinHostPort(host, strconv.Itoa(port))
sess, err := conn.NewSession()
if err != nil {
return "", "", fmt.Errorf("SSH new session on %s: %w", addr, err)
}
defer sess.Close()
var stdout, stderr bytes.Buffer
sess.Stdout = &stdout
sess.Stderr = &stderr
if err := sess.Start(cmd); err != nil {
return "", "", fmt.Errorf("SSH start on %s: %w", addr, err)
}
waitDone := make(chan error, 1)
go func() { waitDone <- sess.Wait() }()
select {
case <-ctx.Done():
// Best-effort signal; ignore error since we're already cancelled.
_ = sess.Signal(gossh.SIGTERM)
return stdout.String(), stderr.String(), ctx.Err()
case runErr := <-waitDone:
return stdout.String(), stderr.String(), runErr
}
}
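// uploadShellCmd returns the remote command that atomically writes stdin
// to remotePath with the given mode. Factored out as a function so it can be
// unit-tested without opening an SSH connection. remotePath is quoted with
// Go's %q, which is not a full shell escape ($ and backticks still expand
// inside double quotes), so callers must pass trusted paths only.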
func uploadShellCmd(remotePath string, mode os.FileMode) string {
return fmt.Sprintf(
`set -e; umask 077; mkdir -p "$(dirname %q)"; `+
`cat > %q.tmp.$$ && chmod %o %q.tmp.$$ && mv -f %q.tmp.$$ %q`,
remotePath, remotePath, mode.Perm(), remotePath, remotePath, remotePath)
}
// UploadFile streams localPath to remotePath on host via a single SSH
// session. The write is atomic: data lands in <remote>.tmp.$PID first and
// is renamed on success.
func (c *Client) UploadFile(ctx context.Context, host, user string, port int, localPath, remotePath string, mode os.FileMode) error {
f, err := os.Open(localPath)
if err != nil {
return fmt.Errorf("open %s: %w", localPath, err)
}
defer f.Close()
conn, err := c.dial(ctx, host, user, port)
if err != nil {
return err
}
defer conn.Close()
addr := net.JoinHostPort(host, strconv.Itoa(port))
sess, err := conn.NewSession()
if err != nil {
return fmt.Errorf("SSH new session on %s: %w", addr, err)
}
defer sess.Close()
var stderr bytes.Buffer
sess.Stdin = f
sess.Stderr = &stderr
if err := sess.Start(uploadShellCmd(remotePath, mode)); err != nil {
return fmt.Errorf("SSH upload start on %s: %w", addr, err)
}
waitDone := make(chan error, 1)
go func() { waitDone <- sess.Wait() }()
select {
case <-ctx.Done():
_ = sess.Signal(gossh.SIGTERM)
return ctx.Err()
case runErr := <-waitDone:
if runErr != nil {
return fmt.Errorf("upload %s -> %s: %w (stderr: %s)",
localPath, remotePath, runErr, bytes.TrimSpace(stderr.Bytes()))
}
return nil
}
}
+30
@@ -0,0 +1,30 @@
package ssh
import (
"strings"
"testing"
)
func TestUploadShellCmd_AtomicWrite(t *testing.T) {
got := uploadShellCmd("/usr/local/bin/coold", 0o755)
for _, want := range []string{
`mkdir -p "$(dirname "/usr/local/bin/coold")"`,
`cat > "/usr/local/bin/coold".tmp.$$`,
`chmod 755 "/usr/local/bin/coold".tmp.$$`,
`mv -f "/usr/local/bin/coold".tmp.$$ "/usr/local/bin/coold"`,
`umask 077`,
`set -e`,
} {
if !strings.Contains(got, want) {
t.Errorf("upload cmd missing %q:\nGOT: %s", want, got)
}
}
}
func TestUploadShellCmd_ModeIsOctal(t *testing.T) {
got := uploadShellCmd("/x", 0o644)
if !strings.Contains(got, "chmod 644") {
t.Errorf("expected octal mode 644, got: %s", got)
}
}
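The tests above pin the exact `%q` output of uploadShellCmd. `%q` yields a double-quoted Go string, so a shell would still expand `$` and backticks inside it. If untrusted paths ever had to be supported, POSIX single-quote escaping is one alternative. The sketch below shows that technique, not what the diff does, and `shellQuote` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strings"
)

// shellQuote wraps s in single quotes for a POSIX shell. An embedded single
// quote is written as '\'' (close quote, escaped quote, reopen quote).
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

func main() {
	path := "/tmp/$HOME dir/it's"
	fmt.Printf("%q\n", path)      // Go quoting: a shell would still expand $HOME
	fmt.Println(shellQuote(path)) // single-quoted: the shell expands nothing
}
```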
+55
@@ -0,0 +1,55 @@
package ssh
import (
"context"
"sync"
)
// ServerResult holds the return value (or error) from running a function
// against a single server.
type ServerResult[T any] struct {
Host string
Result T
Err error
}
// ForEachServer runs fn concurrently on every host, honouring the
// concurrency limit. It always returns a result for every host (even on
// error) and never returns early — callers inspect each ServerResult.Err.
func ForEachServer[T any](
ctx context.Context,
hosts []string,
concurrency int,
fn func(ctx context.Context, host string) (T, error),
) []ServerResult[T] {
if concurrency <= 0 {
concurrency = 1
}
results := make([]ServerResult[T], len(hosts))
sem := make(chan struct{}, concurrency)
var wg sync.WaitGroup
for i, host := range hosts {
wg.Add(1)
go func(idx int, h string) {
defer wg.Done()
// Acquire semaphore slot.
select {
case sem <- struct{}{}:
defer func() { <-sem }()
case <-ctx.Done():
var zero T
results[idx] = ServerResult[T]{Host: h, Result: zero, Err: ctx.Err()}
return
}
res, err := fn(ctx, h)
results[idx] = ServerResult[T]{Host: h, Result: res, Err: err}
}(i, host)
}
wg.Wait()
return results
}
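One way to convince yourself that a channel-based semaphore really caps parallelism is to count in-flight calls. The sketch below uses `boundedEach`, a pared-down and hypothetical stand-in for ForEachServer (errors only, no generic result), and records the peak number of concurrent invocations.

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// boundedEach runs fn for every item with at most limit calls in flight.
// Hypothetical stand-in for ForEachServer, reduced to error results.
func boundedEach(ctx context.Context, items []string, limit int,
	fn func(context.Context, string) error) []error {
	if limit <= 0 {
		limit = 1
	}
	errs := make([]error, len(items))
	sem := make(chan struct{}, limit)
	var wg sync.WaitGroup
	for i, it := range items {
		wg.Add(1)
		go func(idx int, item string) {
			defer wg.Done()
			select {
			case sem <- struct{}{}: // acquire a slot
				defer func() { <-sem }()
			case <-ctx.Done():
				errs[idx] = ctx.Err()
				return
			}
			errs[idx] = fn(ctx, item)
		}(i, it)
	}
	wg.Wait()
	return errs
}

// peakConcurrency runs n short tasks and returns the highest number that
// were observed running at the same time.
func peakConcurrency(n, limit int) int64 {
	var inFlight, peak atomic.Int64
	items := make([]string, n)
	boundedEach(context.Background(), items, limit, func(context.Context, string) error {
		cur := inFlight.Add(1)
		for {
			old := peak.Load()
			if cur <= old || peak.CompareAndSwap(old, cur) {
				break
			}
		}
		time.Sleep(2 * time.Millisecond)
		inFlight.Add(-1)
		return nil
	})
	return peak.Load()
}

func main() {
	fmt.Println(peakConcurrency(20, 3) <= 3)
}
```

Each goroutine writes only its own slot of the result slice, which is why no mutex is needed around `errs`.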
+1 -1
@@ -15,7 +15,7 @@ import (
// Version variables injected by GoReleaser at build time via ldflags
var (
-version = "v1.6.0"
+version = "v1.6.2"
)
// GitHubAPIURL is the URL for fetching CLI version tags (exported for testing)
+804
@@ -0,0 +1,804 @@
package wireguard
import (
"context"
"fmt"
"net"
"os"
"strings"
"github.com/coollabsio/coolify-cli/internal/services"
"github.com/coollabsio/coolify-cli/internal/ssh"
)
// ActionResult pairs a PlannedAction with its execution outcome.
type ActionResult struct {
Action PlannedAction
Err error
}
// VerifyResult holds the post-apply verification for one server.
type VerifyResult struct {
Host string
WireGuardIP net.IP
PeerCount int
Active bool
Err error
}
const aptInstallCmd = `DEBIAN_FRONTEND=noninteractive apt-get update -qq 2>/dev/null && ` +
`DEBIAN_FRONTEND=noninteractive apt-get install -y ` +
`-o Dpkg::Options::="--force-confold" ` +
`wireguard wireguard-tools 2>&1`
const podmanInstallCmd = `DEBIAN_FRONTEND=noninteractive apt-get update -qq 2>/dev/null && ` +
`DEBIAN_FRONTEND=noninteractive apt-get install -y ` +
`-o Dpkg::Options::="--force-confold" ` +
`podman 2>&1`
// enablePodmanSocketCmd ensures /run/podman/podman.sock exists via systemd
// socket activation. The socket is NEVER exposed on TCP — it stays a Unix
// socket on the host so the per-host coold agent can bind-mount it and
// proxy a curated REST API over wg0. See CONTROL_PLANE.md §2 + §12.
const enablePodmanSocketCmd = `systemctl enable --now podman.socket 2>&1`
const enableIPForwardCmd = `sysctl -w net.ipv4.ip_forward=1 && ` +
`mkdir -p /etc/sysctl.d && ` +
`echo 'net.ipv4.ip_forward=1' > /etc/sysctl.d/99-coolify-mesh.conf`
// podmanNetCreateCmd creates a per-namespace Podman bridge network. Idempotent:
// skips if the network already exists. The bridge gateway is MachineIP(subnet)
// (the .1 of the subnet).
//
// --disable-dns prevents netavark from starting aardvark-dns on the bridge
// gateway IP:53 — coold owns that socket for cluster-wide service discovery
// (see CONTROL_PLANE.md §5). Labels mark the network as ours + carry its
// namespace so `podman network inspect` drift checks can assert it.
func podmanNetCreateCmd(name, namespace string, subnet *net.IPNet, gateway net.IP) string {
return fmt.Sprintf(
`podman network exists %s 2>/dev/null && echo "network exists, skipping" || `+
`podman network create --driver bridge --disable-dns `+
`--label io.coolify.managed=true --label io.coolify.namespace=%s `+
`--subnet=%s --gateway=%s %s`,
name, namespace, subnet, gateway, name)
}
// podmanNetRecreateCmd drops and recreates a per-namespace Podman bridge
// network to clear drift (dns_enabled=true, subnet mismatch, missing label).
// Uses `rm -f`, which also stops and removes any containers still attached.
func podmanNetRecreateCmd(name, namespace string, subnet *net.IPNet, gateway net.IP) string {
return fmt.Sprintf(
`podman network rm -f %s 2>&1 && `+
`podman network create --driver bridge --disable-dns `+
`--label io.coolify.managed=true --label io.coolify.namespace=%s `+
`--subnet=%s --gateway=%s %s`,
name, namespace, subnet, gateway, name)
}
// runStep executes a single shell command on a remote host, appends an
// ActionResult to out, and returns an error if the command failed.
func runStep(
ctx context.Context,
runner ssh.Runner,
host, user string,
port int,
out *[]ActionResult,
atype ActionType,
namespace, cmd, errFmt string,
) error {
stdout, stderr, err := runner.Run(ctx, host, user, port, cmd)
detail := ""
if err != nil {
detail = firstLine(stderr)
if detail == "" {
detail = firstLine(stdout)
}
if detail == "" {
detail = err.Error()
}
}
*out = append(*out, ActionResult{
Action: PlannedAction{Host: host, Namespace: namespace, Type: atype, Detail: detail},
Err: err,
})
if err != nil {
return fmt.Errorf(errFmt+": %w", err)
}
return nil
}
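// ApplyMesh executes the mesh convergence in up to five phases:
//
// - Phase 1 (per-server, parallel): install WG + Podman, generate keypair,
// enable podman socket + IP forwarding.
// - Re-probe to collect fresh public keys.
// - Phase 2 (per-server, parallel): write WG config, enable/reload service,
// create per-namespace Podman networks, install firewall service.
// - Phase 3 (per-server, parallel, optional): download + enable corrosion/coold.
// - Phase 4 (central host only, optional): install scheduler, generate JWT keypair.
// - Phase 5 (per-server, parallel, optional): mint host JWT, rewire coold to
// the scheduler.
//
// A phase 1 failure aborts immediately; each later phase is skipped once an
// earlier phase has failed.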
func ApplyMesh(
ctx context.Context,
runner ssh.Runner,
user string,
port int,
desired *DesiredMesh,
current MeshState,
concurrency int,
) ([]ActionResult, error) {
var results []ActionResult
p1 := ssh.ForEachServer(ctx, desired.Hosts, concurrency,
func(ctx context.Context, host string) ([]ActionResult, error) {
return phase1Server(ctx, runner, host, user, port, desired, current)
})
phase1Failed := false
for _, r := range p1 {
results = append(results, r.Result...)
if r.Err != nil {
phase1Failed = true
}
}
if phase1Failed {
return results, fmt.Errorf("phase 1 (install/keygen) failed on one or more servers; aborting")
}
fresh, err := Reconstruct(ctx, runner, desired.Hosts, user, port,
desired.Interface, desired.Namespaces, concurrency)
if err != nil {
return results, fmt.Errorf("re-probe after phase 1: %w", err)
}
mgmtAssignments, _, err := AllocateMgmtIPs(desired.MgmtPool, fresh.AssignedMgmtIPs(), desired.Hosts)
if err != nil {
return results, fmt.Errorf("mgmt IP allocation: %w", err)
}
containerAssignments, _, err := AllocateNamespaced(desired.ContainerPool, desired.ContainerPrefix,
fresh.AssignedContainerSubnets(), desired.Namespaces, desired.Hosts)
if err != nil {
return results, fmt.Errorf("container subnet allocation: %w", err)
}
p2 := ssh.ForEachServer(ctx, desired.Hosts, concurrency,
func(ctx context.Context, host string) ([]ActionResult, error) {
return phase2Server(ctx, runner, host, user, port, desired, fresh, mgmtAssignments, containerAssignments)
})
for _, r := range p2 {
results = append(results, r.Result...)
if r.Err != nil {
err = fmt.Errorf("phase 2 failed on one or more servers")
}
}
if desired.InstallCoold && err == nil {
p3 := ssh.ForEachServer(ctx, desired.Hosts, concurrency,
func(ctx context.Context, host string) ([]ActionResult, error) {
return phase3Server(ctx, runner, host, user, port,
desired, fresh, mgmtAssignments, containerAssignments)
})
for _, r := range p3 {
results = append(results, r.Result...)
if r.Err != nil {
err = fmt.Errorf("phase 3 failed on one or more servers")
}
}
}
// Phase 4: central-only — install scheduler, generate JWT keypair.
if desired.CentralHost != "" && err == nil {
p4 := ssh.ForEachServer(ctx, []string{desired.CentralHost}, 1,
func(ctx context.Context, host string) ([]ActionResult, error) {
return phase4Central(ctx, runner, host, user, port, desired, mgmtAssignments)
})
for _, r := range p4 {
results = append(results, r.Result...)
if r.Err != nil {
err = fmt.Errorf("phase 4 (central scheduler setup) failed: %w", r.Err)
}
}
}
// Phase 5: per host (central included), mint JWT (with caps), update coold unit
// with scheduler env (and builder env when EnableBuilder).
if desired.CentralHost != "" && err == nil {
privKeyPEM, _, keyErr := runner.Run(ctx, desired.CentralHost, user, port,
"cat "+services.SchedulerJWTPrivPath)
if keyErr != nil {
err = fmt.Errorf("read jwt.priv from central %s: %w", desired.CentralHost, keyErr)
} else {
centralMgmtIP := mgmtAssignments[desired.CentralHost]
schedulerURL := fmt.Sprintf("http://%s:%d", centralMgmtIP, services.SchedulerGRPCPort)
// Include central itself: in single-server topology central *is* the coold
// target, and in fleet mode central's own coold still benefits from scheduler
// wiring (uniform dispatch path, no standalone-API exception).
p5 := ssh.ForEachServer(ctx, desired.Hosts, concurrency,
func(ctx context.Context, host string) ([]ActionResult, error) {
return phase5PerHost(ctx, runner, host, user, port,
desired, fresh, mgmtAssignments, containerAssignments,
[]byte(privKeyPEM), schedulerURL)
})
for _, r := range p5 {
results = append(results, r.Result...)
if r.Err != nil {
err = fmt.Errorf("phase 5 failed on one or more servers")
}
}
}
}
return results, err
}
// phase1Server installs WireGuard, generates a keypair, and (if requested)
// installs Podman, enables its socket, and enables IP forwarding.
func phase1Server(
ctx context.Context,
runner ssh.Runner,
host, user string,
port int,
desired *DesiredMesh,
current MeshState,
) ([]ActionResult, error) {
state, ok := current.Servers[host]
if !ok {
state = &ServerState{Host: host}
}
var out []ActionResult
if !state.Installed {
if err := runStep(ctx, runner, host, user, port, &out,
ActionInstallWG, "", aptInstallCmd,
fmt.Sprintf("install WireGuard on %s", host)); err != nil {
return out, err
}
}
if !state.KeysExist {
genCmd := `mkdir -p /etc/wireguard && ` +
`wg genkey | tee /etc/wireguard/privatekey | wg pubkey | tee /etc/wireguard/publickey && ` +
`chmod 600 /etc/wireguard/privatekey`
if err := runStep(ctx, runner, host, user, port, &out,
ActionGenKeyPair, "", genCmd,
fmt.Sprintf("generate keypair on %s", host)); err != nil {
return out, err
}
}
if desired.InstallPodman {
if !state.PodmanInstalled {
if err := runStep(ctx, runner, host, user, port, &out,
ActionInstallPodman, "", podmanInstallCmd,
fmt.Sprintf("install Podman on %s", host)); err != nil {
return out, err
}
}
if !state.PodmanSocketActive {
if err := runStep(ctx, runner, host, user, port, &out,
ActionEnablePodmanSocket, "", enablePodmanSocketCmd,
fmt.Sprintf("enable podman.socket on %s", host)); err != nil {
return out, err
}
}
if !state.IPForwardEnabled {
if err := runStep(ctx, runner, host, user, port, &out,
ActionEnableIPForward, "", enableIPForwardCmd,
fmt.Sprintf("enable IP forwarding on %s", host)); err != nil {
return out, err
}
}
}
return out, nil
}
// phase2Server writes the WireGuard config, enables/reloads the service,
// creates per-namespace Podman bridges, and installs the firewall service.
func phase2Server(
ctx context.Context,
runner ssh.Runner,
host, user string,
port int,
desired *DesiredMesh,
fresh MeshState,
mgmtAssignments map[string]net.IP,
containerAssignments map[string]map[string]*net.IPNet,
) ([]ActionResult, error) {
var out []ActionResult
mgmtIP := mgmtAssignments[host]
nsSorted := desired.SortedNamespaces()
// Build peer list (everyone except self, skip hosts with no pubkey).
// Each peer's AllowedIPs covers every namespace subnet that peer owns.
var peers []PeerConfig
for _, peer := range desired.Hosts {
if peer == host {
continue
}
ps, ok := fresh.Servers[peer]
if !ok || ps.PublicKey == "" {
continue
}
var subnets []*net.IPNet
for _, ns := range nsSorted {
if sn := containerAssignments[ns][peer]; sn != nil {
subnets = append(subnets, sn)
}
}
peers = append(peers, PeerConfig{
Endpoint: peer,
PublicKey: ps.PublicKey,
MgmtIP: mgmtAssignments[peer],
ContainerSubnets: subnets,
})
}
// Write WG config.
configCmd := WriteConfigCommand(desired.Interface, mgmtIP, desired.ListenPort, peers)
if err := runStep(ctx, runner, host, user, port, &out,
ActionWriteConfig, "", configCmd,
fmt.Sprintf("write config on %s", host)); err != nil {
return out, err
}
// Enable or reload wg-quick.
state := fresh.Servers[host]
var serviceCmd string
actionType := ActionEnableService
if state != nil && state.Active {
serviceCmd = fmt.Sprintf(`systemctl restart wg-quick@%s 2>&1 || wg syncconf %s <(wg-quick strip %s) 2>&1`,
desired.Interface, desired.Interface, desired.Interface)
actionType = ActionReloadService
} else {
serviceCmd = fmt.Sprintf(`systemctl enable --now wg-quick@%s 2>&1`, desired.Interface)
}
if err := runStep(ctx, runner, host, user, port, &out,
actionType, "", serviceCmd,
fmt.Sprintf("enable/reload service on %s", host)); err != nil {
return out, err
}
if desired.InstallPodman {
freshState := fresh.Servers[host]
// Per-namespace podman network reconcile.
for _, ns := range nsSorted {
contSubnet := containerAssignments[ns][host]
if contSubnet == nil {
continue
}
netName := PodmanNetworkFor(ns)
gw := MachineIP(contSubnet)
var nss *NamespaceServerState
if freshState != nil {
nss = freshState.Namespaces[ns]
}
if nss == nil || !nss.NetworkExists {
netCmd := podmanNetCreateCmd(netName, ns, contSubnet, gw)
if err := runStep(ctx, runner, host, user, port, &out,
ActionCreatePodmanNet, ns, netCmd,
fmt.Sprintf("create Podman network %s on %s", netName, host)); err != nil {
return out, err
}
continue
}
subnetDrift := nss.ContainerSubnet != nil && nss.ContainerSubnet.String() != contSubnet.String()
if nss.DNSEnabled || subnetDrift || nss.Label != ns {
recreateCmd := podmanNetRecreateCmd(netName, ns, contSubnet, gw)
if err := runStep(ctx, runner, host, user, port, &out,
ActionRecreatePodmanNet, ns, recreateCmd,
fmt.Sprintf("recreate Podman network %s on %s", netName, host)); err != nil {
return out, err
}
}
}
// Firewall service: union of namespace subnets; reinstall when missing,
// default-deny flipped, or unit text drifted (e.g. namespace added).
var subnets []*net.IPNet
for _, ns := range nsSorted {
if sn := containerAssignments[ns][host]; sn != nil {
subnets = append(subnets, sn)
}
}
expectedUnit := FirewallServiceUnit(desired.Interface, desired.SortedNamespaces(), subnets, desired.DefaultDenyContainers)
expectedUnitHash := sha256Hex([]byte(expectedUnit))
unitDrift := freshState != nil && freshState.FirewallUnitSha256 != expectedUnitHash
if freshState == nil || !freshState.FirewallActive ||
freshState.DefaultDenyActive != desired.DefaultDenyContainers ||
unitDrift {
fwCmd := InstallFirewallCommand(desired.Interface, desired.SortedNamespaces(), subnets, desired.DefaultDenyContainers)
if err := runStep(ctx, runner, host, user, port, &out,
ActionInstallFirewall, "", fwCmd,
fmt.Sprintf("install firewall service on %s", host)); err != nil {
return out, err
}
}
}
return out, nil
}
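The firewall drift check compares a SHA-256 of the expected unit text with the hash probed from the host. `sha256Hex` is called above but its body is not part of this excerpt, so the version below is an assumption about what it does (lowercase hex of the digest). `unitDrifted` is a hypothetical helper that isolates the comparison.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sha256Hex returns the lowercase hex SHA-256 digest of b.
// Assumed implementation; the real helper is not shown in this excerpt.
func sha256Hex(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// unitDrifted reports whether the hash observed on the host differs from the
// hash of the unit text we would write now.
func unitDrifted(expectedUnit, observedHash string) bool {
	return sha256Hex([]byte(expectedUnit)) != observedHash
}

func main() {
	unit := "[Unit]\nDescription=demo firewall\n"
	onHost := sha256Hex([]byte(unit))

	fmt.Println(unitDrifted(unit, onHost))
	fmt.Println(unitDrifted(unit+"# namespace added\n", onHost))
}
```

Hashing the rendered unit means any change to its inputs (interface, namespaces, subnets, default-deny flag) shows up as drift without comparing each field.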
// Verify SSHes into each host and checks that WireGuard is active with the
// expected number of peers.
func Verify(
ctx context.Context,
runner ssh.Runner,
hosts []string,
user string,
port int,
iface string,
concurrency int,
) []VerifyResult {
results := ssh.ForEachServer(ctx, hosts, concurrency,
func(ctx context.Context, host string) (VerifyResult, error) {
return verifyHost(ctx, runner, host, user, port, iface, len(hosts)-1)
})
out := make([]VerifyResult, len(results))
for i, r := range results {
if r.Err != nil {
out[i] = VerifyResult{Host: r.Host, Err: r.Err}
} else {
out[i] = r.Result
}
}
return out
}
func verifyHost(
ctx context.Context,
runner ssh.Runner,
host, user string,
port int,
iface string,
expectedPeers int,
) (VerifyResult, error) {
result := VerifyResult{Host: host}
stdout, _, err := runner.Run(ctx, host, user, port,
fmt.Sprintf(`wg show %s dump 2>/dev/null || true`, iface))
if err != nil {
return result, fmt.Errorf("wg show on %s: %w", host, err)
}
lines := nonEmptyLines(stdout)
if len(lines) == 0 {
result.Err = fmt.Errorf("interface %s not active", iface)
return result, nil
}
result.Active = true
result.PeerCount = len(lines) - 1
stdout2, _, _ := runner.Run(ctx, host, user, port,
fmt.Sprintf(`grep '^Address' /etc/wireguard/%s.conf 2>/dev/null || true`, iface))
if addr := strings.TrimSpace(strings.TrimPrefix(strings.TrimSpace(stdout2), "Address =")); addr != "" {
ip, _, _ := net.ParseCIDR(strings.TrimSpace(addr))
result.WireGuardIP = ip
}
if result.PeerCount < expectedPeers {
result.Err = fmt.Errorf("expected %d peer(s), got %d", expectedPeers, result.PeerCount)
}
return result, nil
}
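verifyHost does two small parsing jobs inline: it counts peers in the `wg show <iface> dump` output (the first line describes the interface, each following line one peer) and it extracts the IP from the `Address = ...` line of the config. The sketch below lifts both into pure functions that can be tested without SSH. The names `peerCount` and `parseAddressLine` are hypothetical.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// peerCount returns the number of peers in `wg show <iface> dump` output and
// whether the interface is active (any non-empty line at all).
func peerCount(dump string) (int, bool) {
	n := 0
	for _, l := range strings.Split(dump, "\n") {
		if strings.TrimSpace(l) != "" {
			n++
		}
	}
	if n == 0 {
		return 0, false
	}
	return n - 1, true // the first line is the interface itself
}

// parseAddressLine extracts the IP from a line such as
// "Address = 100.64.0.2/32". It returns nil when nothing parses; like the
// original, it only recognises the "Address =" spelling.
func parseAddressLine(line string) net.IP {
	addr := strings.TrimSpace(strings.TrimPrefix(strings.TrimSpace(line), "Address ="))
	if addr == "" {
		return nil
	}
	ip, _, err := net.ParseCIDR(addr)
	if err != nil {
		return nil
	}
	return ip
}

func main() {
	dump := "privkey\tpubkey\t51820\toff\npeerA\t(none)\npeerB\t(none)\n"
	n, active := peerCount(dump)
	fmt.Println(n, active)
	fmt.Println(parseAddressLine("Address = 100.64.0.2/32\n"))
}
```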
func firstLine(s string) string {
s = strings.TrimSpace(s)
if i := strings.IndexByte(s, '\n'); i >= 0 {
return s[:i]
}
return s
}
func nonEmptyLines(s string) []string {
var out []string
for _, l := range strings.Split(s, "\n") {
if strings.TrimSpace(l) != "" {
out = append(out, l)
}
}
return out
}
// heredocWrite emits a shell command that atomically writes body to remotePath
// via a single-quoted heredoc. Body is trusted (generated by us), must end with
// a newline so the closing tag lands on its own line, and must not contain a
// line equal to tag.
// chmod runs before mv so the final rename is atomic with the intended mode.
func heredocWrite(remotePath, body, tag string, mode os.FileMode) string {
return fmt.Sprintf(`cat > %[1]s.tmp <<'%[3]s'
%[2]s%[3]s
chmod %[4]o %[1]s.tmp
mv %[1]s.tmp %[1]s`, remotePath, body, tag, mode.Perm())
}
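The heredoc format places the closing tag directly after the body, so the shell only sees the terminator if the body ends with a newline. The sketch below is a defensive variant that appends the newline when it is missing. It is a suggestion rather than the code in this diff, and `heredocWriteSafe` is a hypothetical name.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// heredocWriteSafe builds the same write-then-rename command as heredocWrite
// but guarantees the heredoc terminator starts on its own line.
func heredocWriteSafe(remotePath, body, tag string, mode os.FileMode) string {
	if !strings.HasSuffix(body, "\n") {
		body += "\n"
	}
	return fmt.Sprintf("cat > %[1]s.tmp <<'%[3]s'\n%[2]s%[3]s\nchmod %[4]o %[1]s.tmp\nmv %[1]s.tmp %[1]s",
		remotePath, body, tag, mode.Perm())
}

func main() {
	// The body deliberately lacks a trailing newline.
	fmt.Println(heredocWriteSafe("/etc/x.conf", "a=1", "EOF_TAG", 0o644))
}
```

The single-quoted tag (`<<'EOF_TAG'`) is what keeps the shell from expanding `$` or backticks inside the body.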
// phase4Central installs scheduler, generates the JWT keypair, and enables
// the scheduler systemd service on the central host.
func phase4Central(
ctx context.Context,
runner ssh.Runner,
host, user string,
port int,
desired *DesiredMesh,
mgmtAssignments map[string]net.IP,
) ([]ActionResult, error) {
var out []ActionResult
// 1. Install scheduler binary.
if err := runStep(ctx, runner, host, user, port, &out,
ActionInstallScheduler, "", services.SchedulerInstallCommand(desired.SchedulerVersion),
fmt.Sprintf("install scheduler on %s", host)); err != nil {
return out, err
}
// 2. Generate JWT keypair (idempotent).
if err := runStep(ctx, runner, host, user, port, &out,
ActionGenerateJWTKeypair, "", services.EnsureJWTKeypairCommand(),
fmt.Sprintf("generate JWT keypair on %s", host)); err != nil {
return out, err
}
// 3. Write scheduler unit + enable service.
mgmtIP := mgmtAssignments[host]
grpcBind := fmt.Sprintf("%s:%d", mgmtIP, services.SchedulerGRPCPort)
schedulerUnit := services.SchedulerServiceUnit(grpcBind, services.SchedulerJWTPubPath)
serviceCmd := heredocWrite("/etc/systemd/system/scheduler.service",
schedulerUnit, "COOLIFY_SCHEDULER_UNIT_EOF", 0o644) +
` && systemctl daemon-reload` +
` && systemctl enable scheduler` +
` && systemctl restart scheduler`
if err := runStep(ctx, runner, host, user, port, &out,
ActionInstallSchedulerService, "", serviceCmd,
fmt.Sprintf("install scheduler service on %s", host)); err != nil {
return out, err
}
return out, nil
}
// phase5PerHost mints a host JWT, writes it to the host, rewrites the coold
// unit with scheduler env vars, and restarts coold.
func phase5PerHost(
ctx context.Context,
runner ssh.Runner,
host, user string,
port int,
desired *DesiredMesh,
_ MeshState,
mgmtAssignments map[string]net.IP,
containerAssignments map[string]map[string]*net.IPNet,
privKeyPEM []byte,
schedulerURL string,
) ([]ActionResult, error) {
var out []ActionResult
mgmtIP := mgmtAssignments[host]
if mgmtIP == nil {
return out, fmt.Errorf("no mgmt IP for %s", host)
}
// Mint JWT with sub = wg0 mgmt IP (stable, scheduler-addressable identifier).
// caps claim must match what coold will advertise in its Hello frame —
// the scheduler cross-checks and rejects a stream whose Hello elevates over
// its JWT. Per-host toggle via desired.HasBuilderCap(host).
hostID := mgmtIP.String()
hasBuilder := desired.HasBuilderCap(host)
caps := []string{"coold"}
if hasBuilder {
caps = append(caps, "builder")
}
jwtToken, err := services.MintHostJWT(privKeyPEM, hostID, caps)
if err != nil {
return out, fmt.Errorf("mint JWT for %s: %w", host, err)
}
// 1. Write JWT to /etc/coolify/host-jwt (mode 0600, idempotent).
writeJWTCmd := fmt.Sprintf(
`mkdir -p /etc/coolify && printf '%%s' '%s' > %s.tmp && chmod 0600 %s.tmp && mv %s.tmp %s`,
jwtToken, services.HostJWTPath, services.HostJWTPath, services.HostJWTPath, services.HostJWTPath)
if err := runStep(ctx, runner, host, user, port, &out,
ActionWriteHostJWT, "", writeJWTCmd,
fmt.Sprintf("write host JWT on %s", host)); err != nil {
return out, err
}
// 2. Install builder binary + buildah/git (only on builder-capable hosts).
if hasBuilder {
if err := runStep(ctx, runner, host, user, port, &out,
ActionInstallBuilder, "",
services.BuilderInstallCommand(desired.CooldVersion),
fmt.Sprintf("install builder on %s", host)); err != nil {
return out, err
}
}
// 3. Rewrite coold unit with scheduler env vars (and builder env when
// enabled) + restart.
nsSorted := desired.SortedNamespaces()
nsConfigs := buildNamespaceConfigs(host, nsSorted, containerAssignments)
scheduler := &services.SchedulerConfig{
URL: schedulerURL,
JWTPath: services.HostJWTPath,
}
var builderCfg *services.BuilderConfig
if hasBuilder {
denyNets := []string{}
if desired.MgmtPool != nil {
denyNets = append(denyNets, desired.MgmtPool.String())
}
if desired.ContainerPool != nil {
denyNets = append(denyNets, desired.ContainerPool.String())
}
builderCfg = &services.BuilderConfig{
Capacity: desired.BuilderCapacity,
CPUQuota: desired.BuilderCPUQuota,
MemoryMax: desired.BuilderMemoryMax,
TimeoutSecs: desired.BuilderTimeoutSecs,
DenyNets: denyNets,
}
}
cooldUnit := services.CooldServiceUnitWithScheduler(mgmtIP, nsConfigs, scheduler, builderCfg)
updateCmd := heredocWrite("/etc/systemd/system/coold.service",
cooldUnit, "COOLIFY_COOLD_SCHEDULER_UNIT_EOF", 0o644) +
` && systemctl daemon-reload` +
` && systemctl restart coold`
if err := runStep(ctx, runner, host, user, port, &out,
ActionUpdateCooldSchedulerEnv, "", updateCmd,
fmt.Sprintf("update coold scheduler env on %s", host)); err != nil {
return out, err
}
return out, nil
}
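Review note: the JWT write step in phase5PerHost interpolates the token between single quotes. That is safe for a compact JWT, which consists only of base64url characters and dots. For values that could contain a single quote, the usual POSIX-shell quoting approach is sketched below. shellQuote is a hypothetical helper, not something this package is known to define, and the token and path are placeholders.

```go
package main

import (
	"fmt"
	"strings"
)

// shellQuote wraps s in single quotes for a POSIX shell. An embedded single
// quote is emitted as '\'' (close quote, escaped quote, reopen quote).
func shellQuote(s string) string {
	return "'" + strings.ReplaceAll(s, "'", `'\''`) + "'"
}

func main() {
	token := "header.payload.signature" // placeholder, not a real JWT
	path := "/tmp/example-jwt"          // placeholder path
	cmd := fmt.Sprintf("printf '%%s' %s > %s.tmp && chmod 0600 %s.tmp && mv %s.tmp %s",
		shellQuote(token), path, path, path, path)
	fmt.Println(cmd)
	fmt.Println(shellQuote("it's"))
}
```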
// phase3Server downloads corrosion + coold from GitHub releases, writes their
// configs/unit files, and enables both services.
// Guarded by desired.InstallCoold at the caller.
func phase3Server(
ctx context.Context,
runner ssh.Runner,
host, user string,
port int,
desired *DesiredMesh,
fresh MeshState,
mgmtAssignments map[string]net.IP,
containerAssignments map[string]map[string]*net.IPNet,
) ([]ActionResult, error) {
var out []ActionResult
mgmtIP := mgmtAssignments[host]
if mgmtIP == nil {
return out, fmt.Errorf("no mgmt IP allocated for %s", host)
}
nsSorted := desired.SortedNamespaces()
nsConfigs := buildNamespaceConfigs(host, nsSorted, containerAssignments)
if len(nsConfigs) == 0 {
return out, fmt.Errorf("no namespace subnets allocated for %s", host)
}
freshState := fresh.Servers[host]
// 1. Download + install corrosion if version drifted.
if binaryVersionDrift(desired.CorrosionVersion,
freshState != nil && freshState.CorrosionInstalled,
func() string {
if freshState != nil {
return freshState.CorrosionVersion
}
return ""
}()) {
if err := runStep(ctx, runner, host, user, port, &out,
ActionInstallCorrosion, "",
services.CorrosionInstallCommand(desired.CorrosionVersion),
fmt.Sprintf("install corrosion on %s", host)); err != nil {
return out, err
}
}
// 2. Download + install coold if version drifted.
if binaryVersionDrift(desired.CooldVersion,
freshState != nil && freshState.CooldInstalled,
func() string {
if freshState != nil {
return freshState.CooldVersion
}
return ""
}()) {
if err := runStep(ctx, runner, host, user, port, &out,
ActionInstallCoold, "",
services.CooldInstallCommand(desired.CooldVersion),
fmt.Sprintf("install coold on %s", host)); err != nil {
return out, err
}
}
// 3. Create dirs for corrosion state/config/admin socket.
if err := runStep(ctx, runner, host, user, port, &out,
ActionWriteCorrosionConfig, "",
`mkdir -p /etc/corrosion/schemas /var/lib/corrosion /var/run/corrosion`,
fmt.Sprintf("mkdir corrosion dirs on %s", host)); err != nil {
return out, err
}
// 4. Write config.toml.
peers := peerMgmtIPs(host, desired.Hosts, mgmtAssignments)
configBody := string(services.CorrosionConfigBytes(mgmtIP,
desired.CorrosionGossipPort, desired.CorrosionAPIPort, peers))
configCmd := heredocWrite("/etc/corrosion/config.toml", configBody, "COOLIFY_CORROSION_EOF", 0o600)
if err := runStep(ctx, runner, host, user, port, &out,
ActionWriteCorrosionConfig, "", configCmd,
fmt.Sprintf("write corrosion config on %s", host)); err != nil {
return out, err
}
// 5. Write schema. When schema content drifts (not first install) the
// CR-SQLite on-disk DB is incompatible — stop corrosion and wipe the DB
// so it re-bootstraps from the new schema. Coold repopulates within ~2s.
expectedSchemaSha := sha256Hex([]byte(services.CoolifySchemaSQL))
schemaDrift := freshState != nil &&
freshState.CorrosionSchemaSha256 != "" &&
freshState.CorrosionSchemaSha256 != expectedSchemaSha
schemaCmd := heredocWrite("/etc/corrosion/schemas/coolify.sql",
services.CoolifySchemaSQL, "COOLIFY_SCHEMA_EOF", 0o600)
if schemaDrift {
schemaCmd = `systemctl stop corrosion 2>/dev/null || true; ` +
`rm -f /var/lib/corrosion/corrosion.db ` +
`/var/lib/corrosion/corrosion.db-shm ` +
`/var/lib/corrosion/corrosion.db-wal && ` +
schemaCmd
}
if err := runStep(ctx, runner, host, user, port, &out,
ActionWriteCorrosionSchema, "", schemaCmd,
fmt.Sprintf("write corrosion schema on %s", host)); err != nil {
return out, err
}
// 6. Write corrosion unit + 7. Write coold unit + 8. daemon-reload + enable.
// Use enable + restart (not enable --now) so an already-active service still
// picks up new unit/config/schema without a separate reload step.
//
// Also ensure the coold API bearer token exists before the unit starts.
// The command is idempotent — reruns keep the existing token so clients
// don't get invalidated on every `apply`.
corrosionUnit := services.CorrosionServiceUnit(desired.Interface)
cooldUnit := services.CooldServiceUnit(mgmtIP, nsConfigs)
serviceCmd := services.EnsureCooldAPITokenCommand() +
" && " +
heredocWrite("/etc/systemd/system/corrosion.service",
corrosionUnit, "COOLIFY_CORROSION_UNIT_EOF", 0o644) +
" && " +
heredocWrite("/etc/systemd/system/coold.service",
cooldUnit, "COOLIFY_COOLD_UNIT_EOF", 0o644) +
` && systemctl daemon-reload` +
` && systemctl enable corrosion coold` +
` && systemctl restart corrosion` +
` && sleep 1` +
` && systemctl restart coold`
if err := runStep(ctx, runner, host, user, port, &out,
ActionInstallCorrosionService, "", serviceCmd,
fmt.Sprintf("install corrosion+coold services on %s", host)); err != nil {
return out, err
}
// Append a trailing coold install result so the rendered table matches
// the planned action list (install-coold-service).
out = append(out, ActionResult{
Action: PlannedAction{
Host: host,
Type: ActionInstallCooldService,
Detail: fmt.Sprintf("coold.service (mgmt=%s, namespaces=%d)", mgmtIP, len(nsConfigs)),
},
})
return out, nil
}
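Review note: step 5 of phase3Server treats the schema as drifted only when a digest was previously recorded and it differs from the digest of the desired schema, so a first install never wipes the database. A minimal sketch of that check, assuming sha256Hex returns a lowercase hex digest (its real implementation is outside this hunk). schemaDrifted and the sample schema text are hypothetical.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// sha256Hex is an assumed implementation: lowercase hex of the SHA-256 digest.
func sha256Hex(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// schemaDrifted reports whether the digest recorded on the host differs from
// the desired schema. An empty recorded digest means "first install" and is
// never treated as drift.
func schemaDrifted(recordedSha, desiredSchema string) bool {
	return recordedSha != "" && recordedSha != sha256Hex([]byte(desiredSchema))
}

func main() {
	schema := "CREATE TABLE example (id INTEGER PRIMARY KEY);\n" // sample only
	recorded := sha256Hex([]byte(schema))

	fmt.Println(schemaDrifted("", schema))                       // first install
	fmt.Println(schemaDrifted(recorded, schema))                 // unchanged
	fmt.Println(schemaDrifted(recorded, schema+"-- changed\n")) // drifted
}
```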
@@ -0,0 +1,96 @@
package wireguard
import (
"net"
"strings"
"testing"
"github.com/stretchr/testify/assert"
)
func TestFirstLine(t *testing.T) {
tests := []struct {
input string
want string
}{
{"", ""},
{"single line", "single line"},
{"first\nsecond\nthird", "first"},
{" spaces \nnext", "spaces "},
{"\nleading newline", "leading newline"},
}
for _, tt := range tests {
assert.Equal(t, tt.want, firstLine(tt.input), "input: %q", tt.input)
}
}
func TestPodmanNetCreateCmd_DisablesDNSAndLabels(t *testing.T) {
_, subnet, _ := net.ParseCIDR("10.210.0.0/24")
gw := net.ParseIP("10.210.0.1")
got := podmanNetCreateCmd("coolify-default-mesh", "default", subnet, gw)
// Must pass --disable-dns so aardvark-dns never binds bridge gateway :53
// (coold owns that socket).
assert.Contains(t, got, "--disable-dns", "create must include --disable-dns")
assert.Contains(t, got, "--subnet=10.210.0.0/24")
assert.Contains(t, got, "--gateway=10.210.0.1")
// Labels identify the network as ours + carry the namespace for drift checks.
assert.Contains(t, got, "--label io.coolify.managed=true")
assert.Contains(t, got, "--label io.coolify.namespace=default")
// Idempotency guard must still be present.
assert.Contains(t, got, "podman network exists coolify-default-mesh")
}
func TestPodmanNetRecreateCmd_DropsAndCreatesWithDisableDNS(t *testing.T) {
_, subnet, _ := net.ParseCIDR("10.220.0.0/24")
gw := net.ParseIP("10.220.0.1")
got := podmanNetRecreateCmd("coolify-alpha-mesh", "alpha", subnet, gw)
assert.Contains(t, got, "podman network rm -f coolify-alpha-mesh")
assert.Contains(t, got, "--disable-dns")
assert.Contains(t, got, "--subnet=10.220.0.0/24")
assert.Contains(t, got, "--label io.coolify.namespace=alpha")
// rm must come before create so the ordering is unambiguous.
rmIdx := strings.Index(got, "rm -f")
createIdx := strings.Index(got, "network create")
assert.True(t, rmIdx >= 0 && createIdx > rmIdx, "rm must precede create")
}
func TestHeredocWrite_EmitsChmodBeforeMv(t *testing.T) {
got := heredocWrite("/etc/corrosion/config.toml", "body", "TAG", 0o600)
assert.Contains(t, got, "cat > /etc/corrosion/config.toml.tmp <<'TAG'")
assert.Contains(t, got, "\nbody")
assert.Contains(t, got, "chmod 600 /etc/corrosion/config.toml.tmp")
assert.Contains(t, got, "mv /etc/corrosion/config.toml.tmp /etc/corrosion/config.toml")
chmodIdx := strings.Index(got, "chmod 600")
mvIdx := strings.Index(got, "mv /etc/corrosion")
assert.True(t, chmodIdx > 0 && mvIdx > chmodIdx,
"chmod must precede mv so final rename is atomic with intended mode")
}
func TestHeredocWrite_DifferentModes(t *testing.T) {
unit := heredocWrite("/etc/systemd/system/x.service", "b", "T", 0o644)
assert.Contains(t, unit, "chmod 644 /etc/systemd/system/x.service.tmp")
secret := heredocWrite("/etc/corrosion/schemas/coolify.sql", "b", "T", 0o600)
assert.Contains(t, secret, "chmod 600 /etc/corrosion/schemas/coolify.sql.tmp")
}
func TestNonEmptyLines(t *testing.T) {
tests := []struct {
input string
want []string
}{
{"", nil},
{"line1\nline2", []string{"line1", "line2"}},
{"line1\n\nline2", []string{"line1", "line2"}},
{" \n \nactual", []string{"actual"}},
{"only", []string{"only"}},
}
for _, tt := range tests {
got := nonEmptyLines(tt.input)
assert.Equal(t, tt.want, got, "input: %q", tt.input)
}
}
@@ -0,0 +1,95 @@
package wireguard
import (
"fmt"
"net"
"strings"
)
// PeerConfig holds the information needed to write a [Peer] block.
type PeerConfig struct {
// Endpoint is the SSH/public IP of the peer (used as WG endpoint).
Endpoint string
// PublicKey is the peer's WireGuard public key.
PublicKey string
// MgmtIP is the peer's /32 wg0 management IP.
MgmtIP net.IP
// ContainerSubnets is the peer's per-namespace container bridge subnets,
// sorted by namespace name for stable output. All of them — along with
// MgmtIP/32 — are listed in AllowedIPs so every namespace's cross-host
// traffic can route via the tunnel.
ContainerSubnets []*net.IPNet
}
// allowedIPsLine joins the mgmt /32 and every container subnet into a single
// comma-separated AllowedIPs value.
func allowedIPsLine(p PeerConfig) string {
parts := make([]string, 0, 1+len(p.ContainerSubnets))
parts = append(parts, fmt.Sprintf("%s/32", p.MgmtIP))
for _, sn := range p.ContainerSubnets {
if sn == nil {
continue
}
parts = append(parts, sn.String())
}
return strings.Join(parts, ", ")
}
// RenderConfig returns the content of wg0.conf for one host.
//
// The host's own Address is the management IP /32 (e.g. 100.64.0.0/32). It
// lives in a separate pool from the container subnets, so the Podman bridges
// can own their per-host /24s without conflict.
//
// The literal string __PRIVKEY__ is used as a placeholder; callers must
// substitute the actual key before (or during) writing to disk.
func RenderConfig(mgmtIP net.IP, listenPort int, peers []PeerConfig) string {
var b strings.Builder
fmt.Fprintf(&b, "[Interface]\n")
fmt.Fprintf(&b, "Address = %s/32\n", mgmtIP)
fmt.Fprintf(&b, "ListenPort = %d\n", listenPort)
fmt.Fprintf(&b, "PrivateKey = __PRIVKEY__\n")
for _, p := range peers {
fmt.Fprintf(&b, "\n[Peer]\n")
fmt.Fprintf(&b, "# %s\n", p.Endpoint)
fmt.Fprintf(&b, "PublicKey = %s\n", p.PublicKey)
fmt.Fprintf(&b, "AllowedIPs = %s\n", allowedIPsLine(p))
fmt.Fprintf(&b, "Endpoint = %s:%d\n", p.Endpoint, listenPort)
fmt.Fprintf(&b, "PersistentKeepalive = 25\n")
}
return b.String()
}
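Review note: RenderConfig leaves the literal __PRIVKEY__ in its output and expects the caller to substitute the real key. A minimal sketch of one way a caller might do that, assuming the placeholder must appear exactly once. substitutePrivateKey is a hypothetical helper and the key shown is a dummy value.

```go
package main

import (
	"fmt"
	"strings"
)

const privKeyPlaceholder = "__PRIVKEY__"

// substitutePrivateKey replaces the single placeholder in a rendered config
// with the given key. It fails if the placeholder is missing or repeated, so
// a config is never written with the literal placeholder still in it.
func substitutePrivateKey(rendered, privateKey string) (string, error) {
	if n := strings.Count(rendered, privKeyPlaceholder); n != 1 {
		return "", fmt.Errorf("expected exactly one %s placeholder, found %d", privKeyPlaceholder, n)
	}
	// Key files usually end with a newline; trim it before embedding.
	return strings.Replace(rendered, privKeyPlaceholder, strings.TrimSpace(privateKey), 1), nil
}

func main() {
	rendered := "[Interface]\nAddress = 100.64.0.1/32\nListenPort = 51820\nPrivateKey = __PRIVKEY__\n"
	out, err := substitutePrivateKey(rendered, "EXAMPLE_KEY_NOT_REAL=\n")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Print(out)
}
```

WriteConfigCommand avoids this step entirely by reading the key on the remote host, so a helper like this would only matter for callers that render locally.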
// WriteConfigCommand returns the shell command that atomically writes
// /etc/wireguard/<iface>.conf on the remote host.
//
// The private key is read from /etc/wireguard/privatekey on the remote so it
// never traverses SSH. The config is written to a .tmp file first and then
// moved into place so a killed session cannot leave a torn config.
func WriteConfigCommand(iface string, mgmtIP net.IP, listenPort int, peers []PeerConfig) string {
var b strings.Builder
b.WriteString(`PRIVKEY=$(cat /etc/wireguard/privatekey) && `)
b.WriteString(`mkdir -p /etc/wireguard && `)
b.WriteString(`{ echo "[Interface]"; `)
fmt.Fprintf(&b, `echo "Address = %s/32"; `, mgmtIP)
fmt.Fprintf(&b, `echo "ListenPort = %d"; `, listenPort)
b.WriteString(`echo "PrivateKey = $PRIVKEY"; `)
for _, p := range peers {
b.WriteString(`echo ""; `)
b.WriteString(`echo "[Peer]"; `)
fmt.Fprintf(&b, `echo "# %s"; `, p.Endpoint)
fmt.Fprintf(&b, `echo "PublicKey = %s"; `, p.PublicKey)
fmt.Fprintf(&b, `echo "AllowedIPs = %s"; `, allowedIPsLine(p))
fmt.Fprintf(&b, `echo "Endpoint = %s:%d"; `, p.Endpoint, listenPort)
b.WriteString(`echo "PersistentKeepalive = 25"; `)
}
fmt.Fprintf(&b, `} > /etc/wireguard/%s.conf.tmp && `, iface)
fmt.Fprintf(&b, `chmod 600 /etc/wireguard/%s.conf.tmp && `, iface)
fmt.Fprintf(&b, `mv /etc/wireguard/%s.conf.tmp /etc/wireguard/%s.conf`, iface, iface)
return b.String()
}
@@ -0,0 +1,85 @@
package wireguard
import (
"net"
"strings"
"testing"
"github.com/stretchr/testify/assert"
)
func TestRenderConfig_NoPeers(t *testing.T) {
mgmtIP := net.ParseIP("100.64.0.1").To4()
got := RenderConfig(mgmtIP, 51820, nil)
assert.Contains(t, got, "[Interface]")
assert.Contains(t, got, "Address = 100.64.0.1/32")
assert.Contains(t, got, "ListenPort = 51820")
assert.Contains(t, got, "PrivateKey = __PRIVKEY__")
assert.NotContains(t, got, "[Peer]")
}
func TestRenderConfig_WithPeers(t *testing.T) {
mgmtIP := net.ParseIP("100.64.0.1").To4()
peers := []PeerConfig{
{
Endpoint: "203.0.113.11",
PublicKey: "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB=",
MgmtIP: net.ParseIP("100.64.0.1").To4(),
ContainerSubnets: []*net.IPNet{mustParseCIDR("10.210.1.0/24")},
},
{
Endpoint: "203.0.113.12",
PublicKey: "CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC=",
MgmtIP: net.ParseIP("100.64.0.2").To4(),
ContainerSubnets: []*net.IPNet{
mustParseCIDR("10.210.2.0/24"),
mustParseCIDR("10.220.2.0/24"),
},
},
}
got := RenderConfig(mgmtIP, 51820, peers)
assert.Equal(t, 2, strings.Count(got, "[Peer]"))
assert.Contains(t, got, "PublicKey = BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB=")
assert.Contains(t, got, "Endpoint = 203.0.113.11:51820")
assert.Contains(t, got, "AllowedIPs = 100.64.0.1/32, 10.210.1.0/24")
assert.Contains(t, got, "PersistentKeepalive = 25")
assert.Contains(t, got, "PublicKey = CCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCCC=")
// Multi-namespace peer lists every namespace subnet after the mgmt /32.
assert.Contains(t, got, "AllowedIPs = 100.64.0.2/32, 10.210.2.0/24, 10.220.2.0/24")
}
func TestWriteConfigCommand_ContainsPrivkeyRead(t *testing.T) {
mgmtIP := net.ParseIP("100.64.0.1").To4()
peers := []PeerConfig{
{
Endpoint: "203.0.113.11",
PublicKey: "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBB=",
MgmtIP: net.ParseIP("100.64.0.1").To4(),
ContainerSubnets: []*net.IPNet{mustParseCIDR("10.210.1.0/24")},
},
}
cmd := WriteConfigCommand("wg0", mgmtIP, 51820, peers)
assert.Contains(t, cmd, "cat /etc/wireguard/privatekey")
assert.Contains(t, cmd, "$PRIVKEY")
assert.Contains(t, cmd, ".conf.tmp")
assert.Contains(t, cmd, "chmod 600 /etc/wireguard/wg0.conf.tmp")
assert.Contains(t, cmd, "mv /etc/wireguard/wg0.conf.tmp /etc/wireguard/wg0.conf")
// Host Address is the mgmt /32 — outside the container pool.
assert.Contains(t, cmd, "Address = 100.64.0.1/32")
// Peer AllowedIPs lists peer mgmt /32 + peer container /24.
assert.Contains(t, cmd, "100.64.0.1/32, 10.210.1.0/24")
}
func TestWriteConfigCommand_NoPeers(t *testing.T) {
mgmtIP := net.ParseIP("100.64.0.1").To4()
cmd := WriteConfigCommand("wg0", mgmtIP, 51820, nil)
assert.Contains(t, cmd, "PRIVKEY")
assert.Contains(t, cmd, "51820")
assert.NotContains(t, cmd, "[Peer]")
}
@@ -0,0 +1,224 @@
package wireguard
import (
"fmt"
"net"
"sort"
"strings"
)
const firewallUnitPath = "/etc/systemd/system/coolify-mesh-fw.service"
const firewallServiceName = "coolify-mesh-fw.service"
// AllowRulesPath is the on-disk location where coold snapshots the
// COOLIFY-ALLOW chain as an iptables-restore fragment on every rule mutate.
// The firewall unit reads this file at boot/restart to repopulate the chain
// after the kernel tables are cleared.
const AllowRulesPath = "/etc/coolify/allow.rules"
// BridgeTableName is the nftables table name owned by the CLI scaffold.
const BridgeTableName = "coolify_bridge"
// BridgeAllowRulesPath is where coold writes the nft bridge-family allow
// fragment. The firewall unit replays it at start/restart.
const BridgeAllowRulesPath = "/etc/coolify/allow.nft"
// BridgeScaffoldPath is where the CLI writes the static bridge chain
// scaffold (forward + coolify_intra chains). Applied at unit start/restart.
const BridgeScaffoldPath = "/etc/coolify/bridge-fw.nft"
// FirewallServiceUnit returns the systemd unit text that installs the
// idempotent iptables rules required for cross-host container traffic over WG.
//
// containerSubnets is the per-namespace list of subnets on this host (one
// /<prefix> per namespace). Rules are emitted once per subnet so every
// namespace is covered by the same host-global COOLIFY-INTRA / COOLIFY-ALLOW
// chain pair.
//
// Two modes:
//
// - defaultDeny == false (mode A, blanket allow): installs FORWARD ACCEPT
// rules for every subnet. Tears down any default-deny scaffold left over
// from a prior --default-deny run.
//
// - defaultDeny == true (mode B, default deny): removes blanket ACCEPT,
// installs COOLIFY-INTRA + COOLIFY-ALLOW chains, and adds FORWARD jumps
// so any traffic with a container subnet as source OR destination
// traverses the deny chain. Conntrack ESTABLISHED/RELATED is accepted
// early so reply traffic for already-allowed flows bypasses the chain.
//
// Note: the iptables chains only enforce CROSS-HOST container traffic.
// Same-namespace intra-host traffic stays at L2 and bypasses iptables, so
// mode B also installs the bridge-family nft scaffold (BridgeScaffoldPath) to
// cover it. Cross-namespace intra-host traffic is blocked at L2 anyway
// because each namespace has its own podman bridge.
//
// Both modes preserve the POSTROUTING RETURN rule that prevents podman's
// MASQUERADE from rewriting container egress to wg0's IP.
func FirewallServiceUnit(iface string, namespaces []string, containerSubnets []*net.IPNet, defaultDeny bool) string {
var b strings.Builder
fmt.Fprintf(&b, `[Unit]
Description=Coolify mesh firewall rules
After=wg-quick@%[1]s.service network-online.target
Wants=network-online.target
[Service]
Type=oneshot
RemainAfterExit=yes
`, iface)
// POSTROUTING RETURN — needed in both modes, once per subnet.
for _, sn := range containerSubnets {
fmt.Fprintf(&b,
`ExecStart=/bin/sh -c "/usr/sbin/iptables -t nat -C POSTROUTING -s %[2]s -o %[1]s -j RETURN 2>/dev/null || /usr/sbin/iptables -t nat -I POSTROUTING -s %[2]s -o %[1]s -j RETURN"
`, iface, sn.String())
}
if !defaultDeny {
fmt.Fprint(&b, `# Tear down default-deny scaffold from prior --default-deny run.
`)
for _, sn := range containerSubnets {
fmt.Fprintf(&b,
`ExecStart=/bin/sh -c "/usr/sbin/iptables -D FORWARD -d %[1]s -j COOLIFY-INTRA 2>/dev/null || true"
ExecStart=/bin/sh -c "/usr/sbin/iptables -D FORWARD -s %[1]s -j COOLIFY-INTRA 2>/dev/null || true"
`, sn.String())
}
fmt.Fprintf(&b, `ExecStart=/bin/sh -c "/usr/sbin/iptables -F COOLIFY-INTRA 2>/dev/null || true"
ExecStart=/bin/sh -c "/usr/sbin/iptables -X COOLIFY-INTRA 2>/dev/null || true"
# COOLIFY-ALLOW intentionally NOT removed: preserves runtime allows for re-enable.
# Remove bridge-family scaffold (permissive mode) before installing blanket ACCEPT.
ExecStart=/bin/sh -c "nft delete table bridge %[1]s 2>/dev/null || true"
# Blanket ACCEPT: allow all traffic to/from every namespace's container subnet.
`, BridgeTableName)
for _, sn := range containerSubnets {
fmt.Fprintf(&b,
`ExecStart=/bin/sh -c "/usr/sbin/iptables -C FORWARD -s %[1]s -j ACCEPT 2>/dev/null || /usr/sbin/iptables -I FORWARD -s %[1]s -j ACCEPT"
ExecStart=/bin/sh -c "/usr/sbin/iptables -C FORWARD -d %[1]s -j ACCEPT 2>/dev/null || /usr/sbin/iptables -I FORWARD -d %[1]s -j ACCEPT"
`, sn.String())
}
} else {
fmt.Fprint(&b, `# Remove blanket ACCEPT from prior mode-A run.
`)
for _, sn := range containerSubnets {
fmt.Fprintf(&b,
`ExecStart=/bin/sh -c "/usr/sbin/iptables -D FORWARD -s %[1]s -j ACCEPT 2>/dev/null || true"
ExecStart=/bin/sh -c "/usr/sbin/iptables -D FORWARD -d %[1]s -j ACCEPT 2>/dev/null || true"
`, sn.String())
}
fmt.Fprintf(&b, `
# Create chains (idempotent).
ExecStart=/bin/sh -c "/usr/sbin/iptables -N COOLIFY-ALLOW 2>/dev/null || true"
ExecStart=/bin/sh -c "/usr/sbin/iptables -N COOLIFY-INTRA 2>/dev/null || true"
# Flush COOLIFY-INTRA so order is deterministic on every restart.
ExecStart=/usr/sbin/iptables -F COOLIFY-INTRA
ExecStart=/usr/sbin/iptables -A COOLIFY-INTRA -j COOLIFY-ALLOW
ExecStart=/usr/sbin/iptables -A COOLIFY-INTRA -j DROP
# Repopulate COOLIFY-ALLOW from coold's canonical snapshot. File is rewritten
# by coold on every rule mutate, so it is the source of truth across reboots
# and service restarts. Flush first because 'iptables-restore --noflush'
# leaves existing chain contents in place and would otherwise duplicate every
# rule on re-run.
ExecStart=/bin/sh -c "[ -s %[1]s ] && /usr/sbin/iptables -F COOLIFY-ALLOW && /usr/sbin/iptables-restore --noflush < %[1]s || true"
# Conntrack early-accept at top of FORWARD (idempotent).
ExecStart=/bin/sh -c "/usr/sbin/iptables -C FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT 2>/dev/null || /usr/sbin/iptables -I FORWARD 1 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT"
# Top-level FORWARD jumps for every namespace's subnet (both directions).
`, AllowRulesPath)
for _, sn := range containerSubnets {
fmt.Fprintf(&b,
`ExecStart=/bin/sh -c "/usr/sbin/iptables -C FORWARD -d %[1]s -j COOLIFY-INTRA 2>/dev/null || /usr/sbin/iptables -A FORWARD -d %[1]s -j COOLIFY-INTRA"
ExecStart=/bin/sh -c "/usr/sbin/iptables -C FORWARD -s %[1]s -j COOLIFY-INTRA 2>/dev/null || /usr/sbin/iptables -A FORWARD -s %[1]s -j COOLIFY-INTRA"
`, sn.String())
}
fmt.Fprintf(&b, `# Bridge-family nft scaffold: intra-namespace default-deny.
ExecStart=/bin/sh -c "nft list table bridge %[1]s >/dev/null 2>&1 || nft add table bridge %[1]s"
ExecStart=/bin/sh -c "nft add chain bridge %[1]s coolify_allow '{ }' 2>/dev/null || true"
ExecStart=/bin/sh -c "nft delete chain bridge %[1]s forward 2>/dev/null || true"
ExecStart=/bin/sh -c "nft delete chain bridge %[1]s coolify_intra 2>/dev/null || true"
ExecStart=/bin/sh -c "nft -f %[2]s"
ExecStart=/bin/sh -c "[ -s %[3]s ] && nft -f %[3]s || true"
`, BridgeTableName, BridgeScaffoldPath, BridgeAllowRulesPath)
}
_ = namespaces // kept on signature for future per-namespace dispatch; scaffold now keys off subnets (bridge ifnames exceed IFNAMSIZ=16).
b.WriteString(`
[Install]
WantedBy=multi-user.target
`)
return b.String()
}
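Review note: most ExecStart lines in FirewallServiceUnit follow one pattern: check for the rule with `iptables -C`, and only add it when the check fails, so restarting the unit never duplicates rules. A minimal sketch of a helper that renders such a line for the filter table. ensureRuleLine is hypothetical and does not exist in this change; it only illustrates the pattern.

```go
package main

import "fmt"

const iptables = "/usr/sbin/iptables"

// ensureRuleLine renders a systemd ExecStart line that adds a filter-table
// rule only when `iptables -C` reports it missing. op is the add operation
// ("-I" or "-A"), chain the target chain, spec the rule specification.
func ensureRuleLine(op, chain, spec string) string {
	return fmt.Sprintf(`ExecStart=/bin/sh -c "%[1]s -C %[3]s %[4]s 2>/dev/null || %[1]s %[2]s %[3]s %[4]s"`,
		iptables, op, chain, spec)
}

func main() {
	// Example subnets only.
	for _, subnet := range []string{"10.210.0.0/24", "10.220.0.0/24"} {
		fmt.Println(ensureRuleLine("-I", "FORWARD", "-s "+subnet+" -j ACCEPT"))
		fmt.Println(ensureRuleLine("-I", "FORWARD", "-d "+subnet+" -j ACCEPT"))
	}
}
```

The existing tests assert on the exact rendered substrings, so any such refactor would need to keep the output byte-for-byte identical.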
// InstallFirewallCommand returns a shell command that atomically writes the
// service unit, reloads systemd, and enables/starts (or restarts) it.
func InstallFirewallCommand(iface string, namespaces []string, containerSubnets []*net.IPNet, defaultDeny bool) string {
unit := FirewallServiceUnit(iface, namespaces, containerSubnets, defaultDeny)
var b strings.Builder
b.WriteString(fmt.Sprintf(`cat > %s.tmp <<'COOLIFY_FW_EOF'
%sCOOLIFY_FW_EOF
mv %s.tmp %s && `, firewallUnitPath, unit, firewallUnitPath, firewallUnitPath))
// /etc/coolify may not exist yet on a fresh host (coold's token-gen is the
// only other writer and runs later in phase 2). Create it before the
// bridge scaffold write so `cat > .tmp` doesn't ENOENT.
b.WriteString("mkdir -p /etc/coolify && ")
if defaultDeny {
scaffold := renderBridgeScaffold(containerSubnets)
b.WriteString(fmt.Sprintf(`cat > %s.tmp <<'COOLIFY_BR_EOF'
%sCOOLIFY_BR_EOF
mv %s.tmp %s && `, BridgeScaffoldPath, scaffold, BridgeScaffoldPath, BridgeScaffoldPath))
} else {
b.WriteString(fmt.Sprintf("rm -f %s && ", BridgeScaffoldPath))
}
b.WriteString(`systemctl daemon-reload && `)
// Use restart so a flag flip re-runs ExecStart= even if the unit is
// already active (Type=oneshot with RemainAfterExit=yes blocks plain
// "start" from running again).
b.WriteString(fmt.Sprintf(`systemctl enable %s && systemctl restart %s`, firewallServiceName, firewallServiceName))
return b.String()
}
// renderBridgeScaffold builds the nft file-format content for the bridge
// scaffold. Uses `add table` + `add chain` (idempotent) then `flush chain` +
// `add rule` so forward and coolify_intra are atomically replaced on every
// apply without touching coolify_allow (owned by coold).
//
// Dispatch to coolify_intra is keyed on container subnet (ip saddr / ip daddr)
// rather than bridge interface name: podman auto-names bridges (e.g. podman2),
// and the CLI-level "coolify-<ns>-mesh" network name can exceed the Linux
// IFNAMSIZ limit (16 bytes) if used as an interface name. Subnets are
// disjoint per namespace, so this still confines deny to coolify-managed
// traffic and leaves foreign bridges untouched.
func renderBridgeScaffold(subnets []*net.IPNet) string {
sortedSubnets := make([]string, 0, len(subnets))
for _, sn := range subnets {
sortedSubnets = append(sortedSubnets, sn.String())
}
sort.Strings(sortedSubnets)
subnetSet := "{ " + strings.Join(sortedSubnets, ", ") + " }"
var b strings.Builder
b.WriteString("# Managed by coolify init — do not edit manually.\n")
b.WriteString("# Replaces forward + coolify_intra chains on restart; never touches coolify_allow.\n")
// Order matters: chains referenced by `jump` must exist before the rule
// is added (nft validates the target at add-rule time). coolify_allow is
// created by the preceding ExecStart line; declare coolify_intra here
// before the forward-chain rules jump to it.
fmt.Fprintf(&b, "add table bridge %s\n", BridgeTableName)
fmt.Fprintf(&b, "add chain bridge %s coolify_intra\n", BridgeTableName)
fmt.Fprintf(&b, "flush chain bridge %s coolify_intra\n", BridgeTableName)
fmt.Fprintf(&b, "add rule bridge %s coolify_intra jump coolify_allow\n", BridgeTableName)
fmt.Fprintf(&b, "add rule bridge %s coolify_intra drop\n", BridgeTableName)
fmt.Fprintf(&b, "add chain bridge %s forward { type filter hook forward priority -200; policy accept; }\n", BridgeTableName)
fmt.Fprintf(&b, "flush chain bridge %s forward\n", BridgeTableName)
fmt.Fprintf(&b, "add rule bridge %s forward meta protocol != ip accept\n", BridgeTableName)
fmt.Fprintf(&b, "add rule bridge %s forward ct state established,related accept\n", BridgeTableName)
fmt.Fprintf(&b, "add rule bridge %s forward ip saddr %s jump coolify_intra\n", BridgeTableName, subnetSet)
fmt.Fprintf(&b, "add rule bridge %s forward ip daddr %s jump coolify_intra\n", BridgeTableName, subnetSet)
return b.String()
}
@@ -0,0 +1,228 @@
package wireguard
import (
"net"
"os"
"path/filepath"
"strings"
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestFirewallServiceUnit_DefaultDenyOff(t *testing.T) {
subnets := []*net.IPNet{mustParseCIDR("10.210.0.0/24")}
got := FirewallServiceUnit("wg0", []string{"default"}, subnets, false)
assert.Contains(t, got, "[Unit]")
assert.Contains(t, got, "Description=Coolify mesh firewall rules")
assert.Contains(t, got, "After=wg-quick@wg0.service")
assert.Contains(t, got, "Type=oneshot")
assert.Contains(t, got, "RemainAfterExit=yes")
// Blanket allow rules present.
assert.Contains(t, got, "/usr/sbin/iptables -I FORWARD -s 10.210.0.0/24 -j ACCEPT")
assert.Contains(t, got, "/usr/sbin/iptables -I FORWARD -d 10.210.0.0/24 -j ACCEPT")
// Teardown of default-deny scaffold present (idempotent cleanup).
assert.Contains(t, got, "/usr/sbin/iptables -X COOLIFY-INTRA")
assert.Contains(t, got, "/usr/sbin/iptables -D FORWARD -s 10.210.0.0/24 -j COOLIFY-INTRA")
assert.Contains(t, got, "/usr/sbin/iptables -D FORWARD -d 10.210.0.0/24 -j COOLIFY-INTRA")
// Default-deny chain rules MUST NOT be present.
assert.NotContains(t, got, "-A COOLIFY-INTRA -j COOLIFY-ALLOW")
assert.NotContains(t, got, "-A COOLIFY-INTRA -j DROP")
// COOLIFY-ALLOW chain is never destroyed.
assert.NotContains(t, got, "-X COOLIFY-ALLOW")
// POSTROUTING RETURN preserved (needed in both modes).
assert.Contains(t, got, "/usr/sbin/iptables -t nat -I POSTROUTING -s 10.210.0.0/24 -o wg0 -j RETURN")
assert.Contains(t, got, "WantedBy=multi-user.target")
}
func TestFirewallServiceUnit_DefaultDenyOn(t *testing.T) {
subnets := []*net.IPNet{mustParseCIDR("10.210.0.0/24")}
got := FirewallServiceUnit("wg0", []string{"default"}, subnets, true)
// Chains created.
assert.Contains(t, got, "/usr/sbin/iptables -N COOLIFY-ALLOW")
assert.Contains(t, got, "/usr/sbin/iptables -N COOLIFY-INTRA")
// Conntrack early-accept.
assert.Contains(t, got, "-m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT")
// COOLIFY-INTRA flush + jump to ALLOW + DROP.
assert.Contains(t, got, "/usr/sbin/iptables -F COOLIFY-INTRA")
assert.Contains(t, got, "/usr/sbin/iptables -A COOLIFY-INTRA -j COOLIFY-ALLOW")
assert.Contains(t, got, "/usr/sbin/iptables -A COOLIFY-INTRA -j DROP")
// FORWARD jumps for both directions of container subnet traffic.
assert.Contains(t, got, "/usr/sbin/iptables -A FORWARD -d 10.210.0.0/24 -j COOLIFY-INTRA")
assert.Contains(t, got, "/usr/sbin/iptables -A FORWARD -s 10.210.0.0/24 -j COOLIFY-INTRA")
// Teardown of blanket ACCEPT from prior mode-A run.
assert.Contains(t, got, "/usr/sbin/iptables -D FORWARD -s 10.210.0.0/24 -j ACCEPT")
assert.Contains(t, got, "/usr/sbin/iptables -D FORWARD -d 10.210.0.0/24 -j ACCEPT")
// Blanket ACCEPT rules MUST NOT be installed in default-deny mode.
assert.NotContains(t, got, "/usr/sbin/iptables -I FORWARD -s 10.210.0.0/24 -j ACCEPT")
assert.NotContains(t, got, "/usr/sbin/iptables -I FORWARD -d 10.210.0.0/24 -j ACCEPT")
// COOLIFY-ALLOW chain is never destroyed. It IS flushed-and-restored at
// boot/restart from the canonical snapshot — that's how runtime allow
// rules survive reboots.
assert.NotContains(t, got, "-X COOLIFY-ALLOW")
assert.Contains(t, got, "/usr/sbin/iptables -F COOLIFY-ALLOW")
assert.Contains(t, got, "/usr/sbin/iptables-restore --noflush < "+AllowRulesPath)
assert.Contains(t, got, "[ -s "+AllowRulesPath+" ]")
// POSTROUTING RETURN preserved.
assert.Contains(t, got, "/usr/sbin/iptables -t nat -I POSTROUTING -s 10.210.0.0/24 -o wg0 -j RETURN")
}
func TestFirewallServiceUnit_DefaultDenyOff_NoAllowRestore(t *testing.T) {
subnets := []*net.IPNet{mustParseCIDR("10.210.0.0/24")}
got := FirewallServiceUnit("wg0", []string{"default"}, subnets, false)
// Blanket-allow mode bypasses COOLIFY-ALLOW entirely — no restore.
assert.NotContains(t, got, "iptables-restore")
assert.NotContains(t, got, AllowRulesPath)
}
func TestInstallFirewallCommand_AtomicWriteAndEnable(t *testing.T) {
subnets := []*net.IPNet{mustParseCIDR("10.210.5.0/24")}
cmd := InstallFirewallCommand("wg0", []string{"default"}, subnets, false)
// Atomic write via .tmp + mv.
assert.Contains(t, cmd, "/etc/systemd/system/coolify-mesh-fw.service.tmp")
assert.Contains(t, cmd, "mv /etc/systemd/system/coolify-mesh-fw.service.tmp /etc/systemd/system/coolify-mesh-fw.service")
// systemd reload + enable + restart (so a flag flip re-runs ExecStart).
assert.Contains(t, cmd, "systemctl daemon-reload")
assert.Contains(t, cmd, "systemctl enable coolify-mesh-fw.service")
assert.Contains(t, cmd, "systemctl restart coolify-mesh-fw.service")
// Subnet baked into command.
assert.Contains(t, cmd, "10.210.5.0/24")
}
func TestInstallFirewallCommand_DefaultDenyEmbedded(t *testing.T) {
subnets := []*net.IPNet{mustParseCIDR("10.210.5.0/24")}
cmd := InstallFirewallCommand("wg0", []string{"default"}, subnets, true)
// Default-deny variant of unit must be embedded in the heredoc.
assert.Contains(t, cmd, "-A COOLIFY-INTRA -j DROP")
}
func TestFirewallServiceUnit_BridgeScaffold_DefaultDenyOn(t *testing.T) {
subnets := []*net.IPNet{mustParseCIDR("10.210.0.0/24")}
got := FirewallServiceUnit("wg0", []string{"default"}, subnets, true)
assert.Contains(t, got, "nft list table bridge coolify_bridge")
assert.Contains(t, got, "nft add table bridge coolify_bridge")
assert.Contains(t, got, "nft add chain bridge coolify_bridge coolify_allow")
assert.Contains(t, got, "nft delete chain bridge coolify_bridge forward")
assert.Contains(t, got, "nft delete chain bridge coolify_bridge coolify_intra")
assert.Contains(t, got, "nft -f /etc/coolify/bridge-fw.nft")
assert.Contains(t, got, "/etc/coolify/allow.nft")
assert.NotContains(t, got, "-X COOLIFY-ALLOW")
}
func TestFirewallServiceUnit_BridgeScaffold_DefaultDenyOff(t *testing.T) {
subnets := []*net.IPNet{mustParseCIDR("10.210.0.0/24")}
got := FirewallServiceUnit("wg0", []string{"default"}, subnets, false)
assert.Contains(t, got, "nft delete table bridge coolify_bridge")
assert.NotContains(t, got, "nft add table bridge coolify_bridge")
assert.NotContains(t, got, "nft -f /etc/coolify/bridge-fw.nft")
}
func TestFirewallServiceUnit_BridgeSetStableSortedSubnets(t *testing.T) {
// Pass subnets in reverse-sorted order — scaffold must sort them.
subnets := []*net.IPNet{
mustParseCIDR("10.220.1.0/24"),
mustParseCIDR("10.210.1.0/24"),
}
// renderBridgeScaffold is embedded in InstallFirewallCommand, so check that.
cmd := InstallFirewallCommand("wg0", []string{"alpha", "default"}, subnets, true)
// Assert the nft scaffold set contains both, sorted:
// `ip saddr { 10.210.1.0/24, 10.220.1.0/24 } jump coolify_intra`
assert.Contains(t, cmd, "ip saddr { 10.210.1.0/24, 10.220.1.0/24 } jump coolify_intra")
assert.Contains(t, cmd, "ip daddr { 10.210.1.0/24, 10.220.1.0/24 } jump coolify_intra")
}
func TestFirewallServiceUnit_BridgeScaffold_UsesIPSaddrNotIifname(t *testing.T) {
subnets := []*net.IPNet{mustParseCIDR("10.210.0.0/24")}
cmd := InstallFirewallCommand("wg0", []string{"default"}, subnets, true)
// Podman bridge names can exceed the kernel's interface-name limit
// (IFNAMSIZ=16, i.e. 15 usable chars; "coolify-default-mesh" is 20 chars).
// Scaffold MUST key dispatch on ip saddr/daddr, never iifname.
assert.Contains(t, cmd, "ip saddr")
assert.Contains(t, cmd, "ip daddr")
assert.NotContains(t, cmd, "iifname")
assert.NotContains(t, cmd, "oifname")
assert.NotContains(t, cmd, "coolify-default-mesh\"")
}
func TestInstallFirewallCommand_WritesBridgeScaffoldFile(t *testing.T) {
subnets := []*net.IPNet{mustParseCIDR("10.210.0.0/24")}
cmd := InstallFirewallCommand("wg0", []string{"default"}, subnets, true)
assert.Contains(t, cmd, "/etc/coolify/bridge-fw.nft")
assert.Contains(t, cmd, "COOLIFY_BR_EOF")
assert.Contains(t, cmd, "bridge-fw.nft.tmp")
// /etc/coolify must be created before bridge-fw.nft.tmp is written —
// without it, `cat > .tmp` fails on fresh hosts.
mkdirIdx := strings.Index(cmd, "mkdir -p /etc/coolify")
tmpIdx := strings.Index(cmd, "bridge-fw.nft.tmp")
assert.GreaterOrEqual(t, mkdirIdx, 0, "mkdir -p /etc/coolify must be present")
assert.Less(t, mkdirIdx, tmpIdx, "mkdir must run before bridge-fw.nft.tmp write")
}
func TestInstallFirewallCommand_DefaultDenyOff_RemovesBridgeScaffold(t *testing.T) {
subnets := []*net.IPNet{mustParseCIDR("10.210.0.0/24")}
cmd := InstallFirewallCommand("wg0", []string{"default"}, subnets, false)
assert.Contains(t, cmd, "rm -f /etc/coolify/bridge-fw.nft")
assert.NotContains(t, cmd, "COOLIFY_BR_EOF")
}
func TestFirewallServiceUnit_GoldenFixture_TwoNamespaces(t *testing.T) {
subnets := []*net.IPNet{
mustParseCIDR("10.210.0.0/24"),
mustParseCIDR("10.220.0.0/24"),
}
got := FirewallServiceUnit("wg0", []string{"alpha", "default"}, subnets, true)
fixturePath := filepath.Join("..", "..", "test", "fixtures", "firewall_unit_deny_two_ns.txt")
if os.Getenv("UPDATE_GOLDEN") == "1" {
err := os.WriteFile(fixturePath, []byte(got), 0o600)
require.NoError(t, err, "failed to write golden fixture")
t.Logf("golden fixture updated: %s", fixturePath)
return
}
b, err := os.ReadFile(fixturePath)
require.NoError(t, err, "golden fixture missing — run with UPDATE_GOLDEN=1 to create it")
assert.Equal(t, string(b), got)
}
func TestFirewallServiceUnit_MultipleNamespacesEmitPerSubnetRules(t *testing.T) {
subnets := []*net.IPNet{
mustParseCIDR("10.210.1.0/24"),
mustParseCIDR("10.220.1.0/24"),
}
got := FirewallServiceUnit("wg0", []string{"default"}, subnets, true)
// Each namespace subnet gets its own POSTROUTING RETURN + FORWARD jumps.
for _, sub := range []string{"10.210.1.0/24", "10.220.1.0/24"} {
assert.Contains(t, got, "/usr/sbin/iptables -t nat -I POSTROUTING -s "+sub+" -o wg0 -j RETURN")
assert.Contains(t, got, "/usr/sbin/iptables -A FORWARD -d "+sub+" -j COOLIFY-INTRA")
assert.Contains(t, got, "/usr/sbin/iptables -A FORWARD -s "+sub+" -j COOLIFY-INTRA")
}
}
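The sorted-subnet assertions above expect the nft dispatch rules to be rendered deterministically regardless of input order. A minimal sketch of how such a rendering could work is shown below; `renderSubnetSet`, `parseAll`, and the lexical sort are assumptions made for illustration, not the repository's `renderBridgeScaffold`.

```go
package main

import (
	"fmt"
	"net"
	"sort"
	"strings"
)

// renderSubnetSet is a hypothetical helper (not from the repository). It
// renders an nft match such as `ip saddr { a, b } jump coolify_intra` with
// the subnets sorted, so the output does not depend on input order.
func renderSubnetSet(field string, subnets []*net.IPNet) string {
	cidrs := make([]string, 0, len(subnets))
	for _, s := range subnets {
		cidrs = append(cidrs, s.String())
	}
	// Lexical sort: enough for a stable rendering, not a numeric IP ordering.
	sort.Strings(cidrs)
	return fmt.Sprintf("ip %s { %s } jump coolify_intra", field, strings.Join(cidrs, ", "))
}

// parseAll parses CIDR strings and panics on malformed input (sketch only).
func parseAll(cidrs ...string) []*net.IPNet {
	out := make([]*net.IPNet, 0, len(cidrs))
	for _, c := range cidrs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			panic(err)
		}
		out = append(out, n)
	}
	return out
}

func main() {
	// Reverse-sorted input, mirroring the test above.
	subnets := parseAll("10.220.1.0/24", "10.210.1.0/24")
	fmt.Println(renderSubnetSet("saddr", subnets))
	fmt.Println(renderSubnetSet("daddr", subnets))
}
```

Stable ordering matters here because the unit text is hashed and compared against the remote unit; an order-dependent rendering would presumably show up as spurious unit drift.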
+214
@@ -0,0 +1,214 @@
package wireguard
import (
"fmt"
"strings"
)
// actionCategory classifies every ActionType so the intent filter can decide
// whether to allow, skip, or block it per host.
type actionCategory int
const (
// catSafeAlways: pure add/first-time install. Idempotent, no runtime
// disruption on re-run. Runs in bootstrap and extend; skipped in upgrade,
// which keeps only version bumps and their service restarts.
catSafeAlways actionCategory = iota
// catPeerRefresh: rewrites a config or restarts a service as part of
// keeping peer/namespace state in sync. Idempotent, short service blip
// at worst. Allowed in bootstrap and extend (extend needs it on existing
// hosts to pick up the new peer's AllowedIPs). In upgrade only the
// post-install service restarts run (see isUpgradeServiceRestart).
catPeerRefresh
// catDestructiveReplace: recreates a resource that may currently be in
// use (running containers on a podman bridge). Blocked on existing
// hosts in extend mode unless --allow-replace is set. Always blocked
// in upgrade mode.
catDestructiveReplace
// catVersionBump: re-downloads an agent binary (coold, corrosion,
// scheduler, builder). Runs on new hosts in extend mode (first install)
// but not on existing hosts. Always allowed in upgrade mode.
catVersionBump
// catWipeDB: special-case for ActionWriteCorrosionSchema when the
// schema drift branch fires (pre-existing sqlite DB gets deleted).
// Only allowed in bootstrap mode and on brand-new hosts in extend
// mode. Never allowed on existing hosts, even with --allow-replace.
catWipeDB
// catCorrosionSchemaFirstWrite: ActionWriteCorrosionSchema when no
// prior schema is present (CorrosionSchemaSha256 is empty). Nothing gets
// wiped, so it runs in bootstrap and extend on new and existing hosts;
// upgrade still skips it as a non-version-bump action.
catCorrosionSchemaFirstWrite
)
// categorize returns the category for a planned action. The schema action is
// looked up contextually (plan fills Detail with "[schema drift — DB will be
// reset]" when the wipe branch applies).
func categorize(a PlannedAction) actionCategory {
switch a.Type {
case ActionInstallWG,
ActionGenKeyPair,
ActionAllocateMgmtIP,
ActionAllocateContainerSubnet,
ActionEnableService,
ActionInstallPodman,
ActionEnablePodmanSocket,
ActionEnableIPForward,
ActionCreatePodmanNet,
ActionGenerateJWTKeypair,
ActionAddPeer,
ActionRemovePeer:
return catSafeAlways
case ActionWriteConfig,
ActionReloadService,
ActionInstallFirewall,
ActionWriteCorrosionConfig,
ActionInstallCorrosionService,
ActionInstallCooldService,
ActionInstallSchedulerService,
ActionWriteHostJWT,
ActionUpdateCooldSchedulerEnv:
return catPeerRefresh
case ActionRecreatePodmanNet:
return catDestructiveReplace
case ActionInstallCorrosion,
ActionInstallCoold,
ActionInstallScheduler,
ActionInstallBuilder:
return catVersionBump
case ActionWriteCorrosionSchema:
if strings.Contains(a.Detail, "DB will be reset") {
return catWipeDB
}
return catCorrosionSchemaFirstWrite
}
// Action types not listed above fall through to catSafeAlways.
return catSafeAlways
}
// ValidateIntent enforces pre-plan invariants the filter itself can't express.
func ValidateIntent(d *DesiredMesh) error {
switch d.Intent {
case IntentBootstrap:
return nil
case IntentExtend:
if len(d.NewHosts) == 0 {
return fmt.Errorf("extend mode requires at least one host in NewHosts")
}
hostSet := make(map[string]struct{}, len(d.Hosts))
for _, h := range d.Hosts {
hostSet[h] = struct{}{}
}
for _, nh := range d.NewHosts {
if _, ok := hostSet[nh]; !ok {
return fmt.Errorf("extend mode: new host %q not in --servers list", nh)
}
}
return nil
case IntentUpgrade:
if !d.AllowNightly {
for _, pair := range [][2]string{
{"--coold-version", d.CooldVersion},
{"--corrosion-version", d.CorrosionVersion},
{"--scheduler-version", d.SchedulerVersion},
} {
if pair[1] == "nightly" {
return fmt.Errorf(
"upgrade mode rejects %s=nightly (moving target forces re-install every run); pin a version or pass --allow-nightly",
pair[0],
)
}
}
}
return nil
default:
return fmt.Errorf("unknown intent %q", d.Intent)
}
}
// filterByIntent mutates plan.Actions in place, moving blocked/skipped actions
// into plan.Skipped with a reason. For IntentBootstrap (default) it is a no-op.
func filterByIntent(plan *Plan, d *DesiredMesh) {
if d.Intent == IntentBootstrap {
return
}
newHostSet := make(map[string]struct{}, len(d.NewHosts))
for _, h := range d.NewHosts {
newHostSet[h] = struct{}{}
}
kept := plan.Actions[:0]
for _, a := range plan.Actions {
reason := decide(a, d, newHostSet)
if reason == "" {
kept = append(kept, a)
continue
}
plan.Skipped = append(plan.Skipped, SkippedAction{Action: a, Reason: reason})
}
plan.Actions = kept
}
// decide returns an empty string when the action should run, or a short
// human-readable reason when it should be skipped.
func decide(a PlannedAction, d *DesiredMesh, newHostSet map[string]struct{}) string {
cat := categorize(a)
_, isNewHost := newHostSet[a.Host]
switch d.Intent {
case IntentExtend:
if isNewHost {
// Everything runs on a brand-new host — it needs the full install.
return ""
}
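// Existing host in extend mode: safe-always actions (whose guards
// prevent re-runs on converged hosts), peer-refresh actions, and
// first-time schema writes run. Destructive replaces need
// --allow-replace; version bumps and DB wipes are always skipped.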
switch cat {
case catSafeAlways, catPeerRefresh:
return ""
case catDestructiveReplace:
if d.AllowReplace {
return ""
}
return "extend: destructive-replace on existing host blocked; pass --allow-replace to override"
case catVersionBump:
return "extend: version-bump on existing host skipped; use `coolify init upgrade` to bump versions"
case catWipeDB:
return "extend: corrosion DB wipe on existing host is never allowed; resolve schema drift with `coolify init upgrade` on a fresh schema"
case catCorrosionSchemaFirstWrite:
return ""
}
case IntentUpgrade:
switch cat {
case catVersionBump:
return ""
case catPeerRefresh:
if isUpgradeServiceRestart(a.Type) {
return ""
}
return "upgrade: peer-refresh skipped; use `coolify init extend` for mesh topology changes"
case catSafeAlways, catDestructiveReplace, catWipeDB, catCorrosionSchemaFirstWrite:
return "upgrade: non-version-bump action skipped"
}
default:
// IntentBootstrap (and unknown intents) keep every action.
}
return ""
}
// isUpgradeServiceRestart returns true when a peer-refresh action is the
// follow-up systemctl restart after a binary install and must run in upgrade
// mode to pick up the new binary.
func isUpgradeServiceRestart(t ActionType) bool {
switch t {
case ActionInstallCorrosionService,
ActionInstallCooldService,
ActionInstallSchedulerService:
return true
default:
return false
}
}
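The extend-mode gate above can be hard to follow in the flattened diff, so here is a reduced sketch of the same idea: partition a slice in place into kept and skipped actions, with a reason attached to each skip. Every name in it (`category`, `action`, `skipped`, `filterExtend`) is a simplified stand-in chosen for illustration and covers only three of the categories; it is not the repository's implementation.

```go
package main

import "fmt"

// Simplified stand-ins for illustration only; not the repository's types.
type category int

const (
	safeAlways category = iota
	versionBump
	destructiveReplace
)

type action struct {
	Host string
	Name string
	Cat  category
}

type skipped struct {
	Action action
	Reason string
}

// filterExtend keeps every action on new hosts. On existing hosts it drops
// version bumps and, unless allowReplace is set, destructive replaces.
// kept reuses the backing array of actions, so callers should use the
// returned slice and not read the input slice again.
func filterExtend(actions []action, newHosts map[string]bool, allowReplace bool) ([]action, []skipped) {
	kept := actions[:0]
	var skips []skipped
	for _, a := range actions {
		reason := ""
		if !newHosts[a.Host] {
			switch a.Cat {
			case versionBump:
				reason = "version-bump on existing host skipped"
			case destructiveReplace:
				if !allowReplace {
					reason = "destructive-replace on existing host blocked"
				}
			}
		}
		if reason == "" {
			kept = append(kept, a)
			continue
		}
		skips = append(skips, skipped{Action: a, Reason: reason})
	}
	return kept, skips
}

func main() {
	actions := []action{
		{Host: "old", Name: "write-config", Cat: safeAlways},
		{Host: "old", Name: "install-coold", Cat: versionBump},
		{Host: "new", Name: "install-coold", Cat: versionBump},
	}
	kept, skips := filterExtend(actions, map[string]bool{"new": true}, false)
	for _, a := range kept {
		fmt.Printf("run  %s on %s\n", a.Name, a.Host)
	}
	for _, s := range skips {
		fmt.Printf("skip %s on %s: %s\n", s.Action.Name, s.Action.Host, s.Reason)
	}
}
```

The `actions[:0]` idiom avoids a second allocation; it is safe in this loop because the write position never overtakes the read position.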
+291
@@ -0,0 +1,291 @@
package wireguard
import (
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func TestValidateIntent_Bootstrap(t *testing.T) {
d := &DesiredMesh{Intent: IntentBootstrap}
require.NoError(t, ValidateIntent(d))
}
func TestValidateIntent_ExtendRequiresNewHosts(t *testing.T) {
d := &DesiredMesh{
Intent: IntentExtend,
Hosts: []string{"A", "B"},
}
err := ValidateIntent(d)
require.Error(t, err)
assert.Contains(t, err.Error(), "NewHosts")
}
func TestValidateIntent_ExtendNewHostMustBeInServers(t *testing.T) {
d := &DesiredMesh{
Intent: IntentExtend,
Hosts: []string{"A", "B"},
NewHosts: []string{"C"},
}
err := ValidateIntent(d)
require.Error(t, err)
assert.Contains(t, err.Error(), `"C"`)
assert.Contains(t, err.Error(), "--servers")
}
func TestValidateIntent_ExtendHappy(t *testing.T) {
d := &DesiredMesh{
Intent: IntentExtend,
Hosts: []string{"A", "B", "C"},
NewHosts: []string{"C"},
}
require.NoError(t, ValidateIntent(d))
}
func TestValidateIntent_UpgradeRejectsNightlyByDefault(t *testing.T) {
for _, tc := range []struct {
name string
d DesiredMesh
}{
{"coold", DesiredMesh{Intent: IntentUpgrade, CooldVersion: "nightly", CorrosionVersion: "v1", SchedulerVersion: "v1"}},
{"corrosion", DesiredMesh{Intent: IntentUpgrade, CooldVersion: "v1", CorrosionVersion: "nightly", SchedulerVersion: "v1"}},
{"scheduler", DesiredMesh{Intent: IntentUpgrade, CooldVersion: "v1", CorrosionVersion: "v1", SchedulerVersion: "nightly"}},
} {
t.Run(tc.name, func(t *testing.T) {
err := ValidateIntent(&tc.d)
require.Error(t, err)
assert.Contains(t, err.Error(), "nightly")
})
}
}
func TestValidateIntent_UpgradeAllowsNightlyWhenOpted(t *testing.T) {
d := &DesiredMesh{
Intent: IntentUpgrade,
CooldVersion: "nightly",
CorrosionVersion: "nightly",
SchedulerVersion: "nightly",
AllowNightly: true,
}
require.NoError(t, ValidateIntent(d))
}
func TestValidateIntent_UpgradeAllowsPinned(t *testing.T) {
d := &DesiredMesh{
Intent: IntentUpgrade,
CooldVersion: "v1.2.3",
CorrosionVersion: "v0.9.0",
SchedulerVersion: "v0.3.0",
}
require.NoError(t, ValidateIntent(d))
}
func TestValidateIntent_UnknownIntent(t *testing.T) {
d := &DesiredMesh{Intent: Intent("bogus")}
err := ValidateIntent(d)
require.Error(t, err)
assert.Contains(t, err.Error(), "bogus")
}
func TestCategorize(t *testing.T) {
cases := []struct {
t ActionType
want actionCategory
}{
{ActionInstallWG, catSafeAlways},
{ActionGenKeyPair, catSafeAlways},
{ActionAllocateMgmtIP, catSafeAlways},
{ActionAllocateContainerSubnet, catSafeAlways},
{ActionEnableService, catSafeAlways},
{ActionInstallPodman, catSafeAlways},
{ActionEnablePodmanSocket, catSafeAlways},
{ActionEnableIPForward, catSafeAlways},
{ActionCreatePodmanNet, catSafeAlways},
{ActionGenerateJWTKeypair, catSafeAlways},
{ActionAddPeer, catSafeAlways},
{ActionRemovePeer, catSafeAlways},
{ActionWriteConfig, catPeerRefresh},
{ActionReloadService, catPeerRefresh},
{ActionInstallFirewall, catPeerRefresh},
{ActionWriteCorrosionConfig, catPeerRefresh},
{ActionInstallCorrosionService, catPeerRefresh},
{ActionInstallCooldService, catPeerRefresh},
{ActionInstallSchedulerService, catPeerRefresh},
{ActionWriteHostJWT, catPeerRefresh},
{ActionUpdateCooldSchedulerEnv, catPeerRefresh},
{ActionRecreatePodmanNet, catDestructiveReplace},
{ActionInstallCorrosion, catVersionBump},
{ActionInstallCoold, catVersionBump},
{ActionInstallScheduler, catVersionBump},
{ActionInstallBuilder, catVersionBump},
}
for _, tc := range cases {
t.Run(string(tc.t), func(t *testing.T) {
assert.Equal(t, tc.want, categorize(PlannedAction{Type: tc.t}))
})
}
}
func TestCategorize_SchemaWipeVsFirstWrite(t *testing.T) {
firstWrite := PlannedAction{
Type: ActionWriteCorrosionSchema,
Detail: "/etc/corrosion/schemas/coolify.sql",
}
wipe := PlannedAction{
Type: ActionWriteCorrosionSchema,
Detail: "/etc/corrosion/schemas/coolify.sql [schema drift — DB will be reset]",
}
assert.Equal(t, catCorrosionSchemaFirstWrite, categorize(firstWrite))
assert.Equal(t, catWipeDB, categorize(wipe))
}
func TestFilterByIntent_BootstrapNoop(t *testing.T) {
plan := &Plan{Actions: []PlannedAction{
{Host: "A", Type: ActionInstallCoold},
{Host: "B", Type: ActionRecreatePodmanNet},
{Host: "B", Type: ActionWriteCorrosionSchema, Detail: "DB will be reset"},
}}
filterByIntent(plan, &DesiredMesh{Intent: IntentBootstrap})
assert.Len(t, plan.Actions, 3)
assert.Empty(t, plan.Skipped)
}
func TestFilterByIntent_ExtendNewHostRunsEverything(t *testing.T) {
plan := &Plan{Actions: []PlannedAction{
{Host: "A-new", Type: ActionInstallCoold},
{Host: "A-new", Type: ActionInstallCorrosion},
{Host: "A-new", Type: ActionCreatePodmanNet},
{Host: "A-new", Type: ActionWriteCorrosionSchema, Detail: "first write"},
}}
filterByIntent(plan, &DesiredMesh{
Intent: IntentExtend,
NewHosts: []string{"A-new"},
})
assert.Len(t, plan.Actions, 4)
assert.Empty(t, plan.Skipped)
}
func TestFilterByIntent_ExtendExistingHostPeerRefreshOnly(t *testing.T) {
plan := &Plan{Actions: []PlannedAction{
{Host: "A-old", Type: ActionWriteConfig},
{Host: "A-old", Type: ActionReloadService},
{Host: "A-old", Type: ActionWriteCorrosionConfig},
{Host: "A-old", Type: ActionInstallFirewall},
{Host: "A-old", Type: ActionInstallCoold}, // version bump: skipped
{Host: "A-old", Type: ActionInstallBuilder}, // version bump: skipped
{Host: "A-new", Type: ActionInstallCoold}, // new host: kept
}}
filterByIntent(plan, &DesiredMesh{
Intent: IntentExtend,
NewHosts: []string{"A-new"},
})
kept := map[ActionType]bool{}
for _, a := range plan.Actions {
kept[a.Type] = true
}
assert.True(t, kept[ActionWriteConfig])
assert.True(t, kept[ActionReloadService])
assert.True(t, kept[ActionWriteCorrosionConfig])
assert.True(t, kept[ActionInstallFirewall])
skippedTypes := map[ActionType]int{}
for _, s := range plan.Skipped {
skippedTypes[s.Action.Type]++
}
// InstallCoold and InstallBuilder are each skipped once (existing host);
// InstallCoold is additionally kept once for the new host.
assert.Equal(t, 1, skippedTypes[ActionInstallCoold])
assert.Equal(t, 1, skippedTypes[ActionInstallBuilder])
// Exactly one InstallCoold survived — the one targeting the new host.
var survivors []string
for _, a := range plan.Actions {
if a.Type == ActionInstallCoold {
survivors = append(survivors, a.Host)
}
}
assert.Equal(t, []string{"A-new"}, survivors)
}
func TestFilterByIntent_ExtendBlocksDestructiveOnExistingWithoutAllowReplace(t *testing.T) {
plan := &Plan{Actions: []PlannedAction{
{Host: "A-old", Type: ActionRecreatePodmanNet, Detail: "coolify-default-mesh — dns_enabled=true"},
}}
filterByIntent(plan, &DesiredMesh{
Intent: IntentExtend,
NewHosts: []string{"A-new"},
})
assert.Empty(t, plan.Actions)
require.Len(t, plan.Skipped, 1)
assert.Contains(t, plan.Skipped[0].Reason, "--allow-replace")
}
func TestFilterByIntent_ExtendAllowReplaceUnlocksDestructive(t *testing.T) {
plan := &Plan{Actions: []PlannedAction{
{Host: "A-old", Type: ActionRecreatePodmanNet},
}}
filterByIntent(plan, &DesiredMesh{
Intent: IntentExtend,
NewHosts: []string{"A-new"},
AllowReplace: true,
})
assert.Len(t, plan.Actions, 1)
assert.Empty(t, plan.Skipped)
}
func TestFilterByIntent_ExtendAllowReplaceDoesNotUnlockWipeDB(t *testing.T) {
plan := &Plan{Actions: []PlannedAction{
{Host: "A-old", Type: ActionWriteCorrosionSchema, Detail: "schema drift — DB will be reset"},
}}
filterByIntent(plan, &DesiredMesh{
Intent: IntentExtend,
NewHosts: []string{"A-new"},
AllowReplace: true,
})
assert.Empty(t, plan.Actions)
require.Len(t, plan.Skipped, 1)
assert.Contains(t, plan.Skipped[0].Reason, "never allowed")
}
func TestFilterByIntent_UpgradeOnlyKeepsVersionBumpsAndServiceRestarts(t *testing.T) {
plan := &Plan{Actions: []PlannedAction{
{Host: "A", Type: ActionInstallCoold},
{Host: "A", Type: ActionInstallCorrosion},
{Host: "A", Type: ActionInstallScheduler},
{Host: "A", Type: ActionInstallBuilder},
{Host: "A", Type: ActionInstallCooldService},
{Host: "A", Type: ActionInstallCorrosionService},
{Host: "A", Type: ActionInstallSchedulerService},
{Host: "A", Type: ActionWriteConfig}, // skipped
{Host: "A", Type: ActionReloadService}, // skipped
{Host: "A", Type: ActionCreatePodmanNet}, // skipped
{Host: "A", Type: ActionRecreatePodmanNet}, // skipped
{Host: "A", Type: ActionInstallFirewall}, // skipped (non-restart peer-refresh)
}}
filterByIntent(plan, &DesiredMesh{Intent: IntentUpgrade})
kept := map[ActionType]bool{}
for _, a := range plan.Actions {
kept[a.Type] = true
}
for _, want := range []ActionType{
ActionInstallCoold, ActionInstallCorrosion, ActionInstallScheduler, ActionInstallBuilder,
ActionInstallCooldService, ActionInstallCorrosionService, ActionInstallSchedulerService,
} {
assert.True(t, kept[want], "expected %s kept in upgrade", want)
}
skippedTypes := map[ActionType]bool{}
for _, s := range plan.Skipped {
skippedTypes[s.Action.Type] = true
}
for _, want := range []ActionType{
ActionWriteConfig, ActionReloadService, ActionCreatePodmanNet, ActionRecreatePodmanNet, ActionInstallFirewall,
} {
assert.True(t, skippedTypes[want], "expected %s skipped in upgrade", want)
}
}
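The nightly-rejection cases exercised in this test file boil down to one rule: in upgrade mode a version flag set to "nightly" is refused unless the operator opts in. A minimal sketch of that rule follows; `versionFlag`, `checkPinned`, and the error wording are hypothetical and do not reproduce `ValidateIntent`.

```go
package main

import "fmt"

// versionFlag pairs a CLI flag name with its value (hypothetical stand-in).
type versionFlag struct {
	Flag  string
	Value string
}

// checkPinned returns an error naming the first flag set to "nightly",
// unless allowNightly is true, in which case every value is accepted.
func checkPinned(flags []versionFlag, allowNightly bool) error {
	if allowNightly {
		return nil
	}
	for _, f := range flags {
		if f.Value == "nightly" {
			return fmt.Errorf("upgrade rejects %s=nightly; pin a version or opt in", f.Flag)
		}
	}
	return nil
}

func main() {
	flags := []versionFlag{
		{Flag: "--coold-version", Value: "v1.2.3"},
		{Flag: "--corrosion-version", Value: "nightly"},
	}
	fmt.Println(checkPinned(flags, false)) // rejected: names the offending flag
	fmt.Println(checkPinned(flags, true))  // accepted once the operator opts in
}
```

Naming the offending flag in the error is what lets a table-driven test assert on the message per component, as the coold/corrosion/scheduler cases above do.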
+627
@@ -0,0 +1,627 @@
package wireguard
import (
"crypto/sha256"
"encoding/hex"
"fmt"
"net"
"strings"
"github.com/coollabsio/coolify-cli/internal/services"
)
// ActionType identifies the kind of change required.
type ActionType string
const (
ActionInstallWG ActionType = "install-wg"
ActionGenKeyPair ActionType = "gen-keypair"
ActionAllocateMgmtIP ActionType = "allocate-mgmt-ip"
ActionAllocateContainerSubnet ActionType = "allocate-container-subnet"
ActionWriteConfig ActionType = "write-config"
ActionEnableService ActionType = "enable-service"
ActionReloadService ActionType = "reload-service"
ActionAddPeer ActionType = "add-peer"
ActionRemovePeer ActionType = "remove-peer"
ActionInstallPodman ActionType = "install-podman"
ActionEnablePodmanSocket ActionType = "enable-podman-socket"
ActionEnableIPForward ActionType = "enable-ip-forward"
ActionCreatePodmanNet ActionType = "create-podman-network"
ActionRecreatePodmanNet ActionType = "recreate-podman-network"
ActionInstallFirewall ActionType = "install-firewall"
ActionInstallCorrosion ActionType = "install-corrosion"
ActionInstallCoold ActionType = "install-coold"
ActionWriteCorrosionConfig ActionType = "write-corrosion-config"
ActionWriteCorrosionSchema ActionType = "write-corrosion-schema"
ActionInstallCorrosionService ActionType = "install-corrosion-service"
ActionInstallCooldService ActionType = "install-coold-service"
ActionInstallScheduler ActionType = "install-scheduler"
ActionGenerateJWTKeypair ActionType = "generate-jwt-keypair"
ActionInstallSchedulerService ActionType = "install-scheduler-service"
ActionWriteHostJWT ActionType = "write-host-jwt"
ActionUpdateCooldSchedulerEnv ActionType = "update-coold-scheduler-env"
ActionInstallBuilder ActionType = "install-builder"
)
// PlannedAction is one step that must be executed on a host when the plan
// is carried out (bootstrap, extend, or upgrade).
type PlannedAction struct {
Host string
Namespace string // empty for host-global actions
Type ActionType
Detail string
}
// Plan is the list of actions needed to converge the mesh to the desired state.
type Plan struct {
Actions []PlannedAction
// MgmtAssignments maps host → planned WG management /32 IP.
MgmtAssignments map[string]net.IP
// SubnetAssignments maps namespace → host → planned container subnet.
SubnetAssignments map[string]map[string]*net.IPNet
// Warnings contains non-fatal conflict messages from the IP allocator.
Warnings []Warning
// Skipped lists actions that were filtered out by the Intent gate
// (e.g. destructive-replace on an existing host in extend mode). Exposed
// so the plan preview can show operators what would have fired and why.
Skipped []SkippedAction
}
// SkippedAction is a PlannedAction that BuildPlan would have emitted but the
// Intent filter suppressed. Reason is a short human-readable message.
type SkippedAction struct {
Action PlannedAction
Reason string
}
// IsEmpty returns true when the mesh is already converged (no changes needed).
func (p *Plan) IsEmpty() bool { return len(p.Actions) == 0 }
// BuildPlan computes the actions required to bring current into alignment
// with desired. It is a pure function: no SSH, no I/O.
func BuildPlan(desired *DesiredMesh, current MeshState) (*Plan, error) {
if desired.DefaultDenyContainers && !desired.InstallPodman {
return nil, fmt.Errorf("--default-deny requires --podman")
}
if desired.InstallCoold && !desired.InstallPodman {
return nil, fmt.Errorf("--install-coold requires --podman")
}
if desired.InstallPodman && len(desired.Namespaces) == 0 {
return nil, fmt.Errorf("at least one namespace is required")
}
if err := ValidateIntent(desired); err != nil {
return nil, err
}
// Validate per-host preconditions before computing actions.
for _, host := range desired.Hosts {
if state, ok := current.Servers[host]; ok && desired.DefaultDenyContainers {
if !state.NftAvailable {
return nil, fmt.Errorf(
"host %s: nft binary not available; install nftables or pass --skip-default-deny",
host,
)
}
}
}
mgmtAssignments, mgmtWarns, err := AllocateMgmtIPs(desired.MgmtPool, current.AssignedMgmtIPs(), desired.Hosts)
if err != nil {
return nil, fmt.Errorf("mgmt IP allocation: %w", err)
}
containerAssignments, contWarns, err := AllocateNamespaced(
desired.ContainerPool, desired.ContainerPrefix,
current.AssignedContainerSubnets(), desired.Namespaces, desired.Hosts)
if err != nil {
return nil, fmt.Errorf("container subnet allocation: %w", err)
}
plan := &Plan{
MgmtAssignments: mgmtAssignments,
SubnetAssignments: containerAssignments,
Warnings: append(mgmtWarns, contWarns...),
}
nsSorted := desired.SortedNamespaces()
for _, host := range desired.Hosts {
state, ok := current.Servers[host]
if !ok {
state = &ServerState{Host: host, Namespaces: map[string]*NamespaceServerState{}}
}
if state.Namespaces == nil {
state.Namespaces = map[string]*NamespaceServerState{}
}
// --- WireGuard installation ---
if !state.Installed {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionInstallWG,
Detail: "wireguard not installed",
})
}
// --- Key generation ---
if !state.KeysExist {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionGenKeyPair,
Detail: "no keys at /etc/wireguard/privatekey",
})
}
// --- Mgmt IP allocation ---
mgmtIP := mgmtAssignments[host]
if state.WireGuardMgmtIP == nil ||
!state.WireGuardMgmtIP.Equal(mgmtIP) {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionAllocateMgmtIP,
Detail: fmt.Sprintf("%s/32", mgmtIP),
})
}
// --- Container subnet allocation (one per namespace) ---
if desired.InstallPodman {
for _, ns := range nsSorted {
contSubnet := containerAssignments[ns][host]
// Named nsState so it does not shadow the current MeshState parameter.
nsState := state.Namespaces[ns]
if nsState == nil || nsState.ContainerSubnet == nil ||
nsState.ContainerSubnet.String() != contSubnet.String() {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Namespace: ns,
Type: ActionAllocateContainerSubnet,
Detail: contSubnet.String(),
})
}
}
}
// --- Peer diff ---
desiredPeerKeys := make(map[string]bool)
for _, peer := range desired.Hosts {
if peer == host {
continue
}
if ps, ok2 := current.Servers[peer]; ok2 && ps.PublicKey != "" {
desiredPeerKeys[ps.PublicKey] = true
}
}
currentPeerKeys := make(map[string]bool)
for _, p := range state.Peers {
currentPeerKeys[p.PublicKey] = true
}
for key := range desiredPeerKeys {
if !currentPeerKeys[key] {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionAddPeer,
Detail: truncateKey(key),
})
}
}
for key := range currentPeerKeys {
if !desiredPeerKeys[key] {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionRemovePeer,
Detail: truncateKey(key),
})
}
}
// --- Config write ---
mgmtMismatch := state.WireGuardMgmtIP == nil || !state.WireGuardMgmtIP.Equal(mgmtIP)
allowedIPsDrift := allowedIPsNeedsRewrite(host, desired, current, containerAssignments, mgmtAssignments, state)
needsConfig := mgmtMismatch ||
allowedIPsDrift ||
len(plan.actionsForHost(host, ActionAddPeer)) > 0 ||
len(plan.actionsForHost(host, ActionRemovePeer)) > 0 ||
!state.KeysExist ||
!state.Installed ||
(len(desired.Hosts) > 1 && state.ListenPort != desired.ListenPort)
if needsConfig {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionWriteConfig,
Detail: fmt.Sprintf("%s.conf (%d peer(s))", desired.Interface, len(desired.Hosts)-1),
})
}
// --- WG service ---
if !state.Active {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionEnableService,
Detail: fmt.Sprintf("systemctl enable --now wg-quick@%s", desired.Interface),
})
} else if needsConfig {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionReloadService,
Detail: fmt.Sprintf("systemctl reload wg-quick@%s (config changed)", desired.Interface),
})
}
// --- Podman stack ---
if desired.InstallPodman {
if !state.PodmanInstalled {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionInstallPodman,
Detail: "podman not installed",
})
}
if !state.PodmanSocketActive {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionEnablePodmanSocket,
Detail: "systemctl enable --now podman.socket",
})
}
if !state.IPForwardEnabled {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionEnableIPForward,
Detail: "net.ipv4.ip_forward=1",
})
}
for _, ns := range nsSorted {
contSubnet := containerAssignments[ns][host]
netName := PodmanNetworkFor(ns)
nss := state.Namespaces[ns]
gw := MachineIP(contSubnet)
if nss == nil || !nss.NetworkExists {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Namespace: ns,
Type: ActionCreatePodmanNet,
Detail: fmt.Sprintf("%s subnet=%s gateway=%s", netName, contSubnet, gw),
})
continue
}
if nss.DNSEnabled ||
(nss.ContainerSubnet != nil && nss.ContainerSubnet.String() != contSubnet.String()) ||
nss.Label != ns {
reasons := []string{}
if nss.DNSEnabled {
reasons = append(reasons, "dns_enabled=true")
}
if nss.ContainerSubnet != nil && nss.ContainerSubnet.String() != contSubnet.String() {
reasons = append(reasons, fmt.Sprintf("subnet drift (have %s, want %s)", nss.ContainerSubnet, contSubnet))
}
if nss.Label != ns {
reasons = append(reasons, fmt.Sprintf("label=%q mismatch", nss.Label))
}
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Namespace: ns,
Type: ActionRecreatePodmanNet,
Detail: fmt.Sprintf("%s — %s", netName, strings.Join(reasons, "; ")),
})
}
}
// Expected firewall unit text — hash it and compare against the
// remote unit so adding/removing a namespace reinstalls the unit.
var subnets []*net.IPNet
for _, ns := range nsSorted {
subnets = append(subnets, containerAssignments[ns][host])
}
expectedUnit := FirewallServiceUnit(desired.Interface, desired.SortedNamespaces(), subnets, desired.DefaultDenyContainers)
expectedUnitHash := sha256Hex([]byte(expectedUnit))
unitDrift := state.FirewallUnitSha256 != expectedUnitHash
if !state.FirewallActive ||
state.DefaultDenyActive != desired.DefaultDenyContainers ||
unitDrift {
detail := fmt.Sprintf("coolify-mesh-fw.service (%s, %d namespace(s), default-deny=%v)",
desired.Interface, len(subnets), desired.DefaultDenyContainers)
if unitDrift && state.FirewallUnitSha256 != "" {
detail += " [unit drift]"
}
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionInstallFirewall,
Detail: detail,
})
}
}
// --- Corrosion + coold stack (v5 control plane) ---
if desired.InstallCoold {
corrosionDrift := binaryVersionDrift(desired.CorrosionVersion, state.CorrosionInstalled, state.CorrosionVersion)
cooldDrift := binaryVersionDrift(desired.CooldVersion, state.CooldInstalled, state.CooldVersion)
if corrosionDrift {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionInstallCorrosion,
Detail: fmt.Sprintf("corrosion %s → /usr/local/bin/corrosion", desired.CorrosionVersion),
})
}
if cooldDrift {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionInstallCoold,
Detail: fmt.Sprintf("coold %s → /usr/local/bin/coold", desired.CooldVersion),
})
}
peers := peerMgmtIPs(host, desired.Hosts, mgmtAssignments)
expectedConfig := services.CorrosionConfigBytes(mgmtIP,
desired.CorrosionGossipPort, desired.CorrosionAPIPort, peers)
expectedHash := sha256Hex(expectedConfig)
configDrift := state.CorrosionConfigHash != expectedHash
if configDrift {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionWriteCorrosionConfig,
Detail: fmt.Sprintf("/etc/corrosion/config.toml (peers=%d)", len(peers)),
})
}
expectedSchemaSha := sha256Hex([]byte(services.CoolifySchemaSQL))
schemaDrift := state.CorrosionSchemaSha256 != expectedSchemaSha
if !state.CorrosionSchemaExists || schemaDrift {
detail := "/etc/corrosion/schemas/coolify.sql"
if schemaDrift && state.CorrosionSchemaSha256 != "" {
detail += " [schema drift — DB will be reset]"
}
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionWriteCorrosionSchema,
Detail: detail,
})
}
nsConfigs := buildNamespaceConfigs(host, nsSorted, containerAssignments)
expectedCooldUnit := services.CooldServiceUnit(mgmtIP, nsConfigs)
cooldUnitDrift := state.CooldUnitSha256 != sha256Hex([]byte(expectedCooldUnit))
if !state.CorrosionActive || configDrift || corrosionDrift || schemaDrift {
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionInstallCorrosionService,
Detail: "systemctl enable --now corrosion",
})
}
if !state.CooldActive || configDrift || cooldDrift || cooldUnitDrift {
detail := fmt.Sprintf("systemctl enable --now coold (mgmt=%s, namespaces=%d)", mgmtIP, len(nsConfigs))
if cooldUnitDrift && state.CooldUnitSha256 != "" {
detail += " [unit drift]"
}
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionInstallCooldService,
Detail: detail,
})
}
}
}
// --- Scheduler + JWT stack (central-only) ---
if desired.CentralHost != "" {
plan.Actions = append(plan.Actions,
PlannedAction{
Host: desired.CentralHost,
Type: ActionInstallScheduler,
Detail: fmt.Sprintf("scheduler %s → /usr/local/bin/scheduler", desired.SchedulerVersion),
},
PlannedAction{
Host: desired.CentralHost,
Type: ActionGenerateJWTKeypair,
Detail: "ES256 EC P-256 keypair at /etc/coolify/jwt.{priv,pub}",
},
PlannedAction{
Host: desired.CentralHost,
Type: ActionInstallSchedulerService,
Detail: fmt.Sprintf("scheduler.service (:%d)",
services.SchedulerGRPCPort),
},
)
// Per-host: JWT + coold unit rewrite (inject scheduler env).
for _, host := range desired.Hosts {
plan.Actions = append(plan.Actions,
PlannedAction{
Host: host,
Type: ActionWriteHostJWT,
Detail: services.HostJWTPath,
},
PlannedAction{
Host: host,
Type: ActionUpdateCooldSchedulerEnv,
Detail: "coold.service += SCHEDULER_URL + HOST_JWT_PATH",
},
)
}
}
// --- Builder capability (per-host, requires scheduler) ---
//
// No separate systemd unit and no second JWT — the builder binary is a
// short-lived subprocess coold spawns under a `systemd-run --pipe`
// transient unit. All we install at provisioning time is the binary plus
// its tool deps (buildah, git); coold advertises the capability via its
// Hello frame, and the JWT `caps` claim (handled by the host-JWT action
// above) authorizes it. Only hosts in the desired builder set get the
// binary install; others stay coold-only.
if desired.CentralHost != "" {
for _, host := range desired.Hosts {
if !desired.HasBuilderCap(host) {
continue
}
plan.Actions = append(plan.Actions, PlannedAction{
Host: host,
Type: ActionInstallBuilder,
Detail: fmt.Sprintf("builder %s → %s (+ buildah, git; capacity=%d)", desired.CooldVersion, services.BuilderBinaryPath, maxCapacity(desired.BuilderCapacity)),
})
}
}
filterByIntent(plan, desired)
return plan, nil
}
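// maxCapacity returns the builder capacity to report, falling back to a
// default of 2 when c is unset or non-positive.
func maxCapacity(c int) int {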
if c <= 0 {
return 2
}
return c
}
// buildNamespaceConfigs builds the per-namespace CooldNamespace slice for this
// host, in namespace name order. Gateway IP for each namespace is the .1 of
// that namespace's per-host container subnet.
func buildNamespaceConfigs(host string, nsSorted []string, assignments map[string]map[string]*net.IPNet) []services.CooldNamespace {
out := make([]services.CooldNamespace, 0, len(nsSorted))
for _, ns := range nsSorted {
subnet := assignments[ns][host]
if subnet == nil {
continue
}
out = append(out, services.CooldNamespace{
Name: ns,
Network: PodmanNetworkFor(ns),
BridgeGateway: MachineIP(subnet),
})
}
return out
}
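// Illustrative sketch only: how "the .1 of the per-host container subnet"
// can be derived. MachineIP is not shown in this diff, so firstHostIP and
// gatewayFor below are hypothetical stand-ins, not the real implementation.
// The sketch assumes IPv4 subnets.

```go
package main

import (
	"fmt"
	"net"
)

// firstHostIP returns the network address of subnet plus one (the ".1" of a
// /24). Hypothetical stand-in for MachineIP.
func firstHostIP(subnet *net.IPNet) net.IP {
	base := subnet.IP.To4()
	if base == nil {
		return nil // this sketch only handles IPv4
	}
	ip := make(net.IP, len(base))
	copy(ip, base)
	// Increment with carry so the result stays correct at octet boundaries.
	for i := len(ip) - 1; i >= 0; i-- {
		ip[i]++
		if ip[i] != 0 {
			break
		}
	}
	return ip
}

// gatewayFor parses cidr and returns its first host address as a string.
func gatewayFor(cidr string) (string, error) {
	_, subnet, err := net.ParseCIDR(cidr)
	if err != nil {
		return "", err
	}
	ip := firstHostIP(subnet)
	if ip == nil {
		return "", fmt.Errorf("not an IPv4 subnet: %s", cidr)
	}
	return ip.String(), nil
}

func main() {
	gw, err := gatewayFor("10.210.5.0/24")
	if err != nil {
		panic(err)
	}
	fmt.Println(gw)
}
```

// The copy before incrementing matters: mutating subnet.IP in place would
// corrupt the assignment map the planner shares across hosts.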
// binaryVersionDrift returns true when a binary needs (re-)installation.
// Rules:
// - not installed → always drift
// - marker absent (empty haveVersion) → treat as drift (first-migration case)
// - "nightly" tag → always re-install (moving target)
// - pinned tag → drift only when marker differs from desired
func binaryVersionDrift(desiredVersion string, installed bool, haveVersion string) bool {
if !installed || haveVersion == "" {
return true
}
if desiredVersion == "nightly" {
return true
}
return haveVersion != desiredVersion
}
// allowedIPsNeedsRewrite returns true when any [Peer] block on host does not
// have the expected AllowedIPs (peer mgmt /32 + every namespace subnet).
func allowedIPsNeedsRewrite(
host string,
desired *DesiredMesh,
current MeshState,
containerAssignments map[string]map[string]*net.IPNet,
mgmtAssignments map[string]net.IP,
state *ServerState,
) bool {
if state == nil {
return false
}
nsSorted := desired.SortedNamespaces()
// Build pub-key → expected AllowedIPs set for every peer we should have.
want := map[string]map[string]struct{}{}
for _, peer := range desired.Hosts {
if peer == host {
continue
}
ps, ok := current.Servers[peer]
if !ok || ps.PublicKey == "" {
continue
}
mgmtIP := mgmtAssignments[peer]
if mgmtIP == nil {
continue
}
entries := map[string]struct{}{fmt.Sprintf("%s/32", mgmtIP): {}}
for _, ns := range nsSorted {
if sn := containerAssignments[ns][peer]; sn != nil {
entries[sn.String()] = struct{}{}
}
}
want[ps.PublicKey] = entries
}
// Compare against parsed peers in the current config. If any desired peer
// has different AllowedIPs (missing or extra), we need to rewrite.
have := map[string]map[string]struct{}{}
for _, p := range state.Peers {
s := map[string]struct{}{}
for _, a := range p.AllowedIPs {
s[strings.TrimSpace(a)] = struct{}{}
}
have[p.PublicKey] = s
}
for pk, wantSet := range want {
haveSet, ok := have[pk]
if !ok {
return true
}
if !sameStringSet(wantSet, haveSet) {
return true
}
}
return false
}
// sameStringSet reports whether a and b contain exactly the same keys.
func sameStringSet(a, b map[string]struct{}) bool {
if len(a) != len(b) {
return false
}
for k := range a {
if _, ok := b[k]; !ok {
return false
}
}
return true
}
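// Illustrative sketch only: the AllowedIPs comparison above reduced to a
// standalone program. A peer's expected set is its mgmt /32 plus each of its
// namespace subnets, and any missing or extra entry counts as drift. The
// helper names (expectedAllowedIPs, sameSet, toSet, peerWant) are
// hypothetical and exist only for this example.

```go
package main

import (
	"fmt"
	"net"
)

// expectedAllowedIPs builds the set a peer's AllowedIPs should contain:
// the peer's mgmt /32 plus each of its container subnets.
func expectedAllowedIPs(mgmtIP net.IP, subnets []*net.IPNet) map[string]struct{} {
	set := map[string]struct{}{fmt.Sprintf("%s/32", mgmtIP): {}}
	for _, sn := range subnets {
		if sn != nil {
			set[sn.String()] = struct{}{}
		}
	}
	return set
}

// sameSet reports whether two string sets hold exactly the same members.
func sameSet(a, b map[string]struct{}) bool {
	if len(a) != len(b) {
		return false
	}
	for k := range a {
		if _, ok := b[k]; !ok {
			return false
		}
	}
	return true
}

// toSet turns a list of strings into a set.
func toSet(items ...string) map[string]struct{} {
	set := make(map[string]struct{}, len(items))
	for _, it := range items {
		set[it] = struct{}{}
	}
	return set
}

// peerWant is a string-based convenience wrapper around expectedAllowedIPs.
func peerWant(mgmt string, cidrs ...string) map[string]struct{} {
	var subnets []*net.IPNet
	for _, c := range cidrs {
		_, sn, err := net.ParseCIDR(c)
		if err != nil {
			panic(err)
		}
		subnets = append(subnets, sn)
	}
	return expectedAllowedIPs(net.ParseIP(mgmt), subnets)
}

func main() {
	want := peerWant("100.64.0.2", "10.210.1.0/24")
	// Matching config: no rewrite needed.
	fmt.Println(sameSet(want, toSet("100.64.0.2/32", "10.210.1.0/24")))
	// Namespace subnet missing from the peer block: rewrite needed.
	fmt.Println(sameSet(want, toSet("100.64.0.2/32")))
}
```

// Comparing sets rather than slices keeps the check independent of the
// order in which AllowedIPs appear in the remote config file.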
// peerMgmtIPs returns the mgmt IPs of all hosts except self, drawn from the
// planned assignments so the result is stable even before any host has been
// probed.
func peerMgmtIPs(self string, hosts []string, assignments map[string]net.IP) []net.IP {
out := make([]net.IP, 0, len(hosts))
for _, h := range hosts {
if h == self {
continue
}
if ip, ok := assignments[h]; ok && ip != nil {
out = append(out, ip)
}
}
return out
}
// sha256Hex returns the lowercase hex-encoded SHA-256 digest of b.
func sha256Hex(b []byte) string {
sum := sha256.Sum256(b)
return hex.EncodeToString(sum[:])
}
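// Illustrative sketch only: the hash-based drift checks in the planner
// compare sha256Hex of the expected file content against the digest the
// probe reads with `sha256sum <file> | awk '{print $1}'`. contentDrift is a
// hypothetical helper that isolates that comparison; an empty remote digest
// (file missing) always counts as drift.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// sha256Hex returns the lowercase hex-encoded SHA-256 digest of b.
func sha256Hex(b []byte) string {
	sum := sha256.Sum256(b)
	return hex.EncodeToString(sum[:])
}

// contentDrift reports whether the remote file differs from expected.
// remoteSum is the raw digest text read from the host; surrounding
// whitespace (such as a trailing newline) is ignored.
func contentDrift(expected, remoteSum string) bool {
	return strings.TrimSpace(remoteSum) != sha256Hex([]byte(expected))
}

func main() {
	unit := "[Unit]\nDescription=example\n"
	remote := sha256Hex([]byte(unit)) + "\n" // as printed by sha256sum | awk

	fmt.Println(contentDrift(unit, remote))           // same content
	fmt.Println(contentDrift(unit+"# edited", remote)) // content changed
	fmt.Println(contentDrift(unit, ""))                // file missing on host
}
```

// Hashing the rendered text means any change to the template or its inputs
// shows up as drift without a field-by-field comparison.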
// actionsForHost returns the subset of plan.Actions matching host and atype.
func (p *Plan) actionsForHost(host string, atype ActionType) []PlannedAction {
var out []PlannedAction
for _, a := range p.Actions {
if a.Host == host && a.Type == atype {
out = append(out, a)
}
}
return out
}
// truncateKey shortens a base64 key to the first 8 chars + "..." for display.
func truncateKey(key string) string {
if len(key) <= 8 {
return key
}
return key[:8] + "..."
}
@@ -0,0 +1,658 @@
package wireguard
import (
"net"
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
var (
defaultMgmtPool = mustParseCIDR("100.64.0.0/16")
defaultContainerPool = mustParseCIDR("10.210.0.0/16")
)
func desiredTwoHosts() *DesiredMesh {
return &DesiredMesh{
Hosts: []string{"1.1.1.1", "2.2.2.2"},
Interface: "wg0",
MgmtPool: defaultMgmtPool,
ContainerPool: defaultContainerPool,
ContainerPrefix: 24,
ListenPort: 51820,
}
}
func desiredWithPodman() *DesiredMesh {
d := desiredTwoHosts()
d.InstallPodman = true
d.Namespaces = []string{DefaultNamespace}
return d
}
// convergedServer returns a ServerState fully reconciled for the single
// `default` namespace with the supplied subnet.
func convergedServer(host, pubkey, peerKey, mgmtIP, contSubnet string) *ServerState {
sn := mustParseCIDR(contSubnet)
firewallHash := sha256Hex([]byte(FirewallServiceUnit("wg0", []string{"default"}, []*net.IPNet{sn}, false)))
return &ServerState{
Host: host,
Installed: true,
KeysExist: true,
PublicKey: pubkey,
WireGuardMgmtIP: net.ParseIP(mgmtIP).To4(),
ListenPort: 51820,
Active: true,
Peers: []Peer{{
PublicKey: peerKey,
AllowedIPs: []string{peerMgmtForPub(peerKey), peerSubnetForPub(peerKey)},
}},
PodmanInstalled: true,
PodmanSocketActive: true,
IPForwardEnabled: true,
FirewallActive: true,
FirewallUnitSha256: firewallHash,
Namespaces: map[string]*NamespaceServerState{
DefaultNamespace: {
Namespace: DefaultNamespace,
NetworkExists: true,
ContainerSubnet: sn,
DNSEnabled: false,
Label: DefaultNamespace,
},
},
}
}
// peerMgmtForPub / peerSubnetForPub map the well-known test public keys to
// the mgmt /32 and /24 each peer is expected to own in the two-host fixture.
func peerMgmtForPub(pub string) string {
switch pub {
case "AAAAAAAA=":
return "100.64.0.1/32"
case "BBBBBBBB=":
return "100.64.0.2/32"
}
return ""
}
func peerSubnetForPub(pub string) string {
switch pub {
case "AAAAAAAA=":
return "10.210.0.0/24"
case "BBBBBBBB=":
return "10.210.1.0/24"
}
return ""
}
func TestBuildPlan_AlreadyConverged_NoPodman(t *testing.T) {
desired := desiredTwoHosts()
current := MeshState{
Servers: map[string]*ServerState{
"1.1.1.1": {
Host: "1.1.1.1",
Installed: true,
KeysExist: true,
PublicKey: "AAAAAAAA=",
WireGuardMgmtIP: net.ParseIP("100.64.0.1").To4(),
ListenPort: 51820,
Active: true,
Peers: []Peer{{PublicKey: "BBBBBBBB=", AllowedIPs: []string{"100.64.0.2/32"}}},
},
"2.2.2.2": {
Host: "2.2.2.2",
Installed: true,
KeysExist: true,
PublicKey: "BBBBBBBB=",
WireGuardMgmtIP: net.ParseIP("100.64.0.2").To4(),
ListenPort: 51820,
Active: true,
Peers: []Peer{{PublicKey: "AAAAAAAA=", AllowedIPs: []string{"100.64.0.1/32"}}},
},
},
}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
assert.True(t, plan.IsEmpty(), "expected empty plan, got: %+v", plan.Actions)
}
func TestBuildPlan_FreshBootstrap(t *testing.T) {
desired := desiredTwoHosts()
current := MeshState{Servers: map[string]*ServerState{}}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
assert.False(t, plan.IsEmpty())
actionTypes := func(host string) []ActionType {
var out []ActionType
for _, a := range plan.Actions {
if a.Host == host {
out = append(out, a.Type)
}
}
return out
}
for _, host := range []string{"1.1.1.1", "2.2.2.2"} {
types := actionTypes(host)
assert.Contains(t, types, ActionInstallWG, host)
assert.Contains(t, types, ActionGenKeyPair, host)
assert.Contains(t, types, ActionAllocateMgmtIP, host)
assert.Contains(t, types, ActionWriteConfig, host)
assert.Contains(t, types, ActionEnableService, host)
}
}
func TestBuildPlan_MgmtIPMismatchTriggersRewrite(t *testing.T) {
desired := desiredTwoHosts()
current := MeshState{
Servers: map[string]*ServerState{
"1.1.1.1": {
Host: "1.1.1.1",
Installed: true,
KeysExist: true,
PublicKey: "AAAAAAAA=",
WireGuardMgmtIP: net.ParseIP("10.210.0.1").To4(), // outside 100.64/16
ListenPort: 51820,
Active: true,
Peers: []Peer{{PublicKey: "BBBBBBBB="}},
},
"2.2.2.2": {
Host: "2.2.2.2",
Installed: true,
KeysExist: true,
PublicKey: "BBBBBBBB=",
WireGuardMgmtIP: net.ParseIP("100.64.0.2").To4(),
ListenPort: 51820,
Active: true,
Peers: []Peer{{PublicKey: "AAAAAAAA="}},
},
},
}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
assert.NotEmpty(t, plan.Warnings)
var aTypes []ActionType
for _, a := range plan.Actions {
if a.Host == "1.1.1.1" {
aTypes = append(aTypes, a.Type)
}
}
assert.Contains(t, aTypes, ActionAllocateMgmtIP)
assert.Contains(t, aTypes, ActionWriteConfig)
}
func TestBuildPlan_AddPeer(t *testing.T) {
desired := desiredTwoHosts()
current := MeshState{
Servers: map[string]*ServerState{
"1.1.1.1": {
Host: "1.1.1.1",
Installed: true,
KeysExist: true,
PublicKey: "AAAAAAAA=",
WireGuardMgmtIP: net.ParseIP("100.64.0.1").To4(),
ListenPort: 51820,
Active: true,
Peers: []Peer{},
},
"2.2.2.2": {
Host: "2.2.2.2",
Installed: true,
KeysExist: true,
PublicKey: "BBBBBBBB=",
WireGuardMgmtIP: net.ParseIP("100.64.0.2").To4(),
ListenPort: 51820,
Active: true,
Peers: []Peer{{PublicKey: "AAAAAAAA="}},
},
},
}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
var types []ActionType
for _, a := range plan.Actions {
if a.Host == "1.1.1.1" {
types = append(types, a.Type)
}
}
assert.Contains(t, types, ActionAddPeer)
assert.Contains(t, types, ActionWriteConfig)
assert.Contains(t, types, ActionReloadService)
}
func TestBuildPlan_RemovePeer(t *testing.T) {
desired := &DesiredMesh{
Hosts: []string{"1.1.1.1"},
Interface: "wg0",
MgmtPool: defaultMgmtPool,
ContainerPool: defaultContainerPool,
ContainerPrefix: 24,
ListenPort: 51820,
}
current := MeshState{
Servers: map[string]*ServerState{
"1.1.1.1": {
Host: "1.1.1.1",
Installed: true,
KeysExist: true,
PublicKey: "AAAAAAAA=",
WireGuardMgmtIP: net.ParseIP("100.64.0.1").To4(),
ListenPort: 51820,
Active: true,
Peers: []Peer{{PublicKey: "STALEKEY="}},
},
},
}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
var types []ActionType
for _, a := range plan.Actions {
if a.Host == "1.1.1.1" {
types = append(types, a.Type)
}
}
assert.Contains(t, types, ActionRemovePeer)
assert.Contains(t, types, ActionWriteConfig)
}
func TestBuildPlan_StableMgmtAndContainerAssignments(t *testing.T) {
desired := desiredWithPodman()
current := MeshState{
Servers: map[string]*ServerState{
"1.1.1.1": {
Host: "1.1.1.1",
WireGuardMgmtIP: net.ParseIP("100.64.0.7").To4(),
Namespaces: map[string]*NamespaceServerState{
DefaultNamespace: {
Namespace: DefaultNamespace,
NetworkExists: true,
ContainerSubnet: mustParseCIDR("10.210.5.0/24"),
Label: DefaultNamespace,
},
},
},
"2.2.2.2": {
Host: "2.2.2.2",
WireGuardMgmtIP: nil,
},
},
}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
assert.Equal(t, "100.64.0.7", plan.MgmtAssignments["1.1.1.1"].String())
assert.Equal(t, "10.210.5.0/24", plan.SubnetAssignments[DefaultNamespace]["1.1.1.1"].String())
}
func TestBuildPlan_PodmanFullStack(t *testing.T) {
desired := desiredWithPodman()
current := MeshState{Servers: map[string]*ServerState{}}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
collect := func(host string) []ActionType {
var out []ActionType
for _, a := range plan.Actions {
if a.Host == host {
out = append(out, a.Type)
}
}
return out
}
for _, h := range []string{"1.1.1.1", "2.2.2.2"} {
types := collect(h)
assert.Contains(t, types, ActionInstallPodman, h)
assert.Contains(t, types, ActionEnablePodmanSocket, h)
assert.Contains(t, types, ActionEnableIPForward, h)
assert.Contains(t, types, ActionCreatePodmanNet, h)
assert.Contains(t, types, ActionInstallFirewall, h)
assert.Contains(t, types, ActionAllocateContainerSubnet, h)
}
}
func TestBuildPlan_PodmanIdempotent(t *testing.T) {
desired := desiredWithPodman()
current := MeshState{
Servers: map[string]*ServerState{
"1.1.1.1": convergedServer("1.1.1.1", "AAAAAAAA=", "BBBBBBBB=", "100.64.0.1", "10.210.0.0/24"),
"2.2.2.2": convergedServer("2.2.2.2", "BBBBBBBB=", "AAAAAAAA=", "100.64.0.2", "10.210.1.0/24"),
},
}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
assert.True(t, plan.IsEmpty(), "expected empty plan, got: %+v", plan.Actions)
}
func TestBuildPlan_PodmanNotRequested(t *testing.T) {
desired := desiredTwoHosts() // InstallPodman == false
current := MeshState{
Servers: map[string]*ServerState{
"1.1.1.1": {
Host: "1.1.1.1",
Installed: true,
KeysExist: true,
PublicKey: "AAAAAAAA=",
WireGuardMgmtIP: net.ParseIP("100.64.0.1").To4(),
ListenPort: 51820,
Active: true,
Peers: []Peer{{PublicKey: "BBBBBBBB="}},
},
"2.2.2.2": {
Host: "2.2.2.2",
Installed: true,
KeysExist: true,
PublicKey: "BBBBBBBB=",
WireGuardMgmtIP: net.ParseIP("100.64.0.2").To4(),
ListenPort: 51820,
Active: true,
Peers: []Peer{{PublicKey: "AAAAAAAA="}},
},
},
}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
for _, a := range plan.Actions {
assert.NotEqual(t, ActionInstallPodman, a.Type)
assert.NotEqual(t, ActionEnablePodmanSocket, a.Type)
assert.NotEqual(t, ActionEnableIPForward, a.Type)
assert.NotEqual(t, ActionCreatePodmanNet, a.Type)
assert.NotEqual(t, ActionInstallFirewall, a.Type)
assert.NotEqual(t, ActionAllocateContainerSubnet, a.Type)
}
}
func TestBuildPlan_PodmanDNSEnabledTriggersRecreate(t *testing.T) {
desired := desiredWithPodman()
srvA := convergedServer("1.1.1.1", "AAAAAAAA=", "BBBBBBBB=", "100.64.0.1", "10.210.0.0/24")
srvA.Namespaces[DefaultNamespace].DNSEnabled = true // drift: aardvark-dns would squat :53
srvB := convergedServer("2.2.2.2", "BBBBBBBB=", "AAAAAAAA=", "100.64.0.2", "10.210.1.0/24")
current := MeshState{Servers: map[string]*ServerState{"1.1.1.1": srvA, "2.2.2.2": srvB}}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
var aTypes, bTypes []ActionType
for _, a := range plan.Actions {
if a.Host == "1.1.1.1" {
aTypes = append(aTypes, a.Type)
}
if a.Host == "2.2.2.2" {
bTypes = append(bTypes, a.Type)
}
}
assert.Contains(t, aTypes, ActionRecreatePodmanNet, "host A must recreate (dns_enabled=true)")
assert.NotContains(t, aTypes, ActionCreatePodmanNet, "host A already exists — only recreate")
assert.NotContains(t, bTypes, ActionRecreatePodmanNet, "host B fine, no recreate")
assert.NotContains(t, bTypes, ActionCreatePodmanNet, "host B fine, no create")
}
func TestBuildPlan_FirewallMissing(t *testing.T) {
desired := desiredWithPodman()
srvA := convergedServer("1.1.1.1", "AAAAAAAA=", "BBBBBBBB=", "100.64.0.1", "10.210.0.0/24")
srvA.FirewallActive = false
current := MeshState{
Servers: map[string]*ServerState{
"1.1.1.1": srvA,
"2.2.2.2": convergedServer("2.2.2.2", "BBBBBBBB=", "AAAAAAAA=", "100.64.0.2", "10.210.1.0/24"),
},
}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
var aTypes []ActionType
for _, a := range plan.Actions {
if a.Host == "1.1.1.1" {
aTypes = append(aTypes, a.Type)
}
}
assert.Equal(t, []ActionType{ActionInstallFirewall}, aTypes)
}
func TestBuildPlan_NftUnavailable_ReturnsError(t *testing.T) {
desired := desiredWithPodman()
desired.DefaultDenyContainers = true
current := MeshState{
Servers: map[string]*ServerState{
"1.1.1.1": {
Host: "1.1.1.1",
NftAvailable: false,
},
"2.2.2.2": {
Host: "2.2.2.2",
NftAvailable: false,
},
},
}
_, err := BuildPlan(desired, current)
require.Error(t, err)
assert.Contains(t, err.Error(), "nft binary not available")
}
func TestBuildPlan_DefaultDenyRequiresPodman(t *testing.T) {
desired := desiredTwoHosts()
desired.DefaultDenyContainers = true // InstallPodman left false
_, err := BuildPlan(desired, MeshState{Servers: map[string]*ServerState{}})
require.Error(t, err)
assert.Contains(t, err.Error(), "--default-deny requires --podman")
}
func TestBuildPlan_DefaultDenyDriftReinstalls(t *testing.T) {
desired := desiredWithPodman()
desired.DefaultDenyContainers = true
// Both hosts converged in mode A (default-deny OFF) — must reinstall to flip on.
srvA := convergedServer("1.1.1.1", "AAAAAAAA=", "BBBBBBBB=", "100.64.0.1", "10.210.0.0/24")
srvA.DefaultDenyActive = false
srvA.NftAvailable = true
srvB := convergedServer("2.2.2.2", "BBBBBBBB=", "AAAAAAAA=", "100.64.0.2", "10.210.1.0/24")
srvB.DefaultDenyActive = false
srvB.NftAvailable = true
current := MeshState{Servers: map[string]*ServerState{"1.1.1.1": srvA, "2.2.2.2": srvB}}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
for _, h := range []string{"1.1.1.1", "2.2.2.2"} {
var found bool
for _, a := range plan.Actions {
if a.Host == h && a.Type == ActionInstallFirewall {
found = true
break
}
}
assert.True(t, found, "expected ActionInstallFirewall for %s", h)
}
}
func TestBuildPlan_DefaultDenyConverged(t *testing.T) {
desired := desiredWithPodman()
desired.DefaultDenyContainers = true
srvA := convergedServer("1.1.1.1", "AAAAAAAA=", "BBBBBBBB=", "100.64.0.1", "10.210.0.0/24")
srvA.DefaultDenyActive = true
srvA.NftAvailable = true
srvA.FirewallUnitSha256 = sha256Hex([]byte(FirewallServiceUnit("wg0",
[]string{"default"}, []*net.IPNet{mustParseCIDR("10.210.0.0/24")}, true)))
srvB := convergedServer("2.2.2.2", "BBBBBBBB=", "AAAAAAAA=", "100.64.0.2", "10.210.1.0/24")
srvB.DefaultDenyActive = true
srvB.NftAvailable = true
srvB.FirewallUnitSha256 = sha256Hex([]byte(FirewallServiceUnit("wg0",
[]string{"default"}, []*net.IPNet{mustParseCIDR("10.210.1.0/24")}, true)))
current := MeshState{Servers: map[string]*ServerState{"1.1.1.1": srvA, "2.2.2.2": srvB}}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
assert.True(t, plan.IsEmpty(), "expected empty plan, got: %+v", plan.Actions)
}
func TestBuildPlan_SurfacesWarnings(t *testing.T) {
desired := desiredTwoHosts()
current := MeshState{
Servers: map[string]*ServerState{
"1.1.1.1": {Host: "1.1.1.1", WireGuardMgmtIP: net.ParseIP("100.64.0.5").To4()},
"2.2.2.2": {Host: "2.2.2.2", WireGuardMgmtIP: net.ParseIP("100.64.0.5").To4()},
},
}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
assert.NotEmpty(t, plan.Warnings, "expected warning for duplicate mgmt IP")
}
func TestBuildPlan_MultiNamespacePlansPerNamespace(t *testing.T) {
desired := desiredWithPodman()
desired.Namespaces = []string{DefaultNamespace, "alpha"}
current := MeshState{Servers: map[string]*ServerState{}}
plan, err := BuildPlan(desired, current)
require.NoError(t, err)
// Two hosts × two namespaces = four create-podman-net actions.
var creates []PlannedAction
for _, a := range plan.Actions {
if a.Type == ActionCreatePodmanNet {
creates = append(creates, a)
}
}
assert.Len(t, creates, 4)
namespaces := map[string]bool{}
for _, a := range creates {
namespaces[a.Namespace] = true
}
assert.True(t, namespaces[DefaultNamespace])
assert.True(t, namespaces["alpha"])
// SubnetAssignments is namespace → host → subnet.
assert.NotNil(t, plan.SubnetAssignments[DefaultNamespace])
assert.NotNil(t, plan.SubnetAssignments["alpha"])
assert.NotEqual(t, plan.SubnetAssignments[DefaultNamespace]["1.1.1.1"].String(),
plan.SubnetAssignments["alpha"]["1.1.1.1"].String(),
"namespaces must carve disjoint subnets")
}
func TestBuildPlan_PodmanRequiresNamespace(t *testing.T) {
desired := desiredTwoHosts()
desired.InstallPodman = true
// no namespaces set
_, err := BuildPlan(desired, MeshState{Servers: map[string]*ServerState{}})
require.Error(t, err)
assert.Contains(t, err.Error(), "namespace")
}
func TestBinaryVersionDrift(t *testing.T) {
tests := []struct {
name string
desiredVersion string
installed bool
haveVersion string
wantDrift bool
}{
{"not installed", "nightly", false, "", true},
{"installed no marker", "nightly", true, "", true},
{"nightly always drifts", "nightly", true, "nightly", true},
{"pinned matches", "v1.2.3", true, "v1.2.3", false},
{"pinned mismatch", "v1.2.4", true, "v1.2.3", true},
{"pinned no marker", "v1.2.3", true, "", true},
}
for _, tt := range tests {
t.Run(tt.name, func(t *testing.T) {
got := binaryVersionDrift(tt.desiredVersion, tt.installed, tt.haveVersion)
assert.Equal(t, tt.wantDrift, got)
})
}
}
func TestBuildPlan_CooldVersionDrift(t *testing.T) {
desired := desiredWithPodman()
desired.InstallCoold = true
desired.CooldVersion = "v1.2.3"
desired.CorrosionVersion = "v1.2.3"
desired.CorrosionGossipPort = 8787
desired.CorrosionAPIPort = 8080
host := "1.1.1.1"
sn := mustParseCIDR("10.210.0.0/24")
fwHash := sha256Hex([]byte(FirewallServiceUnit("wg0", []string{"default"}, []*net.IPNet{sn}, false)))
state := &ServerState{
Host: host, Installed: true, KeysExist: true, Active: true,
PodmanInstalled: true, PodmanSocketActive: true, IPForwardEnabled: true,
FirewallActive: true, DefaultDenyActive: false, FirewallUnitSha256: fwHash,
CorrosionInstalled: true, CooldInstalled: true,
CorrosionVersion: "v1.2.3", CooldVersion: "v1.2.2", // coold is stale
CorrosionActive: true, CooldActive: true,
Namespaces: map[string]*NamespaceServerState{
DefaultNamespace: {Namespace: DefaultNamespace, NetworkExists: true, ContainerSubnet: sn, Label: DefaultNamespace},
},
}
plan, err := BuildPlan(desired, MeshState{Servers: map[string]*ServerState{host: state}})
require.NoError(t, err)
types := make(map[ActionType]bool)
for _, a := range plan.Actions {
if a.Host == host {
types[a.Type] = true
}
}
assert.True(t, types[ActionInstallCoold], "stale coold version should trigger install-coold")
assert.False(t, types[ActionInstallCorrosion], "matching corrosion version should not trigger install")
}
func TestBuildPlan_CooldNightlyAlwaysDrifts(t *testing.T) {
desired := desiredWithPodman()
desired.InstallCoold = true
desired.CooldVersion = "nightly"
desired.CorrosionVersion = "nightly"
desired.CorrosionGossipPort = 8787
desired.CorrosionAPIPort = 8080
host := "1.1.1.1"
sn := mustParseCIDR("10.210.0.0/24")
fwHash := sha256Hex([]byte(FirewallServiceUnit("wg0", []string{"default"}, []*net.IPNet{sn}, false)))
state := &ServerState{
Host: host, Installed: true, KeysExist: true, Active: true,
PodmanInstalled: true, PodmanSocketActive: true, IPForwardEnabled: true,
FirewallActive: true, DefaultDenyActive: false, FirewallUnitSha256: fwHash,
CorrosionInstalled: true, CooldInstalled: true,
CorrosionVersion: "nightly", CooldVersion: "nightly",
CorrosionActive: true, CooldActive: true,
Namespaces: map[string]*NamespaceServerState{
DefaultNamespace: {Namespace: DefaultNamespace, NetworkExists: true, ContainerSubnet: sn, Label: DefaultNamespace},
},
}
plan, err := BuildPlan(desired, MeshState{Servers: map[string]*ServerState{host: state}})
require.NoError(t, err)
types := make(map[ActionType]bool)
for _, a := range plan.Actions {
if a.Host == host {
types[a.Type] = true
}
}
assert.True(t, types[ActionInstallCoold], "nightly tag always triggers install-coold")
assert.True(t, types[ActionInstallCorrosion], "nightly tag always triggers install-corrosion")
}
@@ -0,0 +1,318 @@
package wireguard
import (
"context"
"fmt"
"net"
"strconv"
"strings"
"github.com/coollabsio/coolify-cli/internal/ssh"
)
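// Probe SSHes into host and reads its current WireGuard + Podman state.
// Every command ends in a shell fallback (`|| true`, `|| echo 0`, or
// `&& echo yes || echo no`) so a missing package or interface never causes
// a non-zero exit that would abort the probe. Errors returned by runner.Run
// are discarded as well, so an unreachable host yields a zero-value state
// rather than an error.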
func Probe(ctx context.Context, runner ssh.Runner, host, user string, port int, iface string, namespaces []string) (*ServerState, error) {
state := &ServerState{
Host: host,
Interface: iface,
Namespaces: map[string]*NamespaceServerState{},
}
// 1. Check if WireGuard is installed.
stdout, _, _ := runner.Run(ctx, host, user, port,
`dpkg-query -W -f='${Status}' wireguard 2>/dev/null | grep -c 'install ok installed' || echo 0`)
if strings.TrimSpace(stdout) == "1" {
state.Installed = true
}
// 2. Read public key.
stdout, _, _ = runner.Run(ctx, host, user, port,
`cat /etc/wireguard/publickey 2>/dev/null || true`)
if pk := strings.TrimSpace(stdout); pk != "" {
state.PublicKey = pk
state.KeysExist = true
}
// 3. Parse the config file for management IP and peer list.
stdout, _, _ = runner.Run(ctx, host, user, port,
fmt.Sprintf(`cat /etc/wireguard/%s.conf 2>/dev/null || true`, iface))
if strings.TrimSpace(stdout) != "" {
parseConfigFile(state, stdout)
}
// 4. Check if WG interface is currently up.
stdout, _, _ = runner.Run(ctx, host, user, port,
fmt.Sprintf(`wg show %s dump 2>/dev/null || true`, iface))
if strings.TrimSpace(stdout) != "" {
state.Active = true
}
// 5. Podman package installed.
stdout, _, _ = runner.Run(ctx, host, user, port,
`dpkg-query -W -f='${Status}' podman 2>/dev/null | grep -c 'install ok installed' || echo 0`)
if strings.TrimSpace(stdout) == "1" {
state.PodmanInstalled = true
}
// 6. podman.socket active.
stdout, _, _ = runner.Run(ctx, host, user, port,
`systemctl is-active podman.socket 2>/dev/null || true`)
if strings.TrimSpace(stdout) == "active" {
state.PodmanSocketActive = true
}
// 7. Per-namespace podman network state.
if state.PodmanInstalled {
for _, ns := range namespaces {
nss := &NamespaceServerState{Namespace: ns}
netName := PodmanNetworkFor(ns)
stdout, _, _ = runner.Run(ctx, host, user, port,
fmt.Sprintf(`podman network exists %s 2>/dev/null && echo yes || echo no`, netName))
if strings.TrimSpace(stdout) == "yes" {
nss.NetworkExists = true
stdout, _, _ = runner.Run(ctx, host, user, port,
fmt.Sprintf(`podman network inspect %s -f '{{(index .Subnets 0).Subnet}}' 2>/dev/null || true`, netName))
if s := strings.TrimSpace(stdout); s != "" {
if _, n, err := net.ParseCIDR(s); err == nil {
nss.ContainerSubnet = n
}
}
stdout, _, _ = runner.Run(ctx, host, user, port,
fmt.Sprintf(`podman network inspect %s -f '{{.DNSEnabled}}' 2>/dev/null || true`, netName))
if strings.TrimSpace(stdout) == "true" {
nss.DNSEnabled = true
}
stdout, _, _ = runner.Run(ctx, host, user, port,
fmt.Sprintf(`podman network inspect %s -f '{{index .Labels "io.coolify.namespace"}}' 2>/dev/null || true`, netName))
nss.Label = strings.TrimSpace(stdout)
}
state.Namespaces[ns] = nss
}
}
// 8. IP forwarding enabled.
stdout, _, _ = runner.Run(ctx, host, user, port,
`sysctl -n net.ipv4.ip_forward 2>/dev/null || echo 0`)
if strings.TrimSpace(stdout) == "1" {
state.IPForwardEnabled = true
}
// 9. coolify-mesh-fw.service active.
stdout, _, _ = runner.Run(ctx, host, user, port,
`systemctl is-active coolify-mesh-fw.service 2>/dev/null || true`)
if strings.TrimSpace(stdout) == "active" {
state.FirewallActive = true
}
// 9a. Firewall unit hash — detects drift when the desired namespace set
// changes (FORWARD jumps gain/lose subnets).
stdout, _, _ = runner.Run(ctx, host, user, port,
`sha256sum /etc/systemd/system/coolify-mesh-fw.service 2>/dev/null | awk '{print $1}' || true`)
if h := strings.TrimSpace(stdout); h != "" {
state.FirewallUnitSha256 = h
}
// 10. Default-deny scaffold present (COOLIFY-INTRA chain ends in DROP).
stdout, _, _ = runner.Run(ctx, host, user, port,
`iptables -nL COOLIFY-INTRA 2>/dev/null | grep -q DROP && echo yes || echo no`)
if strings.TrimSpace(stdout) == "yes" {
state.DefaultDenyActive = true // will be AND-ed with BridgeTableExists below
}
// 10a. nft binary available.
stdout, _, _ = runner.Run(ctx, host, user, port,
`command -v nft >/dev/null 2>&1 && echo yes || echo no`)
if strings.TrimSpace(stdout) == "yes" {
state.NftAvailable = true
}
// 10b. nft bridge table for intra-namespace default-deny present.
stdout, _, _ = runner.Run(ctx, host, user, port,
`nft list table bridge coolify_bridge >/dev/null 2>&1 && echo yes || echo no`)
if strings.TrimSpace(stdout) == "yes" {
state.BridgeTableExists = true
}
state.DefaultDenyActive = state.DefaultDenyActive && state.BridgeTableExists
// 11. Corrosion binary installed.
stdout, _, _ = runner.Run(ctx, host, user, port,
`test -x /usr/local/bin/corrosion && echo yes || echo no`)
if strings.TrimSpace(stdout) == "yes" {
state.CorrosionInstalled = true
}
// 12. Corrosion systemd service active.
stdout, _, _ = runner.Run(ctx, host, user, port,
`systemctl is-active corrosion 2>/dev/null || true`)
if strings.TrimSpace(stdout) == "active" {
state.CorrosionActive = true
}
// 13. Corrosion config hash (empty when missing).
stdout, _, _ = runner.Run(ctx, host, user, port,
`sha256sum /etc/corrosion/config.toml 2>/dev/null | awk '{print $1}' || true`)
if h := strings.TrimSpace(stdout); h != "" {
state.CorrosionConfigHash = h
}
// 14. Corrosion schema file present.
stdout, _, _ = runner.Run(ctx, host, user, port,
`test -f /etc/corrosion/schemas/coolify.sql && echo yes || echo no`)
if strings.TrimSpace(stdout) == "yes" {
state.CorrosionSchemaExists = true
}
// 14a. sha256 of remote schema file (empty when absent). Used to detect
// schema revisions so a new schema triggers re-write + DB reset.
stdout, _, _ = runner.Run(ctx, host, user, port,
`sha256sum /etc/corrosion/schemas/coolify.sql 2>/dev/null | awk '{print $1}' || true`)
if h := strings.TrimSpace(stdout); h != "" {
state.CorrosionSchemaSha256 = h
}
// 15. Coold binary installed.
stdout, _, _ = runner.Run(ctx, host, user, port,
`test -x /usr/local/bin/coold && echo yes || echo no`)
if strings.TrimSpace(stdout) == "yes" {
state.CooldInstalled = true
}
// 15a. version marker for corrosion (empty when absent / pre-migration).
stdout, _, _ = runner.Run(ctx, host, user, port,
`cat /usr/local/bin/corrosion.version 2>/dev/null || true`)
state.CorrosionVersion = strings.TrimSpace(stdout)
// 15b. version marker for coold (empty when absent / pre-migration).
stdout, _, _ = runner.Run(ctx, host, user, port,
`cat /usr/local/bin/coold.version 2>/dev/null || true`)
state.CooldVersion = strings.TrimSpace(stdout)
// 15c. sha256 of remote coold.service unit (empty when absent).
stdout, _, _ = runner.Run(ctx, host, user, port,
`sha256sum /etc/systemd/system/coold.service 2>/dev/null | awk '{print $1}' || true`)
if h := strings.TrimSpace(stdout); h != "" {
state.CooldUnitSha256 = h
}
// 16. Coold systemd service active.
stdout, _, _ = runner.Run(ctx, host, user, port,
`systemctl is-active coold 2>/dev/null || true`)
if strings.TrimSpace(stdout) == "active" {
state.CooldActive = true
}
return state, nil
}
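// Reconstruct runs Probe on every host in parallel and assembles a MeshState.
// A host whose probe fails gets an empty placeholder ServerState, and the
// combined probe errors are returned together with the partial mesh.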
func Reconstruct(
ctx context.Context,
runner ssh.Runner,
hosts []string,
user string,
port int,
iface string,
namespaces []string,
concurrency int,
) (MeshState, error) {
results := ssh.ForEachServer(ctx, hosts, concurrency, func(ctx context.Context, host string) (*ServerState, error) {
return Probe(ctx, runner, host, user, port, iface, namespaces)
})
mesh := MeshState{Servers: make(map[string]*ServerState, len(hosts))}
var errs []string
for _, r := range results {
if r.Err != nil {
errs = append(errs, fmt.Sprintf("%s: %v", r.Host, r.Err))
mesh.Servers[r.Host] = &ServerState{Host: r.Host, Interface: iface, Namespaces: map[string]*NamespaceServerState{}}
continue
}
mesh.Servers[r.Host] = r.Result
}
if len(errs) > 0 {
return mesh, fmt.Errorf("probe errors:\n %s", strings.Join(errs, "\n "))
}
return mesh, nil
}
// parseConfigFile extracts WireGuard management IP, listen port, and peer list
// from the text content of /etc/wireguard/<iface>.conf.
func parseConfigFile(state *ServerState, content string) {
var (
inInterface bool
inPeer bool
currentPeer Peer
)
for _, line := range strings.Split(content, "\n") {
line = strings.TrimSpace(line)
if line == "" || strings.HasPrefix(line, "#") {
continue
}
switch strings.ToLower(line) {
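case "[interface]":
// A [Peer] section may precede [Interface]; flush it so it is not lost.
if inPeer && currentPeer.PublicKey != "" {
state.Peers = append(state.Peers, currentPeer)
}
currentPeer = Peer{}
inInterface = true
inPeer = false
continue
case "[peer]":
// Flush the previous peer; skip sections without a PublicKey, matching
// the final flush after the loop.
if inPeer && currentPeer.PublicKey != "" {
state.Peers = append(state.Peers, currentPeer)
}
currentPeer = Peer{}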
inInterface = false
inPeer = true
continue
}
key, value, ok := strings.Cut(line, "=")
if !ok {
continue
}
key = strings.TrimSpace(key)
value = strings.TrimSpace(value)
if inInterface {
switch strings.ToLower(key) {
case "address":
// Parse the host portion of "<ip>/<prefix>"; this is the
// actual management IP, not the network address.
ip, _, err := net.ParseCIDR(value)
if err == nil {
state.WireGuardMgmtIP = ip.To4()
}
case "listenport":
if p, err := strconv.Atoi(value); err == nil {
state.ListenPort = p
}
}
}
if inPeer {
switch strings.ToLower(key) {
case "publickey":
currentPeer.PublicKey = value
case "endpoint":
currentPeer.Endpoint = value
case "allowedips":
for _, a := range strings.Split(value, ",") {
currentPeer.AllowedIPs = append(currentPeer.AllowedIPs, strings.TrimSpace(a))
}
case "presharedkey":
currentPeer.PresharedKey = value
case "persistentkeepalive":
if n, err := strconv.Atoi(value); err == nil {
currentPeer.PersistentKeepalive = n
}
}
}
}
if inPeer && currentPeer.PublicKey != "" {
state.Peers = append(state.Peers, currentPeer)
}
}
+219
@@ -0,0 +1,219 @@
package wireguard
import (
"context"
"os"
"path/filepath"
"strings"
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
// fakeReconRunner is a deterministic ssh.Runner for reconstruct unit tests.
type fakeReconRunner struct {
responses map[string]string
}
func (f *fakeReconRunner) Run(_ context.Context, _, _ string, _ int, cmd string) (string, string, error) {
for substr, resp := range f.responses {
if strings.Contains(cmd, substr) {
return resp, "", nil
}
}
return "", "", nil
}
func readFixture(t *testing.T, name string) string {
t.Helper()
path := filepath.Join("..", "..", "test", "fixtures", "wg", name)
b, err := os.ReadFile(path)
require.NoError(t, err, "missing fixture %s", name)
return string(b)
}
func TestParseConfigFile_Full(t *testing.T) {
content := readFixture(t, "wg0.conf")
state := &ServerState{}
parseConfigFile(state, content)
require.NotNil(t, state.WireGuardMgmtIP)
assert.Equal(t, "100.64.0.1", state.WireGuardMgmtIP.String())
assert.Equal(t, 51820, state.ListenPort)
require.Len(t, state.Peers, 1)
assert.Equal(t, "BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBK=", state.Peers[0].PublicKey)
assert.Equal(t, "203.0.113.11:51820", state.Peers[0].Endpoint)
assert.Equal(t, 25, state.Peers[0].PersistentKeepalive)
}
func TestParseConfigFile_Empty(t *testing.T) {
state := &ServerState{}
parseConfigFile(state, "")
assert.Nil(t, state.WireGuardMgmtIP)
assert.Empty(t, state.Peers)
}
func TestParseConfigFile_MultiplePeers(t *testing.T) {
content := `[Interface]
Address = 100.64.0.1/32
ListenPort = 51820
PrivateKey = aaa
[Peer]
PublicKey = BBB=
AllowedIPs = 100.64.0.2/32, 10.210.1.0/24
Endpoint = 1.2.3.4:51820
PersistentKeepalive = 25
[Peer]
PublicKey = CCC=
AllowedIPs = 100.64.0.3/32, 10.210.2.0/24
Endpoint = 1.2.3.5:51820
PersistentKeepalive = 25
`
state := &ServerState{}
parseConfigFile(state, content)
require.Len(t, state.Peers, 2)
assert.Equal(t, "BBB=", state.Peers[0].PublicKey)
assert.Equal(t, "CCC=", state.Peers[1].PublicKey)
}
func TestParseConfigFile_IgnoresComments(t *testing.T) {
content := `# This is a comment
[Interface]
# Another comment
Address = 100.64.0.5/32
ListenPort = 51820
PrivateKey = xxx
`
state := &ServerState{}
parseConfigFile(state, content)
require.NotNil(t, state.WireGuardMgmtIP)
assert.Equal(t, "100.64.0.5", state.WireGuardMgmtIP.String())
assert.Empty(t, state.Peers)
}
func TestParseConfigFile_CaseInsensitiveKeys(t *testing.T) {
content := `[interface]
address = 100.64.0.10/32
listenport = 12345
privatekey = xxx
`
state := &ServerState{}
parseConfigFile(state, content)
require.NotNil(t, state.WireGuardMgmtIP)
assert.Equal(t, "100.64.0.10", state.WireGuardMgmtIP.String())
assert.Equal(t, 12345, state.ListenPort)
}
func TestMeshState_AssignedMgmtIPs(t *testing.T) {
mesh := MeshState{
Servers: map[string]*ServerState{
"a": {Host: "a", WireGuardMgmtIP: []byte{100, 64, 0, 1}},
"b": {Host: "b", WireGuardMgmtIP: nil},
"c": {Host: "c", WireGuardMgmtIP: []byte{100, 64, 0, 3}},
},
}
ips := mesh.AssignedMgmtIPs()
assert.Len(t, ips, 2)
assert.Contains(t, ips, "a")
assert.NotContains(t, ips, "b")
assert.Contains(t, ips, "c")
}
func TestMeshState_AssignedContainerSubnets(t *testing.T) {
mesh := MeshState{
Servers: map[string]*ServerState{
"a": {Host: "a", Namespaces: map[string]*NamespaceServerState{
DefaultNamespace: {Namespace: DefaultNamespace, ContainerSubnet: mustParseCIDR("10.210.0.0/24")},
"alpha": {Namespace: "alpha", ContainerSubnet: mustParseCIDR("10.220.0.0/24")},
}},
"b": {Host: "b", Namespaces: map[string]*NamespaceServerState{
DefaultNamespace: {Namespace: DefaultNamespace}, // ContainerSubnet nil
}},
"c": {Host: "c", Namespaces: map[string]*NamespaceServerState{
DefaultNamespace: {Namespace: DefaultNamespace, ContainerSubnet: mustParseCIDR("10.210.2.0/24")},
}},
},
}
subs := mesh.AssignedContainerSubnets()
// Nested: namespace → host → subnet.
assert.Contains(t, subs[DefaultNamespace], "a")
assert.NotContains(t, subs[DefaultNamespace], "b")
assert.Contains(t, subs[DefaultNamespace], "c")
assert.Contains(t, subs["alpha"], "a")
}
func TestTruncateKey(t *testing.T) {
tests := []struct {
input string
want string
}{
{"", ""},
{"short", "short"},
{"12345678", "12345678"},
{"123456789", "12345678..."},
{"AAAAAAAABBBBBBBB", "AAAAAAAA..."},
}
for _, tt := range tests {
assert.Equal(t, tt.want, truncateKey(tt.input), "input: %q", tt.input)
}
}
func TestProbe_NftAvailableAndBridgeTableExists_True(t *testing.T) {
runner := &fakeReconRunner{
responses: map[string]string{
"dpkg-query": "1\n",
"wg show": "",
"cat /etc/wireguard/": "",
"wg pubkey": "",
"ip -4 -o addr show": "",
"systemctl is-active wg-quick": "active\n",
"podman --version": "podman version 4.9.0\n",
"systemctl is-active podman.socket": "active\n",
"sysctl net.ipv4.ip_forward": "net.ipv4.ip_forward = 1\n",
"podman network inspect": `[{"name":"coolify-default-mesh","subnets":[{"subnet":"10.210.0.0/24","gateway":"10.210.0.1"}],"dns_enabled":false,"labels":{"io.coolify.managed":"true","io.coolify.namespace":"default"}}]` + "\n",
"systemctl is-active coolify-mesh-fw": "active\n",
"sha256sum /etc/systemd/system/coolify-mesh-fw.service": "",
"iptables -nL COOLIFY-INTRA": "yes\n",
"command -v nft": "yes\n",
"nft list table bridge coolify_bridge": "yes\n",
"test -x /usr/local/bin/corrosion": "yes\n",
"systemctl is-active corrosion": "active\n",
"sha256sum /etc/corrosion/config.toml": "",
"test -x /usr/local/bin/coold": "yes\n",
"systemctl is-active coold": "active\n",
// Version markers live next to the binaries (see Probe steps 15a/15b).
"cat /usr/local/bin/coold.version": "",
"cat /usr/local/bin/corrosion.version": "",
},
}
state, err := Probe(context.Background(), runner, "1.1.1.1", "root", 22, "wg0", []string{"default"})
require.NoError(t, err)
assert.True(t, state.NftAvailable, "NftAvailable should be true")
assert.True(t, state.BridgeTableExists, "BridgeTableExists should be true")
// DefaultDenyActive = COOLIFY-INTRA DROP && BridgeTableExists
assert.True(t, state.DefaultDenyActive, "DefaultDenyActive should be true when both conditions met")
}
func TestProbe_NftNotAvailable_BridgeTableAbsent(t *testing.T) {
runner := &fakeReconRunner{
responses: map[string]string{
"dpkg-query": "1\n",
"iptables -nL COOLIFY-INTRA": "yes\n",
"command -v nft": "no\n",
"nft list table bridge coolify_bridge": "no\n",
},
}
state, err := Probe(context.Background(), runner, "1.1.1.1", "root", 22, "wg0", []string{"default"})
require.NoError(t, err)
assert.False(t, state.NftAvailable, "NftAvailable should be false")
assert.False(t, state.BridgeTableExists, "BridgeTableExists should be false")
// DefaultDenyActive must be false even though COOLIFY-INTRA has DROP
assert.False(t, state.DefaultDenyActive, "DefaultDenyActive should be false when BridgeTableExists is false")
}
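fakeReconRunner picks a response by substring while ranging over a map, and Go randomizes map iteration order, so the result is only deterministic while no command matches more than one key. A possible hardening, shown here as a sketch with a hypothetical `longestMatch` helper (not part of the package), prefers the longest matching key and breaks ties lexically:

```go
package main

import (
	"fmt"
	"strings"
)

// longestMatch returns the response for the longest key that is a substring
// of cmd. Equal-length matches are resolved by lexical order, so the result
// never depends on map iteration order.
func longestMatch(responses map[string]string, cmd string) (string, bool) {
	best, found := "", false
	for substr := range responses {
		if !strings.Contains(cmd, substr) {
			continue
		}
		longer := len(substr) > len(best)
		tie := len(substr) == len(best) && substr < best
		if !found || longer || tie {
			best, found = substr, true
		}
	}
	if !found {
		return "", false
	}
	return responses[best], true
}

func main() {
	responses := map[string]string{
		"systemctl is-active":       "inactive\n",
		"systemctl is-active coold": "active\n",
	}
	// Both keys match this command; the longer, more specific key wins.
	out, _ := longestMatch(responses, "systemctl is-active coold 2>/dev/null || true")
	fmt.Print(out)
}
```

With this rule a test can register a generic fallback key alongside specific ones without the outcome varying between runs.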
+400
@@ -0,0 +1,400 @@
// Package wireguard implements the WireGuard mesh bootstrap logic for
// the coolify init command (alpha, Coolify v5).
package wireguard
import (
"net"
"sort"
)
// DefaultNamespace is the namespace used when the user does not pass
// --namespaces. It is also always present even in a multi-namespace setup —
// coold's config assumes a `default` entry.
const DefaultNamespace = "default"
// PodmanNetworkFor returns the podman bridge name backing namespace ns on
// every host. Derived as `coolify-<ns>-mesh` so the namespace is visible
// directly in `podman network ls`.
func PodmanNetworkFor(ns string) string {
return "coolify-" + ns + "-mesh"
}
// Peer represents a single WireGuard peer as seen in the config or
// from `wg show <iface> dump`.
type Peer struct {
PublicKey string
PresharedKey string // "(none)" in `wg show <iface> dump` output when absent; empty when the config file has no PresharedKey line
Endpoint string // "ip:port" or empty
AllowedIPs []string
LatestHandshake int64 // Unix timestamp; 0 means no handshake yet
PersistentKeepalive int // seconds; 0 means disabled
}
// NamespaceServerState captures per-namespace podman state on one host. A
// ServerState carries one entry per namespace in the desired set.
type NamespaceServerState struct {
// Namespace is the logical namespace name (e.g. "default", "alpha").
Namespace string
// NetworkExists is true when the per-namespace podman bridge
// (coolify-<ns>-mesh) already exists on this host.
NetworkExists bool
// ContainerSubnet is the /<prefix> owned by the per-namespace bridge
// (read from `podman network inspect`). nil when not yet created.
ContainerSubnet *net.IPNet
// DNSEnabled is true when the per-namespace network has `dns_enabled=true`
// (netavark auto-starts aardvark-dns on the bridge gateway:53). coold owns
// that socket, so drift triggers ActionRecreatePodmanNet.
DNSEnabled bool
// Label is the `io.coolify.namespace` label on the network. Used only as
// an assertion that the network was created by us — label mismatch is
// treated like "the network exists but is not ours" and triggers recreate.
Label string
}
// ServerState holds the reconstructed WireGuard + Podman state for one server.
// It is built from live SSH probes and never cached to disk.
type ServerState struct {
// Host is the SSH address used to reach this server.
// It also serves as the WireGuard Endpoint value for peer configs.
Host string
// Installed is true when the wireguard package is present.
Installed bool
// KeysExist is true when /etc/wireguard/privatekey exists.
KeysExist bool
// PublicKey is the content of /etc/wireguard/publickey (trimmed).
// Empty when KeysExist is false.
PublicKey string
// WireGuardMgmtIP is the /32 management IP assigned to wg0 (parsed from
// the [Interface] Address line). Lives outside the container pool so the
// Podman bridge can own the full per-host /24 without conflict.
// nil when not yet assigned.
WireGuardMgmtIP net.IP
// ListenPort is the WireGuard listen port from the config.
ListenPort int
// Interface is the WireGuard interface name (e.g., "wg0").
Interface string
// Active is true when `wg show <iface>` returns output (interface up).
Active bool
// Peers lists the peers currently present in the config file.
Peers []Peer
// PodmanInstalled is true when the podman package is present.
PodmanInstalled bool
// PodmanSocketActive is true when podman.socket systemd unit is active.
PodmanSocketActive bool
// Namespaces maps namespace name → per-namespace podman state on this
// host. Populated by Probe for every namespace in the desired set.
Namespaces map[string]*NamespaceServerState
// IPForwardEnabled is true when net.ipv4.ip_forward == 1.
IPForwardEnabled bool
// FirewallActive is true when coolify-mesh-fw.service is active.
FirewallActive bool
// DefaultDenyActive is true when the COOLIFY-INTRA chain contains a DROP
// rule and the nft bridge table exists (Probe ANDs the iptables check with
// BridgeTableExists), i.e. the full default-deny scaffold is in place.
DefaultDenyActive bool
// FirewallUnitSha256 is the sha256 of /etc/systemd/system/coolify-mesh-fw.service
// (hex), or empty when absent. Used to detect unit drift when the desired
// set of namespace subnets changes.
FirewallUnitSha256 string
// BridgeTableExists is true when `nft list table bridge coolify_bridge`
// succeeds on this host (nft bridge-family deny scaffold is in place).
BridgeTableExists bool
// NftAvailable is true when `command -v nft` succeeds on this host (the nft
// binary is on PATH).
NftAvailable bool
// CorrosionInstalled is true when /usr/local/bin/corrosion exists and is executable.
CorrosionInstalled bool
// CorrosionActive is true when the corrosion systemd service is active.
CorrosionActive bool
// CorrosionConfigHash is the sha256 of /etc/corrosion/config.toml, or empty
// when the file is absent. Used to detect drift when peer list changes.
CorrosionConfigHash string
// CorrosionSchemaExists is true when /etc/corrosion/schemas/coolify.sql exists.
CorrosionSchemaExists bool
// CorrosionSchemaSha256 is the sha256 of /etc/corrosion/schemas/coolify.sql
// (hex), or empty when absent. Used by BuildPlan to detect schema drift so
// a new schema revision triggers re-write + corrosion restart + DB reset.
CorrosionSchemaSha256 string
// CooldInstalled is true when /usr/local/bin/coold exists and is executable.
CooldInstalled bool
// CooldActive is true when the coold systemd service is active.
CooldActive bool
// CorrosionVersion is the content of /usr/local/bin/corrosion.version
// (trimmed), or empty when absent. Matches the version tag passed to
// CorrosionInstallCommand (e.g. "nightly", "v1.2.3").
CorrosionVersion string
// CooldVersion is the content of /usr/local/bin/coold.version (trimmed),
// or empty when absent.
CooldVersion string
// CooldUnitSha256 is the sha256 of /etc/systemd/system/coold.service (hex),
// or empty when absent. Used by BuildPlan to detect generator changes
// (e.g. Requires→Wants) that would otherwise be invisible.
CooldUnitSha256 string
}
// MeshState is the reconstructed state across all servers in the mesh.
type MeshState struct {
// Servers maps host → *ServerState.
Servers map[string]*ServerState
}
// AssignedMgmtIPs returns a map of host → net.IP for all servers that
// already have a WG management IP assigned.
func (m *MeshState) AssignedMgmtIPs() map[string]net.IP {
out := make(map[string]net.IP, len(m.Servers))
for host, s := range m.Servers {
if s.WireGuardMgmtIP != nil {
out[host] = s.WireGuardMgmtIP
}
}
return out
}
// AssignedContainerSubnets returns the per-(namespace, host) subnets that are
// already assigned on remote podman networks. The result is nested:
// `out[namespace][host] = subnet`.
func (m *MeshState) AssignedContainerSubnets() map[string]map[string]*net.IPNet {
out := map[string]map[string]*net.IPNet{}
for host, s := range m.Servers {
if s == nil {
continue
}
for ns, nss := range s.Namespaces {
if nss == nil || nss.ContainerSubnet == nil {
continue
}
if out[ns] == nil {
out[ns] = map[string]*net.IPNet{}
}
out[ns][host] = nss.ContainerSubnet
}
}
return out
}
// FirewallSubnets returns the sorted-by-namespace list of this host's
// container subnets across all namespaces (one /prefix per namespace). Used
// by the firewall service unit generator.
func (s *ServerState) FirewallSubnets() []*net.IPNet {
var out []*net.IPNet
names := make([]string, 0, len(s.Namespaces))
for n := range s.Namespaces {
names = append(names, n)
}
sort.Strings(names)
for _, n := range names {
if ns := s.Namespaces[n]; ns != nil && ns.ContainerSubnet != nil {
out = append(out, ns.ContainerSubnet)
}
}
return out
}
// DesiredMesh describes the target WireGuard + Podman configuration.
type DesiredMesh struct {
// Hosts lists the SSH addresses of all servers (also used as WG endpoints).
Hosts []string
// Interface is the WireGuard interface name (default "wg0").
Interface string
// MgmtPool is the address pool from which per-host /32 management IPs
// are carved and assigned to wg0 (default 100.64.0.0/16 — RFC 6598 CGNAT).
MgmtPool *net.IPNet
// ContainerPool is the address pool from which per-(namespace, host)
// container subnets are carved (default 10.210.0.0/16). One pool is
// shared across all namespaces so subnets cannot overlap.
ContainerPool *net.IPNet
// ContainerPrefix is the prefix length of each per-host, per-namespace
// container subnet (default 24, giving each host 254 usable container IPs
// per namespace).
ContainerPrefix int
// ListenPort is the WireGuard UDP listen port (default 51820).
ListenPort int
// InstallPodman, when true, installs Podman, enables its socket, creates
// the per-namespace bridge networks, installs firewall rules, and enables
// IP forwarding on each server.
InstallPodman bool
// Namespaces lists every namespace the mesh should carry. Ordered —
// deterministic iteration produces stable subnet assignments. At least
// one entry (typically "default") is always expected.
Namespaces []string
// DefaultDenyContainers, when true (and InstallPodman is true), installs
// default-deny iptables rules for ALL container traffic on the host's
// container subnets (intra-host AND cross-host via wg0). The v5 control
// plane manages the explicit allow-list in the COOLIFY-ALLOW chain.
DefaultDenyContainers bool
// InstallCoold, when true, downloads corrosion + coold from GitHub releases
// to each host, writes their configs/unit files, and enables both services.
// Requires InstallPodman (coold depends on podman.socket).
InstallCoold bool
// CooldVersion is the release tag to download (e.g. "nightly", "v1.2.3").
CooldVersion string
// CorrosionVersion is the release tag to download for corrosion.
CorrosionVersion string
// CorrosionGossipPort is the SWIM gossip UDP port (default 8787).
CorrosionGossipPort int
// CorrosionAPIPort is the corrosion HTTP API port bound to 127.0.0.1 (default 8080).
CorrosionAPIPort int
// CentralHost is the SSH address of the central VM that runs scheduler.
// Empty string disables phases 4+5 (scheduler setup).
// Must be an element of Hosts.
CentralHost string
// SchedulerVersion is the release tag for scheduler (e.g. "nightly").
SchedulerVersion string
// EnableBuilder, when true and BuilderHosts is empty, installs buildah/git
// and the builder binary on every host in Hosts and advertises "builder"
// in each host's JWT caps claim. When BuilderHosts is non-empty it wins
// and EnableBuilder is ignored. Requires a non-empty CentralHost
// (scheduler issues the JWT) and InstallPodman (buildah needs podman's
// containers-storage).
EnableBuilder bool
// BuilderHosts is the explicit list of hosts that should carry the
// builder capability. Empty slice means "fall back to EnableBuilder".
// Hosts not present in this set get `caps:["coold"]` only and the
// builder binary is not installed on them.
BuilderHosts []string
// BuilderCapacity caps concurrent builds per host. 0 falls back to 2 (the
// coold builder adapter's own default).
BuilderCapacity int
// BuilderCPUQuota is the systemd CPUQuota applied to each build subprocess
// (e.g. "200%" for two full cores). Empty string falls back to coold's
// own default ("200%").
BuilderCPUQuota string
// BuilderMemoryMax is the systemd MemoryMax applied to each build
// subprocess (e.g. "2G"). Empty string falls back to coold's own default
// ("2G").
BuilderMemoryMax string
// BuilderTimeoutSecs is the hard per-build wall-clock timeout in seconds.
// 0 falls back to coold's own default (1800).
BuilderTimeoutSecs int
// Intent selects the action filter applied after BuildPlan computes the
// raw action list. IntentBootstrap (default, zero value) emits every
// applicable action (today's behavior). IntentExtend limits destructive
// and version-bump actions to NewHosts only; existing hosts get just the
// peer-refresh actions required to route traffic to the new peer.
// IntentUpgrade emits only binary-fetch + service-restart actions
// cluster-wide.
Intent Intent
// NewHosts is the subset of Hosts that are brand-new to the mesh on this
// run. Only meaningful when Intent == IntentExtend. Empty = treat every
// host as existing (no-op safe mode).
NewHosts []string
// AllowReplace unlocks destructive-replace actions on existing hosts in
// extend mode (e.g. ActionRecreatePodmanNet). Never unlocks the wipe-DB
// branch of ActionWriteCorrosionSchema.
AllowReplace bool
// AllowNightly lets the upgrade intent accept version tag "nightly".
// Upgrade mode otherwise rejects nightly because it forces a re-install
// on every run instead of only when the pinned version changes.
AllowNightly bool
}
// Intent selects the action filter applied by BuildPlan to match the caller's
// operation (first-time bootstrap vs. adding servers vs. bumping agent
// versions). See DesiredMesh.Intent.
type Intent string
const (
// IntentBootstrap allows every action. Matches pre-subcommand-split
// behavior and is the default for DesiredMesh (zero value).
IntentBootstrap Intent = ""
// IntentExtend runs the full install on hosts in NewHosts and limits
// existing hosts to peer-refresh actions (WG config rewrite + service
// reload + corrosion config rewrite + firewall unit reinstall on drift).
IntentExtend Intent = "extend"
// IntentUpgrade only emits binary-fetch actions + the service-restart
// actions that follow them.
IntentUpgrade Intent = "upgrade"
)
// BuilderHostSet returns the set of hosts that should carry the builder
// capability given EnableBuilder + BuilderHosts. Hosts in the result are a
// subset of Hosts.
func (d *DesiredMesh) BuilderHostSet() map[string]bool {
set := make(map[string]bool, len(d.Hosts))
if len(d.BuilderHosts) > 0 {
allow := make(map[string]struct{}, len(d.BuilderHosts))
for _, h := range d.BuilderHosts {
allow[h] = struct{}{}
}
for _, h := range d.Hosts {
if _, ok := allow[h]; ok {
set[h] = true
}
}
return set
}
if d.EnableBuilder {
for _, h := range d.Hosts {
set[h] = true
}
}
return set
}
// HasBuilderCap reports whether host should advertise the builder capability.
func (d *DesiredMesh) HasBuilderCap(host string) bool {
return d.BuilderHostSet()[host]
}
// SortedNamespaces returns the desired namespaces in deterministic order.
func (d *DesiredMesh) SortedNamespaces() []string {
out := append([]string(nil), d.Namespaces...)
sort.Strings(out)
return out
}
+88
@@ -0,0 +1,88 @@
package wireguard
import (
"reflect"
"sort"
"testing"
)
func TestBuilderHostSet_EnableBuilderAppliesToAll(t *testing.T) {
d := &DesiredMesh{
Hosts: []string{"a", "b", "c"},
EnableBuilder: true,
}
got := d.BuilderHostSet()
want := map[string]bool{"a": true, "b": true, "c": true}
if !reflect.DeepEqual(got, want) {
t.Fatalf("want %v, got %v", want, got)
}
}
func TestBuilderHostSet_ExplicitListWinsOverEnable(t *testing.T) {
d := &DesiredMesh{
Hosts: []string{"a", "b", "c"},
EnableBuilder: true,
BuilderHosts: []string{"b"},
}
got := d.BuilderHostSet()
want := map[string]bool{"b": true}
if !reflect.DeepEqual(got, want) {
t.Fatalf("want %v, got %v", want, got)
}
}
func TestBuilderHostSet_FiltersToServersOnly(t *testing.T) {
// A --builder-hosts entry not present in --servers is dropped.
d := &DesiredMesh{
Hosts: []string{"a", "b"},
BuilderHosts: []string{"a", "z"},
}
got := d.BuilderHostSet()
want := map[string]bool{"a": true}
if !reflect.DeepEqual(got, want) {
t.Fatalf("want %v, got %v", want, got)
}
}
func TestBuilderHostSet_DefaultDisabled(t *testing.T) {
d := &DesiredMesh{Hosts: []string{"a"}}
if len(d.BuilderHostSet()) != 0 {
t.Fatalf("want empty set, got %v", d.BuilderHostSet())
}
if d.HasBuilderCap("a") {
t.Fatalf("HasBuilderCap should be false by default")
}
}
func TestBuilderHostSet_EnableBuilderFalse_NoBuilderHosts(t *testing.T) {
d := &DesiredMesh{
Hosts: []string{"a"},
EnableBuilder: false,
}
if len(d.BuilderHostSet()) != 0 {
t.Fatalf("want empty set, got %v", d.BuilderHostSet())
}
}
func TestBuilderHostSet_Stable(t *testing.T) {
// Test that calling twice produces the same set (sanity — no side effects).
d := &DesiredMesh{
Hosts: []string{"a", "b"},
BuilderHosts: []string{"a"},
}
a := keys(d.BuilderHostSet())
b := keys(d.BuilderHostSet())
sort.Strings(a)
sort.Strings(b)
if !reflect.DeepEqual(a, b) {
t.Fatalf("unstable: %v vs %v", a, b)
}
}
func keys(m map[string]bool) []string {
out := make([]string, 0, len(m))
for k := range m {
out = append(out, k)
}
return out
}
+341
@@ -0,0 +1,341 @@
package wireguard
import (
"encoding/binary"
"fmt"
"net"
"sort"
)
// Warning describes a non-fatal conflict discovered during IP allocation.
type Warning struct {
Host string
Reason string
}
// MachineIP returns the host address within a per-host subnet — the first
// usable IP (network address + 1). For example, 10.210.5.0/24 → 10.210.5.1.
//
// Used for the Podman bridge gateway. WireGuard does NOT use this — wg0
// gets a separate /32 from the management pool (see AllocateMgmtIPs).
func MachineIP(subnet *net.IPNet) net.IP {
return uint32ToIP(ipToUint32(subnet.IP.To4()) + 1)
}
// Allocate assigns a per-host subnet (of size hostPrefix) to every host in
// hosts, carving them from pool.
//
// Rules:
// - Duplicate host names in hosts → error (user input bug).
// - Existing subnet within pool with correct prefix → kept unchanged (stable).
// - Existing subnet outside pool or wrong prefix → warning, reassign.
// - Two existing hosts with the same subnet → first (alphabetical) kept,
// second gets a warning and is reassigned.
// - New hosts receive the lowest free subnet in pool.
//
// Returns (assignments, warnings, error).
func Allocate(
pool *net.IPNet,
hostPrefix int,
existing map[string]*net.IPNet,
hosts []string,
) (map[string]*net.IPNet, []Warning, error) {
// 1. Reject duplicate hosts.
hostCount := make(map[string]int, len(hosts))
for _, h := range hosts {
hostCount[h]++
}
for h, n := range hostCount {
if n > 1 {
return nil, nil, fmt.Errorf("duplicate host in --servers: %s", h)
}
}
pool4 := pool.IP.To4()
if pool4 == nil {
return nil, nil, fmt.Errorf("only IPv4 pools are supported")
}
result := make(map[string]*net.IPNet, len(hosts))
usedNetworks := make(map[uint32]bool)
var warnings []Warning
subnetClaim := make(map[uint32]string)
// 2. Seed from existing — sorted for deterministic conflict resolution.
existingHosts := make([]string, 0, len(existing))
for h := range existing {
existingHosts = append(existingHosts, h)
}
sort.Strings(existingHosts)
// Pool bounds (used for both validation and iteration).
pool4Network := ipToUint32(pool4)
poolOnes, poolBits := pool.Mask.Size()
poolHostBits := poolBits - poolOnes
pool4Broadcast := pool4Network | (uint32(1)<<uint(poolHostBits) - 1)
for _, host := range existingHosts {
subnet := existing[host]
if subnet == nil {
continue
}
subnet4 := subnet.IP.To4()
ones, _ := subnet.Mask.Size()
if subnet4 == nil || !pool.Contains(subnet4) || ones != hostPrefix {
warnings = append(warnings, Warning{
Host: host,
Reason: fmt.Sprintf("existing subnet %s is not a /%d inside pool %s, reassigning", subnet, hostPrefix, pool),
})
continue
}
networkU32 := ipToUint32(subnet4)
// For /32 mgmt IPs, reject pool's network address (.0) and broadcast
// (.255.255) — many tools refuse them as host addresses.
if hostPrefix == 32 && (networkU32 == pool4Network || networkU32 == pool4Broadcast) {
warnings = append(warnings, Warning{
Host: host,
Reason: fmt.Sprintf("existing mgmt IP %s is the pool network or broadcast address, reassigning", subnet4),
})
continue
}
if claimant, exists := subnetClaim[networkU32]; exists {
warnings = append(warnings, Warning{
Host: host,
Reason: fmt.Sprintf("duplicate subnet %s (already claimed by %s), reassigning", subnet, claimant),
})
continue
}
subnetClaim[networkU32] = host
usedNetworks[networkU32] = true
result[host] = cloneIPNet(subnet)
}
// 3. Iterate the pool to assign new hosts.
hostSubnetSize := 32 - hostPrefix
step := uint32(1) << uint(hostSubnetSize)
nextFreeSubnet := func() (*net.IPNet, error) {
// For /32 allocations (mgmt IPs), skip both the pool network address
// and the pool broadcast address (e.g. .0.0 and .255.255 in a /16),
// since many tools refuse them as host IPs. For larger subnets
// (e.g. /24), the bridge inside each subnet handles its own network
// and broadcast addresses, so iteration starts at the pool network
// address and only has to stop before the pool broadcast.
start := pool4Network
end := pool4Broadcast
if hostPrefix == 32 {
start = pool4Network + 1
// end stays at broadcast; loop is u < end so broadcast is excluded.
}
for u := start; u < end; u += step {
if !usedNetworks[u] {
mask := net.CIDRMask(hostPrefix, 32)
return &net.IPNet{IP: uint32ToIP(u), Mask: mask}, nil
}
}
return nil, fmt.Errorf("pool %s is exhausted (no free /%d subnets)", pool, hostPrefix)
}
for _, host := range hosts {
if _, already := result[host]; already {
continue
}
subnet, err := nextFreeSubnet()
if err != nil {
return nil, warnings, fmt.Errorf("allocating subnet for %s: %w", host, err)
}
usedNetworks[ipToUint32(subnet.IP.To4())] = true
result[host] = subnet
}
return result, warnings, nil
}
// AllocateNamespaced assigns a per-host /<hostPrefix> subnet for every
// (namespace, host) pair in `namespaces × hosts`, carving them from a single
// shared pool. Stable: existing valid assignments are preserved so re-runs
// reproduce the same subnets. Invalid or duplicate existing assignments
// produce a warning and get reassigned to the next free block.
//
// Iteration order is deterministic (namespaces then hosts as passed in),
// which keeps warnings and subnet layout reproducible for tests.
//
// Returns nested map[namespace][host] = *net.IPNet.
func AllocateNamespaced(
pool *net.IPNet,
hostPrefix int,
existing map[string]map[string]*net.IPNet,
namespaces []string,
hosts []string,
) (map[string]map[string]*net.IPNet, []Warning, error) {
pool4 := pool.IP.To4()
if pool4 == nil {
return nil, nil, fmt.Errorf("only IPv4 pools are supported")
}
// Reject duplicate hosts (user input bug).
hostCount := make(map[string]int, len(hosts))
for _, h := range hosts {
hostCount[h]++
}
for h, n := range hostCount {
if n > 1 {
return nil, nil, fmt.Errorf("duplicate host in --servers: %s", h)
}
}
// Reject duplicate namespaces.
nsCount := make(map[string]int, len(namespaces))
for _, ns := range namespaces {
nsCount[ns]++
}
for ns, n := range nsCount {
if n > 1 {
return nil, nil, fmt.Errorf("duplicate namespace in --namespaces: %s", ns)
}
}
pool4Network := ipToUint32(pool4)
poolOnes, poolBits := pool.Mask.Size()
poolHostBits := poolBits - poolOnes
pool4Broadcast := pool4Network | (uint32(1)<<uint(poolHostBits) - 1)
result := make(map[string]map[string]*net.IPNet, len(namespaces))
for _, ns := range namespaces {
result[ns] = make(map[string]*net.IPNet, len(hosts))
}
usedNetworks := make(map[uint32]bool)
subnetClaim := make(map[uint32]string) // "ns/host" for conflict messages
var warnings []Warning
// 1. Seed from existing assignments in deterministic order.
nsSorted := append([]string(nil), namespaces...)
sort.Strings(nsSorted)
for _, ns := range nsSorted {
hostMap, ok := existing[ns]
if !ok {
continue
}
hostKeys := make([]string, 0, len(hostMap))
for h := range hostMap {
hostKeys = append(hostKeys, h)
}
sort.Strings(hostKeys)
for _, host := range hostKeys {
subnet := hostMap[host]
if subnet == nil {
continue
}
subnet4 := subnet.IP.To4()
ones, _ := subnet.Mask.Size()
if subnet4 == nil || !pool.Contains(subnet4) || ones != hostPrefix {
warnings = append(warnings, Warning{
Host: host,
Reason: fmt.Sprintf("existing subnet %s in namespace %q is not a /%d inside pool %s, reassigning", subnet, ns, hostPrefix, pool),
})
continue
}
networkU32 := ipToUint32(subnet4)
if claimant, dup := subnetClaim[networkU32]; dup {
warnings = append(warnings, Warning{
Host: host,
Reason: fmt.Sprintf("duplicate subnet %s in namespace %q (already claimed by %s), reassigning", subnet, ns, claimant),
})
continue
}
subnetClaim[networkU32] = ns + "/" + host
usedNetworks[networkU32] = true
result[ns][host] = cloneIPNet(subnet)
}
}
// 2. Assign remaining (ns, host) pairs in input order.
hostSubnetSize := 32 - hostPrefix
step := uint32(1) << uint(hostSubnetSize)
nextFree := func() (*net.IPNet, error) {
for u := pool4Network; u < pool4Broadcast; u += step {
if !usedNetworks[u] {
return &net.IPNet{IP: uint32ToIP(u), Mask: net.CIDRMask(hostPrefix, 32)}, nil
}
}
return nil, fmt.Errorf("pool %s is exhausted (no free /%d subnets)", pool, hostPrefix)
}
for _, ns := range namespaces {
for _, host := range hosts {
if _, ok := result[ns][host]; ok {
continue
}
subnet, err := nextFree()
if err != nil {
return nil, warnings, fmt.Errorf("allocating subnet for %s/%s: %w", ns, host, err)
}
u := ipToUint32(subnet.IP.To4())
usedNetworks[u] = true
subnetClaim[u] = ns + "/" + host
result[ns][host] = subnet
}
}
return result, warnings, nil
}
// AllocateMgmtIPs assigns a /32 management IP to every host in hosts from pool.
// Wraps Allocate by promoting/demoting between net.IP and *net.IPNet.
func AllocateMgmtIPs(
pool *net.IPNet,
existing map[string]net.IP,
hosts []string,
) (map[string]net.IP, []Warning, error) {
wrapped := make(map[string]*net.IPNet, len(existing))
for h, ip := range existing {
ip4 := ip.To4()
if ip4 == nil {
continue
}
wrapped[h] = &net.IPNet{IP: ip4, Mask: net.CIDRMask(32, 32)}
}
subnets, warns, err := Allocate(pool, 32, wrapped, hosts)
if err != nil {
return nil, warns, err
}
out := make(map[string]net.IP, len(subnets))
for h, n := range subnets {
out[h] = cloneIP(n.IP.To4())
}
return out, warns, nil
}
// ipToUint32 converts an IPv4 address to a uint32 for arithmetic.
// Callers must pass an IPv4 address: a nil ip.To4() makes Uint32 panic.
func ipToUint32(ip net.IP) uint32 {
return binary.BigEndian.Uint32(ip.To4())
}
// uint32ToIP converts a uint32 back to a net.IP.
func uint32ToIP(u uint32) net.IP {
ip := make(net.IP, 4)
binary.BigEndian.PutUint32(ip, u)
return ip
}
// cloneIP returns a copy of ip so that mutations don't affect the caller.
func cloneIP(ip net.IP) net.IP {
c := make(net.IP, len(ip))
copy(c, ip)
return c
}
// cloneIPNet returns a deep copy of n.
func cloneIPNet(n *net.IPNet) *net.IPNet {
return &net.IPNet{
IP: cloneIP(n.IP),
Mask: append(net.IPMask(nil), n.Mask...),
}
}
+223
@@ -0,0 +1,223 @@
package wireguard
import (
"net"
"testing"
"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
)
func mustParseCIDR(s string) *net.IPNet {
_, n, err := net.ParseCIDR(s)
if err != nil {
panic(err)
}
return n
}
func TestMachineIP(t *testing.T) {
tests := []struct {
subnet string
want string
}{
{"10.210.0.0/24", "10.210.0.1"},
{"10.210.5.0/24", "10.210.5.1"},
{"10.210.255.0/24", "10.210.255.1"},
{"192.168.0.0/24", "192.168.0.1"},
}
for _, tt := range tests {
n := mustParseCIDR(tt.subnet)
got := MachineIP(n)
assert.Equal(t, tt.want, got.String(), "subnet=%s", tt.subnet)
}
}
func TestAllocateMgmtIPs_Basic(t *testing.T) {
pool := mustParseCIDR("100.64.0.0/16")
hosts := []string{"h1", "h2", "h3"}
got, warns, err := AllocateMgmtIPs(pool, nil, hosts)
require.NoError(t, err)
assert.Empty(t, warns)
// Allocation skips pool network (.0.0) — starts at .0.1.
assert.Equal(t, "100.64.0.1", got["h1"].String())
assert.Equal(t, "100.64.0.2", got["h2"].String())
assert.Equal(t, "100.64.0.3", got["h3"].String())
}
func TestAllocateMgmtIPs_StableReuse(t *testing.T) {
pool := mustParseCIDR("100.64.0.0/16")
existing := map[string]net.IP{
"h1": net.ParseIP("100.64.0.42"),
}
hosts := []string{"h1", "h2"}
got, warns, err := AllocateMgmtIPs(pool, existing, hosts)
require.NoError(t, err)
assert.Empty(t, warns)
assert.Equal(t, "100.64.0.42", got["h1"].String())
assert.Equal(t, "100.64.0.1", got["h2"].String())
}
func TestAllocateMgmtIPs_RejectsPoolNetworkAndBroadcast(t *testing.T) {
pool := mustParseCIDR("100.64.0.0/16")
existing := map[string]net.IP{
"hN": net.ParseIP("100.64.0.0"), // pool network
"hB": net.ParseIP("100.64.255.255"), // pool broadcast
}
hosts := []string{"hN", "hB"}
got, warns, err := AllocateMgmtIPs(pool, existing, hosts)
require.NoError(t, err)
assert.Len(t, warns, 2)
for _, h := range hosts {
ip := got[h].String()
assert.NotEqual(t, "100.64.0.0", ip, h)
assert.NotEqual(t, "100.64.255.255", ip, h)
}
}
func TestAllocateMgmtIPs_OutOfPool_Warns(t *testing.T) {
pool := mustParseCIDR("100.64.0.0/16")
existing := map[string]net.IP{
"h1": net.ParseIP("10.210.0.1"), // outside pool
}
hosts := []string{"h1"}
got, warns, err := AllocateMgmtIPs(pool, existing, hosts)
require.NoError(t, err)
require.Len(t, warns, 1)
assert.True(t, pool.Contains(got["h1"]), "reassigned IP must be inside pool")
}
func TestAllocate_PerHostSubnets(t *testing.T) {
pool := mustParseCIDR("10.210.0.0/16")
hosts := []string{"h1", "h2", "h3"}
got, warns, err := Allocate(pool, 24, nil, hosts)
require.NoError(t, err)
assert.Empty(t, warns)
assert.Equal(t, "10.210.0.0/24", got["h1"].String())
assert.Equal(t, "10.210.1.0/24", got["h2"].String())
assert.Equal(t, "10.210.2.0/24", got["h3"].String())
}
func TestAllocate_StableReuse(t *testing.T) {
pool := mustParseCIDR("10.210.0.0/16")
existing := map[string]*net.IPNet{
"h1": mustParseCIDR("10.210.5.0/24"),
}
hosts := []string{"h1", "h2"}
got, warns, err := Allocate(pool, 24, existing, hosts)
require.NoError(t, err)
assert.Empty(t, warns)
// h1 keeps its existing subnet.
assert.Equal(t, "10.210.5.0/24", got["h1"].String())
// h2 gets the lowest free subnet (.0, since only .5 is taken).
assert.Equal(t, "10.210.0.0/24", got["h2"].String())
}
func TestAllocate_FillsGaps(t *testing.T) {
pool := mustParseCIDR("10.210.0.0/16")
existing := map[string]*net.IPNet{
"h1": mustParseCIDR("10.210.0.0/24"),
"h2": mustParseCIDR("10.210.2.0/24"),
}
hosts := []string{"h1", "h2", "h3"}
got, warns, err := Allocate(pool, 24, existing, hosts)
require.NoError(t, err)
assert.Empty(t, warns)
assert.Equal(t, "10.210.0.0/24", got["h1"].String())
assert.Equal(t, "10.210.2.0/24", got["h2"].String())
// Gap at .1 is filled.
assert.Equal(t, "10.210.1.0/24", got["h3"].String())
}
func TestAllocate_DuplicateSubnet_Warns(t *testing.T) {
pool := mustParseCIDR("10.210.0.0/16")
// Both ha and hb claim 10.210.5.0/24; ha wins (alphabetical).
existing := map[string]*net.IPNet{
"ha": mustParseCIDR("10.210.5.0/24"),
"hb": mustParseCIDR("10.210.5.0/24"),
}
hosts := []string{"ha", "hb"}
got, warns, err := Allocate(pool, 24, existing, hosts)
require.NoError(t, err)
require.Len(t, warns, 1)
assert.Equal(t, "hb", warns[0].Host)
assert.Contains(t, warns[0].Reason, "duplicate subnet")
// ha keeps 10.210.5.0/24; hb is reassigned.
assert.Equal(t, "10.210.5.0/24", got["ha"].String())
assert.NotEqual(t, "10.210.5.0/24", got["hb"].String())
}
func TestAllocate_OutOfPool_Warns(t *testing.T) {
pool := mustParseCIDR("10.210.0.0/16")
existing := map[string]*net.IPNet{
"h1": mustParseCIDR("192.168.0.0/24"), // outside pool
}
hosts := []string{"h1"}
got, warns, err := Allocate(pool, 24, existing, hosts)
require.NoError(t, err)
require.Len(t, warns, 1)
assert.Equal(t, "h1", warns[0].Host)
assert.Contains(t, warns[0].Reason, "not a /24 inside pool")
// h1 is reassigned to a pool address.
assert.True(t, pool.Contains(got["h1"].IP), "reassigned IP must be inside pool")
}
func TestAllocate_WrongPrefix_Warns(t *testing.T) {
pool := mustParseCIDR("10.210.0.0/16")
existing := map[string]*net.IPNet{
"h1": mustParseCIDR("10.210.0.0/16"), // wrong prefix (/16 instead of /24)
}
hosts := []string{"h1"}
got, warns, err := Allocate(pool, 24, existing, hosts)
require.NoError(t, err)
require.Len(t, warns, 1)
assert.Contains(t, warns[0].Reason, "not a /24 inside pool")
ones, _ := got["h1"].Mask.Size()
assert.Equal(t, 24, ones, "reassigned subnet must have /24 prefix")
}
func TestAllocate_DuplicateHost_Errors(t *testing.T) {
pool := mustParseCIDR("10.210.0.0/16")
hosts := []string{"1.1.1.1", "1.1.1.1"}
_, _, err := Allocate(pool, 24, nil, hosts)
require.Error(t, err)
assert.Contains(t, err.Error(), "duplicate host")
}
func TestAllocate_PoolExhaustion(t *testing.T) {
// /28 pool with /28 subnets — only one slot.
pool := mustParseCIDR("10.0.0.0/28")
hosts := []string{"h1", "h2"}
_, _, err := Allocate(pool, 28, nil, hosts)
require.Error(t, err)
assert.Contains(t, err.Error(), "exhausted")
}
func TestAllocate_EmptyHosts(t *testing.T) {
pool := mustParseCIDR("10.210.0.0/16")
got, warns, err := Allocate(pool, 24, nil, nil)
require.NoError(t, err)
assert.Empty(t, warns)
assert.Empty(t, got)
}
+310
@@ -51,6 +51,7 @@ All commands support `--format` flag:
Aliases are derived from the CLI command tree:
- `coolify app env` | `coolify app envs` | `coolify app environment`
- `coolify app previews` | `coolify app preview`
- `coolify app start` | `coolify app deploy`
- `coolify app storage` | `coolify app storages`
- `coolify app` | `coolify apps` | `coolify application` | `coolify applications`
@@ -851,6 +852,15 @@ Parameters:
required: false
default: 100
Command: coolify app previews delete <app_uuid> <pr_id>
Description: Delete a preview deployment for an application. First argument is the application UUID, second is the pull request ID.
Parameters:
- name: --force
type: boolean
description: Skip confirmation prompt
required: false
default: false
Command: coolify app restart <uuid>
Description: Restart a running application.
Parameters: (None)
@@ -1714,6 +1724,129 @@ Parameters:
required: false
default: 0
Command: coolify firewall
Description: [ALPHA] Manage cross-host container allow rules (Coolify v5)
Parameters:
- name: --all-namespaces
type: boolean
description: Operate across every mesh namespace on each host (list/containers fan out; allow/revoke still require a specific --namespace)
required: false
default: false
- name: --concurrency
type: integer
description: Maximum number of parallel SSH connections
required: false
default: 10
- name: --coold-port
type: integer
description: TCP port coold's REST API listens on (bound to the WG mgmt IP)
required: false
default: 8443
- name: --coold-token
type: string
description: Bearer token override for coold REST API (also reads COOLIFY_COOLD_TOKEN env). When unset, CLI reads /etc/coolify/api-token over SSH per host.
required: false
- name: --namespace
type: string
description: Namespace the command operates against (must match a namespace created by `coolify init`)
required: false
default: default
- name: --servers
type: stringSlice
description: Comma-separated server IPs (required)
required: true
- name: --ssh-key
type: string
description: Path to SSH private key used to connect to servers (required)
required: true
- name: --ssh-passphrase-prompt
type: boolean
description: Prompt for SSH key passphrase (also reads COOLIFY_SSH_PASSPHRASE env var)
required: false
default: false
- name: --ssh-port
type: integer
description: SSH port
required: false
default: 22
- name: --ssh-timeout
type: string
description: SSH connection timeout (e.g. 30s, 1m)
required: false
default: 30s
- name: --ssh-user
type: string
description: SSH username
required: false
default: root
- name: --wg-interface
type: string
description: WireGuard interface name on remote hosts (must match --wg-interface at init)
required: false
default: wg0
Command: coolify firewall allow
Description: Add an allow rule (from container → to container:port)
Parameters:
- name: --bidirectional
type: boolean
description: Also install the reverse rule on the source host (default: one-way; conntrack handles replies)
required: false
default: false
- name: --from
type: string
description: Source container (name, short-id, raw IP, or host:name) — required
required: false
- name: --port
type: integer
description: Destination port (required unless --proto is empty)
required: false
default: 0
- name: --proto
type: string
description: Protocol (tcp, udp, or empty for any)
required: false
default: tcp
- name: --to
type: string
description: Destination container (name, short-id, raw IP, or host:name) — required
required: false
Command: coolify firewall containers
Description: List containers on the Coolify mesh bridge across all servers
Parameters: (None)
Command: coolify firewall list
Description: List installed allow rules across all servers
Parameters: (None)
Command: coolify firewall revoke
Description: Remove an allow rule
Parameters:
- name: --bidirectional
type: boolean
description: Also remove the reverse rule on the source host (default: one-way; conntrack handles replies)
required: false
default: false
- name: --from
type: string
description: Source container (name, short-id, raw IP, or host:name) — required
required: false
- name: --port
type: integer
description: Destination port (required unless --proto is empty)
required: false
default: 0
- name: --proto
type: string
description: Protocol (tcp, udp, or empty for any)
required: false
default: tcp
- name: --to
type: string
description: Destination container (name, short-id, raw IP, or host:name) — required
required: false
Command: coolify github branches <app_uuid> <owner/repo>
Description: List branches for a repository
Parameters: (None)
@@ -1859,6 +1992,183 @@ Parameters:
description: GitHub Webhook Secret
required: false
Command: coolify init
Description: [ALPHA] Initialize WireGuard mesh for Coolify v5
Parameters:
- name: --builder-capacity
type: integer
description: Concurrent builds accepted per host (COOLD_BUILDER_CAPACITY).
required: false
default: 2
- name: --builder-cpu-quota
type: string
description: cgroup CPU quota for each build subprocess (COOLD_BUILDER_CPU_QUOTA).
systemd CPUQuota format; "200%" = two full cores.
required: false
default: 200%
- name: --builder-hosts
type: stringSlice
description: Explicit subset of --servers to enroll with the builder capability.
Takes precedence over --enable-builder. Empty (default) means fall back to
--enable-builder for the whole cluster.
required: false
- name: --builder-memory-max
type: string
description: cgroup memory cap for each build subprocess (COOLD_BUILDER_MEMORY_MAX).
systemd MemoryMax format; e.g. "2G", "512M".
required: false
default: 2G
- name: --builder-timeout-secs
type: integer
description: Hard wall-clock timeout per build in seconds (COOLD_BUILDER_TIMEOUT_SECS).
required: false
default: 1800
- name: --central
type: string
description: SSH address of the central VM that will run the scheduler (and later Laravel).
Must be one of the --servers entries. When set, phases 4+5 install the scheduler on that host
and push a per-host JWT to every other server. Leave empty to skip scheduler setup.
required: false
- name: --concurrency
type: integer
description: Maximum number of parallel SSH connections
required: false
default: 10
- name: --container-pool
type: string
description: Shared container address pool — each (namespace, host) pair gets a /<container-prefix> from here, owned by that namespace's Podman bridge
required: false
default: 10.210.0.0/16
- name: --container-prefix
type: integer
description: Prefix length of each per-host, per-namespace container subnet
required: false
default: 24
- name: --coold-version
type: string
description: Release tag to download for coold (e.g. "nightly", "v1.2.3"). nightly always re-installs on every run.
required: false
default: nightly
- name: --corrosion-api-port
type: integer
description: Corrosion HTTP API port (bound to 127.0.0.1)
required: false
default: 8080
- name: --corrosion-gossip-port
type: integer
description: Corrosion SWIM gossip port (bound to the wg0 mgmt IP)
required: false
default: 8787
- name: --corrosion-version
type: string
description: Release tag to download for corrosion (e.g. "nightly", "v1.2.3"). nightly always re-installs on every run.
required: false
default: nightly
- name: --enable-builder
type: boolean
description: Cluster-wide shorthand: enable the builder capability on every host
(requires --central). Ignored when --builder-hosts is set.
required: false
default: true
- name: --namespaces
type: stringSlice
description: Comma-separated list of namespaces to create on each host. Each namespace is a separate Podman bridge network (coolify-<ns>-mesh) with its own /<container-prefix> per host
required: false
default: [default]
- name: --scheduler-version
type: string
description: Release tag to download for scheduler (e.g. "nightly", "v1.2.3").
required: false
default: nightly
- name: --servers
type: stringSlice
description: Comma-separated server IPs (required)
required: true
- name: --skip-default-deny
type: boolean
description: Skip installing the default-deny firewall scaffold. By default, both cross-host and intra-host (same bridge) container traffic is blocked; coold manages the allow list at runtime
required: false
default: false
- name: --ssh-key
type: string
description: Path to SSH private key used to connect to servers (required)
required: true
- name: --ssh-passphrase-prompt
type: boolean
description: Prompt for SSH key passphrase (also reads COOLIFY_SSH_PASSPHRASE env var)
required: false
default: false
- name: --ssh-port
type: integer
description: SSH port
required: false
default: 22
- name: --ssh-timeout
type: string
description: SSH connection timeout (e.g. 30s, 1m)
required: false
default: 30s
- name: --ssh-user
type: string
description: SSH username
required: false
default: root
- name: --wg-interface
type: string
description: WireGuard interface name on the remote hosts
required: false
default: wg0
- name: --wg-listen-port
type: integer
description: WireGuard UDP listen port
required: false
default: 51820
- name: --wg-mgmt-pool
type: string
description: WireGuard management address pool — each host gets a /32 from here, assigned to wg0
required: false
default: 100.64.0.0/16
- name: --yes (-y)
type: boolean
description: Skip the interactive alpha confirmation prompt
required: false
default: false
Command: coolify init bootstrap
Description: First-time mesh install (all actions allowed)
Parameters: (None)
Command: coolify init extend
Description: Add new hosts to an existing mesh (existing hosts stay untouched)
Parameters:
- name: --allow-replace
type: boolean
description: Unlock destructive-replace actions on existing hosts (e.g. recreating a drifted podman bridge). Off by default — drifted existing hosts are surfaced as skipped actions instead.
required: false
default: false
- name: --new-hosts
type: stringSlice
description: Comma-separated subset of --servers that is brand-new this run (required). Only these hosts receive the full first-time install; all other hosts get peer-refresh only.
required: true
Command: coolify init plan
Description: Show WireGuard mesh changes without applying them
Parameters:
- name: --intent
type: string
description: Preview filter: "bootstrap" (all actions), "extend" (treat --new-hosts as fresh, existing hosts peer-refresh only), "upgrade" (version bumps only).
required: false
default: bootstrap
Command: coolify init upgrade
Description: Bump agent binary versions (coold / corrosion / scheduler / builder) on every host
Parameters:
- name: --allow-nightly
type: boolean
description: Permit --coold-version/--corrosion-version/--scheduler-version=nightly. Off by default because nightly re-installs on every run instead of only when the pinned version changes.
required: false
default: false
Command: coolify private-key add <key_name> <private_key_or_file>
Description: Add a private key
Parameters: (None)
+283
@@ -0,0 +1,283 @@
#!/usr/bin/env bash
# End-to-end sanity test for the coolify mesh + firewall stack.
#
# 1. `coolify init apply` on two servers with two namespaces (default, alpha).
# 2. Start one nginx ("web-*") on SERVER_A and one alpine client ("client-*")
# on SERVER_B inside each namespace — static --ip, --dns <bridge-gw>,
# --restart=always so they survive reboot.
# Also start client2-default on SERVER_A (same bridge as web-default) to
# test intra-host nft bridge-family deny.
# 3. Verify cross-host traffic is DROPped by default (wget times out).
# 4. Verify intra-host same-bridge traffic is DROPped by default (nft plane).
# 5. Verify nft bridge table coolify_bridge present on both hosts.
# 6. `coolify firewall allow` per namespace (cross-host + intra-host).
# 7. Verify wget succeeds in both planes.
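# 8. Re-run init apply to verify nft scaffold idempotency.
# 9. Builder smoke test (static build); skipped when /etc/coolify/jwt.priv
# is absent on host A.
# 10. Build cancel test; skipped together with step 9.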
#
# Usage:
# SERVERS=1.2.3.4,5.6.7.8 scripts/e2e-mesh.sh
#
# Required env:
# SERVERS — exactly two SSH-reachable IPs, comma-separated.
# First = "host A" (web-* containers).
# Second = "host B" (client-* containers).
# Optional env:
# SSH_KEY — default ~/.ssh/id_ed25519-no-pass (no passphrase)
# SSH_USER — default root
# COOLIFY_SSH_PASSPHRASE — only if SSH_KEY is passphrase-protected;
# requires `sshpass` on PATH
#
# The script assumes `--container-pool` defaults (10.210.0.0/16, /24). With two
# hosts + two namespaces the allocator hands out 10.210.{0,1,2,3}.0/24; gateway
# is always .1, and container IPs below are pinned to .10 (.11 for the extra
# intra-host client on host A).
set -euo pipefail
SSH_KEY="${SSH_KEY:-$HOME/.ssh/id_ed25519-no-pass}"
SSH_USER="${SSH_USER:-root}"
SERVERS="${SERVERS:?set SERVERS=<host-a>,<host-b>}"
IFS=',' read -r SERVER_A SERVER_B EXTRA <<<"$SERVERS"
SERVER_A="${SERVER_A// /}"
SERVER_B="${SERVER_B// /}"
if [[ -z "$SERVER_A" || -z "$SERVER_B" || -n "${EXTRA:-}" ]]; then
echo "SERVERS must contain exactly two comma-separated IPs (got: $SERVERS)" >&2
exit 1
fi
: "${COOLIFY_SSH_PASSPHRASE:=}"
export COOLIFY_SSH_PASSPHRASE
REPO_ROOT="$(cd "$(dirname "${BASH_SOURCE[0]}")/.." && pwd)"
cd "$REPO_ROOT"
# Namespace → gateway IP on each host (matches allocator output).
GW_A_DEFAULT=10.210.0.1
GW_B_DEFAULT=10.210.1.1
GW_A_ALPHA=10.210.2.1
GW_B_ALPHA=10.210.3.1
# Container IPs (all pinned to .10 in each /24).
IP_WEB_DEFAULT=10.210.0.10 # host A, namespace default
IP_CLIENT_DEFAULT=10.210.1.10 # host B, namespace default
IP_WEB_ALPHA=10.210.2.10 # host A, namespace alpha
IP_CLIENT_ALPHA=10.210.3.10 # host B, namespace alpha
# Intra-host client on same bridge as web-default (host A, namespace default).
IP_CLIENT2_DEFAULT=10.210.0.11 # host A, namespace default
NGINX_IMAGE=docker.io/library/nginx:alpine
ALPINE_IMAGE=docker.io/library/alpine
SSH_OPTS=(-i "$SSH_KEY" -o StrictHostKeyChecking=accept-new -o ConnectTimeout=10 -o BatchMode=yes)
say() { printf '\n\033[1;36m==> %s\033[0m\n' "$*"; }
warn() { printf '\033[1;33m%s\033[0m\n' "$*" >&2; }
fail() { printf '\033[1;31m%s\033[0m\n' "$*" >&2; exit 1; }
# Use sshpass if passphrase was supplied; otherwise lean on ssh-agent / keyless.
ssh_exec() {
local host="$1"; shift
if [[ -n "$COOLIFY_SSH_PASSPHRASE" ]]; then
SSHPASS="$COOLIFY_SSH_PASSPHRASE" sshpass -P "passphrase" -e \
ssh "${SSH_OPTS[@]}" "$SSH_USER@$host" "$@"
else
ssh "${SSH_OPTS[@]}" "$SSH_USER@$host" "$@"
fi
}
cli() {
# COOLIFY_SSH_PASSPHRASE is exported above and read by the CLI from the
# environment, so no separate passphrase branch is needed here.
go run ./coolify "$@" --ssh-key "$SSH_KEY" --ssh-user "$SSH_USER"
}
# assert_blocked <host> <container> <target-ip-or-hostname>
assert_blocked() {
local host="$1" client="$2" target="$3"
if ssh_exec "$host" "podman exec $client wget -T 4 -qO- http://$target" >/dev/null 2>&1; then
fail "expected timeout for $client@$host → $target but request succeeded"
fi
printf ' blocked: %s@%s → %s ✓\n' "$client" "$host" "$target"
}
# assert_flows <host> <container> <target-ip-or-hostname>
assert_flows() {
local host="$1" client="$2" target="$3"
if ! ssh_exec "$host" "podman exec $client wget -T 5 -qO- http://$target" | grep -q 'nginx'; then
fail "$client@$host → $target failed to reach nginx"
fi
printf ' OK: %s@%s → %s ✓\n' "$client" "$host" "$target"
}
# ─── 1. init apply ────────────────────────────────────────────────────────────
say "1/10 coolify init apply on $SERVERS (namespaces: default, alpha)"
cli init apply \
--servers "$SERVERS" \
--namespaces default,alpha \
--yes
# ─── 2. containers ────────────────────────────────────────────────────────────
say "2/10 creating containers with --ip / --dns / --restart=always"
run_container() {
local host="$1" name="$2" network="$3" ip="$4" gw="$5" image="$6"; shift 6
ssh_exec "$host" "podman rm -f $name >/dev/null 2>&1 || true"
ssh_exec "$host" "podman run -d --name $name \
--network $network --ip $ip --dns $gw --restart=always \
$image $*"
}
# host A: nginx servers
run_container "$SERVER_A" web-default coolify-default-mesh "$IP_WEB_DEFAULT" "$GW_A_DEFAULT" "$NGINX_IMAGE"
run_container "$SERVER_A" web-alpha coolify-alpha-mesh "$IP_WEB_ALPHA" "$GW_A_ALPHA" "$NGINX_IMAGE"
# host B: alpine clients (sleep forever so we can exec into them)
run_container "$SERVER_B" client-default coolify-default-mesh "$IP_CLIENT_DEFAULT" "$GW_B_DEFAULT" "$ALPINE_IMAGE" sleep infinity
run_container "$SERVER_B" client-alpha coolify-alpha-mesh "$IP_CLIENT_ALPHA" "$GW_B_ALPHA" "$ALPINE_IMAGE" sleep infinity
# host A: 2nd client on same bridge as web-default — tests intra-host nft plane
run_container "$SERVER_A" client2-default coolify-default-mesh "$IP_CLIENT2_DEFAULT" "$GW_A_DEFAULT" "$ALPINE_IMAGE" sleep infinity
# ─── 3. cross-host default-deny ───────────────────────────────────────────────
say "3/10 confirming default-deny blocks cross-host traffic (expect timeouts)"
assert_blocked "$SERVER_B" client-default web-default.default.coolify.internal
assert_blocked "$SERVER_B" client-alpha web-alpha.alpha.coolify.internal
# ─── 4. intra-host same-bridge default-deny (nft bridge plane) ────────────────
say "4/10 confirming intra-host same-bridge traffic blocked (nft bridge plane)"
# Raw IP intentional — DNS via bridge gateway also crosses the nft bridge hook;
# using raw IP isolates the firewall check from DNS-path correctness.
assert_blocked "$SERVER_A" client2-default "$IP_WEB_DEFAULT"
# ─── 5. nft table present on both hosts ───────────────────────────────────────
say "5/10 verifying nft bridge table coolify_bridge present on both hosts"
for host in "$SERVER_A" "$SERVER_B"; do
ssh_exec "$host" "nft list table bridge coolify_bridge" >/dev/null \
|| fail "nft table coolify_bridge missing on $host"
printf ' present: %s ✓\n' "$host"
done
# ─── 6. allow rules ───────────────────────────────────────────────────────────
say "6/10 adding allow rules (cross-host + intra-host)"
cli firewall allow \
--servers "$SERVERS" \
--namespace default \
--from client-default --to web-default --port 80
cli firewall allow \
--servers "$SERVERS" \
--namespace alpha \
--from client-alpha --to web-alpha --port 80
# Intra-host allow: client2-default → web-default on host A.
# Rule lands on host A (destination-host ownership); passing both servers is
# idempotent on the non-owner side.
cli firewall allow \
--servers "$SERVERS" \
--namespace default \
--from client2-default --to web-default --port 80
# ─── 7. verify flow ───────────────────────────────────────────────────────────
say "7/10 verifying HTTP flows in both planes"
# Cross-host (iptables FORWARD plane)
assert_flows "$SERVER_B" client-default web-default.default.coolify.internal
assert_flows "$SERVER_B" client-alpha web-alpha.alpha.coolify.internal
# Intra-host (nft bridge plane) — raw IP, same rationale as step 4
assert_flows "$SERVER_A" client2-default "$IP_WEB_DEFAULT"
# ─── 8. re-apply idempotency ──────────────────────────────────────────────────
say "8/10 re-running init apply — verifies nft scaffold idempotency (chain already exists regression)"
cli init apply \
--servers "$SERVERS" \
--namespaces default,alpha \
--yes
# ─── 9. builder smoke test (static build) ─────────────────────────────────────
# Requires --central to have been passed to init apply. The script above does
# not pass --central, so the builder capability is normally disabled; steps 9
# and 10 run only when /etc/coolify/jwt.priv exists on host A and are skipped
# otherwise.
if ssh_exec "$SERVER_A" "test -f /etc/coolify/jwt.priv" >/dev/null 2>&1; then
say "9/10 builder smoke test — POST /v1/build/dispatch, expect localhost image on central"
# Scheduler UDS; central runs scheduler as root so the default 0600 socket is
# reachable for ssh-exec'd curl without group setup.
SCHEDULER_SOCK="/run/coolify/scheduler.sock"
UDS_CURL="curl -sS --unix-socket $SCHEDULER_SOCK"
REQ_ID="e2e-$(date +%s)"
BUILD_PAYLOAD="{\"request_id\":\"$REQ_ID\",\"command\":{\"type\":\"static_build\",\"repo_url\":\"https://github.com/coollabsio/static-test-site\",\"git_ref\":\"main\",\"target_image\":\"localhost/e2e-$REQ_ID\"}}"
ACK=$(ssh_exec "$SERVER_A" "$UDS_CURL -w '\\n%{http_code}' -X POST -H 'Content-Type: application/json' --data '$BUILD_PAYLOAD' http://localhost/v1/build/dispatch")
echo "$ACK" | tail -n1 | grep -q '^202$' || fail "dispatch did not return 202: $ACK"
DEADLINE=$(($(date +%s)+180))
RESP=""
while :; do
OUT=$(ssh_exec "$SERVER_A" "$UDS_CURL -w '\\n%{http_code}' 'http://localhost/v1/build/result/$REQ_ID?timeout_ms=25000'")
CODE=$(echo "$OUT" | tail -n1)
RESP=$(echo "$OUT" | sed '$d')
[[ "$CODE" == "200" ]] && break
[[ "$CODE" != "408" && "$CODE" != "404" ]] && fail "build result unexpected $CODE: $RESP"
[[ $(date +%s) -ge $DEADLINE ]] && fail "builder smoke timed out after 180s"
done
echo "$RESP" | grep -q '"status":"ok"' || fail "builder smoke returned error: $RESP"
IMG_HOST=""
for host in "$SERVER_A" "$SERVER_B"; do
if ssh_exec "$host" "buildah images 2>/dev/null | grep -q localhost/e2e-$REQ_ID"; then
IMG_HOST="$host"; break
fi
done
[[ -n "$IMG_HOST" ]] || fail "image localhost/e2e-$REQ_ID not found on any host"
printf ' OK: build succeeded; image on %s ✓\n' "$IMG_HOST"
# ─── 10. cancel test ────────────────────────────────────────────────────────
say "10/10 cancel test — dispatch then POST /v1/build/:id/cancel; expect scope killed and cancel response"
CAN_ID="e2e-cancel-$(date +%s)"
CAN_BUILD="{\"request_id\":\"$CAN_ID\",\"command\":{\"type\":\"static_build\",\"repo_url\":\"https://github.com/torvalds/linux\",\"git_ref\":\"master\",\"target_image\":\"localhost/$CAN_ID\"}}"
ACK=$(ssh_exec "$SERVER_A" "$UDS_CURL -w '\\n%{http_code}' -X POST -H 'Content-Type: application/json' --data '$CAN_BUILD' http://localhost/v1/build/dispatch")
echo "$ACK" | tail -n1 | grep -q '^202$' || fail "cancel-test dispatch did not return 202: $ACK"
SCOPE_HOST=""
for _ in 1 2 3 4 5 6 7 8 9 10; do
  sleep 2
  for host in "$SERVER_A" "$SERVER_B"; do
    if ssh_exec "$host" "systemctl list-units --no-legend --plain 'coolify-build-*.service' 2>/dev/null | grep -q $CAN_ID"; then
      SCOPE_HOST="$host"; break 2
    fi
  done
done
[[ -n "$SCOPE_HOST" ]] || fail "scope coolify-build-$CAN_ID.service never appeared"
printf ' scope running on %s ✓\n' "$SCOPE_HOST"
ssh_exec "$SERVER_A" "$UDS_CURL -X POST http://localhost/v1/build/$CAN_ID/cancel" >/dev/null
DEADLINE=$(($(date +%s)+30))
RESP=""
while :; do
  OUT=$(ssh_exec "$SERVER_A" "$UDS_CURL -w '\\n%{http_code}' 'http://localhost/v1/build/result/$CAN_ID?timeout_ms=10000'")
  CODE=$(echo "$OUT" | tail -n1)
  RESP=$(echo "$OUT" | sed '$d')
  [[ "$CODE" == "200" ]] && break
  [[ "$CODE" != "408" && "$CODE" != "404" ]] && fail "cancel result unexpected $CODE: $RESP"
  [[ $(date +%s) -ge $DEADLINE ]] && fail "cancel response timed out"
done
echo "$RESP" | grep -q '"stage":"cancel"' || fail "expected stage=cancel in response, got: $RESP"
if ssh_exec "$SCOPE_HOST" "systemctl is-active coolify-build-$CAN_ID.service >/dev/null 2>&1"; then
  fail "scope still active after cancel: coolify-build-$CAN_ID.service"
fi
printf ' OK: cancel SIGTERM killed cgroup; stage=cancel ✓\n'
else
warn "skipping steps 9/10 (builder smoke + cancel): --central was not passed to init apply, so builder capability is not enabled"
fi
say "all checks passed"
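Steps 9 and 10 above share one long-poll loop: split the curl output into body and status code, stop on 200, retry on 408 or 404, and fail on any other code or once the deadline passes. A minimal sketch of that loop as a reusable function might look like the following. The names `poll_result`, `http_code`, `http_body`, and `POLL_BODY` are hypothetical and do not appear in the script; the fetch command is passed in by name so the `ssh_exec` plus curl call can be swapped for a stand-in.

```shell
# Hypothetical helpers, not part of the script above.
# Both expect the "<body>\n<status>" shape produced by curl -w '\n%{http_code}'.
http_code() { printf '%s\n' "$1" | tail -n1; }
http_body() { printf '%s\n' "$1" | sed '$d'; }

# poll_result FETCH_CMD TIMEOUT_S
# Runs FETCH_CMD until it reports 200. 408 and 404 mean "not ready yet".
# On success the body is left in POLL_BODY and the function returns 0.
# Returns 1 on timeout and 2 on any other status code.
poll_result() {
  _fetch=$1
  _timeout=$2
  _deadline=$(( $(date +%s) + _timeout ))
  while :; do
    _out=$("$_fetch")
    _code=$(http_code "$_out")
    POLL_BODY=$(http_body "$_out")
    [ "$_code" = "200" ] && return 0
    if [ "$_code" != "408" ] && [ "$_code" != "404" ]; then
      echo "unexpected $_code: $POLL_BODY" >&2
      return 2
    fi
    if [ "$(date +%s)" -ge "$_deadline" ]; then
      echo "timed out after ${_timeout}s" >&2
      return 1
    fi
  done
}
```

In the script, FETCH_CMD would wrap the `ssh_exec "$SERVER_A" "$UDS_CURL ..."` call. The server-side `timeout_ms` long-poll is what paces the loop, so the sketch adds no sleep of its own.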
+2 -2
@@ -6,7 +6,7 @@
set -e # Exit on error
# Configuration
-REPO="coollabsio/coolify-cli"
+REPO="IranAccess/coolify-cli/"
BINARY_NAME="coolify"
GLOBAL_INSTALL_DIR="/usr/local/bin"
USER_INSTALL_DIR="$HOME/.local/bin"
@@ -125,7 +125,7 @@ detect_platform() {
get_latest_version() {
echo "Fetching latest release version..." >&2
local latest_version
-latest_version=$(curl -sSf "https://api.github.com/repos/${REPO}/releases/latest" | grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/')
+latest_version=$(curl -sSf "https://api.gitamin.ir/repos/${REPO}/releases/latest" | grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/')
if [ -z "$latest_version" ]; then
error_exit "Failed to fetch latest release version from GitHub"
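With `REPO` ending in a slash, the interpolated URL contains `coolify-cli//releases/latest`. Whether the API host tolerates the doubled slash is not something this diff shows, so a defensive sketch could trim the slash before building the URL. The helper names `release_url` and `extract_tag` are hypothetical, the sample JSON is made up for illustration, and the tag extraction reuses the grep and sed pipeline from `get_latest_version`.

```shell
API_BASE="https://api.gitamin.ir"   # host taken from the changed line above

# release_url REPO: builds the latest-release URL, dropping one trailing slash.
release_url() {
  printf '%s/repos/%s/releases/latest\n' "$API_BASE" "${1%/}"
}

# extract_tag: reads release JSON on stdin and prints the tag_name value.
extract_tag() {
  grep '"tag_name":' | sed -E 's/.*"([^"]+)".*/\1/'
}

release_url "IranAccess/coolify-cli/"
# Illustrative response body only; not real release data.
printf '{\n  "tag_name": "v0.0.0-example",\n  "name": "example"\n}\n' | extract_tag
```

The `${1%/}` expansion removes at most one trailing slash, so a `REPO` value without one passes through unchanged.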
+51
@@ -0,0 +1,51 @@
[Unit]
Description=Coolify mesh firewall rules
After=wg-quick@wg0.service network-online.target
Wants=network-online.target
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/bin/sh -c "/usr/sbin/iptables -t nat -C POSTROUTING -s 10.210.0.0/24 -o wg0 -j RETURN 2>/dev/null || /usr/sbin/iptables -t nat -I POSTROUTING -s 10.210.0.0/24 -o wg0 -j RETURN"
ExecStart=/bin/sh -c "/usr/sbin/iptables -t nat -C POSTROUTING -s 10.220.0.0/24 -o wg0 -j RETURN 2>/dev/null || /usr/sbin/iptables -t nat -I POSTROUTING -s 10.220.0.0/24 -o wg0 -j RETURN"
# Remove blanket ACCEPT from prior mode-A run.
ExecStart=/bin/sh -c "/usr/sbin/iptables -D FORWARD -s 10.210.0.0/24 -j ACCEPT 2>/dev/null || true"
ExecStart=/bin/sh -c "/usr/sbin/iptables -D FORWARD -d 10.210.0.0/24 -j ACCEPT 2>/dev/null || true"
ExecStart=/bin/sh -c "/usr/sbin/iptables -D FORWARD -s 10.220.0.0/24 -j ACCEPT 2>/dev/null || true"
ExecStart=/bin/sh -c "/usr/sbin/iptables -D FORWARD -d 10.220.0.0/24 -j ACCEPT 2>/dev/null || true"
# Create chains (idempotent).
ExecStart=/bin/sh -c "/usr/sbin/iptables -N COOLIFY-ALLOW 2>/dev/null || true"
ExecStart=/bin/sh -c "/usr/sbin/iptables -N COOLIFY-INTRA 2>/dev/null || true"
# Flush COOLIFY-INTRA so order is deterministic on every restart.
ExecStart=/usr/sbin/iptables -F COOLIFY-INTRA
ExecStart=/usr/sbin/iptables -A COOLIFY-INTRA -j COOLIFY-ALLOW
ExecStart=/usr/sbin/iptables -A COOLIFY-INTRA -j DROP
# Repopulate COOLIFY-ALLOW from coold's canonical snapshot. File is rewritten
# by coold on every rule mutate, so it is the source of truth across reboots
# and service restarts. Flush first because 'iptables-restore --noflush'
# leaves existing chain contents in place and would otherwise duplicate every
# rule on re-run.
ExecStart=/bin/sh -c "[ -s /etc/coolify/allow.rules ] && /usr/sbin/iptables -F COOLIFY-ALLOW && /usr/sbin/iptables-restore --noflush < /etc/coolify/allow.rules || true"
# Conntrack early-accept at top of FORWARD (idempotent).
ExecStart=/bin/sh -c "/usr/sbin/iptables -C FORWARD -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT 2>/dev/null || /usr/sbin/iptables -I FORWARD 1 -m conntrack --ctstate ESTABLISHED,RELATED -j ACCEPT"
# Top-level FORWARD jumps for every namespace's subnet (both directions).
ExecStart=/bin/sh -c "/usr/sbin/iptables -C FORWARD -d 10.210.0.0/24 -j COOLIFY-INTRA 2>/dev/null || /usr/sbin/iptables -A FORWARD -d 10.210.0.0/24 -j COOLIFY-INTRA"
ExecStart=/bin/sh -c "/usr/sbin/iptables -C FORWARD -s 10.210.0.0/24 -j COOLIFY-INTRA 2>/dev/null || /usr/sbin/iptables -A FORWARD -s 10.210.0.0/24 -j COOLIFY-INTRA"
ExecStart=/bin/sh -c "/usr/sbin/iptables -C FORWARD -d 10.220.0.0/24 -j COOLIFY-INTRA 2>/dev/null || /usr/sbin/iptables -A FORWARD -d 10.220.0.0/24 -j COOLIFY-INTRA"
ExecStart=/bin/sh -c "/usr/sbin/iptables -C FORWARD -s 10.220.0.0/24 -j COOLIFY-INTRA 2>/dev/null || /usr/sbin/iptables -A FORWARD -s 10.220.0.0/24 -j COOLIFY-INTRA"
# Bridge-family nft scaffold — intra-namespace default-deny.
ExecStart=/bin/sh -c "nft list table bridge coolify_bridge >/dev/null 2>&1 || nft add table bridge coolify_bridge"
ExecStart=/bin/sh -c "nft add chain bridge coolify_bridge coolify_allow '{ }' 2>/dev/null || true"
ExecStart=/bin/sh -c "nft delete chain bridge coolify_bridge forward 2>/dev/null || true"
ExecStart=/bin/sh -c "nft delete chain bridge coolify_bridge coolify_intra 2>/dev/null || true"
ExecStart=/bin/sh -c "nft -f /etc/coolify/bridge-fw.nft"
ExecStart=/bin/sh -c "[ -s /etc/coolify/allow.nft ] && nft -f /etc/coolify/allow.nft || true"
[Install]
WantedBy=multi-user.target
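Most `ExecStart=` lines in the unit follow one pattern: probe with `iptables -C`, and only insert or append when the probe fails, so restarting the service does not duplicate rules. A minimal sketch of that check-then-insert step as a shell function is below. The function name `ensure_rule` and the `IPTABLES` override variable are hypothetical; the unit itself inlines the commands.

```shell
# Path used by the unit above; overridable so the function can be exercised
# against a stand-in without root.
IPTABLES="${IPTABLES:-/usr/sbin/iptables}"

# ensure_rule TABLE CHAIN RULE_SPEC...
# Inserts the rule at the top of CHAIN only when `iptables -C` reports that
# an identical rule is not already present.
ensure_rule() {
  _table=$1
  _chain=$2
  shift 2
  "$IPTABLES" -t "$_table" -C "$_chain" "$@" 2>/dev/null ||
    "$IPTABLES" -t "$_table" -I "$_chain" "$@"
}

# Example (needs root, so it is left commented out):
# ensure_rule nat POSTROUTING -s 10.210.0.0/24 -o wg0 -j RETURN
```

Calling `ensure_rule` twice with the same arguments leaves a single rule in place, which is the property the unit relies on across restarts.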
+11
@@ -0,0 +1,11 @@
[Interface]
Address = 100.64.0.1/32
ListenPort = 51820
PrivateKey = aBcDeFgHiJkLmNoPqRsTuVwXyZ0123456789abcde=
[Peer]
# 203.0.113.11
PublicKey = BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBK=
AllowedIPs = 100.64.0.2/32, 10.210.1.0/24
Endpoint = 203.0.113.11:51820
PersistentKeepalive = 25
+2
@@ -0,0 +1,2 @@
aBcDeFgHiJkLmNoPqRsTuVwXyZ0123456789abcde= AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAK= 51820 off
BBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBBK= (none) 203.0.113.11:51820 10.8.0.2/32 1700000000 92 180 25
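The two lines above have the shape of `wg show <interface> dump` output: one interface line (private key, public key, listen port, fwmark) followed by one line per peer (public key, preshared key, endpoint, allowed IPs, latest handshake, bytes received, bytes sent, persistent keepalive). Assuming that is what this fixture represents, a small sketch for reading it could look like this. `parse_wg_dump` is a hypothetical name, and the private key on the interface line is deliberately never printed.

```shell
# parse_wg_dump: reads `wg show <interface> dump` style text on stdin.
# awk's default field splitting accepts both tabs and spaces.
parse_wg_dump() {
  awk '
    # First line describes the interface; skip field 1 (the private key).
    NR == 1 { printf "interface pubkey=%s port=%s fwmark=%s\n", $2, $3, $4; next }
    # Peer lines carry eight fields; field 2 (preshared key) is skipped.
    NF == 8 {
      printf "peer %s endpoint=%s allowed=%s handshake=%s rx=%s tx=%s keepalive=%s\n",
             $1, $3, $4, $5, $6, $7, $8
    }
  '
}

# Usage against a live interface (requires WireGuard tools and privileges):
# wg show wg0 dump | parse_wg_dump
```

Lines that are neither the first line nor eight fields wide are ignored, so a truncated dump produces partial output rather than an error.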