Aligns remaining test files with the pattern established in 6e80c95.
Using require halts the test immediately on fatal errors instead of
continuing with invalid state.
Add revive var-naming rule with skipPackageNameChecks to suppress
package-name lint violations. Add explicit default cases to switch
statements in wireguard/intent.go for exhaustiveness. Upgrade
assert.NoError to require.NoError in firewall tests to halt on error.
Split the monolithic `apply` subcommand into three purpose-built commands:
- `bootstrap`: first-time mesh install, keeps interactive alpha gate
- `extend`: adds new hosts to an existing mesh, peer-refresh only on existing hosts
- `upgrade`: bumps agent binaries across fleet, leaves mesh config untouched
Intent filtering lives in `internal/wireguard/intent.go` (ValidateIntent +
filterByIntent). Suppressed actions surface on plan.Skipped so operators see
what would have fired and why.
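The intent filter described above can be sketched roughly as follows. This is a minimal illustration, not the code in `internal/wireguard/intent.go`: the `Intent`, `Action`, and `Skipped` types, the action kind strings, and the intent-to-action mapping are all assumptions made for the example.

```go
package main

import "fmt"

// Intent, Action and Skipped are hypothetical stand-ins for the real types.
type Intent string

const (
	IntentBootstrap Intent = "bootstrap"
	IntentExtend    Intent = "extend"
	IntentUpgrade   Intent = "upgrade"
)

type Action struct {
	Host string
	Kind string // assumed kinds: "write-wg-config", "refresh-peers", "install-agent"
}

type Skipped struct {
	Action Action
	Reason string
}

// allowed lists, per intent, which action kinds may fire (assumed mapping).
var allowed = map[Intent]map[string]bool{
	IntentBootstrap: {"write-wg-config": true, "refresh-peers": true, "install-agent": true},
	IntentExtend:    {"write-wg-config": true, "refresh-peers": true},
	IntentUpgrade:   {"install-agent": true},
}

// ValidateIntent rejects intents that have no entry in the mapping.
func ValidateIntent(i Intent) error {
	if _, ok := allowed[i]; !ok {
		return fmt.Errorf("unknown intent %q", i)
	}
	return nil
}

// filterByIntent splits actions into those that run and those that are
// suppressed, recording a reason for each suppressed action.
func filterByIntent(i Intent, actions []Action) (kept []Action, skipped []Skipped) {
	for _, a := range actions {
		if allowed[i][a.Kind] {
			kept = append(kept, a)
			continue
		}
		skipped = append(skipped, Skipped{
			Action: a,
			Reason: fmt.Sprintf("suppressed by intent %q", i),
		})
	}
	return kept, skipped
}

func main() {
	if err := ValidateIntent(IntentExtend); err != nil {
		fmt.Println("invalid intent:", err)
		return
	}
	kept, skipped := filterByIntent(IntentExtend, []Action{
		{Host: "10.0.0.1", Kind: "refresh-peers"},
		{Host: "10.0.0.1", Kind: "install-agent"},
	})
	fmt.Println("kept:", len(kept))
	for _, s := range skipped {
		fmt.Printf("skipped %s on %s: %s\n", s.Action.Kind, s.Action.Host, s.Reason)
	}
}
```

Returning the suppressed actions alongside the kept ones is what lets a plan report them instead of silently dropping them.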
Also renames broker → scheduler (service + tests) to match its actual role.
Drop Redis as a broker dependency. Broker now exposes an HTTP listener
on a Unix domain socket at /run/coolify/broker.sock instead of reading
from Redis streams.
- Remove RedisInstallCommand and redis.go entirely
- Remove ActionInstallRedis from plan and apply phases
- Drop redisURL param from BrokerServiceUnit; add BrokerUnixSocketPath
constant; systemd unit gains RuntimeDirectory=coolify (creates socket dir)
- e2e smoke tests switch from redis-cli XADD/LPOP to curl --unix-socket
against /v1/build/dispatch, /v1/build/result/:id, /v1/build/:id/cancel
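For readers unfamiliar with HTTP over a Unix domain socket, the following is a minimal sketch of the pattern using only the Go standard library. It is not the broker's code: the `/v1/ping` route, the helper names, and the temporary socket path are assumptions for illustration.

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net"
	"net/http"
	"os"
	"path/filepath"
)

// serveOnUnixSocket starts an HTTP server on a Unix domain socket and
// returns the listener so the caller can shut it down.
func serveOnUnixSocket(path string, h http.Handler) (net.Listener, error) {
	_ = os.Remove(path) // clear a stale socket left by a previous run
	ln, err := net.Listen("unix", path)
	if err != nil {
		return nil, err
	}
	go http.Serve(ln, h) // returns once ln is closed; error ignored in this sketch
	return ln, nil
}

// unixClient returns an HTTP client that ignores the URL host and always
// dials the given socket path.
func unixClient(path string) *http.Client {
	return &http.Client{Transport: &http.Transport{
		DialContext: func(ctx context.Context, _, _ string) (net.Conn, error) {
			var d net.Dialer
			return d.DialContext(ctx, "unix", path)
		},
	}}
}

// demo serves a hypothetical /v1/ping route on a temporary socket and
// fetches it through the socket-bound client.
func demo() (string, error) {
	dir, err := os.MkdirTemp("", "broker")
	if err != nil {
		return "", err
	}
	defer os.RemoveAll(dir)
	sock := filepath.Join(dir, "broker.sock")

	mux := http.NewServeMux()
	mux.HandleFunc("/v1/ping", func(w http.ResponseWriter, _ *http.Request) {
		io.WriteString(w, "pong")
	})
	ln, err := serveOnUnixSocket(sock, mux)
	if err != nil {
		return "", err
	}
	defer ln.Close()

	resp, err := unixClient(sock).Get("http://broker/v1/ping")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	return string(body), err
}

func main() {
	out, err := demo()
	fmt.Println(out, err)
}
```

The client side corresponds to what `curl --unix-socket <path> http://localhost/...` does in the smoke tests: the host part of the URL is a placeholder and the socket path selects the server.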
These tests exercise coold/broker/builder internals over Redis+SSH and
don't touch any coolify-cli code. Moving them to the coold workspace
keeps the test code next to the binaries it validates; coolify-cli's
responsibility stays on provisioning.
Replacement lives under coold/e2e-tests/ as a Rust integration test
crate gated by #[ignore] so default cargo test skips it.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replaces the ad-hoc smoke additions in scripts/e2e-mesh.sh with a
proper Go test harness under ./test/e2e, gated behind the `e2e` build
tag so normal `go test ./...` skips it.
Covers the routing and survival guarantees exercised by hand so far:
- TestPinToBuilderHost: pinned dispatch to a builder-capable host
- TestPinToCooldOnlyHostReturns503: cap-missing negative
- TestUnknownHostIdReturns503: unknown host negative
- TestLoadBalancePicksBuilderHost: host_id=none picks the
  builder-capable host
- TestBuildCancelEmitsStageCancel: cancel via build:cmd delivers
  code=499 stage=cancel
- TestCooldRestartAdoptsInFlightBuild: systemctl restart coold
  mid-build; unit survives; new coold adopts; cancel flows through the
  adopted stream; workdir cleaned
Tests drive Redis via ssh + redis-cli on the central host and assert
on-host state via `buildah images` and `systemctl is-active`. No
broker/coold code is imported — the harness exercises the black-box
contract.
Run:
BUILDER_HOST=... COOLD_ONLY_HOST=... \
BUILDER_MGMT=100.64.0.1 COOLD_ONLY_MGMT=100.64.0.2 \
CENTRAL_HOST=... SSH_KEY=... \
go test -tags e2e -v -timeout 15m ./test/e2e/...
Live run: 6 tests pass in ~32s against 78.47.80.33 + 159.69.186.231.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Replace the cluster-wide --enable-builder bool with a per-host subset
controlled by --builder-hosts=<ip>,<ip>. Semantics:
* --builder-hosts empty + --enable-builder=true: every host in
--servers gets the builder capability (previous behavior)
* --builder-hosts non-empty: only listed hosts get the capability;
--enable-builder is ignored
* --builder-hosts entries not in --servers are dropped
DesiredMesh.BuilderHostSet() + HasBuilderCap(host) compute the final
set and are used by:
* phase 3 (install-builder): only on builder-capable hosts
* phase 5 (JWT caps + coold BuilderConfig): per-host caps claim,
COOLD_BUILDER_* env only when enabled
* plan.go (ActionInstallBuilder): planned only for enrolled hosts
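The flag semantics above can be sketched as a small pure function. This is an illustration only: the free function `builderHostSet` and its parameter list are assumptions standing in for the `DesiredMesh.BuilderHostSet()` method, whose actual signature is not shown here.

```go
package main

import (
	"fmt"
	"sort"
)

// builderHostSet returns the hosts that receive the builder capability.
//   - builderHosts empty + enableBuilder: every server
//   - builderHosts non-empty: only listed hosts; enableBuilder is ignored
//   - builderHosts entries that are not in servers are dropped
func builderHostSet(servers, builderHosts []string, enableBuilder bool) map[string]bool {
	set := make(map[string]bool)
	if len(builderHosts) == 0 {
		if enableBuilder {
			for _, s := range servers {
				set[s] = true
			}
		}
		return set
	}
	known := make(map[string]bool, len(servers))
	for _, s := range servers {
		known[s] = true
	}
	for _, h := range builderHosts {
		if known[h] {
			set[h] = true
		}
	}
	return set
}

// sortedHosts returns the set members in sorted order for stable output.
func sortedHosts(set map[string]bool) []string {
	out := make([]string, 0, len(set))
	for h := range set {
		out = append(out, h)
	}
	sort.Strings(out)
	return out
}

func main() {
	servers := []string{"10.0.0.1", "10.0.0.2", "10.0.0.3"}
	fmt.Println(sortedHosts(builderHostSet(servers, nil, true)))
	fmt.Println(sortedHosts(builderHostSet(servers, []string{"10.0.0.2", "10.9.9.9"}, false)))
}
```

A per-host check such as `HasBuilderCap(host)` then reduces to a lookup in the returned set.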
Adds 6 unit tests for BuilderHostSet covering empty/all/subset cases
and regenerates llms-full.txt for the new flag.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
When --enable-builder is set, populate BuilderConfig.DenyNets with
the mesh management pool (default 100.64.0.0/16) and the container
pool (default 10.210.0.0/16). coold emits these as
COOLD_BUILDER_DENY_NETS, which the builder adapter expands into
systemd IPAddressDeny entries for every build subprocess.
This keeps the policy in sync with the operator's actual --wg-mgmt-pool
and --container-pool choices without hard-coding RFC1918 defaults.
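The path from pool flags to systemd directives can be sketched as below. This is a hedged illustration: the comma-separated encoding of `COOLD_BUILDER_DENY_NETS` and both helper names are assumptions, not the format coold is confirmed to use. `IPAddressDeny=` itself is a standard systemd resource-control directive.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// denyNetsEnv validates each pool as a CIDR and joins them into the value
// assumed for COOLD_BUILDER_DENY_NETS (comma-separated, an assumption).
func denyNetsEnv(pools ...string) (string, error) {
	for _, p := range pools {
		if _, _, err := net.ParseCIDR(p); err != nil {
			return "", fmt.Errorf("invalid deny net %q: %w", p, err)
		}
	}
	return strings.Join(pools, ","), nil
}

// ipAddressDenyLines expands the env value into one systemd
// IPAddressDeny= directive per network.
func ipAddressDenyLines(env string) []string {
	var out []string
	for _, cidr := range strings.Split(env, ",") {
		cidr = strings.TrimSpace(cidr)
		if cidr == "" {
			continue
		}
		out = append(out, "IPAddressDeny="+cidr)
	}
	return out
}

func main() {
	// Defaults taken from the commit message: mgmt pool and container pool.
	env, err := denyNetsEnv("100.64.0.0/16", "10.210.0.0/16")
	if err != nil {
		fmt.Println(err)
		return
	}
	fmt.Println("COOLD_BUILDER_DENY_NETS=" + env)
	for _, l := range ipAddressDenyLines(env) {
		fmt.Println(l)
	}
}
```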
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Mirrors the coold-side refactor that merges builder traffic onto coold's
gRPC stream. The provisioner no longer installs a separate builder
systemd unit, mints a builder JWT, or exposes a second broker listener:
- --install-builder → --enable-builder (capability toggle, not a daemon
install). --builder-version removed; the builder binary tracks the
coold release.
- Phase 6 (builder service + builder JWT) deleted. Phase 5 now mints
the host JWT with a `caps` claim ("coold" always; "builder" when
enabled) and rewrites the coold unit with COOLD_BUILDER_* env.
- Phase 3 picks up a single extra step when EnableBuilder is true:
install buildah/git and drop the builder binary at
/usr/local/bin/builder (short-lived subprocess, no unit file).
- internal/services: BrokerServiceUnit drops the builder bind arg;
CooldServiceUnit gains an optional *BuilderConfig; builder.go keeps
only install + workdir constants; jwt.go has a single MintHostJWT.
- e2e-mesh.sh adds steps 9+10 — push build:cmd through Redis and
assert the resulting image, then dispatch and cancel a slow build
and assert the scope is killed with stage=cancel.
- llms-full.txt regenerated to reflect the flag rename.
Breaking: pairs with the coold commit that deletes :6444 and
builder.proto. Deploy in lockstep.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Switch error-path assertions in resolve_test.go from assert to require
so the test stops immediately on an unexpected error or an unexpected
success. Remove the nil check in config_test.go; cobra never returns
nil from NewConfigCommand.
Rename `FirewallFlags` to `Flags` and `bindFirewallFlags` to `bindFlags`
within the firewall package — the `Firewall` prefix is redundant inside
the `firewall` package.
Drop the unused error return from `discoverNamespacesOnHosts`; the
function accumulates per-host errors into `ServerResult` slices and has
no package-level error path, so the third return value was always nil.
Also switches test assertions from `assert.Error/NoError` to
`require.Error/NoError` where the test cannot continue meaningfully on
failure, and adds broker service tests.
Promote golang-jwt/jwt/v5, mattn/go-isatty, golang.org/x/crypto, and
golang.org/x/term from indirect to direct dependencies in go.mod.
Fix data races in firewall test fakes by guarding calls slice with sync.Mutex.
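A minimal sketch of the pattern, assuming a fake that records calls from concurrent goroutines. The `fakeRunner` type and its methods are hypothetical and do not mirror the actual firewall test fakes.

```go
package main

import (
	"fmt"
	"sync"
)

// fakeRunner is a hypothetical test fake that records every command it is
// asked to run. The mutex makes it safe to use from fan-out goroutines.
type fakeRunner struct {
	mu    sync.Mutex
	calls []string
}

// Run records the call under the lock.
func (f *fakeRunner) Run(cmd string) {
	f.mu.Lock()
	defer f.mu.Unlock()
	f.calls = append(f.calls, cmd)
}

// Calls returns a copy so callers cannot race with later appends.
func (f *fakeRunner) Calls() []string {
	f.mu.Lock()
	defer f.mu.Unlock()
	return append([]string(nil), f.calls...)
}

// fanOut invokes the fake from n goroutines, as a per-host fan-out would.
func fanOut(f *fakeRunner, n int) {
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			f.Run(fmt.Sprintf("host-%d", i))
		}(i)
	}
	wg.Wait()
}

func main() {
	f := &fakeRunner{}
	fanOut(f, 50)
	fmt.Println(len(f.Calls()))
}
```

Without the lock, concurrent appends to `calls` are a data race that `go test -race` reports and that can drop recorded calls.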
Reformat struct literals and map literals across cmd, internal/wireguard,
and internal/firewall for consistent column alignment.
Phase 5 was filtering central out via hostsExcluding(), leaving the
coold instance on the central VM without broker env vars and without a
host-jwt. That breaks single-server deploys (only one host, which is
also central) and leaves central's own coold as a standalone API-only
process in fleet mode.
Run phase 5 on desired.Hosts directly so central also receives a JWT
and gets COOLD_BROKER_URL/COOLD_HOST_JWT_PATH injected. Drop
hostsExcluding() since it has no other callers.
Verified end-to-end on a single-server bed: `coolify init apply
--servers X --central X` now produces a working broker <-> coold dispatch
path on the same box.
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Adds phases 4 + 5 to `coolify init apply` for bootstrapping the v5 central
transport plane without Laravel:
- Phase 4 (central-only): apt-install Redis, download coolify-broker from
GitHub releases, generate an EC P-256 JWT keypair under /etc/coolify/, and
enable coolify-broker.service bound to the wg0 mgmt IP:6443.
- Phase 5 (per non-central host): read jwt.priv from central, mint a 1-year
ES256 JWT (sub = host wg0 IP), write it to /etc/coolify/host-jwt, rewrite
coold.service with COOLD_BROKER_URL + COOLD_HOST_JWT_PATH, restart coold.
New service generators under internal/services:
- broker.go — unit, install command, JWT keypair setup
- redis.go — apt install + enable
- jwt.go — golang-jwt/jwt/v5 ES256 minting
coold.go gains CooldServiceUnitWithBroker + BrokerConfig so the unit can
carry broker env vars on non-central hosts. DesiredMesh gains CentralHost +
BrokerVersion; empty CentralHost skips phases 4+5 (existing behavior).
Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Bridge interface names (e.g. "coolify-default-mesh", 20 characters)
exceed the Linux IFNAMSIZ limit of 16 bytes (15 characters plus NUL),
so iifname/oifname matching silently fails at the kernel
level. Switch renderBridgeScaffold to accept []*net.IPNet and emit
`ip saddr`/`ip daddr` set rules instead.
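The length problem and the subnet-based alternative can be sketched as follows. The helper names are hypothetical and the output is only the match expression, not the full rule text that `renderBridgeScaffold` emits.

```go
package main

import (
	"fmt"
	"net"
	"strings"
)

// maxIfNameLen is IFNAMSIZ (16) minus the trailing NUL byte.
const maxIfNameLen = 15

// fitsIfName reports whether a name can be a Linux interface name.
func fitsIfName(name string) bool { return len(name) <= maxIfNameLen }

// parseCIDRs converts CIDR strings into []*net.IPNet.
func parseCIDRs(cidrs ...string) ([]*net.IPNet, error) {
	out := make([]*net.IPNet, 0, len(cidrs))
	for _, c := range cidrs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			return nil, err
		}
		out = append(out, n)
	}
	return out, nil
}

// subnetSetExpr renders an nft anonymous set usable after
// `ip saddr` or `ip daddr`.
func subnetSetExpr(nets []*net.IPNet) string {
	parts := make([]string, 0, len(nets))
	for _, n := range nets {
		parts = append(parts, n.String())
	}
	return "{ " + strings.Join(parts, ", ") + " }"
}

func main() {
	fmt.Println(fitsIfName("coolify-default-mesh")) // 20 characters, too long
	nets, err := parseCIDRs("10.210.1.0/24", "10.210.2.0/24")
	if err != nil {
		fmt.Println(err)
		return
	}
	set := subnetSetExpr(nets)
	fmt.Println("ip saddr " + set + " ip daddr " + set)
}
```

Matching on subnets sidesteps the name limit entirely, at the cost of having to regenerate the scaffold when subnets change.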
Also fix nft chain-declaration order: coolify_intra must be declared
before the forward chain's jump rules reference it, as nft validates
jump targets at add-rule time.
Add `mkdir -p /etc/coolify` before bridge scaffold write so `cat >.tmp`
doesn't ENOENT on fresh hosts where coold hasn't run yet.
Add 4 new test steps that exercise the nft bridge-family deny plane
introduced alongside the iptables FORWARD scaffold:
- Step 4: assert intra-host same-bridge traffic is blocked by default
(client2-default on host A cannot reach web-default on same bridge).
Uses raw IP to isolate from DNS-path — DNS to bridge gateway also
crosses the nft hook.
- Step 5: assert `nft list table bridge coolify_bridge` succeeds on
both hosts after init apply.
- Step 7: assert intra-host flow opens after coold dual-write (both
iptables and nft coolify_allow planes receive the rule).
- Step 8: re-run init apply and assert exit 0 — catches
"chain already exists" regression from non-idempotent nft scaffold.
Also add intra-host allow rule in step 6 (client2-default → web-default
:80 via coold REST), and spawn client2-default on host A in step 2.
Refactor: parameterize assert_blocked/assert_flows with host as first
arg (was hardcoded to SERVER_B); update all existing callsites.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Probe 10a/10b detect nft binary availability and the coolify_bridge nft
table on each host. DefaultDenyActive is now gated on BridgeTableExists
so a host with only the iptables chain (no bridge table) triggers
reinstall. BuildPlan validates NftAvailable per host before computing
actions, surfacing a clear error instead of a silent nft shell failure.
Update --skip-default-deny help text to reflect the full scope of the
default-deny scaffold (cross-host and intra-host).
- Pre-delete forward + coolify_intra chains before nft -f to prevent
"chain already exists" error on second apply (Fix 1)
- Move nft delete table before blanket iptables ACCEPT in mode-A to
close the window where bridge traffic could be dropped (Fix 2)
- Replace hardcoded nft path/table strings with BridgeTableName,
BridgeScaffoldPath, BridgeAllowRulesPath constants (Fix 3)
- Pre-allocate ifNames slice with make([]string, 0, len(sorted)) (Fix 4)
- Add BridgeTableName, BridgeAllowRulesPath, BridgeScaffoldPath consts
- Add namespaces []string param to FirewallServiceUnit and InstallFirewallCommand
- Emit nft bridge table/chain scaffold in default-deny mode; tear it down in permissive mode
- Write /etc/coolify/bridge-fw.nft atomically on apply (delete it in permissive mode)
- Add BridgeTableExists and NftAvailable fields to ServerState
- Order coold after coolify-mesh-fw.service so the bridge scaffold is in place before coold starts
Switch coold/corrosion installation from uploading local binaries via
SSH to downloading from GitHub releases on each remote host.
- Remove --coold-binary / --corrosion-binary flags and elfcheck
- Add --coold-version / --corrosion-version flags (default: nightly)
- Add CooldInstallCommand / CorrosionInstallCommand with arch detection
- nightly tag always re-downloads; pinned tags skip if already installed
- Drop FileSha256 pre-flight checks (no longer needed)
- Add tests for version substitution and arch detection in install cmds
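The download policy and architecture mapping can be sketched like this. Both functions are hypothetical: the real install commands run as shell on the remote host, and the release asset naming (`amd64`/`arm64`) and the idea of comparing against an installed version string are assumptions.

```go
package main

import "fmt"

// needsDownload encodes the policy from the list above: the nightly tag
// always re-downloads, a pinned tag is skipped when already installed.
// installed is empty when nothing is installed yet (assumed convention).
func needsDownload(wanted, installed string) bool {
	if wanted == "nightly" {
		return true
	}
	return wanted != installed
}

// releaseArch maps `uname -m` output to an assumed release asset suffix.
func releaseArch(unameM string) (string, error) {
	switch unameM {
	case "x86_64", "amd64":
		return "amd64", nil
	case "aarch64", "arm64":
		return "arm64", nil
	default:
		return "", fmt.Errorf("unsupported architecture %q", unameM)
	}
}

func main() {
	fmt.Println(needsDownload("nightly", "nightly"))
	fmt.Println(needsDownload("v1.2.3", "v1.2.3"))
	arch, err := releaseArch("aarch64")
	fmt.Println(arch, err)
}
```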
Introduce per-namespace Podman bridges (`coolify-<ns>-mesh`) and
subnet allocation so a single mesh cluster can carry multiple isolated
container networks carved from a shared `--container-pool`.
- Add `cmd/common/meshnet.go`: shared `MeshNetFlags`, `PodmanNetworkFor`,
`ValidateNamespace`, and flag-binding helpers used by both `init` and
`firewall` sub-commands.
- Replace flat `PodmanNetworkName` field on `FirewallFlags` with
`Namespace` + `AllNamespaces`; `--all-namespaces` fans out discovery
across every `io.coolify.managed=true` bridge on each host.
- Thread `Namespace` into `AllowRule`, `ComputeID`, and coold REST
payloads so rules are scoped per namespace.
- Extend WireGuard planner (`internal/wireguard/plan.go`,
`subnet.go`) to allocate one deterministically-ordered subnet per
host per namespace; `AllowedIPs` now lists every peer's namespace
subnets, keeping `wg0.conf` stable across re-runs.
- Pass `COOLD_NAMESPACES=<ns>:<network>:<gw>,...` env to coold so it
can bind DNS and track rules per namespace.
- Add `scripts/e2e-mesh.sh` for end-to-end multi-namespace smoke test.
- Update CLAUDE.md architecture docs to reflect namespace layout.
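One way to get a deterministic per-host, per-namespace allocation is to sort both inputs and carve consecutive /24 blocks from the pool. The sketch below shows that idea only; the block size, the sorted iteration order, and the `allocate` function are assumptions, and the planner in `subnet.go` may order or size subnets differently.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"net"
	"sort"
)

// allocate assigns one /24 per (host, namespace) pair from an IPv4 pool.
// Inputs are sorted first so the result does not depend on flag order.
func allocate(pool string, hosts, namespaces []string) (map[string]string, error) {
	_, ipnet, err := net.ParseCIDR(pool)
	if err != nil {
		return nil, err
	}
	base4 := ipnet.IP.To4()
	ones, bits := ipnet.Mask.Size()
	if base4 == nil || bits != 32 || ones > 24 {
		return nil, fmt.Errorf("pool %s must be an IPv4 network of /24 or larger", pool)
	}
	h := append([]string(nil), hosts...)
	ns := append([]string(nil), namespaces...)
	sort.Strings(h)
	sort.Strings(ns)
	if capacity := 1 << (24 - ones); len(h)*len(ns) > capacity {
		return nil, fmt.Errorf("pool %s holds %d /24 subnets, need %d", pool, capacity, len(h)*len(ns))
	}
	base := binary.BigEndian.Uint32(base4)
	out := make(map[string]string, len(h)*len(ns))
	var i uint32
	for _, host := range h {
		for _, n := range ns {
			var ip [4]byte
			binary.BigEndian.PutUint32(ip[:], base+i<<8) // i-th /24 block
			out[host+"/"+n] = fmt.Sprintf("%s/24", net.IP(ip[:]))
			i++
		}
	}
	return out, nil
}

func main() {
	subnets, err := allocate("10.210.0.0/16",
		[]string{"hostB", "hostA"}, []string{"default", "alpha"})
	if err != nil {
		fmt.Println(err)
		return
	}
	keys := make([]string, 0, len(subnets))
	for k := range subnets {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	for _, k := range keys {
		fmt.Println(k, subnets[k])
	}
}
```

Note that a purely sorted scheme shifts existing assignments when a host or namespace is inserted mid-order; a production allocator would typically preserve already-assigned subnets.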
Add WGInterface field to FirewallFlags with --wg-interface flag (default
from DefaultWGInterface). Thread iface parameter through CooldApply,
CooldRevoke, CooldList, and CooldListAll so the WireGuard interface name
is configurable instead of hardcoded to wg0.
Also replace hardcoded "coolify-mesh" strings with PodmanNetworkName
where applicable.
Remove --podman, --default-deny, --install-coold flags. Podman stack,
coold/corrosion agents, and default-deny iptables scaffold now always
install. --skip-default-deny opts out of firewall scaffold for testing.
Drop direct iptables manipulation over SSH. Firewall allow/revoke/list
now POST/DELETE/GET against coold's REST API via SSH-bounced curl.
- Add internal/firewall/coold_client.go with CooldApply, CooldRevoke,
CooldList and per-host bearer-token resolution (reads
/etc/coolify/api-token over SSH when no override given)
- Delete apply.go, list.go, persist.go — coold owns kernel rules,
persistence (allow.rules snapshot), and the systemd unit
- Add --coold-port (default 8443) and --coold-token persistent flags
- Update CLAUDE.md and CONTROL_PLANE.md to reflect coold-owned surface
and outbound WSS/gRPC dial architecture
Add comprehensive docs for the `coolify firewall` cross-host allow-rule
test harness (alpha, v5) in CLAUDE.md: subcommands, flags, rule lifecycle,
reboot persistence via coolify-mesh-allow.service, and testing patterns.
Update CONTROL_PLANE.md to clarify the three-layer ownership model
(central DB for metadata/audit, coold/CLI for raw kernel rules, init for
chain scaffold), document /etc/coolify/allow.rules file format and the
pre→post-coold handoff strategy (same file format, coold takes over as
writer with no migration step).
Add `coolify firewall` command tree (alpha) for managing iptables
COOLIFY-ALLOW rules across SSH-reachable servers in the coolify-mesh
Podman network.
New subcommands:
- containers: discover running containers across all servers
- list: show installed allow rules
- allow: add src→dst:port allow rule
- revoke: remove an allow rule
Extract shared SSH-mesh flags (--servers, --ssh-key, --ssh-user,
--ssh-port, --concurrency, --ssh-timeout) into cmd/common.SSHMeshFlags
so both `init` and `firewall` reuse the same flag set. Trim duplicated
flag definitions from cmd/init/flags.go accordingly.
Internal packages added:
- internal/firewall/rule.go: AllowRule model + iptables rule rendering
- internal/firewall/discover.go: fan-out container discovery via podman ps
- internal/firewall/list.go: fan-out rule listing via iptables-save
- internal/firewall/apply.go: apply/revoke rules over SSH
- internal/firewall/persist.go: rule persistence helpers
- internal/models/firewall.go: ContainerRow / AllowRuleRow display models
Full unit-test coverage added for all new packages.
Add `application previews delete` subcommand to delete PR preview
deployments. Includes service method, CLI command with confirmation
prompt and --force flag, and full test coverage.
Add --disable-dns to podman network create so netavark never starts
aardvark-dns on the bridge gateway IP:53 — coold owns that socket for
cluster-wide service discovery (CONTROL_PLANE.md §5).
- CooldServiceUnit takes bridgeGatewayIP param; injects
COOLD_BRIDGE_GATEWAY_IP and COOLD_DNS_ZONE env vars into systemd unit
- podmanNetRecreateCmd drops and recreates network to fix pre-alpha
drift where dns_enabled=true; phase2 detects via PodmanDNSEnabled
- Add namespace column to service_endpoints schema (reserved for
per-app isolation / multi-tenant scoping)
- Pass containerAssignments to phase3Server
- Document port 53 conflict handling layers in CONTROL_PLANE.md
Replace boolean `healthy` column with `state` (liveness) and `health`
(readiness) columns in the CR-SQLite schema.
Add sha256-based schema drift detection: Probe reads the remote schema
file hash into CorrosionSchemaSha256; BuildPlan triggers
ActionWriteCorrosionSchema when hash mismatches; phase3Server stops
corrosion and wipes the DB before writing the new schema so CR-SQLite
can re-bootstrap cleanly.
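The drift check itself is a plain hash comparison, sketched below. The function names are hypothetical; in the commit the remote hash arrives via `CorrosionSchemaSha256` from Probe, and an empty value is assumed here to mean the schema file is missing.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// schemaSha256 returns the hex-encoded SHA-256 of the schema text.
func schemaSha256(schema string) string {
	sum := sha256.Sum256([]byte(schema))
	return hex.EncodeToString(sum[:])
}

// schemaDrifted reports whether the schema on the host differs from the
// desired one. remoteSha is the hash read from the host; it is empty when
// the file does not exist (assumed convention).
func schemaDrifted(desiredSchema, remoteSha string) bool {
	return schemaSha256(desiredSchema) != remoteSha
}

func main() {
	// Illustrative schema text only, not the real CoolifySchemaSQL.
	desired := "CREATE TABLE service_endpoints (id TEXT PRIMARY KEY);"
	fmt.Println(schemaDrifted(desired, schemaSha256(desired)))
	fmt.Println(schemaDrifted(desired, ""))
}
```

Because any byte-level change alters the hash, even whitespace edits to the schema trigger the stop, wipe, and rewrite path, which is worth keeping in mind given that the path discards the local DB.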
Fix systemd activation: use `enable` + `restart` instead of
`enable --now` so already-active services pick up new config without a
separate reload step.
Add --install-coold flag to `coolify init` that uploads and installs the
corrosion (gossip/CR-SQLite) and coold (host agent) binaries on each node.
- New internal/services package: pure config generators for corrosion
TOML, CoolifySchemaSQL, and coold systemd unit; ELF64/aarch64 validator
- wireguard.DesiredMesh gains InstallCoold, binary paths/shas, and port fields
- apply/plan wire pre-flight checks: ELF arch validation + SHA-256 hashing
before any SSH connection is opened
- SSH client gains helpers used by the wireguard apply layer
Replace env-injection placeholder with full DNS-via-coold design.
Covers Corrosion schema, embedded DNS server pseudocode, resolution
flow, backend movement, health/staleness, failure modes, and REST API
surface for the service discovery subsystem.
Replace per-rule systemd dropin approach with coold-owned DB + batch apply.
Adds division of labour table, updated API surface (/v1), nftables set optimization
note, and scale comparison (iptables-restore vs nft vs per-rule).
Document coold as the security/audit layer between Coolify control
plane and the podman socket. Add architecture diagram showing the
communication flow. Update all references from direct podman socket
access to coold REST API over wg0.
Also add comment to enablePodmanSocketCmd clarifying the socket stays
Unix-only and is never exposed on TCP.
Add `coolify init plan` and `coolify init apply` commands for
bootstrapping a WireGuard full-mesh overlay between servers.
- SSH fanout to reconstruct current WireGuard state per host
- Plan engine diffs desired vs actual mesh (peers, IPs, firewall)
- Apply executes plan idempotently over SSH with concurrency control
- Podman install + coolify-mesh bridge network setup
- iptables firewall rules with optional default-deny container policy
- Subnet allocators for mgmt pool (100.64.0.0/16) and container pool (10.210.0.0/16)
- CONTROL_PLANE.md spec for v5 control plane responsibilities
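The core of a plan engine of this kind is a set difference between desired and observed state. The sketch below shows it for peers only; the `PeerDiff` type and `diffPeers` function are hypothetical and much simpler than the real planner, which also covers IPs and firewall state.

```go
package main

import (
	"fmt"
	"sort"
)

// PeerDiff lists the peers to add and remove on one host.
type PeerDiff struct {
	Add    []string
	Remove []string
}

// Empty reports whether the host already matches the desired state,
// which is what makes a repeated apply a no-op.
func (d PeerDiff) Empty() bool { return len(d.Add) == 0 && len(d.Remove) == 0 }

// diffPeers compares desired and actual peer keys and returns sorted
// add/remove lists so the plan output is stable across runs.
func diffPeers(desired, actual []string) PeerDiff {
	want := make(map[string]bool, len(desired))
	for _, p := range desired {
		want[p] = true
	}
	have := make(map[string]bool, len(actual))
	for _, p := range actual {
		have[p] = true
	}
	var d PeerDiff
	for p := range want {
		if !have[p] {
			d.Add = append(d.Add, p)
		}
	}
	for p := range have {
		if !want[p] {
			d.Remove = append(d.Remove, p)
		}
	}
	sort.Strings(d.Add)
	sort.Strings(d.Remove)
	return d
}

func main() {
	d := diffPeers([]string{"peerA", "peerB", "peerC"}, []string{"peerB", "peerD"})
	fmt.Println("add:", d.Add, "remove:", d.Remove)
	fmt.Println("converged:", diffPeers([]string{"peerA"}, []string{"peerA"}).Empty())
}
```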
Document that version is injected at build time via ldflags and that
the post-release update-version CI job handles committing the bump to
internal/version/checker.go. Remove the manual version bump step from
the pre-release checklist.
Refactor `coolify docs llms` to emit two AI-oriented artifacts:
- `llms.txt` as a concise operating guide
- `llms-full.txt` as the exhaustive command and flag catalog
Update tests to cover quick/full generation, document both files in README,
and adjust CI to fail when either generated file is out of date.