1.9.0 added server.timeout = 300s to reap dead mobile connections (B3). But
Node's socket timeout fires on INACTIVITY, and a paused audio stream is
inactive (no bytes flow while backpressured) -- so a pause longer than the
timeout had the server destroy the stream's connection, forcing a reconnect
on resume. On both web and iOS that surfaced as 'I pause, then have to focus
the app for it to play again' after a multi-minute pause; pre-1.9.0 had no
such timeout, so paused streams survived (the exact D1 risk the spec flagged).
Reap genuinely dead/half-open peers (mobile network gone without FIN/RST) via
TCP keepalive instead: server.timeout = 0, and each connection gets
setKeepAlive(true, 30s) so the OS drops a socket once probes fail while a
paused-but-alive stream keeps answering and stays connected.
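
A minimal sketch of the keepalive setup described above. The two interfaces
are stand-ins (assumptions) for the parts of Node's http.Server and net.Socket
that matter here, so the snippet runs without opening a port;
configureDeadPeerReaping is a hypothetical name, not the project's function.

```typescript
// Stand-in shapes for the server and socket members this sketch touches.
interface KeepAliveSocket {
  setKeepAlive(enable: boolean, initialDelayMs: number): unknown;
}
interface ReapableServer {
  timeout: number;
  on(event: "connection", listener: (socket: KeepAliveSocket) => void): unknown;
}

const KEEPALIVE_DELAY_MS = 30_000; // the 30s named in the message above

// Turn off the inactivity timeout and rely on TCP keepalive probes instead.
function configureDeadPeerReaping(server: ReapableServer): void {
  server.timeout = 0; // 0 disables the socket inactivity timeout
  server.on("connection", (socket) => {
    // Probes start after 30s of idle; a live but paused peer still answers them.
    socket.setKeepAlive(true, KEEPALIVE_DELAY_MS);
  });
}
```

A paused stream sends no application bytes, but its kernel still acknowledges
keepalive probes, which is why this separates "idle" from "gone".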
Production showed 24 unique-constraint violations on
DiscoveryAlbum(userId, weekStartDate, rgMbid) in 18h: the scan-completion and
reconciliation paths can both create Discovery records for the same album in
the same week, so the second create threw, rolled back the transaction, and
dropped that album's DiscoveryTrack records. Upsert makes it idempotent --
an existing record is left untouched and the track loop fills any gaps.
Audio engine rewrite, audiobook session model, podcast auto-refresh recovery,
functional settings, and the stream/QoL hardening from this cycle. Full notes
in CHANGELOG.md.
- Library auto-sync cron skips enqueuing when a scan is already active/waiting,
so it can't stack a redundant full rescan behind a manual or webhook scan.
- Subsonic star.view is now best-effort: it attempts every id, skips missing
tracks (P2003), logs genuine failures, and never early-returns mid-loop
(which left some tracks starred while reporting failure). It reports an error
only when a real failure occurred and nothing got starred.
- refreshPodcastFeed upserts episodes on (podcastId, guid) instead of
find-then-create, closing a TOCTOU race between the manual refresh route and
the auto-refresh job that could throw on the unique constraint.
- Onboarding: rename the shadowing 'user' var in the recovery path for clarity.
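
The best-effort star.view loop from the list above could look roughly like
this. starOne and the outcome shape are hypothetical; the only detail taken
from the message is that a P2003 error code means "track missing" and is
skipped rather than counted as a failure.

```typescript
type StarOutcome = { starred: number; skipped: number; failed: number; reportError: boolean };

// Assumption: the thrown error carries a string `code`, as Prisma's known-request errors do.
function isMissingTrackError(err: unknown): boolean {
  return typeof err === "object" && err !== null && (err as { code?: unknown }).code === "P2003";
}

async function starAll(
  ids: string[],
  starOne: (id: string) => Promise<void>, // hypothetical per-track writer
  log: (message: string) => void,
): Promise<StarOutcome> {
  let starred = 0;
  let skipped = 0;
  let failed = 0;
  for (const id of ids) {
    try {
      await starOne(id);
      starred++;
    } catch (err) {
      if (isMissingTrackError(err)) {
        skipped++; // missing track: absorb and keep going
        continue;
      }
      failed++; // genuine failure: log it, but never return mid-loop
      log(`star failed for ${id}: ${String(err)}`);
    }
  }
  // Report an error only when something really failed and nothing was starred.
  return { starred, skipped, failed, reportError: failed > 0 && starred === 0 };
}
```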
Review found the allowlist (9 prefixes) missed ~15 real cache namespaces
(homepage:, mixes:, search:, discovery:, colors:, preview:, fanart:, songlink:,
genres:, radio:, album:, artist:, playlists:, ...), so 'Clear Caches' was a
partial clear that would leave most caches stale. Inverted to a denylist that
spares only the operational namespaces (bull:, sess:, audio:, clap:,
enrichment, lock:, sse:) and clears everything else -- complete and drift-proof
as new caches are added. Verified read-only against production: clears 5210
cache keys, protects all 190 operational keys (queues, control plane).
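
As an illustration of the denylist inversion, a sketch of the key filter. The
prefix list is copied from the message above; isClearable and partitionKeys
are hypothetical names.

```typescript
// Operational namespaces that must survive a cache clear (from the message above).
const PROTECTED_PREFIXES = ["bull:", "sess:", "audio:", "clap:", "enrichment", "lock:", "sse:"] as const;

// Anything not explicitly protected is treated as a rebuildable cache.
function isClearable(key: string): boolean {
  return !PROTECTED_PREFIXES.some((prefix) => key.startsWith(prefix));
}

function partitionKeys(keys: string[]): { clear: string[]; keep: string[] } {
  const clear: string[] = [];
  const keep: string[] = [];
  for (const key of keys) {
    (isClearable(key) ? clear : keep).push(key);
  }
  return { clear, keep };
}
```

A namespace added later (say a hypothetical "lyrics:") is cleared without
touching this list, which is the drift-proof property the denylist buys.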
Review found the 15-min grace was effectively inert: the phase parked an
entity as 'enriching'/'_queued' even when the add() no-op'd against a failed
jobId still held within the grace window -- removing it from selection until a
process restart, so the advertised auto-retry never happened. Each phase now
checks queue.getJob(jobId) and only enqueues + parks when the slot is actually
free; a held slot is skipped, leaving the entity selectable so it backs off
and genuinely retries once the grace clean frees the slot. Adds a test
asserting a held slot is skipped (no re-add, no park).
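
A sketch of the "only park when the slot is free" check. DedupQueue is a
stand-in interface (an assumption) covering the two BullMQ Queue methods the
message names, getJob(jobId) and add(name, data, { jobId }); park is a
hypothetical callback that marks the entity as enriching/queued.

```typescript
interface DedupQueue<T> {
  getJob(jobId: string): Promise<unknown>;
  add(name: string, data: T, opts: { jobId: string }): Promise<unknown>;
}

// Returns true when the job was enqueued and the entity parked, false when the
// jobId slot is still held (for example by a failed job inside the grace window).
async function enqueueIfSlotFree<T>(
  queue: DedupQueue<T>,
  name: string,
  jobId: string,
  data: T,
  park: () => Promise<void>,
): Promise<boolean> {
  const held = await queue.getJob(jobId);
  if (held) {
    return false; // skip: leave the entity selectable so a later cycle retries
  }
  await queue.add(name, data, { jobId });
  await park();
  return true;
}
```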
- Subsonic star.view swallowed every error and returned success, so a
third-party app could star a track that never saved. Now only a P2003 FK
violation (track legitimately missing) is absorbed; any other error is
logged and returns a Subsonic error. Scrobble play-log failures are logged
instead of silently discarded.
- The podcasts page sorted by author/title with a raw localeCompare on an
optional field, so one feed with no author crashed the whole page via the
error boundary. Comparators are now null-guarded.
- The audio analyzer re-logged the same 'N tracks permanently failed' warning
every idle cycle (~50s) forever; it now logs only when the count changes.
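
For the podcasts-page crash in the list above, a null-guarded comparator might
look like this. Sorting missing values last is an assumption; the message only
says the comparators are null-guarded. PodcastRow is a hypothetical shape.

```typescript
interface PodcastRow {
  title: string;
  author?: string | null;
}

// Missing values sort after present ones; present values use localeCompare.
function compareOptional(a: string | null | undefined, b: string | null | undefined): number {
  if (a == null && b == null) return 0;
  if (a == null) return 1;
  if (b == null) return -1;
  return a.localeCompare(b);
}

// Sort by author, falling back to title when authors tie or are both missing.
function byAuthorThenTitle(x: PodcastRow, y: PodcastRow): number {
  return compareOptional(x.author, y.author) || compareOptional(x.title, y.title);
}
```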
Two settings the UI presented as working did nothing. The transcode cache
size slider was saved to the DB but only ever read from the TRANSCODE_CACHE_MAX_GB
env var, which the save path never wrote -- so the slider was inert even
across the restart its own hint told the user to perform. It's now written to
.env on save, matching the restart-required contract.
The 'Auto sync library' toggle had zero readers because no periodic library
scan existed at all (scans were webhook/manual only). Adds a library-sync cron
(every 6h, gated on the autoSync setting) that enqueues a full scan so music
added outside the download pipeline is picked up automatically.
The podcast dedup-on-failure trap was live on three more queues. The artist
and mood-tags phases never cleaned their queues at all, so a failed job's
jobId marker blocked re-queue until BullMQ's 24h removeOnFail age expired --
far slower than the worker's documented intent to re-pick-up a failed track.
The admin vibe start/retry routes cleaned only completed jobs, so 'Retry
failed embeddings' silently dropped tracks with a lingering failed job.
Automatic phases now clean completed (grace 0, immediately reusable on
success) and failed (15-min grace, so a permanently-failing entity retries
on a backoff instead of every 5s cycle). The manual admin retry routes clean
failed immediately -- the user asked to retry now. Adds a 3-test regression
suite asserting the grace-0-completed / grace-positive-failed split.
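
The grace split can be written down as data. This is a sketch: cleanPlan is a
hypothetical helper, and it assumes the manual routes keep cleaning completed
jobs in addition to the failed ones they now clean immediately.

```typescript
type CleanState = "completed" | "failed";

interface CleanStep {
  graceMs: number;
  state: CleanState;
}

const FAILED_GRACE_MS = 15 * 60 * 1000; // the 15-min backoff for automatic phases

function cleanPlan(trigger: "automatic" | "manual"): CleanStep[] {
  if (trigger === "automatic") {
    return [
      { graceMs: 0, state: "completed" }, // success frees the jobId immediately
      { graceMs: FAILED_GRACE_MS, state: "failed" }, // failures retry on a backoff
    ];
  }
  // Manual retry: the user asked to retry now, so nothing is held back.
  return [
    { graceMs: 0, state: "completed" },
    { graceMs: 0, state: "failed" },
  ];
}
```

Each step would map onto a queue.clean(graceMs, limit, state) call, the same
shape as the podcastQueue.clean(0, 0, 'completed') call used elsewhere in
this log.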
The Clear Caches button never did anything: the handler used the node-redis
v4 scan signature (options object + { cursor, keys } result) against our
ioredis client, whose scan takes positional args and returns [cursor, keys].
Every call threw and cleared nothing -- which is why clearing the cache did
not dislodge the wedged podcast jobs.
Even had it run, "delete every key except sess:" would have wiped live
BullMQ queue state (bull:*, 200+ keys) and the enrichment/audio/clap control
plane. Replace that with an allowlist of genuine rebuildable caches
(MusicBrainz, cover art, Last.fm, Wikidata, Deezer, iTunes, hero images) and
delete in chunks. Verified read-only against production: clears ~5130 cache
keys, preserves all bull:/audio:/enrichment:/clap:/sess: keys.
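
For reference, a sketch of a scan-and-delete loop using the positional ioredis
form, scan(cursor, 'MATCH', pattern, 'COUNT', n), which resolves to
[cursor, keys]. ScanClient is a stand-in interface (an assumption) so the
snippet runs without a Redis server; clearByPattern and the chunk size of 500
are hypothetical.

```typescript
interface ScanClient {
  scan(
    cursor: string,
    matchToken: "MATCH",
    pattern: string,
    countToken: "COUNT",
    count: number,
  ): Promise<[string, string[]]>;
  del(...keys: string[]): Promise<number>;
}

// Walks the keyspace one SCAN page at a time and deletes each page as a chunk.
async function clearByPattern(redis: ScanClient, pattern: string, chunkSize = 500): Promise<number> {
  let cursor = "0";
  let deleted = 0;
  do {
    const [next, keys] = await redis.scan(cursor, "MATCH", pattern, "COUNT", chunkSize);
    if (keys.length > 0) {
      deleted += await redis.del(...keys);
    }
    cursor = next;
  } while (cursor !== "0"); // SCAN signals completion by returning cursor "0"
  return deleted;
}
```

Running this once per allowlisted pattern keeps bull:, sess: and the control
plane keys out of reach by construction.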
BullMQ keeps the jobId dedup marker for failed jobs, not just completed
ones. The podcast and vibe refresh phases cleaned only "completed", so a
single failed (or Redis-corrupted, data-less) job kept its jobId marker
forever -- every later add() with that jobId silently no-op'd and the entity
never refreshed again. In production all 4 podcasts were frozen since a job
corruption event; the worker was throwing findUnique({ id: undefined }) on
data-less jobs.
Fix:
- podcast + vibe phases clean BOTH "completed" and "failed" so a failed
job's jobId is reusable.
- podcast phase optimistically advances lastRefreshed for the selected feeds
before queuing -- refreshPodcastFeed only advances it on success/304, so
this gives a failing feed a real backoff window instead of being re-queued
every cycle.
- podcast worker guards against corrupt/data-less jobs (clear error instead
of a confusing Prisma undefined-id throw).
Adds a 5-test regression suite asserting the failed-set clean and the
claim-before-queue ordering. Production Redis cleared of the poisoned jobs.
Delete the per-user stream eviction that truncated actively-playing streams
(B1/B10); add server socket timeouts so dead peers cannot accumulate (B3);
run transcodes through the real queue with a 120s watchdog kill (B6); bound
the ABS proxy at 15s and cache track resolution for seeks (B2); replace the
1-year cache header with private/1h/must-revalidate plus conditional 304s
(B4); key the transcode cache on mtime equality + source size (B5); align
all range-serving surfaces on 416-or-ignore semantics per RFC 9110 (B8/B11);
fix the podcast stream rate-limit exemption (B7); release the play-log claim
on failed inserts (B12); cache audiobook track maps at sync time and expose
tracks/trackCount on list+series endpoints with an explicit tracksUnavailable
signal (FE1 backend half); fix the play-adjacent writer that left numTracks
NULL. Drop the never-read musicPath from AudioStreamingService.
- A seek past a file whose stored size is wrong made Audiobookshelf return 416,
which axios surfaced as a 500. The service now lets 416 through and the route
sends a clean 416 (Content-Range forwarded, upstream stream destroyed) instead
of piping the upstream error body into the audio element.
- Sync now skips items with more than 1000 audio files: those are mis-cataloged
libraries imported as one book (the source of multi-thousand-hour, tens-of-GB
records that broke seeking). The cutoff is track count, not duration --
legitimate omnibus editions run 50-65h.
Two follow-ups from review of the critical-path trim:
- A synchronous in-process claim gates the now-background play-logging so two
concurrent stream requests for the same track can't both insert a Play row
inside the 30s window (the fire-and-forget change had widened that race).
- The no-settings-row quality fallback is now "original", matching the schema
default, instead of "medium" -- a user without a settings row no longer gets a
pointless first-play transcode.
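
A minimal sketch of the synchronous in-process claim from the first item
above. The Map, the key format and tryClaimPlay are hypothetical; only the 30s
window comes from the message. Because the check and the set happen in one
synchronous step, two requests on the same event loop cannot both win.

```typescript
const CLAIM_WINDOW_MS = 30_000;

// key -> timestamp (ms) at which the claim expires
const playClaims = new Map<string, number>();

// Returns true for the first caller inside the window, false for later ones.
function tryClaimPlay(userId: string, trackId: string, now: number = Date.now()): boolean {
  const key = `${userId}:${trackId}`;
  const expiresAt = playClaims.get(key);
  if (expiresAt !== undefined && expiresAt > now) {
    return false; // another request already claimed this play
  }
  playClaims.set(key, now + CLAIM_WINDOW_MS);
  return true;
}
```

The background logger would insert the Play row only when tryClaimPlay returns
true. A real version would also need to prune expired entries.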
Measured from real device traces: fresh track start was ~2.3s vs ~25ms to
resume an already-loaded track. Part of that was the stream route doing
sequential DB work before the first byte -- a recent-play lookup, a play insert,
and a settings read, all awaited up front.
Fetch the track row and the quality setting in parallel (one round-trip, not
two), and fire the play-history logging in the background instead of awaiting it.
Neither needs to gate playback. The bulk of the remaining latency is client-side
buffering of multi-hour audiobook files seeking to a saved offset, tracked
separately.
The preview hook only stops spinning when the request resolves or rejects. The
RSS parse had a 30s timeout and the client had none, so a slow/dead feed left
the spinner up 30s+ with no error -- the "infinite loading" in #168 (the v1.7.13
fix only handled the error path, not the hang).
- Frontend: previewPodcast aborts after 20s, surfacing the existing error UI.
- Backend: the two preview RSS parses are bounded to 8s (non-critical, already
falls through to partial data), so a slow feed returns the podcast quickly.
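
Both bounds amount to racing the work against a timer. A generic sketch,
assuming nothing about the real previewPodcast code; withTimeout is a
hypothetical helper and it does not cancel the underlying work, which a real
client would do separately (for example with an AbortController).

```typescript
// Rejects with a labelled error if `work` has not settled within `ms`.
function withTimeout<T>(work: Promise<T>, ms: number, label: string): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => {
      reject(new Error(`${label} timed out after ${ms}ms`));
    }, ms);
    work.then(
      (value) => {
        clearTimeout(timer); // settle first, then drop the pending timer
        resolve(value);
      },
      (err) => {
        clearTimeout(timer);
        reject(err);
      },
    );
  });
}
```

With the numbers above, the client side would use 20_000 and each preview RSS
parse 8_000, so the backend gives up well before the client does.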
Adds a soulseekMode (p2p|slskd) setting to route Soulseek through an external
slskd REST instance, so slskd mode needs no Kima-side Soulseek credentials.
Includes the review fixes: https transport, reconnect on backend change,
slskdUrl validation, mode-aware connection test, queue position, bounded size
cache. Closes #164. By gossip31.
The first #197 fix only hardened the pub/sub subscriber; a 3-model review panel
found it incomplete. This closes the rest:
- publish() now runs on a dedicated soft-options connection (enableOfflineQueue,
infinite retries) instead of the strict shared client -- that strict publish
was still throwing the same "Stream isn't writeable" error under load.
- subscriber lifecycle: terminal "end" drops the cache, a failed psubscribe
disconnects the half-open socket instead of leaking it; transient drops
self-heal via auto-reconnect.
- both subscribe and publish are time-bounded so an unreachable Redis fails the
request instead of hanging indefinitely.
- analyzer failures ({success:false, embedding:null}, no error field) are now
rejected cleanly instead of passing null into the pgvector cast (500).
- the analyzer publishes a failure response on internal exceptions so the caller
fails fast instead of waiting out the full 15s timeout.
Reviewed by Opus/Sonnet/Haiku panels twice (original confirmed INCOMPLETE,
rewrite SHIP-WITH-CHANGES); surviving findings applied, two rejected with reason
(no publisher churn on transient error; keep setMaxListeners(0) to not re-trigger
the warning flood).
The reporter's 200k-track failure may also involve Redis memory pressure or
Python-analyzer saturation, which this makes tolerable but does not itself
resolve -- pending their redis INFO.
ensureSubscriber duplicated the parent Redis client, inheriting
enableOfflineQueue:false + maxRetriesPerRequest:0, so psubscribe threw 'Stream
isn't writeable' when the subscriber socket wasn't connected yet -- and the
rejected promise was cached, breaking vibe text search permanently until restart
(worsens with library size). The subscriber now gets its own offline queue +
retries, resets the cached promise on rejection, and drops it on 'end' so the
next request reconnects.
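
The cached-promise part of this fix, sketched in isolation.
memoizeUntilRejected is a hypothetical helper, not the project's
ensureSubscriber; connect stands in for "duplicate the client and psubscribe".

```typescript
interface CachedConnection<T> {
  get(): Promise<T>;
  reset(): void; // e.g. called from the connection's 'end' handler
}

function memoizeUntilRejected<T>(connect: () => Promise<T>): CachedConnection<T> {
  let cached: Promise<T> | null = null;
  return {
    get(): Promise<T> {
      if (cached === null) {
        const attempt = connect();
        cached = attempt;
        // A rejected attempt must not stay cached, or every later call fails too.
        attempt.catch(() => {
          if (cached === attempt) cached = null;
        });
      }
      return cached;
    },
    reset(): void {
      cached = null;
    },
  };
}
```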
T7: deleteRejectedAlbum and the clear endpoint do an atomic claim-then-delete
(updateMany where status=ACTIVE inside a transaction) so a concurrent /like
cannot lose its album; /like is symmetric and returns 409 on a lost claim.
Multi-step deletes are transaction-wrapped (torn-state fix), a pre-check guards files
before the out-of-tx Lidarr delete, and the owned-Album lookup is filtered to
location=DISCOVER so a same-rgMbid LIBRARY album is never deleted.
T8: a cancelled batch is now marked failed, not completed, so /current does not
treat it as a successful empty week.
T9: /generate and the cron drop a completed/failed BullMQ job hash before
re-enqueue (silent-drop fix), and the cron enqueue takes the distributed lock.
The retry IIFE force-completed the batch and queued a discover-retry-unavailable
scan that scanProcessor ignores, so retried albums downloaded but never entered
the playlist. Now hands off to checkBatchCompletion (Lidarr wait, completion
scan, buildFinalPlaylist + reconcile, final status) and adds a top-level catch
that marks the batch failed on a background crash.
The buildFinalPlaylist catch logged the error but never updated the batch row,
leaving it stuck in scanning until the 30-min sweep. Now sets status=failed with
a 'Playlist build failed' errorMessage (distinct from the no-tracks
short-circuit). Test asserts the catch specifically fires via that discriminator.
/current and /retry resolve the view week from the latest completed batch
(bounded, with a stale flag) so records whose weekStart drifted are no longer
invisible; /batch-status reports the last terminal batch so the client can
detect a completion it missed. Cron moves to Monday 05:00 and both cron and
manual /generate derive the BullMQ dedup key from resolveGenerationWeekStart so
the batch week and dedup key cannot diverge. Adds a supertest route test for the
data-loss fallback path.
Adds lib/discoveryWeek.ts (resolveGenerationWeekStart, resolveViewWeek,
weekStartKey) as the single source of truth for week boundaries, and points
generation at it so a Sunday run tags the upcoming week instead of the ending
one. Pins TZ=UTC in jest.config so the date tests are host-independent.
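
To make the Sunday rule concrete, a sketch under two loudly stated
assumptions: weeks start on Monday at 00:00 UTC, and any non-Sunday run maps
to the Monday of its own week. The function names carry a "Sketch" suffix
because the real lib/discoveryWeek.ts implementations are not shown here.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Sunday -> the following Monday; Monday..Saturday -> the Monday of that week.
function resolveGenerationWeekStartSketch(now: Date): Date {
  const midnightUtc = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
  const day = now.getUTCDay(); // 0 = Sunday ... 6 = Saturday
  const offsetDays = day === 0 ? 1 : 1 - day;
  return new Date(midnightUtc + offsetDays * DAY_MS);
}

// Stable YYYY-MM-DD key, usable for both the batch week and a dedup key.
function weekStartKeySketch(weekStart: Date): string {
  return weekStart.toISOString().slice(0, 10);
}
```

Deriving both the stored week and the dedup key from one function is what
keeps them from diverging; all arithmetic here is in UTC, so the result does
not depend on the host timezone.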
BullMQ's jobId dedup keeps a marker hash in Redis after a job completes,
indefinitely. The removeOnComplete: { age: 3600 } setting removes the
completed-list entry but not the dedup marker, so subsequent adds with the
same jobId silently no-op forever.
In executePodcastRefreshPhase, every 5-second enrichment tick tried to
re-add stale podcasts with jobId podcast-${id}, but BullMQ saw the marker
from the original subscription's refresh, treated it as duplicate, and
dropped the add. The worker never ran, lastRefreshed never updated, the
SQL query kept returning the same stale rows, and no new episodes ever
appeared.
Fix: call podcastQueue.clean(0, 0, 'completed') before the add loop so
jobIds are reusable. Matches the pattern already in place on the vibe
queue per memory notes. Same anti-pattern likely affects the artist and
track queues but is invisible there because their SQL queries naturally
exclude already-enriched rows -- left for a follow-up audit.
When download.events emits 'error' (connection closed before transfer
completes), the promise wrapping the download lifecycle now resolves
with { success: false } immediately instead of hanging until timeout.
Mirrors the cleanup sequence used by the stream error and timeout paths.
Bundles:
- feat: persist UMAP map positions to DB (3216af1)
- fix: split galaxy camera sessionStorage by mode (da4e8d5)
- fix: remove iOS auto-resume that routed audio to speaker on earbud
disconnect (7b41b91)
- fix: Deezer fallback for podcast search during iTunes outages (22f6613)
- chore: remove PodcastIndex scaffold and unused dep (e6f6dd1)
Review-driven polish:
- umapProjection: 98% coverage threshold so active enrichment does not
force full recompute; stable ORDER BY in hydrate path; detach
persistPositions so UMAP response is not blocked on the UPDATE;
extract shared TRACK_METADATA_COLUMNS SQL fragment.
- podcasts/search/deezer: extract mergeAndDedupePodcasts helper (4x
dedupe sites collapsed); type previewDeezerPodcast with Express
Request/Response; cache-comment searchPodcasts.
- drive-by: drop dead redisClient import in deezer.ts; unused
podcastId destructure in podcasts.ts; unused req in search.ts /genres.
The vibe map projection was only cached in Redis with a 24h TTL. On
every expiry (or container restart) the full UMAP worker ran again,
taking ~30s on an 8k-track library. Positions are deterministic once
the embedding set is fixed, so they belong in the DB.
Schema: add nullable map_x, map_y to track_embeddings. Metadata-only
ALTER on PG 11+, no table rewrite, zero downtime.
Service:
- doCompute() now persists positions via a single UPDATE ... FROM
UNNEST(...) that handles 8-15k rows in one statement.
- computeMapProjection() tries Redis, then DB, then UMAP. When the DB
has full coverage the UMAP worker is skipped entirely and the map
hydrates from a single join in ~100ms.
- appendTrackToProjection() writes the new KNN-interpolated position
to the DB so it survives Redis expiry.
- hydrateFromDb() returns null unless coverage is 100%, so a newly
enriched track without a position still triggers full recompute.
iTunes Search API had an outage that broke all podcast discovery.
Fan out to both Deezer and iTunes in parallel via Promise.allSettled
with title-based dedupe. Deezer results fill gaps when iTunes is
down; iTunes results preferred when available (have feedUrl for
subscription). Preview endpoint resolves Deezer-only podcasts via
iTunes name lookup for feed URL.
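
A sketch of the fan-out and merge. PodcastHit, normalizeTitle and searchBoth
are hypothetical; the real title normalization may differ. The two search
functions are injected so the example runs without network access.

```typescript
interface PodcastHit {
  title: string;
  source: "itunes" | "deezer";
  feedUrl?: string;
}

// Assumption: titles are matched case-insensitively after trimming.
function normalizeTitle(title: string): string {
  return title.trim().toLowerCase();
}

async function searchBoth(
  searchItunes: () => Promise<PodcastHit[]>,
  searchDeezer: () => Promise<PodcastHit[]>,
): Promise<PodcastHit[]> {
  // allSettled: one provider being down must not sink the other's results.
  const [itunes, deezer] = await Promise.allSettled([searchItunes(), searchDeezer()]);
  const merged = new Map<string, PodcastHit>();
  // iTunes goes first so its hits (which carry feedUrl) win a title collision.
  for (const result of [itunes, deezer]) {
    if (result.status !== "fulfilled") continue;
    for (const hit of result.value) {
      const key = normalizeTitle(hit.title);
      if (!merged.has(key)) merged.set(key, hit);
    }
  }
  return Array.from(merged.values());
}
```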
The podcastindex service references SystemSettings columns that
don't exist in the Prisma schema -- it would throw on first call.
No caller invoked it except the reset-cache hook in
systemSettings.ts, which is also removed. Drops podcast-index-api
from dependencies (last published 2021).
Four small fixes surfaced in the pre-release review pass:
1. Preview stream upstream cleanup on client disconnect
Both new Deezer preview proxy endpoints (/artists/preview/.../stream
and /playlists/.../preview/stream) now register res.on('close', ...)
to destroy the upstream axios stream when the client goes away.
Matches the existing audiobook stream pattern. Prevents upstream
TCP connection leaks when a user cancels a preview mid-flight.
2. Collapse IDOR message oracle in getOwnedPendingTrack
The pending-track-not-found (404) and wrong-playlist (404) branches
returned distinguishable error messages, letting an authenticated
user probe existence of other users' pending track IDs by supplying
their own playlist ID. Both branches now return the same generic
'Pending track not found' message.
3. Remove dead request-id guard in useTrackPreview
After the refactor to direct stream URLs in PR #178, the check
`if (requestId !== previewRequestIdRef.current) return` runs
immediately after the increment, so its condition can never be true.
Removed the check and the now-unused ref declaration.
4. Array guard for music-metadata discsubtitle field
music-metadata normally returns discsubtitle as a string, but
some tag formats (e.g., Vorbis with multiple DISCSUBTITLE frames)
can surface arrays. Added an explicit Array.isArray branch so a
future shape change doesn't silently write stringified arrays
to the database.
Found during pre-release review.
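
For item 4 above, a sketch of the guard. Taking the first array entry is an
assumption; the message only says an explicit Array.isArray branch was added.
normalizeDiscSubtitle is a hypothetical name.

```typescript
// Accepts whatever the tag parser produced and returns a clean string or null.
function normalizeDiscSubtitle(raw: unknown): string | null {
  // Arrays are unwrapped instead of being stringified into "a,b" by accident.
  const first: unknown = Array.isArray(raw) ? raw[0] : raw;
  if (typeof first !== "string") return null;
  const trimmed = first.trim();
  return trimmed.length > 0 ? trimmed : null;
}
```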
Prisma's default ASC sort uses Postgres NULLS LAST, so the compound
order `[discNumber asc, trackNo asc]` added in PR #170 reorders
tracks in mixed-state libraries: any album where some tracks have
been rescanned post-migration (getting discNumber) and others have
not (staying NULL) ends up with the numbered-disc tracks first and
the NULL-disc tracks dumped at the end, breaking the original
trackNo order that users expected.
Fix: use `{ sort: 'asc', nulls: 'first' }` on discNumber in all six
orderBy call sites (library/albums, offline, share, and the three
places in subsonic/library). This keeps NULL-disc tracks sorting
before disc-1 tracks, preserving pre-migration behavior for
all-NULL albums (the common case right after upgrade) and only
slightly reordering partially-rescanned multi-disc albums (rare,
and only until a full rescan completes).
Found during pre-release review.
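
To show the intended ordering, the orderBy shape from the message plus an
in-memory comparator with the same NULLS FIRST behaviour on discNumber.
TrackRow and compareTracks are hypothetical, and trackNo is assumed non-null
to keep the sketch small.

```typescript
// The orderBy value named above, written as plain data.
const trackOrderBy = [
  { discNumber: { sort: "asc", nulls: "first" } },
  { trackNo: "asc" },
] as const;

interface TrackRow {
  title: string;
  discNumber: number | null;
  trackNo: number;
}

// NULL disc numbers sort before any numbered disc.
function compareNullsFirst(a: number | null, b: number | null): number {
  if (a === null && b === null) return 0;
  if (a === null) return -1;
  if (b === null) return 1;
  return a - b;
}

function compareTracks(x: TrackRow, y: TrackRow): number {
  return compareNullsFirst(x.discNumber, y.discNumber) || x.trackNo - y.trackNo;
}
```

For an album where no track has a discNumber yet, the first comparison always
ties and the result is plain trackNo order, which is the pre-migration
behaviour the fix preserves.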