Progress was saved every 30s off the "timeupdate" event, but iOS throttles and
suspends that event when the PWA is backgrounded (screen off) -- the normal way
people listen to audiobooks. So a long screen-off session was never
checkpointed, and an app update (or crash) reverted to the moment the screen was
locked. The saved data was never lost; it just stopped advancing in the
background.
Replace the timeupdate-driven save with a 15s wall-clock setInterval that runs
while playing (started on "play", stopped on "pause"/"ended"), independent of
the media event iOS throttles. saveAudiobookProgress already de-dupes an
unchanged position and the tick is gated on isPlaying(), so paused/stalled ticks
are no-ops. Applies to podcasts too.
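
A minimal sketch of the interval-driven checkpoint described above. The helper name createProgressCheckpointer, its deps object, and the injectable timers are assumptions for illustration, not the app's real API; only the 15s cadence, the isPlaying() gate, and the start/stop events come from the description. The de-dupe that saveAudiobookProgress performs is folded into tick() here to keep the example self-contained.

```typescript
// Hypothetical helper: names and shapes are illustrative only.
type Timers = {
  setInterval: (fn: () => void, ms: number) => unknown;
  clearInterval: (handle: unknown) => void;
};

interface CheckpointDeps {
  isPlaying: () => boolean;          // gate: paused/stalled ticks do nothing
  getPosition: () => number;         // current playback position in seconds
  save: (position: number) => void;  // persists the position
  timers?: Timers;                   // injectable for tests; defaults to global timers
}

function createProgressCheckpointer(deps: CheckpointDeps, intervalMs = 15_000) {
  const timers: Timers = deps.timers ?? {
    setInterval: (fn, ms) => setInterval(fn, ms),
    clearInterval: (h) => clearInterval(h as ReturnType<typeof setInterval>),
  };
  let handle: unknown = null;
  let lastSaved: number | null = null;

  function tick(): void {
    if (!deps.isPlaying()) return;      // paused or stalled: no-op
    const position = deps.getPosition();
    if (position === lastSaved) return; // unchanged position: no-op
    lastSaved = position;
    deps.save(position);
  }

  return {
    tick,
    // Call from the "play" handler. A second call while running is ignored.
    start(): void {
      if (handle !== null) return;
      handle = timers.setInterval(tick, intervalMs);
    },
    // Call from the "pause" and "ended" handlers.
    stop(): void {
      if (handle === null) return;
      timers.clearInterval(handle);
      handle = null;
    },
    isRunning: (): boolean => handle !== null,
  };
}
```

The timer is driven by the wall clock rather than by a media event, so the tick does not depend on "timeupdate" being delivered.
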
Playback that an iOS interruption (call/notification) pauses now resumes when
the interruption ends, matching how other audio apps behave.
- Track play intent separately from audio.paused: set on play/tryResume/
swapAndPlay, cleared only by explicit pause/stop/cleanup. The native "pause"
event an interruption fires does NOT clear it.
- The AudioContext statechange listener resumes on an interrupted -> running
transition when intent is set and the element is paused. Gated hard: only that
transition (not the initial bridge resume or a background suspend), never
within 1.5s of an audio-route change (the v1.7.12 unplug-to-speaker
regression), and never while a stall reload owns the resume.
- Repair the trace auto-upload: it POSTed to a requireAuth route without the
Bearer token and swallowed the 401, so no iOS trace was ever captured. It now
sends the token, so device testing finally yields real event data.
Reviewed by Opus/Sonnet passes. Known limit to confirm on-device: the auto-resume
only fires if WebKit returns the context to "running" (a context stuck
"interrupted" -- the force-quit symptom -- is not addressed here).
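
The gating on the statechange listener can be read as a single predicate. The sketch below is an assumption-laden restatement of the bullets above: the function name shouldAutoResume, the input field names, and the constant name are hypothetical; the 1.5s route-change guard and the interrupted -> running requirement come from the description.

```typescript
// AudioContext states, including WebKit's "interrupted".
type ContextState = "suspended" | "running" | "closed" | "interrupted";

// Hypothetical input shape; the real listener reads these from live state.
interface ResumeGateInput {
  previousState: ContextState;
  nextState: ContextState;
  playIntent: boolean;          // set on play, cleared only by explicit pause/stop/cleanup
  elementPaused: boolean;       // audio.paused at the time of the transition
  msSinceRouteChange: number;   // Infinity if no route change has been seen
  stallReloadInFlight: boolean; // a stall reload currently owns the resume
}

const ROUTE_CHANGE_GUARD_MS = 1500;

function shouldAutoResume(i: ResumeGateInput): boolean {
  // Only the interrupted -> running transition qualifies; the initial bridge
  // resume and a background suspend arrive as other transitions.
  if (i.previousState !== "interrupted" || i.nextState !== "running") return false;
  // The user must still intend to play, and the element must actually be paused.
  if (!i.playIntent || !i.elementPaused) return false;
  // Never resume right after an audio-route change (unplug-to-speaker case).
  if (i.msSinceRouteChange < ROUTE_CHANGE_GUARD_MS) return false;
  // Never compete with a stall reload that owns the resume.
  if (i.stallReloadInFlight) return false;
  return true;
}
```
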
The preview hook only stops spinning when the request resolves or rejects. The
RSS parse had a 30s timeout and the client had none, so a slow/dead feed left
the spinner up 30s+ with no error -- the "infinite loading" in #168 (the v1.7.13
fix only handled the error path, not the hang).
- Frontend: previewPodcast aborts after 20s, surfacing the existing error UI.
- Backend: the two preview RSS parses are bounded to 8s (they are non-critical
and already fall through to partial data), so a slow feed returns the podcast
quickly.
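
One way to express the client-side bound, as a sketch only: withAbortTimeout is a hypothetical helper name, and how previewPodcast actually wires its abort is not shown in this log. It uses the standard AbortController so the pending request is cancelled, not just ignored.

```typescript
// Hypothetical helper: runs `run` with an AbortSignal that fires after timeoutMs.
async function withAbortTimeout<T>(
  run: (signal: AbortSignal) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const controller = new AbortController();
  const timer = setTimeout(() => controller.abort(), timeoutMs);
  try {
    // `run` is expected to reject when the signal aborts (fetch does this).
    return await run(controller.signal);
  } finally {
    // Always clear the timer so a fast response leaves nothing pending.
    clearTimeout(timer);
  }
}

// Usage sketch (previewUrl is a placeholder, not a real route):
//   const res = await withAbortTimeout(
//     (signal) => fetch(previewUrl, { signal }),
//     20_000,
//   );
```

Because the abort rejects the request promise, the caller's existing error path runs and the spinner stops without any new UI state.
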
Adds a soulseekMode (p2p|slskd) setting to route Soulseek through an external slskd REST instance, so slskd mode needs no Kima-side Soulseek credentials. Includes the review fixes: https transport, reconnect on backend change, slskdUrl validation, mode-aware connection test, queue position, bounded size cache. Closes #164. By gossip31.
Complements #204 (gossip31's pre-decode ffmpeg gate). The pre-decode gate
catches corrupt files that SIGSEGV the decoder, but a worker that dies on any
other native fault (e.g. an Essentia analysis crash after a clean decode) still
left the track in 'processing' and got re-queued by the stale-cleanup sweep
WITHOUT incrementing analysisRetryCount -- so it could loop forever and never
reach the mark-failed/quarantine path.
_cleanup_stale_processing now increments analysisRetryCount when it resets a
crashed track, and marks tracks that have passed MAX_RETRIES as 'failed' (with a
reason) so they quarantine and surface in the permanently-failed accounting
instead of sitting in 'processing' limbo. Defense in depth behind the gate.
Adds an ffmpeg integrity probe before MonoLoader so corrupt files that SIGSEGV Essentia become a normal load failure (and flow into the existing retry/quarantine) instead of crash-looping the worker. By gossip31.
The model-download layer failed three recent builds (06-04 x2, 06-05) with curl
exit 28: --max-time caps the whole operation including retries, so a slow
GitHub-runner transfer trips it, and --retry does not retry a timeout. Switched
all 12 downloads to --retry-all-errors (retries timeouts/transient HTTP),
stall-based abort (--speed-limit 1024 --speed-time 60) instead of a hard total
cap, 5 retries, and -f so a bad HTTP response fails fast instead of saving a
corrupt model. The transformers==5.8.1 pin is unaffected and confirmed building.
The reporter's redis INFO shows a healthy instance (33MB used, no maxmemory
limit, noeviction, zero evictions/rejected connections), ruling out the
memory-pressure hypothesis. The connection-readiness race the fix addresses is
the actual cause, so the hedge is removed.
The first #197 fix only hardened the pub/sub subscriber; a 3-model review panel
found it incomplete. This closes the rest:
- publish() now runs on a dedicated soft-options connection (enableOfflineQueue,
infinite retries) instead of the strict shared client -- that strict publish
was still throwing the same "Stream isn't writeable" error under load.
- subscriber lifecycle: terminal "end" drops the cache, a failed psubscribe
disconnects the half-open socket instead of leaking it; transient drops
self-heal via auto-reconnect.
- both subscribe and publish are time-bounded so an unreachable Redis fails the
request instead of hanging indefinitely.
- analyzer failures ({success:false, embedding:null}, no error field) are now
rejected cleanly instead of passing null into the pgvector cast (500).
- the analyzer publishes a failure response on internal exceptions so the caller
fails fast instead of waiting out the full 15s timeout.
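
The analyzer-failure bullet amounts to validating the response before it reaches the vector cast. A minimal sketch, assuming hypothetical names (AnalyzerResponse, requireEmbedding, toPgVectorLiteral); only the failing payload shape {success:false, embedding:null} with no error field is taken from the description.

```typescript
// Hypothetical response shape; `error` may be absent on failure.
interface AnalyzerResponse {
  success: boolean;
  embedding: number[] | null;
  error?: string;
}

// Rejects failed or empty responses before anything is sent to the database.
function requireEmbedding(res: AnalyzerResponse): number[] {
  if (!res.success || !Array.isArray(res.embedding) || res.embedding.length === 0) {
    throw new Error(res.error ?? "analyzer returned no embedding");
  }
  return res.embedding;
}

// pgvector accepts a bracketed, comma-separated text literal such as "[1,2,3]".
function toPgVectorLiteral(embedding: number[]): string {
  return `[${embedding.join(",")}]`;
}
```

Throwing here lets the route answer with a clean client-facing error instead of a 500 from a null cast.
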
Reviewed by Opus/Sonnet/Haiku panels twice (original confirmed INCOMPLETE,
rewrite SHIP-WITH-CHANGES); surviving findings applied, two rejected with reason
(no publisher churn on transient error; keep setMaxListeners(0) to not re-trigger
the warning flood).
The reporter's 200k-track failure may also involve Redis memory pressure or
Python-analyzer saturation, which this makes tolerable but does not itself
resolve -- pending their redis INFO.
The scheduled nightly off main failed (2026-06-04, and 2026-06-02 the same
way): a transformers release newer than 5.8.1 references torch.float8_e8m0fnu
(a dtype added in torch 2.7) at import time, so `from transformers import
BertModel` crashes against the pinned torch==2.5.1 and the Dockerfile
fail-fast check exits 1. The unpinned `transformers>=4.30.0` let pip resolve
to that bad release. Recent branch builds only passed because BuildKit reused
a cached pip layer from before that release was published.
Pinned to 5.8.1 -- the exact version running in prod against torch 2.5.1+cpu.
Bump only alongside a torch bump.
On the clean 439fa68 bridge baseline (band-aids reverted), add the two
high-confidence stability fixes the resume bug actually needs:
- setAudioSessionPlayback gains a `force` arg; play() now re-claims the
iOS "playback" session category on every explicit resume, not just the
first. The one-time latch was why iOS, after an earbud/Control-Center
interruption, left the session with whatever app grabbed it (a
sleep-sounds app started playing through it).
- A statechange listener on the bridge AudioContext re-claims the session
when the OS ends an interruption and the context returns to running. It
never calls play() -- auto-resume on a route change is the v1.7.12
earbud-unplug-to-speaker regression.
Reviewed by two independent passes; their findings fixed here: play() now
actually passes force=true (the reclaim was a no-op without it); the
statechange listener + AudioContext are torn down in destroy() (no leak);
em-dash normalized.
Deliberately NOT re-adding the silent-playback watchdog (part of the
reverted band-aid stack) -- the debug instrumentation will show whether an
interrupted-context resume is still silent, and any further recovery will
be a minimal targeted fix on evidence, not another speculative layer.
Reverts the daf6210 -> 7be3322 -> 1a9f6f4 cascade that piled onto the
bridge. Root regression was daf6210: it awaited setupAudioContextBridge
and bailed play()/tryResume with needs-resume whenever the context was
not "running" -- which forfeited the iOS user-gesture token AND returned
before audio.play() ever ran. So earbud/lock-screen resume went silent
or dead-ended on a Tap-to-resume prompt the lock screen cannot show, and
iOS eventually handed the audio session to another app. 7be3322 and
1a9f6f4 were band-aids on that regression.
Keeps 439fa68 (the bridge) so backgrounded/screen-off playback still
survives, and keeps the debug ring-buffer instrumentation. play() and
tryResume return to the baseline: fire the context resume in parallel,
always attempt audio.play(), preserve the gesture.
Temporary diagnostic for the earbud-resume bug: the installed iOS PWA has no URL
bar to set ?ios_debug=1 or reach /debug/ios-log, so capture is enabled
unconditionally on iOS standalone and the buffer auto-POSTs (debounced 3s) to
/api/debug/ios-log after each event burst. Revert once the resume bug is fixed.
ensureSubscriber duplicated the parent Redis client, inheriting
enableOfflineQueue:false + maxRetriesPerRequest:0, so psubscribe threw 'Stream
isn't writeable' when the subscriber socket wasn't connected yet -- and the
rejected promise was cached, breaking vibe text search permanently until restart
(worsens with library size). The subscriber now gets its own offline queue +
retries, resets the cached promise on rejection, and drops it on 'end' so the
next request reconnects.
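
The cached-promise part of the fix follows a general pattern: memoize the connection attempt, but forget it when it rejects or when the connection ends. The sketch below shows only that pattern; createResettableOnce is a hypothetical name and no Redis client is involved.

```typescript
// Memoizes one in-flight/settled attempt; a rejection clears the cache so
// the next caller retries instead of receiving the stale failure forever.
function createResettableOnce<T>(connect: () => Promise<T>) {
  let cached: Promise<T> | null = null;

  return {
    get(): Promise<T> {
      if (cached === null) {
        const attempt = connect();
        cached = attempt;
        attempt.catch(() => {
          // Only clear if no newer attempt has replaced this one.
          if (cached === attempt) cached = null;
        });
      }
      return cached;
    },
    // Call from a terminal event (e.g. the connection's "end") to force a
    // fresh attempt on the next get().
    reset(): void {
      cached = null;
    },
  };
}
```

Concurrent callers still share a single attempt, so the reset does not reintroduce duplicate connections.
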
The smoke spec asserted the play/pause button state immediately after Play all,
racing the first audio load on a cold container (player stayed 'Not Playing').
Poll audio currentTime > 0 first. Surfaced while running the suite pre-v1.7.16.
The MediaSession 'play' action called controller.play(), which awaits the
AudioContext bridge BEFORE audio.play(). That await forfeits the iOS
user-activation token from the earbud click, so an interrupted/suspended
AudioContext never resumes -- and play() then returns (not throws) on
ctx-not-running, so the handler's reloadAndPlay() fallback never fired. Result:
earbud resume produced no audio, no native 'playing' event, no playbackState
update, and after repeated no-audio play actions iOS reassigned the audio
session to the next app.
Adds resumeFromGesture(): fires the context resume without awaiting it, calls
audio.play() synchronously in the gesture tail (mirrors swapAndPlay), and on any
rejection reloads the source to re-grab the hardware session instead of a silent
needs-resume. Wired only into the explicit MediaSession 'play' action, so it
cannot auto-resume on an ambiguous pause/route-change (the v1.7.12 earbud-unplug
-> speaker regression stays fixed). play()/tryResume()/pause/silent-watchdog
untouched. Diagnosed via 4-lens + adversary review (SHIP-AS-IS).
Requires on-device confirmation (?ios_debug=1); cannot be unit-verified.
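
The ordering that matters in resumeFromGesture() can be shown with structural stand-ins for the audio element and the context. This is a sketch under assumptions: AudioLike and ContextLike are hypothetical minimal interfaces, and the reload step is reduced to load() followed by play(), which may differ from the real reload.

```typescript
// Minimal stand-ins so the sketch runs outside a browser.
interface AudioLike {
  play(): Promise<void>;
  load(): void;
}
interface ContextLike {
  state: string;
  resume(): Promise<void>;
}

function resumeFromGesture(audio: AudioLike, ctx: ContextLike | null): Promise<void> {
  if (ctx && ctx.state !== "running") {
    // Fire the context resume but do NOT await it: awaiting here would push
    // audio.play() out of the gesture's synchronous turn.
    void ctx.resume().catch(() => {});
  }
  // play() is invoked in the same synchronous turn as the gesture handler.
  return audio.play().catch(() => {
    // On any rejection, reload the source and try again instead of
    // returning silently.
    audio.load();
    return audio.play();
  });
}
```
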
The one-time swipe hint crowded the bottom edge above the mini-player and read
as clutter; the swipe behavior is intentional and discoverable enough without it.
Removes the hint state, markup, the markHintSeen calls (swipe behavior unchanged),
the now-unused useCallback import, and the hint-in keyframe.
The desktop sidebar was rebuilt as UnifiedPanel but the externally-registered
settings content (discover settings gear, lyrics) was never ported -- clicking
the discover gear opened the panel to the activity feed instead of the settings,
because UnifiedPanel never read settingsContent or handled set-activity-panel-tab.
It now listens for that event, renders the registered settingsContent (which
carries its own header + back button), and resets to the feed on collapse. Fixes
discover settings and lyrics on desktop. Pre-existing since the sidebar rewrite.
The scrollMargin useLayoutEffect only ran on [rows.length], so the offset went
stale when layout above the list reflowed (e.g. the responsive hero crossing the
md breakpoint, ~52px). Masked today by the 12-row overscan, but wrong and fragile
if overscan is tuned. Added a ResizeObserver re-measure. (ultrareview bug_003)
The global :focus-visible rule set border-radius:2px on the focused ELEMENT
(not the outline), and being unlayered it overrode every Tailwind rounded-*,
collapsing circular play buttons, pills, and rounded modals to near-square
corners on keyboard focus. Removed the line; modern browsers already draw the
outline along the element's own border-radius. (ultrareview bug_005)
Regression from daf6210: the silent-playback watchdog was armed on the
song-to-song swapAndPlay transition and decided 'silent' purely from whether a
timeupdate EVENT arrived within 2.5s. iOS throttles that event when the PWA is
backgrounded, so after almost every song the watchdog wrongly paused healthy
playback and surfaced a Tap-to-resume error.
Fix: judge liveness by audio.currentTime advancement (the decode clock keeps
moving under event throttling, and is what the stall watchdog already trusts),
not the timeupdate event or AudioContext.state (which lies in both directions on
iOS). Never tear down while document is hidden -- defer to foreground. Disarm on
native playing/pause and on foreground. To keep daf6210's genuine deep-suspension
prompt (which tryResume misses because it short-circuits on !paused), add
isPlayingButContextSuspended() and prompt from handleForeground.
Affects installed iOS PWA only (isIosStandalone gate). Requires on-device
confirmation (phone backgrounded/pocketed); not reproducible at a desk.
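
The liveness rule can be sketched as a small probe that compares audio.currentTime against a baseline taken when the watchdog arms. The name createLivenessProbe, the verdict strings, and the 0.05s minimum advance are assumptions; the rule itself (judge by clock advancement, never tear down while hidden) is from the description above.

```typescript
type Verdict = "alive" | "silent" | "deferred";

// Hypothetical probe: getCurrentTime would read audio.currentTime in the app.
function createLivenessProbe(getCurrentTime: () => number, minAdvanceSec = 0.05) {
  let baseline = getCurrentTime();

  return {
    // Call when the watchdog is armed (e.g. at a song-to-song transition).
    arm(): void {
      baseline = getCurrentTime();
    },
    // Call when the watchdog deadline expires.
    check(documentHidden: boolean): Verdict {
      // The decode clock advanced, so playback is healthy even if no
      // timeupdate event was delivered.
      if (getCurrentTime() - baseline >= minAdvanceSec) return "alive";
      // No advancement: act only in the foreground, otherwise defer.
      return documentHidden ? "deferred" : "silent";
    },
  };
}
```
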
T7: deleteRejectedAlbum and the clear endpoint do an atomic claim-then-delete
(updateMany where status=ACTIVE inside a transaction) so a concurrent /like
cannot lose its album; /like is symmetric and returns 409 on a lost claim.
Multi-step deletes are transaction-wrapped (torn-state fix), a pre-check guards files
before the out-of-tx Lidarr delete, and the owned-Album lookup is filtered to
location=DISCOVER so a same-rgMbid LIBRARY album is never deleted.
T8: a cancelled batch is now marked failed, not completed, so /current does not
treat it as a successful empty week.
T9: /generate and the cron drop a completed/failed BullMQ job hash before
re-enqueue (silent-drop fix), and the cron enqueue takes the distributed lock.
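
The T7 claim-then-delete can be sketched against a tiny in-memory store. Everything here except the "updateMany where status=ACTIVE" idea and the 409 on a lost claim is an assumption: AlbumTx is a hypothetical minimal interface, the DELETING and LIKED status names are invented for the example, and the real code runs the claim inside a database transaction.

```typescript
type Status = "ACTIVE" | "DELETING" | "LIKED"; // only ACTIVE is from the source
interface Row { id: string; status: Status; }

// Hypothetical minimal transaction client.
interface AlbumTx {
  updateMany(args: {
    where: { id: string; status: Status };
    data: { status: Status };
  }): Promise<{ count: number }>;
  deleteMany(args: { where: { id: string } }): Promise<{ count: number }>;
}

// Delete path: claim the row first; delete only if this caller won the claim.
async function claimThenDelete(tx: AlbumTx, id: string): Promise<boolean> {
  const claim = await tx.updateMany({
    where: { id, status: "ACTIVE" },
    data: { status: "DELETING" },
  });
  if (claim.count !== 1) return false; // someone else (e.g. /like) got there first
  await tx.deleteMany({ where: { id } });
  return true;
}

// Like path: symmetric claim; a lost claim maps to 409.
async function claimLike(tx: AlbumTx, id: string): Promise<200 | 409> {
  const claim = await tx.updateMany({
    where: { id, status: "ACTIVE" },
    data: { status: "LIKED" },
  });
  return claim.count === 1 ? 200 : 409;
}

// In-memory stand-in so the sketch is runnable without a database.
function inMemoryTx(rows: Row[]): AlbumTx {
  return {
    async updateMany({ where, data }) {
      const hits = rows.filter((r) => r.id === where.id && r.status === where.status);
      for (const r of hits) r.status = data.status;
      return { count: hits.length };
    },
    async deleteMany({ where }) {
      const keep = rows.filter((r) => r.id !== where.id);
      const count = rows.length - keep.length;
      rows.length = 0;
      rows.push(...keep);
      return { count };
    },
  };
}
```

Whichever path updates the ACTIVE row first wins; the other sees count 0 and backs off, so a liked album is never deleted underneath the user.
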
The retry IIFE force-completed the batch and queued a discover-retry-unavailable
scan that scanProcessor ignores, so retried albums downloaded but never entered
the playlist. Now hands off to checkBatchCompletion (Lidarr wait, completion
scan, buildFinalPlaylist + reconcile, final status) and adds a top-level catch
that marks the batch failed on a background crash.
The buildFinalPlaylist catch logged the error but never updated the batch row,
leaving it stuck in scanning until the 30-min sweep. Now sets status=failed with
a 'Playlist build failed' errorMessage (distinct from the no-tracks
short-circuit). Test asserts the catch specifically fires via that discriminator.
/current and /retry resolve the view week from the latest completed batch
(bounded, with a stale flag) so records whose weekStart drifted are no longer
invisible; /batch-status reports the last terminal batch so the client can
detect a completion it missed. Cron moves to Monday 05:00 and both cron and
manual /generate derive the BullMQ dedup key from resolveGenerationWeekStart so
the batch week and dedup key cannot diverge. Adds a supertest route test for the
data-loss fallback path.
Adds lib/discoveryWeek.ts (resolveGenerationWeekStart, resolveViewWeek,
weekStartKey) as the single source of truth for week boundaries, and points
generation at it so a Sunday run tags the upcoming week instead of the ending
one. Pins TZ=UTC in jest.config so the date tests are host-independent.
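
A sketch of the week-boundary rule, not the contents of lib/discoveryWeek.ts. The function names match the ones listed above, but the bodies are guesses under stated assumptions: weeks start on Monday 00:00 UTC, a Sunday run maps to the next Monday, and Monday through Saturday map to the current week's Monday.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// Assumed rule: Sunday tags the upcoming week; every other day tags the
// week that started on the most recent Monday (UTC).
function resolveGenerationWeekStart(now: Date): Date {
  const midnightUtc = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
  const dayOfWeek = now.getUTCDay(); // 0 = Sunday ... 6 = Saturday
  const offsetDays = dayOfWeek === 0 ? 1 : -(dayOfWeek - 1);
  return new Date(midnightUtc + offsetDays * DAY_MS);
}

// Stable string form of a week start, e.g. for use in a dedup key.
function weekStartKey(weekStart: Date): string {
  return weekStart.toISOString().slice(0, 10); // YYYY-MM-DD
}
```

Using the UTC getters throughout keeps the result independent of the host timezone, which is the same concern the TZ=UTC pin addresses for the tests.
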
- import/playlist: fix invalid Tailwind class bg-#0a0a0a -> bg-[#0a0a0a] so Refresh button is visible
- audiobooks: strip dead ?tab=system query param from settings link, keep #audiobookshelf anchor
- queue: remove GripVertical false affordance and unused import -- Up/Down buttons are the real reorder mechanism
- radio: add isError + Retry states for genre and decade station queries; sections no longer silently disappear on failure
- podcasts: add toast.error on handleRefreshAll catch using existing useToast pattern
- Rename primary CTA: "Initialize Generation" -> "Build This Week's Playlist"
- Add pre-generation disclosure grid (Finds / Downloads / Cleans up) surfacing
the /music/discovery download behaviour before the user triggers generation
- Two-phase progress labels mapped to real backend status values: "scanning"
-> "Finding artists..." and "downloading" -> "Downloading albums (N / M)..."
- Move disk-impact context cards into the empty state in the page's own visual
style; HowItWorks kept as fuller reference detail below the playlist
- Settings gear: add aria-label, title, and visible "Settings" text label;
min-h-[44px] touch target; focus-visible ring
- Load-error retry: expose loadError from useDiscoverData, show a Retry button
when the initial fetch fails instead of silently landing on an empty state
- HowItWorks styling: bg-[var(--bg-secondary)] border-[var(--border-subtle)]
replacing hardcoded #111/50 and border-white/5
- Token cleanup: hardcoded #eab308/#f59e0b replaced with --color-brand tokens;
#a855f7 replaced with --color-discover throughout the empty state
- All five destructive reset ops (artists, mood tags, audio analysis, vibe
embeddings, full enrichment reset) now gate behind ConfirmDialog with
op-specific titles and cost warnings; window.confirm removed
- ConfirmDialog upgraded with Escape dismiss, focus trap, role=dialog,
aria-modal, aria-labelledby/describedby, and auto-focus on Cancel
- Sidebar nav + section title: "Cache & Automation" -> "Library Enrichment";
"Artwork" -> "AI & Artwork"; AIServicesSection title "Artwork Services"
-> "AI & Artwork" -- scroll-anchor ids and component names unchanged
- Destructive maintenance buttons corralled into a collapsible "Maintenance
Operations" disclosure within CacheSection (no new sidebar entry)
- Enrichment status text wrapped in aria-live="polite" aria-atomic="true"
- Re-run buttons and all action buttons raised to min-h-[44px] / py-2.5
for 44px touch targets (was ~28px py-1)
- Non-admin sidebar already filtered by SettingsSidebar via isAdmin; no
additional plumbing required
Shows a dismissible pill hint above the mini-player bar the first time a
user sees it with media loaded. Persists seen state in localStorage under
kima_miniplayer_hint_seen. Hint uses pointer-events:none so all swipe
gestures pass through unobstructed; only the dismiss button captures
pointer events. markHintSeen() is called additively in handleTouchEnd
after the existing swipe logic -- no thresholds or behaviors altered.
Switch from useWindowVirtualizer to useVirtualizer targeting the #main-content
scroll container (which is overflow-y-auto / flex-1, not the window). Measure
the list container's offset within that element after data loads via
useLayoutEffect so scrollMargin is accurate and all 1000 rows are reachable.