mirror of
https://github.com/navidrome/navidrome.git
synced 2026-06-19 07:37:15 +00:00
838ceee26d
* perf(subsonic): speed up artist search3 deep-offset pagination
Empty-query and FTS artist search (search3/search2) paginated via a
CROSS JOIN library_artist + DISTINCT in Phase 1 purely for library access
control. The DISTINCT forced a temp b-tree over the whole junction table on
every page, making deep offsets O(offset): ~200ms at offset 299k on 300k
artists.
Replace it with a join-free EXISTS predicate keyed on artist.id, backed by a
new covering index on library_artist(artist_id, library_id). EXISTS keeps
artist as the ordered driver and never fans out rowids, so Phase 1 stays a
plain ordered scan that LIMIT/OFFSET can short-circuit. Admin, headless, and
all-libraries users skip the filter entirely (the dominant case) for a flat
ordered walk over the primary key.
Measured on a 300k-artist / 1M-song library: admin/all-libs pagination is
~4.5-5.4x faster at depth (~180ms to ~33ms at offset 400k); restricted
subset users keep correct, gap-free pages while also getting faster.
The narrowing artist filter is applied at the subsonic layer only when the
request targets a strict subset of the user's libraries, so the common case
(and the admin fast-path) is never burdened with a redundant predicate.
* fix(subsonic): narrow artist search by library set, not count
narrowsArtistLibraries decided whether to add the subsonic-layer artist
narrowing filter by comparing len(requested) < len(accessible). musicFolderId
is not deduplicated, so duplicate IDs inflated the requested count: a user
requesting ?musicFolderId=1&musicFolderId=1&musicFolderId=2 against three
accessible libraries produced len([1,1,2])==3, which is not < 3, so the filter
was skipped and the user saw artists from the third library too.
Compare as set membership instead: the request narrows iff some accessible
library is absent from it (requested is always a subset of accessible, validated
upstream by selectedMusicFolderIds). This is immune to duplicate IDs. Add a
regression test that fails against the old length-based check.
Also consolidate the repeated EXISTS/no-DISTINCT/O(page) rationale that the
prior commit spread across five sites down to a single authoritative comment on
ArtistLibraryFilter, with the call sites referencing it.
* perf(subsonic): drop redundant library_artist covering index
The migration added an index on library_artist(artist_id, library_id) on the
theory that the restricted-subset artist-search EXISTS needed it to seek by
artist_id. Benchmarking on a 405k-artist / 5-library dataset showed no benefit:
the EXISTS subquery constrains both columns (artist_id = and library_id IN), so
SQLite already resolves it as a covering-index seek on the existing
(library_id, artist_id) UNIQUE autoindex. With the new index present the planner
still picks the autoindex and ignores it.
Drop the migration and correct the comment. Removing ~11MB of dead index plus
its write-amplification on every library_artist insert/delete, for zero query
gain.
* fix(scanner): mark artists missing when they lose their last library
Artist search Phase 1 filters on artist.missing and Phase 2 inner-joins
library_artist, so a non-missing artist with no library_artist row (an orphan)
takes a pagination slot in Phase 1 and then vanishes in Phase 2, shortening the
page and shifting deep offsets. The admin/headless search fast-path walks artist
unfiltered, so it is fully exposed to this.
Two paths created such orphans without updating artist.missing:
- RefreshStats deletes library_artist rows whose stats are '{}' (artist lost all
content in a library) after every scan. This is the common source.
- Library deletion cascades away the library's library_artist rows.
Mark newly-orphaned artists missing at both sources, so the shared
'missing = false' search filter excludes them immediately instead of waiting for
a later scan. In RefreshStats the update only runs when the cleanup actually
removed rows (the only way a new orphan can appear), so steady-state scans pay
nothing; measured ~160ms on 300k artists only when orphans can exist.
* refactor(subsonic): address review feedback on artist search filter
Code-review follow-ups to the artist search pagination change:
- ArtistLibraryFilter: short-circuit to a constant-false predicate when no
library IDs are given, avoiding a degenerate empty IN () subquery.
- ArtistLibraryFilter: add an inner LIMIT 1 to the correlated EXISTS so SQLite
cannot flatten it into a fan-out join (an artist in multiple of the user's
libraries would otherwise yield duplicate rowids and corrupt pagination).
- narrowsArtistLibraries: compare accessible-vs-requested as a set lookup
instead of slices.Contains in a loop.
- searchConfig.LibraryFilter: document that a join-free filter is now a
correctness requirement (DISTINCT was removed), not just a performance one.
* docs: trim verbose comments in artist search/orphan code
Condense the over-explained comments added in this PR to the essential 'why',
removing repeated cross-references and restatements of the adjacent code.
* fix(scanner): heal pre-existing orphan artists on full refresh
The orphan-marking added to RefreshStats only ran when its empty-stats cleanup
deleted rows, so it reconciled newly-created orphans but not ones already left
in the database by older versions (whose library_artist row was deleted before
this fix existed). Such legacy orphans would surface in the admin/headless search
fast-path as short/gappy pages.
Also run the orphan-marking on a full refresh (allArtists), so a full scan — which
upgrades commonly trigger and users can run manually — reconciles the backlog. No
migration needed; the runtime fixes prevent recurrence.
* perf(subsonic): extend artist search fast-path to all-library users
applyLibraryFilterToSearchQuery only skipped the library filter for admin and
headless processes. A regular (non-admin) user who can access every library has
the same result set as an admin, but was still given the EXISTS filter — an
O(offset) cost for a predicate that matches every non-missing artist anyway.
Skip the filter for them too, using a cheap library CountAll() (a count over the
tiny library table) compared against the user's library count. On any error it
falls back to the filtered path, which is correct, just slower.
* fix(scanner): log error as trailing arg, not explicit error key
Signed-off-by: Deluan <deluan@navidrome.org>
* test(scanner): e2e guard for orphan artists under PurgeMissing
Adds an end-to-end scanner test for the orphan-artist invariant fixed in
RefreshStats: with Scanner.PurgeMissing enabled, removing all of an artist's
files hard-deletes them, cascades away their media_file_artists rows, and
RefreshStats then drops the artist's emptied library_artist row. The test
asserts no non-missing artist is left without a library_artist row. Verified it
fails without the RefreshStats orphan-marking and passes with it.
* test(scanner): assert the orphaned artist is marked missing
The orphan e2e test only checked the aggregate no-orphan invariant
(orphanCount == 0), which a fully-deleted artist or an un-cleaned row would also
satisfy — so it could pass without exercising the fix. Assert Pink Floyd's row
specifically: missing=false before, missing=true after, and absent from the
non-missing results. Verified it fails without the RefreshStats orphan-marking.
* test(scanner): drop misleading non-missing-list assertion for orphan
GetAll has no default missing filter, but selectArtist inner-joins library_artist,
so an orphaned artist (no junction row) is excluded from the results whether or
not it is marked missing. The Not(ContainElement) check therefore passed for the
wrong reason. The direct floydMissing() == 1 query is the assertion that actually
validates the missing flag; keep that plus the orphan-count invariant and an
over-marking guard on The Beatles.
* test(scanner): document why orphan check reads the artist row directly
Clarify that GetAll cannot observe the orphan: selectArtist inner-joins
library_artist, so an artist with no junction row is excluded from results
whether or not it is marked missing. Asserting on GetAll would pass even without
the fix, so the test reads the artist row directly to check the missing flag.
* test(scanner): return descriptive artist state for clearer failures
floydState returns PRESENT/MISSING/NOT_FOUND instead of 0/1/-1, so a failure
reads '<string>: PRESENT to equal MISSING' rather than '0 to equal 1'.
* refactor(subsonic): keep artist library scoping in the repository
The search endpoint built a persistence-layer EXISTS predicate
(persistence.ArtistLibraryFilter) and injected it into artistOpts.Filters — the
only place the subsonic package reached into persistence, leaking a storage
detail up two layers.
Pass the same Eq{"library_id": ids} filter used for albums and songs, and let
the artist repository translate it to the join-free library_artist predicate
(scopeSearchToLibraries), where the junction knowledge belongs. The subset-vs-
fast-path decision moves there too, so narrowsArtistLibraries and the persistence
import are gone from the subsonic layer. Behavior is unchanged; coverage for the
translation moves to artist_repository_test.
* refactor(persistence): extract canonical markOrphansMissing helper
The 'mark non-missing artists with no library_artist row as missing' invariant
was hand-written as SQL in two places (RefreshStats and libraryRepository.Delete),
in two slightly different dialects (not exists vs id not in). Extract a single
artistRepository.markOrphansMissing method next to markMissing and call it from
both sites, so the invariant has one definition.
* fix(persistence): apply scoped library filter in both search phases
Two bugs from moving the artist library-scoping into the repository:
- Search() scoped opts.Filters for Phase 1 but still passed the original
(unscoped) options to selectArtist, so Phase 2 re-applied the raw
Eq{library_id} against the wrong columns and a restricted user's search
returned nothing. Pass the scoped opts to both phases.
- scopeSearchToLibraries dropped the filter unconditionally for admins, so an
admin explicitly narrowing via musicFolderId (e.g. search3?musicFolderId=2)
leaked content from other libraries. Compare the request against the user's
visible library set (all libraries for admin/headless), narrowing whenever it
is a strict subset.
Both regressions were caught by the server/e2e multi-library suite.
* fix(core): delete library and reconcile orphans in one transaction
libraryRepository.Delete runs the FK-cascade delete and the orphaned-artist
reconciliation (markOrphansMissing) as two writes on r.db. Called directly they
autocommit separately, so an interruption between them could leave non-missing
artists with no library_artist row — the orphan state the artist search
fast-path forbids. Wrap the deletion in ds.WithTx at the core wrapper so both
writes commit atomically; the watcher/scanner/broker side-effects stay
post-commit.
* refactor(persistence): unify artist search library scoping into one filter
Phase 1 previously applied two overlapping library predicates: cfg.LibraryFilter
(scoped to the user's libraries) AND options.Filters (the requested subset),
producing two correlated EXISTS subqueries per rowid even though the request is
always a subset of the user's libraries. And the 'does this user see everything'
decision was implemented twice (userHasAllLibraries via CountAll vs
scopeSearchToLibraries via set-membership), with applyLibraryFilterToSearchQuery
as a third scoping path.
Resolve the effective library scope once in Search() via searchScope (intersect
the requested set with the user's visible libraries; nil = fast-path), clear
opts.Filters, and realize that single scope as the only Phase-1 LibraryFilter.
The visibility logic is now one pipeline: requestedLibraryIDs + visibleLibraryIDs
+ userSeesAllLibraries. Behavior unchanged; one EXISTS instead of two on the hot
path, one source of truth for library visibility.
* fix(persistence): harden artist search against malformed library_id filter
Search consumed only an Eq{"library_id": []int} filter; an Eq whose library_id
value wasn't []int slipped through unconsumed and would reach Phase 1's bare
artist table (no library_id column) → SQL error. Recognize any Eq carrying a
library_id key (isLibraryIDFilter) and always consume it, falling back to the
user's visible scope for a malformed value. Non-library filters are still left
in place for doSearch.
* refactor(persistence): trim redundant comments and unexport artist library filter
The artist-search-pagination work left dense explanatory comments, with the
join-free / LIMIT-1 anti-flatten rationale and the orphan-artist mechanics each
restated in several places. Consolidate each rationale into one canonical home
(artistLibraryFilter for the EXISTS/LIMIT-1 trick, markOrphansMissing for the
orphan lifecycle) and have the other sites reference it instead of repeating it.
Also unexport ArtistLibraryFilter to artistLibraryFilter: its only caller is
searchCfg in the same package and no test references it, so it never needed to
be part of the package's exported surface.
Comments only plus the rename; no behavior change.
* refactor: add slice.ToSet and use it for the artist search subset check
searchScope's subset test compared the requested libraries against the visible
set with a nested slices.Contains, which is O(visible * requested). On an instance
with many libraries (e.g. 100 libraries, a user granted 99) and an explicit
musicFolderId request, that is ~9.8k comparisons; with a set it is ~200.
Add a small reusable slice.ToSet helper (a slice -> map[T]struct{} set, collapsing
duplicates) and use it to make the membership lookups O(1), restoring O(n+m) without
the throwaway struct{}{} literal that an inline ToMap would need. No behavior change.
* refactor(artist): move artistLibraryFilter to artist_repository
Signed-off-by: Deluan <deluan@navidrome.org>
---------
Signed-off-by: Deluan <deluan@navidrome.org>