Soulseek Search Improvements - Test Results

Changes Applied

Task 1: Aggressive Track Title Normalization

  • Strip classical music metadata (movement numbers, opus numbers, key signatures)
  • Strip featuring artists (feat., ft., featuring variations)
  • Three normalization levels: aggressive, moderate, minimal
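The three levels can be pictured as progressively stronger filters over the raw title. The sketch below is only an illustration: the level names come from the list above, while the function name `normalizeTitle`, the regular expressions, and the exact split of work between levels are assumptions rather than the project's actual code.

```typescript
// Hypothetical sketch; patterns and level boundaries are assumptions.
type NormalizationLevel = "aggressive" | "moderate" | "minimal";

// Featuring credits: "(feat. X)", "[ft. X]", or a trailing "featuring X".
const FEAT_BRACKETED = /\s*[(\[]\s*(?:feat\.?|ft\.?|featuring)\s[^)\]]*[)\]]/gi;
const FEAT_TRAILING = /\s+(?:feat\.?|ft\.?|featuring)\s.*$/i;

// Classical metadata: ": I. Adagio", "Op. 64", "No. 3", "in E minor".
const MOVEMENT = /:\s*[IVX]+\.\s.*$/;
const OPUS = /,?\s*\bOp\.?\s*\d+\b/gi;
const NUMBER = /,?\s*\bNo\.?\s*\d+\b/gi;
const KEY = /\s+in\s+[A-G](?:[-\s]?(?:flat|sharp))?\s+(?:major|minor)\b/gi;

function normalizeTitle(title: string, level: NormalizationLevel): string {
  let out = title;
  if (level !== "minimal") {
    // Assumed: moderate and aggressive both drop featuring credits.
    out = out.replace(FEAT_BRACKETED, "").replace(FEAT_TRAILING, "");
  }
  if (level === "aggressive") {
    // Assumed: only aggressive strips classical metadata.
    out = out
      .replace(MOVEMENT, "")
      .replace(OPUS, "")
      .replace(NUMBER, "")
      .replace(KEY, "");
  }
  // Every level collapses repeated whitespace.
  return out.replace(/\s+/g, " ").trim();
}
```

Simple patterns like these will misfire on some titles (for example, a title that legitimately contains the standalone word "feat"), which is one reason to keep the less aggressive levels as fallbacks.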

Task 2: Reverse Search Strategy Priority

  • Reordered from complex-first to simple-first
  • Priority order: artist-title-aggressive → artist-title-moderate → title-only-aggressive → album-title → artist-album-title
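One way to express the simple-first ordering is an ordered array that the search loop walks until a strategy yields results. In this sketch the strategy names and their order are taken from the list above; the `Track` shape, the injected `normalize` function, and the normalization level used by the two album-based fallbacks are assumptions.

```typescript
// Illustrative sketch; query assembly details are assumptions.
type Level = "aggressive" | "moderate" | "minimal";
type Normalize = (title: string, level: Level) => string;

interface Track {
  artist: string;
  title: string;
  album: string;
}

interface SearchStrategy {
  name: string;
  buildQuery: (track: Track) => string;
}

// Array order is the priority order: shortest, simplest queries first.
function buildStrategies(normalize: Normalize): SearchStrategy[] {
  return [
    { name: "artist-title-aggressive", buildQuery: (t) => `${t.artist} ${normalize(t.title, "aggressive")}` },
    { name: "artist-title-moderate", buildQuery: (t) => `${t.artist} ${normalize(t.title, "moderate")}` },
    { name: "title-only-aggressive", buildQuery: (t) => normalize(t.title, "aggressive") },
    // Assumption: the album-based fallbacks use minimal normalization.
    { name: "album-title", buildQuery: (t) => `${t.album} ${normalize(t.title, "minimal")}` },
    { name: "artist-album-title", buildQuery: (t) => `${t.artist} ${t.album} ${normalize(t.title, "minimal")}` },
  ];
}
```

Keeping the order in data rather than in control flow makes it straightforward to reorder or drop a strategy when test results come in.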

Task 3: Reduce Search Timeout

  • Changed from 45 seconds to 15 seconds per strategy
  • Rationale: slsk-batchdl uses a 6-second timeout, and community recommendations are 10-15 seconds
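A per-strategy time budget of this kind can be sketched with `Promise.race`, together with the worst-case arithmetic it implies. The helper names `withTimeout` and `worstCaseSeconds` are hypothetical, and the actual backend may enforce its timeout differently (for example, inside the Soulseek client itself).

```typescript
// Hypothetical sketch of a per-strategy time budget; not the project's code.
const STRATEGY_TIMEOUT_MS = 15000; // previously 45000

// Resolves with the work's result, or with `fallback` once `ms` has elapsed.
async function withTimeout<T>(work: Promise<T>, ms: number, fallback: T): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<T>((resolve) => {
    timer = setTimeout(() => resolve(fallback), ms);
  });
  try {
    return await Promise.race([work, timeout]);
  } finally {
    // Clear the timer so a finished search does not keep the process alive.
    clearTimeout(timer);
  }
}

// Worst case: every strategy runs until its timeout expires.
function worstCaseSeconds(strategyCount: number, timeoutSeconds: number): number {
  return strategyCount * timeoutSeconds;
}
```

With five strategies, `worstCaseSeconds(5, 45)` gives 225 and `worstCaseSeconds(5, 15)` gives 75, matching the per-track maxima quoted under Before vs After.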

Task 4: Lower Score Filter Threshold

  • Changed threshold from 20 to 5 (more lenient)
  • Increased max alternatives from 10 to 20
  • Added scoring system documentation
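The effect of the two changes can be sketched as a filter, sort, and slice over scored results. The `ScoredFile` shape and the function name are hypothetical, and whether the real threshold is inclusive (`>=`) or exclusive (`>`) is an assumption here.

```typescript
// Hypothetical sketch; field names and the inclusive comparison are assumptions.
interface ScoredFile {
  filename: string;
  user: string;
  score: number;
}

const MIN_SCORE = 5; // previously 20
const MAX_ALTERNATIVES = 20; // previously 10

function selectAlternatives(
  files: ScoredFile[],
  minScore: number = MIN_SCORE,
  maxAlternatives: number = MAX_ALTERNATIVES,
): ScoredFile[] {
  return files
    .filter((f) => f.score >= minScore) // drop weak matches
    .sort((a, b) => b.score - a.score) // best match first
    .slice(0, maxAlternatives); // cap the number of fallbacks kept
}
```

For candidates scoring 145, 12, 5, and 4, the old threshold of 20 keeps only the first, while the new threshold of 5 keeps three, so more fallback sources remain available if the best download fails.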

Task 5: Query Length Validation and Logging

  • Skip strategies that produce queries longer than 100 characters
  • Log a warning for queries longer than 80 characters
  • Enhanced debug logging with query lengths
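The two limits above amount to a three-way classification of each query before it is sent. A minimal sketch, with a hypothetical function name and return type:

```typescript
// Hypothetical sketch of the length check; the limits come from the list above.
const MAX_QUERY_LENGTH = 100; // longer queries are skipped
const WARN_QUERY_LENGTH = 80; // longer queries are sent but logged as warnings

type QueryCheck = "ok" | "warn" | "skip";

function checkQueryLength(query: string): QueryCheck {
  if (query.length > MAX_QUERY_LENGTH) return "skip";
  if (query.length > WARN_QUERY_LENGTH) return "warn";
  return "ok";
}
```

Both comparisons are strict, so a query of exactly 100 characters is still sent (with a warning) and one of exactly 80 characters passes silently.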

Before vs After

Before Optimization

  • Search timeout: 45 seconds per strategy (225s max for 5 strategies)
  • Query example: "Joshua Kyan Aalampour Enemies to Lovers Butterfly Lovers Violin Concerto: I. Adagio Cantabile" (101 chars)
  • Strategy order: Complex first (artist+album+title)
  • Score threshold: 20 (strict)
  • Max alternatives: 10
  • Success rate: ~30% (baseline from discovery log 2026-02-10)
  • Alternatives found: 1-2 per search

After Optimization

  • Search timeout: 15 seconds per strategy (75s max for 5 strategies)
  • Query example: "Joshua Kyan Aalampour Butterfly Lovers" (39 chars) after aggressive normalization
  • Strategy order: Simple first (artist+title)
  • Score threshold: 5 (lenient)
  • Max alternatives: 20
  • Success rate: [To be measured]
  • Alternatives found: [To be measured]

Integration Test Plan

Test Steps

  1. Restart Backend

    cd /run/media/chevron7/Storage/Projects/kima/backend
    npm run dev
    
  2. Trigger Discovery Generation

  3. Monitor Backend Logs

    tail -f backend/logs/playlists/session.log | grep SOULSEEK
    
  4. Check Discovery Log

    tail -100 backend/data/logs/discovery/discovery-*.log | grep -E "Acquired|Failed"
    

Expected Patterns in Logs

Query Format:

[Search #1] Strategy "artist-title-aggressive": "Artist Title" (45 chars)
[Search #1] Found 12 files from 15 users in 15001ms
[Search #1] MATCH: 01 - Title.flac | FLAC | 25MB | User: user123 | Score: 145

  • No queries over 80 chars (or WARN logged if present)
  • Searches complete in ~15s, not 45s
  • More alternatives per search (targeting 5-10 per successful search)
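To fill in the query-length and search-duration figures in the Test Results section, the session log can be summarized mechanically. The sketch below assumes that real log lines match the sample format shown above exactly; the function name and the summary shape are hypothetical.

```typescript
// Hypothetical sketch; assumes log lines follow the sample format above.
const STRATEGY_LINE = /\[Search #\d+\] Strategy "([^"]+)": "(.*)" \((\d+) chars\)/;
const FOUND_LINE = /\[Search #\d+\] Found (\d+) files from (\d+) users in (\d+)ms/;

interface SessionSummary {
  queryLengths: number[];
  durationsMs: number[];
  over80: number; // also counts queries over 100
  over100: number;
}

function summarizeSessionLog(lines: string[]): SessionSummary {
  const summary: SessionSummary = { queryLengths: [], durationsMs: [], over80: 0, over100: 0 };
  for (const line of lines) {
    const strategy = STRATEGY_LINE.exec(line);
    if (strategy) {
      const length = Number(strategy[3]);
      summary.queryLengths.push(length);
      if (length > 80) summary.over80 += 1;
      if (length > 100) summary.over100 += 1;
      continue;
    }
    const found = FOUND_LINE.exec(line);
    if (found) summary.durationsMs.push(Number(found[3]));
  }
  return summary;
}
```

Lines that match neither pattern, such as MATCH lines, are ignored, so the whole log can be passed in without pre-filtering.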

Verification Checklist

  • Backend starts without errors
  • Discovery generation completes
  • Session logs show query lengths <80 chars for most tracks
  • Session logs show "artist-title-aggressive" tried first
  • Searches complete in ~15 seconds (check timestamps)
  • Discovery log shows improved success rate (target: 70%+)
  • Multiple alternatives found per successful search (target: 5-10)

Test Results

[Fill in after manual testing]

Success Rate

  • Tracks attempted: [count]
  • Tracks acquired: [count]
  • Success rate: [percentage]
  • Improvement vs baseline: [+X%]
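The last two figures follow directly from the counts. The sketch below treats "improvement" as a difference in percentage points against the ~30% baseline; that reading, and the function names, are assumptions.

```typescript
// Hypothetical sketch; "improvement" is read as percentage points.
const BASELINE_SUCCESS_RATE = 30; // percent, from the Before Optimization list

// Success rate in percent, rounded to one decimal place.
function successRate(acquired: number, attempted: number): number {
  if (attempted === 0) return 0; // avoid dividing by zero on an empty run
  return Math.round(((100 * acquired) / attempted) * 10) / 10;
}

// Difference from the baseline, in percentage points.
function improvementVsBaseline(rate: number, baseline: number = BASELINE_SUCCESS_RATE): number {
  return Math.round((rate - baseline) * 10) / 10;
}
```

For example, 21 acquired out of 30 attempted would give a 70% success rate, 40 percentage points above the baseline.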

Query Length Analysis

  • Average query length: [X chars]
  • Queries >80 chars: [count]
  • Queries >100 chars: [count] (should be 0)

Search Performance

  • Average search duration: [X seconds]
  • Fastest search: [X seconds]
  • Slowest search: [X seconds]

Alternatives Found

  • Average alternatives per successful search: [X]
  • Total unique sources found: [X]

Conclusions

[To be filled in after analyzing test results]

What Worked Well

[List successful optimizations]

Areas for Further Improvement

[List remaining issues or opportunities]

Recommendations

[Suggest next steps based on test results]