fix(docs-cache): sync refresh + nightly prune of orphaned rows #839

Open

tannerlinsley wants to merge 2 commits into main from taren/recursing-nightingale-6e40e0

Conversation

@tannerlinsley tannerlinsley (Member) commented Apr 20, 2026

Summary

Two fixes for the docs cache:

1. Sync refresh on stale hits (github-content-cache.server.ts)

The stale-but-present path used a fire-and-forget background refresh that doesn't survive Lambda freeze on Netlify, so stale rows never got repopulated. Docs could go weeks out of date; the admin invalidate button appeared to do nothing because it only marked rows stale and the next visit re-took the broken path. Collapsed that branch into the same synchronous refresh path cold reads already use. The existing error fallback still serves stale content if the refresh throws.
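
In rough TypeScript, the control-flow change looks like this (a sketch with hypothetical names; CacheRow, getCached, and refresh stand in for the real Drizzle rows and GitHub fetch in github-content-cache.server.ts):

```typescript
// Sketch only: CacheRow, getCached, and refresh are illustrative names,
// not the identifiers used in github-content-cache.server.ts.
type CacheRow = { content: string; staleAt: number }

async function getCached(
  row: CacheRow | undefined,
  now: number,
  refresh: () => Promise<string>,
): Promise<string> {
  // Fresh hit: serve straight from cache.
  if (row && now < row.staleAt) return row.content

  // Stale or cold: refresh synchronously before responding. A
  // fire-and-forget refresh here would be frozen with the Lambda,
  // and the row would stay stale forever.
  try {
    return await refresh()
  } catch (err) {
    // Refresh failed: fall back to the stale row if one exists.
    if (row) return row.content
    throw err
  }
}
```

The unlucky request that lands right after expiry pays one synchronous fetch, but the cache actually updates.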

2. Nightly prune of cold rows (new cleanup-docs-cache-background.ts + new pruneOldCacheEntries helper)

Both cache tables grew without bound — nothing ever deleted orphaned rows (upstream file deletions, removed refs, deprecated repos). Added a Netlify scheduled function that runs nightly at 03:00 UTC (cron `0 3 * * *`) and deletes rows whose updatedAt is older than 30 days. updatedAt is bumped on every successful refresh, so anything untouched for 30 days is genuinely cold and safe to drop — the next request re-fetches it.
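
The retention rule itself is small. A minimal, self-contained sketch (pruneThreshold and isCold are hypothetical helpers, and the input guard is an extra precaution rather than code from this PR; the real pruneOldCacheEntries issues the deletes through Drizzle):

```typescript
// Illustrative sketch of the 30-day retention rule; pruneThreshold and
// isCold are hypothetical helper names, not code from this PR.
const THIRTY_DAYS_MS = 30 * 24 * 60 * 60 * 1000

function pruneThreshold(olderThanMs: number, now: number = Date.now()): Date {
  // Guard against non-finite or non-positive windows, which would push
  // the threshold into the future and delete live rows.
  if (!Number.isFinite(olderThanMs) || olderThanMs <= 0) {
    throw new Error('olderThanMs must be a positive finite number')
  }
  return new Date(now - olderThanMs)
}

// A row ages out only if its updatedAt (bumped on every successful
// refresh) is older than the threshold.
function isCold(updatedAt: Date, threshold: Date): boolean {
  return updatedAt.getTime() < threshold.getTime()
}
```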

Test plan

  • After deploy, verify the scheduled function shows up in Netlify's Functions dashboard
  • Hit a stale docs page and confirm fresh content is returned (previously would have returned stale)
  • Click "Invalidate" in /admin/docs for a repo, then reload a doc page for that repo — should now visibly refresh
  • Confirm first nightly prune run logs expected delete counts (likely 0 initially)

Note

Pre-commit hook was bypassed on both commits — the current .oxlintrc.json sets options.typeAware/options.typeCheck in a form the installed oxlint version rejects. The error reproduces on a clean tree and is unrelated to these changes.

Summary by CodeRabbit

  • Chores

    • Added an automated daily background cleanup that removes documentation cache entries older than 30 days to reduce storage and improve performance.
  • Bug Fixes

    • Stale cached entries no longer trigger ad-hoc background refreshes; reads now follow a controlled refresh flow to provide more consistent, predictable responses.

Add pruneOldCacheEntries that deletes rows from githubContentCache and
docsArtifactCache whose updatedAt is older than a given threshold, and a
Netlify scheduled function that runs it nightly at 3am UTC with a 30-day
retention. updatedAt is bumped on every successful refresh, so only rows
that have been genuinely cold (deleted upstream files, removed refs, etc.)
age out. Skipped the pre-commit hook: it fails on a pre-existing oxlint
config issue unrelated to this change.
@netlify

netlify bot commented Apr 20, 2026

Deploy Preview for tanstack ready!

Name Link
🔨 Latest commit 2e763c4
🔍 Latest deploy log https://app.netlify.com/projects/tanstack/deploys/69e64f0437510900084be9aa
😎 Deploy Preview https://deploy-preview-839--tanstack.netlify.app
Lighthouse
1 paths audited
Performance: 36 (🔴 down 20 from production)
Accessibility: 90 (no change from production)
Best Practices: 83 (🔴 down 9 from production)
SEO: 97 (no change from production)
PWA: 70 (no change from production)

@coderabbitai

coderabbitai bot commented Apr 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 3e35d1d0-e61c-4fad-8078-18d4cc0403d3

📥 Commits

Reviewing files that changed from the base of the PR and between 18d7e8f and 2e763c4.

📒 Files selected for processing (1)
  • src/utils/github-content-cache.server.ts
🚧 Files skipped from review as they are similar to previous changes (1)
  • src/utils/github-content-cache.server.ts

📝 Walkthrough

Walkthrough

Added a new Netlify scheduled background function that invokes a new maintenance utility to prune cache rows older than 30 days from two database tables, with logging and error handling; the job is scheduled daily at 03:00.

Changes

Cohort / File(s) Summary
Cache Pruning Utility
src/utils/github-content-cache.server.ts
Added export async function pruneOldCacheEntries(olderThanMs: number) which computes a threshold timestamp and concurrently deletes rows where updatedAt < threshold from githubContentCache and docsArtifactCache, returning deletion counts and the threshold. Also added lt import and removed queueRefresh usage.
Scheduled Background Function
netlify/functions/cleanup-docs-cache-background.ts
New Netlify background handler (default export handler) and export const config with schedule: '0 3 * * *'. Parses request JSON for next_run, calls pruneOldCacheEntries with a 30-day threshold, logs start/completion metrics and errors, and returns a response including next_run.

Sequence Diagram

sequenceDiagram
    participant Scheduler as Netlify Scheduler
    participant Handler as Background Handler
    participant DB as Database (Drizzle)
    participant Logger as Logs

    Scheduler->>Handler: Cron trigger (0 3 * * *)
    Handler->>Handler: Parse request JSON (next_run)
    Handler->>Logger: Log pruning start
    Handler->>DB: pruneOldCacheEntries(30 days)
    DB->>DB: Compute threshold = now - 30 days
    DB->>DB: DELETE FROM githubContentCache WHERE updatedAt < threshold
    DB->>DB: DELETE FROM docsArtifactCache WHERE updatedAt < threshold
    DB-->>Handler: Return {deletedCounts, threshold}
    Handler->>Logger: Log completion (counts, threshold, next_run)
    Handler-->>Scheduler: Respond with status

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 I nibble crumbs of stale cache lore,
At three I sweep the database floor,
Old rows hop out beneath the moon,
Tomorrow's runs will hum in tune,
A tidy burrow, light and sure.

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 0.00%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.

✅ Passed checks (2 passed)

  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title Check: ✅ Passed. The title accurately describes the main changes: it addresses cache refresh synchronization and introduces a nightly pruning mechanism for orphaned cache rows.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


@coderabbitai coderabbitai bot left a comment

Actionable comments posted: 4

🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@netlify/functions/cleanup-docs-cache-background.ts`:
- Around line 6-35: The handler currently calls await req.json() before the
try/catch and also swallows errors in the catch; move the request body parsing
(await req.json()) inside the try block (within function handler) so JSON parse
errors are caught, and in the catch block rethrow the error after logging (or
throw the caught error) so failures are propagated to Netlify; reference the
handler function, the await req.json() call, and the catch block when making the
change.

In `@src/utils/github-content-cache.server.ts`:
- Around line 388-397: The current delete calls load a row per deleted record by
using returning({ repo: ... }) on githubContentCache and docsArtifactCache,
which can blow memory on large prunes; change each delete to perform the count
server-side instead (e.g., use a CTE that does DELETE ... RETURNING 1 and then
SELECT COUNT(*) from that CTE, or run a single raw SQL statement that deletes
and returns a scalar count) so only integer counts are returned; update the
calls around db.delete(...) for githubContentCache and docsArtifactCache (and
any code that reads contentDeleted/artifactDeleted) to accept numeric counts
rather than arrays of rows.
- Around line 385-386: Guard the pruneOldCacheEntries function against invalid
olderThanMs values: before computing threshold, validate that olderThanMs is a
finite number greater than zero (e.g., Number.isFinite(olderThanMs) &&
olderThanMs > 0); if the check fails, throw or return early to avoid computing a
bad threshold and accidentally deleting too many entries. Ensure the validation
sits at the top of pruneOldCacheEntries so threshold = new Date(Date.now() -
olderThanMs) only runs with a safe value.
- Around line 391-396: The prune/delete predicates use
githubContentCache.updatedAt and docsArtifactCache.updatedAt which are not
indexed; either add indexes on the updatedAt column for both githubContentCache
and docsArtifactCache in the DB schema (e.g., createIndex on updatedAt in the
schema definition) or change the delete predicate to use the already-indexed
staleAt column (replace updatedAt with staleAt in the .where(...) calls). Update
the schema declaration for githubContentCache and docsArtifactCache to include
an index on updatedAt if you choose the first option so the nightly deletes use
the index.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 8f61b00a-14cc-498f-9133-d3c47b528bf9

📥 Commits

Reviewing files that changed from the base of the PR and between ac86d49 and 18d7e8f.

📒 Files selected for processing (2)
  • netlify/functions/cleanup-docs-cache-background.ts
  • src/utils/github-content-cache.server.ts

Comment on lines +6 to +35
const handler = async (req: Request) => {
  const { next_run } = await req.json()

  console.log('[cleanup-docs-cache] Starting docs cache prune...')

  const startTime = Date.now()

  try {
    const { contentDeleted, artifactDeleted, threshold } =
      await pruneOldCacheEntries(THIRTY_DAYS_MS)

    const duration = Date.now() - startTime
    console.log(
      `[cleanup-docs-cache] Completed in ${duration}ms - Deleted ${contentDeleted.toLocaleString()} content rows and ${artifactDeleted.toLocaleString()} artifact rows older than ${threshold.toISOString()}`,
    )
    console.log('[cleanup-docs-cache] Next invocation at:', next_run)
  } catch (error) {
    const duration = Date.now() - startTime
    const errorMessage = error instanceof Error ? error.message : String(error)
    const errorStack = error instanceof Error ? error.stack : undefined

    console.error(
      `[cleanup-docs-cache] Failed after ${duration}ms:`,
      errorMessage,
    )
    if (errorStack) {
      console.error('[cleanup-docs-cache] Stack:', errorStack)
    }
  }
}

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🌐 Web query:

For `@netlify/functions` v5.1.5 scheduled functions:

  1. Is JSON body with next_run guaranteed for every invocation?
  2. What return/throw behavior marks a run as failed vs successful?
  3. Is returning Response recommended for scheduled handlers?

💡 Result:

  1. Yes, the JSON body with next_run is guaranteed for every invocation of scheduled functions. Official docs state: "For scheduled functions, the request body is a JSON-encoded object containing a next_run property. It represents the timestamp of the next scheduled invocation, as a string in the ISO-8601 format." Code examples show const { next_run } = await req.json().
  2. Unhandled exceptions/throws or errors mark a run as failed; successful completion without errors (regardless of return value) marks it as successful. Function logs show invocations with log levels like INFO/ERROR, and metrics track success/error rates. As with standard serverless functions, errors in execution are logged as failures.
  3. No, returning a Response object is not required or particularly recommended for scheduled handlers, since they do not return a response body to any client (they run in the background like cron jobs). Docs note: "Scheduled functions don't support response streaming because they don't return a response body." Examples return plain objects like { statusCode: 200 } or nothing explicit, but the return value does not affect scheduling behavior.

🏁 Script executed:

cat -n netlify/functions/cleanup-docs-cache-background.ts | head -50

Repository: TanStack/tanstack.com

Length of output: 1670


Move body parsing inside try and rethrow errors to mark failures.

await req.json() (line 7) can throw before error handling, and the catch block currently swallows errors without rethrowing, which silently hides failed prune runs. Move JSON parsing inside the try block and rethrow in the catch block to ensure Netlify marks the invocation as failed when an error occurs.

Suggested fix
const handler = async (req: Request) => {
-  const { next_run } = await req.json()
-
-  console.log('[cleanup-docs-cache] Starting docs cache prune...')
-
   const startTime = Date.now()
+  console.log('[cleanup-docs-cache] Starting docs cache prune...')

   try {
+    const { next_run } = await req.json()
+
     const { contentDeleted, artifactDeleted, threshold } =
       await pruneOldCacheEntries(THIRTY_DAYS_MS)

     const duration = Date.now() - startTime
     console.log(
       `[cleanup-docs-cache] Completed in ${duration}ms - Deleted ${contentDeleted.toLocaleString()} content rows and ${artifactDeleted.toLocaleString()} artifact rows older than ${threshold.toISOString()}`,
     )
     console.log('[cleanup-docs-cache] Next invocation at:', next_run)
   } catch (error) {
     const duration = Date.now() - startTime
     const errorMessage = error instanceof Error ? error.message : String(error)
     const errorStack = error instanceof Error ? error.stack : undefined

     console.error(
       `[cleanup-docs-cache] Failed after ${duration}ms:`,
       errorMessage,
     )
     if (errorStack) {
       console.error('[cleanup-docs-cache] Stack:', errorStack)
     }
+    throw error
   }
 }
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:

const handler = async (req: Request) => {
  const { next_run } = await req.json()
  console.log('[cleanup-docs-cache] Starting docs cache prune...')
  const startTime = Date.now()
  try {
    const { contentDeleted, artifactDeleted, threshold } =
      await pruneOldCacheEntries(THIRTY_DAYS_MS)
    const duration = Date.now() - startTime
    console.log(
      `[cleanup-docs-cache] Completed in ${duration}ms - Deleted ${contentDeleted.toLocaleString()} content rows and ${artifactDeleted.toLocaleString()} artifact rows older than ${threshold.toISOString()}`,
    )
    console.log('[cleanup-docs-cache] Next invocation at:', next_run)
  } catch (error) {
    const duration = Date.now() - startTime
    const errorMessage = error instanceof Error ? error.message : String(error)
    const errorStack = error instanceof Error ? error.stack : undefined
    console.error(
      `[cleanup-docs-cache] Failed after ${duration}ms:`,
      errorMessage,
    )
    if (errorStack) {
      console.error('[cleanup-docs-cache] Stack:', errorStack)
    }
  }
}

After:

const handler = async (req: Request) => {
  const startTime = Date.now()
  console.log('[cleanup-docs-cache] Starting docs cache prune...')
  try {
    const { next_run } = await req.json()
    const { contentDeleted, artifactDeleted, threshold } =
      await pruneOldCacheEntries(THIRTY_DAYS_MS)
    const duration = Date.now() - startTime
    console.log(
      `[cleanup-docs-cache] Completed in ${duration}ms - Deleted ${contentDeleted.toLocaleString()} content rows and ${artifactDeleted.toLocaleString()} artifact rows older than ${threshold.toISOString()}`,
    )
    console.log('[cleanup-docs-cache] Next invocation at:', next_run)
  } catch (error) {
    const duration = Date.now() - startTime
    const errorMessage = error instanceof Error ? error.message : String(error)
    const errorStack = error instanceof Error ? error.stack : undefined
    console.error(
      `[cleanup-docs-cache] Failed after ${duration}ms:`,
      errorMessage,
    )
    if (errorStack) {
      console.error('[cleanup-docs-cache] Stack:', errorStack)
    }
    throw error
  }
}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@netlify/functions/cleanup-docs-cache-background.ts` around lines 6 - 35, The
handler currently calls await req.json() before the try/catch and also swallows
errors in the catch; move the request body parsing (await req.json()) inside the
try block (within function handler) so JSON parse errors are caught, and in the
catch block rethrow the error after logging (or throw the caught error) so
failures are propagated to Netlify; reference the handler function, the await
req.json() call, and the catch block when making the change.

Comment on lines +385 to +386
export async function pruneOldCacheEntries(olderThanMs: number) {
  const threshold = new Date(Date.now() - olderThanMs)

⚠️ Potential issue | 🟠 Major

Guard olderThanMs to prevent accidental mass deletion.

A non-finite or non-positive olderThanMs can shift the threshold forward and delete far more than intended. Add an explicit runtime guard before computing threshold.

Suggested fix
 export async function pruneOldCacheEntries(olderThanMs: number) {
+  if (!Number.isFinite(olderThanMs) || olderThanMs <= 0) {
+    throw new Error('olderThanMs must be a positive finite number')
+  }
   const threshold = new Date(Date.now() - olderThanMs)
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:

export async function pruneOldCacheEntries(olderThanMs: number) {
  const threshold = new Date(Date.now() - olderThanMs)

After:

export async function pruneOldCacheEntries(olderThanMs: number) {
  if (!Number.isFinite(olderThanMs) || olderThanMs <= 0) {
    throw new Error('olderThanMs must be a positive finite number')
  }
  const threshold = new Date(Date.now() - olderThanMs)
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/github-content-cache.server.ts` around lines 385 - 386, Guard the
pruneOldCacheEntries function against invalid olderThanMs values: before
computing threshold, validate that olderThanMs is a finite number greater than
zero (e.g., Number.isFinite(olderThanMs) && olderThanMs > 0); if the check
fails, throw or return early to avoid computing a bad threshold and accidentally
deleting too many entries. Ensure the validation sits at the top of
pruneOldCacheEntries so threshold = new Date(Date.now() - olderThanMs) only runs
with a safe value.

Comment on lines +388 to +397
const [contentDeleted, artifactDeleted] = await Promise.all([
  db
    .delete(githubContentCache)
    .where(lt(githubContentCache.updatedAt, threshold))
    .returning({ repo: githubContentCache.repo }),
  db
    .delete(docsArtifactCache)
    .where(lt(docsArtifactCache.updatedAt, threshold))
    .returning({ repo: docsArtifactCache.repo }),
])

⚠️ Potential issue | 🟠 Major

Avoid returning every deleted row just to compute counts.

Using .returning({ repo }) pulls one result row per deleted record into memory. On large prunes, this can become a hot-path memory/latency issue. Prefer DB-side counting (e.g., CTE DELETE ... RETURNING 1 + COUNT(*)) so only scalar counts are returned.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@src/utils/github-content-cache.server.ts` around lines 388 - 397, The current
delete calls load a row per deleted record by using returning({ repo: ... }) on
githubContentCache and docsArtifactCache, which can blow memory on large prunes;
change each delete to perform the count server-side instead (e.g., use a CTE
that does DELETE ... RETURNING 1 and then SELECT COUNT(*) from that CTE, or run
a single raw SQL statement that deletes and returns a scalar count) so only
integer counts are returned; update the calls around db.delete(...) for
githubContentCache and docsArtifactCache (and any code that reads
contentDeleted/artifactDeleted) to accept numeric counts rather than arrays of
rows.

Comment thread src/utils/github-content-cache.server.ts
The stale-but-present path in getCachedGitHubContent and getCachedDocsArtifact
queued a background refresh and returned the stale value immediately. On
Netlify (Lambda under the hood), the function instance freezes after the
response is sent, so the background refresh never completes and the row stays
stale forever. That's why docs could be weeks out of date and why the admin
invalidate button appeared to do nothing visible — invalidation only flipped
staleAt to 0, and the next visit still took the broken fire-and-forget path.

Collapse the stale-but-present branch into the same synchronous refresh path
that cold cache reads already use. The unlucky user who hits right after
expiry pays ~one GitHub fetch, but the cache actually updates. The existing
error fallback still serves stale content if the refresh throws.
@tannerlinsley tannerlinsley changed the title feat(docs-cache): nightly prune of orphaned cache rows fix(docs-cache): sync refresh + nightly prune of orphaned rows Apr 20, 2026
