
feat(jobs): Add data retention jobs#4128

Merged
TheodoreSpeaks merged 3 commits into staging from feat/auto-redaction
Apr 21, 2026

@TheodoreSpeaks
Collaborator

@TheodoreSpeaks TheodoreSpeaks commented Apr 13, 2026

Summary

Add data retention jobs. 3 jobs created:

  1. Clean up soft deleted resources (7 days free, 30 days paid, customizable enterprise)
  2. Log retention cleanup (7 days free, infinite paid, customizable enterprise)
  3. Task cleanup (7 days free, infinite paid, customizable enterprise)
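
A rough sketch of how the tiers listed above could be expressed as a lookup. This follows the description only; the type names, the `DEFAULT_RETENTION_DAYS` table, and `resolveRetentionDays` are hypothetical and are not taken from the diff, and `null` is assumed to mean "keep indefinitely".

```typescript
// Hypothetical names throughout; this mirrors the PR description, not the merged code.
type Plan = "free" | "paid" | "enterprise";
type CleanupJob = "soft-deletes" | "logs" | "tasks";

// Retention in days; null is assumed to mean "keep indefinitely" (no cleanup).
const DEFAULT_RETENTION_DAYS: Record<
  CleanupJob,
  Record<Exclude<Plan, "enterprise">, number | null>
> = {
  "soft-deletes": { free: 7, paid: 30 },
  logs: { free: 7, paid: null },
  tasks: { free: 7, paid: null },
};

function resolveRetentionDays(
  job: CleanupJob,
  plan: Plan,
  workspaceOverrideDays: number | null = null,
): number | null {
  // Enterprise workspaces use their own configured value; null means nothing is configured.
  if (plan === "enterprise") return workspaceOverrideDays;
  return DEFAULT_RETENTION_DAYS[job][plan];
}
```

For example, `resolveRetentionDays("soft-deletes", "paid")` yields 30 days, while `resolveRetentionDays("logs", "paid")` yields `null`, so paid-tier logs would never be cleaned up.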

Type of Change

  • Bug fix
  • New feature
  • Breaking change
  • Documentation
  • Other: ___________

Testing

  • Tested locally. Validated that data is deleted from the sim and copilot DBs, and that the S3 buckets are cleaned up as well.

Checklist

  • Code follows project style guidelines
  • Self-reviewed my changes
  • Tests added/updated and passing
  • No new warnings introduced
  • I confirm that I have read and agree to the terms outlined in the Contributor License Agreement (CLA)

Screenshots/Videos

@vercel

vercel bot commented Apr 13, 2026

The latest updates on your projects. Learn more about Vercel for GitHub.

| Project | Deployment | Actions | Updated (UTC) |
| --- | --- | --- | --- |
| docs | Ready | Preview, Comment | Apr 21, 2026 0:16am |


@waleedlatif1
Collaborator

@TheodoreSpeaks let's consolidate the migrations into a single one: just delete the existing ones and run it once over all the changes in schema.ts

@TheodoreSpeaks TheodoreSpeaks changed the title from "Feat/auto redaction (wip)" to "feat(jobs): Add data retention jobs" Apr 18, 2026
@TheodoreSpeaks
Collaborator Author

@BugBot review

@cursor

cursor bot commented Apr 18, 2026

PR Summary

High Risk
High risk because it introduces new automated hard-deletion paths across multiple tables and cloud storage/copilot backend, plus new scheduling/dispatch logic that could delete data broadly if retention scoping is wrong.

Overview
Adds a new data-retention system with three cleanup job types (logs, soft-deleted resources, and copilot/task data) that run via the async job queue/Trigger.dev and can also execute inline on the DB-backed queue.

Cron endpoints are added for cleanup-logs, cleanup-soft-deletes, and cleanup-tasks, and the existing logs cleanup API is refactored from inline deletion to job dispatch. Retention defaults are centralized in CLEANUP_CONFIG, with enterprise workspaces opting in via per-workspace retention columns and non-enterprise plans using fixed defaults.

Introduces an enterprise-only Data Retention settings page and GET/PUT /api/workspaces/[id]/data-retention endpoint (admin-gated, audited) to configure per-workspace retention. Includes new background tasks implementing batched DB deletes plus external cleanup (S3/file metadata and copilot backend cleanup), and a DB migration adding retention columns to workspace and updating/adding partial indexes to support cleanup queries.
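
The dispatch split described above (fixed defaults for non-enterprise plans, per-workspace opt-in for enterprise) might look roughly like the following. This is a sketch under assumptions: `CleanupPayload`, `EnterpriseWorkspace`, and `buildCleanupPayloads` are hypothetical names, not the signatures in `cleanup-dispatcher.ts`.

```typescript
// Hypothetical shapes; not the signatures used in cleanup-dispatcher.ts.
type CleanupPayload =
  | { plan: "free" | "paid" }
  | { plan: "enterprise"; workspaceId: string; retentionHours: number };

interface EnterpriseWorkspace {
  id: string;
  retentionHours: number | null; // NULL column value = workspace has not opted in
}

function buildCleanupPayloads(enterprise: EnterpriseWorkspace[]): CleanupPayload[] {
  // One job per non-enterprise tier; each resolves its own workspace IDs later.
  const payloads: CleanupPayload[] = [{ plan: "free" }, { plan: "paid" }];
  for (const ws of enterprise) {
    // Only workspaces with a configured retention period get a job.
    if (ws.retentionHours !== null) {
      payloads.push({
        plan: "enterprise",
        workspaceId: ws.id,
        retentionHours: ws.retentionHours,
      });
    }
  }
  return payloads;
}
```

Keeping the payload construction pure makes the retention scoping easy to unit-test before anything is enqueued, which matters given the hard-deletion risk noted above.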

Reviewed by Cursor Bugbot for commit f546653. Bugbot is set up for automated code reviews on this repo. Configure here.

Comment thread apps/sim/background/cleanup-soft-deletes.ts
Comment thread packages/db/migrations/meta/_journal.json
Comment thread packages/db/migrations/meta/0191_snapshot.json Outdated
Comment thread apps/sim/background/cleanup-tasks.ts
@TheodoreSpeaks
Collaborator Author

@BugBot review

Comment thread apps/sim/lib/billing/cleanup-dispatcher.ts Outdated
@greptile-apps
Contributor

greptile-apps bot commented Apr 18, 2026

Greptile Summary

Adds three data retention background jobs (soft-delete cleanup, log cleanup, task/chat cleanup) dispatched via Trigger.dev or an inline fallback, with an enterprise-gated UI and API for per-workspace configuration. The migration replaces full soft-delete indexes with partial indexes and adds three retention columns to workspace.

  • P1 — S3 objects orphaned for workspace_file rows: cleanupWorkspaceFileStorage only queries workspaceFiles (plural) for S3 cleanup, but CLEANUP_TARGETS also includes workspaceFile (singular), which has its own key column. When cleanupTable hard-deletes those rows the corresponding S3 objects are never removed.
  • P2 — task cleanup defaults mismatch: The PR description states "7 days free" for task cleanup, but getRetentionDefaultHours returns null for both free and paid tiers (confirmed by the in-code comment "No task cleanup"). Clarify whether this is intentional or a description oversight.
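
One possible shape for closing the P1 gap is to gather storage keys from every file-bearing table before the rows are hard-deleted. The sketch below is illustrative only: `StoredFileRow` and `collectStorageKeys` are hypothetical, and the real column names in the schema may differ.

```typescript
// Hypothetical row shape; the real schema columns may differ.
interface StoredFileRow {
  id: string;
  key: string | null; // object-storage key, if the row has one
}

// Collects the distinct storage keys across any number of tables' rows,
// so that both workspaceFiles and workspaceFile rows are covered.
function collectStorageKeys(...tables: StoredFileRow[][]): string[] {
  const keys = new Set<string>();
  for (const rows of tables) {
    for (const row of rows) {
      if (row.key) keys.add(row.key);
    }
  }
  return Array.from(keys);
}
```

The resulting list would be passed to the storage delete call before the database rows are removed, so a failure in between leaves rows that can be retried rather than orphaned objects.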

Confidence Score: 3/5

Not safe to merge as-is: the S3 cleanup gap will permanently orphan workspace_file objects in object storage on every cleanup run.

One confirmed P1 data-integrity bug (workspace_file S3 objects never deleted) that will silently accumulate orphaned cloud storage objects on each cron execution. Everything else — batching logic, auth, migration, Trigger.dev wiring, enterprise UI — is well-structured.

apps/sim/background/cleanup-soft-deletes.ts — cleanupWorkspaceFileStorage must also cover the workspaceFile (singular) table

Important Files Changed

| Filename | Overview |
| --- | --- |
| `apps/sim/background/cleanup-soft-deletes.ts` | Batched soft-delete cleanup with partial-index support; P1 bug: workspaceFile S3 objects are never deleted when DB rows are purged |
| `apps/sim/background/cleanup-tasks.ts` | Task/chat/run cleanup with correct deletion ordering; duplicate copilotChats query for feedback deletion (addressed in prior thread) |
| `apps/sim/background/cleanup-logs.ts` | Batched execution log cleanup with S3 file handling; dynamic import inside loop (addressed in prior thread) |
| `apps/sim/lib/billing/cleanup-dispatcher.ts` | Dispatches free/paid/enterprise cleanup jobs; taskCleanupHours returns null for free/paid despite PR description claiming 7-day free default |
| `apps/sim/lib/cleanup/chat-cleanup.ts` | Collects file refs from workspaceFiles and JSONB messages, calls copilot backend, deletes S3 files with correct context per file |
| `apps/sim/app/api/workspaces/[id]/data-retention/route.ts` | GET/PUT API for data retention config; properly gates writes to enterprise plan with admin permission check and audit logging |
| `packages/db/migrations/0193_unknown_franklin_richards.sql` | Adds log/soft-delete/task retention columns to workspace; replaces full indexes with partial indexes on deleted_at/archived_at for query efficiency |
| `apps/sim/ee/data-retention/components/data-retention-settings.tsx` | Enterprise-gated UI for configuring retention periods; correctly renders locked vs. editable views based on plan |
| `apps/sim/ee/data-retention/hooks/data-retention.ts` | React Query hooks for data retention with correct staleTime, signal forwarding, and onSettled invalidation |
| `apps/sim/lib/core/async-jobs/backends/trigger-dev.ts` | Adds cleanup-logs, cleanup-soft-deletes, cleanup-tasks to the Trigger.dev job type mapping; no issues |

Sequence Diagram

sequenceDiagram
    participant Cron as Cron (GET /api/cron/*)
    participant Dispatcher as dispatchCleanupJobs
    participant Queue as JobQueue (Trigger.dev / DB)
    participant Task as Background Task
    participant DB as Database
    participant S3 as Object Storage
    participant Copilot as Copilot Backend

    Cron->>Dispatcher: dispatchCleanupJobs(jobType, retentionColumn)
    Dispatcher->>Queue: enqueue free-tier job
    Dispatcher->>Queue: enqueue paid-tier job
    Dispatcher->>DB: query enterprise workspaces with non-NULL retention
    Dispatcher->>Queue: batchTrigger enterprise jobs

    Queue->>Task: run(payload)
    Task->>DB: resolveTierWorkspaceIds or lookup workspace retention
    Task->>DB: SELECT expiring rows (batched, LIMIT 2000)
    Task->>S3: delete associated files (pre-deletion)
    Task->>Copilot: POST /api/tasks/cleanup (chat IDs)
    Task->>DB: DELETE rows by ID
    Task-->>Queue: complete
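
The task half of the diagram (select a bounded batch, clean up external files, then delete the rows) can be sketched as a loop like the one below. All names here (`ExpiredRow`, `CleanupDeps`, `runBatchedCleanup`) are hypothetical, and the calls are synchronous only to keep the example short; real database and storage calls would be asynchronous.

```typescript
// Hypothetical interfaces; synchronous for brevity, real DB/storage calls would be async.
interface ExpiredRow {
  id: string;
  fileKey: string | null;
}

interface CleanupDeps {
  selectExpired(limit: number): ExpiredRow[]; // rows past retention, at most `limit`
  deleteFiles(keys: string[]): void; // external storage cleanup
  deleteRows(ids: string[]): number; // returns the number of rows removed
}

function runBatchedCleanup(deps: CleanupDeps, batchSize = 2000, maxBatches = 100): number {
  let deleted = 0;
  for (let i = 0; i < maxBatches; i++) {
    const batch = deps.selectExpired(batchSize);
    if (batch.length === 0) break;

    // Remove external files before the rows that reference them.
    const keys = batch
      .map((row) => row.fileKey)
      .filter((key): key is string => key !== null);
    if (keys.length > 0) deps.deleteFiles(keys);

    const removed = deps.deleteRows(batch.map((row) => row.id));
    deleted += removed;

    // Stop if nothing was removed (avoids re-selecting the same rows) or on a short batch.
    if (removed === 0 || batch.length < batchSize) break;
  }
  return deleted;
}
```

The `maxBatches` cap is an added safety bound for the sketch, so a single run cannot loop indefinitely if rows keep arriving.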

Reviews (2): Last reviewed commit: "fix lint"

Comment thread apps/sim/background/cleanup-soft-deletes.ts Outdated
Comment thread apps/sim/background/cleanup-tasks.ts
Comment thread apps/sim/lib/billing/cleanup-dispatcher.ts Outdated
Comment thread apps/sim/background/cleanup-logs.ts Outdated
@TheodoreSpeaks
Collaborator Author

@greptile review

Comment thread apps/sim/background/cleanup-soft-deletes.ts
Comment thread apps/sim/lib/billing/cleanup-dispatcher.ts Outdated
Comment thread apps/sim/app/api/workspaces/[id]/data-retention/route.ts
Comment thread apps/sim/background/cleanup-tasks.ts
Comment thread apps/sim/background/cleanup-soft-deletes.ts Outdated
Comment thread apps/sim/lib/billing/cleanup-dispatcher.ts Outdated
Comment thread apps/sim/ee/data-retention/components/data-retention-settings.tsx
Comment thread apps/sim/background/cleanup-logs.ts
Comment thread apps/sim/lib/billing/cleanup-dispatcher.ts
Comment thread apps/sim/ee/data-retention/components/data-retention-settings.tsx Outdated
Comment thread apps/sim/background/cleanup-soft-deletes.ts Outdated
Comment thread apps/sim/background/cleanup-soft-deletes.ts
Comment thread apps/sim/app/workspace/[workspaceId]/settings/navigation.ts
Add 3 cron-triggered cleanup jobs dispatched via Trigger.dev (or inline fallback):
- cleanup-soft-deletes: hard-deletes soft-deleted workspace resources past retention
- cleanup-logs: deletes expired workflow execution logs + S3 files
- cleanup-tasks: deletes expired copilot chats, runs, feedback, inbox tasks

Enterprise admins can configure per-workspace retention via Settings > Data Retention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

# Conflicts:
#	packages/db/migrations/meta/0192_snapshot.json
#	packages/db/migrations/meta/_journal.json
#	packages/db/schema.ts

@cursor cursor bot left a comment


Cursor Bugbot has reviewed your changes and found 1 potential issue.


Reviewed by Cursor Bugbot for commit be59d2b. Configure here.

Comment thread apps/sim/background/cleanup-tasks.ts
@TheodoreSpeaks TheodoreSpeaks merged commit 802f4cf into staging Apr 21, 2026
14 checks passed
@TheodoreSpeaks TheodoreSpeaks deleted the feat/auto-redaction branch April 21, 2026 00:24
2 participants