feat: Sessions - bidirectional durable agent streams#3417

Open
ericallam wants to merge 5 commits into main from
feature/tri-8627-session-primitive-server-side-schema-routes-clickhouse

ericallam (Member) commented Apr 20, 2026

What this enables

This PR adds a new first-class primitive, Session, for durable bidirectional I/O that outlives a single run. A Session is a server-managed channel pair (.out from the task, .in from the client) that you can write to, read from, and subscribe to across many runs. Sessions can also be filtered, listed, and closed, all through a single identifier.

Use cases unblocked

  • Chat agents that persist across turns. Turns 1..N attach to the same Session. The UI subscribes once and keeps receiving output as new runs attach.
  • Approval loops and long-running tasks with user feedback. The task waits on .in, the client writes to .in, and the server enforces no-writes-after-close.
  • Workflow progress streams that live past the run. A dashboard can subscribe to .out after the task finishes to replay the history.
  • Any session-scoped state where pre-existing run streams (scoped to a single run) were too narrow.

Public API surface

Control plane

  • POST /api/v1/sessions to create. Idempotent when you supply externalId.
  • GET /api/v1/sessions/:session to retrieve by friendlyId (session_abc) or by your own externalId. The server disambiguates via the session_ prefix.
  • GET /api/v1/sessions to list with filters (type, tag, taskIdentifier, externalId, derived status = ACTIVE/CLOSED/EXPIRED, created-at period/from/to) and cursor pagination. Backed by ClickHouse.
  • PATCH /api/v1/sessions/:session to update tags/metadata/externalId.
  • POST /api/v1/sessions/:session/close to terminate. Idempotent, hard-blocks new server-brokered writes.
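As a rough illustration of the retrieve-by-prefix rule above, here is a minimal sketch. The helper name, the constant, and the return shape are hypothetical and are not taken from this PR's diff; only the `session_` prefix rule comes from the description.

```typescript
// Hypothetical helper: the real route may resolve the parameter differently.
const SESSION_FRIENDLY_ID_PREFIX = "session_";

type SessionLookup =
  | { kind: "friendlyId"; friendlyId: string }
  | { kind: "externalId"; externalId: string };

// Decide which unique column a `:session` path parameter should be matched against.
function resolveSessionLookup(param: string): SessionLookup {
  if (param.startsWith(SESSION_FRIENDLY_ID_PREFIX)) {
    return { kind: "friendlyId", friendlyId: param };
  }
  return { kind: "externalId", externalId: param };
}
```

Under this scheme an externalId that itself starts with `session_` would be looked up as a friendlyId, so callers would presumably want to avoid that prefix in their own identifiers.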

Realtime

  • PUT /realtime/v1/sessions/:session/:io to initialize a channel. Returns S2 credentials in headers so clients can write directly to S2 for high-throughput cases.
  • GET /realtime/v1/sessions/:session/:io for SSE subscribe.
  • POST /realtime/v1/sessions/:session/:io/append for server-side appends.
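The three realtime routes share one path shape and differ only in method and the `/append` suffix. A small sketch of that mapping follows; the function and type names are illustrative assumptions, and request headers, bodies, and the S2 credential header names are deliberately left out because the description does not specify them.

```typescript
type SessionIo = "out" | "in";
type RealtimeAction = "initialize" | "subscribe" | "append";

type RealtimeRequest = { method: "PUT" | "GET" | "POST"; path: string };

// Map each realtime action to the HTTP method and path listed above.
function realtimeRequest(
  action: RealtimeAction,
  session: string,
  io: SessionIo
): RealtimeRequest {
  const base = `/realtime/v1/sessions/${encodeURIComponent(session)}/${io}`;
  switch (action) {
    case "initialize":
      return { method: "PUT", path: base };
    case "subscribe":
      return { method: "GET", path: base };
    case "append":
      return { method: "POST", path: `${base}/append` };
  }
}
```

The `session` argument can be either a friendlyId or an externalId, which is why the sketch URL-encodes it.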

Scopes

  • sessions is now a ResourceType. read:sessions:{id}, write:sessions:{id}, admin:sessions:{id} all flow through the existing JWT validator.
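For readers unfamiliar with the scope string shape, a minimal sketch of building and checking one is shown here. Both functions are hypothetical; the existing JWT validator is not reproduced, and it may accept broader scopes that this exact-match check does not model.

```typescript
type SessionAction = "read" | "write" | "admin";

// Format a scope string in the `action:sessions:id` shape described above.
function sessionScope(action: SessionAction, sessionId: string): string {
  return `${action}:sessions:${sessionId}`;
}

// Illustrative exact-match check only. Whether `admin` implies `read` or
// `write`, or whether wildcard scopes exist, is not modelled here.
function hasSessionScope(
  granted: readonly string[],
  action: SessionAction,
  sessionId: string
): boolean {
  const wanted = sessionScope(action, sessionId);
  return granted.some((scope) => scope === wanted);
}
```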

Implementation summary

Postgres (Session table)

  • Scalar scoping columns (projectId, runtimeEnvironmentId, environmentType, organizationId) with no foreign keys. This matches the January TaskRun FK-removal decision and keeps the write path partition-friendly.
  • Point-lookup indexes only: friendlyId unique, (env, externalId) unique, expiresAt. List queries are served from ClickHouse, so Postgres stays insert-heavy.
  • Terminal markers (closedAt, closedReason, expiresAt) are write-once. No status enum, no counters, no currentRunId pointer. All run-related state is derived.

ClickHouse (sessions_v1)

  • ReplacingMergeTree partitioned by month, ordered by (org_id, project_id, environment_id, created_at, session_id). tags indexed with a tokenbf_v1 skip index.
  • SessionsReplicationService mirrors RunsReplicationService exactly: logical replication with leader-locked consumer, ConcurrentFlushScheduler, retry with exponential backoff + jitter, identical metric shape. Dedicated slot + publication so the two consume independently.
  • SessionsRepository + ClickHouseSessionsRepository expose list / count / tags with the same cursor pagination convention as runs and waitpoints.
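To make "retry with exponential backoff + jitter" concrete, here is a generic sketch of one common variant ("full jitter"). The option names and values are assumptions; the PR description does not state which jitter strategy, base delay, or cap the replication services actually use.

```typescript
// Hypothetical parameters, not taken from SessionsReplicationService.
type BackoffOptions = {
  baseDelayMs: number;
  maxDelayMs: number;
  // Injectable random source in [0, 1) so the function can be tested deterministically.
  random?: () => number;
};

// "Full jitter": pick a uniform delay between 0 and the capped exponential ceiling.
function backoffDelayMs(attempt: number, opts: BackoffOptions): number {
  const random = opts.random ?? Math.random;
  const ceiling = Math.min(opts.maxDelayMs, opts.baseDelayMs * 2 ** attempt);
  return Math.floor(random() * ceiling);
}
```

The jitter spreads retries from concurrent flushers over time so that they do not all hit ClickHouse at the same instant after a shared failure.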

S2

  • New key format for session channels: sessions/{friendlyId}/{out|in}. The existing runs/{runId}/{streamId} format for implicit run streams is completely untouched.
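The two key formats can be written as a pair of small builders. The function names are illustrative and are not the helpers used in the webapp; only the key shapes come from the description.

```typescript
type SessionChannel = "out" | "in";

// Build the S2 stream key for one side of a session: sessions/{friendlyId}/{out|in}.
function sessionStreamKey(friendlyId: string, channel: SessionChannel): string {
  return `sessions/${friendlyId}/${channel}`;
}

// Run-scoped streams keep their existing format: runs/{runId}/{streamId}.
function runStreamKey(runId: string, streamId: string): string {
  return `runs/${runId}/${streamId}`;
}
```

Because the two formats use different top-level prefixes, session keys and run keys cannot collide.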

What did not change

  • Run-scoped streams.pipe / streams.input still behave exactly as before. They do not create Session rows and the existing routes are unchanged. Sessions are a net-new primitive for the next phase of agent features, not a reshaping of the current streams API.

Verification

  • Webapp typecheck clean (10/10).
  • apps/webapp/test/sessionsReplicationService.test.ts exercises insert and update round-trips through Postgres logical replication into ClickHouse via testcontainers.
  • Live end-to-end against local dev: create, retrieve (friendlyId + externalId), update, .out.initialize, .out.append x2, .in.send, .out.subscribe over SSE, list (type, tag, status, externalId, pagination), close, idempotent re-close. Replicated row lands in ClickHouse within ~1s with closed_reason intact.

Not in this PR

  • Client SDK (lives on the ai-chat feature branch, wires up the runtime ergonomics for chat.agent).
  • Dashboard routes.
  • chat.agent integration.

Test plan

  • pnpm run typecheck --filter webapp
  • pnpm run test --filter webapp ./test/sessionsReplicationService.test.ts --run
  • Start the webapp with SESSION_REPLICATION_CLICKHOUSE_URL and SESSION_REPLICATION_ENABLED=1 set. Confirm the slot and publication auto-create on boot.
  • Hit POST /api/v1/sessions and verify the row replicates to trigger_dev.sessions_v1 within a couple of seconds.
  • Call POST /api/v1/sessions/:id/close and confirm that a subsequent POST /realtime/v1/sessions/:id/out/append returns 400.
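The last check exercises the no-writes-after-close rule. A sketch of what such a guard might look like is below; the row shape, function name, and error message are hypothetical, and only the 400 status and the closedAt marker come from the description.

```typescript
// Minimal shape for the sketch; the real Session row has more columns.
type SessionRow = { friendlyId: string; closedAt: Date | null };

type AppendDecision =
  | { ok: true }
  | { ok: false; status: 400; error: string };

// Reject server-brokered appends once the session has a closedAt marker.
function checkAppendAllowed(session: SessionRow): AppendDecision {
  if (session.closedAt !== null) {
    // Hypothetical message; the actual response body is not specified in the PR.
    return { ok: false, status: 400, error: "Session is closed" };
  }
  return { ok: true };
}
```

Since closedAt is write-once, a session that fails this check once would keep failing it, which is consistent with close being idempotent.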

Durable, typed, bidirectional I/O primitive that outlives a single run.
Ship target is agent/chat use cases; run-scoped streams.pipe/streams.input
are untouched and do not create Session rows.

Postgres
- New Session table: id, friendlyId, externalId, type (plain string),
  denormalised project/environment/organization scalar columns (no FKs),
  taskIdentifier, tags String[], metadata Json, closedAt, closedReason,
  expiresAt, timestamps
- Point-lookup indexes only (friendlyId unique, (env, externalId) unique,
  expiresAt). List queries are served from ClickHouse so Postgres stays
  minimal and insert-heavy.

Control-plane API
- POST   /api/v1/sessions           create (idempotent via externalId)
- GET    /api/v1/sessions           list with filters (type, tag,
                                     taskIdentifier, externalId, status
                                     ACTIVE|CLOSED|EXPIRED, period/from/to)
                                     and cursor pagination, ClickHouse-backed
- GET    /api/v1/sessions/:session  retrieve — polymorphic: `session_` prefix
                                     hits friendlyId, otherwise externalId
- PATCH  /api/v1/sessions/:session  update tags/metadata/externalId
- POST   /api/v1/sessions/:session/close  terminal close (idempotent)

Realtime (S2-backed)
- PUT    /realtime/v1/sessions/:session/:io           returns S2 creds
- GET    /realtime/v1/sessions/:session/:io           SSE subscribe
- POST   /realtime/v1/sessions/:session/:io/append    server-side append
- S2 key format: sessions/{friendlyId}/{out|in}

Auth
- sessions added to ResourceTypes. read:sessions:{id},
  write:sessions:{id}, admin:sessions:{id} scopes work via existing JWT
  validation.

ClickHouse
- sessions_v1 ReplacingMergeTree table
- SessionsReplicationService mirrors RunsReplicationService exactly:
  logical replication with leader-locked consumer, ConcurrentFlushScheduler,
  retry with exponential backoff + jitter, identical metric shape.
  Dedicated slot + publication (sessions_to_clickhouse_v1[_publication]).
- SessionsRepository + ClickHouseSessionsRepository expose list, count,
  tags with cursor pagination keyed by (created_at DESC, session_id DESC).
- Derived status (ACTIVE/CLOSED/EXPIRED) computed from closed_at + expires_at;
  in-memory fallback on list results to catch pre-replication writes.
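A minimal sketch of deriving the status from the two timestamps follows. The precedence (CLOSED before EXPIRED) and the inclusive expiry comparison are assumptions, as is the function name; the description only says the status is computed from closed_at and expires_at.

```typescript
type SessionStatus = "ACTIVE" | "CLOSED" | "EXPIRED";

type SessionTimestamps = { closedAt: Date | null; expiresAt: Date | null };

// Assumption: an explicit close takes precedence over expiry when both apply.
function deriveSessionStatus(s: SessionTimestamps, now: Date): SessionStatus {
  if (s.closedAt !== null) return "CLOSED";
  if (s.expiresAt !== null && s.expiresAt.getTime() <= now.getTime()) return "EXPIRED";
  return "ACTIVE";
}
```

Passing `now` explicitly keeps the function pure, so the same logic could serve as the in-memory fallback on list results as well as in tests.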

Verification
- Webapp typecheck 10/10
- Core + SDK build 3/3
- sessionsReplicationService.test.ts integration tests 2/2 (insert + update
  round-trip via testcontainers)
- Live round-trip against local dev: create -> retrieve (friendlyId and
  externalId) -> out.initialize -> out.append x2 -> in.send -> out.subscribe
  (receives records) -> close -> ClickHouse sessions_v1 shows the replicated
  row with closed_reason
- Live list smoke: tag, type, status CLOSED, externalId, and cursor pagination
changeset-bot bot commented Apr 20, 2026

🦋 Changeset detected

Latest commit: 2210fe2

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 29 packages
Name Type
@trigger.dev/core Patch
@trigger.dev/build Patch
trigger.dev Patch
@trigger.dev/python Patch
@trigger.dev/redis-worker Patch
@trigger.dev/schema-to-json Patch
@trigger.dev/sdk Patch
@internal/cache Patch
@internal/clickhouse Patch
@internal/llm-model-catalog Patch
@internal/redis Patch
@internal/replication Patch
@internal/run-engine Patch
@internal/schedule-engine Patch
@internal/testcontainers Patch
@internal/tracing Patch
@internal/tsql Patch
@internal/zod-worker Patch
d3-chat Patch
references-d3-openai-agents Patch
references-nextjs-realtime Patch
references-realtime-hooks-test Patch
references-realtime-streams Patch
references-telemetry Patch
@internal/sdk-compat-tests Patch
@trigger.dev/react-hooks Patch
@trigger.dev/rsc Patch
@trigger.dev/database Patch
@trigger.dev/otlp-importer Patch

coderabbitai bot (Contributor) commented Apr 20, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 47bb187a-c4ac-4051-bb95-66de4f297537

📥 Commits

Reviewing files that changed from the base of the PR and between ff46f33 and 2210fe2.

📒 Files selected for processing (1)
  • apps/webapp/app/routes/api.v1.sessions.ts
📜 Recent review details
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (28)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (5, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (6, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (3, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (5, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (2, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (8, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (4, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (7, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (6, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (1, 8)
  • GitHub Check: units / webapp / 🧪 Unit Tests: Webapp (8, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (1, 8)
  • GitHub Check: units / internal / 🧪 Unit Tests: Internal (4, 8)
  • GitHub Check: units / packages / 🧪 Unit Tests: Packages (1, 1)
  • GitHub Check: typecheck / typecheck
  • GitHub Check: sdk-compat / Node.js 22.12 (ubuntu-latest)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - npm)
  • GitHub Check: sdk-compat / Cloudflare Workers
  • GitHub Check: sdk-compat / Node.js 20.20 (ubuntu-latest)
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - npm)
  • GitHub Check: sdk-compat / Bun Runtime
  • GitHub Check: sdk-compat / Deno Runtime
  • GitHub Check: e2e / 🧪 CLI v3 tests (windows-latest - pnpm)
  • GitHub Check: e2e / 🧪 CLI v3 tests (ubuntu-latest - pnpm)
  • GitHub Check: Analyze (javascript-typescript)
🧰 Additional context used
📓 Path-based instructions (7)
**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

**/*.{ts,tsx}: Use types over interfaces for TypeScript
Avoid using enums; prefer string unions or const objects instead

Files:

  • apps/webapp/app/routes/api.v1.sessions.ts
{packages/core,apps/webapp}/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use zod for validation in packages/core and apps/webapp

Files:

  • apps/webapp/app/routes/api.v1.sessions.ts
**/*.{ts,tsx,js,jsx}

📄 CodeRabbit inference engine (.github/copilot-instructions.md)

Use function declarations instead of default exports

Add crumbs as you write code using `// @crumbs` comments or `// #region @crumbs` blocks. These are temporary debug instrumentation and must be stripped using `agentcrumbs strip` before merge.

Files:

  • apps/webapp/app/routes/api.v1.sessions.ts
**/*.ts

📄 CodeRabbit inference engine (.cursor/rules/otel-metrics.mdc)

**/*.ts: When creating or editing OTEL metrics (counters, histograms, gauges), ensure metric attributes have low cardinality by using only enums, booleans, bounded error codes, or bounded shard IDs
Do not use high-cardinality attributes in OTEL metrics such as UUIDs/IDs (envId, userId, runId, projectId, organizationId), unbounded integers (itemCount, batchSize, retryCount), timestamps (createdAt, startTime), or free-form strings (errorMessage, taskName, queueName)
When exporting OTEL metrics via OTLP to Prometheus, be aware that the exporter automatically adds unit suffixes to metric names (e.g., 'my_duration_ms' becomes 'my_duration_ms_milliseconds', 'my_counter' becomes 'my_counter_total'). Account for these transformations when writing Grafana dashboards or Prometheus queries

Files:

  • apps/webapp/app/routes/api.v1.sessions.ts
**/*.{js,ts,jsx,tsx,json,md,yaml,yml}

📄 CodeRabbit inference engine (AGENTS.md)

Format code using Prettier before committing

Files:

  • apps/webapp/app/routes/api.v1.sessions.ts
**/*.ts{,x}

📄 CodeRabbit inference engine (CLAUDE.md)

Always import from @trigger.dev/sdk when writing Trigger.dev tasks. Never use @trigger.dev/sdk/v3 or deprecated client.defineJob.

Files:

  • apps/webapp/app/routes/api.v1.sessions.ts
apps/webapp/**/*.{ts,tsx}

📄 CodeRabbit inference engine (.cursor/rules/webapp.mdc)

apps/webapp/**/*.{ts,tsx}: Access environment variables through the env export of env.server.ts instead of directly accessing process.env
Use subpath exports from @trigger.dev/core package instead of importing from the root @trigger.dev/core path

Use named constants for sentinel/placeholder values (e.g. const UNSET_VALUE = '__unset__') instead of raw string literals scattered across comparisons

Files:

  • apps/webapp/app/routes/api.v1.sessions.ts
🧠 Learnings (23)
📓 Common learnings
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3417
File: apps/webapp/app/services/sessionsReplicationService.server.ts:224-231
Timestamp: 2026-04-20T14:50:16.440Z
Learning: In `apps/webapp/app/services/sessionsReplicationService.server.ts`, the acknowledge-before-flush pattern is intentional and mirrors `runsReplicationService.server.ts`. `_latestCommitEndLsn` is updated at Postgres commit time and acknowledged on a periodic interval via `#acknowledgeLatestTransaction`, without waiting for ClickHouse batch flush to complete. Do not flag this as a durability/ordering issue — this at-least-once delivery trade-off is an established project-wide convention for both runs and sessions replication services.
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3417
File: apps/webapp/app/routes/realtime.v1.sessions.$session.$io.ts:37-51
Timestamp: 2026-04-20T15:06:16.910Z
Learning: In `apps/webapp/app/routes/realtime.v1.sessions.$session.$io.ts` (and all session realtime read paths), `$replica` is intentionally used for the `resolveSessionByIdOrExternalId` call — including the `closedAt` guard in the PUT/initialize path. The project convention is to use `$replica` consistently across all session realtime routes. The race window (replica lag allowing a ghost-initialize after close) is accepted as not realistic in practice (clients follow the close API response; they do not race it). If replica lag ever causes issues, the mitigation is to revisit all realtime routes together, not to swap individual routes to `prisma`. Do not flag `$replica` usage in session realtime routes as a stale-read issue.
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3417
File: apps/webapp/app/services/sessionsReplicationService.server.ts:204-215
Timestamp: 2026-04-20T15:08:49.959Z
Learning: In `apps/webapp/app/services/sessionsReplicationService.server.ts` and `apps/webapp/app/services/runsReplicationService.server.ts`, the `getKey` function in `ConcurrentFlushScheduler` uses `${item.event}_${item.session.id}` / `${item.event}_${item.run.id}` respectively. This pattern is intentionally kept identical across both replication services for consistency. Any change to the deduplication key shape (e.g., keying solely by session/run id) must be applied to both services together, never to one service in isolation. Tracking as a cross-service follow-up.
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3417
File: apps/webapp/app/services/sessionsRepository/clickhouseSessionsRepository.server.ts:27-40
Timestamp: 2026-04-20T15:08:57.551Z
Learning: In `apps/webapp/app/services/sessionsRepository/clickhouseSessionsRepository.server.ts`, the cursor predicate in `listSessionIds` compares only `session_id` while the `ORDER BY` clause uses `(created_at, session_id)`. This is intentional and consistent with the same pattern in `ClickHouseRunsRepository` and the waitpoints repository. Do not flag this as a skip/duplicate pagination bug in isolation — any fix must land across all three repositories at once as a shared follow-up.
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3417
File: internal-packages/clickhouse/src/sessions.ts:174-180
Timestamp: 2026-04-20T15:09:08.656Z
Learning: In `internal-packages/clickhouse/src/sessions.ts`, `getSessionTagsQueryBuilder` intentionally queries `trigger_dev.sessions_v1` WITHOUT `FINAL`, mirroring `getTaskRunTagsQueryBuilder` which queries `task_runs_v2` without `FINAL`. The DISTINCT arrayJoin tag-listing read can tolerate an occasional stale tag from a superseded ReplacingMergeTree row; the FINAL cost on a large table is considered not worth it. If FINAL is ever added, both tag query builders (sessions and runs) will be updated together. Do not flag the missing FINAL in either tag query builder as a consistency or stale-data issue.
📚 Learning: 2026-04-20T15:05:57.327Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3417
File: apps/webapp/app/routes/realtime.v1.sessions.$session.$io.append.ts:20-31
Timestamp: 2026-04-20T15:05:57.327Z
Learning: In `apps/webapp/app/routes/realtime.v1.sessions.$session.$io.append.ts`, the `MAX_APPEND_BODY_BYTES` cap is intentionally set to `1024 * 512` (512 KiB). The maintainer explicitly decided against lowering it to 128 KiB: the all-quotes worst-case JSON-escaping expansion that could exceed S2's 1 MiB per-record limit is considered pathological and not representative of real-world payloads (chat tokens, tool-call JSON, structured data). If overflow becomes a problem in practice, the preferred mitigation is an encoded-size guard inside `appendPart` itself. Do not flag this cap as a potential S2 overflow issue in future reviews.

Applied to files:

  • apps/webapp/app/routes/api.v1.sessions.ts
📚 Learning: 2026-04-16T14:19:16.309Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: apps/webapp/CLAUDE.md:0-0
Timestamp: 2026-04-16T14:19:16.309Z
Learning: Applies to apps/webapp/**/*.server.ts : Always use `findFirst` instead of `findUnique` in Prisma queries. `findUnique` has an implicit DataLoader that batches concurrent calls and has active bugs even in Prisma 6.x (uppercase UUIDs returning null, composite key SQL correctness issues, 5-10x worse performance). `findFirst` is never batched and avoids this entire class of issues

Applied to files:

  • apps/webapp/app/routes/api.v1.sessions.ts
📚 Learning: 2026-04-13T21:44:00.032Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3368
File: apps/webapp/app/services/taskIdentifierRegistry.server.ts:24-67
Timestamp: 2026-04-13T21:44:00.032Z
Learning: In `apps/webapp/app/services/taskIdentifierRegistry.server.ts`, the sequential upsert/updateMany/findMany writes in `syncTaskIdentifiers` are intentionally NOT wrapped in a Prisma transaction. This function runs only during deployment-change events (low-concurrency path), and any partial `isInLatestDeployment` state is acceptable because it self-corrects on the next deployment. Do not flag this as a missing-transaction/atomicity issue in future reviews.

Applied to files:

  • apps/webapp/app/routes/api.v1.sessions.ts
📚 Learning: 2026-04-16T14:21:15.229Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3368
File: apps/webapp/app/components/logs/LogsTaskFilter.tsx:135-163
Timestamp: 2026-04-16T14:21:15.229Z
Learning: In `triggerdotdev/trigger.dev` PR `#3368`, the `TaskIdentifier` table has a `@unique([runtimeEnvironmentId, slug])` DB constraint, guaranteeing one row per (environment, slug). In components like `apps/webapp/app/components/logs/LogsTaskFilter.tsx` and `apps/webapp/app/components/runs/v3/RunFilters.tsx`, using `key={item.slug}` for SelectItem list items is correct and unique. Do NOT flag `key={item.slug}` as potentially non-unique — the old duplicate-(slug, triggerSource) issue only existed with the legacy `DISTINCT` query, which this registry replaces.

Applied to files:

  • apps/webapp/app/routes/api.v1.sessions.ts
📚 Learning: 2026-03-13T13:42:25.092Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3213
File: apps/webapp/app/routes/admin.llm-models.new.tsx:65-91
Timestamp: 2026-03-13T13:42:25.092Z
Learning: In `apps/webapp/app/routes/admin.llm-models.new.tsx`, sequential Prisma writes for model/tier creation are intentionally not wrapped in a transaction. The form is admin-only with low concurrency risk, and the blast radius is considered minimal for admin tooling.

Applied to files:

  • apps/webapp/app/routes/api.v1.sessions.ts
📚 Learning: 2026-04-16T13:45:18.782Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3368
File: apps/webapp/test/engine/taskIdentifierRegistry.test.ts:3-19
Timestamp: 2026-04-16T13:45:18.782Z
Learning: In `apps/webapp/test/engine/taskIdentifierRegistry.test.ts`, the `vi.mock` calls for `~/services/taskIdentifierCache.server` (stubbing `getTaskIdentifiersFromCache` and `populateTaskIdentifierCache`), `~/models/task.server` (stubbing `getAllTaskIdentifiers`), and `~/db.server` (stubbing `prisma` and `$replica`) are intentional. The suite uses real Postgres via testcontainers for all `TaskIdentifier` DB operations, but isolates the Redis cache layer and legacy query fallback as separate concerns not exercised in this test file. Do not flag these mocks as violations of the no-mocks policy in future reviews.

Applied to files:

  • apps/webapp/app/routes/api.v1.sessions.ts
📚 Learning: 2026-03-22T13:49:23.474Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: internal-packages/database/prisma/migrations/20260318114244_add_prompt_friendly_id/migration.sql:5-5
Timestamp: 2026-03-22T13:49:23.474Z
Learning: In `internal-packages/database/prisma/migrations/**/*.sql`: When a column and its index are added in a follow-up migration file but the parent table itself was introduced in the same PR (i.e., no production rows exist yet), a plain `CREATE INDEX` / `CREATE UNIQUE INDEX` (without CONCURRENTLY) is safe and does not require splitting into a separate migration. The CONCURRENTLY requirement only applies when the table already has existing data in production.

Applied to files:

  • apps/webapp/app/routes/api.v1.sessions.ts
📚 Learning: 2026-04-16T14:21:14.907Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3368
File: internal-packages/database/prisma/schema.prisma:666-666
Timestamp: 2026-04-16T14:21:14.907Z
Learning: In `triggerdotdev/trigger.dev`, the `BackgroundWorkerTask` covering index on `(runtimeEnvironmentId, slug, triggerSource)` lives in `internal-packages/database/prisma/migrations/20260413000000_add_bwt_covering_index/migration.sql` as a `CREATE INDEX CONCURRENTLY IF NOT EXISTS`, intentionally in its own migration file separate from the `TaskIdentifier` table migration. Do not flag this index as missing from the schema migrations in future reviews.

Applied to files:

  • apps/webapp/app/routes/api.v1.sessions.ts
📚 Learning: 2026-03-26T10:02:25.354Z
Learnt from: 0ski
Repo: triggerdotdev/trigger.dev PR: 3254
File: apps/webapp/app/services/platformNotifications.server.ts:363-385
Timestamp: 2026-03-26T10:02:25.354Z
Learning: In `triggerdotdev/trigger.dev`, the `getNextCliNotification` fallback in `apps/webapp/app/services/platformNotifications.server.ts` intentionally uses `prisma.orgMember.findFirst` (single org) when no `projectRef` is provided. This is acceptable for v1 because the CLI (`dev` and `login` commands) always passes `projectRef` in normal usage, making the fallback a rare edge case. Do not flag the single-org fallback as a multi-org correctness bug in this file.

Applied to files:

  • apps/webapp/app/routes/api.v1.sessions.ts
📚 Learning: 2026-04-16T14:19:16.309Z
Learnt from: CR
Repo: triggerdotdev/trigger.dev PR: 0
File: apps/webapp/CLAUDE.md:0-0
Timestamp: 2026-04-16T14:19:16.309Z
Learning: Applies to apps/webapp/app/v3/services/queues.server.ts : If adding a new task-level default, add it to the existing `select` clause in the `backgroundWorkerTask.findFirst()` query in `queues.server.ts` — do NOT add a second query. If the default doesn't need to be known at trigger time, resolve it at dequeue time instead

Applied to files:

  • apps/webapp/app/routes/api.v1.sessions.ts
📚 Learning: 2026-04-07T14:12:59.018Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3331
File: apps/webapp/app/runEngine/concerns/batchPayloads.server.ts:112-136
Timestamp: 2026-04-07T14:12:59.018Z
Learning: In `apps/webapp/app/runEngine/concerns/batchPayloads.server.ts`, the `pRetry` call wrapping `uploadPacketToObjectStore` intentionally retries **all** error types (no `shouldRetry` filter / `AbortError` guards). The maintainer explicitly prefers over-retrying to under-retrying because multiple heterogeneous object store backends are supported and it is impractical to enumerate all permanent error signatures. Do not flag this as an issue in future reviews.

Applied to files:

  • apps/webapp/app/routes/api.v1.sessions.ts
📚 Learning: 2026-02-25T17:28:20.456Z
Learnt from: isshaddad
Repo: triggerdotdev/trigger.dev PR: 3130
File: docs/v3-openapi.yaml:3134-3135
Timestamp: 2026-02-25T17:28:20.456Z
Learning: In the Trigger.dev codebase, the `publicAccessToken` returned by the SDK's `wait.createToken()` method is not part of the HTTP response body from `POST /api/v1/waitpoints/tokens`. The server returns only `{ id, isCached, url }`. The SDK's `prepareData` hook generates the JWT client-side from the `x-trigger-jwt-claims` response header after the HTTP call completes. The OpenAPI spec correctly documents only the HTTP response body, not SDK transformations.

Applied to files:

  • apps/webapp/app/routes/api.v1.sessions.ts
📚 Learning: 2025-09-02T11:18:06.602Z
Learnt from: myftija
Repo: triggerdotdev/trigger.dev PR: 2463
File: apps/webapp/app/services/gitHubSession.server.ts:31-36
Timestamp: 2025-09-02T11:18:06.602Z
Learning: In the GitHub App installation flow in apps/webapp/app/services/gitHubSession.server.ts, the redirectTo parameter stored in httpOnly session cookies is considered acceptable without additional validation by the maintainer, as the httpOnly cookie provides sufficient security for this use case.

Applied to files:

  • apps/webapp/app/routes/api.v1.sessions.ts
📚 Learning: 2026-04-16T14:21:09.410Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3368
File: apps/webapp/app/services/taskIdentifierCache.server.ts:33-39
Timestamp: 2026-04-16T14:21:09.410Z
Learning: In `apps/webapp/app/services/taskIdentifierCache.server.ts`, the `decode()` function intentionally uses a plain `JSON.parse` cast instead of Zod validation. The Redis cache is exclusively written by the internal `populateTaskIdentifierCache` function via the symmetric `encode()` helper — there is no external input path. Any shape mismatch would be a serialization bug to surface explicitly, not untrusted data to filter out. Do not suggest adding Zod validation to the `decode()` function or the `getTaskIdentifiersFromCache` return path in future reviews.

Applied to files:

  • apps/webapp/app/routes/api.v1.sessions.ts
📚 Learning: 2026-03-10T17:56:26.581Z
Learnt from: samejr
Repo: triggerdotdev/trigger.dev PR: 3201
File: apps/webapp/app/v3/services/setSeatsAddOn.server.ts:25-29
Timestamp: 2026-03-10T17:56:26.581Z
Learning: In the `triggerdotdev/trigger.dev` webapp, service classes such as `SetSeatsAddOnService` and `SetBranchesAddOnService` do NOT need to perform their own userId-to-organizationId authorization checks. Auth is enforced at the route layer: `requireUserId(request)` authenticates the user, and the `_app.orgs.$organizationSlug` layout route enforces that the authenticated user is a member of the org. Any `userId` and `organizationId` reaching these services from org-scoped routes are already validated. This is the consistent pattern used across all org-scoped services in the codebase.

Applied to files:

  • apps/webapp/app/routes/api.v1.sessions.ts
📚 Learning: 2025-08-14T10:53:54.526Z
Learnt from: myftija
Repo: triggerdotdev/trigger.dev PR: 2391
File: apps/webapp/app/services/organizationAccessToken.server.ts:50-0
Timestamp: 2025-08-14T10:53:54.526Z
Learning: In the Trigger.dev codebase, token service functions (like revokePersonalAccessToken and revokeOrganizationAccessToken) don't include tenant scoping in their database queries. Instead, authorization and tenant scoping happens at a higher level in the authentication flow (typically in route handlers) before these service functions are called. This is a consistent pattern across both Personal Access Tokens (PATs) and Organization Access Tokens (OATs).

Applied to files:

  • apps/webapp/app/routes/api.v1.sessions.ts
📚 Learning: 2026-02-11T16:50:14.167Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3019
File: apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.$projectParam.env.$envParam.dashboards.$dashboardId.widgets.tsx:126-131
Timestamp: 2026-02-11T16:50:14.167Z
Learning: In apps/webapp/app/routes/resources.orgs.$organizationSlug.projects.$projectParam.env.$envParam.dashboards.$dashboardId.widgets.tsx, MetricsDashboard entities are intentionally scoped to the organization level, not the project level. The dashboard lookup should filter by organizationId only (not projectId), allowing dashboards to be accessed across projects within the same organization. The optional projectId field on MetricsDashboard serves other purposes and should not be used as an authorization constraint.

Applied to files:

  • apps/webapp/app/routes/api.v1.sessions.ts
📚 Learning: 2026-03-22T13:26:12.060Z
Learnt from: ericallam
Repo: triggerdotdev/trigger.dev PR: 3244
File: apps/webapp/app/components/code/TextEditor.tsx:81-86
Timestamp: 2026-03-22T13:26:12.060Z
Learning: In the triggerdotdev/trigger.dev codebase, do not flag `navigator.clipboard.writeText(...)` calls for `missing-await`/`unhandled-promise` issues. These clipboard writes are intentionally invoked without `await` and without `catch` handlers across the project; keep that behavior consistent when reviewing TypeScript/TSX files (e.g., usages like in `apps/webapp/app/components/code/TextEditor.tsx`).

Applied to files:

  • apps/webapp/app/routes/api.v1.sessions.ts
📚 Learning: 2026-03-22T19:24:14.403Z
Learnt from: matt-aitken
Repo: triggerdotdev/trigger.dev PR: 3187
File: apps/webapp/app/v3/services/alerts/deliverErrorGroupAlert.server.ts:200-204
Timestamp: 2026-03-22T19:24:14.403Z
Learning: In the triggerdotdev/trigger.dev codebase, webhook URLs are not expected to contain embedded credentials/secrets (e.g., fields like `ProjectAlertWebhookProperties` should only hold credential-free webhook endpoints). During code review, if you see logging or inclusion of raw webhook URLs in error messages, do not automatically treat it as a credential-leak/secrets-in-logs issue by default—first verify the URL does not contain embedded credentials (for example, no username/password in the URL, no obvious secret/token query params or fragments). If the URL is credential-free per this project’s conventions, allow the logging.

Applied to files:

  • apps/webapp/app/routes/api.v1.sessions.ts
🔇 Additional comments (3)
apps/webapp/app/routes/api.v1.sessions.ts (3)

27-86: Loader LGTM.

Cursor direction derivation (page[before] → backward, else forward), super-scope authorization tuple, and the conditional next/previous spread all line up with the repository contract in clickhouseSessionsRepository.server.ts. The projectId/environmentType/organizationId re-injection on lines 75-77 before serializeSession is a small smell: those fields aren't actually consumed by serializeSession (see services/realtime/sessions.server.ts:38-57) and only exist to satisfy the as Session cast. Not worth churning.
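
For readers following along, the cursor handling described above can be sketched roughly as follows. This is an illustration only, not the route's code: `PageParams`, `deriveCursor`, and `buildPagination` are hypothetical names, and `page[after]` as the forward cursor parameter is an assumption.

```typescript
// Hypothetical shapes; the real loader's types and names may differ.
type Direction = "forward" | "backward";

interface PageParams {
  after?: string; // assumed to come from page[after]
  before?: string; // comes from page[before]
}

// page[before] present → paginate backward; otherwise paginate forward.
function deriveCursor(page: PageParams): { cursor: string | undefined; direction: Direction } {
  if (page.before !== undefined) {
    return { cursor: page.before, direction: "backward" };
  }
  return { cursor: page.after, direction: "forward" };
}

// Only include next/previous in the response when the repository returned them.
function buildPagination(next?: string, previous?: string) {
  return {
    ...(next ? { next } : {}),
    ...(previous ? { previous } : {}),
  };
}
```

The conditional spread keeps absent cursors out of the serialized body entirely instead of emitting `next: undefined`.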


155-161: Error-handling fix looks good.

ServiceValidationError → 422 preserved; everything else is logged server-side and returns a generic 500 body. Matches the PR commit-message intent and closes the prior raw-error.message leak.
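
A minimal sketch of that error mapping, under stated assumptions: the `ServiceValidationError` class below is a local stand-in for the codebase's own class, and `toErrorResponse`, the response shape, and the generic message text are hypothetical.

```typescript
// Local stand-in for the codebase's ServiceValidationError (assumption).
class ServiceValidationError extends Error {}

interface ErrorResponse {
  status: number;
  body: { error: string };
}

// Validation errors are safe to show to the caller; anything else is logged
// server-side and replaced with a generic message so internals do not leak.
function toErrorResponse(
  error: unknown,
  log: (message: string, error: unknown) => void
): ErrorResponse {
  if (error instanceof ServiceValidationError) {
    return { status: 422, body: { error: error.message } };
  }
  log("Failed to create session", error);
  return { status: 500, body: { error: "Something went wrong" } };
}
```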


108-131: No issue here — update: {} does not trigger writes or @updatedAt updates.

With update: {} in the upsert, Prisma performs a SELECT to check existence and stops. It does not generate or execute an UPDATE statement, so @updatedAt is not refreshed and no replication event is emitted. This is the documented way to emulate findOrCreate behavior. The current code is correct and requires no changes.


Walkthrough

Adds a durable Session primitive across the stack: a Prisma Session model and migration, ClickHouse sessions_v1 table and ClickHouse client helpers, new ClickHouse-backed SessionsRepository, a SessionsReplicationService that streams Postgres logical replication into ClickHouse, session-friendly ID and API schemas in core, multiple REST and realtime routes for session CRUD and streaming/append, environment config and startup wiring for replication, session helper utilities, and end-to-end replication tests.

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120 minutes

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
| --- | --- | --- | --- |
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 20.00% which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (2 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title 'feat: Sessions - bidirectional durable agent streams' clearly and concisely summarizes the main change: introduction of a new Session primitive for bidirectional communication. |
| Description check | ✅ Passed | The PR description is comprehensive and well-structured, covering what is enabled, use cases, public API surface, implementation details, verification steps, and test plan, but the author did not complete the required checklist template sections. |



…te/update

The session_ prefix identifies internal friendlyIds. Allowing it in a
user-supplied externalId would misroute subsequent GET/PATCH/close
requests through resolveSessionByIdOrExternalId to a friendlyId lookup,
returning null or the wrong session. Reject at the schema boundary so
both routes surface a clean 422.

Without allowJWT/corsStrategy, frontend clients holding public access
tokens hit 401 on GET /api/v1/sessions and browser preflights fail.
Matches the single-session GET/PATCH/close routes and the runs list
endpoint.
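
The session_ prefix rule from the first commit message above can be illustrated with a short sketch. It assumes only what the message states: identifiers starting with session_ are routed to a friendlyId lookup, everything else to an externalId lookup. The names `classifySessionParam` and `validateExternalId` and the error text are hypothetical; the actual check lives in the API schema.

```typescript
const SESSION_PREFIX = "session_";

type Lookup =
  | { by: "friendlyId"; value: string }
  | { by: "externalId"; value: string };

// How a :session route param would be disambiguated by prefix.
function classifySessionParam(param: string): Lookup {
  return param.startsWith(SESSION_PREFIX)
    ? { by: "friendlyId", value: param }
    : { by: "externalId", value: param };
}

// Reject externalIds that would later be misrouted to the friendlyId lookup.
function validateExternalId(
  externalId: string
): { ok: true } | { ok: false; status: 422; error: string } {
  if (externalId.startsWith(SESSION_PREFIX)) {
    return {
      ok: false,
      status: 422,
      error: `externalId must not start with "${SESSION_PREFIX}"`,
    };
  }
  return { ok: true };
}
```

Without the validation step, an externalId such as `session_mine` would be stored successfully but then classified as a friendlyId on every later GET/PATCH/close.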

- Derive isCached from the upsert result (id mismatch = pre-existing row)
  instead of doing a separate findFirst first. The pre-check was racy —
  two concurrent first-time POSTs could both return 201 with
  isCached: false. Using the returned row's id is atomic and saves a
  round-trip.

- Scope the list endpoint's authorization to the standard action/resource
  pattern (matches api.v1.runs.ts): task-scoped JWTs can list sessions
  filtered by their task, and broader super-scopes (read:sessions,
  read:all, admin) authorize unfiltered listing.

- Log unexpected errors on POST server-side and return a generic 500
  body rather than the raw error.message. Prisma/internal messages can
  leak column names and query fragments.
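
The isCached derivation in the first bullet can be sketched like this. The in-memory store is a hypothetical stand-in for the database upsert so the example runs on its own; it does not demonstrate the concurrency guarantee, which in the real route would depend on the database's unique constraint. `SessionRow`, `InMemorySessions`, and `createSession` are illustrative names.

```typescript
interface SessionRow {
  id: string;
  externalId: string;
}

// Stand-in for an upsert keyed on externalId (assumption, not the real client).
class InMemorySessions {
  private rows = new Map<string, SessionRow>();

  upsert(args: {
    where: { externalId: string };
    create: SessionRow;
    update: Partial<SessionRow>;
  }): SessionRow {
    const existing = this.rows.get(args.where.externalId);
    if (existing) {
      Object.assign(existing, args.update); // update: {} leaves the row untouched
      return existing;
    }
    this.rows.set(args.where.externalId, args.create);
    return args.create;
  }
}

// Generate the candidate id up front, then compare it with the returned row:
// a different id means the row already existed, so the response is "cached".
function createSession(store: InMemorySessions, externalId: string, candidateId: string) {
  const row = store.upsert({
    where: { externalId },
    create: { id: candidateId, externalId },
    update: {},
  });
  return { session: row, isCached: row.id !== candidateId };
}
```

Because the comparison uses the row the upsert returned, there is no separate read whose answer could be stale by the time the write happens.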
@devin-ai-integration devin-ai-integration bot left a comment

Devin Review found 2 new potential issues.

View 9 additional findings in Devin Review.

Comment thread apps/webapp/app/services/sessionsReplicationService.server.ts
Comment thread apps/webapp/app/routes/api.v1.sessions.ts