Architecture

Component interactions, data locations, and request flows.

Service topology

Slack ───────────────────┐
GitHub Issues ──────┐    │
Jira / Shortcut ─┐  │    │
Custom webhooks ─┼──┼────┤
                 ▼  ▼    ▼
            ┌────────────────────┐         ┌────────────────────┐
            │  Platform Backend  │ ◀──────▶│ Platform Frontend  │
            │  (Express, :8888)  │  REST   │  (React, Amplify)  │
            └─────┬──────┬───────┘         └────────────────────┘
                  │      │
                  │      └──────────────┐
                  │                     │
            invoke│                     │ persist
                  ▼                     ▼
        ┌─────────────────────┐    ┌──────────────┐
        │ Worker Execution    │    │  PostgreSQL  │
        │ Service             │    │     (RDS)    │
        │ ─ ECS / Lambda /    │    └──────────────┘
        │   Docker invokers   │
        └─────────┬───────────┘
                  │
                  ▼
        ┌─────────────────────┐
        │  Viberator Worker   │
        │  (agent harness +   │
        │   git + SCM API)    │
        └─────────┬───────────┘
                  │
                  ▼
            GitHub / GitLab

The frontend never talks directly to workers. Workers never talk directly to the frontend. Everything goes through the backend, which is the single source of truth.

Technology stack

Backend (apps/platform-backend)

  • Node.js 24+, TypeScript 5.3.
  • Express 4 with Helmet, express-rate-limit, Multer, Joi, Passport (passport-local + a custom strategy).
  • Kysely 0.27 for type-safe Postgres queries; migrations in src/migrations/.
  • AWS SDK v3 clients for ECS, Lambda, S3, SSM, and CloudWatch.
  • @chat-adapter/slack 4.22 with @chat-adapter/state-pg 4.22 for Slack chat state.
  • Winston with winston-daily-rotate-file for logging.
  • node-cron for the in-process scheduler that drives claws (for deployments that don't use an external scheduler).
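Rate limiting in the stack above boils down to a fixed-window counter per client key. A dependency-free sketch of that logic (class name, defaults, and method shape are illustrative, not express-rate-limit's actual API):

```typescript
// Fixed-window rate limiter, roughly the per-key logic express-rate-limit
// applies in front of the Express routes. Names here are illustrative.
class FixedWindowLimiter {
  private hits = new Map<string, { count: number; windowStart: number }>();

  constructor(
    private readonly windowMs: number,
    private readonly max: number,
    private readonly now: () => number = Date.now, // injectable for testing
  ) {}

  /** Returns true if the request identified by `key` is still within quota. */
  allow(key: string): boolean {
    const t = this.now();
    const entry = this.hits.get(key);
    if (!entry || t - entry.windowStart >= this.windowMs) {
      // Window expired (or first request): start a fresh window.
      this.hits.set(key, { count: 1, windowStart: t });
      return true;
    }
    entry.count += 1;
    return entry.count <= this.max;
  }
}
```

In the real middleware the key is typically the client IP and a 429 is returned when `allow` would be false.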

Frontend (apps/platform-frontend)

  • Vite 6, TypeScript 5.8, React 19, React Router 7.
  • Tailwind CSS 4, Radix UI Themes and Dialog primitives, motion for animation, sonner for toasts.
  • @heroicons/react and @radix-ui/react-icons for iconography.
  • Jest + @testing-library/react for tests.

Worker (apps/viberator)

  • Node.js 24+, TypeScript 5.0, built with tsup.
  • Agent SDK: @anthropic-ai/claude-code 2.x for Claude Code; the other agents (Qwen, Gemini, Codex, OpenCode, Mistral, Kimi) are CLIs installed globally inside the Docker images.
  • simple-git 3.x for clone/commit/push.
  • AWS SDK clients for S3 and SSM (credential resolution).
  • Winston for structured logs.
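The worker streams events back to the backend via a callback token (see the request flows below). A sketch of what assembling one such callback might look like; the endpoint path, header, and event fields are assumptions, not the actual wire format:

```typescript
// Hypothetical event shape; the real schema lives in the platform backend.
interface WorkerEvent {
  type: "log" | "phase_document" | "approval_request";
  payload: unknown;
}

// Build the HTTP request the worker would send for one event. The callback
// token authenticates the worker back to the backend for this job only.
function buildEventCallback(
  backendUrl: string,
  callbackToken: string,
  jobId: string,
  event: WorkerEvent,
): { url: string; headers: Record<string, string>; body: string } {
  return {
    url: `${backendUrl}/api/jobs/${jobId}/events`, // assumed route
    headers: {
      "content-type": "application/json",
      authorization: `Bearer ${callbackToken}`,
    },
    body: JSON.stringify({ ...event, emittedAt: new Date().toISOString() }),
  };
}
```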

Slack package (packages/chat-slack)

Vercel Chat SDK 4.22 wrapped with project-specific handlers for:

  • slashCommand — /viberator modal entry point.
  • modalSubmit — ticket creation + agent session launch.
  • threadReply — forwards user replies to the agent.
  • threadMention — handles @viberator callouts.
  • approvalAction — handles Approve / Reject button presses on phase approval cards.
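The five handlers amount to a dispatch table keyed on interaction kind. A minimal sketch, using the handler names from the list above; the interaction shapes and return values are invented for illustration:

```typescript
// Hypothetical interaction union; real payloads come from the Chat SDK.
type Interaction =
  | { kind: "slashCommand"; command: string }
  | { kind: "modalSubmit"; values: Record<string, string> }
  | { kind: "threadReply"; text: string }
  | { kind: "threadMention"; text: string }
  | { kind: "approvalAction"; action: "approve" | "reject" };

type Handler = (i: Interaction) => string;

// Each real handler calls into the backend; here they just name the outcome.
const handlers: Record<Interaction["kind"], Handler> = {
  slashCommand: () => "open-modal",
  modalSubmit: () => "create-ticket+launch-session",
  threadReply: () => "forward-to-agent",
  threadMention: () => "forward-to-agent",
  approvalAction: (i) =>
    i.kind === "approvalAction" && i.action === "approve"
      ? "approve-phase"
      : "reject-phase",
};

function dispatch(i: Interaction): string {
  return handlers[i.kind](i);
}
```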

Infrastructure (infra/)

  • Pulumi (TypeScript) with state stored in S3.
  • Three stacks (base, platform, workers) deployed in order.
  • AWS services: VPC, KMS, CloudWatch Logs, ECS Fargate, RDS PostgreSQL multi-AZ, S3, Amplify, ALB, Lambda, ECR, SSM Parameter Store, IAM (with OIDC for GitHub Actions).

Request flow: filing a ticket from the web UI

  1. The frontend or an integration POSTs to /api/tickets with title, description, severity, category, starting phase, and optional media.
  2. The backend validates the body, persists the ticket, and (if auto-fix is on or starting phase ≠ research) immediately calls TicketExecutionService.runTicket().
  3. TicketExecutionService resolves the project's clanker, builds a bootstrap payload (repo URL, branch, credentials, prompt template), creates a jobs row, and hands off to WorkerExecutionService.invokeWorker().
  4. WorkerExecutionService looks up the clanker's deployment strategy and routes to the appropriate invoker (LambdaInvoker, EcsInvoker, or DockerInvoker).
  5. The worker boots, fetches credentials from SSM, runs the agent harness, streams events back to the backend via a callback token, and on success writes a phase document.
  6. The frontend polls (or subscribes to) the job and ticket records, updating the UI as new events arrive.
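Step 3's bootstrap payload can be sketched as a plain object. Field names below mirror the list in step 3, but the exact types and the branch-naming convention are assumptions:

```typescript
// Hypothetical shapes; the real types live in the backend.
interface Project {
  repoUrl: string;
  defaultBranch: string;
  credentialsParam: string; // SSM parameter name, resolved by the worker
}

interface Ticket {
  id: string;
  phase: "research" | "plan" | "implement"; // assumed phase names
}

interface BootstrapPayload {
  repoUrl: string;
  branch: string;
  credentialsParam: string;
  promptTemplate: string;
}

function buildBootstrapPayload(project: Project, ticket: Ticket): BootstrapPayload {
  return {
    repoUrl: project.repoUrl,
    // One working branch per ticket keeps agent pushes isolated.
    branch: `viberator/${ticket.id}`,
    credentialsParam: project.credentialsParam,
    promptTemplate: `phase-${ticket.phase}`,
  };
}
```

Note that credentials themselves never ride in the payload; only the SSM parameter name does, and the worker resolves it at boot (step 5).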

Request flow: filing a ticket from Slack

  1. User runs /viberator in a Slack channel.
  2. Slack POSTs to the platform's /api/slack/commands route. The chat handler returns a modal definition.
  3. User submits the modal. Slack POSTs to /api/slack/interactions. The handler calls createTicket, then launchSession, then posts the thread root and starts a ChatSessionBridgeService.
  4. The bridge polls agent_session_events every ~2 s and posts new events to the Slack thread. Approval requests become button cards. Thread replies go back through replyToSession.
  5. Approve / Reject button clicks become approvalAction calls that update ticket_phase_documents.approval_state and ticket_phase_approvals, and (on approve) launch the next phase.

Local vs production

Locally, everything runs under docker-compose: PostgreSQL, the backend, the frontend, and a Docker-based worker. In production, the same backend runs on ECS Fargate, the same worker code runs on Lambda or ECS Fargate (or both, depending on which clankers you provision), and the same frontend code is served from Amplify.

The boundary between local and production is the deployment strategy on each clanker. Switch a clanker from docker to aws_lambda_container and it will start running on Lambda the next time it is invoked, with no application changes.
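That strategy-to-invoker routing (also step 4 of the web flow above) is essentially a switch resolved at invocation time. The strategy names docker and aws_lambda_container and the three invoker names come from this document; the aws_ecs value and the function shape are assumptions:

```typescript
type DeploymentStrategy = "docker" | "aws_lambda_container" | "aws_ecs";

// Resolved fresh on every invocation, so flipping the strategy stored on a
// clanker reroutes its next run with no application changes.
function invokerFor(strategy: DeploymentStrategy): string {
  switch (strategy) {
    case "docker":
      return "DockerInvoker";
    case "aws_lambda_container":
      return "LambdaInvoker";
    case "aws_ecs":
      return "EcsInvoker";
  }
}
```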