Retries & Error Handling

Defaults

Every schedule and job has two retry-related fields:

Field	Default	Description
`max_retries`	`3`	Number of retry attempts after the first failure. `0` disables retries.
`timeout`	`30`	Seconds before an execution is considered timed out.

Set these per schedule or per job at creation time, or update them on an existing schedule via PATCH.

What triggers a retry

Event	Retries?	Why
Push: endpoint returns 5xx	Yes	Server error, likely transient
Push: endpoint times out	Yes	May be temporary overload
Push: network error (DNS, connection refused)	Yes	Infrastructure issue, likely transient
Push: endpoint returns 4xx	No	Client error. Won’t fix itself on retry
Pull: handler throws an error	Yes	Reported as failure, retry scheduled
Pull: lease expires (no result reported)	Yes	Worker may have crashed
Retries exhausted	No	Job transitions to `failed` permanently
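
The rules in the table above can be condensed into a single predicate. The following TypeScript is a minimal sketch for reasoning about the rules, not Chronos source code; the `FailureEvent` type, the `shouldRetry` function, and its parameter names are hypothetical and introduced only for illustration.

```typescript
// Hypothetical model of the failure events listed in the table above.
type FailureEvent =
  | { delivery: "push"; kind: "http"; status: number }
  | { delivery: "push"; kind: "timeout" | "network" }
  | { delivery: "pull"; kind: "handler_error" | "lease_expired" };

// Returns true when, according to the table, a retry would be scheduled.
// `retriesUsed` and `maxRetries` are illustrative parameter names.
function shouldRetry(
  event: FailureEvent,
  retriesUsed: number,
  maxRetries: number
): boolean {
  // Retries exhausted: the job becomes `failed` whatever the cause.
  if (retriesUsed >= maxRetries) return false;

  if (event.delivery === "push" && event.kind === "http") {
    // 5xx is treated as transient; 4xx will not fix itself on retry.
    return event.status >= 500 && event.status <= 599;
  }

  // Push timeouts, network errors, handler errors, and expired leases retry.
  return true;
}
```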

Backoff formula

Retries use exponential backoff with jitter, capped at 1 hour:

baseDelay = min(1000ms × 2^(attempt - 1), 3,600,000ms)
jitter    = random(0, baseDelay)
delay     = min(baseDelay + jitter, 3,600,000ms)

Attempt	Base delay	Actual range
1	1s	1-2s
2	2s	2-4s
3	4s	4-8s
4	8s	8-16s
5	16s	16-32s
10	~8.5 min	8.5-17 min
13+	1 hour	exactly 1 hour (capped)

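The jitter spreads retries out so that many jobs failing at the same moment do not all retry at the same instant (the thundering-herd problem). Once the cap is reached (attempt 13 onward), the delay is a fixed 1 hour and jitter no longer has any effect.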
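
The formula translates directly into code. The sketch below is an illustration written from the formula in this section, not the Chronos implementation; the function names `baseDelayMs` and `backoffDelayMs` are made up here, and the random source is injectable so the result can be checked deterministically.

```typescript
const BASE_MS = 1000;     // delay before the first retry, before jitter
const CAP_MS = 3600000;   // 1 hour cap

// Base delay before jitter: min(1000ms × 2^(attempt - 1), cap).
function baseDelayMs(attempt: number): number {
  return Math.min(BASE_MS * Math.pow(2, attempt - 1), CAP_MS);
}

// Full delay: base plus a random jitter in [0, base), capped at one hour.
// `rand` defaults to Math.random and must return a value in [0, 1).
function backoffDelayMs(
  attempt: number,
  rand: () => number = Math.random
): number {
  const base = baseDelayMs(attempt);
  const jitter = rand() * base;
  return Math.min(base + jitter, CAP_MS);
}
```

Calling `backoffDelayMs(3)` returns a value between 4,000 and 8,000 ms, which matches the row for attempt 3 in the table.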

The retry flow

Execution fails
  │
  ├── Retries remaining? ──▶ Yes: schedule new execution after backoff delay
  │                              Job status → retrying
  │                              New execution created with trigger: system_retry
  │
  └── Retries exhausted? ──▶ Job status → failed (terminal)

Each retry creates a new execution. `ctx.attempt` tells your handler which attempt it is running (1-indexed).
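
The branch in the diagram can be written as a small decision function. This is a sketch under two assumptions that this page implies but does not spell out: the first execution has `attempt` equal to 1, and `max_retries` counts only the retries after it, so a job runs at most `max_retries + 1` times. The `FailureOutcome` type and `onExecutionFailed` name are hypothetical.

```typescript
// Hypothetical result type describing what happens after a failed execution.
interface FailureOutcome {
  jobStatus: "retrying" | "failed";
  nextAttempt?: number;        // set only when a new execution is created
  trigger?: "system_retry";    // trigger recorded on the new execution
}

// `attempt` is the 1-indexed attempt that just failed.
function onExecutionFailed(attempt: number, maxRetries: number): FailureOutcome {
  const retriesUsed = attempt - 1; // assumption: attempt 1 is not a retry
  if (retriesUsed < maxRetries) {
    return {
      jobStatus: "retrying",
      nextAttempt: attempt + 1,
      trigger: "system_retry",
    };
  }
  // Retries exhausted: terminal state, no new execution.
  return { jobStatus: "failed" };
}
```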

Configuring retries

Disable retries entirely

{
  "name": "Fire-and-forget notification",
  "handler": "notify",
  "cron": "0 * * * *",
  "max_retries": 0
}

Increase retries for critical work

{
  "name": "Monthly billing",
  "handler": "charge-customer",
  "cron": "0 0 1 * *",
  "max_retries": 8,
  "timeout": 60
}

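With 8 retries, the backoff delays add up to roughly 4 to 8.5 minutes (1 + 2 + 4 + ... + 128 seconds before jitter), so the final attempt starts a few minutes after the first failure, not counting the run time of the intermediate attempts.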
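
To estimate how long a given `max_retries` value keeps a job alive, the backoff delays can be summed. The sketch below applies the backoff formula from this page and assumes the first retry uses the attempt-1 delay; `retryWindowMs` is a hypothetical helper, and it ignores the time each attempt spends running.

```typescript
// Sums the backoff delays for retries 1..maxRetries and returns the
// smallest and largest possible total wait, in milliseconds.
function retryWindowMs(maxRetries: number): { minMs: number; maxMs: number } {
  const capMs = 3600000; // 1 hour cap per delay
  let minMs = 0;
  let maxMs = 0;
  for (let retry = 1; retry <= maxRetries; retry++) {
    const base = Math.min(1000 * Math.pow(2, retry - 1), capMs);
    minMs += base;                       // jitter at its minimum (0)
    maxMs += Math.min(2 * base, capMs);  // jitter approaching its maximum
  }
  return { minMs, maxMs };
}
```

For `max_retries: 8` the sketch returns 255,000 ms and 510,000 ms, about 4.25 and 8.5 minutes.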

Per-job override

One-off jobs can use retry settings that differ from those of a schedule:

curl -X POST https://api.chronos.sh/v1/jobs \
  -H "Authorization: Bearer chrns_your_api_key" \
  -H "Content-Type: application/json" \
  -d '{
    "name": "Critical charge",
    "handler": "charge-customer",
    "max_retries": 10,
    "timeout": 120,
    "payload": { "invoiceId": "inv_123" }
  }'

Timeout behavior

Push delivery

If your endpoint doesn’t respond within `timeout` seconds, Chronos aborts the request and marks the execution as `timeout`. A retry is scheduled if attempts remain.

Pull delivery

The SDK does not enforce timeouts. Your handler runs as long as it needs. However, Chronos tracks a lease on the server:

When a job is claimed, a lease is set: `lease_expires_at = now + timeout`
A background sweep checks for expired leases every 30 seconds
If the lease expired without a result, the execution is marked `timeout`
A retry is scheduled if attempts remain

If your handler routinely takes longer than the configured `timeout` (30 seconds by default), increase `timeout` so the lease doesn’t expire while you’re still working.
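
The lease arithmetic can be sketched as follows. This is an illustration of the timing described above, not server code: the function names are hypothetical, and treating a lease as expired at exactly `lease_expires_at` (rather than strictly after it) is an assumption.

```typescript
const SWEEP_INTERVAL_MS = 30000; // sweep cadence stated in this section

// lease_expires_at = now + timeout, with timeout given in seconds.
function leaseExpiresAt(claimedAtMs: number, timeoutSeconds: number): number {
  return claimedAtMs + timeoutSeconds * 1000;
}

// What a sweep would conclude at time `nowMs` for one claimed execution.
function isLeaseExpired(
  nowMs: number,
  expiresAtMs: number,
  resultReported: boolean
): boolean {
  return !resultReported && nowMs >= expiresAtMs;
}

// Worst case: a sweep ran just before expiry, so detection of the expired
// lease waits for the next sweep, one full interval later.
function latestDetectionMs(claimedAtMs: number, timeoutSeconds: number): number {
  return leaseExpiresAt(claimedAtMs, timeoutSeconds) + SWEEP_INTERVAL_MS;
}
```

Under these assumptions, a crashed worker on the default 30-second timeout is noticed between 30 and 60 seconds after the job was claimed, and the backoff delay is added on top of that.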

Handling errors in SDK handlers

Your handler’s behavior determines the execution outcome:

chronos.worker.handle('process-payment', async (ctx) => {
  const result = await chargeCustomer(ctx.payload.customerId);

  // Throw to mark the execution as failed and trigger a retry (if attempts remain)
  if (!result.success) {
    throw new Error(`Charge failed: ${result.error}`);
  }

  // Return to mark the execution as completed
  return { chargeId: result.id };
});

Return a value (or void) → execution `completed`
Throw an error → execution `failed`, error message captured (truncated to 4 KB), retry scheduled if attempts remain

Distinguish retryable vs terminal failures

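If you know a failure is permanent (bad data, invalid state), retrying only wastes attempts. Chronos retries every handler failure until retries are exhausted, so the handler itself has to make the distinction: return normally for terminal cases and throw only for failures worth retrying. A skipped job is then recorded as completed, so include the reason in the return value: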

chronos.worker.handle('send-email', async (ctx) => {
  const user = await db.users.findById(ctx.payload.userId);

  // Terminal: user deleted — no point retrying
  if (!user) {
    console.warn(`User ${ctx.payload.userId} not found, skipping`);
    return { skipped: true, reason: 'user_not_found' };
  }

  // This might throw on transient network issues → retry is appropriate
  await emailService.send(user.email, ctx.payload.template);
  return { sent: true };
});

SDK error classes

When building around the SDK, these error types help you handle different failure modes:

Error	When	Your action
`ChronosConfigError`	Invalid options at construction	Fix your config. Thrown at startup
`ChronosApiError`	API returned an error (non-2xx or `success: false`)	Check `.status` and `.code`
`ChronosRateLimitError`	API returned 429 (subclass of `ChronosApiError`)	Check `.retryAfterSeconds`. Worker handles this automatically
`ChronosNetworkError`	Fetch failed (DNS, TCP, TLS)	Transient. SDK retries poll automatically
`ChronosHandlerError`	Your handler threw	Logged internally, failure reported to API

The SDK handles poll-loop errors internally: it logs the error and retries after `retryDelayMs`. On 429 responses, the worker honors the `Retry-After` header instead of `retryDelayMs`. You don’t need to catch these errors yourself; the worker keeps running.
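
When you call the API directly rather than through the worker, the class hierarchy matters for the order of your checks. The sketch below is self-contained, so it declares local stand-ins that mirror the class names and properties from the table; in real code these classes would come from the SDK. The constructor signatures, the `"rate_limited"` code value, and the `describeError` helper are assumptions made for illustration.

```typescript
// Local stand-ins mirroring the documented names; shapes are assumed.
class ChronosApiError extends Error {
  status: number;
  code: string;
  constructor(message: string, status: number, code: string) {
    super(message);
    this.status = status;
    this.code = code;
  }
}

class ChronosRateLimitError extends ChronosApiError {
  retryAfterSeconds: number;
  constructor(message: string, retryAfterSeconds: number) {
    super(message, 429, "rate_limited"); // placeholder code value
    this.retryAfterSeconds = retryAfterSeconds;
  }
}

class ChronosNetworkError extends Error {}

// Maps an error to a short description of what to do next.
function describeError(err: unknown): string {
  // Check the subclass first: a rate-limit error is also a ChronosApiError.
  if (err instanceof ChronosRateLimitError) {
    return `rate limited, retry in ${err.retryAfterSeconds}s`;
  }
  if (err instanceof ChronosApiError) {
    return `API error ${err.status} (${err.code})`;
  }
  if (err instanceof ChronosNetworkError) {
    return "network error, likely transient";
  }
  return "unexpected error";
}
```

Testing for `ChronosApiError` before `ChronosRateLimitError` would hide the rate-limit branch, because every rate-limit error also matches the parent class.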

Monitoring failures

Use the executions list endpoint to find failed jobs:

curl "https://api.chronos.sh/v1/executions?status=failed" \
  -H "Authorization: Bearer chrns_your_api_key"

Each failed execution includes:

`error`: the error message (from your handler’s thrown error or the HTTP response)
`response_code`: HTTP status code (push delivery only)
`duration_ms`: how long the execution ran before failing
`trigger`: `system` (first attempt) or `system_retry` (retry)
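
Once you have fetched failed executions, the fields above are enough for a quick summary, for example to see whether failures happen on the first attempt or only on retries. The sketch below works on a plain array because the response envelope of the list endpoint is not described in this section; the `FailedExecution` interface covers only the four documented fields, and `summarizeFailures` is a hypothetical helper.

```typescript
// Only the fields documented above; any other fields are omitted.
interface FailedExecution {
  error: string;
  response_code?: number; // push delivery only
  duration_ms: number;
  trigger: "system" | "system_retry";
}

interface FailureSummary {
  firstAttempts: number;      // failures with trigger `system`
  retries: number;            // failures with trigger `system_retry`
  averageDurationMs: number;  // 0 when the list is empty
}

function summarizeFailures(executions: FailedExecution[]): FailureSummary {
  let firstAttempts = 0;
  let retries = 0;
  let totalMs = 0;
  for (const execution of executions) {
    if (execution.trigger === "system") {
      firstAttempts += 1;
    } else {
      retries += 1;
    }
    totalMs += execution.duration_ms;
  }
  const averageDurationMs =
    executions.length === 0 ? 0 : totalMs / executions.length;
  return { firstAttempts, retries, averageDurationMs };
}
```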