Retries

Failed deliveries are retried up to three times, then dropped. There is no dead-letter view, no auto-pause, no per-subscription failure counter, and no replay button.

What counts as a failure

Any of the following fails a delivery:

  • A non-2xx HTTP status code from the receiver.
  • A connection error: DNS failure, TCP reset, TLS handshake failure, read timeout, connect timeout.

4xx and 5xx are treated identically. Retry-After is not honoured. A 400 from a misconfigured receiver triggers the same retry chain as a 503 from a temporarily overloaded one.
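The rule above can be written as a small predicate. This is a minimal sketch for illustration, not the sender's actual code; the function name `is_failed_delivery` and its parameters are assumptions.

```python
from typing import Optional


def is_failed_delivery(status_code: Optional[int], connection_error: bool = False) -> bool:
    """Return True if a delivery would enter the retry chain.

    Hypothetical helper: mirrors the documented rule, not the real implementation.
    """
    # DNS failure, TCP reset, TLS failure, and timeouts never produce a status code.
    if connection_error or status_code is None:
        return True
    # Anything outside 200-299 fails; 4xx and 5xx are not distinguished,
    # and no response header (such as Retry-After) is consulted.
    return not (200 <= status_code <= 299)
```

Under this sketch, `is_failed_delivery(400)` and `is_failed_delivery(503)` both return `True`, which is why a misconfigured receiver and an overloaded one see the same retry chain.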

Retry schedule

A failed delivery is rescheduled with a fixed backoff:

Attempt        Delay before this attempt
1 (initial)    (none)
2              5 minutes
3              10 minutes
4              20 minutes

After the fourth attempt fails, the delivery is dropped without any notification; the only trace is an entry in the server error log. Total elapsed time before the drop is roughly 35 minutes (5 + 10 + 20). If your receiver is down longer than that, those events are gone. Design your receiver to tolerate gaps and re-fetch through the REST API to fill them in.
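To make the arithmetic concrete, the sketch below computes when each attempt would fire for a delivery first tried at a given time. It assumes each delay is measured from the previous attempt and ignores the time spent waiting on request timeouts; `RETRY_DELAYS_MINUTES` and `attempt_times` are illustrative names, not part of the product.

```python
from datetime import datetime, timedelta

# Delays before attempts 2, 3, and 4, taken from the schedule above.
RETRY_DELAYS_MINUTES = [5, 10, 20]


def attempt_times(first_attempt: datetime) -> list:
    """Return the scheduled time of all four attempts, starting with the initial one."""
    times = [first_attempt]
    for delay in RETRY_DELAYS_MINUTES:
        # Assumption: each delay counts from the previous attempt.
        times.append(times[-1] + timedelta(minutes=delay))
    return times


if __name__ == "__main__":
    for number, when in enumerate(attempt_times(datetime(2024, 1, 1, 12, 0)), start=1):
        print(number, when.strftime("%H:%M"))
```

For a first attempt at 12:00, the last attempt lands at 12:35, so an outage that starts just before 12:00 and lasts past 12:35 loses the event.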

Failure observability

Each delivery exception is logged to the server error log. To see them:

  • AdminCP → Logs → Server error log. Filter for messages starting with "Webhook to <url> failed:".
  • The job runner output also shows retried jobs while they are pending in the queue.

There is no per-webhook delivery list, no most-recent-status field, and no UI for replaying a failed delivery. If you need that level of observability, log every received webhook in your receiver.
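One way to get a delivery list and a most-recent-status view on the receiver side is to store every incoming body before doing anything else. The following is a minimal sketch using SQLite from the Python standard library; the table layout and the function names are assumptions chosen for illustration.

```python
import sqlite3
import time


def open_delivery_log(path: str = ":memory:") -> sqlite3.Connection:
    """Open (or create) a receiver-side log of every webhook received."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS deliveries ("
        " id INTEGER PRIMARY KEY AUTOINCREMENT,"
        " received_at REAL NOT NULL,"
        " body TEXT NOT NULL)"
    )
    return conn


def record_delivery(conn: sqlite3.Connection, raw_body: str, received_at=None) -> int:
    """Store the raw body as received and return its row id."""
    ts = time.time() if received_at is None else received_at
    cur = conn.execute(
        "INSERT INTO deliveries (received_at, body) VALUES (?, ?)", (ts, raw_body)
    )
    conn.commit()
    return cur.lastrowid


def latest_delivery(conn: sqlite3.Connection):
    """Return (received_at, body) for the most recent delivery, or None."""
    return conn.execute(
        "SELECT received_at, body FROM deliveries ORDER BY id DESC LIMIT 1"
    ).fetchone()
```

Storing the raw body rather than a parsed form keeps the log useful even if a later release changes the payload shape.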

What does NOT happen

  • Subscriptions are not auto-paused after consecutive failures. A permanently broken receiver will keep generating retry traffic until you flip the Active toggle in the admin UI.
  • Failed deliveries are not queued in a dead-letter view. Once the third retry (the fourth attempt overall) fails, the body is gone.
  • Replay is not available. There is no "redeliver this" button. If you need to recover missed events, query the REST API at the receiver and reconcile against your local state.
  • Retry-After is not honoured. A 429 with Retry-After: 60 is retried at the fixed schedule, not after 60 seconds.

If you need any of these features, treat them as receiver-side concerns. Persist every delivery you receive, ack quickly, and reconcile against the REST API on a schedule.

Practical advice for receivers

  • Ack fast. The sender's HTTP client waits only a few seconds before counting the request as failed. Do only the minimum in the request handler: validate the secret, write the body to a queue, and return 200. Process the body asynchronously.
  • Tolerate duplicates. Retries deliver the identical body. If a 200 went out but the sender never saw it, it will retry. Make your handler idempotent on the (content_type, content_id, event) tuple.
  • Tolerate gaps. Three retries over 35 minutes is the entire safety net. For data you cannot lose, treat webhooks as a near-real-time fast path and reconcile against the REST API for completeness.
  • Pin to specific events. A "*" filter receives every event, including ones added in future releases. If you only act on mc_dm_version.publish, list it explicitly.
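The first two points and the last one can be combined in a handler like the sketch below. It is framework-independent and makes several assumptions: the payload is JSON with top-level `content_type`, `content_id`, and `event` fields (the source names the tuple but not the payload shape), the `Receiver` class is hypothetical, and secret validation is left out because its mechanism is not described here. A production receiver would keep the seen-keys set in durable storage rather than in memory.

```python
import json
import queue


class Receiver:
    """Hypothetical receiver: ack quickly, drop duplicates, defer real work."""

    def __init__(self, allowed_events=("mc_dm_version.publish",)):
        self.allowed_events = frozenset(allowed_events)
        self.seen_keys = set()           # in-memory only; use durable storage in practice
        self.work_queue = queue.Queue()  # drained by a separate worker

    def handle(self, raw_body: bytes) -> int:
        """Return the HTTP status to send back. Never does slow work inline."""
        payload = json.loads(raw_body)
        # Assumed field names; adjust to the real payload.
        key = (payload["content_type"], payload["content_id"], payload["event"])
        if payload["event"] not in self.allowed_events:
            return 200  # ack events we do not act on so they are not retried
        if key in self.seen_keys:
            return 200  # duplicate from a retry: ack again, do nothing
        self.seen_keys.add(key)
        self.work_queue.put(payload)  # processed asynchronously
        return 200
```

Returning 200 for duplicates and for ignored events matters: any non-2xx response would start another retry chain for a body the receiver has already decided not to process.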

Operating playbook

When a receiver goes down:

  1. Disable the webhook in AdminCP → Setup → Webhooks so retries stop adding to the load.
  2. Fix the receiver.
  3. Re-enable the webhook to resume new events.
  4. Reconcile missed events through the REST API for the duration the receiver was down.

Reconciliation is a receiver responsibility.
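Step 4 comes down to a set difference between what the REST API reports and what the receiver already holds. The sketch below shows only that comparison; how the items are fetched is deliberately omitted because the REST API's endpoints are not described here, and the `content_type` and `content_id` field names are assumptions.

```python
def find_missed(remote_items, local_keys):
    """Return the remote items the receiver has no record of.

    remote_items: iterable of dicts fetched from the REST API for the outage window
                  (hypothetical shape with content_type and content_id fields).
    local_keys:   set of (content_type, content_id) tuples already stored locally.
    """
    missed = []
    for item in remote_items:
        key = (item["content_type"], item["content_id"])
        if key not in local_keys:
            missed.append(item)
    return missed


if __name__ == "__main__":
    remote = [
        {"content_type": "mc_dm_version", "content_id": 1},
        {"content_type": "mc_dm_version", "content_id": 2},
    ]
    local = {("mc_dm_version", 1)}
    for item in find_missed(remote, local):
        print("backfill", item["content_type"], item["content_id"])
```

Running the same comparison on a schedule, not only after a known outage, also catches events that were dropped without anyone noticing.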