Technical · 2026-04-28 · 6 min read

WebSocket Streaming API Deep Dive

By Edward Monzon


Why streaming matters

When a user sends a message to an AI agent, they don't want to wait 5 seconds for a complete response to appear all at once. They want to see tokens arrive in real time — word by word, like watching someone type.

Streaming changes the experience from "is it thinking?" to a responsive conversation. Because the first tokens appear as soon as they are generated, instead of after the whole response is complete, perceived latency drops by 80% or more and users stay engaged.

ClawDeploy supports two streaming protocols: WebSocket and Server-Sent Events (SSE).


Choosing a protocol

| Feature | WebSocket | SSE |
|---|---|---|
| Direction | Bidirectional | Server to client only |
| Browser support | Universal | All modern browsers (not Internet Explorer) |
| Connection | Persistent, full-duplex | Persistent, one-way (server push) |
| Reconnection | Manual (you implement it) | Automatic with EventSource (GET requests only) |
| Best for | Chat UIs with typing indicators | Simple integrations, read-only feeds |
| Proxy/CDN support | Varies (proxy must pass the Upgrade header) | Excellent (plain HTTP) |

Recommendation: Use WebSocket for interactive chat UIs. Use SSE for server-side integrations and simpler clients.


WebSocket API

Connecting

const ws = new WebSocket(
  'wss://my-agent.clawdeploy.cuemby.io/api/ws'
);

// Authenticate on open
ws.addEventListener('open', () => {
  ws.send(JSON.stringify({
    type: 'auth',
    token: 'YOUR_API_KEY'
  }));
});
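If your client may try to send a message before the server has confirmed authentication, one option is to queue outgoing frames until auth_ok arrives. The sketch below assumes the server expects the auth frame before any message frames; createAuthGate is a hypothetical helper, and the transport function is injected so the logic can be tested without a live socket.

```javascript
// Hypothetical helper: holds outgoing frames until the server sends auth_ok.
// `send` is any function that transmits a string, e.g. (t) => ws.send(t).
function createAuthGate(send) {
  let authenticated = false;
  const pending = []; // serialized frames queued before auth_ok arrived

  return {
    // Send immediately once authenticated, otherwise queue for later.
    send(frame) {
      const text = JSON.stringify(frame);
      if (authenticated) {
        send(text);
      } else {
        pending.push(text);
      }
    },
    // Call with every parsed incoming message.
    handleIncoming(data) {
      if (data.type === 'auth_ok' && !authenticated) {
        authenticated = true;
        // Flush queued frames in their original order.
        while (pending.length > 0) {
          send(pending.shift());
        }
      }
    },
    get isAuthenticated() {
      return authenticated;
    },
  };
}
```

Send the auth frame itself directly with ws.send, as in the example above, not through the gate; otherwise it would wait for an auth_ok that can never arrive.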

Sending messages

ws.send(JSON.stringify({
  type: 'message',
  content: 'Summarize the Q3 report',
  conversationId: 'conv_abc123' // optional
}));
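Since message_complete events carry a conversationId, a client can remember it and attach it to follow-up messages so they land in the same conversation. This is a sketch under the assumption that omitting conversationId starts a new conversation and the server then returns its id; createConversation is a hypothetical name.

```javascript
// Hypothetical helper: remembers the conversationId returned in
// message_complete and attaches it to later messages.
// Assumption: the first message (without an id) starts a new conversation.
function createConversation(send) {
  let conversationId = null;

  return {
    // Send a user message, continuing the current conversation if known.
    ask(content) {
      const frame = { type: 'message', content };
      if (conversationId) {
        frame.conversationId = conversationId;
      }
      send(JSON.stringify(frame));
    },
    // Call with every parsed incoming message.
    handleIncoming(data) {
      if (data.type === 'message_complete' && data.conversationId) {
        conversationId = data.conversationId;
      }
    },
    get id() {
      return conversationId;
    },
  };
}
```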

Receiving streaming responses

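ws.addEventListener('message', (event) => {
  const data = JSON.parse(event.data);

  switch (data.type) {
    case 'auth_ok':
      // Authentication accepted
      console.log('Connected to', data.agentName);
      break;

    case 'token':
      // Individual token: append to the output as it arrives
      // (stdout in Node; in a browser, append to a DOM element instead)
      process.stdout.write(data.content);
      break;

    case 'message_complete':
      // Full response assembled
      console.log('Full response:', data.content);
      console.log('Usage:', data.usage);
      break;

    case 'tool_call':
      // Agent is calling an MCP tool
      console.log('Calling:', data.tool, data.args);
      break;

    case 'tool_result':
      // Result returned by the MCP tool
      console.log('Tool result:', data.tool, data.result);
      break;

    case 'error':
      console.error('Error:', data.code, data.message);
      break;
  }
});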
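Because each token event carries an index, a client can place tokens by position instead of blindly appending, and then compare the streamed text with the final content in message_complete. The sketch below assumes index is a 0-based position that increases by one per token and that the final content equals the concatenated tokens; TokenAssembler is a hypothetical class.

```javascript
// Hypothetical helper: rebuilds the response text from token events.
// Assumption: `index` is a 0-based position, one step per token.
class TokenAssembler {
  constructor() {
    this.tokens = [];
  }

  // Store a token at its position; tolerates out-of-order delivery.
  addToken(event) {
    this.tokens[event.index] = event.content;
  }

  // Text assembled so far; missing positions are skipped.
  get text() {
    return this.tokens.filter((t) => t !== undefined).join('');
  }

  // Check the streamed text against the final content, then reset
  // for the next response. Returns true when they match.
  complete(event) {
    const matches = this.text === event.content;
    this.tokens = [];
    return matches;
  }
}
```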

Event types

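| Event | Description | Key fields |
|---|---|---|
| auth_ok | Authentication succeeded | agentId, agentName |
| token | A single token in the stream | content, index |
| message_complete | Full response finished | content, usage, conversationId |
| tool_call | Agent is invoking an MCP tool | tool, args |
| tool_result | Result of a tool execution | tool, result |
| typing | Agent started or stopped processing | status (start or stop) |
| error | An error occurred | message, code; retryAfter (ms) when rate-limited |
| ping | Heartbeat sent every 30 seconds; reply with a pong message | (none documented) |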
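For a chat UI, the typing, tool_call, tool_result, message_complete, and error events can drive a single status line. The reducer below is a hypothetical sketch: the event names and fields follow the table above, but the state shape, status strings, and function names are illustrative choices.

```javascript
// Hypothetical reducer: derives UI status from stream events without
// mutating the previous state.
const initialStatus = { typing: false, activeTool: null, error: null };

function reduceStatus(state, event) {
  switch (event.type) {
    case 'typing': {
      const started = event.status === 'start';
      // A new processing round clears any earlier error message.
      return { ...state, typing: started, error: started ? null : state.error };
    }
    case 'tool_call':
      return { ...state, activeTool: event.tool };
    case 'tool_result':
      // Clear only if the result belongs to the tool being shown.
      return event.tool === state.activeTool ? { ...state, activeTool: null } : state;
    case 'message_complete':
      return { ...state, typing: false, activeTool: null };
    case 'error':
      return { ...state, typing: false, activeTool: null, error: event.message };
    default:
      // token, auth_ok, ping and unknown events leave the status unchanged.
      return state;
  }
}

// Text for a status bar; an empty string means nothing to show.
function statusText(state) {
  if (state.error) return `Error: ${state.error}`;
  if (state.activeTool) return `Running ${state.activeTool}...`;
  if (state.typing) return 'Agent is typing...';
  return '';
}
```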

SSE API

Connecting

const response = await fetch(
  'https://my-agent.clawdeploy.cuemby.io/api/chat/stream',
  {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      message: 'Summarize the Q3 report',
      conversationId: 'conv_abc123'
    }),
  }
);

if (!response.ok) {
  throw new Error(`Stream request failed: HTTP ${response.status}`);
}

const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = ''; // partial line carried over from the previous chunk

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // A network chunk can end mid-line, so only process complete lines
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // keep the trailing incomplete line for next time

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      if (data.type === 'token') {
        process.stdout.write(data.content);
      }
    }
  }
}
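The loop above handles single-line data: fields. A more general client can follow the event-stream parsing rules from the HTML standard: lines starting with ":" are comments, multiple data lines are joined with a newline, an optional event field names the event, and a blank line dispatches it. The sketch below is a simplified, hypothetical parser (createSseParser) that ignores the id and retry fields and does not handle bare carriage-return line endings.

```javascript
// Simplified incremental SSE parser: handles data, event, comments,
// multi-line data, CRLF endings, and lines split across chunks.
function createSseParser(onEvent) {
  let buffer = '';
  let dataLines = [];
  let eventName = '';

  function processLine(line) {
    if (line === '') {
      // Blank line: dispatch the accumulated event, if it had data.
      if (dataLines.length > 0) {
        onEvent({ event: eventName || 'message', data: dataLines.join('\n') });
      }
      dataLines = [];
      eventName = '';
      return;
    }
    if (line.startsWith(':')) return; // comment or keep-alive line

    const colon = line.indexOf(':');
    const field = colon === -1 ? line : line.slice(0, colon);
    let value = colon === -1 ? '' : line.slice(colon + 1);
    if (value.startsWith(' ')) value = value.slice(1); // one optional space

    if (field === 'data') dataLines.push(value);
    else if (field === 'event') eventName = value;
    // id and retry fields are ignored in this sketch
  }

  return {
    // Feed decoded text; it may end in the middle of a line.
    feed(text) {
      buffer += text;
      const lines = buffer.split('\n');
      buffer = lines.pop(); // keep the trailing partial line
      for (let line of lines) {
        if (line.endsWith('\r')) line = line.slice(0, -1); // CRLF endings
        processLine(line);
      }
    },
  };
}
```

In the fetch loop above, you would call parser.feed(decoder.decode(value, { stream: true })) for each chunk and JSON.parse each dispatched event's data.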

Connection management

Heartbeat / keep-alive

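ClawDeploy sends an application-level ping every 30 seconds on WebSocket connections: a JSON message with type "ping", separate from WebSocket protocol ping frames (which browsers answer automatically and never expose to your code). Your client should reply with a pong message: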

ws.addEventListener('message', (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'ping') {
    ws.send(JSON.stringify({ type: 'pong' }));
  }
});
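The pings also let the client notice a connection that died silently. A hypothetical watchdog can record every incoming message and report the connection as stale once nothing has arrived for well over one ping interval. The 30-second interval comes from the docs; the 75-second threshold (two missed pings plus slack) and the createHeartbeatWatchdog name are illustrative assumptions.

```javascript
// Hypothetical watchdog: reports a connection as stale when no traffic
// (pings or anything else) has arrived within staleAfterMs.
// `now` is injectable so the logic can be tested with a fake clock.
function createHeartbeatWatchdog({ staleAfterMs = 75000, now = Date.now } = {}) {
  let lastActivityAt = now();

  return {
    // Call on every incoming message; any traffic proves the link is alive.
    recordActivity() {
      lastActivityAt = now();
    },
    // True once the server has been silent for longer than the threshold.
    isStale() {
      return now() - lastActivityAt > staleAfterMs;
    },
  };
}
```

Create one watchdog per connection, check isStale() on a timer (for example every 10 seconds with setInterval), and call ws.close() when it returns true so the reconnection logic opens a fresh socket.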

Automatic reconnection

For WebSocket, implement exponential backoff:

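const WS_URL = 'wss://my-agent.clawdeploy.cuemby.io/api/ws';
let reconnectDelay = 1000; // first retry after 1 second

function connect() {
  const ws = new WebSocket(WS_URL);

  ws.addEventListener('open', () => {
    reconnectDelay = 1000; // Reset on success
    // Each new connection must authenticate, just like the first one
    ws.send(JSON.stringify({ type: 'auth', token: 'YOUR_API_KEY' }));
  });

  // Attach your 'message' listener here too: every reconnect
  // creates a new WebSocket object

  ws.addEventListener('close', () => {
    // Retry after the current delay, then double it (capped at 30 seconds)
    setTimeout(connect, reconnectDelay);
    reconnectDelay = Math.min(reconnectDelay * 2, 30000);
  });
}

connect();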
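A common refinement, swapped in here and not prescribed by ClawDeploy, is exponential backoff with "full jitter": each retry waits a random time between zero and the capped exponential delay, so many clients that dropped at the same moment do not all reconnect in lockstep. The base and cap below mirror the example above (1 second, 30 seconds); backoffDelay is a hypothetical name, and random is injectable for testing.

```javascript
// Exponential backoff with full jitter.
// attempt: 0 for the first retry, 1 for the second, and so on.
function backoffDelay(attempt, { baseMs = 1000, capMs = 30000, random = Math.random } = {}) {
  // Capped exponential ceiling: 1 s, 2 s, 4 s, ... up to 30 s.
  const ceiling = Math.min(capMs, baseMs * 2 ** attempt);
  // Uniformly random delay in [0, ceiling).
  return Math.floor(random() * ceiling);
}
```

To use it, keep an attempt counter that resets to 0 in the open handler and increments in the close handler, and pass backoffDelay(attempt) to setTimeout.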

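The browser's EventSource reconnects automatically, but it only issues GET requests and cannot set custom headers such as Authorization. The /api/chat/stream endpoint above expects a POST with a bearer token, so a fetch-based client like the one shown must handle reconnection itself, for example with the same backoff approach.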


Rate limiting

| Limit | Value |
|---|---|
| Max concurrent WebSocket connections per agent | 100 |
| Max messages per minute per connection | 30 |
| Max message size | 32 KB |
| Idle connection timeout | 5 minutes |

When rate-limited, you receive an error event with a retryAfter field in milliseconds.
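A client can avoid most rate-limit errors by checking the documented limits before sending. The sketch below is a hypothetical guard (createRateGuard) that enforces 30 messages per sliding 60-second window, rejects oversized frames, and pauses after an error event with retryAfter. It assumes the 32 KB limit applies to the serialized JSON frame measured in UTF-8 bytes and means 32 × 1024 bytes; the docs do not say which.

```javascript
// Hypothetical client-side guard for the per-connection limits.
// Assumptions: 32 KB = 32 * 1024 bytes of the UTF-8 JSON frame;
// the 30 messages/minute limit is treated as a sliding window.
const MAX_MESSAGES_PER_MINUTE = 30;
const MAX_FRAME_BYTES = 32 * 1024;
const WINDOW_MS = 60000;

function createRateGuard(now = Date.now) {
  const sentAt = []; // timestamps of frames sent within the window
  let blockedUntil = 0; // set from the server's retryAfter

  return {
    // Returns null if the frame may be sent, otherwise a reason string.
    check(frame) {
      const t = now();
      const bytes = new TextEncoder().encode(JSON.stringify(frame)).length;
      if (bytes > MAX_FRAME_BYTES) return 'too_large';
      if (t < blockedUntil) return 'retry_later';
      // Drop timestamps that have left the 60-second window.
      while (sentAt.length > 0 && t - sentAt[0] >= WINDOW_MS) sentAt.shift();
      if (sentAt.length >= MAX_MESSAGES_PER_MINUTE) return 'rate_limited';
      return null;
    },
    // Record a frame that was actually sent.
    recordSend() {
      sentAt.push(now());
    },
    // Call with an error event; honors retryAfter (milliseconds) if present.
    handleError(event) {
      if (typeof event.retryAfter === 'number') {
        blockedUntil = now() + event.retryAfter;
      }
    },
  };
}
```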


Integration patterns

Pattern 1: Chat widget (WebSocket)

The built-in ClawDeploy chat widget uses WebSocket internally. For custom UIs, the WebSocket API gives you the same real-time experience with full control over rendering.

Pattern 2: Backend integration (SSE)

For server-to-server integrations (Slack bot, email responder), SSE is simpler — one HTTP request, stream the response, post the result.

Pattern 3: Batch processing (REST)

For non-interactive use cases, use the standard REST API:

curl -X POST https://my-agent.clawdeploy.cuemby.io/api/chat \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"message": "Classify this ticket: [ticket text]"}'

This waits for the full response. Simpler, but no streaming.


Usage tracking in streams

Every message_complete event includes usage data:

{
  "type": "message_complete",
  "content": "Here is the summary...",
  "conversationId": "conv_abc123",
  "usage": {
    "inputTokens": 1420,
    "outputTokens": 387,
    "totalTokens": 1807,
    "cost": 0.0112,
    "model": "claude-sonnet-4-6",
    "latencyMs": 1240
  }
}

Use this for client-side cost tracking, latency monitoring, and usage dashboards.
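As a minimal sketch of client-side tracking, the hypothetical UsageTracker below aggregates the usage object from message_complete events per model. Field names follow the example above; the docs do not state the currency of cost, so it is summed as-is.

```javascript
// Hypothetical aggregator for usage data in message_complete events.
class UsageTracker {
  constructor() {
    this.byModel = new Map();
  }

  // Add one event's usage to its model's running totals.
  record(event) {
    if (event.type !== 'message_complete' || !event.usage) return;
    const u = event.usage;
    const entry = this.byModel.get(u.model) ?? {
      messages: 0, totalTokens: 0, cost: 0, totalLatencyMs: 0,
    };
    entry.messages += 1;
    entry.totalTokens += u.totalTokens;
    entry.cost += u.cost;
    entry.totalLatencyMs += u.latencyMs;
    this.byModel.set(u.model, entry);
  }

  // Totals and average latency for one model, or null if unseen.
  summary(model) {
    const e = this.byModel.get(model);
    if (!e) return null;
    return {
      messages: e.messages,
      totalTokens: e.totalTokens,
      cost: e.cost,
      avgLatencyMs: e.totalLatencyMs / e.messages,
    };
  }
}
```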


Get started

  1. Sign up for ClawDeploy
  2. Create an agent and grab your API key from Settings → API Keys
  3. Connect via WebSocket or SSE using the examples above
  4. Check the full API docs for advanced options


Ready to deploy your first agent?

No credit card required. Free 7-day trial.
