technical · 2026-04-28 · 6 min read

WebSocket Streaming API Deep Dive

By Edward Monzon

Why streaming matters

When a user sends a message to an AI agent, they don't want to wait 5 seconds for a complete response to appear all at once. They want to see tokens arrive in real time — word by word, like watching someone type.

Streaming turns the experience from "is it thinking?" into a responsive conversation. Total generation time stays the same, but the first tokens appear almost immediately, which can cut perceived latency by 80% or more.

ClawDeploy supports two streaming protocols: WebSocket and Server-Sent Events (SSE).

Choosing a protocol

Feature	WebSocket	SSE
Direction	Bidirectional	Server to client only
Browser support	Universal	Universal (except IE)
Connection	Persistent, full-duplex	Persistent, one-way (server to client)
Reconnection	Manual (you implement)	Automatic (built into EventSource)
Best for	Chat UIs with typing indicators	Simple integrations, read-only feeds
Proxy/CDN support	Varies (needs upgrade header)	Excellent (plain HTTP)

Recommendation: Use WebSocket for interactive chat UIs. Use SSE for server-side integrations and simpler clients.

WebSocket API

Connecting

const ws = new WebSocket(
  'wss://my-agent.clawdeploy.cuemby.io/api/ws'
);

// Authenticate on open
ws.addEventListener('open', () => {
  ws.send(JSON.stringify({
    type: 'auth',
    token: 'YOUR_API_KEY'
  }));
});

Sending messages

ws.send(JSON.stringify({
  type: 'message',
  content: 'Summarize the Q3 report',
  conversationId: 'conv_abc123' // optional
}));
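
The examples above send a message right after connecting, but the source does not say what happens to messages that arrive before authentication completes. A cautious client can hold outgoing messages until the `auth_ok` event arrives. The sketch below assumes that this is worth doing; `createGatedSender` and its method names are hypothetical and not part of the ClawDeploy API.

```javascript
// Hypothetical helper: buffers outgoing messages until auth_ok is seen.
// `sendRaw` is any function that takes a string, e.g. (s) => ws.send(s).
function createGatedSender(sendRaw) {
  let authenticated = false;
  const pending = [];

  return {
    // Send immediately once authenticated, otherwise queue the message.
    send(message) {
      const payload = JSON.stringify(message);
      if (authenticated) {
        sendRaw(payload);
      } else {
        pending.push(payload);
      }
    },
    // Call this for every parsed server event.
    handleEvent(event) {
      if (event.type === 'auth_ok' && !authenticated) {
        authenticated = true;
        // Flush in the order the messages were queued.
        while (pending.length > 0) {
          sendRaw(pending.shift());
        }
      }
    },
    // Number of messages still waiting for auth_ok.
    pendingCount() {
      return pending.length;
    },
  };
}
```

In a browser this could be wired up as `createGatedSender((s) => ws.send(s))`, with `handleEvent` called from the `message` listener.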

Receiving streaming responses

ws.addEventListener('message', (event) => {
  const data = JSON.parse(event.data);

  switch (data.type) {
    case 'token':
      // Individual token — append to UI
      process.stdout.write(data.content);
      break;

    case 'message_complete':
      // Full response assembled
      console.log('Full response:', data.content);
      console.log('Usage:', data.usage);
      break;

    case 'tool_call':
      // Agent is calling an MCP tool
      console.log('Calling:', data.tool, data.args);
      break;

    case 'error':
      console.error('Error:', data.message);
      break;
  }
});

Event types

Event	Description	Key fields
`auth_ok`	Authentication successful	`agentId`, `agentName`
`token`	Single token in the stream	`content`, `index`
`message_complete`	Full response done	`content`, `usage`, `conversationId`
`tool_call`	Agent invoking MCP tool	`tool`, `args`
`tool_result`	Tool execution result	`tool`, `result`
`typing`	Agent is processing	`status`: start or stop
`error`	Error occurred	`message`, `code`
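
Because `token` events carry an `index` and `message_complete` carries the full `content`, a client can rebuild the response incrementally and then reconcile it with the final text. The sketch below assumes `index` is the zero-based position of the token within the response, which the source does not state; `createTokenAssembler` is a hypothetical name.

```javascript
// Hypothetical helper: rebuilds a streamed response from token events.
// Assumes data.index is a zero-based token position (not confirmed by the docs).
function createTokenAssembler() {
  const tokens = [];
  let completed = false;

  return {
    handleEvent(data) {
      if (data.type === 'token') {
        // A token after a completed message starts a new response.
        if (completed) {
          tokens.length = 0;
          completed = false;
        }
        // Store by index so a late or repeated token lands in the right slot.
        tokens[data.index] = data.content;
      } else if (data.type === 'message_complete') {
        // Treat the server's final text as authoritative.
        tokens.length = 0;
        tokens[0] = data.content;
        completed = true;
      }
    },
    // Text assembled so far, skipping any slots not yet received.
    text() {
      return tokens.filter((t) => t !== undefined).join('');
    },
  };
}
```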

SSE API

Connecting

const response = await fetch(
  'https://my-agent.clawdeploy.cuemby.io/api/chat/stream',
  {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      message: 'Summarize the Q3 report',
      conversationId: 'conv_abc123'
    }),
  }
);

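const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = ''; // holds a partial line carried over between chunks

while (true) {
  const { done, value } = await reader.read();
  if (done) break;

  // A network chunk can end mid-line, so only parse complete lines
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // last element may be an incomplete line

  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      if (data.type === 'token') {
        process.stdout.write(data.content);
      }
    }
  }
}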
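
The loop above handles one `data:` line per event. The standard SSE wire format also lets an event span several `data:` lines, ends each event with a blank line, and permits CRLF line endings. A slightly more general parser might look like the sketch below. `createSseParser` is a hypothetical helper; it assumes the server follows the standard framing and that each event's data is JSON, as in the examples above. It handles LF and CRLF endings only.

```javascript
// Hypothetical helper: incremental parser for the text/event-stream format.
// Assumes each event's joined data payload is JSON, as in the examples above.
function createSseParser(onEvent) {
  let buffer = '';    // partial line carried between chunks
  let dataLines = []; // data: lines of the event being assembled

  return {
    // Feed decoded text chunks in arrival order.
    push(chunk) {
      buffer += chunk;
      const lines = buffer.split('\n');
      buffer = lines.pop(); // may be an incomplete line

      for (const rawLine of lines) {
        const line = rawLine.endsWith('\r') ? rawLine.slice(0, -1) : rawLine;
        if (line === '') {
          // A blank line ends the event; dispatch if any data was collected.
          if (dataLines.length > 0) {
            onEvent(JSON.parse(dataLines.join('\n')));
            dataLines = [];
          }
        } else if (line.startsWith('data:')) {
          // One optional space after the colon is not part of the value.
          const value = line.slice(5);
          dataLines.push(value.startsWith(' ') ? value.slice(1) : value);
        }
        // Comment lines (starting with ':') and other fields are ignored here.
      }
    },
  };
}
```

With this in place, the read loop reduces to `parser.push(decoder.decode(value, { stream: true }))`.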

Connection management

Heartbeat / keep-alive

ClawDeploy sends a ping message every 30 seconds on WebSocket connections. Your client should reply with a pong message:

ws.addEventListener('message', (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'ping') {
    ws.send(JSON.stringify({ type: 'pong' }));
  }
});
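
The pings can also help the client notice a dead connection: if none arrives for noticeably longer than 30 seconds, the socket has probably stalled. A minimal sketch follows, assuming a grace factor of 2 (an arbitrary choice, not a ClawDeploy setting). `createHeartbeatMonitor` is hypothetical, and the `now` parameter exists only so the clock can be replaced in tests.

```javascript
// Hypothetical helper: tracks the time since the last server ping.
// intervalMs reflects the documented 30-second ping; graceFactor is an assumption.
function createHeartbeatMonitor({ intervalMs = 30000, graceFactor = 2, now = Date.now } = {}) {
  let lastPingAt = now();

  return {
    // Call whenever a { type: 'ping' } event arrives.
    markPing() {
      lastPingAt = now();
    },
    // True when no ping has been seen for more than intervalMs * graceFactor.
    isStale() {
      return now() - lastPingAt > intervalMs * graceFactor;
    },
  };
}
```

A client could check `isStale()` on a timer and call `ws.close()` when it returns true, which hands control to the reconnection logic.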

Automatic reconnection

For WebSocket, implement exponential backoff:

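const WS_URL = 'wss://my-agent.clawdeploy.cuemby.io/api/ws';
let reconnectDelay = 1000; // start at 1 second

function connect() {
  const ws = new WebSocket(WS_URL);

  ws.addEventListener('open', () => {
    reconnectDelay = 1000; // Reset on success
  });

  ws.addEventListener('close', () => {
    // Wait the current delay, then double it (capped at 30 s) for the next attempt
    setTimeout(() => {
      reconnectDelay = Math.min(reconnectDelay * 2, 30000);
      connect();
    }, reconnectDelay);
  });
}

connect();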

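If you consume SSE through the browser's EventSource, reconnection is automatic. The fetch-based example above does not use EventSource, which can only issue GET requests and cannot set an Authorization header, so it needs its own retry logic, such as the same exponential backoff.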
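
When many clients lose their connection at once, identical backoff schedules make them all retry at the same moments. A common refinement is to add random jitter. The sketch below uses the "full jitter" variant with the same 1-second base and 30-second cap as the WebSocket example above. `backoffDelay` is a hypothetical helper, and the `random` parameter exists only to make it testable.

```javascript
// Hypothetical helper: exponential backoff with full jitter.
// attempt is 0 for the first retry; random defaults to Math.random.
function backoffDelay(attempt, { baseMs = 1000, maxMs = 30000, random = Math.random } = {}) {
  // Exponential ceiling: baseMs * 2^attempt, capped at maxMs.
  const ceiling = Math.min(maxMs, baseMs * 2 ** attempt);
  // Full jitter: pick uniformly from [0, ceiling).
  return Math.floor(random() * ceiling);
}
```

In the reconnect handler, `setTimeout(connect, backoffDelay(attempt))` would replace the fixed doubling, with `attempt` reset to 0 on a successful `open`.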

Rate limiting

Limit	Value
Max concurrent WebSocket connections per agent	100
Max messages per minute per connection	30
Max message size	32 KB
Connection timeout (idle)	5 minutes

When rate-limited, you receive an error event with a retryAfter field in milliseconds.
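
To stay under the limit of 30 messages per minute, and to honor `retryAfter` when the server does push back, a client can track its own send times. The sketch below assumes a sliding 60-second window (the source does not say how the server measures the minute) and recognizes rate-limit errors by the presence of a numeric `retryAfter`. `createSendThrottle` is hypothetical.

```javascript
// Hypothetical helper: client-side throttle for outgoing messages.
// The default of 30 per 60 s comes from the table above; the sliding window is an assumption.
function createSendThrottle({ limit = 30, windowMs = 60000, now = Date.now } = {}) {
  let sentAt = [];      // timestamps of recent sends
  let blockedUntil = 0; // set from a server retryAfter value

  return {
    // Returns 0 if a message may be sent now, else the wait in milliseconds.
    waitMs() {
      const t = now();
      sentAt = sentAt.filter((s) => t - s < windowMs);
      const serverWait = Math.max(0, blockedUntil - t);
      const windowWait = sentAt.length >= limit ? sentAt[0] + windowMs - t : 0;
      return Math.max(serverWait, windowWait);
    },
    // Record a send that actually went out.
    recordSend() {
      sentAt.push(now());
    },
    // Call for every server event; picks up retryAfter from error events.
    handleEvent(data) {
      if (data.type === 'error' && typeof data.retryAfter === 'number') {
        blockedUntil = now() + data.retryAfter;
      }
    },
  };
}
```

Before each `ws.send`, the client would check `waitMs()` and delay the send by that amount when it is non-zero.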

Integration patterns

Pattern 1: Chat widget (WebSocket)

The built-in ClawDeploy chat widget uses WebSocket internally. For custom UIs, the WebSocket API gives you the same real-time experience with full control over rendering.

Pattern 2: Backend integration (SSE)

For server-to-server integrations (Slack bot, email responder), SSE is simpler — one HTTP request, stream the response, post the result.

Pattern 3: Batch processing (REST)

For non-interactive use cases, use the standard REST API:

curl -X POST https://my-agent.clawdeploy.cuemby.io/api/chat \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"message": "Classify this ticket: [ticket text]"}'

This waits for the full response. Simpler, but no streaming.

Usage tracking in streams

Every message_complete event includes usage data:

{
  "type": "message_complete",
  "content": "Here is the summary...",
  "conversationId": "conv_abc123",
  "usage": {
    "inputTokens": 1420,
    "outputTokens": 387,
    "totalTokens": 1807,
    "cost": 0.0112,
    "model": "claude-sonnet-4-6",
    "latencyMs": 1240
  }
}

Use this for client-side cost tracking, latency monitoring, and usage dashboards.
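
As a sketch of client-side tracking, the helper below sums the `usage` fields across `message_complete` events. The field names come from the payload above. `createUsageTracker` is hypothetical, and because the source does not state the currency of `cost`, the total is left unitless.

```javascript
// Hypothetical helper: running totals over message_complete usage blocks.
function createUsageTracker() {
  const totals = { messages: 0, inputTokens: 0, outputTokens: 0, cost: 0, latencyMs: 0 };

  return {
    // Call for every server event; only message_complete events with usage count.
    handleEvent(data) {
      if (data.type !== 'message_complete' || !data.usage) return;
      totals.messages += 1;
      totals.inputTokens += data.usage.inputTokens;
      totals.outputTokens += data.usage.outputTokens;
      totals.cost += data.usage.cost;
      totals.latencyMs += data.usage.latencyMs;
    },
    summary() {
      return {
        messages: totals.messages,
        totalTokens: totals.inputTokens + totals.outputTokens,
        cost: totals.cost,
        // Mean latency per completed message; 0 before any message arrives.
        avgLatencyMs: totals.messages === 0 ? 0 : totals.latencyMs / totals.messages,
      };
    },
  };
}
```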

Get started

1. Sign up for ClawDeploy
2. Create an agent and grab your API key from Settings → API Keys
3. Connect via WebSocket or SSE using the examples above
4. Check the full API docs for advanced options
