WebSocket Streaming API Deep Dive
By Edward Monzon
Why streaming matters
When a user sends a message to an AI agent, they don't want to wait 5 seconds for a complete response to appear all at once. They want to see tokens arrive in real time — word by word, like watching someone type.
Streaming changes the experience from "is it thinking?" to a responsive conversation. Because the first tokens appear almost immediately, it can cut perceived latency by 80% or more.
ClawDeploy supports two streaming protocols: WebSocket and Server-Sent Events (SSE).
Choosing a protocol
| Feature | WebSocket | SSE |
|---|---|---|
| Direction | Bidirectional | Server to client only |
| Browser support | Universal | Universal (except IE) |
| Connection | Persistent, full-duplex | Persistent, one-way (server to client) |
| Reconnection | Manual (you implement) | Automatic (built into EventSource) |
| Best for | Chat UIs with typing indicators | Simple integrations, read-only feeds |
| Proxy/CDN support | Varies (needs upgrade header) | Excellent (plain HTTP) |
Recommendation: Use WebSocket for interactive chat UIs. Use SSE for server-side integrations and simpler clients.
WebSocket API
Connecting
const ws = new WebSocket(
  'wss://my-agent.clawdeploy.cuemby.io/api/ws'
);
// Authenticate on open
ws.addEventListener('open', () => {
  ws.send(JSON.stringify({
    type: 'auth',
    token: 'YOUR_API_KEY'
  }));
});
Sending messages
ws.send(JSON.stringify({
  type: 'message',
  content: 'Summarize the Q3 report',
  conversationId: 'conv_abc123' // optional
}));
Receiving streaming responses
ws.addEventListener('message', (event) => {
  const data = JSON.parse(event.data);
  switch (data.type) {
    case 'token':
      // Individual token: append to the UI (written to stdout here, which is Node.js only)
      process.stdout.write(data.content);
      break;
    case 'message_complete':
      // Full response assembled
      console.log('Full response:', data.content);
      console.log('Usage:', data.usage);
      break;
    case 'tool_call':
      // Agent is calling an MCP tool
      console.log('Calling:', data.tool, data.args);
      break;
    case 'error':
      console.error('Error:', data.message);
      break;
  }
});
Event types
| Event | Description | Key fields |
|---|---|---|
| auth_ok | Authentication successful | agentId, agentName |
| token | Single token in the stream | content, index |
| message_complete | Full response done | content, usage, conversationId |
| tool_call | Agent invoking MCP tool | tool, args |
| tool_result | Tool execution result | tool, result |
| typing | Agent is processing | status: start or stop |
| error | Error occurred | message, code |
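The token and message_complete events are enough to rebuild a response on the client. The sketch below is one way to do that, not ClawDeploy's own implementation. It assumes that `index` is the zero-based position of each token in the stream, which the table lists but does not define. The helper name `createStreamAssembler` is hypothetical.

```javascript
// Hypothetical helper: collects `token` events and rebuilds the response text.
// Assumption: `index` is the zero-based position of the token in the stream.
function createStreamAssembler() {
  const tokens = [];
  let finalText = null;

  return {
    // Feed every parsed event here; other event types are ignored.
    handle(event) {
      if (event.type === 'token') {
        tokens[event.index] = event.content;
      } else if (event.type === 'message_complete') {
        finalText = event.content;
      }
    },
    // Text so far: the server's final content once it has arrived,
    // otherwise the received tokens joined in index order.
    text() {
      return finalText !== null ? finalText : tokens.join('');
    },
    isComplete() {
      return finalText !== null;
    },
  };
}

const assembler = createStreamAssembler();
assembler.handle({ type: 'token', content: 'Hel', index: 0 });
assembler.handle({ type: 'token', content: 'lo', index: 1 });
console.log(assembler.text()); // Hello
```

Placing tokens by index rather than appending them keeps the text correct even if a handler processes events out of order.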
SSE API
Connecting
const response = await fetch(
  'https://my-agent.clawdeploy.cuemby.io/api/chat/stream',
  {
    method: 'POST',
    headers: {
      'Authorization': 'Bearer YOUR_API_KEY',
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      message: 'Summarize the Q3 report',
      conversationId: 'conv_abc123'
    }),
  }
);
if (!response.ok) {
  throw new Error(`Stream request failed: ${response.status}`);
}
const reader = response.body.getReader();
const decoder = new TextDecoder();
let buffer = ''; // Holds a partial line until the rest of it arrives
while (true) {
  const { done, value } = await reader.read();
  if (done) break;
  buffer += decoder.decode(value, { stream: true });
  const lines = buffer.split('\n');
  buffer = lines.pop(); // The last piece may be incomplete; keep it for the next chunk
  for (const line of lines) {
    if (line.startsWith('data: ')) {
      const data = JSON.parse(line.slice(6));
      if (data.type === 'token') {
        process.stdout.write(data.content);
      }
    }
  }
}
Connection management
Heartbeat / keep-alive
ClawDeploy sends a ping every 30 seconds on WebSocket connections. Your client should respond with pong:
ws.addEventListener('message', (event) => {
  const data = JSON.parse(event.data);
  if (data.type === 'ping') {
    ws.send(JSON.stringify({ type: 'pong' }));
  }
});
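Because pings arrive on a fixed schedule, a client can also treat a long silence as a sign that the connection has died. The sketch below shows one way to track that. The helper name `createPingWatchdog` and the threshold of two missed pings are illustrative assumptions, not documented ClawDeploy behavior.

```javascript
// Hypothetical helper: remembers when the last ping arrived.
// Assumption: two missed 30-second pings means the connection is dead.
function createPingWatchdog({ intervalMs = 30000, missedAllowed = 2, now = Date.now } = {}) {
  let lastPingAt = now();

  return {
    // Call whenever a { type: 'ping' } event arrives.
    recordPing() {
      lastPingAt = now();
    },
    // True once more than `missedAllowed` ping intervals have passed in silence.
    isStale() {
      return now() - lastPingAt > intervalMs * missedAllowed;
    },
  };
}
```

A client could poll `isStale()` on a timer and close the socket when it returns true, which hands control to the reconnection logic. The `now` parameter exists only so the clock can be replaced in tests.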
Automatic reconnection
For WebSocket, implement exponential backoff:
const WS_URL = 'wss://my-agent.clawdeploy.cuemby.io/api/ws';
let reconnectDelay = 1000;
function connect() {
  const ws = new WebSocket(WS_URL);
  ws.addEventListener('open', () => {
    reconnectDelay = 1000; // Reset on success
    // Auth is sent on open, so repeat the auth message for each new connection
  });
  ws.addEventListener('close', () => {
    setTimeout(() => {
      reconnectDelay = Math.min(reconnectDelay * 2, 30000);
      connect();
    }, reconnectDelay);
  });
}
connect();
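To make the resulting schedule explicit, the same doubling rule can be written as a pure function. This is only a restatement of the loop above; `backoffDelay` is a hypothetical name.

```javascript
// Delay before reconnect attempt number `attempt` (0-based), following the
// same rule as the loop above: start at 1 second, double each time, cap at 30 seconds.
function backoffDelay(attempt, baseMs = 1000, maxMs = 30000) {
  return Math.min(baseMs * 2 ** attempt, maxMs);
}

console.log([0, 1, 2, 3, 4, 5, 6].map((n) => backoffDelay(n)).join(', '));
// 1000, 2000, 4000, 8000, 16000, 30000, 30000
```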
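SSE via the browser's EventSource handles reconnection automatically, with no extra code needed. EventSource only issues GET requests and cannot set custom headers, though, so a fetch-based POST stream like the SSE example above does not reconnect on its own and needs its own retry logic.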
Rate limiting
| Limit | Value |
|---|---|
| Max concurrent WebSocket connections per agent | 100 |
| Max messages per minute per connection | 30 |
| Max message size | 32 KB |
| Connection timeout (idle) | 5 minutes |
When rate-limited, you receive an error event with a retryAfter field in milliseconds.
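A client can avoid most rate-limit errors by checking these limits before it sends. The sketch below is one possible guard, under two assumptions that the table does not spell out: that "32 KB" means 32 * 1024 bytes of UTF-8 in the serialized message, and that the per-minute limit is a sliding 60-second window. The name `createSendGuard` and its return shape are hypothetical.

```javascript
// Hypothetical client-side guard for the limits in the table above.
// Assumptions: "32 KB" is 32 * 1024 bytes of UTF-8 in the serialized message,
// and the per-minute limit is a sliding 60-second window.
function createSendGuard({ maxPerMinute = 30, maxBytes = 32 * 1024, now = Date.now } = {}) {
  const sentAt = []; // timestamps of sends inside the current window
  let blockedUntil = 0; // set from a server-provided retryAfter

  return {
    // Returns { ok: true } and records the send, or { ok: false, reason, waitMs }.
    tryAcquire(payload) {
      const t = now();
      const bytes = new TextEncoder().encode(JSON.stringify(payload)).length;
      if (bytes > maxBytes) return { ok: false, reason: 'too_large', waitMs: 0 };
      if (t < blockedUntil) {
        return { ok: false, reason: 'rate_limited', waitMs: blockedUntil - t };
      }
      // Drop timestamps that have left the 60-second window.
      while (sentAt.length > 0 && t - sentAt[0] >= 60000) sentAt.shift();
      if (sentAt.length >= maxPerMinute) {
        return { ok: false, reason: 'rate_limited', waitMs: 60000 - (t - sentAt[0]) };
      }
      sentAt.push(t);
      return { ok: true };
    },
    // Call with the retryAfter value (milliseconds) from an error event.
    backOff(retryAfterMs) {
      blockedUntil = now() + retryAfterMs;
    },
  };
}
```

Call `tryAcquire(message)` before `ws.send(...)`, and call `backOff(data.retryAfter)` when an error event carries that field. The server remains the authority on limits; this guard only reduces avoidable rejections.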
Integration patterns
Pattern 1: Chat widget (WebSocket)
The built-in ClawDeploy chat widget uses WebSocket internally. For custom UIs, the WebSocket API gives you the same real-time experience with full control over rendering.
Pattern 2: Backend integration (SSE)
For server-to-server integrations (Slack bot, email responder), SSE is simpler — one HTTP request, stream the response, post the result.
Pattern 3: Batch processing (REST)
For non-interactive use cases, use the standard REST API:
curl -X POST https://my-agent.clawdeploy.cuemby.io/api/chat \
  -H "Authorization: Bearer YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"message": "Classify this ticket: [ticket text]"}'
This waits for the full response. Simpler, but no streaming.
Usage tracking in streams
Every message_complete event includes usage data:
{
  "type": "message_complete",
  "content": "Here is the summary...",
  "conversationId": "conv_abc123",
  "usage": {
    "inputTokens": 1420,
    "outputTokens": 387,
    "totalTokens": 1807,
    "cost": 0.0112,
    "model": "claude-sonnet-4-6",
    "latencyMs": 1240
  }
}
Use this for client-side cost tracking, latency monitoring, and usage dashboards.
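As a rough illustration of client-side tracking, the sketch below totals the usage objects from a list of received events. The helper name `summarizeUsage` is hypothetical, and the field names are taken from the sample payload above. The source does not state the currency of `cost`, so the total is reported in whatever unit the API uses.

```javascript
// Hypothetical helper: aggregates `usage` from message_complete events.
function summarizeUsage(events) {
  const usages = events
    .filter((e) => e.type === 'message_complete' && e.usage)
    .map((e) => e.usage);

  const sum = (key) => usages.reduce((total, u) => total + u[key], 0);

  return {
    messages: usages.length,
    inputTokens: sum('inputTokens'),
    outputTokens: sum('outputTokens'),
    totalCost: sum('cost'), // same unit as the API's `cost` field
    avgLatencyMs: usages.length ? sum('latencyMs') / usages.length : 0,
  };
}
```

Feeding it every event from a session yields totals suitable for a simple dashboard; events without usage data are skipped.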
Get started
- Sign up for ClawDeploy
- Create an agent and grab your API key from Settings → API Keys
- Connect via WebSocket or SSE using the examples above
- Check the full API docs for advanced options