18.1 WebSocket Fundamentals

The WebSocket protocol provides full-duplex, bidirectional communication over a single TCP connection. This section covers the protocol fundamentals, connection lifecycle, frame structure, and the HTTP upgrade mechanism.

Protocol Overview

WebSocket enables persistent, low-latency communication between clients and servers. Once the connection is open, either side can send a message at any time, without the per-request headers and request-response round trips of plain HTTP. The protocol begins with an HTTP upgrade handshake and then switches to frame-based messaging.

Key Characteristics:

  • Full-duplex: simultaneous bidirectional communication
  • Frame-based: structured binary protocol
  • Persistent connection: one long-lived TCP connection for the whole session
  • Low overhead: 2 to 14 bytes of framing per frame after the handshake
  • Binary-safe: supports both text and binary data

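Protocol Versions: RFC 6455 defines a single wire version of the protocol (Sec-WebSocket-Version: 13), and that is the version Java's built-in client (java.net.http.WebSocket) implements. Unlike HTTP, WebSocket has no competing major versions. What differs is how a connection is bootstrapped: RFC 6455 uses an HTTP/1.1 Upgrade request, and RFC 8441 adds a way to open a WebSocket on an HTTP/2 stream. The JDK client always performs its opening handshake over HTTP/1.1, even when the HttpClient is configured for HTTP/2.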

HTTP Upgrade Handshake

WebSocket connections begin with an HTTP upgrade request. The client sends an HTTP/1.1 request with special upgrade headers:

// Client initiates upgrade request
GET /chat HTTP/1.1
Host: server.example.com
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Key: dGhlIHNhbXBsZSBub25jZQ==
Sec-WebSocket-Version: 13
Sec-WebSocket-Extensions: permessage-deflate

Key Upgrade Headers:

  • Upgrade: websocket - indicates WebSocket protocol upgrade
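  • Connection: Upgrade - marks Upgrade as a hop-by-hop header that applies to this connection
  • Sec-WebSocket-Key - base64-encoded random 16-byte nonce
  • Sec-WebSocket-Version - protocol version; must be 13, the only version defined by RFC 6455
  • Sec-WebSocket-Extensions - optional extension negotiation, such as permessage-deflate compression (RFC 7692)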
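When writing a client by hand rather than using java.net.http (which builds these headers itself), the request above can be produced as in the following minimal sketch. ClientHandshake, newClientKey, and buildUpgradeRequest are hypothetical names. The key is 16 random bytes, base64-encoded, so it is always 24 characters long.

```java
import java.security.SecureRandom;
import java.util.Base64;

// Hypothetical helper for a hand-written client; java.net.http generates these headers itself.
final class ClientHandshake {
    private static final SecureRandom RANDOM = new SecureRandom();

    private ClientHandshake() {}

    /** Returns a fresh Sec-WebSocket-Key: 16 random bytes, base64-encoded (24 characters). */
    static String newClientKey() {
        byte[] nonce = new byte[16];
        RANDOM.nextBytes(nonce);
        return Base64.getEncoder().encodeToString(nonce);
    }

    /** Builds the opening-handshake request; lines end with CRLF and a blank line ends the headers. */
    static String buildUpgradeRequest(String host, String path, String key) {
        return "GET " + path + " HTTP/1.1\r\n"
            + "Host: " + host + "\r\n"
            + "Upgrade: websocket\r\n"
            + "Connection: Upgrade\r\n"
            + "Sec-WebSocket-Key: " + key + "\r\n"
            + "Sec-WebSocket-Version: 13\r\n"
            + "\r\n";
    }
}
```

A new key should be generated for every connection attempt; reusing one defeats its purpose as a nonce.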

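The server responds with 101 Switching Protocols. It proves that it understood the WebSocket handshake by deriving Sec-WebSocket-Accept from the client's key: it appends the fixed GUID 258EAFA5-E914-47DA-95CA-C5AB0DC85B11, computes the SHA-1 hash, and base64-encodes the 20-byte result: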

// Server response
HTTP/1.1 101 Switching Protocols
Upgrade: websocket
Connection: Upgrade
Sec-WebSocket-Accept: s3pPLMBiTxaQ9kYGzzhZRbK+xOo=
Sec-WebSocket-Extensions: permessage-deflate

Response Key Derivation:

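import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

public class WebSocketKeyUtils {
    // Fixed GUID from RFC 6455, section 1.3
    private static final String MAGIC = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    /** Computes Sec-WebSocket-Accept = base64(SHA-1(clientKey + MAGIC)). */
    public static String deriveResponseKey(String clientKey) {
        try {
            String concatenated = clientKey + MAGIC;
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] hash = sha1.digest(concatenated.getBytes(StandardCharsets.UTF_8));
            return Base64.getEncoder().encodeToString(hash);
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to support SHA-1
            throw new IllegalStateException(e);
        }
    }
}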

// Usage
String clientKey = "dGhlIHNhbXBsZSBub25jZQ==";
String responseKey = WebSocketKeyUtils.deriveResponseKey(clientKey);
// responseKey = "s3pPLMBiTxaQ9kYGzzhZRbK+xOo="
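A client must check the response before treating the connection as open (RFC 6455, section 4.1). java.net.http performs this check internally. The following is a minimal sketch for a hand-written client. HandshakeValidator and isValid are hypothetical names. The sketch assumes the caller has already lower-cased the header names, and it simplifies the Connection check to a substring test instead of full token parsing.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;
import java.util.Map;

// Hypothetical client-side validation of the server's handshake response.
final class HandshakeValidator {
    private static final String MAGIC = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    private HandshakeValidator() {}

    /** The Sec-WebSocket-Accept value the server must send back for this client key. */
    static String expectedAccept(String clientKey) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] hash = sha1.digest((clientKey + MAGIC).getBytes(StandardCharsets.US_ASCII));
            return Base64.getEncoder().encodeToString(hash);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is required on every Java platform", e);
        }
    }

    /** True if the response completes the handshake; header names must already be lower-case. */
    static boolean isValid(int status, Map<String, String> headers, String clientKey) {
        String upgrade = headers.get("upgrade");
        String connection = headers.get("connection");
        String accept = headers.get("sec-websocket-accept");
        return status == 101
            && upgrade != null && upgrade.equalsIgnoreCase("websocket")
            // Simplified: a strict check would parse the comma-separated tokens
            && connection != null && connection.toLowerCase().contains("upgrade")
            && expectedAccept(clientKey).equals(accept); // exact, case-sensitive match
    }
}
```

If any check fails, the client must not send WebSocket frames and should close the TCP connection.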

Frame Structure

Once the upgrade completes, communication switches to WebSocket frames. Each frame contains metadata and payload data.

Frame Header Format:

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-------+-+-------------+-------------------------------+
|F|R|R|R| opcode|M| Payload len |    Extended payload length    |
|I|S|S|S|  (4)  |A|     (7)     |             (16/64)           |
|N|V|V|V|       |S|             |   (if payload len==126/127)   |
| |1|2|3|       |K|             |                               |
+-+-+-+-+-------+-+-------------+ - - - - - - - - - - - - - - - +
|     Extended payload length continued, if payload len == 127  |
+ - - - - - - - - - - - - - - - +-------------------------------+
|                               |Masking-key, if MASK set to 1  |
+-------------------------------+-------------------------------+
| Masking-key (continued)       |          Payload Data         |
+-------------------------------- - - - - - - - - - - - - - - - +
:                     Payload Data continued ...                :
+ - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - +
|                     Payload Data continued ...                |
+---------------------------------------------------------------+

Frame Fields:

  • FIN (1 bit): Final fragment of message (1 = final, 0 = more frames coming)
  • RSV1-RSV3 (3 bits): Reserved for extensions (normally 0)
  • Opcode (4 bits): Frame type (see below)
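  • MASK (1 bit): Payload masked (1 = masked; required for client→server frames, forbidden for server→client frames)
  • Payload Length (7 bits, optionally followed by 16 or 64 bits): values 0 to 125 are the length itself; 126 means a 16-bit length follows; 127 means a 64-bit length follows (network byte order)
  • Masking Key (32 bits): present only when MASK = 1; the client XORs the payload with it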
  • Payload Data: Actual message data
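The length rules determine the header size: 2 bytes for payloads up to 125 bytes, 4 bytes up to 65,535 bytes, and 10 bytes beyond that, plus 4 more when a masking key is present. The following is a minimal encoder sketch. FrameHeaderEncoder is a hypothetical class; java.net.http does this internally and does not expose it.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical encoder for the frame header described above (RFC 6455, section 5.2).
final class FrameHeaderEncoder {
    private FrameHeaderEncoder() {}

    /**
     * Encodes FIN, opcode, MASK and payload length. If maskKey is non-null it must be
     * 4 bytes; it is appended after the length. The payload itself is not included.
     */
    static byte[] encode(boolean fin, int opcode, long payloadLength, byte[] maskKey) {
        if (opcode < 0 || opcode > 0xF) throw new IllegalArgumentException("opcode");
        if (payloadLength < 0) throw new IllegalArgumentException("length");
        if (maskKey != null && maskKey.length != 4) throw new IllegalArgumentException("mask key");

        ByteArrayOutputStream out = new ByteArrayOutputStream(14);
        out.write((fin ? 0x80 : 0x00) | opcode);          // FIN bit, RSV1-3 = 0, opcode
        int maskBit = maskKey != null ? 0x80 : 0x00;

        if (payloadLength <= 125) {
            out.write(maskBit | (int) payloadLength);      // length fits in 7 bits
        } else if (payloadLength <= 0xFFFF) {
            out.write(maskBit | 126);                      // 16-bit extended length follows
            out.write((int) (payloadLength >>> 8) & 0xFF);
            out.write((int) payloadLength & 0xFF);
        } else {
            out.write(maskBit | 127);                      // 64-bit extended length follows
            for (int shift = 56; shift >= 0; shift -= 8) {
                out.write((int) (payloadLength >>> shift) & 0xFF);
            }
        }
        if (maskKey != null) {
            out.write(maskKey, 0, 4);
        }
        return out.toByteArray();
    }
}
```

For example, an unmasked final text frame with a 5-byte payload encodes to the two bytes 0x81 0x05.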

Opcode Values:

public enum FrameOpcode {
    CONTINUATION(0x0),      // Continuation frame
    TEXT(0x1),              // Text frame
    BINARY(0x2),            // Binary frame
    CLOSE(0x8),             // Close frame
    PING(0x9),              // Ping frame (keep-alive)
    PONG(0xA);              // Pong frame (ping response)

    private final int code;

    FrameOpcode(int code) {
        this.code = code;
    }

    public int getCode() {
        return code;
    }
}

Control Frames:

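  • Close frames (0x8): Start or acknowledge the closing handshake; the body may carry a status code and reason
  • Ping frames (0x9): Liveness probes; either endpoint may send them
  • Pong frames (0xA): Replies to a ping, echoing its payload; an unsolicited pong can serve as a one-way heartbeat

All control frames carry at most 125 bytes of payload and must not be fragmented, although they may be sent between the fragments of a data message.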
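A receiver that works at the frame level can enforce the control-frame rules as soon as it has read the first two header bytes. The following is a minimal sketch. FrameHeaderDecoder, Header, and validate are hypothetical names, and the nested record requires Java 16 or later.

```java
// Hypothetical decoder for the first two header bytes, enforcing RFC 6455, section 5.5:
// control frames (opcode 0x8 and above) must have FIN = 1 and a payload of at most 125 bytes.
final class FrameHeaderDecoder {
    record Header(boolean fin, int rsv, int opcode, boolean masked, int lengthCode) {
        boolean isControl() { return (opcode & 0x8) != 0; }
    }

    private FrameHeaderDecoder() {}

    static Header decode(byte b0, byte b1) {
        return new Header(
            (b0 & 0x80) != 0,   // FIN
            (b0 >>> 4) & 0x7,   // RSV1-RSV3
            b0 & 0x0F,          // opcode
            (b1 & 0x80) != 0,   // MASK
            b1 & 0x7F);         // 7-bit length (126/127 mean an extended length follows)
    }

    /** Returns null if acceptable, otherwise a reason to fail the connection. */
    static String validate(Header h, boolean fromClient) {
        if (h.rsv() != 0) return "RSV bits set without a negotiated extension";
        if (fromClient != h.masked()) return fromClient ? "client frame not masked" : "server frame masked";
        if (h.isControl()) {
            if (!h.fin()) return "fragmented control frame";
            if (h.lengthCode() > 125) return "control frame payload > 125 bytes";
            if (h.opcode() > 0xA) return "reserved control opcode";
        } else if (h.opcode() > 0x2) {
            return "reserved data opcode";
        }
        return null;
    }
}
```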

Text vs Binary Frames

WebSocket frames can carry text or binary data. The first frame's opcode determines the message type.

Text Frames: Text frames (opcode 0x1) contain UTF-8 encoded text. Multiple frames can form a single message using continuation frames.

public class TextFrameHandler {
    // Accumulates the parts of one text message until the final part arrives
    private StringBuilder messageBuffer = new StringBuilder();

    public void onText(CharSequence data, boolean isLast) {
        messageBuffer.append(data);

        if (isLast) {
            String completeMessage = messageBuffer.toString();
            processMessage(completeMessage);
            messageBuffer = new StringBuilder();
        }
    }

    private void processMessage(String message) {
        // Process complete text message
        System.out.println("Received: " + message);
    }
}
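java.net.http hands the listener already-decoded characters. A receiver that processes raw frames must do the UTF-8 decoding itself, and it cannot decode each frame on its own, because a multi-byte character may be split across two frames. The following is a minimal sketch that uses CharsetDecoder in REPORT mode so that invalid UTF-8 is rejected rather than silently replaced. Utf8MessageAssembler and onFrame are hypothetical names.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CoderResult;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical frame-level assembler: decodes UTF-8 incrementally so a character split
// across two frames is handled, and invalid UTF-8 is rejected instead of replaced.
final class Utf8MessageAssembler {
    private final CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
        .onMalformedInput(CodingErrorAction.REPORT)
        .onUnmappableCharacter(CodingErrorAction.REPORT);
    private final StringBuilder text = new StringBuilder();
    private ByteBuffer pending = ByteBuffer.allocate(0); // incomplete character from the previous frame

    /** Feeds one frame payload; returns the whole message when isLast is true, else null. */
    String onFrame(byte[] payload, boolean isLast) {
        ByteBuffer in = ByteBuffer.allocate(pending.remaining() + payload.length);
        in.put(pending);
        in.put(payload);
        in.flip();

        // UTF-8 never yields more chars than input bytes, so this buffer cannot overflow
        CharBuffer out = CharBuffer.allocate(in.remaining());
        check(decoder.decode(in, out, isLast));
        if (isLast) {
            check(decoder.flush(out));
        }
        out.flip();
        text.append(out);
        pending = in.slice(); // leftover bytes of a character continuing in the next frame

        if (!isLast) {
            return null;
        }
        String message = text.toString();
        text.setLength(0);
        decoder.reset();
        return message;
    }

    private static void check(CoderResult result) {
        if (result.isError()) {
            // A real endpoint would fail the connection with close code 1007
            throw new IllegalArgumentException("invalid UTF-8 in text message");
        }
    }
}
```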

Binary Frames: Binary frames (opcode 0x2) contain raw binary data without encoding constraints. Useful for efficient data transfer (images, compressed content, custom serialization).

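import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

public class BinaryFrameHandler {
    private final List<ByteBuffer> messageFrames = new ArrayList<>();

    public void onBinary(ByteBuffer data, boolean isLast) {
        // Copy the data: java.net.http may reclaim the buffer once the listener's
        // CompletionStage completes, so keeping a reference to it is unsafe
        ByteBuffer copy = ByteBuffer.allocate(data.remaining());
        copy.put(data).flip();
        messageFrames.add(copy);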

        if (isLast) {
            ByteBuffer completeMessage = assembleMessage();
            processBinaryData(completeMessage);
            messageFrames.clear();
        }
    }

    private ByteBuffer assembleMessage() {
        int totalSize = messageFrames.stream()
            .mapToInt(ByteBuffer::remaining)
            .sum();

        ByteBuffer combined = ByteBuffer.allocate(totalSize);
        messageFrames.forEach(combined::put);
        combined.flip();

        return combined;
    }

    private void processBinaryData(ByteBuffer data) {
        // Process complete binary message
        byte[] bytes = new byte[data.remaining()];
        data.get(bytes);
        System.out.println("Received binary: " + bytes.length + " bytes");
    }
}

Fragmentation and Continuation

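A large message can be split into several frames. The first frame carries the TEXT or BINARY opcode with FIN = 0, each following frame uses the continuation opcode (0x0), and the last continuation frame sets FIN = 1. Control frames such as ping may be interleaved between fragments. In java.net.http, the isLast flag marks the final part of a message; the parts delivered to a listener are not guaranteed to line up with frames on the wire.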

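import java.net.http.WebSocket;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

public class FragmentationExample implements WebSocket.Listener {
    private static final int CHUNK_CHARS = 1000;

    // Sends a message in parts of up to 1000 characters. java.net.http allows only one
    // outstanding sendText/sendBinary, so each part is chained onto the previous send.
    public CompletableFuture<WebSocket> sendLargeMessage(WebSocket webSocket, String message) {
        CompletableFuture<WebSocket> chain = CompletableFuture.completedFuture(webSocket);
        int start = 0;
        do {
            int end = Math.min(start + CHUNK_CHARS, message.length());
            // Do not split a surrogate pair across two parts
            if (end < message.length() && Character.isHighSurrogate(message.charAt(end - 1))) {
                end--;
            }
            String chunk = message.substring(start, end);
            boolean isLast = end == message.length();
            chain = chain.thenCompose(ws -> ws.sendText(chunk, isLast));
            start = end;
        } while (start < message.length());
        return chain;
    }

    // Receiving side: accumulate parts until the one marked last arrives
    private StringBuilder incomingMessage = new StringBuilder();

    @Override
    public CompletionStage<?> onText(WebSocket webSocket, CharSequence data, boolean isLast) {
        incomingMessage.append(data);
        webSocket.request(1); // ask for the next part or message

        if (isLast) {
            String complete = incomingMessage.toString();
            incomingMessage = new StringBuilder();
            return processMessage(complete);
        }
        return CompletableFuture.completedStage(null);
    }

    private CompletionStage<?> processMessage(String message) {
        System.out.println("Received complete message: " + message.length() + " chars");
        return CompletableFuture.completedStage(null);
    }
}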
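At the frame level, the same rules become a small state machine: at most one data message can be in progress, continuation frames are only valid inside one, and control frames pass through without disturbing it. The following is a minimal sketch. MessageReassembler and Message are hypothetical names, and the record requires Java 16 or later.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical frame-level reassembler for RFC 6455, section 5.4: a message starts with a
// TEXT (0x1) or BINARY (0x2) frame, continues with 0x0 frames and ends at FIN = 1; control
// frames may arrive between fragments without interrupting the message.
final class MessageReassembler {
    record Message(int opcode, byte[] payload) {}

    private final ByteArrayOutputStream buffer = new ByteArrayOutputStream();
    private int messageOpcode = -1; // -1: no message in progress

    /** Returns the completed message, or null if more frames are needed or this was a control frame. */
    Message onFrame(boolean fin, int opcode, byte[] payload) {
        if (opcode >= 0x8) {
            return null; // control frame: handled separately, reassembly state untouched
        }
        if (opcode == 0x0) {
            if (messageOpcode == -1) {
                throw new IllegalStateException("continuation frame without a start frame");
            }
        } else if (opcode == 0x1 || opcode == 0x2) {
            if (messageOpcode != -1) {
                throw new IllegalStateException("new message started before the previous one ended");
            }
            messageOpcode = opcode;
        } else {
            throw new IllegalStateException("reserved opcode " + opcode);
        }

        buffer.write(payload, 0, payload.length);
        if (!fin) {
            return null;
        }
        Message message = new Message(messageOpcode, buffer.toByteArray());
        buffer.reset();
        messageOpcode = -1;
        return message;
    }
}
```

A production receiver would also cap the buffered size and fail the connection with 1009 when a message grows too large.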

Connection Lifecycle

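A WebSocket connection moves through four states. java.net.http.WebSocket does not expose a state property (it offers only isInputClosed() and isOutputClosed()), so a listener that needs the full state tracks it itself: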

States:

  1. CONNECTING: Initial upgrade handshake in progress
  2. OPEN: Upgrade completed, bidirectional communication active
  3. CLOSING: Close frame sent/received, cleanup in progress
  4. CLOSED: Connection terminated

import java.net.http.WebSocket;
import java.nio.ByteBuffer;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

public class ConnectionLifecycleListener implements WebSocket.Listener {
    private enum State {
        CONNECTING, OPEN, CLOSING, CLOSED
    }

    // volatile: close() may be called from a thread other than the listener's
    private volatile State state = State.CONNECTING;
    private long connectedTime;

    @Override
    public void onOpen(WebSocket webSocket) {
        state = State.OPEN;
        connectedTime = System.currentTimeMillis();
        System.out.println("WebSocket OPEN");
        webSocket.request(1); // overriding onOpen replaces the default request(1)
    }

    @Override
    public CompletionStage<?> onText(WebSocket webSocket, CharSequence data, boolean last) {
        if (state == State.OPEN) {
            System.out.println("Message: " + data);
        }
        webSocket.request(1); // keep requesting, even while closing
        return CompletableFuture.completedStage(null);
    }

    @Override
    public CompletionStage<?> onBinary(WebSocket webSocket, ByteBuffer data, boolean last) {
        if (state == State.OPEN) {
            System.out.println("Binary data: " + data.remaining() + " bytes");
        }
        webSocket.request(1);
        return CompletableFuture.completedStage(null);
    }

    /** Starts the closing handshake from this side: OPEN -> CLOSING. */
    public CompletableFuture<WebSocket> close(WebSocket webSocket) {
        state = State.CLOSING;
        return webSocket.sendClose(WebSocket.NORMAL_CLOSURE, "Done");
    }

    @Override
    public CompletionStage<?> onClose(WebSocket webSocket, int statusCode, String reason) {
        // Either the peer acknowledged our close() or it started the handshake; in the
        // latter case the JDK sends its own Close once the returned stage completes
        String who = (state == State.CLOSING) ? "Close acknowledged: " : "Closed by peer: ";
        System.out.println(who + statusCode + " " + reason);
        state = State.CLOSED;
        return CompletableFuture.completedStage(null);
    }

    @Override
    public void onError(WebSocket webSocket, Throwable error) {
        System.err.println("WebSocket error: " + error.getMessage());
        state = State.CLOSED;
    }
}
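The allowed moves between the four states can be written down explicitly, which makes illegal transitions (such as reopening a closed connection) fail loudly. The following is a minimal sketch. ConnState and StateTracker are hypothetical names and are not part of java.net.http.

```java
// Hypothetical transition rules for the four states listed above.
enum ConnState {
    CONNECTING, OPEN, CLOSING, CLOSED;

    boolean canMoveTo(ConnState next) {
        switch (this) {
            case CONNECTING: return next == OPEN || next == CLOSED;    // handshake succeeds or fails
            case OPEN:       return next == CLOSING || next == CLOSED; // close starts, or transport dies
            case CLOSING:    return next == CLOSED;                    // close handshake finishes
            default:         return false;                             // CLOSED is terminal
        }
    }
}

// Hypothetical holder that rejects illegal transitions.
final class StateTracker {
    private volatile ConnState state = ConnState.CONNECTING;

    synchronized void moveTo(ConnState next) {
        if (!state.canMoveTo(next)) {
            throw new IllegalStateException(state + " -> " + next);
        }
        state = next;
    }

    ConnState state() {
        return state;
    }
}
```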

Keep-Alive with Ping/Pong

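Either endpoint may send a ping, and the receiver must answer with a pong that echoes the ping's payload. The JDK client sends that pong automatically, so a listener does not need to call sendPong. To detect a dead peer, the client sends its own pings and gives up if no pong arrives in time.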

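import java.net.http.WebSocket;
import java.nio.ByteBuffer;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

public class PingPongHandler implements WebSocket.Listener {
    private final ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
    // Written by the listener thread, read by the scheduler thread
    private volatile long lastPongTime = System.currentTimeMillis();

    @Override
    public void onOpen(WebSocket webSocket) {
        // Send a ping every 30 seconds; give up if no pong arrived for 60 seconds
        scheduler.scheduleAtFixedRate(() -> {
            long timeSinceLastPong = System.currentTimeMillis() - lastPongTime;
            if (timeSinceLastPong > 60_000) {
                System.out.println("Pong timeout detected, aborting connection");
                webSocket.abort(); // peer is unresponsive, so a close handshake would likely stall
                close();
            } else {
                webSocket.sendPing(ByteBuffer.allocate(0));
            }
        }, 30, 30, TimeUnit.SECONDS);

        webSocket.request(1);
    }

    @Override
    public CompletionStage<?> onPing(WebSocket webSocket, ByteBuffer message) {
        // The JDK answers the ping with a pong automatically; just request more input
        webSocket.request(1);
        return CompletableFuture.completedStage(null);
    }

    @Override
    public CompletionStage<?> onPong(WebSocket webSocket, ByteBuffer message) {
        // Track pong reception for timeout detection
        lastPongTime = System.currentTimeMillis();
        System.out.println("Received pong");
        webSocket.request(1);
        return CompletableFuture.completedStage(null);
    }

    @Override
    public CompletionStage<?> onClose(WebSocket webSocket, int statusCode, String reason) {
        close();
        return CompletableFuture.completedStage(null);
    }

    @Override
    public void onError(WebSocket webSocket, Throwable error) {
        close();
    }

    public void close() {
        scheduler.shutdown();
    }
}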
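Because a pong must carry the same application data as the ping it answers (RFC 6455, section 5.5.3), the sender can put a timestamp in the ping and compute the round-trip time when the pong returns. The following is a minimal sketch of the payload handling only. PingTimestamps and its methods are hypothetical names.

```java
import java.nio.ByteBuffer;

// Hypothetical helper: embeds System.nanoTime() in a ping payload and reads it back from the pong.
final class PingTimestamps {
    private PingTimestamps() {}

    /** 8-byte payload holding the send time; well under the 125-byte control-frame limit. */
    static ByteBuffer newPingPayload(long nowNanos) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
        buf.putLong(nowNanos);
        buf.flip();
        return buf;
    }

    /** Round-trip time in nanoseconds, or -1 if the pong does not carry one of our timestamps. */
    static long roundTripNanos(ByteBuffer pongPayload, long nowNanos) {
        if (pongPayload.remaining() != Long.BYTES) {
            return -1;
        }
        long sentAt = pongPayload.getLong(pongPayload.position()); // absolute read, buffer unchanged
        return nowNanos - sentAt;
    }
}
```

In a listener, the scheduled task would call webSocket.sendPing(PingTimestamps.newPingPayload(System.nanoTime())), and onPong would pass the received buffer to roundTripNanos before returning. The length check matters because a peer may also send unsolicited pongs with arbitrary payloads.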

Close Frames and Status Codes

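A Close frame body is optional. When present, it starts with a 2-byte status code in network byte order, optionally followed by a UTF-8 reason. Because control frames are limited to 125 bytes, the reason can be at most 123 bytes long.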
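The body layout can be encoded and decoded as in the following minimal sketch. ClosePayload and Decoded are hypothetical names, and the record requires Java 16 or later. For brevity, decode uses new String, which replaces malformed UTF-8 instead of rejecting it. A strict endpoint would reject it and fail the connection with 1007.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical codec for a Close frame body: optional 2-byte big-endian status code,
// then an optional UTF-8 reason; the whole body must fit in 125 bytes.
final class ClosePayload {
    record Decoded(int statusCode, String reason) {}

    private ClosePayload() {}

    static byte[] encode(int statusCode, String reason) {
        byte[] reasonBytes = reason.getBytes(StandardCharsets.UTF_8);
        if (reasonBytes.length > 123) {
            throw new IllegalArgumentException("reason exceeds 123 UTF-8 bytes");
        }
        return ByteBuffer.allocate(2 + reasonBytes.length) // ByteBuffer is big-endian by default
            .putShort((short) statusCode)
            .put(reasonBytes)
            .array();
    }

    /** An empty body means "no status code"; receivers then report 1005 locally. */
    static Decoded decode(byte[] body) {
        if (body.length == 0) {
            return new Decoded(1005, "");
        }
        if (body.length == 1) {
            throw new IllegalArgumentException("a 1-byte close body is invalid");
        }
        int code = ((body[0] & 0xFF) << 8) | (body[1] & 0xFF);
        String reason = new String(body, 2, body.length - 2, StandardCharsets.UTF_8);
        return new Decoded(code, reason);
    }
}
```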

Standard Status Codes:

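public class WebSocketCloseCodes {
    // 1000-1003: defined in RFC 6455, section 7.4.1
    public static final int NORMAL_CLOSURE = 1000;           // Purpose of the connection fulfilled
    public static final int GOING_AWAY = 1001;               // Server going down, or browser leaving the page
    public static final int PROTOCOL_ERROR = 1002;           // Protocol error
    public static final int UNSUPPORTED_DATA = 1003;         // Data type the endpoint cannot accept

    // 1004 is reserved; 1005 and 1006 are reported locally and never sent in a Close frame
    public static final int NO_STATUS_RECEIVED = 1005;       // Close frame carried no status code
    public static final int ABNORMAL_CLOSURE = 1006;         // Connection dropped without a Close frame

    // 1007-1011: defined in RFC 6455, section 7.4.1
    public static final int INVALID_FRAME_PAYLOAD = 1007;    // Data inconsistent with its type (e.g. bad UTF-8)
    public static final int POLICY_VIOLATION = 1008;         // Generic policy violation
    public static final int MESSAGE_TOO_BIG = 1009;          // Message too large to process
    public static final int MISSING_EXTENSION = 1010;        // Sent by a client: required extension not negotiated
    public static final int INTERNAL_SERVER_ERROR = 1011;    // Unexpected condition prevented fulfilling the request

    // 1012-1014: registered later in the IANA WebSocket close code registry
    public static final int SERVICE_RESTART = 1012;          // Service restart
    public static final int TRY_AGAIN_LATER = 1013;          // Try again later
    public static final int BAD_GATEWAY = 1014;              // Gateway received an invalid response

    // 1015 is reported locally on TLS handshake failure and never sent in a Close frame
    public static final int TLS_HANDSHAKE_ERROR = 1015;
}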
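Not every code in the table may be put on the wire. The following is a minimal sketch of that rule, based on RFC 6455, section 7.4, plus the registered 1012 to 1014 codes. CloseCodePolicy and maySend are hypothetical names. java.net.http's sendClose applies its own, narrower validation, so its Javadoc is the authority for that API.

```java
// Hypothetical check of whether a status code may appear in a Close frame on the wire.
final class CloseCodePolicy {
    private CloseCodePolicy() {}

    static boolean maySend(int code) {
        if (code >= 3000 && code <= 4999) {
            return true; // 3000-3999: registered for libraries/frameworks; 4000-4999: private use
        }
        switch (code) {
            case 1000: case 1001: case 1002: case 1003:
            case 1007: case 1008: case 1009: case 1010: case 1011:
            case 1012: case 1013: case 1014:
                return true;
            default:
                return false; // includes reserved 1004 and local-only 1005, 1006, 1015
        }
    }
}
```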

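import java.net.http.WebSocket;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

public class GracefulCloseHandler {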
    private WebSocket webSocket;

    public void closeGracefully() {
        // Send close frame with normal closure code
        webSocket.sendClose(
            WebSocketCloseCodes.NORMAL_CLOSURE,
            "Closing connection"
        );
    }

    public CompletionStage<?> handleRemoteClose(
            WebSocket webSocket, int statusCode, String reason) {
        System.out.println("Remote close: " + statusCode + " - " + reason);

        if (statusCode == WebSocketCloseCodes.NORMAL_CLOSURE) {
            System.out.println("Normal closure");
        } else if (statusCode >= 4000 && statusCode <= 4999) {
            System.out.println("Application-defined error");
        }

        // Returning a completed stage lets the JDK send its Close reply, if it has not sent one already
        return CompletableFuture.completedStage(null);
    }
}

Masking for Security

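Every client-to-server frame must be masked, and servers must not mask their frames. Masking prevents script running in a browser from choosing the exact bytes that appear on the wire, which blocks cache-poisoning attacks against intermediaries that misinterpret WebSocket traffic as HTTP. The client picks a fresh, unpredictable 32-bit masking key for every frame. Masking is not encryption: the key travels in the frame header.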

import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;

public class MaskingExample {
    private static final SecureRandom RANDOM = new SecureRandom();

    /**
     * Apply the XOR mask to a frame payload. Masking and unmasking are the same
     * operation: byte i is XORed with maskKey[i % 4].
     */
    public static byte[] applyMask(byte[] payload, byte[] maskKey) {
        byte[] masked = new byte[payload.length];
        for (int i = 0; i < payload.length; i++) {
            masked[i] = (byte) (payload[i] ^ maskKey[i % 4]);
        }
        return masked;
    }

    /**
     * Generate a fresh, unpredictable 32-bit masking key (one per frame)
     */
    public static byte[] generateMaskKey() {
        byte[] maskKey = new byte[4];
        RANDOM.nextBytes(maskKey);
        return maskKey;
    }

    public static void main(String[] args) {
        String message = "Hello, WebSocket!";
        byte[] payload = message.getBytes(StandardCharsets.UTF_8);
        byte[] maskKey = generateMaskKey();
        byte[] masked = applyMask(payload, maskKey);
        byte[] unmasked = applyMask(masked, maskKey); // Applying the mask twice restores the original

        System.out.println("Original: " + message);
        System.out.println("Masked length: " + masked.length);
        System.out.println("Unmasked: " + new String(unmasked, StandardCharsets.UTF_8));
    }
}
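When a frame's payload is read or written in several buffers, the key index must continue from where the previous buffer stopped, or every chunk after the first is unmasked with the wrong key bytes. The following is a minimal in-place sketch that tracks the running offset. StreamingMasker and apply are hypothetical names.

```java
import java.nio.ByteBuffer;

// Hypothetical in-place masker for one frame whose payload arrives in several buffers:
// byte number n of the payload is XORed with maskKey[n % 4], across chunk boundaries.
final class StreamingMasker {
    private final byte[] maskKey;
    private long offset; // payload bytes already processed in this frame

    StreamingMasker(byte[] maskKey) {
        if (maskKey.length != 4) {
            throw new IllegalArgumentException("mask key must be 4 bytes");
        }
        this.maskKey = maskKey.clone();
    }

    /** XORs the buffer's remaining bytes in place; position and limit are left unchanged. */
    void apply(ByteBuffer chunk) {
        for (int i = chunk.position(); i < chunk.limit(); i++) {
            chunk.put(i, (byte) (chunk.get(i) ^ maskKey[(int) (offset++ & 3)]));
        }
    }
}
```

A new StreamingMasker is needed for each frame, since every frame has its own key and the offset restarts at zero.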

HTTP/2 WebSocket Support

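WebSocket over HTTP/2 (RFC 8441) runs a WebSocket session on a single HTTP/2 stream, opened with an extended CONNECT request (:protocol set to websocket) after the server advertises SETTINGS_ENABLE_CONNECT_PROTOCOL. Many WebSocket sessions and ordinary requests can then share one TCP connection, while the URI schemes stay ws: and wss:. Java's built-in client does not use this mechanism: it forces HTTP/1.1 for the opening handshake. Configuring an HttpClient for HTTP/2 therefore affects only that client's ordinary requests.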

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.WebSocket;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionStage;

public class Http2WebSocketClient {
    private final HttpClient httpClient;

    public Http2WebSocketClient() {
        // Prefer HTTP/2 for ordinary requests sent through this client. WebSocket
        // handshakes from the same client are still made over HTTP/1.1.
        this.httpClient = HttpClient.newBuilder()
            .version(HttpClient.Version.HTTP_2)
            .build();
    }

    public CompletableFuture<WebSocket> connect(String uri) {
        // uri keeps its ws: or wss: scheme; the JDK still sends an HTTP/1.1 Upgrade request
        return httpClient.newWebSocketBuilder()
            .buildAsync(
                URI.create(uri),
                new WebSocketListener()
            );
    }

    private static class WebSocketListener implements WebSocket.Listener {
        @Override
        public void onOpen(WebSocket webSocket) {
            System.out.println("HTTP/2 WebSocket connected");
            webSocket.request(1);
        }

        @Override
        public CompletionStage<?> onText(WebSocket webSocket, CharSequence data, boolean last) {
            System.out.println("Message: " + data);
            webSocket.request(1);
            return CompletableFuture.completedStage(null);
        }

        @Override
        public void onError(WebSocket webSocket, Throwable error) {
            error.printStackTrace();
        }
    }
}

Summary

The WebSocket protocol provides efficient bidirectional communication over a persistent TCP connection initiated via HTTP upgrade. Key concepts include:

  • Protocol Versioning: A single wire version (13), bootstrapped over HTTP/1.1 (RFC 6455) or HTTP/2 (RFC 8441)
  • Upgrade Handshake: HTTP upgrade with header-based key derivation
  • Frame Types: Text, binary, and control frames with continuation support
  • Lifecycle Management: CONNECTING → OPEN → CLOSING → CLOSED states
  • Keep-Alive: Ping/pong frames for connection health monitoring
  • Security: XOR masking for client-to-server frames
  • Status Codes: Standardized close codes for graceful termination

Understanding these fundamentals is essential for implementing robust WebSocket clients in Java.