Wait Free Coordination -Zookeeper(Part 2)
Overview of ZooKeeper Service and Client Interaction
ZooKeeper Service
ZooKeeper is a distributed coordination service that provides a hierarchical namespace for storing data nodes (znodes). It helps manage synchronization and configuration data for distributed applications.
Key Terminology
- Client: A user or application that interacts with the ZooKeeper service.
- Server: A process that provides the ZooKeeper service. Multiple ZooKeeper servers form an ensemble.
- Znode: An in-memory data node in the ZooKeeper’s hierarchical namespace (data tree).
- Data Tree: The hierarchical namespace in which znodes are organized.
- Update/Write: Any operation that modifies the state of the data tree (e.g., creating, deleting, or updating znodes).
- Session: A connection established by a client with the ZooKeeper service. Each session has a unique session handle.
Interaction Process
1. Establishing a Connection
- Client Library: Clients use a ZooKeeper client library to interact with the ZooKeeper service.
- Session Creation: When a client wants to connect to ZooKeeper, it establishes a session. This involves connecting to one of the ZooKeeper servers and obtaining a session handle.
2. Submitting Requests
- Client API: The client library provides an API for clients to submit requests to ZooKeeper. These requests can be for various operations like creating, reading, updating, or deleting znodes.
- Network Management: The client library also manages the network connections between the client and the ZooKeeper servers, handling failover and reconnection if needed.
Example Scenario
Let’s walk through an example where a distributed application uses ZooKeeper to manage leader election and configuration updates.
Step-by-Step Example
1. Initial Setup
- ZooKeeper Ensemble: Assume we have a ZooKeeper ensemble running with three servers: ZK Server 1, ZK Server 2, and ZK Server 3.
- Application Servers: There are three application servers: App Server A, App Server B, and App Server C.
2. Clients Establish Connections
- App Server A: Uses the ZooKeeper client library to connect to the ensemble and establish a session.
ZooKeeper zkA = new ZooKeeper("zk-server-1:2181,zk-server-2:2181,zk-server-3:2181", sessionTimeout, watcher);
- Here,
zkA
represents the session handle for App Server A. - App Server B: Similarly establishes a session.
ZooKeeper zkB = new ZooKeeper("zk-server-1:2181,zk-server-2:2181,zk-server-3:2181", sessionTimeout, watcher)
3. Leader Election
- Creating Election Nodes:
- App Server A attempts to create an ephemeral sequential node in the
/election
directory.
zkA.create("/election/leader-", new byte[0], Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
- This might create
/election/leader-0001
. - App Server B does the same, resulting in
/election/leader-0002
.
4. Determining the Leader
- Listing Nodes:
- Both servers list the children of the
/election
node.
List<String> children = zkA.getChildren("/election", false);
Collections.sort(children);
String leader = children.get(0); // leader-000
- App Server A becomes the leader because it created the node with the lowest sequence number.
5. Configuration Update
- Leader Updates Configuration:
- App Server A, now the leader, updates the configuration stored in a znode.
zkA.setData("/config", "configVersion=2.0".getBytes(), -1)
- Watches and Notifications:
- App Servers B and C have set watches on the
/config
znode to get notified of changes.
byte[] configData = zkB.getData("/config", true, null);
- ZooKeeper notifies App Servers B and C:
- When the configuration is updated by the leader, ZooKeeper sends a notification to App Servers B and C.
- They retrieve the new configuration and update their local state.
6. Handling Network and Failover
- Client Library: The ZooKeeper client library manages network connections and ensures that if a connection to one ZooKeeper server is lost, the client reconnects to another server in the ensemble automatically.
Types of Znodes in ZooKeeper
- ZooKeeper allows clients to create two types of znodes: regular znodes and ephemeral znodes. Additionally, clients can create sequential znodes by setting a sequential flag during creation. Let’s break down these concepts with explanations and examples.
1. Regular Znodes
Description: Regular znodes are persistent. They remain in the ZooKeeper ensemble until explicitly deleted by the client.
- Example:
// Creating a regular znode zk.create("/app1/config", "some_configuration_data".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT);
2. Ephemeral Znodes
- Description: Ephemeral znodes exist only for the duration of the client’s session. If the session ends (either deliberately or due to a failure), the ephemeral znodes are automatically deleted by the system.
- Example:
// Creating an ephemeral znode zk.create("/app1/session", "temporary_data".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL);
3. Sequential Znodes
- Description: When a sequential flag is set during the creation of a znode, ZooKeeper appends a monotonically increasing counter to the znode’s name. This ensures unique names under the same parent znode.
- Example:
// Creating a sequential znode zk.create("/app1/task-", "task_data".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.PERSISTENT_SEQUENTIAL); // The created znode could be /app1/task-0000000001, /app1/task-0000000002, etc.
Watch Mechanism in ZooKeeper
Watches allow clients to receive notifications of changes to znodes without the need for continuous polling. This mechanism is crucial for maintaining synchronization and state updates efficiently.
How Watches Work
- Setting a Watch: When a client performs a read operation with the watch flag set, the server registers a watch for that znode.
- Triggering a Watch: When the watched znode is modified (e.g., data change, creation, or deletion), the server sends a notification to the client.
- One-time Trigger: Watches are one-time triggers. Once a watch is triggered or the client session ends, the watch is unregistered.
- Session Events: Session-related events, such as connection loss, are also sent to watch callbacks to inform the client that watch events might be delayed.
Example of Using Watches
- Setting a Watch:
// Setting a watch on the znode /foo zk.getData("/foo", true, null);
- Handling Watch Events:
// Implementing a watch event handler Watcher watcher = new Watcher() {
@Override public void process(WatchedEvent event) {
if (event.getType() == EventType.NodeDataChanged) {
System.out.println("/foo has been changed");
}
}
}; // Setting the watch with the custom watcher zk.getData("/foo", watcher, null);
Scenario Example: Leader Election
Imagine a scenario where znodes are used for leader election among distributed servers. Each server creates an ephemeral sequential znode under a common parent znode /election
. The server that creates the znode with the lowest sequence number becomes the leader.
- Server A and Server B Start:
// Server A creates an ephemeral sequential znode
String pathA = zk.create("/election/leader-", "A".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
// Example: pathA = /election/leader-0000000001
// Server B creates an ephemeral sequential znode
String pathB = zk.create("/election/leader-", "B".getBytes(), Ids.OPEN_ACL_UNSAFE, CreateMode.EPHEMERAL_SEQUENTIAL);
// Example: pathB = /election/leader-0000000002
Determining the Leader:
- Server A becomes the leader because its znode
/election/leader-0000000001
has the lowest sequence number. - If Server A fails, its ephemeral znode is deleted, and Server B (with the next lowest sequence number) becomes the leader.
ZooKeeper Data Model Explained with Examples
The ZooKeeper data model is designed to facilitate coordination tasks in distributed systems, using a structure that resembles a simplified file system. Here’s a detailed explanation:
Hierarchical Namespace
- Structure: ZooKeeper uses a hierarchical namespace similar to a file system.
- Znodes: These are the fundamental units, analogous to files and directories in a file system, but are not designed for general data storage.
Simplified API
- Operations: The API supports basic operations such as create, delete, read, and write on znodes.
- Full Reads and Writes: ZooKeeper does not support partial reads or writes. Instead, operations involve full data reads and writes.
Key/Value Table with Hierarchical Keys
- Hierarchical Keys: Each znode has a unique path in the hierarchy, functioning as a key.
- Example Paths:
/app1/config
for configuration data of Application 1./app2/status
for status data of Application 2.
Subtrees for Different Applications
- Segmentation: Different parts of the namespace (subtrees) can be allocated to different applications.
- Access Control: Subtrees can have specific access rights.
- Example:
/app1
subtree for Application 1./app2
subtree for Application 2.
Znode Characteristics
- Ephemeral and Regular Znodes: As discussed earlier, znodes can be ephemeral or regular.
- Metadata: Znodes store metadata like timestamps and version counters, useful for tracking changes and implementing conditional updates.
Use Case Example: Group Membership Protocol
- Scenario: Implementing a group membership protocol for Application 1
- Process: Each client process pip_ipi creates a znode pip_ipi under
/app1
. The znode persists as long as the process is running. - Example:
- Process 1 creates
/app1/p_1
. - Process 2 creates
/app1/p_2
.
Storing Metadata for Coordination
- Example: In a leader-based application, the current leader can write its identity to a known znode. New servers can read this znode to find out who the leader is.
- Leader Election:
- Leader writes its identity to
/app1/leader
. - New server reads
/app1/leader
to find the current leader.
Conditional Updates with Metadata
- Version Counters: Znodes have version numbers, allowing clients to perform updates conditionally.
- Example: Update a znode only if its version matches a specific value, ensuring no other client has modified it concurrently.
Sessions in ZooKeeper
Sessions in ZooKeeper are essential for managing client connections and ensuring consistent interaction with the ZooKeeper ensemble. Here’s an explanation of sessions with an example:
What is a Session?
- Initialization: A client connects to a ZooKeeper server and initiates a session.
- Timeout: Each session has an associated timeout period. If the server does not receive any communication from the client within this timeout period, it considers the client faulty.
- End of Session: A session ends either when the client explicitly closes it or when ZooKeeper detects that the client is faulty (e.g., due to network issues or a crash).
Session Characteristics
- State Changes: During a session, the client experiences a sequence of state changes reflecting the execution of its operations.
- Session Persistence: A session allows the client to move seamlessly between different servers in the ZooKeeper ensemble without losing the session state. This persistence ensures continuous and consistent interaction with ZooKeeper, even if the client switches servers.
Example Scenario: Maintaining a Distributed Lock
- Client Initiation:
- Client A connects to ZooKeeper and initiates a session.
- ZooKeeper assigns a unique session ID to Client A and starts the session with a timeout of, say, 10 seconds.
2. Operation Execution:
- Client A creates an ephemeral znode
/lock/app1
to acquire a lock for Application 1. The znode exists as long as the session is active. - Client A performs its operations while holding the lock.
3. Session Timeout:
- If Client A does not send any heartbeat (a periodic signal indicating the client is still active) or any request within 10 seconds, ZooKeeper considers the client faulty.
- ZooKeeper automatically deletes the ephemeral znode
/lock/app1
, releasing the lock for other clients to acquire.
4. Session Failover:
- If Client A needs to switch to another ZooKeeper server (due to a server failure or network issues), it can do so without losing the session.
- Client A reconnects to another server in the ensemble, and the session continues as if nothing happened.
5. Session Closure:
- If Client A explicitly closes the session after completing its tasks, ZooKeeper ends the session and deletes the ephemeral znode
/lock/app1
.
Visual Representation
Here’s a simplified diagram to visualize this process:
Client A initiates session with Server 1:
Client A -------> Server 1
(Session ID: 12345, Timeout: 10s)
Client A acquires lock by creating ephemeral znode:
/lock/app1
Client A performs operations...
Client A sends periodic heartbeats:
Client A -----> Server 1
Client A needs to switch to Server 2:
Client A -------> Server 2
(Session ID: 12345 persists, operations continue)
Client A completes operations and closes session:
Client A -------> Server 2
Session ends, /lock/app1 is deleted
In this example:
- The session ID persists across different servers, allowing seamless failover.
- The ephemeral znode
/lock/app1
exists as long as the session is active and is automatically cleaned up if the session ends due to a timeout or explicit closure.
Benefits
- Fault Tolerance: Ensures that the system remains robust even if individual servers or clients fail.
- Consistency: Maintains a consistent state across operations within a session.
- Seamless Failover: Allows clients to move between servers without losing session state, ensuring uninterrupted service.
By using sessions, ZooKeeper provides a reliable and consistent way to manage client interactions and coordination tasks in a distributed environment.
Next sections will be covered in Part 3