The placement of partitions in a distributed system is essential for its reliability and scalability. For example, when a node fails, it is important that the partitions hosted on that node are reallocated evenly among the remaining nodes. Consistent hashing is one such algorithm that can satisfy this guarantee. Helix provides a variant of consistent hashing based on the RUSH algorithm, among others.

This means that, given a number of partitions, a number of replicas, and a set of nodes, Helix automatically assigns partitions to nodes such that:

  • Each node has the same number of partitions
  • Replicas of the same partition do not reside on the same node
  • When a node fails, its partitions are redistributed evenly among the remaining nodes
  • When new nodes are added, the number of partitions moved is minimized while still satisfying the above criteria

Helix employs a rebalancing algorithm to compute the ideal state of the system. When the current state differs from the ideal state, Helix uses the ideal state as the target and computes the appropriate transitions needed to bring the current state to it.

Helix makes it easy to perform this operation, while giving you control over the algorithm. In this section, we'll see how to implement the desired behavior.

Helix has four options for rebalancing, in increasing order of customization by the system builder:

  • FULL_AUTO
  • SEMI_AUTO
  • CUSTOMIZED
  • USER_DEFINED
             |  FULL_AUTO  |  SEMI_AUTO  |  CUSTOMIZED  |  USER_DEFINED  |
    ---------|-------------|-------------|--------------|----------------|
    LOCATION |  HELIX      |  APP        |  APP         |  APP           |
    ---------|-------------|-------------|--------------|----------------|
    STATE    |  HELIX      |  HELIX      |  APP         |  APP           |
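The rebalance mode is a property of the resource's ideal state. Here is a minimal sketch of choosing it through the Java admin API, assuming a ZooKeeper-based deployment; the ZooKeeper address, cluster name, and resource name are placeholders:

import org.apache.helix.HelixAdmin;
import org.apache.helix.manager.zk.ZKHelixAdmin;
import org.apache.helix.model.IdealState;
import org.apache.helix.model.IdealState.RebalanceMode;

public class RebalanceModeExample {
  public static void main(String[] args) {
    // Placeholder ZooKeeper address and cluster/resource names
    HelixAdmin admin = new ZKHelixAdmin("localhost:2181");

    // The rebalance mode is chosen when the resource is created
    admin.addResource("MyCluster", "MyResource", 3, "LeaderStandby",
        RebalanceMode.FULL_AUTO.toString());

    // It can also be changed later by rewriting the ideal state
    IdealState idealState = admin.getResourceIdealState("MyCluster", "MyResource");
    idealState.setRebalanceMode(RebalanceMode.SEMI_AUTO);
    admin.setResourceIdealState("MyCluster", "MyResource", idealState);
  }
}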

FULL_AUTO

When the rebalance mode is set to FULL_AUTO, Helix controls both the location of each replica and its state. This option is useful for applications where creating a replica is not expensive.

For example, consider this system that uses a LeaderStandby state model, with 3 partitions and 2 replicas per partition in the ideal state.

{
  "id" : "MyResource",
  "simpleFields" : {
    "REBALANCE_MODE" : "FULL_AUTO",
    "NUM_PARTITIONS" : "3",
    "REPLICAS" : "2",
    "STATE_MODEL_DEF_REF" : "LeaderStandby"
  },
  "listFields" : {
    "MyResource_0" : [],
    "MyResource_1" : [],
    "MyResource_2" : []
  },
  "mapFields" : {
  }
}
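To arrive at an ideal state like this one, it is enough to create the resource in FULL_AUTO mode and let Helix compute the assignment. A sketch, reusing the placeholder admin handle from the example above:

// Create the resource in FULL_AUTO mode: 3 partitions, LeaderStandby state model
admin.addResource("MyCluster", "MyResource", 3, "LeaderStandby",
    RebalanceMode.FULL_AUTO.toString());

// Ask Helix to compute a balanced placement with 2 replicas per partition
admin.rebalance("MyCluster", "MyResource", 2);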

If there are 3 nodes in the cluster, then Helix will balance the leaders and standbys equally. The ideal state is therefore:

{
  "id" : "MyResource",
  "simpleFields" : {
    "NUM_PARTITIONS" : "3",
    "REPLICAS" : "2",
    "STATE_MODEL_DEF_REF" : "LeaderStandby"
  },
  "mapFields" : {
    "MyResource_0" : {
      "N1" : "LEADER",
      "N2" : "STANDBY"
    },
    "MyResource_1" : {
      "N2" : "LEADER",
      "N3" : "STANDBY"
    },
    "MyResource_2" : {
      "N3" : "LEADER",
      "N1" : "STANDBY"
    }
  }
}

Another typical example is evenly distributing a group of tasks among the currently healthy processes. For example, if there are 60 tasks and 4 nodes, Helix assigns 15 tasks to each node. When one node fails, Helix redistributes its 15 tasks to the remaining 3 nodes, resulting in a balanced 20 tasks per node. Similarly, if a node is added, Helix re-allocates 3 tasks from each of the 4 existing nodes to the 5th node, resulting in a balanced distribution of 12 tasks per node.

SEMI_AUTO

When the application needs to control the placement of the replicas, use the SEMI_AUTO rebalance mode.

Example: In the ideal state below, the partition 'MyResource_0' is constrained to be placed only on node1 or node2. The choice of state is still controlled by Helix. That means MyResource_0.LEADER could be on node1 and MyResource_0.STANDBY on node2, or vice versa, but neither replica would be placed on node3.

{
  "id" : "MyResource",
  "simpleFields" : {
    "REBALANCE_MODE" : "SEMI_AUTO",
    "NUM_PARTITIONS" : "3",
    "REPLICAS" : "2",
    "STATE_MODEL_DEF_REF" : "LeaderStandby"
  },
  "listFields" : {
    "MyResource_0" : ["node1", "node2"],
    "MyResource_1" : ["node2", "node3"],
    "MyResource_2" : ["node3", "node1"]
  },
  "mapFields" : {
  }
}
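One way to build such an ideal state programmatically is to write the preference lists into the listFields of the ideal state; in SEMI_AUTO mode, the order of each list expresses preference, with the first live node favored for the LEADER state. A sketch, again reusing the placeholder admin handle:

import java.util.Arrays;

// Pin each partition to its preferred nodes via the preference lists
IdealState idealState = admin.getResourceIdealState("MyCluster", "MyResource");
idealState.setRebalanceMode(RebalanceMode.SEMI_AUTO);
idealState.getRecord().setListField("MyResource_0", Arrays.asList("node1", "node2"));
idealState.getRecord().setListField("MyResource_1", Arrays.asList("node2", "node3"));
idealState.getRecord().setListField("MyResource_2", Arrays.asList("node3", "node1"));
admin.setResourceIdealState("MyCluster", "MyResource", idealState);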

The LeaderStandby state model requires that each partition has exactly one LEADER at all times, and the other replicas should be STANDBYs. In this simple example with 2 replicas per partition, there would be one LEADER and one STANDBY. Upon failover, a STANDBY has to assume leadership, and a new STANDBY will be brought up.

In this mode, when node1 fails, unlike in FULL_AUTO mode, the partition is not moved from node1 to node3. Instead, Helix decides to change the state of MyResource_0 on node2 from STANDBY to LEADER, based on the system constraints.

CUSTOMIZED

Helix offers a third mode called CUSTOMIZED, in which the application controls the placement and state of each replica. The application needs to implement a callback interface that Helix invokes when the cluster state changes. Within this callback, the application can recompute the ideal state. Helix will then issue the appropriate transitions so that the ideal state and the current state converge.

Here's an example, again with 3 partitions, 2 replicas per partition, and the LeaderStandby state model:

{
  "id" : "MyResource",
  "simpleFields" : {
    "REBALANCE_MODE" : "CUSTOMIZED",
    "NUM_PARTITIONS" : "3",
    "REPLICAS" : "2",
    "STATE_MODEL_DEF_REF" : "LeaderStandby"
  },
  "mapFields" : {
    "MyResource_0" : {
      "N1" : "LEADER",
      "N2" : "STANDBY"
    },
    "MyResource_1" : {
      "N2" : "LEADER",
      "N3" : "STANDBY"
    },
    "MyResource_2" : {
      "N3" : "LEADER",
      "N1" : "STANDBY"
    }
  }
}
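In CUSTOMIZED mode, the application writes the node-to-state mapping for each partition directly into the mapFields of the ideal state. A sketch of updating one partition's assignment, with the placeholder names used so far:

import java.util.HashMap;
import java.util.Map;

// Recompute the placement and state assignment for one partition
Map<String, String> assignment = new HashMap<>();
assignment.put("N1", "LEADER");
assignment.put("N2", "STANDBY");

IdealState idealState = admin.getResourceIdealState("MyCluster", "MyResource");
idealState.setRebalanceMode(RebalanceMode.CUSTOMIZED);
idealState.getRecord().setMapField("MyResource_0", assignment);
admin.setResourceIdealState("MyCluster", "MyResource", idealState);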

Suppose the current state of the system is 'MyResource_0' -> {N1:LEADER, N2:STANDBY} and the application changes the ideal state to 'MyResource_0' -> {N1:STANDBY, N2:LEADER}. While the application decides which node is LEADER and which is STANDBY, Helix will not blindly issue LEADER-->STANDBY to N1 and STANDBY-->LEADER to N2 in parallel, since that might result in a transient state where both N1 and N2 are leaders, violating the LeaderStandby constraint that there is exactly one LEADER at a time. Helix will first issue LEADER-->STANDBY to N1, and only after that transition completes will it issue STANDBY-->LEADER to N2.

USER_DEFINED

For maximum flexibility, Helix exposes an interface that allows applications to plug in custom rebalancing logic. By providing the name of a class that implements the Rebalancer interface, Helix will automatically invoke its rebalancing method whenever there is a change to the set of live participants in the cluster. For more, see User-Defined Rebalancer.
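A sketch of wiring a resource to a custom rebalancer; the class name is a placeholder, and the details of the Rebalancer interface itself are covered in the User-Defined Rebalancer tutorial:

// Point the resource at a user-defined rebalancer implementation
IdealState idealState = admin.getResourceIdealState("MyCluster", "MyResource");
idealState.setRebalanceMode(RebalanceMode.USER_DEFINED);
idealState.setRebalancerClassName("com.example.MyRebalancer"); // placeholder class
admin.setResourceIdealState("MyCluster", "MyResource", idealState);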

Backwards Compatibility

In previous versions, FULL_AUTO was called AUTO_REBALANCE and SEMI_AUTO was called AUTO. Furthermore, they were presented as the IDEAL_STATE_MODE. Helix supports both IDEAL_STATE_MODE and REBALANCE_MODE, but IDEAL_STATE_MODE is now deprecated and may be phased out in future versions.
