Apache Helix – Tutorial - Customizing Health Checks

In this chapter, we'll learn how to customize health checks based on metrics of your distributed system.

Health Checks

Note: this feature is currently in development and is not yet ready for production use.

Helix provides the ability for each node in the system to report health metrics on a periodic basis.

Helix supports multiple ways to aggregate these metrics:

SUM
AVG
EXPONENTIAL DECAY
WINDOW

Helix persists the aggregated value only.
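To make the four aggregation styles concrete, here is a minimal sketch in plain Java. It does not use the Helix API: the class `MetricAggregators` and its method names are hypothetical, the smoothing factor `alpha` is an assumed parameter, and WINDOW is assumed here to mean an average over the most recent samples, which may differ from how Helix defines it.

```java
// Hypothetical helper for illustration only; not part of the Helix API.
final class MetricAggregators {
    private MetricAggregators() {}

    // SUM: total of all reported samples.
    static double sum(double[] samples) {
        double total = 0.0;
        for (double s : samples) {
            total += s;
        }
        return total;
    }

    // AVG: arithmetic mean of all reported samples.
    static double avg(double[] samples) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        return sum(samples) / samples.length;
    }

    // EXPONENTIAL DECAY: each new sample is blended into a single running
    // value, so older samples count for less. alpha is an assumed smoothing
    // factor in (0, 1]; only one number needs to be kept between reports.
    static double exponentialDecay(double[] samples, double alpha) {
        if (samples.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (alpha <= 0.0 || alpha > 1.0) {
            throw new IllegalArgumentException("alpha must be in (0, 1]");
        }
        double value = samples[0];
        for (int i = 1; i < samples.length; i++) {
            value = alpha * samples[i] + (1.0 - alpha) * value;
        }
        return value;
    }

    // WINDOW (assumed meaning): mean of the most recent `size` samples.
    static double windowAvg(double[] samples, int size) {
        if (samples.length == 0 || size <= 0) {
            throw new IllegalArgumentException("need samples and a positive window size");
        }
        int start = Math.max(0, samples.length - size);
        double total = 0.0;
        for (int i = start; i < samples.length; i++) {
            total += samples[i];
        }
        return total / (samples.length - start);
    }
}
```

In this sketch the exponential-decay variant carries only a single running value from one report to the next, which fits a design where only the aggregate is stored rather than the raw samples.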

Applications can define thresholds on the aggregated values according to their SLAs; when a threshold is violated, Helix fires an alert. Currently, firing an alert is the only action Helix takes, but in a future release we plan to use these metrics to either mark the node dead or load-balance the partitions. This feature will be valuable for distributed systems that support multi-tenancy and have large variations in workload patterns. In addition, it can be used to detect skewed partitions (hotspots) and rebalance the cluster.
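The threshold check described above can be sketched as follows. This is an illustration under stated assumptions, not Helix's alerting API: the class `SlaThresholdCheck`, the method `violations`, and the node names in the usage note are all hypothetical, and a violation is assumed to mean that the aggregate exceeds the threshold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper for illustration only; not part of the Helix API.
final class SlaThresholdCheck {
    private SlaThresholdCheck() {}

    // Returns the nodes whose aggregated metric exceeds the threshold,
    // sorted by node name so the result is deterministic.
    static List<String> violations(Map<String, Double> aggregateByNode, double threshold) {
        List<String> alerts = new ArrayList<>();
        for (Map.Entry<String, Double> entry : new TreeMap<>(aggregateByNode).entrySet()) {
            if (entry.getValue() > threshold) {
                alerts.add(entry.getKey());
            }
        }
        return alerts;
    }
}
```

For example, with aggregated latencies of 120.0 for `node1`, 80.0 for `node2`, and 250.0 for `node3` and a threshold of 100.0, the method returns `node1` and `node3`. The same comparison applied per partition rather than per node is one way to flag a hotspot.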
