After the design comes testing. In our test clusters, replica counts and disk usage normalize across the cluster after a rebalance, while network utilization is still not completely consistent across the cluster. And maybe the traffic on a cluster has (gasp!) decreased, so that you want to shrink the cluster rather than grow it.

The default parameters provided for load balancers in the provider.yaml file are shown at the end of this section; you can change them using the parameter described in Component load balancer configuration.

A few constraints and defaults to keep in mind: replication factors can never be greater than the total number of brokers (regardless of Self-Balancing); confluent.balancer.max.replicas specifies a maximum number of replicas allowed per broker; and if your cluster is running on a local host, the default for --bootstrap-server is localhost:9092.

You can set Self-Balancing to rebalance on any uneven load (including a change in available brokers), or to rebalance only when brokers are added or removed; both settings are sketched below. Use confluent.balancer.throttle.bytes.per.second to set a custom throttle for the maximum network bandwidth available to Self-Balancing, or to remove a custom throttle. Because each cluster's workload and hardware capabilities are different, it is difficult to pick a single value that fits every deployment. To see how much network throughput is used for reassignment traffic, or whether too much data is being written to disk as partitions are reassigned, monitor kafka.server:type=BrokerTopicMetrics,name=ReplicationBytesInPerSec and the corresponding ReplicationBytesOutPerSec metric.
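Both the trigger condition and the throttle are cluster-wide dynamic configurations. The sketch below applies them with kafka-configs; the bootstrap address and the 20 MB/s throttle value are placeholders, not recommendations:

```
# Rebalance on any uneven load (including a change in available brokers):
kafka-configs --bootstrap-server localhost:9092 --entity-type brokers --entity-default \
  --alter --add-config confluent.balancer.heal.uneven.load.trigger=ANY_UNEVEN_LOAD

# Rebalance only when brokers are added or removed:
kafka-configs --bootstrap-server localhost:9092 --entity-type brokers --entity-default \
  --alter --add-config confluent.balancer.heal.uneven.load.trigger=EMPTY_BROKER

# Set a custom reassignment throttle (20 MB/s here), or delete it to restore the default:
kafka-configs --bootstrap-server localhost:9092 --entity-type brokers --entity-default \
  --alter --add-config confluent.balancer.throttle.bytes.per.second=20971520
kafka-configs --bootstrap-server localhost:9092 --entity-type brokers --entity-default \
  --alter --delete-config confluent.balancer.throttle.bytes.per.second
```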
When you enable Self-Balancing on a running cluster, or start a cluster with Self-Balancing enabled, it takes roughly 30 minutes for Self-Balancing to gather metrics and initialize, and a broker removal attempted before then can fail. The solution is to wait for Self-Balancing to initialize and then retry the broker removal.
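As a sketch of the removal workflow with the command-line tool (the broker ID and bootstrap address are placeholders):

```
# Ask Self-Balancing to drain and shut down broker 2.
kafka-remove-brokers --bootstrap-server localhost:9092 --broker-id 2 --delete

# Check on the progress of the removal.
kafka-remove-brokers --bootstrap-server localhost:9092 --broker-id 2 --describe
```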
Self-Balancing Clusters (SBC) are built to make cluster scaling as automatic and unobtrusive as possible. Self-Balancing Clusters management tools enable dynamic and elastic Confluent Platform deployments, letting you scale your clusters in response to changing loads rather than always planning for the worst case. We started SBC by building upon Kafka's existing, production-validated metrics and partition reassignment mechanisms to monitor the cluster and move data. Since rebalancing cluster load involves moving data around the cluster, it's essential that Self-Balancing Clusters always take great care to protect that data. When the cluster detects a load imbalance or broker overload, Self-Balancing Clusters compute a reassignment plan to adjust the partition layout and execute the plan.

Additionally, Self-Balancing requires metrics on cluster performance from the Confluent Telemetry Reporter, which is enabled by default on the brokers. Once Self-Balancing has initialized, broker additions and removals proceed automatically. Two notes on the command-line tooling: the --command-config option specifies a property file containing configurations to be passed to the Admin Client, and where a setting accepts topics, you can specify multiple topics in a comma-separated list.

If you want to use Control Center with Self-Balancing for configuration and monitoring, you need network access to the cluster from Control Center; otherwise the cluster will not be accessible from Confluent Control Center. A standard installation deploys a set of default services for Confluent Control Center. To add external access to Confluent Control Center after installation, you need to update the Confluent Control Center configuration in your Confluent Operator and Confluent Platform cluster; the same approach applies when connecting to other components. To test without DNS, you can add host entries to your local hosts file (on Linux distributions, this file is typically located at /etc/hosts). The example below shows the upgrade command to add an external load balancer.
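A sketch of that upgrade command, assuming the Confluent Operator Helm layout; the provider file, domain, and release name are placeholders:

```
helm upgrade -f ./providers/aws.yaml \
  --set controlcenter.loadBalancer.enabled=true \
  --set controlcenter.loadBalancer.domain=mydomain.example.com \
  controlcenter ./confluent-operator
```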
For information about Kubernetes load balancer annotations for AWS, see Load Balancers. The installation automatically configures provider-specific annotations (entries recognized by your provider environment); the defaults differ in OpenShift deployments. If you set the load balancer type to internal, the installation automatically creates an internal load balancer. The Kafka bootstrap load balancer takes the name of the component, which defaults to kafka. At a minimum, you will need the configurations shown by the example in Kafka access. No-DNS access is for development and testing purposes only and should not be used in production.

SBC monitors a set of the same metrics that you are probably already watching: replica counts, disk usage, and network utilization. Self-Balancing Clusters don't just consider metric equality. Monitoring also gives you visibility into whether a goal violation for workload distribution has been met. But it's not just reassignment that has a cost; the act of measuring the cluster and deciding if it's in or out of balance consumes resources as well. And of course, load changes. Nightly runs of our system test framework do full end-to-end validation of rebalancing scenarios on real Kafka clusters.

Many thanks to Gwen Shapira, Victoria Bialas, Stanislav Kozlovski, Vikas Singh, Bob Barrett, Aishwarya Gune, David Mao, and Javier Redondo for their advice and feedback on this post, and to the SBC engineers for helping build a feature that was so easy to write about.

Using either Confluent Control Center or the new kafka-remove-brokers command, SBC will shut down Kafka on the old broker and ensure that all replicas on that broker are migrated away. The cluster will have under-replicated partitions temporarily while a broker is being removed, and it may also have under-replicated partitions if a broker removal fails due to insufficient metrics. Example outputs for this command differ by scenario, such as whether the removal succeeds, is still in progress, or fails. If your cluster needs to grow, just start up the new broker(s); if a new add broker request is received while another add broker task is in progress, Self-Balancing will merge the new request with the in-progress task.

Sometimes the brokers take the decision out of your hands and fail on their own, usually at 3:00 a.m. Don't worry about the early-morning page, though; if you've set confluent.balancer.heal.broker.failure.threshold.ms (it defaults to one hour), Self-Balancing Clusters detect the broker failure and, after that threshold timeout, automatically migrate replicated partitions off the failed broker.

You can enable and disable both automatic load rebalancing and Self-Balancing Clusters dynamically should the need arise, and you can change the trigger condition for Self-Balancing while the cluster is running. Disabling Self-Balancing Clusters will automatically cancel any ongoing reassignments, so do not disable Self-Balancing while an add or remove broker operation is in progress; wait until the add or remove completes. If you are using Self-Balancing in combination with Multi-Region Clusters, you must also specify the rack location for each broker with broker.rack. The default reassignment throttle is a conservative starting value; raise it only as necessary. The sketch below shows how these settings appear in broker configuration.
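A minimal broker-configuration sketch of the settings just mentioned; the values are illustrative, and the rack name is a placeholder:

```
# server.properties (Confluent Server) -- illustrative values
confluent.balancer.enable=true                                # turn Self-Balancing on
confluent.balancer.heal.broker.failure.threshold.ms=3600000   # default: one hour
broker.rack=us-east-1a                                        # required per broker with Multi-Region Clusters
```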
Not every reassignment that evens out a metric is a good one: a reassignment that balances network load but makes the cluster less fault tolerant, or one that overloads a broker, is clearly a bad reassignment. The recent release of Confluent Cloud and Confluent Platform 7.0 introduced the ability to easily remove Apache Kafka brokers and shrink your Confluent Server cluster with just a single command.

You access Schema Registry using the load balancer DNS/port, and you enable access to KSQL by updating the loadBalancer parameters, as shown in the example below.
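A sketch of those loadBalancer parameters in the Operator provider configuration; the domain is a placeholder, and the exact section names should be checked against your provider.yaml:

```yaml
schemaregistry:
  loadBalancer:
    enabled: true
    domain: "mydomain.example.com"   # placeholder
ksql:
  loadBalancer:
    enabled: true
    domain: "mydomain.example.com"   # placeholder
```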
Once the external load balancers are created, you add a DNS entry associated with each load balancer endpoint. The following snippet shows the default (and unmodified) load balancer configuration:

```yaml
loadBalancer:
  ## Create a LoadBalancer for external networking
  enabled: false
  ## External will create public facing endpoints, setting this to internal will
  ## create a private-facing ELB with VPC peering
  type: external
  ## If external access is enabled, the FQDN must be provided
  ## Domain name will configure in Kafka's external listener
  domain: ""
  ## If configured the bootstrap fqdn will be <bootstrapPrefix>.<domain> (dots are not supported in the prefix)
  ## If not the bootstrapPrefix will be <name>.<domain>
  bootstrapPrefix: ""
  ## If prefix is configured, external DNS name is configured as <brokerPrefix><broker-id>.<domain>
  ## If not configured, the default value will be 'b' appended to the domain name as prefix (dots are not supported in the prefix)
  brokerPrefix: ""
  ## Add other annotations here that you want on the ELB
  annotations: {}
```
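For no-DNS development testing, one approach is to map the bootstrap and broker FQDNs to the load balancer addresses in your local hosts file; every address and hostname below is a placeholder:

```
# /etc/hosts -- placeholder entries for no-DNS testing
203.0.113.10   kafka.mydomain.example.com   # Kafka bootstrap load balancer
203.0.113.11   b0.mydomain.example.com      # broker 0 (brokerPrefix 'b')
203.0.113.12   b1.mydomain.example.com      # broker 1
```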
