Event Streaming#

This Role is required to enable the Carbonio Active Replica feature, the foundation of High Availability on Carbonio, and is based on Apache’s Kafka and ZooKeeper, which must be installed together on the same Node. For better performances, it is strongly suggested to install both the services on a dedicated Node, otherwise you can opt to directly access a Kafka + ZooKeeper service provided as a Saas solution.

In the remainder, you find the installation and configuration instructions for both software on a node dedicated in the same infrastructure that hosts the other Carbonio Nodes.

It is possible to install multiple Event Streaming Roles in a Carbonio infrastructure: in this case, you need to make sure that the ZooKeeper’s configuration is replicated on every Event Streaming Node.

Installation of Apache ZooKeeper#

Instructions are currently provided for Ubuntu; installation on RHEL should be almost the same, but official directions will follow soon.

Requirements and Preliminary Tasks#

All the installation tasks are done via CLI, so you need some acquaintance with it.

We also assume you are using ZooKeeper version 3.8.3, you can however use other 3.8.x versions: check their availability on the Apache ZooKeeper official page, and remember to replace the version in the directions below.

Moreover, to setup ZooKeeper, you need:

  • a dedicated user, carbonio-queue, shared with Kafka

    # adduser --system --group --home '/var/lib/queue/' 'carbonio-queue'
    
  • a ZooKeeper data directory: /var/lib/queue/zookeeper/

    # mkdir /var/lib/queue/zookeeper/
    

Installation#

# apt update -y
# apt install -y openjdk-17-jre-headless
# wget https://dlcdn.apache.org/zookeeper/zookeeper-3.8.3/apache-zookeeper-3.8.3-bin.tar.gz
# tar -xpvzf apache-zookeeper-3.8.3-bin.tar.gz -C /opt/
# mv /opt/apache-zookeeper-3.8.3-bin /opt/zookeeper
# rm -f apache-zookeeper.tar.gz
# chown carbonio-queue:root -R /opt/zookeeper/
# chown carbonio-queue: -R /var/lib/queue/

Configuration#

Copy the sample configuration into /opt/zookeeper/conf/zoo.cfg and update the dataDir:

# cp /opt/zookeeper/conf/zoo_sample.cfg /opt/zookeeper/conf/zoo.cfg
# sed -i -e 's@dataDir=.*@dataDir=/var/lib/queue/zookeeper/@g' /opt/zookeeper/conf/zoo.cfg

Hint

If you do not feel comfortable with the sed command, you can edit the file manually: find the line starting with dataDir= and replace any path after the = with /var/lib/queue/zookeeper.

Define a ZooKeeper ID#

Each Node in a Carbonio infrastructure must have a unique ZooKeeper ID, which is required for its correct operating.

Define a value for the Node (we use 10) on which ZooKeper is installed and add it to the file

# ZOOKEEPER_ID=10
# echo $ZOOKEEPER_ID > /var/lib/queue/zookeeper/myid

If you install a second and even a third Event Streaming Role, you have then to append at the end of the configuration file /opt/zookeeper/conf/zoo.cfg one entry for every zookeeper node, using the format server.[ZOOKEEPER_ID]=[NODE_IP]:2888:3888

For example, suppose you want to install a three-nodes Event Streaming. You have already you assigned ZOOKEEPER_ID=1O to node IP 10.0.10.11 and you add

  • ID ZOOKEEPER_ID=2O to node IP 10.0.10.12

  • ID ZOOKEEPER_ID=3O to node IP 10.0.10.13

You need to make sure that three entries are added to file /opt/zookeeper/conf/zoo.cfg on every Node:

# echo "server.10=10.0.10.11:2888:3888" >> /opt/zookeeper/conf/zoo.cfg
# echo "server.20=10.0.10.12:2888:3888" >> /opt/zookeeper/conf/zoo.cfg
# echo "server.30=10.0.10.13:2888:3888" >> /opt/zookeeper/conf/zoo.cfg

Create System Unit file#

Copy this snippet to define a minimal zookeeper configuration into file /lib/systemd/system/carbonio-zookeeper.service.

[Unit]
Description=ZooKeeper Service
Documentation=http://zookeeper.apache.org
Requires=network.target
After=network.target

[Service]
Type=forking
Restart=on-failure
RestartSec=15
LimitNOFILE=65536
User=carbonio-queue
Group=carbonio-queue
ExecStart=/opt/zookeeper/bin/zkServer.sh start /opt/zookeeper/conf/zoo.cfg
ExecStop=/opt/zookeeper/bin/zkServer.sh stop /opt/zookeeper/conf/zoo.cfg
ExecReload=/opt/zookeeper/bin/zkServer.sh restart /opt/zookeeper/conf/zoo.cfg
WorkingDirectory=/var/lib/queue/zookeeper/

[Install]
WantedBy=default.target

Enable ZooKeeper service#

# systemctl daemon-reload
# systemctl enable carbonio-zookeeper
# systemctl start carbonio-zookeeper.service

Installation of Apache Kafka#

Requirements and Preliminary Tasks#

We also assume you are using Kafka version 3.1.2, you can however use other 3.1.x versions: check their availability on the Apache Kafka official page, and remember to replace the version in the directions below.

Moreover, to setup ZooKeeper, you need:

  • a dedicated user, carbonio-queue, which is the same used by ZooKeeper

  • Kafka data directory: /var/lib/queue/kafka/logs/

    # mkdir -p /var/lib/queue/kafka/logs
    

Installation#

# wget https://archive.apache.org/dist/kafka/3.1.2/kafka_2.13-3.1.2.tgz
# tar -xpvzf kafka_2.13-3.1.2.tgz  -C /opt/
# mv /opt/kafka_2.13-3.1.2 /opt/kafka
# rm -f kafka_2.13-3.1.2.tgz
# chown carbonio-queue:root -R /var/lib/queue/
# chown carbonio-queue:root -R /opt/kafka/

Configuration#

The default Kafka configuration need to be updated to reflect the correct parameters for broker_id, which must be unique within the infrastructure, log.dirs, and topic in files /opt/kafka/config/server.properties and /opt/kafka/config/producer.properties.

File /opt/kafka/config/server.properties
# sed -i "s@broker_id=.*@$(( $RANDOM % 20 + 1 ))@" /opt/kafka/config/server.properties
# sed -i "s@log.dirs=.*@log.dirs=/var/lib/queue/kafka/logs/@" /opt/kafka/config/server.properties
# echo "auto.create.topics.enable=true" >> /opt/kafka/config/server.properties

If you install Event Streaming on multiple Nodes, it is strongly recommended to also add or update the following parameters.

num.recovery.threads.per.data.dir=4
offsets.topic.replication.factor=2
transaction.state.log.replication.factor=2
transaction.state.log.min.isr=2
num.partitions=6
default.replication.factor=2
File /opt/kafka/config/producer.properties
# sed -i "s@compression.type=.*@compression.type=lz4@" /opt/kafka/config/producer.properties

Create System Unit File#

Copy this snippet to define a minimal zookeeper configuration into file /lib/systemd/system/carbonio-kafka.service.

[Unit]
Description=Kafka Service
Documentation=http://kafka.apache.org
Requires=network.target
After=network.target

[Service]
Type=simple
Restart=on-failure
RestartSec=15
LimitNOFILE=65536
User=carbonio-queue
Group=carbonio-queue
ExecStart=/opt/kafka/bin/kafka-server-start.sh /opt/kafka/config/server.properties
ExecStop=/opt/kafka/bin/kafka-server-stop.sh /opt/kafka/config/server.properties
WorkingDirectory=/var/lib/queue/kafka/
Environment="KAFKA_HEAP_OPTS=-Xmx1G -Xms1G"

[Install]
WantedBy=default.target

Enable Kafka Service#

# systemctl daemon-reload
# systemctl enable carbonio-kafka
# systemctl start carbonio-kafka.service