Prerequisites
Before starting, ensure all cluster nodes (master and region servers) meet the following requirements:
- Hostname resolution: every node must be able to resolve every other node's hostname (add entries to /etc/hosts if needed).
- Time synchronization: use ntp or chrony to keep system clocks in sync.
- Java: a supported JDK must be installed on every node; verify with java -version.

Step 1: Download and Install HBase
wget https://archive.apache.org/dist/hbase/2.4.9/hbase-2.4.9-bin.tar.gz
tar -xzvf hbase-2.4.9-bin.tar.gz -C /opt
sudo mv /opt/hbase-2.4.9 /usr/local/hbase
sudo chown -R $USER:$USER /usr/local/hbase
Step 2: Configure Environment Variables
Edit the ~/.bashrc file (or /etc/profile for system-wide access) to add HBase environment variables:
export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin
Apply changes immediately:
source ~/.bashrc
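The edit above can also be scripted. A minimal sketch that appends the variables idempotently, so rerunning it never duplicates the entries (the HBASE_HOME path is the one used in Step 1):

```shell
# Append HBase variables to ~/.bashrc only if they are not already present
BASHRC="${HOME}/.bashrc"
if ! grep -q 'HBASE_HOME=' "$BASHRC" 2>/dev/null; then
  cat >> "$BASHRC" <<'EOF'
export HBASE_HOME=/usr/local/hbase
export PATH=$PATH:$HBASE_HOME/bin
EOF
fi
```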
Step 3: Configure HBase Core Files
hbase-env.sh (located in $HBASE_HOME/conf):
Set JAVA_HOME to your JDK path (e.g., /usr/lib/jvm/java-11-openjdk-amd64) and, since this setup uses an external ZooKeeper quorum, tell HBase not to manage its own:

export HBASE_MANAGES_ZK=false
hbase-site.xml (critical for cluster setup):

<configuration>
<!-- Root directory for HBase data in HDFS -->
<property>
<name>hbase.rootdir</name>
<value>hdfs://namenode:8020/hbase</value> <!-- Replace with your NameNode hostname/IP -->
</property>
<!-- Enable distributed mode -->
<property>
<name>hbase.cluster.distributed</name>
<value>true</value>
</property>
<!-- External ZooKeeper quorum (comma-separated list of ZooKeeper nodes) -->
<property>
<name>hbase.zookeeper.quorum</name>
<value>zookeeper1,zookeeper2,zookeeper3</value> <!-- Replace with your ZooKeeper hostnames/IPs -->
</property>
<!-- Directory for ZooKeeper local data -->
<property>
<name>hbase.zookeeper.property.dataDir</name>
<value>/var/lib/zookeeper</value> <!-- Ensure this directory exists on all ZooKeeper nodes -->
</property>
</configuration>
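A malformed hbase-site.xml will prevent HBase from starting, so it is worth checking well-formedness before deploying. A quick sketch using Python's standard-library XML parser (the sample file here is a stand-in; point the parser at your real $HBASE_HOME/conf/hbase-site.xml):

```shell
# Write a sample config and verify it parses as well-formed XML
cat > /tmp/hbase-site-sample.xml <<'EOF'
<configuration>
  <property>
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
EOF
python3 -c "import xml.etree.ElementTree as ET; ET.parse('/tmp/hbase-site-sample.xml'); print('well-formed')"
```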
regionservers (list all region server nodes): edit $HBASE_HOME/conf/regionservers and add each region server's hostname, one per line. The master node is not included here by default.

Step 4: Start Hadoop and ZooKeeper
Before launching HBase, ensure HDFS and ZooKeeper are running:
hdfs namenode -format # Format HDFS (first-time setup only; this erases existing HDFS data)
start-dfs.sh # Start HDFS daemons (NameNode, DataNodes)
start-yarn.sh # Start YARN (if using MapReduce)
zkServer.sh start # Run on each ZooKeeper node
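It can also help to probe each ZooKeeper node directly. A sketch using ZooKeeper's "ruok" four-letter command (the hostname zookeeper1 and port 2181 are the placeholders from the config above; note that newer ZooKeeper releases require ruok to be whitelisted via 4lw.commands.whitelist):

```shell
# Probe a ZooKeeper node; a healthy server replies "imok".
# ZK_HOST defaults to the placeholder hostname used earlier in this guide.
reply=$(echo ruok | nc -w 2 "${ZK_HOST:-zookeeper1}" 2181 2>/dev/null || true)
if [ "$reply" = "imok" ]; then
  echo "ZooKeeper is healthy"
else
  echo "No healthy reply from ${ZK_HOST:-zookeeper1}"
fi
```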
Verify ZooKeeper status with zkServer.sh status on each node (exactly one node should report "leader"; the rest should report "follower").

Step 5: Start HBase Cluster
On the HBase master node, execute the following command to start all HBase services:
start-hbase.sh
This script starts the HMaster (manages the cluster) and RegionServers (handle data storage) on their respective nodes.
To verify processes are running, use jps on each node:
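A sketch for checking every node in one pass (the hostnames master, region1, and region2 are placeholders; this assumes passwordless SSH to each node and jps on each remote PATH):

```shell
# Collect the HBase daemons reported by jps on each node
results=""
for host in master region1 region2; do
  daemons=$(ssh -o BatchMode=yes -o ConnectTimeout=3 "$host" jps 2>/dev/null \
            | grep -E 'HMaster|HRegionServer' || echo "no HBase daemon reported")
  results="${results}${host}: ${daemons}
"
done
printf '%s' "$results"
```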
On the master node, jps should list HMaster; on each region server node, it should list HRegionServer.

Step 6: Validate the Cluster
hbase shell
status
You should see output indicating the number of region servers, HMaster status, and ZooKeeper connection details. Then create a table and write and read a row:

create 'test_table', 'cf' # Create a table named 'test_table' with column family 'cf'
put 'test_table', 'row1', 'cf:col1', 'value1' # Insert data
get 'test_table', 'row1' # Retrieve data
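To clean up afterward, the test table can be removed from the same shell (standard HBase shell commands; a table must be disabled before it can be dropped):

```
scan 'test_table'     # Optional: list all rows to confirm the write
disable 'test_table'  # Take the table offline
drop 'test_table'     # Permanently delete it
```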
Post-Installation Checks
- Review the HBase logs ($HBASE_HOME/logs) for errors or warnings.
- Open the HBase web UI at http://<master-node-ip>:16010 (default port) to view cluster metrics.
- Ensure the required ports (by default 16000 and 16010 on the master, 16020 and 16030 on region servers, and 2181 for ZooKeeper) are open, using ufw or your firewall tool.

Key Notes for Production
Tune hbase.regionserver.handler.count (RPC handler threads) and hbase.hregion.memstore.flush.size (MemStore flush threshold) based on your hardware and workload. Compression is configured per column family (e.g., COMPRESSION => 'SNAPPY' when creating or altering a table) rather than through a global property.
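As a sketch, the two server-side settings above would go in hbase-site.xml like this (the values shown are illustrative starting points, not recommendations):

```
<property>
  <name>hbase.regionserver.handler.count</name>
  <value>60</value> <!-- illustrative; the default is 30 -->
</property>
<property>
  <name>hbase.hregion.memstore.flush.size</name>
  <value>134217728</value> <!-- 128 MB, the default -->
</property>
```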