HBase on Ubuntu: Data Storage Architecture and Mechanisms
HBase, a distributed NoSQL database built on Hadoop HDFS, stores data in a column-oriented, scalable, and fault-tolerant manner. On Ubuntu (or any Linux-based system), HBase leverages HDFS as its underlying storage layer, ensuring data durability and high availability through replication. Below is a structured breakdown of its core storage components and workflows:
HBase’s data storage is organized around four key components: HDFS, the HFile format, the MemStore, and the HLog (write-ahead log). Each serves a distinct role in the data lifecycle:
HBase relies on HDFS (Hadoop Distributed File System) to store all persistent data. Tables, regions, and files are distributed across multiple nodes in the HDFS cluster, providing fault tolerance (via replication) and parallel processing capabilities. For example, in a pseudo-distributed setup, the hbase.rootdir property in hbase-site.xml is configured to point to an HDFS path (e.g., hdfs://localhost:9000/hbase), ensuring all HBase data is stored in HDFS.
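For reference, a minimal hbase-site.xml for such a pseudo-distributed setup might look like the sketch below; the HDFS host and port are examples and depend on your Hadoop configuration:

```xml
<!-- hbase-site.xml: minimal pseudo-distributed sketch; the HDFS URL is illustrative -->
<configuration>
  <property>
    <!-- Root directory for all HBase data, stored in HDFS -->
    <name>hbase.rootdir</name>
    <value>hdfs://localhost:9000/hbase</value>
  </property>
  <property>
    <!-- Run against HDFS rather than the local filesystem -->
    <name>hbase.cluster.distributed</name>
    <value>true</value>
  </property>
</configuration>
```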
HFile is HBase’s binary file format for storing table data. It is optimized for sequential scans and random reads, with features like a multi-level block index for fast key lookups, optional Bloom filters that let reads skip files which cannot contain a given row, and per-block compression to reduce storage and I/O.
Before data is written to HDFS, it is stored in MemStore (an in-memory buffer, one per column family per Region). MemStore serves two purposes:
- It keeps incoming writes sorted by RowKey, so each flushed HFile is already ordered for efficient scans.
- It buffers writes in memory; when a MemStore exceeds a configurable threshold (hbase.hregion.memstore.flush.size), its contents are flushed to disk as a new HFile.
The HLog, or write-ahead log (implemented as a Hadoop SequenceFile), records every write operation (Puts, Deletes) before it is applied to MemStore. This ensures data durability: if a RegionServer crashes, the HLog can be replayed to recover data that had not yet been flushed. Each HLog entry includes:
- An HLogKey (identifying the table, region, and sequence number).
- A KeyValue object (the actual data being written).
HBase organizes data into a table-based model with the following hierarchy:
A table is a collection of rows, split into Regions (horizontal partitions) for scalability. Each Region is served by exactly one RegionServer at a time, while its underlying HFiles live in HDFS and are replicated across nodes.
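As a quick illustration, the Java client API can list a table's Regions and the RegionServers hosting them. This is a minimal sketch; the users table name is hypothetical:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.HRegionLocation;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.RegionLocator;

public class ListRegions {
    public static void main(String[] args) throws Exception {
        // Reads hbase-site.xml from the classpath
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             RegionLocator locator = conn.getRegionLocator(TableName.valueOf("users"))) {
            // Each Region covers a contiguous RowKey range and is served by one RegionServer
            for (HRegionLocation loc : locator.getAllRegionLocations()) {
                System.out.println(loc.getRegion().getRegionNameAsString()
                        + " -> " + loc.getServerName());
            }
        }
    }
}
```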
A row is identified by a unique RowKey (a byte array), which determines which Region (and thus which RegionServer) serves the row. Rows are stored in lexicographic byte order (sorted by RowKey), enabling efficient range scans.
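Because rows are sorted, a range scan only needs to touch the Regions covering that key range. A minimal sketch with the Java client, assuming a hypothetical users table keyed by user ID:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class RangeScanExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("users"))) {
            // Rows are sorted by RowKey, so this reads only the range [user100, user200)
            Scan scan = new Scan()
                    .withStartRow(Bytes.toBytes("user100"))  // inclusive
                    .withStopRow(Bytes.toBytes("user200"));  // exclusive
            try (ResultScanner scanner = table.getScanner(scan)) {
                for (Result row : scanner) {
                    System.out.println(Bytes.toString(row.getRow()));
                }
            }
        }
    }
}
```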
Columns are grouped into Column Families (defined at table creation time). Each Column Family is a separate storage unit, with its own compression, caching, and replication settings. For example, a table with Column Families cf1 (user profile) and cf2 (order history) will store data for each family in distinct HFiles.
Within a Column Family, columns are identified by a Column Qualifier (e.g., cf1:name, cf1:email). This allows dynamic addition of columns without schema changes.
The smallest unit of data, a Cell is identified by the combination of RowKey, Column Family, Column Qualifier, and Timestamp (version), and holds the stored value. Each RowKey/family/qualifier coordinate can store multiple versions of data (sorted in reverse chronological order, with the latest version first). Versions are retained based on per-family policies (e.g., keep the last 3 versions) to manage storage usage.
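Tying the model together, the sketch below creates the example table with cf1 and cf2 as separate column families, a three-version retention policy on cf1, and a TTL on cf2. The table name and settings mirror the examples above and are purely illustrative:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class CreateUsersTable {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            TableDescriptor table = TableDescriptorBuilder
                    .newBuilder(TableName.valueOf("users"))
                    // cf1: user profile, keep the 3 most recent versions of each cell
                    .setColumnFamily(ColumnFamilyDescriptorBuilder
                            .newBuilder(Bytes.toBytes("cf1"))
                            .setMaxVersions(3)
                            .build())
                    // cf2: order history, stored in its own HFiles with its own settings
                    // (here: expire cells after 90 days)
                    .setColumnFamily(ColumnFamilyDescriptorBuilder
                            .newBuilder(Bytes.toBytes("cf2"))
                            .setTimeToLive(90 * 24 * 60 * 60)
                            .build())
                    .build();
            admin.createTable(table);
        }
    }
}
```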
When data is written to HBase, it follows a three-step process to ensure durability and performance (sketched in code below):
- The write is first appended to the HLog (WAL), making it durable even if the server crashes.
- The data is then inserted into the MemStore of the target Region and column family, and the client is acknowledged.
- Once the MemStore exceeds its flush threshold, its contents are written to HDFS as a new, immutable HFile.
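A single Put exercises this whole path: it is appended to the WAL, buffered in the MemStore, and acknowledged. A minimal sketch, reusing the hypothetical table and families from above:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class WriteExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("users"))) {
            Put put = new Put(Bytes.toBytes("user123"))  // RowKey
                    .addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("name"),
                               Bytes.toBytes("Alice"))
                    // SYNC_WAL forces the WAL append before the write is acknowledged
                    .setDurability(Durability.SYNC_WAL);
            table.put(put);  // server side: WAL append -> MemStore insert -> ack
        }
    }
}
```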
Reading data from HBase involves multiple layers to optimize performance:
- The MemStore is checked first for recently written, not-yet-flushed data.
- The BlockCache, an in-memory cache of recently read HFile blocks, is consulted next.
- Finally, HFiles on HDFS are read, using block indexes and Bloom filters to skip files that cannot contain the requested row.
The results from all layers are merged so the client always sees the newest matching versions.
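From the client's perspective this layering is transparent; a Get simply returns the merged result. A minimal sketch, again using the hypothetical users table:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class ReadExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("users"))) {
            // The RegionServer merges MemStore, BlockCache, and HFile data for this row
            Get get = new Get(Bytes.toBytes("user123"))
                    .addColumn(Bytes.toBytes("cf1"), Bytes.toBytes("name"));
            Result result = table.get(get);
            byte[] value = result.getValue(Bytes.toBytes("cf1"), Bytes.toBytes("name"));
            System.out.println(value == null ? "not found" : Bytes.toString(value));
        }
    }
}
```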
To maintain performance, HBase performs two critical background processes: compaction and region splitting.
Compaction merges multiple small HFiles into a single larger HFile. This reduces the number of files on HDFS (improving read performance) and removes deleted or expired data (based on TTL or version policies). There are two types of compaction: minor compaction, which merges a subset of adjacent small HFiles, and major compaction, which rewrites all HFiles in a store into one and permanently drops deleted or expired cells.
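Both kinds can also be requested through the Admin API, for example to schedule a major compaction after a bulk delete. A hedged sketch (the table name is hypothetical; the compaction itself still runs asynchronously on the RegionServers):

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;

public class CompactExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            TableName users = TableName.valueOf("users");
            admin.compact(users);       // request a minor compaction
            admin.majorCompact(users);  // request a major compaction (rewrites all HFiles)
        }
    }
}
```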
When a Region grows too large (exceeding hbase.hregion.max.filesize), it is split into two smaller Regions, each containing roughly half of the original data. Both daughter Regions initially remain on the same RegionServer; the load balancer may later move one elsewhere to distribute load. Splitting is triggered automatically but can also be initiated manually.
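Manual splitting is also exposed via the Admin API. A minimal sketch; the split point below is an illustrative RowKey:

```java
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.*;
import org.apache.hadoop.hbase.util.Bytes;

public class SplitExample {
    public static void main(String[] args) throws Exception {
        try (Connection conn = ConnectionFactory.createConnection(HBaseConfiguration.create());
             Admin admin = conn.getAdmin()) {
            // Ask HBase to split the Region containing "user500" at that RowKey
            admin.split(TableName.valueOf("users"), Bytes.toBytes("user500"));
        }
    }
}
```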
This architecture enables HBase to handle petabytes of data with low-latency reads/writes, making it suitable for use cases like real-time analytics, IoT data storage, and large-scale key-value lookups.