Archive for the ‘HDFS’ Tag

HBase Major Compaction

This post continues my last two posts on HBase.

Each HBase Table has

  • 1 or more Column-families – each groups columns and determines the physical layout of their storage
  • 1 or more Regions – akin to shards in the RDBMS world, i.e. a contiguous set of the table's rows bounded by a StartKey and an EndKey
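To make the StartKey/EndKey idea concrete, here is a minimal Python sketch of how a row key maps to a region. The three regions and their split points ("g" and "p") are made up for illustration, and the lookup is a simplification rather than HBase's actual client code. It relies on the start key being inclusive and the end key exclusive.

```python
from bisect import bisect_right
from typing import List, Tuple

# Hypothetical regions of one table, sorted by start key.
# An empty start key means "from the first row"; an empty end key means "to the last row".
REGIONS: List[Tuple[str, str]] = [
    ("", "g"),    # rows < "g"
    ("g", "p"),   # "g" <= rows < "p"
    ("p", ""),    # rows >= "p"
]

START_KEYS = [start for start, _ in REGIONS]


def find_region(row_key: str) -> Tuple[str, str]:
    """Return the (start_key, end_key) of the region that holds row_key."""
    # The owning region is the last one whose start key is <= row_key.
    index = bisect_right(START_KEYS, row_key) - 1
    return REGIONS[index]


if __name__ == "__main__":
    for row in ("abhi", "avi", "mike", "zed"):
        print(row, "->", find_region(row))
```

With these made-up split points, 'abhi' and 'avi' both land in the first region, which is also what happens in a freshly created table that has only one region.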

For every Column-family of a table in a region we have a Store which has

  • 1 MemStore – a buffer that holds in-memory modifications (until they are flushed to store files)
  • 0 or more Store files (HFiles) – that get created when the MemStore fills up or is flushed explicitly.

These store files are immutable and HBase creates a new file on every MemStore flush i.e. it does not write to an existing HFile.

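Over time a Store accumulates many Store files, and a read may have to consult each of them. Compaction combines the Store files of a Store into fewer Store files to improve read performance. There are two types of compaction.

  • Minor Compaction – combines several Store files into fewer, larger Store files
  • Major Compaction – reads all the Store files of a Store and rewrites them as a single Store file, so a Region ends up with one Store file per Column-family.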
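The MemStore, flush and major compaction behaviour described above can be sketched as a toy model in Python. This is only an illustration: the class ToyStore and its methods are invented names, and the model ignores timestamps, versions, deletes and the on-disk HFile format.

```python
class ToyStore:
    """Toy model of one Store: a MemStore plus a list of immutable store files."""

    def __init__(self):
        self.memstore = {}      # (row, column) -> value, held in memory
        self.store_files = []   # each flush appends one immutable tuple of cells

    def put(self, row, column, value):
        self.memstore[(row, column)] = value

    def flush(self):
        """Write the MemStore out as a new sorted store file; old files are untouched."""
        if self.memstore:
            self.store_files.append(tuple(sorted(self.memstore.items())))
            self.memstore = {}

    def major_compact(self):
        """Rewrite all store files as a single sorted store file."""
        merged = {}
        for store_file in self.store_files:   # oldest first, so newer cells win
            merged.update(store_file)
        self.store_files = [tuple(sorted(merged.items()))] if merged else []


if __name__ == "__main__":
    store = ToyStore()
    store.put("abhi", "info:name", "abhishek")
    store.put("abhi", "info:age", "30")
    store.flush()
    store.put("avi", "info:name", "avinash")
    store.flush()
    print(len(store.store_files), "store files before major compaction")
    store.major_compact()
    print(len(store.store_files), "store file after major compaction")
```

The point of the sketch is that a flush only ever appends a new file, while a major compaction replaces all existing files with one.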

Let us see how Major Compaction impacts HBase storage.

Create a table and insert data.


hbase(main):021:0> create 'users','info'
0 row(s) in 1.0540 seconds

hbase(main):022:0> list
TABLE
tbl1
users
2 row(s) in 0.0160 seconds

hbase(main):023:0> put 'users','abhi','info:name','abhishek'
0 row(s) in 0.0730 seconds

hbase(main):024:0> put 'users','abhi','info:age','30'
0 row(s) in 0.0120 seconds

Let us browse the HBase Root Directory and see how the data gets persisted physically on the filesystem.


abhi@hbase2:~$ ls -ltha /opt/hbase/data/
total 48K
drwxrwxr-x 4 hbase hbase 4.0K Nov  3 14:50 users
drwxr-xr-x 8 hbase users 4.0K Nov  3 14:50 .
drwxrwxr-x 4 hbase hbase 4.0K Nov  3 07:43 tbl1
drwxrwxr-x 2 hbase hbase 4.0K Nov  3 05:35 .oldlogs
drwxrwxr-x 3 hbase hbase 4.0K Nov  3 05:34 .logs
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 12:00 -ROOT-
drwxrwxr-x 3 hbase hbase 4.0K Oct 30 12:00 .META.
-rwxr-xr-x 1 hbase hbase   38 Oct 30 12:00 hbase.id
-rw-rw-r-- 1 hbase hbase   12 Oct 30 12:00 .hbase.id.crc
-rwxr-xr-x 1 hbase hbase    3 Oct 30 12:00 hbase.version
-rw-rw-r-- 1 hbase hbase   12 Oct 30 12:00 .hbase.version.crc
drwxr-xr-x 3 abhi  users 4.0K Oct 11 08:10 ..
abhi@hbase2:~$
abhi@hbase2:~$ ls -ltha /opt/hbase/data/users/
total 24K
drwxrwxr-x 4 hbase hbase 4.0K Nov  3 14:50 6dda0024cbf8619a9c823e6ebbf78888
drwxrwxr-x 4 hbase hbase 4.0K Nov  3 14:50 .
-rwxr-xr-x 1 hbase hbase  515 Nov  3 14:50 .tableinfo.0000000001
-rw-rw-r-- 1 hbase hbase   16 Nov  3 14:50 ..tableinfo.0000000001.crc
drwxrwxr-x 2 hbase hbase 4.0K Nov  3 14:50 .tmp
drwxr-xr-x 8 hbase users 4.0K Nov  3 14:50 ..
abhi@hbase2:~$ ls -ltha /opt/hbase/data/users/6dda0024cbf8619a9c823e6ebbf78888/
total 24K
drwxrwxr-x 4 hbase hbase 4.0K Nov  3 14:50 .
drwxrwxr-x 2 hbase hbase 4.0K Nov  3 14:50 .oldlogs
drwxrwxr-x 2 hbase hbase 4.0K Nov  3 14:50 info
-rwxr-xr-x 1 hbase hbase  222 Nov  3 14:50 .regioninfo
-rw-rw-r-- 1 hbase hbase   12 Nov  3 14:50 ..regioninfo.crc
drwxrwxr-x 4 hbase hbase 4.0K Nov  3 14:50 ..
abhi@hbase2:~$ ls -ltha /opt/hbase/data/users/6dda0024cbf8619a9c823e6ebbf78888/info/
total 8.0K
drwxrwxr-x 4 hbase hbase 4.0K Nov  3 14:50 ..
drwxrwxr-x 2 hbase hbase 4.0K Nov  3 14:50 .

As you can see above, HBase created

  • a directory ‘users’ for the table and under it
  • a sub-directory ‘6dda0024cbf8619a9c823e6ebbf78888’ for the Region and under it
  • a sub-directory ‘info’ for the Column-family

All modifications to table/region columns that belong to the ‘info’ column-family get stored as store files under ‘/opt/hbase/data/users/6dda0024cbf8619a9c823e6ebbf78888/info/’
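The directory layout above (root directory, then table, then region, then column-family, then store file) can be expressed as a small Python helper that splits a store file path into its parts. The function name locate and the file name example-store-file are hypothetical and only serve the illustration.

```python
from pathlib import PurePosixPath
from typing import NamedTuple


class StoreFileLocation(NamedTuple):
    table: str
    region: str          # encoded region name (the directory under the table)
    column_family: str
    store_file: str


def locate(root_dir: str, store_file_path: str) -> StoreFileLocation:
    """Split <root>/<table>/<region>/<family>/<file> into its components."""
    # relative_to raises ValueError if the path is not under root_dir.
    relative = PurePosixPath(store_file_path).relative_to(root_dir)
    if len(relative.parts) != 4:
        raise ValueError(f"not a store file path: {store_file_path}")
    return StoreFileLocation(*relative.parts)


if __name__ == "__main__":
    path = ("/opt/hbase/data/users/"
            "6dda0024cbf8619a9c823e6ebbf78888/info/example-store-file")
    print(locate("/opt/hbase/data", path))
```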

Although we inserted data into the table, we do not see any store files yet, because all the data is still in the MemStore and has not been flushed. So let us flush the MemStore and view the contents of the ‘info’ directory.


hbase(main):025:0> flush 'users'
0 row(s) in 0.0390 seconds

abhi@hbase2:~$ ls -ltha /opt/hbase/data/users/6dda0024cbf8619a9c823e6ebbf78888/info/
total 16K
drwxrwxr-x 2 hbase hbase 4.0K Nov  3 14:52 .
drwxrwxr-x 5 hbase hbase 4.0K Nov  3 14:52 ..
-rwxrwxrwx 1 hbase hbase  660 Nov  3 14:52 32f19d12583a46b98211ee77311f48eb
-rw-rw-r-- 1 hbase hbase   16 Nov  3 14:52 .32f19d12583a46b98211ee77311f48eb.crc

Notice that the flush created the store file /opt/hbase/data/users/6dda0024cbf8619a9c823e6ebbf78888/info/32f19d12583a46b98211ee77311f48eb. Let us add some more data to our table, flush again, and view the filesystem.


hbase(main):026:0> put 'users','avi','info:name','avinash'
0 row(s) in 0.0050 seconds

hbase(main):027:0> flush 'users'
0 row(s) in 0.0490 seconds
abhi@hbase2:~$ ls -ltha /opt/hbase/data/users/6dda0024cbf8619a9c823e6ebbf78888/info/
total 24K
drwxrwxr-x 2 hbase hbase 4.0K Nov  3 14:52 .
-rwxrwxrwx 1 hbase hbase  623 Nov  3 14:52 ecc5f02da6234ac397d25bee6df0d019
-rw-rw-r-- 1 hbase hbase   16 Nov  3 14:52 .ecc5f02da6234ac397d25bee6df0d019.crc
drwxrwxr-x 5 hbase hbase 4.0K Nov  3 14:52 ..
-rwxrwxrwx 1 hbase hbase  660 Nov  3 14:52 32f19d12583a46b98211ee77311f48eb
-rw-rw-r-- 1 hbase hbase   16 Nov  3 14:52 .32f19d12583a46b98211ee77311f48eb.crc

Let us add some more data and flush once more.

hbase(main):028:0> put 'users','avi','info:age','20'
0 row(s) in 0.0040 seconds

hbase(main):029:0> flush 'users'
0 row(s) in 0.1040 seconds
abhi@hbase2:~$ ls -ltha /opt/hbase/data/users/6dda0024cbf8619a9c823e6ebbf78888/info/
total 32K
drwxrwxr-x 2 hbase hbase 4.0K Nov  3 14:53 .
-rwxrwxrwx 1 hbase hbase  615 Nov  3 14:53 ebda0cc0af9a4d9e803a10cce27c52b6
-rw-rw-r-- 1 hbase hbase   16 Nov  3 14:53 .ebda0cc0af9a4d9e803a10cce27c52b6.crc
-rwxrwxrwx 1 hbase hbase  623 Nov  3 14:52 ecc5f02da6234ac397d25bee6df0d019
-rw-rw-r-- 1 hbase hbase   16 Nov  3 14:52 .ecc5f02da6234ac397d25bee6df0d019.crc
drwxrwxr-x 5 hbase hbase 4.0K Nov  3 14:52 ..
-rwxrwxrwx 1 hbase hbase  660 Nov  3 14:52 32f19d12583a46b98211ee77311f48eb
-rw-rw-r-- 1 hbase hbase   16 Nov  3 14:52 .32f19d12583a46b98211ee77311f48eb.crc
abhi@hbase2:~$

Notice that each flush creates a new store file. Let us view the contents of these store files.

abhi@hbase2:~$ hbase org.apache.hadoop.hbase.io.hfile.HFile -f /opt/hbase/data/users/6dda0024cbf8619a9c823e6ebbf78888/info/ebda0cc0af9a4d9e803a10cce27c52b6 -p
12/11/03 14:55:59 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
12/11/03 14:55:59 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
12/11/03 14:56:00 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 247.9m
K: avi/info:age/1351979593884/Put/vlen=2 V: 20
Scanned kv count -> 1
abhi@hbase2:~$ hbase org.apache.hadoop.hbase.io.hfile.HFile -f /opt/hbase/data/users/6dda0024cbf8619a9c823e6ebbf78888/info/ecc5f02da6234ac397d25bee6df0d019 -p
12/11/03 14:56:19 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
12/11/03 14:56:19 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
12/11/03 14:56:20 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 247.9m
K: avi/info:name/1351979559394/Put/vlen=7 V: avinash
Scanned kv count -> 1
abhi@hbase2:~$ hbase org.apache.hadoop.hbase.io.hfile.HFile -f /opt/hbase/data/users/6dda0024cbf8619a9c823e6ebbf78888/info/32f19d12583a46b98211ee77311f48eb -p
12/11/03 14:56:31 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
12/11/03 14:56:31 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
12/11/03 14:56:31 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 247.9m
K: abhi/info:age/1351979477099/Put/vlen=2 V: 30
K: abhi/info:name/1351979467158/Put/vlen=8 V: abhishek
Scanned kv count -> 2
abhi@hbase2:~$

Alternatively, use the long-form options --printkv and --file to view the store file contents:

abhi@hbase2:~$ hbase org.apache.hadoop.hbase.io.hfile.HFile --printkv --file /opt/hbase/data/users/6dda0024cbf8619a9c823e6ebbf78888/info/ebda0cc0af9a4d9e803a10cce27c52b6
12/11/03 14:56:57 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
12/11/03 14:56:57 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
12/11/03 14:56:58 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 247.9m
K: avi/info:age/1351979593884/Put/vlen=2 V: 20
Scanned kv count -> 1
abhi@hbase2:~$
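Each "K: ... V: ..." line printed by the HFile tool above has the shape row/family:qualifier/timestamp/type/vlen=N followed by the value. As a rough sketch, the lines can be parsed in Python as follows. The names Cell and parse_cell are my own, and the regular expression assumes that row keys and column names contain no "/" character.

```python
import re
from typing import NamedTuple, Optional


class Cell(NamedTuple):
    row: str
    column: str      # family:qualifier
    timestamp: int   # milliseconds since the epoch
    kind: str        # e.g. "Put"
    value: str


# Matches lines such as: K: avi/info:age/1351979593884/Put/vlen=2 V: 20
_LINE = re.compile(
    r"^K: (?P<row>[^/]+)/(?P<column>[^/]+)/(?P<ts>\d+)/(?P<kind>\w+)"
    r"/vlen=\d+ V: (?P<value>.*)$"
)


def parse_cell(line: str) -> Optional[Cell]:
    """Parse one printed key/value line; return None for log and summary lines."""
    match = _LINE.match(line.strip())
    if match is None:
        return None
    return Cell(match["row"], match["column"], int(match["ts"]),
                match["kind"], match["value"])


if __name__ == "__main__":
    print(parse_cell("K: avi/info:age/1351979593884/Put/vlen=2 V: 20"))
    print(parse_cell("Scanned kv count -> 1"))
```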

Let us invoke Major Compaction to combine these files into a single new file.

hbase(main):030:0> major_compact 'users'
0 row(s) in 0.1000 seconds

hbase(main):031:0>
abhi@hbase2:~$
abhi@hbase2:~$ ls -ltha /opt/hbase/data/users/6dda0024cbf8619a9c823e6ebbf78888/info/
total 16K
drwxrwxr-x 2 hbase hbase 4.0K Nov  3 14:57 .
-rwxrwxrwx 1 hbase hbase  731 Nov  3 14:57 6a65463fa2814751b255fdcf1542cd0d
-rw-rw-r-- 1 hbase hbase   16 Nov  3 14:57 .6a65463fa2814751b255fdcf1542cd0d.crc
drwxrwxr-x 5 hbase hbase 4.0K Nov  3 14:52 ..
abhi@hbase2:~$

Let us view the contents of the new file that got created as a result of major compaction.

abhi@hbase2:~$
abhi@hbase2:~$ hbase org.apache.hadoop.hbase.io.hfile.HFile -f /opt/hbase/data/users/6dda0024cbf8619a9c823e6ebbf78888/info/6a65463fa2814751b255fdcf1542cd0d -p
12/11/03 14:58:23 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
12/11/03 14:58:23 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
12/11/03 14:58:23 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 247.9m
K: abhi/info:age/1351979477099/Put/vlen=2 V: 30
K: abhi/info:name/1351979467158/Put/vlen=8 V: abhishek
K: avi/info:age/1351979593884/Put/vlen=2 V: 20
K: avi/info:name/1351979559394/Put/vlen=7 V: avinash
Scanned kv count -> 4
abhi@hbase2:~$
abhi@hbase2:~$
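The compacted file lists the four cells sorted by row and column, which is what a merge of the three already-sorted store files would give. Here is a minimal Python sketch of such a merge, using the cells printed above. It is not HBase's compaction code, and the helper names sort_key and compact are made up.

```python
from heapq import merge

# Cells as (row, column, timestamp, value), copied from the three store files above.
FILE_1 = [("abhi", "info:age", 1351979477099, "30"),
          ("abhi", "info:name", 1351979467158, "abhishek")]
FILE_2 = [("avi", "info:name", 1351979559394, "avinash")]
FILE_3 = [("avi", "info:age", 1351979593884, "20")]


def sort_key(cell):
    row, column, timestamp, _ = cell
    # Row and column ascending; newest timestamp first within the same column.
    return (row, column, -timestamp)


def compact(*store_files):
    """Merge already-sorted store files into one sorted list of cells."""
    return list(merge(*store_files, key=sort_key))


if __name__ == "__main__":
    for row, column, timestamp, value in compact(FILE_1, FILE_2, FILE_3):
        print(f"K: {row}/{column}/{timestamp} V: {value}")
```

Because every input file is already sorted, the merge can stream through the files without loading and re-sorting everything at once.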

Understanding HBase files and directories

This post continues my last post, Getting started with HBase.

HBase physically stores data in the specified Root Directory on the filesystem. The filesystem is typically HDFS, but since I have installed HBase in standalone mode, I am using the local filesystem.

Now let's examine the contents of our HBase Root Directory.

Note: Flush your tables first so that any data still held in the MemStore is written out as files on the filesystem.


abhi@hbase2:~$ ls -ltha /opt/hbase/data/
total 48K
drwxrwxr-x 2 hbase hbase 4.0K Oct 31 14:32 .oldlogs
drwxrwxr-x 3 hbase hbase 4.0K Oct 31 14:31 .logs
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 17:10 table1
drwxr-xr-x 8 hbase users 4.0K Oct 30 17:10 .
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 15:36 users
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 12:00 -ROOT-
drwxrwxr-x 3 hbase hbase 4.0K Oct 30 12:00 .META.
-rwxr-xr-x 1 hbase hbase   38 Oct 30 12:00 hbase.id
-rw-rw-r-- 1 hbase hbase   12 Oct 30 12:00 .hbase.id.crc
-rwxr-xr-x 1 hbase hbase    3 Oct 30 12:00 hbase.version
-rw-rw-r-- 1 hbase hbase   12 Oct 30 12:00 .hbase.version.crc
drwxr-xr-x 3 abhi  users 4.0K Oct 11 08:10 ..
abhi@hbase2:~$
abhi@hbase2:~$
abhi@hbase2:~$ ls -lthaR /opt/hbase/data/
/opt/hbase/data/:
total 48K
drwxrwxr-x 2 hbase hbase 4.0K Oct 31 14:32 .oldlogs
drwxrwxr-x 3 hbase hbase 4.0K Oct 31 14:31 .logs
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 17:10 table1
drwxr-xr-x 8 hbase users 4.0K Oct 30 17:10 .
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 15:36 users
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 12:00 -ROOT-
drwxrwxr-x 3 hbase hbase 4.0K Oct 30 12:00 .META.
-rwxr-xr-x 1 hbase hbase   38 Oct 30 12:00 hbase.id
-rw-rw-r-- 1 hbase hbase   12 Oct 30 12:00 .hbase.id.crc
-rwxr-xr-x 1 hbase hbase    3 Oct 30 12:00 hbase.version
-rw-rw-r-- 1 hbase hbase   12 Oct 30 12:00 .hbase.version.crc
drwxr-xr-x 3 abhi  users 4.0K Oct 11 08:10 ..

/opt/hbase/data/.oldlogs:
total 8.0K
drwxrwxr-x 2 hbase hbase 4.0K Oct 31 14:32 .
drwxr-xr-x 8 hbase users 4.0K Oct 30 17:10 ..

/opt/hbase/data/.logs:
total 12K
drwxrwxr-x 2 hbase hbase 4.0K Oct 31 14:31 hbase2,54165,1351719115872
drwxrwxr-x 3 hbase hbase 4.0K Oct 31 14:31 .
drwxr-xr-x 8 hbase users 4.0K Oct 30 17:10 ..

/opt/hbase/data/.logs/hbase2,54165,1351719115872:
total 8.0K
drwxrwxr-x 2 hbase hbase 4.0K Oct 31 14:31 .
-rwxr-xr-x 1 hbase hbase    0 Oct 31 14:31 .hbase2%2C54165%2C1351719115872.1351719119755.crc
-rwxr-xr-x 1 hbase hbase    0 Oct 31 14:31 hbase2%2C54165%2C1351719115872.1351719119755
drwxrwxr-x 3 hbase hbase 4.0K Oct 31 14:31 ..

/opt/hbase/data/table1:
total 24K
drwxrwxr-x 5 hbase hbase 4.0K Oct 31 14:32 9a35a1636b9d0639e2838c5a8ff180cf
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 17:10 .
drwxr-xr-x 8 hbase users 4.0K Oct 30 17:10 ..
-rwxr-xr-x 1 hbase hbase  935 Oct 30 17:10 .tableinfo.0000000001
-rw-rw-r-- 1 hbase hbase   16 Oct 30 17:10 ..tableinfo.0000000001.crc
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:10 .tmp

/opt/hbase/data/table1/9a35a1636b9d0639e2838c5a8ff180cf:
total 28K
drwxrwxr-x 5 hbase hbase 4.0K Oct 31 14:32 .
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 cf2
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 cf1
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:10 .oldlogs
-rwxr-xr-x 1 hbase hbase  225 Oct 30 17:10 .regioninfo
-rw-rw-r-- 1 hbase hbase   12 Oct 30 17:10 ..regioninfo.crc
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 17:10 ..

/opt/hbase/data/table1/9a35a1636b9d0639e2838c5a8ff180cf/cf2:
total 16K
drwxrwxr-x 5 hbase hbase 4.0K Oct 31 14:32 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 .
-rwxrwxrwx 1 hbase hbase  790 Oct 30 17:35 cbca7d4b4619453e95e313e54fd12649
-rw-rw-r-- 1 hbase hbase   16 Oct 30 17:35 .cbca7d4b4619453e95e313e54fd12649.crc

/opt/hbase/data/table1/9a35a1636b9d0639e2838c5a8ff180cf/cf1:
total 16K
drwxrwxr-x 5 hbase hbase 4.0K Oct 31 14:32 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 .
-rwxrwxrwx 1 hbase hbase  848 Oct 30 17:35 c11f3c3fe30e437c907e7b4656bbb6a8
-rw-rw-r-- 1 hbase hbase   16 Oct 30 17:35 .c11f3c3fe30e437c907e7b4656bbb6a8.crc

/opt/hbase/data/table1/9a35a1636b9d0639e2838c5a8ff180cf/.oldlogs:
total 16K
drwxrwxr-x 5 hbase hbase 4.0K Oct 31 14:32 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:10 .
-rwxr-xr-x 1 hbase hbase  124 Oct 30 17:10 hlog.1351642227144
-rwxr-xr-x 1 hbase hbase   12 Oct 30 17:10 .hlog.1351642227144.crc

/opt/hbase/data/table1/.tmp:
total 8.0K
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 17:10 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:10 .

/opt/hbase/data/users:
total 24K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 ecff3a77396cba69adea1b1f789ca5a2
drwxr-xr-x 8 hbase users 4.0K Oct 30 17:10 ..
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 15:36 .
-rwxr-xr-x 1 hbase hbase  515 Oct 30 15:36 .tableinfo.0000000001
-rw-rw-r-- 1 hbase hbase   16 Oct 30 15:36 ..tableinfo.0000000001.crc
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 15:36 .tmp

/opt/hbase/data/users/ecff3a77396cba69adea1b1f789ca5a2:
total 24K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 .
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 info
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 15:36 .oldlogs
-rwxr-xr-x 1 hbase hbase  222 Oct 30 15:36 .regioninfo
-rw-rw-r-- 1 hbase hbase   12 Oct 30 15:36 ..regioninfo.crc
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 15:36 ..

/opt/hbase/data/users/ecff3a77396cba69adea1b1f789ca5a2/info:
total 16K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 .
-rwxrwxrwx 1 hbase hbase 1.3K Oct 30 17:35 4080f890ac4449a2a151d5c4d79f8579
-rw-rw-r-- 1 hbase hbase   20 Oct 30 17:35 .4080f890ac4449a2a151d5c4d79f8579.crc

/opt/hbase/data/users/ecff3a77396cba69adea1b1f789ca5a2/.oldlogs:
total 16K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 15:36 .
-rwxr-xr-x 1 hbase hbase  124 Oct 30 15:36 hlog.1351636576499
-rwxr-xr-x 1 hbase hbase   12 Oct 30 15:36 .hlog.1351636576499.crc

/opt/hbase/data/users/.tmp:
total 8.0K
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 15:36 .
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 15:36 ..

/opt/hbase/data/-ROOT-:
total 24K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 70236052
drwxr-xr-x 8 hbase users 4.0K Oct 30 17:10 ..
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 12:00 .
-rwxr-xr-x 1 hbase hbase  551 Oct 30 12:00 .tableinfo.0000000001
-rw-rw-r-- 1 hbase hbase   16 Oct 30 12:00 ..tableinfo.0000000001.crc
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 12:00 .tmp

/opt/hbase/data/-ROOT-/70236052:
total 24K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 .
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 info
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 12:00 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 12:00 .oldlogs
-rwxr-xr-x 1 hbase hbase  109 Oct 30 12:00 .regioninfo
-rw-rw-r-- 1 hbase hbase   12 Oct 30 12:00 ..regioninfo.crc

/opt/hbase/data/-ROOT-/70236052/info:
total 32K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 .
-rwxrwxrwx 1 hbase hbase  718 Oct 30 17:35 fb48fa0302be4d37a5b70ffbf039fe9a
-rw-rw-r-- 1 hbase hbase   16 Oct 30 17:35 .fb48fa0302be4d37a5b70ffbf039fe9a.crc
-rwxrwxrwx 1 hbase hbase  718 Oct 30 12:11 a913edee0ac34de490c46ee12175dc02
-rw-rw-r-- 1 hbase hbase   16 Oct 30 12:11 .a913edee0ac34de490c46ee12175dc02.crc
-rwxrwxrwx 1 hbase hbase  714 Oct 30 12:00 c6f09dc3ee6a4150b8e787a747a81707
-rw-rw-r-- 1 hbase hbase   16 Oct 30 12:00 .c6f09dc3ee6a4150b8e787a747a81707.crc

/opt/hbase/data/-ROOT-/70236052/.oldlogs:
total 16K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 12:00 .
-rwxr-xr-x 1 hbase hbase  411 Oct 30 12:00 hlog.1351623609149
-rwxr-xr-x 1 hbase hbase   12 Oct 30 12:00 .hlog.1351623609149.crc

/opt/hbase/data/-ROOT-/.tmp:
total 8.0K
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 12:00 .
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 12:00 ..

/opt/hbase/data/.META.:
total 12K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 1028785192
drwxr-xr-x 8 hbase users 4.0K Oct 30 17:10 ..
drwxrwxr-x 3 hbase hbase 4.0K Oct 30 12:00 .

/opt/hbase/data/.META./1028785192:
total 24K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 .
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 info
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 12:00 .oldlogs
-rwxr-xr-x 1 hbase hbase  111 Oct 30 12:00 .regioninfo
-rw-rw-r-- 1 hbase hbase   12 Oct 30 12:00 ..regioninfo.crc
drwxrwxr-x 3 hbase hbase 4.0K Oct 30 12:00 ..

/opt/hbase/data/.META./1028785192/info:
total 16K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 .
-rwxrwxrwx 1 hbase hbase 2.8K Oct 30 17:35 de44bdf76ce6477ba3a7da1df0b159df
-rw-rw-r-- 1 hbase hbase   32 Oct 30 17:35 .de44bdf76ce6477ba3a7da1df0b159df.crc

/opt/hbase/data/.META./1028785192/.oldlogs:
total 16K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 12:00 .
-rwxr-xr-x 1 hbase hbase  124 Oct 30 12:00 hlog.1351623609390
-rwxr-xr-x 1 hbase hbase   12 Oct 30 12:00 .hlog.1351623609390.crc
abhi@hbase2:~$

In the above:
.logs and .oldlogs – contain the Write-Ahead Log (WAL) files; each RegionServer's WAL is shared by all the regions it serves

  • The .logs directory has a subdirectory for each RegionServer e.g. /opt/hbase/data/.logs/hbase2,54165,1351719115872.
    The RegionServer subdirectory name has the format [RegionServer Host],[Port],[Server Start Code] (comma-separated, without spaces).

    Each RegionServer subdirectory contains the HLog files. You can view the contents of an HLog file using the org.apache.hadoop.hbase.regionserver.wal.HLog tool.

    abhi@hbase2:~$ hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump /opt/hbase/data/.logs/hbase2,54165,1351719115872/hbase2%2C54165%2C1351719115872.1351719119755
    12/11/05 13:45:37 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
    12/11/05 13:45:37 INFO wal.SequenceFileLogReader: Input stream class: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker, not adjusting length
    Sequence 618 from region 70236052 in table -ROOT-
      Action:
        row: .META.,,1
        column: info:server
        at time: Mon Nov 05 02:01:20 PST 2012
      Action:
        row: .META.,,1
        column: info:serverstartcode
        at time: Mon Nov 05 02:01:20 PST 2012
    Sequence 619 from region 1028785192 in table .META.
      Action:
        row: tbl1,,1351953259243.5ab545fc59596f7784eb179df4654930.
        column: info:server
        at time: Mon Nov 05 02:01:21 PST 2012
      Action:
        row: tbl1,,1351953259243.5ab545fc59596f7784eb179df4654930.
        column: info:serverstartcode
        at time: Mon Nov 05 02:01:21 PST 2012
    
    ... ... ... ... ...
    
    Sequence 637 from region 5ab545fc59596f7784eb179df4654930 in table tbl1
      Action:
        row: row3
        column: cf2:col2
        at time: Mon Nov 05 03:00:53 PST 2012
    Sequence 638 from region 5ab545fc59596f7784eb179df4654930 in table tbl1
      Action:
        row: row3
        column: cf3:col1
        at time: Mon Nov 05 03:00:54 PST 2012
    abhi@hbase2:~$
    
    
  • The .oldlogs directory contains the old log files, i.e. the ones whose edits have already been persisted to store files and are no longer needed for recovery
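The naming scheme of the .logs subdirectories and HLog files can be taken apart with a few lines of Python. This is a sketch with made-up helper names. It assumes that the HLog file name is the URL-encoded RegionServer name followed by a dot and a numeric stamp, as in the listing above, and that the start code is a time in epoch milliseconds, which is consistent with the directory's modification time.

```python
from datetime import datetime, timezone
from typing import NamedTuple, Tuple
from urllib.parse import unquote


class ServerName(NamedTuple):
    host: str
    port: int
    start_code: int   # assumed to be the server start time in epoch milliseconds


def parse_server_name(name: str) -> ServerName:
    """Parse 'host,port,startcode' as used for the .logs subdirectories."""
    host, port, start_code = name.split(",")
    return ServerName(host, int(port), int(start_code))


def parse_wal_file_name(file_name: str) -> Tuple[ServerName, int]:
    """Split a URL-encoded HLog file name into (ServerName, numeric file stamp)."""
    # %2C decodes to a comma; the text after the last dot is the file's own stamp.
    server_part, _, file_stamp = unquote(file_name).rpartition(".")
    return parse_server_name(server_part), int(file_stamp)


if __name__ == "__main__":
    server, stamp = parse_wal_file_name(
        "hbase2%2C54165%2C1351719115872.1351719119755")
    started = datetime.fromtimestamp(server.start_code / 1000, tz=timezone.utc)
    print(server, stamp, started.isoformat())
```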

-ROOT- and .META. – contain the files related to the catalog tables
hbase.id and hbase.version – hold the unique ID of the cluster and the file format version, respectively

users and table1 – hold the store files for the user-defined tables. Each table has its own directory which has the following contents:

  • .tableinfo file that contains the table and column family schemas
    /opt/hbase/data/users/.tableinfo.0000000001
    You can view the contents of the file as follows:

    abhi@hbase2:~$ cat /opt/hbase/data/users/.tableinfo.0000000001
    MIN_VERSIONS0TTL
    2147483647      BLOCKSIZE65536  IN_MEMORYfalse
    BLOCKCACHEtrue
    
    {NAME => 'users', FAMILIES => [{NAME => 'info', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
    abhi@hbase2:~$
    
  • Region directories – a directory for each region of a table. The directory name is the MD5 hash of the region name
    /opt/hbase/data/users/ecff3a77396cba69adea1b1f789ca5a2

      A Region directory contains

    • .regioninfo – file that contains serialized information about the Region
      /opt/hbase/data/users/ecff3a77396cba69adea1b1f789ca5a2/.regioninfo
    • Column-Family directories – a directory for each Column-Family that holds the actual storage files of a table
      /opt/hbase/data/users/ecff3a77396cba69adea1b1f789ca5a2/info

      The store files are in HFile format.
      /opt/hbase/data/users/ecff3a77396cba69adea1b1f789ca5a2/info/4080f890ac4449a2a151d5c4d79f8579

      [Figure: Path to the Actual Storage File]

      NOTE: If you have inserted data into your table but do not see any storage files under the column-family directory, flush the table, e.g. flush 'users'

      We can view the contents of a store file using the org.apache.hadoop.hbase.io.hfile.HFile tool.

      abhi@hbase2:~$ hbase org.apache.hadoop.hbase.io.hfile.HFile -f /opt/hbase/data/users/ecff3a77396cba69adea1b1f789ca5a2/info/4080f890ac4449a2a151d5c4d79f8579 -p
      12/10/31 17:02:04 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
      12/10/31 17:02:04 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
      12/10/31 17:02:04 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 247.9m
      K: abhi/info:age/1351726630581/Put/vlen=2 V: 30
      K: abhi/info:name/1351726623818/Put/vlen=8 V: abhishek
      Scanned kv count -> 2
      abhi@hbase2:~$
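The region directory naming mentioned above (the MD5 hash of the region name) can be tried out with a short Python sketch. It assumes that the hashed string is the table,startkey,regionid part of the full region name, such as the tbl1 row key seen in the HLog dump. That input format is an assumption on my part and is not shown in the listings, so the script only prints whether the computed hash matches instead of taking it for granted. The function names are hypothetical.

```python
import hashlib
from typing import Tuple


def encoded_region_name(table: str, start_key: str, region_id: int) -> str:
    """Return the MD5 hex digest of 'table,startkey,regionid' (assumed input format)."""
    region_name = f"{table},{start_key},{region_id}"
    return hashlib.md5(region_name.encode("utf-8")).hexdigest()


def split_full_region_name(full_name: str) -> Tuple[str, str]:
    """Split 'table,startkey,regionid.encoded.' into (hashed part, encoded suffix)."""
    hashed_part, _, encoded = full_name.rstrip(".").rpartition(".")
    return hashed_part, encoded


if __name__ == "__main__":
    # Full region name as it appears in the .META. rows of the HLog dump.
    full = "tbl1,,1351953259243.5ab545fc59596f7784eb179df4654930."
    hashed_part, encoded = split_full_region_name(full)
    table, start_key, region_id = hashed_part.split(",")
    computed = encoded_region_name(table, start_key, int(region_id))
    print("directory name:", encoded)
    print("computed MD5  :", computed)
    print("match:", computed == encoded)
```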