Understanding HBase files and directories

This post continues from my last one, Getting started with HBase.

HBase physically stores its data under the configured root directory (the hbase.rootdir property) on the filesystem. The filesystem is typically HDFS, but since I installed HBase in standalone mode, I am using the local filesystem.

Now let's examine the contents of our HBase root directory.

Note: Flush your tables first (for example, flush 'users' in the HBase shell) so that the data held in memory gets written out as files on the filesystem.


abhi@hbase2:~$ ls -ltha /opt/hbase/data/
total 48K
drwxrwxr-x 2 hbase hbase 4.0K Oct 31 14:32 .oldlogs
drwxrwxr-x 3 hbase hbase 4.0K Oct 31 14:31 .logs
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 17:10 table1
drwxr-xr-x 8 hbase users 4.0K Oct 30 17:10 .
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 15:36 users
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 12:00 -ROOT-
drwxrwxr-x 3 hbase hbase 4.0K Oct 30 12:00 .META.
-rwxr-xr-x 1 hbase hbase   38 Oct 30 12:00 hbase.id
-rw-rw-r-- 1 hbase hbase   12 Oct 30 12:00 .hbase.id.crc
-rwxr-xr-x 1 hbase hbase    3 Oct 30 12:00 hbase.version
-rw-rw-r-- 1 hbase hbase   12 Oct 30 12:00 .hbase.version.crc
drwxr-xr-x 3 abhi  users 4.0K Oct 11 08:10 ..
abhi@hbase2:~$
abhi@hbase2:~$
abhi@hbase2:~$ ls -lthaR /opt/hbase/data/
/opt/hbase/data/:
total 48K
drwxrwxr-x 2 hbase hbase 4.0K Oct 31 14:32 .oldlogs
drwxrwxr-x 3 hbase hbase 4.0K Oct 31 14:31 .logs
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 17:10 table1
drwxr-xr-x 8 hbase users 4.0K Oct 30 17:10 .
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 15:36 users
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 12:00 -ROOT-
drwxrwxr-x 3 hbase hbase 4.0K Oct 30 12:00 .META.
-rwxr-xr-x 1 hbase hbase   38 Oct 30 12:00 hbase.id
-rw-rw-r-- 1 hbase hbase   12 Oct 30 12:00 .hbase.id.crc
-rwxr-xr-x 1 hbase hbase    3 Oct 30 12:00 hbase.version
-rw-rw-r-- 1 hbase hbase   12 Oct 30 12:00 .hbase.version.crc
drwxr-xr-x 3 abhi  users 4.0K Oct 11 08:10 ..

/opt/hbase/data/.oldlogs:
total 8.0K
drwxrwxr-x 2 hbase hbase 4.0K Oct 31 14:32 .
drwxr-xr-x 8 hbase users 4.0K Oct 30 17:10 ..

/opt/hbase/data/.logs:
total 12K
drwxrwxr-x 2 hbase hbase 4.0K Oct 31 14:31 hbase2,54165,1351719115872
drwxrwxr-x 3 hbase hbase 4.0K Oct 31 14:31 .
drwxr-xr-x 8 hbase users 4.0K Oct 30 17:10 ..

/opt/hbase/data/.logs/hbase2,54165,1351719115872:
total 8.0K
drwxrwxr-x 2 hbase hbase 4.0K Oct 31 14:31 .
-rwxr-xr-x 1 hbase hbase    0 Oct 31 14:31 .hbase2%2C54165%2C1351719115872.1351719119755.crc
-rwxr-xr-x 1 hbase hbase    0 Oct 31 14:31 hbase2%2C54165%2C1351719115872.1351719119755
drwxrwxr-x 3 hbase hbase 4.0K Oct 31 14:31 ..

/opt/hbase/data/table1:
total 24K
drwxrwxr-x 5 hbase hbase 4.0K Oct 31 14:32 9a35a1636b9d0639e2838c5a8ff180cf
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 17:10 .
drwxr-xr-x 8 hbase users 4.0K Oct 30 17:10 ..
-rwxr-xr-x 1 hbase hbase  935 Oct 30 17:10 .tableinfo.0000000001
-rw-rw-r-- 1 hbase hbase   16 Oct 30 17:10 ..tableinfo.0000000001.crc
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:10 .tmp

/opt/hbase/data/table1/9a35a1636b9d0639e2838c5a8ff180cf:
total 28K
drwxrwxr-x 5 hbase hbase 4.0K Oct 31 14:32 .
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 cf2
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 cf1
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:10 .oldlogs
-rwxr-xr-x 1 hbase hbase  225 Oct 30 17:10 .regioninfo
-rw-rw-r-- 1 hbase hbase   12 Oct 30 17:10 ..regioninfo.crc
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 17:10 ..

/opt/hbase/data/table1/9a35a1636b9d0639e2838c5a8ff180cf/cf2:
total 16K
drwxrwxr-x 5 hbase hbase 4.0K Oct 31 14:32 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 .
-rwxrwxrwx 1 hbase hbase  790 Oct 30 17:35 cbca7d4b4619453e95e313e54fd12649
-rw-rw-r-- 1 hbase hbase   16 Oct 30 17:35 .cbca7d4b4619453e95e313e54fd12649.crc

/opt/hbase/data/table1/9a35a1636b9d0639e2838c5a8ff180cf/cf1:
total 16K
drwxrwxr-x 5 hbase hbase 4.0K Oct 31 14:32 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 .
-rwxrwxrwx 1 hbase hbase  848 Oct 30 17:35 c11f3c3fe30e437c907e7b4656bbb6a8
-rw-rw-r-- 1 hbase hbase   16 Oct 30 17:35 .c11f3c3fe30e437c907e7b4656bbb6a8.crc

/opt/hbase/data/table1/9a35a1636b9d0639e2838c5a8ff180cf/.oldlogs:
total 16K
drwxrwxr-x 5 hbase hbase 4.0K Oct 31 14:32 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:10 .
-rwxr-xr-x 1 hbase hbase  124 Oct 30 17:10 hlog.1351642227144
-rwxr-xr-x 1 hbase hbase   12 Oct 30 17:10 .hlog.1351642227144.crc

/opt/hbase/data/table1/.tmp:
total 8.0K
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 17:10 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:10 .

/opt/hbase/data/users:
total 24K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 ecff3a77396cba69adea1b1f789ca5a2
drwxr-xr-x 8 hbase users 4.0K Oct 30 17:10 ..
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 15:36 .
-rwxr-xr-x 1 hbase hbase  515 Oct 30 15:36 .tableinfo.0000000001
-rw-rw-r-- 1 hbase hbase   16 Oct 30 15:36 ..tableinfo.0000000001.crc
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 15:36 .tmp

/opt/hbase/data/users/ecff3a77396cba69adea1b1f789ca5a2:
total 24K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 .
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 info
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 15:36 .oldlogs
-rwxr-xr-x 1 hbase hbase  222 Oct 30 15:36 .regioninfo
-rw-rw-r-- 1 hbase hbase   12 Oct 30 15:36 ..regioninfo.crc
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 15:36 ..

/opt/hbase/data/users/ecff3a77396cba69adea1b1f789ca5a2/info:
total 16K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 .
-rwxrwxrwx 1 hbase hbase 1.3K Oct 30 17:35 4080f890ac4449a2a151d5c4d79f8579
-rw-rw-r-- 1 hbase hbase   20 Oct 30 17:35 .4080f890ac4449a2a151d5c4d79f8579.crc

/opt/hbase/data/users/ecff3a77396cba69adea1b1f789ca5a2/.oldlogs:
total 16K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 15:36 .
-rwxr-xr-x 1 hbase hbase  124 Oct 30 15:36 hlog.1351636576499
-rwxr-xr-x 1 hbase hbase   12 Oct 30 15:36 .hlog.1351636576499.crc

/opt/hbase/data/users/.tmp:
total 8.0K
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 15:36 .
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 15:36 ..

/opt/hbase/data/-ROOT-:
total 24K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 70236052
drwxr-xr-x 8 hbase users 4.0K Oct 30 17:10 ..
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 12:00 .
-rwxr-xr-x 1 hbase hbase  551 Oct 30 12:00 .tableinfo.0000000001
-rw-rw-r-- 1 hbase hbase   16 Oct 30 12:00 ..tableinfo.0000000001.crc
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 12:00 .tmp

/opt/hbase/data/-ROOT-/70236052:
total 24K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 .
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 info
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 12:00 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 12:00 .oldlogs
-rwxr-xr-x 1 hbase hbase  109 Oct 30 12:00 .regioninfo
-rw-rw-r-- 1 hbase hbase   12 Oct 30 12:00 ..regioninfo.crc

/opt/hbase/data/-ROOT-/70236052/info:
total 32K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 .
-rwxrwxrwx 1 hbase hbase  718 Oct 30 17:35 fb48fa0302be4d37a5b70ffbf039fe9a
-rw-rw-r-- 1 hbase hbase   16 Oct 30 17:35 .fb48fa0302be4d37a5b70ffbf039fe9a.crc
-rwxrwxrwx 1 hbase hbase  718 Oct 30 12:11 a913edee0ac34de490c46ee12175dc02
-rw-rw-r-- 1 hbase hbase   16 Oct 30 12:11 .a913edee0ac34de490c46ee12175dc02.crc
-rwxrwxrwx 1 hbase hbase  714 Oct 30 12:00 c6f09dc3ee6a4150b8e787a747a81707
-rw-rw-r-- 1 hbase hbase   16 Oct 30 12:00 .c6f09dc3ee6a4150b8e787a747a81707.crc

/opt/hbase/data/-ROOT-/70236052/.oldlogs:
total 16K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 12:00 .
-rwxr-xr-x 1 hbase hbase  411 Oct 30 12:00 hlog.1351623609149
-rwxr-xr-x 1 hbase hbase   12 Oct 30 12:00 .hlog.1351623609149.crc

/opt/hbase/data/-ROOT-/.tmp:
total 8.0K
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 12:00 .
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 12:00 ..

/opt/hbase/data/.META.:
total 12K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 1028785192
drwxr-xr-x 8 hbase users 4.0K Oct 30 17:10 ..
drwxrwxr-x 3 hbase hbase 4.0K Oct 30 12:00 .

/opt/hbase/data/.META./1028785192:
total 24K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 .
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 info
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 12:00 .oldlogs
-rwxr-xr-x 1 hbase hbase  111 Oct 30 12:00 .regioninfo
-rw-rw-r-- 1 hbase hbase   12 Oct 30 12:00 ..regioninfo.crc
drwxrwxr-x 3 hbase hbase 4.0K Oct 30 12:00 ..

/opt/hbase/data/.META./1028785192/info:
total 16K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 .
-rwxrwxrwx 1 hbase hbase 2.8K Oct 30 17:35 de44bdf76ce6477ba3a7da1df0b159df
-rw-rw-r-- 1 hbase hbase   32 Oct 30 17:35 .de44bdf76ce6477ba3a7da1df0b159df.crc

/opt/hbase/data/.META./1028785192/.oldlogs:
total 16K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 12:00 .
-rwxr-xr-x 1 hbase hbase  124 Oct 30 12:00 hlog.1351623609390
-rwxr-xr-x 1 hbase hbase   12 Oct 30 12:00 .hlog.1351623609390.crc
abhi@hbase2:~$

In the above:
.logs and .oldlogs – contain the Write-Ahead Log (WAL) files. Each RegionServer's WAL is shared by all the regions hosted on that RegionServer.

  • The .logs directory has a subdirectory for each RegionServer, e.g. /opt/hbase/data/.logs/hbase2,54165,1351719115872.
    The RegionServer subdirectory name has the format [RegionServer Host],[Port],[Server Start Code]

    Each RegionServer subdirectory holds the HLog files. You can view the contents of an HLog file using the org.apache.hadoop.hbase.regionserver.wal.HLog tool.

    abhi@hbase2:~$ hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump /opt/hbase/data/.logs/hbase2,54165,1351719115872/hbase2%2C54165%2C1351719115872.1351719119755
    12/11/05 13:45:37 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
    12/11/05 13:45:37 INFO wal.SequenceFileLogReader: Input stream class: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker, not adjusting length
    Sequence 618 from region 70236052 in table -ROOT-
      Action:
        row: .META.,,1
        column: info:server
        at time: Mon Nov 05 02:01:20 PST 2012
      Action:
        row: .META.,,1
        column: info:serverstartcode
        at time: Mon Nov 05 02:01:20 PST 2012
    Sequence 619 from region 1028785192 in table .META.
      Action:
        row: tbl1,,1351953259243.5ab545fc59596f7784eb179df4654930.
        column: info:server
        at time: Mon Nov 05 02:01:21 PST 2012
      Action:
        row: tbl1,,1351953259243.5ab545fc59596f7784eb179df4654930.
        column: info:serverstartcode
        at time: Mon Nov 05 02:01:21 PST 2012
    
    ... ... ... ... ...
    
    Sequence 637 from region 5ab545fc59596f7784eb179df4654930 in table tbl1
      Action:
        row: row3
        column: cf2:col2
        at time: Mon Nov 05 03:00:53 PST 2012
    Sequence 638 from region 5ab545fc59596f7784eb179df4654930 in table tbl1
      Action:
        row: row3
        column: cf3:col1
        at time: Mon Nov 05 03:00:54 PST 2012
    abhi@hbase2:~$
    
    
  • The .oldlogs directory contains the old log files that are no longer needed, i.e. the ones whose edits have already been persisted to the store files
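
The naming scheme of the RegionServer subdirectory and of the HLog file inside it can be taken apart with a few lines of Python. This is only a minimal sketch: the helper names (parse_server_name, millis_to_utc, parse_wal_file_name) are made up for illustration, and it assumes that the start code and the numeric suffix of the HLog file name are timestamps in milliseconds since the Unix epoch, which is what the values in the listing above suggest.

```python
from datetime import datetime, timedelta, timezone
from urllib.parse import unquote

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_server_name(name):
    """Split 'host,port,startcode' into (host, port, start code)."""
    host, port, start_code = name.split(",")
    return host, int(port), int(start_code)


def millis_to_utc(millis):
    """Convert epoch milliseconds (assumed unit) to an aware UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


def parse_wal_file_name(file_name):
    """Decode an HLog file name into (server name, numeric suffix).

    The commas of the server name are URL-encoded as %2C in the file name,
    and the suffix follows the last dot.
    """
    decoded = unquote(file_name)
    server_name, _, suffix = decoded.rpartition(".")
    return server_name, int(suffix)


if __name__ == "__main__":
    host, port, start_code = parse_server_name("hbase2,54165,1351719115872")
    print(host, port, millis_to_utc(start_code).isoformat())
    print(parse_wal_file_name("hbase2%2C54165%2C1351719115872.1351719119755"))
```

For the directory in the listing, the start code decodes to 31 Oct 2012, 21:31 UTC, which lines up with the Oct 31 14:31 modification time shown by ls if the machine was running on US Pacific time.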

-ROOT- and .META. – contain the files of the catalog tables
hbase.id and hbase.version – hold the unique ID of the cluster and the file format version, respectively

users and table1 – hold the store files for the user-defined tables. Each table has its own directory, which contains:

  • .tableinfo file that contains the table and column family schemas
    /opt/hbase/data/users/.tableinfo.0000000001
    You can view the contents of the file as follows:

    abhi@hbase2:~$ cat /opt/hbase/data/users/.tableinfo.0000000001
    MIN_VERSIONS0TTL
    2147483647      BLOCKSIZE65536  IN_MEMORYfalse
    BLOCKCACHEtrue
    
    {NAME => 'users', FAMILIES => [{NAME => 'info', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
    abhi@hbase2:~$
    
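  • Region directories – a directory for each region of a table. For user tables, the directory name is the MD5 hash of the region name (the catalog regions under -ROOT- and .META. have shorter numeric names, as the listing shows)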
    /opt/hbase/data/users/ecff3a77396cba69adea1b1f789ca5a2

    A Region directory contains:

    • .regioninfo – file that contains the serialized information of a Region
      /opt/hbase/data/users/ecff3a77396cba69adea1b1f789ca5a2/.regioninfo
    • Column-Family directories – a directory for each Column-Family that holds the actual storage files of a table
      /opt/hbase/data/users/ecff3a77396cba69adea1b1f789ca5a2/info

      The store files are in HFile format.
      /opt/hbase/data/users/ecff3a77396cba69adea1b1f789ca5a2/info/4080f890ac4449a2a151d5c4d79f8579

      The figure below describes the path to the actual storage file.

      [Figure: Path to the Actual Storage File]

      NOTE: If you have inserted data into your table and still don't see any storage files under the column-family directory, flush the table from the HBase shell, e.g. flush 'users'

      We can view the contents of a store file using the org.apache.hadoop.hbase.io.hfile.HFile tool.

      abhi@hbase2:~$ hbase org.apache.hadoop.hbase.io.hfile.HFile -f /opt/hbase/data/users/ecff3a77396cba69adea1b1f789ca5a2/info/4080f890ac4449a2a151d5c4d79f8579 -p
      12/10/31 17:02:04 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
      12/10/31 17:02:04 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
      12/10/31 17:02:04 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 247.9m
      K: abhi/info:age/1351726630581/Put/vlen=2 V: 30
      K: abhi/info:name/1351726623818/Put/vlen=8 V: abhishek
      Scanned kv count -> 2
      abhi@hbase2:~$
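
To tie the pieces together, the layout described above (root directory, table, region, column family, store file) and the key lines printed by the HFile tool can both be split into their parts programmatically. The following is a rough Python sketch, not part of HBase: the function names and the dictionary keys are hypothetical, and the key parser assumes the K: row/family:qualifier/timestamp/type/vlen=N V: value shape seen in the output above, with no "/" inside the qualifier.

```python
from pathlib import PurePosixPath


def split_store_file_path(path, root_dir):
    """Split a store file path into table, region, family and file name.

    Assumes the layout <root>/<table>/<region>/<family>/<store file>.
    """
    relative = PurePosixPath(path).relative_to(root_dir)
    table, region, family, store_file = relative.parts
    return {
        "table": table,
        "region": region,
        "family": family,
        "store_file": store_file,
    }


def parse_hfile_key(line):
    """Parse one 'K: ... V: ...' line printed by the HFile tool with -p."""
    key_part, _, value = line.strip().partition(" V: ")
    key = key_part[len("K: "):]  # drop the leading "K: " marker
    # Split from the right so a "/" inside the row key does not break parsing.
    row, column, timestamp, kv_type, vlen = key.rsplit("/", 4)
    family, _, qualifier = column.partition(":")
    return {
        "row": row,
        "family": family,
        "qualifier": qualifier,
        "timestamp": int(timestamp),
        "type": kv_type,
        "value_length": int(vlen.partition("=")[2]),
        "value": value,
    }


if __name__ == "__main__":
    print(split_store_file_path(
        "/opt/hbase/data/users/ecff3a77396cba69adea1b1f789ca5a2"
        "/info/4080f890ac4449a2a151d5c4d79f8579",
        "/opt/hbase/data",
    ))
    print(parse_hfile_key("K: abhi/info:age/1351726630581/Put/vlen=2 V: 30"))
```

Splitting the key from the right is a deliberate choice: row keys are arbitrary bytes and may contain a slash, while the trailing timestamp, type and vlen fields have a fixed shape.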
      