Tag: Hadoop

ZooKeeper: znodes

You can view details of your ZooKeeper instance by running zk_dump from the HBase shell.

 

hbase(main):001:0> zk_dump
HBase is rooted at /hbase
Active master address: hbase2,52114,1352384965804
Backup master addresses:
Region server holding ROOT: hbase2,43876,1352384966172
Region servers:
 hbase2,43876,1352384966172
Quorum Server Statistics:
 localhost:2181
  Zookeeper version: 3.4.3-cdh4.1.0--1, built on 09/29/2012 17:54 GMT
  Clients:
   /127.0.0.1:43352[1](queued=0,recved=12,sent=12)
   /127.0.0.1:43146[1](queued=0,recved=2459,sent=2466)
   /127.0.0.1:43147[1](queued=0,recved=2283,sent=2284)
   /127.0.0.1:43354[0](queued=0,recved=1,sent=0)
   /127.0.0.1:43145[1](queued=0,recved=2551,sent=2645)

  Latency min/avg/max: 0/0/104
  Received: 7519
  Sent: 7620
  Outstanding: 0
  Zxid: 0xa9
  Mode: standalone
  Node count: 16

hbase(main):002:0>

HBase creates a set of znodes under its root znode (/hbase in this setup) that hold cluster state such as the cluster ID, the active master and the registered region servers. Let us examine the values they hold using the zookeeper-client tool.

 

abhi@hbase2:~$ zookeeper-client 
Connecting to localhost:2181
2012-11-08 13:56:27,437 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.3-cdh4.1.0--1, built on 09/29/2012 17:54 GMT
2012-11-08 13:56:27,441 [myid:] - INFO  [main:Environment@100] - Client environment:host.name=hbase2
2012-11-08 13:56:27,441 [myid:] - INFO  [main:Environment@100] - Client environment:java.version=1.6.0_31
2012-11-08 13:56:27,442 [myid:] - INFO  [main:Environment@100] - Client environment:java.vendor=Sun Microsystems Inc.
2012-11-08 13:56:27,443 [myid:] - INFO  [main:Environment@100] - Client environment:java.home=/opt/java/jdk1.6.0_31/jre
2012-11-08 13:56:27,444 [myid:] - INFO  [main:Environment@100] - Client environment:java.class.path=/usr/lib/zookeeper/bin/../build/classes:/usr/lib/zookeeper/bin/../build/lib/*.jar:/usr/lib/zookeeper/bin/../lib/slf4j-log4j12-1.6.1.jar:/usr/lib/zookeeper/bin/../lib/slf4j-api-1.6.1.jar:/usr/lib/zookeeper/bin/../lib/netty-3.2.2.Final.jar:/usr/lib/zookeeper/bin/../lib/log4j-1.2.15.jar:/usr/lib/zookeeper/bin/../lib/jline-0.9.94.jar:/usr/lib/zookeeper/bin/../zookeeper-3.4.3-cdh4.1.0.jar:/usr/lib/zookeeper/bin/../src/java/lib/*.jar:/etc/zookeeper/conf::/etc/zookeeper/conf:/usr/lib/zookeeper/zookeeper.jar:/usr/lib/zookeeper/zookeeper-3.4.3-cdh4.1.0.jar:/usr/lib/zookeeper/lib/log4j-1.2.15.jar:/usr/lib/zookeeper/lib/slf4j-api-1.6.1.jar:/usr/lib/zookeeper/lib/slf4j-log4j12-1.6.1.jar:/usr/lib/zookeeper/lib/netty-3.2.2.Final.jar:/usr/lib/zookeeper/lib/jline-0.9.94.jar
2012-11-08 13:56:27,444 [myid:] - INFO  [main:Environment@100] - Client environment:java.library.path=/opt/java/jdk1.6.0_31/jre/lib/amd64/server:/opt/java/jdk1.6.0_31/jre/lib/amd64:/opt/java/jdk1.6.0_31/jre/../lib/amd64:/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2012-11-08 13:56:27,445 [myid:] - INFO  [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2012-11-08 13:56:27,446 [myid:] - INFO  [main:Environment@100] - Client environment:java.compiler=<NA>
2012-11-08 13:56:27,446 [myid:] - INFO  [main:Environment@100] - Client environment:os.name=Linux
2012-11-08 13:56:27,447 [myid:] - INFO  [main:Environment@100] - Client environment:os.arch=amd64
2012-11-08 13:56:27,447 [myid:] - INFO  [main:Environment@100] - Client environment:os.version=3.2.0-29-generic
2012-11-08 13:56:27,448 [myid:] - INFO  [main:Environment@100] - Client environment:user.name=abhi
2012-11-08 13:56:27,449 [myid:] - INFO  [main:Environment@100] - Client environment:user.home=/home/abhi
2012-11-08 13:56:27,449 [myid:] - INFO  [main:Environment@100] - Client environment:user.dir=/home/abhi
2012-11-08 13:56:27,452 [myid:] - INFO  [main:ZooKeeper@433] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@42b988a6
Welcome to ZooKeeper!
2012-11-08 13:56:27,515 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@958] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (Unable to locate a login configuration)
JLine support is enabled
2012-11-08 13:56:27,534 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@850] - Socket connection established to localhost/127.0.0.1:2181, initiating session
2012-11-08 13:56:27,576 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1187] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x13ae06ce9780006, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0] get /hbase/hbaseid
�
   1126@hbase21c3139f4-de24-4d59-9441-755a0e3f572e
cZxid = 0xc
ctime = Thu Nov 08 06:29:26 PST 2012
mZxid = 0xd
mtime = Thu Nov 08 06:29:26 PST 2012
pZxid = 0xc
cversion = 0
dataVersion = 1
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 52
numChildren = 0
[zk: localhost:2181(CONNECTED) 1] get /hbase/master
�
   1126@hbase2hbase2,52114,1352384965804
cZxid = 0x9
ctime = Thu Nov 08 06:29:26 PST 2012
mZxid = 0x9
mtime = Thu Nov 08 06:29:26 PST 2012
pZxid = 0x9
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x13ae06ce9780000
dataLength = 44
numChildren = 0
[zk: localhost:2181(CONNECTED) 2] get /hbase/replication
Node does not exist: /hbase/replication
[zk: localhost:2181(CONNECTED) 3] get /hbase/root-region-server
�
   1126@hbase2hbase2,43876,1352384966172
cZxid = 0x15
ctime = Thu Nov 08 06:29:33 PST 2012
mZxid = 0x15
mtime = Thu Nov 08 06:29:33 PST 2012
pZxid = 0x15
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 42
numChildren = 0
[zk: localhost:2181(CONNECTED) 4] ls /hbase/rs
[hbase2,43876,1352384966172]
[zk: localhost:2181(CONNECTED) 5] get /hbase/shutdown
�
   1126@hbase2Thu Nov 08 06:29:26 PST 2012
cZxid = 0xf
ctime = Thu Nov 08 06:29:26 PST 2012
mZxid = 0xf
mtime = Thu Nov 08 06:29:26 PST 2012
pZxid = 0xf
cversion = 0
dataVersion = 0
aclVersion = 0
ephemeralOwner = 0x0
dataLength = 44
numChildren = 0
[zk: localhost:2181(CONNECTED) 6] ls /hbase/splitlog
[]
[zk: localhost:2181(CONNECTED) 7] ls /hbase/table
[]
[zk: localhost:2181(CONNECTED) 8] ls /hbase/unassigned
[]
[zk: localhost:2181(CONNECTED) 9]
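
One detail worth picking out of the output above is the ephemeralOwner field. In ZooKeeper a non-zero ephemeralOwner is the ID of the session that owns the znode, which means the znode is ephemeral and disappears when that session ends; 0x0 means the znode is persistent. Here /hbase/master is ephemeral while /hbase/hbaseid is not. The following is a minimal sketch that reads the stat lines printed by get and applies that rule. The helper names parse_stat and is_ephemeral are made up for this illustration, and the sketch assumes the data payload itself contains no " = " sequence.

```python
def parse_stat(lines):
    """Parse the 'name = value' stat lines printed by the client's get command."""
    stat = {}
    for line in lines:
        name, sep, value = line.partition(" = ")
        if sep:  # lines without ' = ' (e.g. the data payload) are skipped
            stat[name.strip()] = value.strip()
    return stat


def is_ephemeral(stat):
    """A znode is ephemeral when ephemeralOwner holds a non-zero session ID."""
    return int(stat["ephemeralOwner"], 16) != 0


# Stat lines copied from the get /hbase/master output above.
master = parse_stat([
    "cZxid = 0x9",
    "dataVersion = 0",
    "ephemeralOwner = 0x13ae06ce9780000",
    "dataLength = 44",
])
# Stat lines copied from the get /hbase/hbaseid output above.
hbaseid = parse_stat([
    "cZxid = 0xc",
    "ephemeralOwner = 0x0",
    "dataLength = 52",
])

print("/hbase/master ephemeral:", is_ephemeral(master))
print("/hbase/hbaseid ephemeral:", is_ephemeral(hbaseid))
```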


HBase Major Compaction

This post continues my last two posts, Getting started with HBase and Understanding HBase files and directories.

Each HBase Table has

  • 1 or More Column-families – that group columns and specify the physical layout of data storage
  • 1 or More Regions – akin to shards in the RDBMS world, i.e. a contiguous set of rows of the table bounded by a StartKey and an EndKey

For every Column-family of a table in a region we have a Store which has

  • 1 MemStore – a buffer that holds in-memory modifications (till it is flushed to store files)
  • 0 or More Store files (HFiles) – that get created when MemStore fills up.

These store files are immutable: HBase creates a new file on every MemStore flush, i.e. it never writes to an existing HFile.

Compaction merges the Store files of a Store into fewer, larger Store files so that reads have fewer files to consult. There are two types of compaction.

  • Minor Compaction – combines several of the smaller Store files of a Store into fewer, larger Store files
  • Major Compaction – reads all the Store files of each Store (i.e. each Column-family) in a Region and writes them out as a single Store file per Store
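
The flush and compaction behaviour described above can be modelled in a few lines. This is only a toy sketch, not how HBase is implemented: the Store class, its method names and the use of plain tuples as "store files" are all assumptions made for illustration, and it ignores timestamps, versions and deletes. It shows that every flush adds one immutable file and that a major compaction leaves a single sorted file.

```python
class Store:
    """Toy model of one Store: a MemStore plus a list of immutable store files."""

    def __init__(self):
        self.memstore = {}     # (row, column) -> value, held in memory
        self.store_files = []  # each flush appends one immutable, sorted file

    def put(self, row, column, value):
        self.memstore[(row, column)] = value

    def flush(self):
        """Write the MemStore out as a new store file; old files are untouched."""
        if self.memstore:
            self.store_files.append(tuple(sorted(self.memstore.items())))
            self.memstore = {}

    def major_compact(self):
        """Rewrite all store files into a single sorted store file."""
        if not self.store_files:
            return
        merged = {}
        for store_file in self.store_files:  # oldest first, so newer cells win
            merged.update(store_file)
        self.store_files = [tuple(sorted(merged.items()))]


store = Store()
store.put("abhi", "info:name", "abhishek")
store.put("abhi", "info:age", "30")
store.flush()
store.put("avi", "info:name", "avinash")
store.flush()
store.put("avi", "info:age", "20")
store.flush()
files_before = len(store.store_files)

store.major_compact()
files_after = len(store.store_files)

print("store files before:", files_before, "after:", files_after)
for key, value in store.store_files[0]:
    print(key, value)
```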

Let us see how Major Compaction impacts HBase storage.

Create a table and insert data.


hbase(main):021:0> create 'users','info'
0 row(s) in 1.0540 seconds

hbase(main):022:0> list
TABLE
tbl1
users
2 row(s) in 0.0160 seconds

hbase(main):023:0> put 'users','abhi','info:name','abhishek'
0 row(s) in 0.0730 seconds

hbase(main):024:0> put 'users','abhi','info:age','30'
0 row(s) in 0.0120 seconds

Let us browse the HBase Root Directory and see how the data gets persisted physically on the filesystem.


abhi@hbase2:~$ ls -ltha /opt/hbase/data/
total 48K
drwxrwxr-x 4 hbase hbase 4.0K Nov  3 14:50 users
drwxr-xr-x 8 hbase users 4.0K Nov  3 14:50 .
drwxrwxr-x 4 hbase hbase 4.0K Nov  3 07:43 tbl1
drwxrwxr-x 2 hbase hbase 4.0K Nov  3 05:35 .oldlogs
drwxrwxr-x 3 hbase hbase 4.0K Nov  3 05:34 .logs
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 12:00 -ROOT-
drwxrwxr-x 3 hbase hbase 4.0K Oct 30 12:00 .META.
-rwxr-xr-x 1 hbase hbase   38 Oct 30 12:00 hbase.id
-rw-rw-r-- 1 hbase hbase   12 Oct 30 12:00 .hbase.id.crc
-rwxr-xr-x 1 hbase hbase    3 Oct 30 12:00 hbase.version
-rw-rw-r-- 1 hbase hbase   12 Oct 30 12:00 .hbase.version.crc
drwxr-xr-x 3 abhi  users 4.0K Oct 11 08:10 ..
abhi@hbase2:~$
abhi@hbase2:~$ ls -ltha /opt/hbase/data/users/
total 24K
drwxrwxr-x 4 hbase hbase 4.0K Nov  3 14:50 6dda0024cbf8619a9c823e6ebbf78888
drwxrwxr-x 4 hbase hbase 4.0K Nov  3 14:50 .
-rwxr-xr-x 1 hbase hbase  515 Nov  3 14:50 .tableinfo.0000000001
-rw-rw-r-- 1 hbase hbase   16 Nov  3 14:50 ..tableinfo.0000000001.crc
drwxrwxr-x 2 hbase hbase 4.0K Nov  3 14:50 .tmp
drwxr-xr-x 8 hbase users 4.0K Nov  3 14:50 ..
abhi@hbase2:~$ ls -ltha /opt/hbase/data/users/6dda0024cbf8619a9c823e6ebbf78888/
total 24K
drwxrwxr-x 4 hbase hbase 4.0K Nov  3 14:50 .
drwxrwxr-x 2 hbase hbase 4.0K Nov  3 14:50 .oldlogs
drwxrwxr-x 2 hbase hbase 4.0K Nov  3 14:50 info
-rwxr-xr-x 1 hbase hbase  222 Nov  3 14:50 .regioninfo
-rw-rw-r-- 1 hbase hbase   12 Nov  3 14:50 ..regioninfo.crc
drwxrwxr-x 4 hbase hbase 4.0K Nov  3 14:50 ..
abhi@hbase2:~$ ls -ltha /opt/hbase/data/users/6dda0024cbf8619a9c823e6ebbf78888/info/
total 8.0K
drwxrwxr-x 4 hbase hbase 4.0K Nov  3 14:50 ..
drwxrwxr-x 2 hbase hbase 4.0K Nov  3 14:50 .

As you can see above, HBase created

  • a directory ‘users’ for the table and under it
  • a sub-directory ‘6dda0024cbf8619a9c823e6ebbf78888’ for the Region and under it
  • a sub-directory ‘info’ for the Column-family

All modifications to columns of the ‘info’ column-family in this region are persisted as store files under ‘/opt/hbase/data/users/6dda0024cbf8619a9c823e6ebbf78888/info/’.
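
The layout above (root directory, then table, then region, then column-family) is easy to express as a path. The sketch below builds that path and lists the store files in it, skipping dot-files such as the .crc checksum files. The function names store_dir and store_files and the file name storefile_a are made up for this illustration, and the demo runs against a temporary directory rather than a real HBase root.

```python
import tempfile
from pathlib import Path


def store_dir(root, table, region, family):
    """Directory holding the store files of one column-family of one region."""
    return Path(root) / table / region / family


def store_files(root, table, region, family):
    """Names of the store files in that directory, skipping dot-files."""
    directory = store_dir(root, table, region, family)
    return sorted(p.name for p in directory.iterdir()
                  if p.is_file() and not p.name.startswith("."))


region = "6dda0024cbf8619a9c823e6ebbf78888"

# Build a throwaway tree that mimics the layout, with made-up file names.
with tempfile.TemporaryDirectory() as root:
    info = store_dir(root, "users", region, "info")
    info.mkdir(parents=True)
    (info / "storefile_a").touch()
    (info / ".storefile_a.crc").touch()
    found = store_files(root, "users", region, "info")

print(found)
```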

Although we entered data in the table, we don’t see any store files yet because all the data is still in the MemStore and has not been flushed. So let us flush the MemStore and view the contents of the ‘info’ directory.


hbase(main):025:0> flush 'users'
0 row(s) in 0.0390 seconds

abhi@hbase2:~$ ls -ltha /opt/hbase/data/users/6dda0024cbf8619a9c823e6ebbf78888/info/
total 16K
drwxrwxr-x 2 hbase hbase 4.0K Nov  3 14:52 .
drwxrwxr-x 5 hbase hbase 4.0K Nov  3 14:52 ..
-rwxrwxrwx 1 hbase hbase  660 Nov  3 14:52 32f19d12583a46b98211ee77311f48eb
-rw-rw-r-- 1 hbase hbase   16 Nov  3 14:52 .32f19d12583a46b98211ee77311f48eb.crc

Notice how the store file /opt/hbase/data/users/6dda0024cbf8619a9c823e6ebbf78888/info/32f19d12583a46b98211ee77311f48eb got created. Let us add some more data to our table and look at the filesystem again.


hbase(main):026:0> put 'users','avi','info:name','avinash'
0 row(s) in 0.0050 seconds

hbase(main):027:0> flush 'users'
0 row(s) in 0.0490 seconds
abhi@hbase2:~$ ls -ltha /opt/hbase/data/users/6dda0024cbf8619a9c823e6ebbf78888/info/
total 24K
drwxrwxr-x 2 hbase hbase 4.0K Nov  3 14:52 .
-rwxrwxrwx 1 hbase hbase  623 Nov  3 14:52 ecc5f02da6234ac397d25bee6df0d019
-rw-rw-r-- 1 hbase hbase   16 Nov  3 14:52 .ecc5f02da6234ac397d25bee6df0d019.crc
drwxrwxr-x 5 hbase hbase 4.0K Nov  3 14:52 ..
-rwxrwxrwx 1 hbase hbase  660 Nov  3 14:52 32f19d12583a46b98211ee77311f48eb
-rw-rw-r-- 1 hbase hbase   16 Nov  3 14:52 .32f19d12583a46b98211ee77311f48eb.crc

Let us add some more data.

hbase(main):028:0> put 'users','avi','info:age','20'
0 row(s) in 0.0040 seconds

hbase(main):029:0> flush 'users'
0 row(s) in 0.1040 seconds
abhi@hbase2:~$ ls -ltha /opt/hbase/data/users/6dda0024cbf8619a9c823e6ebbf78888/info/
total 32K
drwxrwxr-x 2 hbase hbase 4.0K Nov  3 14:53 .
-rwxrwxrwx 1 hbase hbase  615 Nov  3 14:53 ebda0cc0af9a4d9e803a10cce27c52b6
-rw-rw-r-- 1 hbase hbase   16 Nov  3 14:53 .ebda0cc0af9a4d9e803a10cce27c52b6.crc
-rwxrwxrwx 1 hbase hbase  623 Nov  3 14:52 ecc5f02da6234ac397d25bee6df0d019
-rw-rw-r-- 1 hbase hbase   16 Nov  3 14:52 .ecc5f02da6234ac397d25bee6df0d019.crc
drwxrwxr-x 5 hbase hbase 4.0K Nov  3 14:52 ..
-rwxrwxrwx 1 hbase hbase  660 Nov  3 14:52 32f19d12583a46b98211ee77311f48eb
-rw-rw-r-- 1 hbase hbase   16 Nov  3 14:52 .32f19d12583a46b98211ee77311f48eb.crc
abhi@hbase2:~$

Notice how for each flush, a new store file gets created. Let us view the contents of these store files.

abhi@hbase2:~$ hbase org.apache.hadoop.hbase.io.hfile.HFile -f /opt/hbase/data/users/6dda0024cbf8619a9c823e6ebbf78888/info/ebda0cc0af9a4d9e803a10cce27c52b6 -p
12/11/03 14:55:59 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
12/11/03 14:55:59 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
12/11/03 14:56:00 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 247.9m
K: avi/info:age/1351979593884/Put/vlen=2 V: 20
Scanned kv count -> 1
abhi@hbase2:~$ hbase org.apache.hadoop.hbase.io.hfile.HFile -f /opt/hbase/data/users/6dda0024cbf8619a9c823e6ebbf78888/info/ecc5f02da6234ac397d25bee6df0d019 -p
12/11/03 14:56:19 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
12/11/03 14:56:19 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
12/11/03 14:56:20 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 247.9m
K: avi/info:name/1351979559394/Put/vlen=7 V: avinash
Scanned kv count -> 1
abhi@hbase2:~$ hbase org.apache.hadoop.hbase.io.hfile.HFile -f /opt/hbase/data/users/6dda0024cbf8619a9c823e6ebbf78888/info/32f19d12583a46b98211ee77311f48eb -p
12/11/03 14:56:31 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
12/11/03 14:56:31 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
12/11/03 14:56:31 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 247.9m
K: abhi/info:age/1351979477099/Put/vlen=2 V: 30
K: abhi/info:name/1351979467158/Put/vlen=8 V: abhishek
Scanned kv count -> 2
abhi@hbase2:~$

An alternative way to view the store file contents is to use the long option names:

abhi@hbase2:~$ hbase org.apache.hadoop.hbase.io.hfile.HFile --printkv --file /opt/hbase/data/users/6dda0024cbf8619a9c823e6ebbf78888/info/ebda0cc0af9a4d9e803a10cce27c52b6
12/11/03 14:56:57 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
12/11/03 14:56:57 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
12/11/03 14:56:58 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 247.9m
K: avi/info:age/1351979593884/Put/vlen=2 V: 20
Scanned kv count -> 1
abhi@hbase2:~$
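
Each line printed by the HFile tool above has the shape row/family:qualifier/timestamp/type/vlen=N followed by the value. A minimal sketch of a parser for those lines is given below. The names Cell and parse_cell are made up for this illustration, and the regular expression assumes that row keys contain no "/" and family names contain no ":", which holds for the sample data here but not necessarily for real tables.

```python
import re
from collections import namedtuple

Cell = namedtuple("Cell", "row family qualifier timestamp type value")

# Matches lines such as: K: avi/info:age/1351979593884/Put/vlen=2 V: 20
LINE = re.compile(
    r"^K: (?P<row>[^/]+)/(?P<family>[^:]+):(?P<qualifier>[^/]*)/"
    r"(?P<timestamp>\d+)/(?P<type>\w+)/vlen=(?P<vlen>\d+) V: (?P<value>.*)$"
)


def parse_cell(line):
    """Split one printed key/value line into its parts, or return None."""
    match = LINE.match(line.strip())
    if match is None:
        return None  # log lines and the 'Scanned kv count' summary land here
    return Cell(match["row"], match["family"], match["qualifier"],
                int(match["timestamp"]), match["type"], match["value"])


cell = parse_cell("K: avi/info:age/1351979593884/Put/vlen=2 V: 20")
print(cell)
print(parse_cell("Scanned kv count -> 1"))
```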

Let us invoke Major Compaction to combine these files into a single new file.

hbase(main):030:0> major_compact 'users'
0 row(s) in 0.1000 seconds

hbase(main):031:0>
abhi@hbase2:~$
abhi@hbase2:~$ ls -ltha /opt/hbase/data/users/6dda0024cbf8619a9c823e6ebbf78888/info/
total 16K
drwxrwxr-x 2 hbase hbase 4.0K Nov  3 14:57 .
-rwxrwxrwx 1 hbase hbase  731 Nov  3 14:57 6a65463fa2814751b255fdcf1542cd0d
-rw-rw-r-- 1 hbase hbase   16 Nov  3 14:57 .6a65463fa2814751b255fdcf1542cd0d.crc
drwxrwxr-x 5 hbase hbase 4.0K Nov  3 14:52 ..
abhi@hbase2:~$

Let us view the contents of the new file that got created as a result of major compaction.

abhi@hbase2:~$
abhi@hbase2:~$ hbase org.apache.hadoop.hbase.io.hfile.HFile -f /opt/hbase/data/users/6dda0024cbf8619a9c823e6ebbf78888/info/6a65463fa2814751b255fdcf1542cd0d -p
12/11/03 14:58:23 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
12/11/03 14:58:23 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
12/11/03 14:58:23 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 247.9m
K: abhi/info:age/1351979477099/Put/vlen=2 V: 30
K: abhi/info:name/1351979467158/Put/vlen=8 V: abhishek
K: avi/info:age/1351979593884/Put/vlen=2 V: 20
K: avi/info:name/1351979559394/Put/vlen=7 V: avinash
Scanned kv count -> 4
abhi@hbase2:~$
abhi@hbase2:~$

Understanding HBase files and directories

This post continues my last post – Getting started with HBase.

HBase physically stores data in the specified Root Directory on the filesystem. The filesystem is typically HDFS but since I have installed HBase in the stand-alone mode, I am using the local filesystem.

Now let's examine the contents of our HBase Root Directory.

Note: Flush your tables first so that the data held in the MemStore gets written out as store files on the filesystem.


abhi@hbase2:~$ ls -ltha /opt/hbase/data/
total 48K
drwxrwxr-x 2 hbase hbase 4.0K Oct 31 14:32 .oldlogs
drwxrwxr-x 3 hbase hbase 4.0K Oct 31 14:31 .logs
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 17:10 table1
drwxr-xr-x 8 hbase users 4.0K Oct 30 17:10 .
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 15:36 users
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 12:00 -ROOT-
drwxrwxr-x 3 hbase hbase 4.0K Oct 30 12:00 .META.
-rwxr-xr-x 1 hbase hbase   38 Oct 30 12:00 hbase.id
-rw-rw-r-- 1 hbase hbase   12 Oct 30 12:00 .hbase.id.crc
-rwxr-xr-x 1 hbase hbase    3 Oct 30 12:00 hbase.version
-rw-rw-r-- 1 hbase hbase   12 Oct 30 12:00 .hbase.version.crc
drwxr-xr-x 3 abhi  users 4.0K Oct 11 08:10 ..
abhi@hbase2:~$
abhi@hbase2:~$
abhi@hbase2:~$ ls -lthaR /opt/hbase/data/
/opt/hbase/data/:
total 48K
drwxrwxr-x 2 hbase hbase 4.0K Oct 31 14:32 .oldlogs
drwxrwxr-x 3 hbase hbase 4.0K Oct 31 14:31 .logs
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 17:10 table1
drwxr-xr-x 8 hbase users 4.0K Oct 30 17:10 .
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 15:36 users
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 12:00 -ROOT-
drwxrwxr-x 3 hbase hbase 4.0K Oct 30 12:00 .META.
-rwxr-xr-x 1 hbase hbase   38 Oct 30 12:00 hbase.id
-rw-rw-r-- 1 hbase hbase   12 Oct 30 12:00 .hbase.id.crc
-rwxr-xr-x 1 hbase hbase    3 Oct 30 12:00 hbase.version
-rw-rw-r-- 1 hbase hbase   12 Oct 30 12:00 .hbase.version.crc
drwxr-xr-x 3 abhi  users 4.0K Oct 11 08:10 ..

/opt/hbase/data/.oldlogs:
total 8.0K
drwxrwxr-x 2 hbase hbase 4.0K Oct 31 14:32 .
drwxr-xr-x 8 hbase users 4.0K Oct 30 17:10 ..

/opt/hbase/data/.logs:
total 12K
drwxrwxr-x 2 hbase hbase 4.0K Oct 31 14:31 hbase2,54165,1351719115872
drwxrwxr-x 3 hbase hbase 4.0K Oct 31 14:31 .
drwxr-xr-x 8 hbase users 4.0K Oct 30 17:10 ..

/opt/hbase/data/.logs/hbase2,54165,1351719115872:
total 8.0K
drwxrwxr-x 2 hbase hbase 4.0K Oct 31 14:31 .
-rwxr-xr-x 1 hbase hbase    0 Oct 31 14:31 .hbase2%2C54165%2C1351719115872.1351719119755.crc
-rwxr-xr-x 1 hbase hbase    0 Oct 31 14:31 hbase2%2C54165%2C1351719115872.1351719119755
drwxrwxr-x 3 hbase hbase 4.0K Oct 31 14:31 ..

/opt/hbase/data/table1:
total 24K
drwxrwxr-x 5 hbase hbase 4.0K Oct 31 14:32 9a35a1636b9d0639e2838c5a8ff180cf
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 17:10 .
drwxr-xr-x 8 hbase users 4.0K Oct 30 17:10 ..
-rwxr-xr-x 1 hbase hbase  935 Oct 30 17:10 .tableinfo.0000000001
-rw-rw-r-- 1 hbase hbase   16 Oct 30 17:10 ..tableinfo.0000000001.crc
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:10 .tmp

/opt/hbase/data/table1/9a35a1636b9d0639e2838c5a8ff180cf:
total 28K
drwxrwxr-x 5 hbase hbase 4.0K Oct 31 14:32 .
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 cf2
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 cf1
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:10 .oldlogs
-rwxr-xr-x 1 hbase hbase  225 Oct 30 17:10 .regioninfo
-rw-rw-r-- 1 hbase hbase   12 Oct 30 17:10 ..regioninfo.crc
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 17:10 ..

/opt/hbase/data/table1/9a35a1636b9d0639e2838c5a8ff180cf/cf2:
total 16K
drwxrwxr-x 5 hbase hbase 4.0K Oct 31 14:32 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 .
-rwxrwxrwx 1 hbase hbase  790 Oct 30 17:35 cbca7d4b4619453e95e313e54fd12649
-rw-rw-r-- 1 hbase hbase   16 Oct 30 17:35 .cbca7d4b4619453e95e313e54fd12649.crc

/opt/hbase/data/table1/9a35a1636b9d0639e2838c5a8ff180cf/cf1:
total 16K
drwxrwxr-x 5 hbase hbase 4.0K Oct 31 14:32 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 .
-rwxrwxrwx 1 hbase hbase  848 Oct 30 17:35 c11f3c3fe30e437c907e7b4656bbb6a8
-rw-rw-r-- 1 hbase hbase   16 Oct 30 17:35 .c11f3c3fe30e437c907e7b4656bbb6a8.crc

/opt/hbase/data/table1/9a35a1636b9d0639e2838c5a8ff180cf/.oldlogs:
total 16K
drwxrwxr-x 5 hbase hbase 4.0K Oct 31 14:32 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:10 .
-rwxr-xr-x 1 hbase hbase  124 Oct 30 17:10 hlog.1351642227144
-rwxr-xr-x 1 hbase hbase   12 Oct 30 17:10 .hlog.1351642227144.crc

/opt/hbase/data/table1/.tmp:
total 8.0K
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 17:10 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:10 .

/opt/hbase/data/users:
total 24K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 ecff3a77396cba69adea1b1f789ca5a2
drwxr-xr-x 8 hbase users 4.0K Oct 30 17:10 ..
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 15:36 .
-rwxr-xr-x 1 hbase hbase  515 Oct 30 15:36 .tableinfo.0000000001
-rw-rw-r-- 1 hbase hbase   16 Oct 30 15:36 ..tableinfo.0000000001.crc
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 15:36 .tmp

/opt/hbase/data/users/ecff3a77396cba69adea1b1f789ca5a2:
total 24K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 .
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 info
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 15:36 .oldlogs
-rwxr-xr-x 1 hbase hbase  222 Oct 30 15:36 .regioninfo
-rw-rw-r-- 1 hbase hbase   12 Oct 30 15:36 ..regioninfo.crc
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 15:36 ..

/opt/hbase/data/users/ecff3a77396cba69adea1b1f789ca5a2/info:
total 16K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 .
-rwxrwxrwx 1 hbase hbase 1.3K Oct 30 17:35 4080f890ac4449a2a151d5c4d79f8579
-rw-rw-r-- 1 hbase hbase   20 Oct 30 17:35 .4080f890ac4449a2a151d5c4d79f8579.crc

/opt/hbase/data/users/ecff3a77396cba69adea1b1f789ca5a2/.oldlogs:
total 16K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 15:36 .
-rwxr-xr-x 1 hbase hbase  124 Oct 30 15:36 hlog.1351636576499
-rwxr-xr-x 1 hbase hbase   12 Oct 30 15:36 .hlog.1351636576499.crc

/opt/hbase/data/users/.tmp:
total 8.0K
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 15:36 .
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 15:36 ..

/opt/hbase/data/-ROOT-:
total 24K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 70236052
drwxr-xr-x 8 hbase users 4.0K Oct 30 17:10 ..
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 12:00 .
-rwxr-xr-x 1 hbase hbase  551 Oct 30 12:00 .tableinfo.0000000001
-rw-rw-r-- 1 hbase hbase   16 Oct 30 12:00 ..tableinfo.0000000001.crc
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 12:00 .tmp

/opt/hbase/data/-ROOT-/70236052:
total 24K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 .
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 info
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 12:00 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 12:00 .oldlogs
-rwxr-xr-x 1 hbase hbase  109 Oct 30 12:00 .regioninfo
-rw-rw-r-- 1 hbase hbase   12 Oct 30 12:00 ..regioninfo.crc

/opt/hbase/data/-ROOT-/70236052/info:
total 32K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 .
-rwxrwxrwx 1 hbase hbase  718 Oct 30 17:35 fb48fa0302be4d37a5b70ffbf039fe9a
-rw-rw-r-- 1 hbase hbase   16 Oct 30 17:35 .fb48fa0302be4d37a5b70ffbf039fe9a.crc
-rwxrwxrwx 1 hbase hbase  718 Oct 30 12:11 a913edee0ac34de490c46ee12175dc02
-rw-rw-r-- 1 hbase hbase   16 Oct 30 12:11 .a913edee0ac34de490c46ee12175dc02.crc
-rwxrwxrwx 1 hbase hbase  714 Oct 30 12:00 c6f09dc3ee6a4150b8e787a747a81707
-rw-rw-r-- 1 hbase hbase   16 Oct 30 12:00 .c6f09dc3ee6a4150b8e787a747a81707.crc

/opt/hbase/data/-ROOT-/70236052/.oldlogs:
total 16K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 12:00 .
-rwxr-xr-x 1 hbase hbase  411 Oct 30 12:00 hlog.1351623609149
-rwxr-xr-x 1 hbase hbase   12 Oct 30 12:00 .hlog.1351623609149.crc

/opt/hbase/data/-ROOT-/.tmp:
total 8.0K
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 12:00 .
drwxrwxr-x 4 hbase hbase 4.0K Oct 30 12:00 ..

/opt/hbase/data/.META.:
total 12K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 1028785192
drwxr-xr-x 8 hbase users 4.0K Oct 30 17:10 ..
drwxrwxr-x 3 hbase hbase 4.0K Oct 30 12:00 .

/opt/hbase/data/.META./1028785192:
total 24K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 .
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 info
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 12:00 .oldlogs
-rwxr-xr-x 1 hbase hbase  111 Oct 30 12:00 .regioninfo
-rw-rw-r-- 1 hbase hbase   12 Oct 30 12:00 ..regioninfo.crc
drwxrwxr-x 3 hbase hbase 4.0K Oct 30 12:00 ..

/opt/hbase/data/.META./1028785192/info:
total 16K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 17:35 .
-rwxrwxrwx 1 hbase hbase 2.8K Oct 30 17:35 de44bdf76ce6477ba3a7da1df0b159df
-rw-rw-r-- 1 hbase hbase   32 Oct 30 17:35 .de44bdf76ce6477ba3a7da1df0b159df.crc

/opt/hbase/data/.META./1028785192/.oldlogs:
total 16K
drwxrwxr-x 4 hbase hbase 4.0K Oct 31 14:32 ..
drwxrwxr-x 2 hbase hbase 4.0K Oct 30 12:00 .
-rwxr-xr-x 1 hbase hbase  124 Oct 30 12:00 hlog.1351623609390
-rwxr-xr-x 1 hbase hbase   12 Oct 30 12:00 .hlog.1351623609390.crc
abhi@hbase2:~$

In the above:
.logs and .oldlogs – contain the Write-Ahead Log (WAL) files. Each RegionServer writes one WAL that is shared by all the regions it hosts.

  • The .logs directory has a subdirectory for each RegionServer e.g. /opt/hbase/data/.logs/hbase2,54165,1351719115872.
    The RegionServer subdirectory name is of the format [RegionServer Host],[Port],[Server Start Code]

    Each RegionServer subdirectory holds the HLog files. You can view the contents of an HLog file using the org.apache.hadoop.hbase.regionserver.wal.HLog tool.

    abhi@hbase2:~$ hbase org.apache.hadoop.hbase.regionserver.wal.HLog --dump /opt/hbase/data/.logs/hbase2,54165,1351719115872/hbase2%2C54165%2C1351719115872.1351719119755
    12/11/05 13:45:37 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
    12/11/05 13:45:37 INFO wal.SequenceFileLogReader: Input stream class: org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker, not adjusting length
    Sequence 618 from region 70236052 in table -ROOT-
      Action:
        row: .META.,,1
        column: info:server
        at time: Mon Nov 05 02:01:20 PST 2012
      Action:
        row: .META.,,1
        column: info:serverstartcode
        at time: Mon Nov 05 02:01:20 PST 2012
    Sequence 619 from region 1028785192 in table .META.
      Action:
        row: tbl1,,1351953259243.5ab545fc59596f7784eb179df4654930.
        column: info:server
        at time: Mon Nov 05 02:01:21 PST 2012
      Action:
        row: tbl1,,1351953259243.5ab545fc59596f7784eb179df4654930.
        column: info:serverstartcode
        at time: Mon Nov 05 02:01:21 PST 2012
    
    ... ... ... ... ...
    
    Sequence 637 from region 5ab545fc59596f7784eb179df4654930 in table tbl1
      Action:
        row: row3
        column: cf2:col2
        at time: Mon Nov 05 03:00:53 PST 2012
    Sequence 638 from region 5ab545fc59596f7784eb179df4654930 in table tbl1
      Action:
        row: row3
        column: cf3:col1
        at time: Mon Nov 05 03:00:54 PST 2012
    abhi@hbase2:~$
    
    
  • The .oldlogs directory contains the old log files, i.e. the ones whose edits have already been persisted to the store files
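
The naming scheme described above can be taken apart programmatically. The sketch below splits a RegionServer directory name into host, port and start code, and also decodes an HLog file name, in which the commas appear URL-encoded as %2C. The names ServerName, parse_server_name and parse_wal_file_name are made up for this illustration. Treating the start code as milliseconds since the epoch is an assumption, although it agrees with the directory timestamp in the listing above.

```python
from collections import namedtuple
from datetime import datetime, timezone
from urllib.parse import unquote

ServerName = namedtuple("ServerName", "host port start_code")


def parse_server_name(name):
    """Split 'host,port,startcode' into typed fields."""
    host, port, start_code = name.split(",")
    return ServerName(host, int(port), int(start_code))


def parse_wal_file_name(file_name):
    """Decode '<host>%2C<port>%2C<startcode>.<suffix>' into (ServerName, suffix)."""
    server, _, suffix = unquote(file_name).rpartition(".")
    return parse_server_name(server), int(suffix)


server = parse_server_name("hbase2,54165,1351719115872")
print(server)

# Assumption: the start code is the server start time in epoch milliseconds.
print(datetime.fromtimestamp(server.start_code / 1000, tz=timezone.utc))

wal_server, wal_suffix = parse_wal_file_name(
    "hbase2%2C54165%2C1351719115872.1351719119755")
print(wal_server, wal_suffix)
```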

-ROOT- and .META. – contain the files related to the catalog tables
hbase.id and hbase.version – hold the unique ID of the cluster and the file format version respectively

users and table1 – hold the store files for the user-defined tables. Each table has its own directory which has the following contents:

  • .tableinfo file that contains the table and column family schemas
    /opt/hbase/data/users/.tableinfo.0000000001
    You can view the contents of the file as follows:

    abhi@hbase2:~$ cat /opt/hbase/data/users/.tableinfo.0000000001
    MIN_VERSIONS0TTL
    2147483647      BLOCKSIZE65536  IN_MEMORYfalse
    BLOCKCACHEtrue
    
    {NAME => 'users', FAMILIES => [{NAME => 'info', BLOOMFILTER => 'NONE', REPLICATION_SCOPE => '0', VERSIONS => '3', COMPRESSION => 'NONE', MIN_VERSIONS => '0', TTL => '2147483647', BLOCKSIZE => '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}
    abhi@hbase2:~$
    
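  • Region directories – a directory for each region of a table. The directory name is the MD5 hash of the region name (the catalog tables -ROOT- and .META. use a numeric directory name instead, as the listing above shows)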
    /opt/hbase/data/users/ecff3a77396cba69adea1b1f789ca5a2

      A Region directory contains

    • .regioninfo – file that contains serialized information of a Region
      /opt/hbase/data/users/ecff3a77396cba69adea1b1f789ca5a2/.regioninfo
    • Column-Family directories – a directory for each Column-Family that holds the actual storage file of a table
      /opt/hbase/data/users/ecff3a77396cba69adea1b1f789ca5a2/info

      The store files are in HFile format.
      /opt/hbase/data/users/ecff3a77396cba69adea1b1f789ca5a2/info/4080f890ac4449a2a151d5c4d79f8579

      The figure below describes the path to the actual storage file.

      [Figure: Path to the Actual Storage File – HBase Root Directory / table / region / column-family / store file]

      NOTE: If you have inserted data in your table and yet you don’t see any storage files under the column-family directory, do a flush on your table i.e. flush 'users'

      We can view the contents of a store file using the org.apache.hadoop.hbase.io.hfile.HFile tool.

      abhi@hbase2:~$ hbase org.apache.hadoop.hbase.io.hfile.HFile -f /opt/hbase/data/users/ecff3a77396cba69adea1b1f789ca5a2/info/4080f890ac4449a2a151d5c4d79f8579 -p
      12/10/31 17:02:04 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
      12/10/31 17:02:04 WARN conf.Configuration: fs.default.name is deprecated. Instead, use fs.defaultFS
      12/10/31 17:02:04 INFO hfile.CacheConfig: Allocating LruBlockCache with maximum size 247.9m
      K: abhi/info:age/1351726630581/Put/vlen=2 V: 30
      K: abhi/info:name/1351726623818/Put/vlen=8 V: abhishek
      Scanned kv count -> 2
      abhi@hbase2:~$
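
As mentioned above, the region directory name is an MD5 hash of the region name. The sketch below shows the hashing step only. It assumes that the hashed string is "table,startkey,regionid", which is what the region names in the HLog dump above look like before the final dot-separated hash; that input format, and the function name encoded_region_name, are assumptions and have not been checked against the HBase source, so the printed value may not match the directory name HBase actually generates.

```python
import hashlib


def encoded_region_name(table, start_key, region_id):
    """MD5 hex digest of 'table,start_key,region_id' (assumed input format)."""
    region_name = f"{table},{start_key},{region_id}"
    return hashlib.md5(region_name.encode("utf-8")).hexdigest()


# Table name and region ID taken from the HLog dump above; the start key is empty.
name = encoded_region_name("tbl1", "", 1351953259243)
print(name)
print(len(name), "hex characters, the same length as the region directory names")
```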
      

Getting started with HBase

There are quite a few HBase tutorials out there, but I wanted to add another one for two specific reasons:
1. To document the installation steps (HBase stand-alone mode) for my ready reference in future
2. To highlight the installation issues I faced and how I got around them so that anyone else facing the same can benefit.

Installation Steps

Download and install the latest version of Ubuntu from http://www.ubuntu.com/download

In my case I downloaded and installed Ubuntu 12.04.1 64-bit i.e. ubuntu-12.04.1-desktop-amd64.iso

Download and install the Java SDK from http://www.oracle.com/technetwork/java/javase/downloads/index.html

Since Cloudera recommends JDK version 1.6.0_31 (https://ccp.cloudera.com/display/CDH4DOC/Java+Development+Kit+Installation), I downloaded that version and installed it as follows:

root@ubuntu:~# mkdir /opt/java
root@ubuntu:~# cd /opt/java/
root@ubuntu:/opt/java# chmod +x jdk-6u31-linux-x64.bin 
root@ubuntu:/opt/java# ./jdk-6u31-linux-x64.bin 
... ... ... ...
... ... ... ...
root@ubuntu:/opt/java#
root@ubuntu:/opt/java# ls -lth jdk1.6.0_31/
total 19M
-r--r--r--  1 root root 4.7K Oct  6 14:32 register_zh_CN.html
-r--r--r--  1 root root 5.1K Oct  6 14:32 register.html
-r--r--r--  1 root root 6.5K Oct  6 14:32 register_ja.html
drwxr-xr-x  7 root root 4.0K Oct  6 14:32 jre
drwxr-xr-x  3 root root 4.0K Oct  6 14:32 lib
drwxr-xr-x  7 root root 4.0K Jan 20  2012 db
drwxr-xr-x  3 root root 4.0K Jan 20  2012 include
drwxr-xr-x  9 root root 4.0K Jan 20  2012 sample
drwxr-xr-x 10 root root 4.0K Jan 20  2012 demo
drwxr-xr-x  4 root root 4.0K Jan 20  2012 man
drwxr-xr-x  2 root root 4.0K Jan 20  2012 bin
-r--r--r--  1 root root 3.3K Jan 20  2012 COPYRIGHT
-r--r--r--  1 root root   40 Jan 20  2012 LICENSE
-r--r--r--  1 root root  115 Jan 20  2012 README.html
-r--r--r--  1 root root 165K Jan 20  2012 THIRDPARTYLICENSEREADME.txt
-rw-r--r--  1 root root  19M Jan 20  2012 src.zip
root@ubuntu:/opt/java# 

Set the JAVA_HOME environment variable

root@ubuntu:~# echo $JAVA_HOME
root@ubuntu:~#
root@ubuntu:~# vi .bashrc 

Add the following lines, save and exit:

export JAVA_HOME=/opt/java/jdk1.6.0_31
export PATH=$JAVA_HOME/bin:$PATH

Verify that the settings took effect:

root@ubuntu:~# source .bashrc 
root@ubuntu:~# echo $JAVA_HOME
/opt/java/jdk1.6.0_31

root@ubuntu:~#
root@ubuntu:~# java -version
java version "1.6.0_31"
Java(TM) SE Runtime Environment (build 1.6.0_31-b04)
Java HotSpot(TM) 64-Bit Server VM (build 20.6-b01, mixed mode)
root@ubuntu:~# 

Configure the CDH repositories

root@ubuntu:~# mkdir /opt/hbase
root@ubuntu:~# cd /opt/hbase/

Download http://archive.cloudera.com/cdh4/one-clickinstall/precise/amd64/cdh4-repository_1.0_all.deb into /opt/hbase

root@ubuntu:/opt/hbase# ls -lth
total 4.0K
-rw-r--r-- 1 root root 3.3K Oct  6 14:38 cdh4-repository_1.0_all.deb
root@ubuntu:/opt/hbase# 
root@ubuntu:/opt/hbase# sudo dpkg -i cdh4-repository_1.0_all.deb 
Selecting previously unselected package cdh4-repository.
(Reading database ... 140999 files and directories currently installed.)
Unpacking cdh4-repository (from cdh4-repository_1.0_all.deb) ...
Setting up cdh4-repository (1.0) ...
gpg: keyring `/etc/apt/secring.gpg' created
gpg: keyring `/etc/apt/trusted.gpg.d/cloudera-cdh4.gpg' created
gpg: key 02A818DD: public key "Cloudera Apt Repository" imported
gpg: Total number processed: 1
gpg:               imported: 1
root@ubuntu:/opt/hbase# 

Install HBase

root@ubuntu:/opt/hbase# sudo apt-get update
Ign http://archive.cloudera.com precise-cdh4 InRelease                                                          
Ign http://security.ubuntu.com precise-security InRelease                                                       
Ign http://extras.ubuntu.com precise InRelease                                                                  
Get:1 http://archive.cloudera.com precise-cdh4 Release.gpg [198 B]                                  
Ign http://us.archive.ubuntu.com precise InRelease                                               
Ign http://us.archive.ubuntu.com precise-updates InRelease           
Ign http://us.archive.ubuntu.com precise-backports InRelease         
Hit http://extras.ubuntu.com precise Release.gpg                     
Hit http://security.ubuntu.com precise-security Release.gpg          
Get:2 http://archive.cloudera.com precise-cdh4 Release [1,682 B]     
Hit http://us.archive.ubuntu.com precise Release.gpg                                                   
Hit http://extras.ubuntu.com precise Release                                                
Hit http://security.ubuntu.com precise-security Release                                      
Get:3 http://archive.cloudera.com precise-cdh4/contrib Sources [6,382 B]                     
Hit http://us.archive.ubuntu.com precise-updates Release.gpg                                          
Hit http://us.archive.ubuntu.com precise-backports Release.gpg                              
Hit http://extras.ubuntu.com precise/main Sources                                           
Get:4 http://archive.cloudera.com precise-cdh4/contrib amd64 Packages [16.8 kB]             
Hit http://security.ubuntu.com precise-security/main Sources                                          
Hit http://us.archive.ubuntu.com precise Release                                                      
Ign http://archive.cloudera.com precise-cdh4/contrib TranslationIndex                                           
Hit http://extras.ubuntu.com precise/main amd64 Packages                                    
Hit http://extras.ubuntu.com precise/main i386 Packages                                     
Hit http://us.archive.ubuntu.com precise-updates Release                                                        
Ign http://extras.ubuntu.com precise/main TranslationIndex                                                      
Hit http://us.archive.ubuntu.com precise-backports Release                                                      
Ign http://archive.cloudera.com precise-cdh4/contrib Translation-en_US                                          
Ign http://archive.cloudera.com precise-cdh4/contrib Translation-en                         
Ign http://extras.ubuntu.com precise/main Translation-en_US                                                     
Ign http://extras.ubuntu.com precise/main Translation-en                                                        
Hit http://us.archive.ubuntu.com precise/main Sources                                                           
Hit http://us.archive.ubuntu.com precise/restricted Sources                                                     
Hit http://us.archive.ubuntu.com precise/universe Sources                                                       
Hit http://us.archive.ubuntu.com precise/multiverse Sources                                                     
Hit http://us.archive.ubuntu.com precise/main amd64 Packages                                                    
Hit http://security.ubuntu.com precise-security/restricted Sources                                              
Hit http://security.ubuntu.com precise-security/universe Sources                                                
Hit http://us.archive.ubuntu.com precise/restricted amd64 Packages                                              
Hit http://security.ubuntu.com precise-security/multiverse Sources                                              
Hit http://security.ubuntu.com precise-security/main amd64 Packages                                             
Hit http://security.ubuntu.com precise-security/restricted amd64 Packages                                       
Hit http://security.ubuntu.com precise-security/universe amd64 Packages                                         
Hit http://us.archive.ubuntu.com precise/universe amd64 Packages                                                
Hit http://security.ubuntu.com precise-security/multiverse amd64 Packages                                       
Hit http://security.ubuntu.com precise-security/main i386 Packages                                              
Hit http://security.ubuntu.com precise-security/restricted i386 Packages                                        
Hit http://us.archive.ubuntu.com precise/multiverse amd64 Packages                                              
Hit http://security.ubuntu.com precise-security/universe i386 Packages                                          
Hit http://security.ubuntu.com precise-security/multiverse i386 Packages                                        
Hit http://security.ubuntu.com precise-security/main TranslationIndex                                           
Hit http://security.ubuntu.com precise-security/multiverse TranslationIndex                                     
Hit http://security.ubuntu.com precise-security/restricted TranslationIndex                                     
Hit http://security.ubuntu.com precise-security/universe TranslationIndex                                       
Hit http://us.archive.ubuntu.com precise/main i386 Packages                                                     
Hit http://security.ubuntu.com precise-security/main Translation-en                                             
Hit http://security.ubuntu.com precise-security/multiverse Translation-en                                       
Hit http://security.ubuntu.com precise-security/restricted Translation-en                                       
Hit http://us.archive.ubuntu.com precise/restricted i386 Packages                                               
Hit http://security.ubuntu.com precise-security/universe Translation-en                                         
Hit http://us.archive.ubuntu.com precise/universe i386 Packages                                                 
Hit http://us.archive.ubuntu.com precise/multiverse i386 Packages
Hit http://us.archive.ubuntu.com precise/main TranslationIndex
Hit http://us.archive.ubuntu.com precise/multiverse TranslationIndex
Hit http://us.archive.ubuntu.com precise/restricted TranslationIndex
Hit http://us.archive.ubuntu.com precise/universe TranslationIndex
Hit http://us.archive.ubuntu.com precise-updates/main Sources
Hit http://us.archive.ubuntu.com precise-updates/restricted Sources
Hit http://us.archive.ubuntu.com precise-updates/universe Sources
Hit http://us.archive.ubuntu.com precise-updates/multiverse Sources
Hit http://us.archive.ubuntu.com precise-updates/main amd64 Packages
Hit http://us.archive.ubuntu.com precise-updates/restricted amd64 Packages
Hit http://us.archive.ubuntu.com precise-updates/universe amd64 Packages
Hit http://us.archive.ubuntu.com precise-updates/multiverse amd64 Packages
Hit http://us.archive.ubuntu.com precise-updates/main i386 Packages
Hit http://us.archive.ubuntu.com precise-updates/restricted i386 Packages
Hit http://us.archive.ubuntu.com precise-updates/universe i386 Packages
Hit http://us.archive.ubuntu.com precise-updates/multiverse i386 Packages
Hit http://us.archive.ubuntu.com precise-updates/main TranslationIndex
Hit http://us.archive.ubuntu.com precise-updates/multiverse TranslationIndex
Hit http://us.archive.ubuntu.com precise-updates/restricted TranslationIndex
Hit http://us.archive.ubuntu.com precise-updates/universe TranslationIndex
Hit http://us.archive.ubuntu.com precise-backports/main Sources
Hit http://us.archive.ubuntu.com precise-backports/restricted Sources
Hit http://us.archive.ubuntu.com precise-backports/universe Sources
Hit http://us.archive.ubuntu.com precise-backports/multiverse Sources
Hit http://us.archive.ubuntu.com precise-backports/main amd64 Packages
Hit http://us.archive.ubuntu.com precise-backports/restricted amd64 Packages
Hit http://us.archive.ubuntu.com precise-backports/universe amd64 Packages
Hit http://us.archive.ubuntu.com precise-backports/multiverse amd64 Packages
Hit http://us.archive.ubuntu.com precise-backports/main i386 Packages
Hit http://us.archive.ubuntu.com precise-backports/restricted i386 Packages
Hit http://us.archive.ubuntu.com precise-backports/universe i386 Packages
Hit http://us.archive.ubuntu.com precise-backports/multiverse i386 Packages
Hit http://us.archive.ubuntu.com precise-backports/main TranslationIndex
Hit http://us.archive.ubuntu.com precise-backports/multiverse TranslationIndex
Hit http://us.archive.ubuntu.com precise-backports/restricted TranslationIndex
Hit http://us.archive.ubuntu.com precise-backports/universe TranslationIndex
Hit http://us.archive.ubuntu.com precise/main Translation-en
Hit http://us.archive.ubuntu.com precise/multiverse Translation-en
Hit http://us.archive.ubuntu.com precise/restricted Translation-en
Hit http://us.archive.ubuntu.com precise/universe Translation-en
Hit http://us.archive.ubuntu.com precise-updates/main Translation-en
Hit http://us.archive.ubuntu.com precise-updates/multiverse Translation-en
Hit http://us.archive.ubuntu.com precise-updates/restricted Translation-en
Hit http://us.archive.ubuntu.com precise-updates/universe Translation-en
Hit http://us.archive.ubuntu.com precise-backports/main Translation-en
Hit http://us.archive.ubuntu.com precise-backports/multiverse Translation-en
Hit http://us.archive.ubuntu.com precise-backports/restricted Translation-en
Hit http://us.archive.ubuntu.com precise-backports/universe Translation-en
Fetched 25.1 kB in 30s (827 B/s)
Reading package lists... Done
root@ubuntu:/opt/hbase# 
root@ubuntu:/opt/hbase# sudo apt-get install hbase hbase-master
Reading package lists... Done
Building dependency tree       
Reading state information... Done
The following extra packages will be installed:
  bigtop-jsvc bigtop-utils hadoop hadoop-hdfs libopts25 ntp zookeeper
Suggested packages:
  ntp-doc
The following NEW packages will be installed:
  bigtop-jsvc bigtop-utils hadoop hadoop-hdfs hbase hbase-master libopts25 ntp zookeeper
0 upgraded, 9 newly installed, 0 to remove and 119 not upgraded.
Need to get 70.1 MB of archives.
After this operation, 82.3 MB of additional disk space will be used.
Do you want to continue [Y/n]? Y
Get:1 http://us.archive.ubuntu.com/ubuntu/ precise/main libopts25 amd64 1:5.12-0.1ubuntu1 [59.9 kB]
Get:2 http://us.archive.ubuntu.com/ubuntu/ precise-updates/main ntp amd64 1:4.2.6.p3+dfsg-1ubuntu3.1 [612 kB]
Get:3 http://archive.cloudera.com/cdh4/ubuntu/precise/amd64/cdh/ precise-cdh4/contrib bigtop-jsvc amd64 0.4+352-1.cdh4.1.0.p0.29~precise-cdh4.1.0 [53.2 kB]
Get:4 http://archive.cloudera.com/cdh4/ubuntu/precise/amd64/cdh/ precise-cdh4/contrib bigtop-utils all 0.4+352-1.cdh4.1.0.p0.28~precise-cdh4.1.0 [2,004 B]
Get:5 http://archive.cloudera.com/cdh4/ubuntu/precise/amd64/cdh/ precise-cdh4/contrib zookeeper all 3.4.3+25-1.cdh4.1.0.p0.28~precise-cdh4.1.0 [4,087 kB]
Get:6 http://archive.cloudera.com/cdh4/ubuntu/precise/amd64/cdh/ precise-cdh4/contrib hadoop all 2.0.0+541-1.cdh4.1.0.p0.27~precise-cdh4.1.0 [16.6 MB]
Get:7 http://archive.cloudera.com/cdh4/ubuntu/precise/amd64/cdh/ precise-cdh4/contrib hadoop-hdfs all 2.0.0+541-1.cdh4.1.0.p0.27~precise-cdh4.1.0 [12.7 MB]
Get:8 http://archive.cloudera.com/cdh4/ubuntu/precise/amd64/cdh/ precise-cdh4/contrib hbase all 0.92.1+154-1.cdh4.1.0.p0.23~precise-cdh4.1.0 [35.9 MB]
Get:9 http://archive.cloudera.com/cdh4/ubuntu/precise/amd64/cdh/ precise-cdh4/contrib hbase-master all 0.92.1+154-1.cdh4.1.0.p0.23~precise-cdh4.1.0 [19.2 kB]
Fetched 70.1 MB in 8min 41s (134 kB/s)                                                                          
Selecting previously unselected package libopts25.
(Reading database ... 141003 files and directories currently installed.)
Unpacking libopts25 (from .../libopts25_1%3a5.12-0.1ubuntu1_amd64.deb) ...
Selecting previously unselected package ntp.
Unpacking ntp (from .../ntp_1%3a4.2.6.p3+dfsg-1ubuntu3.1_amd64.deb) ...
Selecting previously unselected package bigtop-jsvc.
Unpacking bigtop-jsvc (from .../bigtop-jsvc_0.4+352-1.cdh4.1.0.p0.29~precise-cdh4.1.0_amd64.deb) ...
Selecting previously unselected package bigtop-utils.
Unpacking bigtop-utils (from .../bigtop-utils_0.4+352-1.cdh4.1.0.p0.28~precise-cdh4.1.0_all.deb) ...
Selecting previously unselected package zookeeper.
Unpacking zookeeper (from .../zookeeper_3.4.3+25-1.cdh4.1.0.p0.28~precise-cdh4.1.0_all.deb) ...
Selecting previously unselected package hadoop.
Unpacking hadoop (from .../hadoop_2.0.0+541-1.cdh4.1.0.p0.27~precise-cdh4.1.0_all.deb) ...
Selecting previously unselected package hadoop-hdfs.
Unpacking hadoop-hdfs (from .../hadoop-hdfs_2.0.0+541-1.cdh4.1.0.p0.27~precise-cdh4.1.0_all.deb) ...
Selecting previously unselected package hbase.
Unpacking hbase (from .../hbase_0.92.1+154-1.cdh4.1.0.p0.23~precise-cdh4.1.0_all.deb) ...
Selecting previously unselected package hbase-master.
Unpacking hbase-master (from .../hbase-master_0.92.1+154-1.cdh4.1.0.p0.23~precise-cdh4.1.0_all.deb) ...
Processing triggers for ureadahead ...
Processing triggers for man-db ...
Setting up libopts25 (1:5.12-0.1ubuntu1) ...
Setting up ntp (1:4.2.6.p3+dfsg-1ubuntu3.1) ...
 * Starting NTP server ntpd                                                                               [ OK ] 
Setting up bigtop-jsvc (0.4+352-1.cdh4.1.0.p0.29~precise-cdh4.1.0) ...
Setting up bigtop-utils (0.4+352-1.cdh4.1.0.p0.28~precise-cdh4.1.0) ...
Setting up zookeeper (3.4.3+25-1.cdh4.1.0.p0.28~precise-cdh4.1.0) ...
update-alternatives: using /etc/zookeeper/conf.dist to provide /etc/zookeeper/conf (zookeeper-conf) in auto mode.
Setting up hadoop (2.0.0+541-1.cdh4.1.0.p0.27~precise-cdh4.1.0) ...
update-alternatives: using /etc/hadoop/conf.empty to provide /etc/hadoop/conf (hadoop-conf) in auto mode.
Setting up hadoop-hdfs (2.0.0+541-1.cdh4.1.0.p0.27~precise-cdh4.1.0) ...
Setting up hbase (0.92.1+154-1.cdh4.1.0.p0.23~precise-cdh4.1.0) ...
update-alternatives: using /etc/hbase/conf.dist to provide /etc/hbase/conf (hbase-conf) in auto mode.
Setting up hbase-master (0.92.1+154-1.cdh4.1.0.p0.23~precise-cdh4.1.0) ...
Starting Hadoop HBase master daemon: +======================================================================+
|      Error: JAVA_HOME is not set and Java could not be found         |
+----------------------------------------------------------------------+
| Please download the latest Sun JDK from the Sun Java web site        |
|       > http://java.sun.com/javase/downloads/ <                      |
|                                                                      |
| HBase requires Java 1.6 or later.                                    |
| NOTE: This script will find Sun Java whether you install using the   |
|       binary or the RPM based installer.                             |
+======================================================================+
invoke-rc.d: initscript hbase-master, action "start" failed.
dpkg: error processing hbase-master (--configure):
 subprocess installed post-installation script returned error exit status 1
Processing triggers for libc-bin ...
ldconfig deferred processing now taking place
Errors were encountered while processing:
 hbase-master
E: Sub-process /usr/bin/dpkg returned an error code (1)
root@ubuntu:/opt/hbase# 

I got the above error the first time, so I checked whether JAVA_HOME was set properly.

root@ubuntu:/opt/hbase# echo $JAVA_HOME
/opt/java/jdk1.6.0_31

Since it looked fine in my shell, I decided to set JAVA_HOME directly in the hbase-master init script:

root@ubuntu:/opt/hbase# vi /etc/init.d/hbase-master

# Add this
export JAVA_HOME=/opt/java/jdk1.6.0_31
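
A likely reason (this is my assumption, I did not dig into it) is that the init script does not inherit the variables exported in root's .bashrc, for example because sudo resets the environment by default. The sketch below shows the general effect using env -i, which starts a child process with an empty environment. The function name is made up for illustration.

```shell
#!/bin/sh
# Hypothetical helper: print the JAVA_HOME seen by a child process that is
# started with an empty environment (env -i). This only approximates what a
# daemon started outside your login shell might see.
java_home_in_clean_env() {
    env -i /bin/sh -c 'echo "${JAVA_HOME:-unset}"'
}
```

Even with JAVA_HOME exported in the current shell, the function prints "unset", which is why setting the variable inside the script itself works around the problem.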

Let's try installing again.

root@ubuntu:/opt/hbase# sudo apt-get install hbase hbase-master
Reading package lists... Done
Building dependency tree       
Reading state information... Done
hbase is already the newest version.
hbase-master is already the newest version.
0 upgraded, 0 newly installed, 0 to remove and 119 not upgraded.
1 not fully installed or removed.
After this operation, 0 B of additional disk space will be used.
Do you want to continue [Y/n]? Y
Setting up hbase-master (0.92.1+154-1.cdh4.1.0.p0.23~precise-cdh4.1.0) ...
Starting Hadoop HBase master daemon: starting master, logging to /var/log/hbase/hbase-hbase-master-ubuntu.out
hbase-master.
root@ubuntu:/opt/hbase# 

This time everything went well.
View the list of HBase configuration files.

root@ubuntu:/opt/hbase# ls -lth /etc/hbase/conf/
total 28K
-rw-r--r-- 1 root root 1.1K Oct 30 12:06 hbase-site.xml
-rw-r--r-- 1 root root 2.3K Sep 29 11:54 hadoop-metrics.properties
-rw-r--r-- 1 root root 4.2K Sep 29 11:54 hbase-env.sh
-rw-r--r-- 1 root root 2.2K Sep 29 11:54 hbase-policy.xml
-rw-r--r-- 1 root root 2.5K Sep 29 11:54 log4j.properties
-rw-r--r-- 1 root root   10 Sep 29 11:54 regionservers
root@ubuntu:/opt/hbase# cat /etc/hbase/conf/regionservers
localhost
root@ubuntu:/opt/hbase#

Let's invoke the HBase shell and test.

root@ubuntu:/opt/hbase# hbase shell
12/10/06 15:07:20 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.92.1-cdh4.1.0, rUnknown, Sat Sep 29 11:55:59 PDT 2012

hbase(main):001:0> status
1 servers, 0 dead, 3.0000 average load

hbase(main):001:0> list
TABLE                                                                                                            
0 row(s) in 0.6370 seconds

hbase(main):002:0> 

hbase(main):002:0> create 'table1','cf1'


^Croot@ubuntu:/opt/hbase# 
root@ubuntu:/opt/hbase# 

Here I encountered the second issue: the HBase shell would hang on the create command.
After googling around for quite some time I found a fix.

Update the /etc/hosts file so that no 127.0.1.1 entry maps to localhost or the hostname (here 'ubuntu'); both should resolve to 127.0.0.1 instead.
Also comment out the IPv6 lines.

root@ubuntu:/opt/hbase# vi /etc/hosts

192.168.38.137  hbase2

127.0.0.1       localhost ubuntu

# The following lines are desirable for IPv6 capable hosts
#::1     ip6-localhost ip6-loopback
#fe00::0 ip6-localnet
#ff00::0 ip6-mcastprefix
#ff02::1 ip6-allnodes
#ff02::2 ip6-allrouters
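
To double-check the hosts file after editing, something like the following sketch can help. The function name find_127_0_1_1 is hypothetical; it simply greps for active (non-commented) lines that start with 127.0.1.1.

```shell
#!/bin/sh
# Hypothetical helper: list active lines in a hosts file that map 127.0.1.1.
# Prints the matching lines; returns 0 if any were found, 1 otherwise.
# Commented-out lines (starting with #) do not match.
find_127_0_1_1() {
    hosts_file="${1:-/etc/hosts}"
    grep -E '^[[:space:]]*127\.0\.1\.1([[:space:]]|$)' "$hosts_file"
}
```

Running find_127_0_1_1 with no argument checks /etc/hosts; no output means there is no such entry left.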

Also disable IPv6 as follows:

root@ubuntu:~# vi /etc/sysctl.conf 

and add the following lines to the end of it:

# Abhi: Disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
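
As a quick sanity check that all three lines made it into the file, here is a small sketch. The function name ipv6_disabled_in_conf is my own; it only inspects the file and says nothing about the settings currently active in the kernel.

```shell
#!/bin/sh
# Hypothetical helper: return 0 only if all three disable_ipv6 keys
# (all, default, lo) are set to 1 in the given sysctl-style file.
ipv6_disabled_in_conf() {
    conf="${1:-/etc/sysctl.conf}"
    for key in all default lo; do
        grep -Eq "^[[:space:]]*net\.ipv6\.conf\.$key\.disable_ipv6[[:space:]]*=[[:space:]]*1[[:space:]]*$" "$conf" || return 1
    done
}
```

The value the running kernel is actually using can be read from /proc/sys/net/ipv6/conf/all/disable_ipv6, and sysctl -p reloads /etc/sysctl.conf without a reboot.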

Now restart the system so that the IPv6 settings take effect, then restart the HBase master:

root@ubuntu:/opt/hbase# /etc/init.d/hbase-master restart
Restarting Hadoop HBase master daemon: stopping master...
Starting Hadoop HBase master daemon: starting master, logging to /var/log/hbase/hbase-hbase-master-ubuntu.out
hbase-master.
root@ubuntu:/opt/hbase# 

Check if it's running

root@ubuntu:/opt/hbase# jps
16353 Jps
15654 HMaster
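
If you want to do the same check from a script, a minimal sketch could look like this. The function name has_hmaster is hypothetical; it reads jps output on standard input and looks for a process named HMaster in the second column.

```shell
#!/bin/sh
# Hypothetical helper: read `jps` output on stdin and succeed (exit 0)
# if a process named HMaster is listed, fail (exit 1) otherwise.
has_hmaster() {
    awk '$2 == "HMaster" { found = 1 } END { exit found ? 0 : 1 }'
}
```

Typical usage would be: jps | has_hmaster && echo "HMaster is running"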

Open the HBase shell and let's play around with a few commands

root@ubuntu:/opt/hbase# 
root@ubuntu:/opt/hbase# hbase shell
12/10/06 15:35:18 WARN conf.Configuration: hadoop.native.lib is deprecated. Instead, use io.native.lib.available
HBase Shell; enter 'help' for list of supported commands.
Type "exit" to leave the HBase Shell
Version 0.92.1-cdh4.1.0, rUnknown, Sat Sep 29 11:55:59 PDT 2012

hbase(main):007:0* list
TABLE                                                                                                            
0 row(s) in 0.0040 seconds

hbase(main):008:0> create 'table1','cf1'
0 row(s) in 1.0880 seconds

hbase(main):009:0> list
TABLE                                                                                                            
table1                                                                                                           
1 row(s) in 0.0160 seconds

hbase(main):010:0> scan 'table1'
ROW                           COLUMN+CELL                                                                        
0 row(s) in 0.0200 seconds

hbase(main):012:0> put 'table1','row1','cf1:greeting','Hello'
0 row(s) in 0.0590 seconds

hbase(main):013:0> put 'table1','row1','cf1:name','World'
0 row(s) in 0.0110 seconds

hbase(main):014:0> scan 'table1'
ROW                           COLUMN+CELL                                                                        
 row1                         column=cf1:greeting, timestamp=1349563833359, value=Hello                          
 row1                         column=cf1:name, timestamp=1349563858582, value=World                              
1 row(s) in 0.0350 seconds

hbase(main):016:0> get 'table1','row1'
COLUMN                        CELL                                                                               
 cf1:greeting                 timestamp=1349563833359, value=Hello                                               
 cf1:name                     timestamp=1349563858582, value=World                                               
2 row(s) in 0.0300 seconds

hbase(main):017:0> put 'table1','row2','cf1:greeting','Hi'
0 row(s) in 0.0140 seconds

hbase(main):018:0> put 'table1','row2','cf1:name','Abhi'
0 row(s) in 0.0080 seconds

hbase(main):019:0> scan 'table1'
ROW                           COLUMN+CELL                                                                        
 row1                         column=cf1:greeting, timestamp=1349563833359, value=Hello                          
 row1                         column=cf1:name, timestamp=1349563858582, value=World                              
 row2                         column=cf1:greeting, timestamp=1349563961204, value=Hi                             
 row2                         column=cf1:name, timestamp=1349563973437, value=Abhi                               
2 row(s) in 0.0730 seconds

hbase(main):020:0> get 'table1','row2'
COLUMN                        CELL                                                                               
 cf1:greeting                 timestamp=1349563961204, value=Hi                                                  
 cf1:name                     timestamp=1349563973437, value=Abhi                                                
2 row(s) in 0.0080 seconds

hbase(main):021:0> 

You can open a browser and go to http://localhost:60010/ to access the HBase monitoring WebUI

HBase WebUI

Updated on October 31, 2012
Note: I have changed the hostname of my system from ‘ubuntu’ to ‘hbase2’

Set the HBase Root Directory

Although we are able to play around with HBase (create tables, put and get data, etc.), the data is transient and will be lost once we restart the system. In standalone mode everything runs within a single Java process, and the data files are stored under /tmp by default. Most operating systems clear /tmp on reboot, which removes all the data. To make the data persistent, we need to edit the hbase-site.xml file and set the root directory.

abhi@hbase2:~$ sudo mkdir /opt/hbase/data/
abhi@hbase2:~$ sudo chown -cRvf hbase:users /opt/hbase/data/
abhi@hbase2:~$ ls -lth /opt/hbase/
total 8.0K
drwxr-xr-x 7 hbase users 4.0K Oct 30 12:47 data
abhi@hbase2:~$
abhi@hbase2:~$ sudo vi /etc/hbase/conf/hbase-site.xml

<configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>file:///opt/hbase/data</value>
    </property>
</configuration>
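
If you prefer to generate this file rather than edit it by hand, here is a minimal sketch. The function name print_hbase_site is my own; it prints a configuration containing only the hbase.rootdir property, so any other properties you already have in the file would need to be merged in manually.

```shell
#!/bin/sh
# Hypothetical helper: print a minimal hbase-site.xml that points
# hbase.rootdir at a local directory given as an absolute path.
print_hbase_site() {
    data_dir="$1"
    cat <<EOF
<configuration>
    <property>
        <name>hbase.rootdir</name>
        <value>file://$data_dir</value>
    </property>
</configuration>
EOF
}
```

For example: print_hbase_site /opt/hbase/data | sudo tee /etc/hbase/conf/hbase-site.xml (note that this overwrites the existing file).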

Restart hbase-master.

abhi@hbase2:~$ sudo /etc/init.d/hbase-master restart

Now when you create tables and enter data, the data will persist even after a system restart.
Let's create a table and enter some data.

hbase(main):005:0> create 'users','info'
0 row(s) in 1.1180 seconds

hbase(main):006:0> list
TABLE
users
1 row(s) in 0.0090 seconds

hbase(main):007:0> put 'users','abhi','info:name','abhishek'
0 row(s) in 0.0660 seconds

hbase(main):008:0> put 'users','abhi','info:age','30'
0 row(s) in 0.0110 seconds

hbase(main):009:0> scan 'users'
ROW                                         COLUMN+CELL
 abhi                                       column=info:age, timestamp=1351626512340, value=30
 abhi                                       column=info:name, timestamp=1351626501011, value=abhishek
1 row(s) in 0.0350 seconds

hbase(main):010:0> flush 'users'
0 row(s) in 0.0750 seconds

hbase(main):011:0>

Note: Always flush your tables so that the data held in memory gets written out as files on your filesystem.
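
To flush several tables without typing each command by hand, a sketch like the following can generate the commands. The function name flush_commands is hypothetical, and piping the output into hbase shell assumes that the shell reads commands from standard input on your version.

```shell
#!/bin/sh
# Hypothetical helper: emit one HBase shell flush command per table name
# passed as an argument. It only prints text; nothing is sent to HBase.
flush_commands() {
    for table in "$@"; do
        printf "flush '%s'\n" "$table"
    done
}
```

For example: flush_commands users table1 | hbase shell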

In my next post we’ll see how HBase persists the data physically on the filesystem as files and directories.