We are aware that there are too many parts and pieces in Hadoop. It is easy to misunderstand the functionality of a particular Hadoop piece. One piece I found difficult is the difference between Hadoop and HBase. It is really important to understand the difference in order to work effectively with the Hadoop ecosystem.
Hadoop
At the core of Hadoop are two components: a distributed filesystem, HDFS, and a processing framework, MapReduce. HDFS stores files as blocks, and each block is usually replicated three times across cheap commodity servers. Another distinguishing feature of HDFS is its block size. Since it is usually used for big data projects, the blocks are huge: a filesystem like NTFS allocates space in small clusters of a few kilobytes, whereas the default HDFS block size is 64 MB or 128 MB, and you can even choose a bigger size to tune it for your needs. You process those blocks natively using MapReduce programs, which are typically written in the Java programming language.
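To get a feel for the MapReduce programming model, here is a minimal sketch in plain Java. It is not a real Hadoop job (a real one would extend Hadoop's `Mapper` and `Reducer` classes and run across the cluster); the input lines, the `WordCountSketch` class, and its methods are illustrative assumptions, but the two phases mirror what a Hadoop word-count job does.

```java
import java.util.*;
import java.util.stream.*;

public class WordCountSketch {
    // Map phase: for each input line, emit a (word, 1) pair per word.
    static List<Map.Entry<String, Integer>> map(String line) {
        return Arrays.stream(line.toLowerCase().split("\\s+"))
                .filter(w -> !w.isEmpty())
                .map(w -> Map.entry(w, 1))
                .collect(Collectors.toList());
    }

    // Reduce phase: group the emitted pairs by word and sum the counts.
    static Map<String, Integer> reduce(List<Map.Entry<String, Integer>> pairs) {
        Map<String, Integer> counts = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            counts.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        // Hypothetical input; on a cluster each mapper would see one block of a file.
        List<String> lines = List.of("big data big cluster", "data nodes");
        List<Map.Entry<String, Integer>> emitted = new ArrayList<>();
        for (String line : lines) {
            emitted.addAll(map(line));
        }
        System.out.println(reduce(emitted)); // {big=2, cluster=1, data=2, nodes=1}
    }
}
```

In a real job, Hadoop runs the map phase on the nodes where each block of the file lives and shuffles the pairs to reducers, but the shape of the code a developer writes is the same.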
HBase
Customers using Hadoop often don't want to query data natively. Working with Hadoop natively is effective, but it is not accessible to most people in the organization: a normal work environment has a small number of developers and a much larger number of analysts. One solution is to use HBase, which stores data in a wide-column format on top of Hadoop. An easy way to think of it is as a spreadsheet where each row has a row key in one column and its data in the other columns, and different rows can have different columns. In plain Hadoop, viewing the data means writing a MapReduce program, which is not possible for all users.
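The wide-column idea can be sketched with nothing but the JDK. This is not the HBase client API; it is a toy in-memory model where a row key maps to a set of columns, and the class and column names (such as `info:name`, following HBase's family:qualifier naming convention) are assumptions made up for illustration.

```java
import java.util.*;

// Toy wide-column store: row key -> (column qualifier -> value).
public class WideColumnSketch {
    private final Map<String, Map<String, String>> rows = new TreeMap<>();

    // Store a single cell, similar in spirit to an HBase put.
    public void put(String rowKey, String column, String value) {
        rows.computeIfAbsent(rowKey, k -> new TreeMap<>()).put(column, value);
    }

    // Look up one cell by row key and column; returns null if absent.
    public String get(String rowKey, String column) {
        Map<String, String> row = rows.get(rowKey);
        return row == null ? null : row.get(column);
    }

    public static void main(String[] args) {
        WideColumnSketch table = new WideColumnSketch();
        table.put("user1", "info:name", "Alice");
        table.put("user1", "info:city", "Chennai");
        table.put("user2", "info:name", "Bob"); // rows need not share columns
        System.out.println(table.get("user1", "info:name")); // Alice
    }
}
```

The point of the sketch is the lookup pattern: given a row key, you fetch cells directly, with no MapReduce program required, which is what makes the data accessible to analysts.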
You might be wondering how the table is created in the first place. We use HBase's create table statement, which builds the table as an abstraction on top of the Hadoop filesystem.