
Hadoop File System Commands

September 26th, 2012

I just took Hadoop developer training during the week of September 10. To me, Hadoop is not totally new, as I’ve tried the HelloWorld sample and the Serengeti project. Still, it was nice to get away from the daily job and go through a series of lectures and hands-on labs in a training setting. Believe it or not, I felt more tired after the training than after a typical working day. This post doesn’t offer much that is new, but it will help me find the commands when I need them later.

Hadoop File System (HDFS) is a fundamental building block of the Hadoop ecosystem. It’s a file system designed to store big data, both input data and result data. For that, HDFS distributes big files across networked data nodes. Although logically contiguous, a big file can be split into many chunks, each of which can be saved on a different physical machine.
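If you’re curious how HDFS actually split a particular file, the fsck tool can list its blocks and the data nodes they landed on. A quick sketch, assuming a file at the path used in the examples later in this post (adjust the /user/hadoop prefix to your own home directory):

$ hadoop fsck /user/hadoop/test/input.txt -files -blocks -locations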


You can access the files with APIs, but more often with the command line (which is, by the way, an application built on top of the HDFS APIs). There are about 30 commands for managing a Hadoop file system remotely, for example from a Linux shell. Don’t confuse the Hadoop file system with your local file system. In a way, you can think of the Hadoop file system as a file system on another machine.
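To make the “another machine” point concrete, any hadoop fs command can also take a fully qualified URI so the target cluster is explicit. A small sketch; the namenode host and port below are placeholders for whatever your fs.default.name setting points to:

$ hadoop fs -ls hdfs://namenode:8020/user/hadoop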

Syntax Overview

The basic syntax of HDFS commands is as follows:

$ hadoop fs -command [extra arguments]

For example:

$ hadoop fs -ls

The first part, “hadoop fs”, is always the same for file system related commands. What follows is very much like typical Unix/Linux command syntax. Besides managing HDFS itself, there are commands to import data files from the local file system into HDFS and to export data files from HDFS back to the local file system. These commands are unique to HDFS and therefore deserve the most attention:

[-put <localsrc> ... <dst>]
[-copyFromLocal <localsrc> ... <dst>]
[-moveFromLocal <localsrc> ... <dst>]
[-get [-ignoreCrc] [-crc] <src> <localdst>]
[-getmerge <src> <localdst> [addnl]]
[-copyToLocal [-ignoreCrc] [-crc] <src> <localdst>]
[-moveToLocal [-crc] <src> <localdst>]
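
Of these, getmerge deserves a special mention: it concatenates all the files in an HDFS directory into a single file on the local file system, which is handy for collecting MapReduce output. A small sketch, reusing the directory name from the use case below (the local file name is just a placeholder):

$ hadoop fs -getmerge test/output merged-output.txt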

A Typical Use Case

When using Hadoop, you need to move your data into HDFS before processing it, and optionally move the results back to your local file system. Here is a typical flow:

$ hadoop fs -mkdir test
$ hadoop fs -put input.txt test/input.txt
$ hadoop fs -ls test
$ hadoop fs -cat test/input.txt
$ hadoop jar mr.jar WordCount test/input.txt test/output
$ hadoop fs -ls test/output
$ hadoop fs -ls test
$ hadoop fs -get test/output .
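To peek at the result without copying it back first, you can also cat the part file directly. The exact file name depends on the job; part-00000 below is just the typical name produced by the old MapReduce API:

$ hadoop fs -cat test/output/part-00000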

Other Useful Commands

There are other commands you will find useful, for example the commands listed below:

$ hadoop fs -chmod 777 test/input.txt
$ hadoop fs -cp test/input.txt test/input1.txt
$ hadoop fs -rmr test
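Two more that come up often are mv and rm, which work much like their Unix counterparts; the file names below are hypothetical:

$ hadoop fs -mv olddir/data.txt newdir/data.txt
$ hadoop fs -rm newdir/data.txt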

Space used in bytes for individual files or directories:

$ hadoop fs -du

Space used in bytes as a summary, so only one entry is given:

$ hadoop fs -dus

Count of directories, files, and bytes under a given path:

$ hadoop fs -count /test

Getting Help

Last but not least is the help command. When in doubt, you can always ask for help:

$ hadoop fs -help

Don’t forget the “-” before the help, or you will see something similar but different. You can also name the specific command you want help on, for example:

$ hadoop fs -help copyFromLocal
