Hadoop File System Commands

I just took a Hadoop developer training class during the week of September 10. Hadoop is not totally new to me, as I've tried the HelloWorld sample and the Serengeti project. Still, it was nice to get away from the daily job and go through a series of lectures and hands-on labs in a training setting. Believe it or not, I felt more tired after the training than after a typical working day. This post doesn't cover much that is new, but it will help me find the commands when I need them later.

The Hadoop Distributed File System (HDFS) is a fundamental building block of the Hadoop ecosystem. It's a file system designed to store big data, including both input data and result data. For that, HDFS distributes big files across networked data nodes. Although logically contiguous, a big file can be split into many chunks, each of which can be saved on a different physical machine.
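If you're curious how HDFS actually lays a file out, the fsck tool reports the blocks of a file and where each one lives. Here is a minimal sketch; the file path is a made-up example:

$ hadoop fsck /user/steve/bigfile.txt -files -blocks -locations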

You can access the files with APIs, but more often you'll use the command line interface (which, by the way, is itself an application built on top of the HDFS APIs). There are about 30 commands to manage a Hadoop file system remotely, for example from a Linux shell. Don't confuse the Hadoop file system with your local file system. In a way, you can think of the Hadoop file system as a file system on another machine.
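To make the distinction concrete, compare listing the root directory of each file system (assuming you run both from a machine configured as a Hadoop client):

$ ls /               # the root of the local file system
$ hadoop fs -ls /    # the root of HDFS on the cluster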

Syntax Overview

The basic syntax of HDFS commands is as follows:

$ hadoop fs -command [extra arguments]

For example:

$ hadoop fs -ls

The first part, "hadoop fs", is always the same for file system related commands. What follows is very much like typical Unix/Linux commands in syntax. Besides managing HDFS itself, there are commands to import data files from the local file system to HDFS, and to export data files from HDFS to the local file system. These commands are unique to Hadoop and therefore deserve the most attention (see the quick examples after the listing below):

[-put <localsrc> ... <dst>]
[-copyFromLocal <localsrc> ... <dst>]
[-moveFromLocal <localsrc> ... <dst>]
[-get [-ignoreCrc] [-crc] <src> <localdst>]
[-getmerge <src> <localdst> [addnl]]
[-copyToLocal [-ignoreCrc] [-crc] <src> <localdst>]
[-moveToLocal [-crc] <src> <localdst>]
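As a quick illustration of the import and export commands (the file names below are made up):

$ hadoop fs -put /tmp/data.txt data.txt          # local file system to HDFS
$ hadoop fs -get data.txt /tmp/data-copy.txt     # HDFS to local file system
$ hadoop fs -getmerge test/output merged.txt     # merge all files under an HDFS directory into one local file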

A Typical Use Case

When using Hadoop, you need to move your data into HDFS before processing it, and optionally move the results back to your local file system. Here is a typical flow:

$ hadoop fs -mkdir test                                    # create a working directory in HDFS
$ hadoop fs -put input.txt test/input.txt                  # import the input file from the local file system
$ hadoop fs -ls test                                       # verify the upload
$ hadoop fs -cat test/input.txt                            # peek at the file content
$ hadoop jar mr.jar WordCount test/input.txt test/output   # run the MapReduce job
$ hadoop fs -ls test/output                                # check the job output files
$ hadoop fs -ls test                                       # list the working directory again
$ hadoop fs -get test/output .                             # export the results to the local file system
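With the output directory copied back, you can inspect the results locally. The exact part file name depends on the Hadoop version and the number of reducers; part-00000 below is just a typical example:

$ cat output/part-00000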

Other Useful Commands

There are several other commands you will find useful, for example:

$ hadoop fs -chmod 777 test/input.txt              # change file permissions
$ hadoop fs -cp test/input.txt test/input1.txt     # copy a file within HDFS
$ hadoop fs -rmr test                              # remove a directory recursively

Space used in bytes for individual files or directories:

$ hadoop fs -du

Space used in bytes in summary form, so only one entry is given per path:

$ hadoop fs -dus

A related command, -count, reports the number of directories, the number of files, and the total content size in bytes under a path:

$ hadoop fs -count /test
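The output appears in that column order, followed by the path itself. For example (the numbers are made up for illustration):

$ hadoop fs -count /test
           2            3              12345 /test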

Getting Help

Last but not least is the help command. When in doubt, you can always use help:

$ hadoop fs -help

Don't forget the "-" before help, or you will see something similar but different. You can also add the specific command you want help on, for example:

$ hadoop fs -help copyFromLocal