July 2015 ~ Hadoop BigData Interview Questions

Most commonly used HDFS Commands

12:36 PM Admin

1. List all the files and directories under root hdfs directory
hdfs dfs -ls /

To list all the files and directories recursively, use the -R option of ls. The older lsr command is deprecated in Hadoop 2.x.
hdfs dfs -ls -R /

2. Copy a file or directory to another directory in hdfs
hdfs dfs -cp /hdfs/src/dir /hdfs/dest/dir/
hdfs dfs -cp /hdfs/src/dir/file1 /hdfs/dest/dir/

3. Move or rename a file or directory to another directory in hdfs

hdfs dfs -mv /hdfs/src/dir /hdfs/dest/dir
hdfs dfs -mv /hdfs/current/dir/file1 /hdfs/current/dir/file2

4. Create a directory in hdfs

hdfs dfs -mkdir /hdfs/new/dir/path

If the parent directories are not present, the -p option can be used to create all the directories in one go.
hdfs dfs -mkdir -p /hdfs/new/dir/path

5. Read a file in hdfs

hdfs dfs -cat /hdfs/dir/file1.dat

If the file is Snappy-compressed, use the text command instead, which decompresses the file before printing it.
hdfs dfs -text /hdfs/dir/file1.dat.snappy
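
The choice between cat and text can be wrapped in a small helper. This is a minimal sketch, not part of the original post: the function name hdfs_read and the list of extensions are assumptions for illustration, and it presumes the matching compression codecs are configured on the cluster.

```shell
#!/bin/sh
# Print an HDFS file, using -text for common compressed extensions
# and -cat for everything else.
# hdfs_read and the extension list are hypothetical, for illustration only.
hdfs_read() {
    path="$1"
    case "$path" in
        *.snappy|*.gz|*.bz2|*.deflate)
            # -text picks a codec from the file extension and decompresses
            hdfs dfs -text "$path"
            ;;
        *)
            hdfs dfs -cat "$path"
            ;;
    esac
}

# Usage (requires a configured Hadoop client):
#   hdfs_read /hdfs/dir/file1.dat.snappy | head
```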

6. Copy a file from local file system to hdfs file system

hdfs dfs -copyFromLocal /local/dir/path/file1.txt /hdfs/dir/path/file.txt

The put command does the same.
hdfs dfs -put /local/dir/path/file1.txt /hdfs/dir/path/file.txt
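
An upload often fails because the destination directory does not exist yet. The sketch below combines mkdir -p with put to cover that case. The function name hdfs_upload is hypothetical, and the -f flag (overwrite an existing destination) is a choice made for this example rather than something the post prescribes.

```shell
#!/bin/sh
# Upload a local file to HDFS, creating the destination directory first.
# hdfs_upload is a hypothetical helper name used for illustration.
hdfs_upload() {
    src="$1"
    dest="$2"
    dest_dir=$(dirname "$dest")
    # Create any missing parent directories, then copy.
    # -f overwrites the destination file if it already exists.
    hdfs dfs -mkdir -p "$dest_dir" && hdfs dfs -put -f "$src" "$dest"
}

# Usage (requires a configured Hadoop client):
#   hdfs_upload /local/dir/path/file1.txt /hdfs/dir/path/file.txt
```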

7. Copy a file from hdfs file system to local file system

hdfs dfs -copyToLocal /hdfs/dir/path/file.txt /local/dir/path/file1.txt

The get command does the same.
hdfs dfs -get /hdfs/dir/path/file.txt /local/dir/path/file1.txt

8. Delete a file or directory from hdfs

Use rm command to delete a file

hdfs dfs -rm /hdfs/dir/path/file.txt

Use rm with the -r option to delete a directory and its contents. The older rmr command is deprecated in Hadoop 2.x.

hdfs dfs -rm -r /hdfs/dir/path/
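
Because a recursive delete is hard to undo, a script may want to use the recursive form only when the path really is a directory. The following is one possible sketch. The name hdfs_remove is an assumption, and it relies on the test command described later in this post.

```shell
#!/bin/sh
# Delete an HDFS path, recursing only when the path is a directory.
# hdfs_remove is a hypothetical helper name used for illustration.
hdfs_remove() {
    path="$1"
    if hdfs dfs -test -d "$path"; then
        # Directory: remove it together with its contents.
        hdfs dfs -rm -r "$path"
    else
        # File (or missing path): -f suppresses the error if it is absent.
        hdfs dfs -rm -f "$path"
    fi
}

# Usage (requires a configured Hadoop client):
#   hdfs_remove /hdfs/dir/path/file.txt
#   hdfs_remove /hdfs/dir/path/
```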

9. Create a zero byte file in hdfs

hdfs dfs -touchz /hdfs/dir/path/file.txt

10. Verify a directory or file using test command in hdfs

hadoop fs -test -[defsz] URI

Options:

-d: if the path is a directory, return 0.

-e: if the path exists, return 0.

-f: if the path is a file, return 0.

-s: if the path is not empty, return 0.

-z: if the file is zero length, return 0.

Example:

hadoop fs -test -e /hdfs/dir/path
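
The test command prints nothing. Its result is the exit status, so it is normally used inside a shell condition. The sketch below shows one way to combine the options listed above. The function name hdfs_path_kind and the labels it prints are assumptions made for this example.

```shell
#!/bin/sh
# Classify an HDFS path using the exit status of "hdfs dfs -test".
# hdfs_path_kind and its output labels are hypothetical, for illustration.
hdfs_path_kind() {
    path="$1"
    if ! hdfs dfs -test -e "$path"; then
        echo "missing"
    elif hdfs dfs -test -d "$path"; then
        echo "directory"
    elif hdfs dfs -test -z "$path"; then
        echo "empty file"
    else
        echo "file"
    fi
}

# Usage (requires a configured Hadoop client):
#   hdfs_path_kind /hdfs/dir/path
```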


1) What is Hive?

Hive is an ETL and data warehousing tool built on top of the Hadoop Distributed File System (HDFS). It provides a SQL-like language, HiveQL, to query and analyze data stored in HDFS. Hive is open-source software that lets programmers analyze large data sets on Hadoop.

2) What are the Key components in Hive Architecture?

Command Line Interface (cli)
Hive Web Interface (hwi)
HiveServer (hiveserver)
Metastore
Driver
Execution Engine

3) What is a Hive Metastore?

Hive Metastore is the central repository of Hive metadata. It stores schema information, such as table and partition definitions, in an external relational database.

4) What are the different modes of Hive?

Hive can operate in two modes, depending on the number of data nodes in Hadoop and the size of the data.

These modes are:

Local mode
Map reduce mode

5) What is the use of HCatalog?

HCatalog can be used to share data structures with external systems. It gives users of other Hadoop tools access to the Hive metastore so that they can read and write data in Hive's data warehouse.

6) What are the differences between Hive and HBase?

Hive supports most SQL-style queries through HiveQL, whereas HBase does not support SQL queries natively
Hive traditionally does not support record-level insert, update, and delete operations on a table, whereas HBase does
Hive is a data warehouse framework, whereas HBase is a NoSQL database
Hive runs on top of MapReduce, whereas HBase runs on top of HDFS

7) Where is table data stored in Apache Hive by default?

hdfs://namenode_server/user/hive/warehouse
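
The path above is only the default. The actual location is controlled by the hive.metastore.warehouse.dir property, which can be read from the command line. A minimal sketch, assuming the hive CLI is on the PATH. The function name hive_warehouse_dir is hypothetical.

```shell
#!/bin/sh
# Print the configured Hive warehouse directory.
# hive_warehouse_dir is a hypothetical helper name used for illustration.
hive_warehouse_dir() {
    # "SET <property>;" prints "<property>=<value>"; -S keeps the output quiet.
    hive -S -e 'SET hive.metastore.warehouse.dir;' |
        sed -n 's/^hive\.metastore\.warehouse\.dir=//p'
}

# Usage (requires a configured Hive client):
#   hdfs dfs -ls "$(hive_warehouse_dir)"
```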

8) Write a hive query to view all the databases whose name begins with "db"

hive> SHOW DATABASES LIKE 'db*';

9) Write a query to rename a table Student to Student_2.

hive> ALTER TABLE Student RENAME TO Student_2;

10) How to create an index on a table in Hive?

hive> CREATE INDEX index_salary ON TABLE employee (salary)
      AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler';

The above query creates an index named index_salary on the salary column of the employee table. Note that index support was removed in Hive 3.0.

11) How to delete the above index named index_salary?

DROP INDEX index_salary ON employee;

12) How can you see the present working directory in UNIX from Hive? Is it possible to run this command from Hive?

Yes. Hive allows execution of UNIX commands from its prompt by prefixing them with an exclamation mark (!). To see the present working directory, run !pwd at the hive prompt, terminated with a semicolon like any other Hive CLI command.

hive> !pwd;

Sunday, July 19, 2015