Sunday, July 19, 2015
Most commonly used HDFS Commands
12:36 PM Admin
1. List all the files and directories under root hdfs directory
hdfs dfs -ls /
To list all the files and directories recursively, use the -R option of ls (the older lsr command is deprecated).
hdfs dfs -ls -R /
2. Copy a file or directory to another directory in hdfs
hdfs dfs -cp /hdfs/src/dir /hdfs/dest/dir/
hdfs dfs -cp /hdfs/src/dir/file1 /hdfs/dest/dir/
3. Move or rename a file or directory to another directory in hdfs
hdfs dfs -mv /hdfs/src/dir /hdfs/dest/dir
hdfs dfs -mv /hdfs/current/dir/file1 /hdfs/current/dir/file2
4. Create a directory in hdfs
hdfs dfs -mkdir /hdfs/new/dir/path
If the parent directories do not exist, the -p option can be used to create all of them in one go.
hdfs dfs -mkdir -p /hdfs/new/dir/path
5. Read a file in hdfs
hdfs dfs -cat /hdfs/dir/file1.dat
If the file is Snappy-compressed, use the text command instead to read it.
hdfs dfs -text /hdfs/dir/file1.dat.snappy
6. Copy a file from local file system to hdfs file system
hdfs dfs -copyFromLocal /local/dir/path/file1.txt /hdfs/dir/path/file.txt
The put command does the same.
hdfs dfs -put /local/dir/path/file1.txt /hdfs/dir/path/file.txt
7. Copy a file from hdfs file system to local file system
hdfs dfs -copyToLocal /hdfs/dir/path/file.txt /local/dir/path/file1.txt
The get command does the same.
hdfs dfs -get /hdfs/dir/path/file.txt /local/dir/path/file1.txt
8. Delete a file or directory from hdfs
Use the rm command to delete a file.
hdfs dfs -rm /hdfs/dir/path/file.txt
Use rm with the -r option to delete a directory and its contents (the older rmr command is deprecated).
hdfs dfs -rm -r /hdfs/dir/path/
9. Create a zero byte file in hdfs
hdfs dfs -touchz /hdfs/dir/path/file.txt
10. Verify a directory or file using test command in hdfs
hadoop fs -test -[defsz] URI
Options:
-d: if the path is a directory, return 0.
-e: if the path exists, return 0.
-f: if the path is a file, return 0.
-s: if the path is not empty, return 0.
-z: if the file is zero length, return 0.
Example:
hadoop fs -test -e /hdfs/dir/path
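The test command prints nothing; it reports its result only through the exit status, so it is normally used inside a shell conditional. The following is a minimal sketch, assuming a Bourne-compatible shell with the hadoop client on the PATH. The function names hdfs_path_exists and ensure_hdfs_dir are hypothetical helpers made up for illustration, not part of Hadoop.

```shell
#!/bin/sh
# Hypothetical helper: succeeds (exit status 0) when the HDFS path exists.
hdfs_path_exists() {
    hadoop fs -test -e "$1"
}

# Hypothetical helper: create the directory (and any missing parents)
# only when -test -d reports that it is not already a directory.
ensure_hdfs_dir() {
    if hadoop fs -test -d "$1"; then
        echo "already present: $1"
    else
        hadoop fs -mkdir -p "$1" && echo "created: $1"
    fi
}
```

With these defined, a script could call `ensure_hdfs_dir /hdfs/new/dir/path` before uploading files, or guard a step with `if hdfs_path_exists /hdfs/dir/path; then ...; fi`.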
Top 10 Hadoop Administration interview questions
12:32 PM Admin
1. Difference between FIFO and Capacity scheduler
2. How do you execute a job on a cluster using the FIFO scheduler?
3. How do you identify a long running job in a large busy cluster?
Top 10 Hive Developer interview questions
12:31 PM Admin
1) What is Hive?
Hive is an ETL and data warehousing tool built on top of the Hadoop Distributed File System (HDFS). It is a data warehouse framework for querying and analysing data stored in HDFS. Hive is open-source software that lets programmers analyze large data sets on Hadoop.
2) What are the key components in Hive architecture?
- Command Line Interface (cli)
- Hive Web Interface (hwi)
- HiveServer (hiveserver)
- Metastore
- Driver
- Execution Engine
3) What is a Hive Metastore?
Hive Metastore is the central repository in Hive. It stores schema information (metadata) in an external database.
4) What are the different modes of Hive?
Hive can run in different modes depending on the size of the data nodes in Hadoop. These modes are:
- Local mode
- Map reduce mode
5) What is the use of Hcatalog?
HCatalog can be used to share data structures with external systems. It gives users of other tools on Hadoop access to the Hive metastore so that they can read and write data in Hive's data warehouse.
6) What are the differences between Hive and HBase?
- Hive supports most SQL queries, but HBase does not support SQL queries
- Hive does not support record-level insert, update, and delete operations on a table
- Hive is a data warehouse framework, whereas HBase is a NoSQL database
- Hive runs on top of MapReduce; HBase runs on top of HDFS
7) Where is table data stored in Apache Hive by default?
hdfs://namenode_server/user/hive/warehouse
8) Write a hive query to view all the databases whose name begins with "db"
hive> SHOW DATABASES LIKE 'db*';
9) Write a query to rename a table Student to Student_2.
hive> Alter Table Student RENAME to Student_2;
10) How to create an index on a table in Hive?
hive> CREATE INDEX index_salary ON TABLE employee (salary)
AS 'org.apache.hadoop.hive.ql.index.compact.CompactIndexHandler';
The above query creates an index named index_salary which points to the salary column in the employee table.
11) How to delete the above index named index_salary?
DROP INDEX index_salary ON employee;
12) How to see the present working directory in UNIX from Hive? Is it possible to run this command from Hive?
Hive allows UNIX commands to be executed by prefixing them with an exclamation mark (!) at the hive prompt. To see the present working directory in UNIX from Hive, run !pwd at the hive prompt.
hive> !pwd
Top 10 PIG interview questions
12:29 PM Admin
1. What is PIG script?
2. Write the skeleton of a pig script.
3. What is the difference between STORE and DUMP command?
4. What is the use of FILTER in PIG?
5. Can you use joins in PIG?
6. Can you have multiple inputs to a pig script?
7. What is the use of UDF?
8. What is the use of GROUP BY in PIG?
9. What is the use of UNION in PIG?
10. What is a tuple?
Top 10 Hadoop interview questions
12:28 PM Admin
1. What is Big Data?
2. What is Hadoop?
3. What are the components of a Hadoop Cluster?
4. What is single point of failure in Hadoop and why?
5. List down the functions of NameNode.
6. What is the function of Job Tracker?
7. What are the phases in a Map Reduce Job Processing?
8. How do you copy data from local to cluster and vice versa?
9. What are the different file formats in Hadoop?
10. What do you mean by decommission a data node in Hadoop?