Accenture Hadoop interview questions

Nowadays interviewers rarely ask direct questions; most of them pose scenarios based on real-time issues.

1. I have a 4 MB Hive JAR file that is scheduled through crontab to run daily. After 6 months,
   the job started failing with a disk space error, even though the server still has disk space available.
   Where is the disk space issue coming from, and how do we resolve it?

Ans:
This might be a heap size error (the expected reason).
It can be avoided by:
1. Giving the tasks a larger maximum heap size (the JVM heap is kept somewhat below the container size):
   set mapreduce.map.memory.mb=5120;
   set mapreduce.map.java.opts=-Xmx4608m;

   set mapreduce.reduce.memory.mb=5120;
   set mapreduce.reduce.java.opts=-Xmx4608m;

2. Clearing the executor logs periodically, alongside the job runs (with a shell script scheduled through crontab).
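
The periodic log cleanup in point 2 could be sketched roughly as follows. This is only an illustration: the function name, the `*.log` pattern, and the 30-day default are assumptions, not something the question specifies, and it relies on GNU `find`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete job log files older than a given number of days.
# The directory, the *.log pattern and the 30-day default are assumptions.
clean_old_logs() {
  local log_dir="$1"
  local max_age_days="${2:-30}"
  # Refuse to run on a missing directory rather than guessing.
  [ -d "$log_dir" ] || { echo "no such directory: $log_dir" >&2; return 1; }
  # -mtime +N matches files last modified more than N days ago;
  # -print lists each file before -delete removes it.
  find "$log_dir" -type f -name '*.log' -mtime +"$max_age_days" -print -delete
}

# Example crontab entry (script path is a placeholder), daily at 01:00:
#   0 1 * * * /opt/jobs/clean_logs.sh >> /var/log/clean_logs.log 2>&1
```

Printing each file before deleting it leaves a record in the cron log of what was removed, which helps when checking whether the cleanup actually freed the space that was running out.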

OR

If the error comes from HDFS, another possible cause is that the space quota allocated to a folder or user has been exhausted while the cluster as a whole still has free space,
or that some tmp/staging space is not being cleaned up.
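
One way to check the quota theory is to look at the output of `hdfs dfs -count -q`, which reports the space quota and the remaining space quota for a path. The sketch below only parses one line of that output; the function name, the threshold, and the `/user/etl` path are illustrative assumptions, and it assumes the path contains no spaces.

```shell
# Hypothetical helper: read one line of `hdfs dfs -count -q <path>` output on
# stdin and report whether the remaining space quota is below a threshold (bytes).
check_space_quota() {
  local min_free_bytes="$1"
  # Columns: QUOTA REM_QUOTA SPACE_QUOTA REM_SPACE_QUOTA DIRS FILES SIZE PATH
  awk -v min="$min_free_bytes" '{
    if ($3 == "none")      print "NO_QUOTA " $8
    else if ($4 + 0 < min) print "LOW " $8 " remaining=" $4
    else                   print "OK " $8
  }'
}

# Usage on a cluster (the path is a placeholder), warning below 1 GiB:
#   hdfs dfs -count -q /user/etl | check_space_quota 1073741824
```

Keep in mind that the HDFS space quota is charged for every replica of a block, so a directory can hit its quota well before the raw file sizes suggest it should.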

2. What are the complex data types in Pig?


Complex types: Pig supports three complex data types. They are listed below:

Tuple: An ordered set of fields. A tuple is enclosed in parentheses. Example: (1,2)
Bag: A collection of tuples. A bag is enclosed in curly braces. Example: {(1,2),(3,4)}
Map: A set of key-value pairs. A map is enclosed in square brackets. Example: [key#value]. The # separates the key from the value.

3. How do you pass data at run time to a Hive query scheduled through crontab?

Like Pig and other scripting languages, Hive lets you create parameterized scripts, which greatly increases their reusability. To take advantage of this, write your Hive scripts like this:

select yearid, sum(HR)
from   batting_stats
where  teamid = '${hiveconf:TEAMID}'
group  by yearid
order  by yearid desc;

Note that the restriction on teamid is '${hiveconf:TEAMID}' rather than an actual value. This is an instruction to read this variable's value from the hiveconf namespace. When you execute the script, you'll run it as shown below:

hive -f batting.hive -hiveconf TEAMID='LAA'

If you define the parameter in the script but fail to specify a value at run time, you won't get an error as you would with Pig. Instead, the restriction effectively becomes "where teamid = ''". If you have rows with a blank teamid, you might get a result back; if not, the script goes through all the mechanics of executing and returns no results.
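
For the crontab part of the question, a small wrapper script can compute the run-time values and hand them to Hive. The following is a minimal sketch under stated assumptions: `run_batting_report`, `HIVE_BIN`, and the `RUN_DATE` variable are hypothetical, and `RUN_DATE` only has an effect if the script references `${hiveconf:RUN_DATE}`. `batting.hive` and `TEAMID` are taken from the example above.

```shell
#!/usr/bin/env bash
# Hypothetical cron wrapper: computes a run-time value and passes it to Hive.
run_batting_report() {
  local team_id="$1"
  local run_date="${2:-$(date +%F)}"   # default: today's date as YYYY-MM-DD
  # HIVE_BIN is an assumed override; set HIVE_BIN=echo for a dry run that
  # prints the arguments instead of launching Hive.
  "${HIVE_BIN:-hive}" --hiveconf TEAMID="$team_id" \
                      --hiveconf RUN_DATE="$run_date" \
                      -f batting.hive
}

# Example crontab entry (script path is a placeholder), daily at 02:00:
#   0 2 * * * /opt/jobs/run_batting.sh >> /var/log/run_batting.log 2>&1
```

Computing the date inside the script rather than in the crontab line sidesteps a common pitfall: cron treats an unescaped `%` in the command field specially, so `date +%F` written directly in the crontab would need each `%` escaped.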

