Nowadays interviewers rarely ask direct questions; most of them ask questions based on real-time, issue-driven scenarios.
1 I have a 4 MB Hive JAR file that is scheduled in crontab to run on a daily basis. After 6 months we got a disk space issue while running the job, even though disk space was available on the server. Where does the disk space issue come from, and how can it be resolved?
Ans: This might be a heap size error (the expected reason).
This can be avoided by:
1. Providing the maximum heap size to tasks. In the settings below, each JVM heap (-Xmx) is 90% of its container size:
set mapreduce.map.memory.mb=5120;
set mapreduce.map.java.opts=-Xmx4608m;
set mapreduce.reduce.memory.mb=5120;
set mapreduce.reduce.java.opts=-Xmx4608m;
2. Clearing the executor logs periodically alongside the job execution (with the help of a shell script and crontab).
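The periodic log cleanup in point 2 could be sketched roughly as follows. This is only an illustration: the function name clean_old_logs, the *.log pattern, the 7-day default retention and the paths in the crontab comment are assumptions, not details from the original job.

```shell
#!/bin/sh
# Hypothetical helper: delete *.log files older than N days under a directory.
# The directory and retention period are assumptions, not values from the post.
clean_old_logs() {
    log_dir="$1"
    keep_days="${2:-7}"
    # Refuse to run against a missing directory.
    if [ ! -d "$log_dir" ]; then
        echo "no such directory: $log_dir" >&2
        return 1
    fi
    # -mtime +N matches files last modified more than N days ago.
    find "$log_dir" -type f -name '*.log' -mtime +"$keep_days" -exec rm -f {} +
}

# When saved as a script and called with arguments, run the cleanup.
if [ "$#" -gt 0 ]; then
    clean_old_logs "$@"
fi

# Example crontab entry (assumed paths), daily at 01:00 before the Hive job:
# 0 1 * * * /opt/jobs/clean_logs.sh /var/log/hive-job 7
```

Restricting the delete to one file pattern inside one directory keeps the cleanup from touching job output or anything else that happens to live nearby.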
OR
If the job writes to HDFS, another possible answer is that the space quota allocated to a folder or user is exhausted while space is still available in the cluster/server, or that some tmp/staging space has not been cleaned up.
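One way to check the quota explanation is to look at the output of hdfs dfs -count -q, which prints the name quota, remaining name quota, space quota, remaining space quota, directory count, file count, content size and path for a directory. The sketch below interprets one such line; the function name quota_status and the /user/etl path in the comments are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: reads one line of `hdfs dfs -count -q <dir>` output on
# stdin and summarises the quota state. Fields, in order:
#   $1 name quota   $2 remaining name quota
#   $3 space quota  $4 remaining space quota
#   $5 dir count    $6 file count   $7 content size   $8 path
# Unset quotas are printed by HDFS as "none" (quota) and "inf" (remaining).
quota_status() {
    awk '{
        if ($1 != "none" && $2 <= 0)   print "name quota exhausted on " $8
        else if ($3 == "none")         print "no space quota set on " $8
        else if ($4 <= 0)              print "space quota exhausted on " $8
        else                           print $4 " bytes of space quota left on " $8
    }'
}

# Usage on a cluster node (assumed path):
#   hdfs dfs -count -q /user/etl | quota_status
# If the quota is the problem, an administrator can raise it, for example:
#   hdfs dfsadmin -setSpaceQuota 500g /user/etl
# or old staging/tmp data under the directory can be removed instead.
```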
2 What are the complex data types in Pig?
Pig supports three complex data types:
Tuple : An ordered set of fields. A tuple is enclosed in parentheses. Example: (1,2)
Bag : A collection of tuples. A bag is enclosed in curly braces. Example: {(1,2),(3,4)}
Map : A set of key-value pairs. A map is enclosed in square brackets. Example: [key#value]. The # separates the key from the value.
3 How do you pass values at run time to a Hive query scheduled in crontab?
Like Pig and other scripting languages, Hive lets you create parameterized scripts, which greatly increases their reusability. To take advantage of this, write your Hive scripts like this:
select yearid, sum(HR)
from batting_stats
where teamid = '${hiveconf:TEAMID}'
group by yearid
order by yearid desc;
Note that the restriction on teamid is '${hiveconf:TEAMID}' rather than an actual value. This is an instruction to read the variable's value from the hiveconf namespace. When you execute the script, run it as shown below:
hive -f batting.hive -hiveconf TEAMID='LAA'
If you define the parameter in the script but fail to specify a value at run time, you won't get an error like you would with Pig. Instead, the restriction effectively becomes "where teamid = ''". If the table contains blank team ids you might get a result back; if not, you'll go through all the mechanics of executing the script without getting any results.
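Because a missing value does not raise an error, a cron wrapper can check the parameter itself before calling Hive. The following is a minimal sketch built around the batting.hive example; the function name run_batting_report, the HIVE_BIN variable and the paths in the crontab comment are assumptions made for illustration.

```shell
#!/bin/sh
# Hypothetical wrapper for the batting.hive example.
# HIVE_BIN is an assumed variable: cron runs with a minimal PATH, so it can
# hold the absolute path to the hive executable. It defaults to "hive".
run_batting_report() {
    team_id="$1"
    # Fail loudly instead of letting the query run with an empty TEAMID.
    if [ -z "$team_id" ]; then
        echo "TEAMID is required" >&2
        return 1
    fi
    "${HIVE_BIN:-hive}" -f batting.hive -hiveconf "TEAMID=$team_id"
}

# In the script that cron invokes, finish with:
#   run_batting_report "$1"
# Example crontab entry (assumed paths), daily at 02:00:
#   0 2 * * * /opt/jobs/run_batting_report.sh LAA >> /var/log/batting_report.log 2>&1
```

With this check in place, a forgotten argument shows up as a non-zero exit status in the cron log rather than as an empty result set.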