DistCp gets stuck building the copy listing
An elephant keeper tells me that his DistCp between two HDFS clusters, A and B, works when pushing from A but fails when pulling from B. Both A and B are secure clusters with Kerberos enabled. The DistCp just gets stuck forever. The console log of the DistCp process looks like the following:
INFO tools.DistCp: Input Options: DistCpOptions ooxx
My checklist for this kind of DistCp issue is:
- The first step is to check whether the HDFS service itself is working: is a plain copy working? Try
hadoop fs -cp clusterA://file clusterB://file. If this is not working, HDFS is probably not working as expected. There may be many reasons, including bad configurations (don't laugh; I am always suspicious of my dear customers' configurations), a bad NameNode service state, a bad networking gateway, or bad Kerberos settings. My support told me
hadoop fs -cp works for both pull and push between both clusters.
- My second question is about the DistCp options. I have seen cases where weird problems came up: copying with attribute-preserving options, authentication falling back from Kerberos to simple when copying from a secure to an insecure cluster, copying from HDFS to S3, running with the -diff option while snapshots were not correctly synced, and many more. My support told me it's not using any options. Okey dokey.
- The equally important question is to get the DistCp progress, a.k.a. its internal state: where is it stuck? Were the MapReduce jobs submitted to YARN? What does the application log say? My elephant keeper then told me: sorry, it's not about YARN, because DistCp has not submitted the job yet. And from the
jstack, the main thread looks like:
"main" prio=10 tid=0xooxx nid=0xooxx runnable [0xooxx]
at sun.reflect.Reflection.getCallerClass(Native Method)
I asked him to run jstack periodically. We then found that the top of the stack kept changing, but all the frames were within the
SortPass.run method. Hmmmm, how come? DistCp is not able to sort the input file list? Since it is not stuck in a deadlock or some infinite wait, we know there must be a loop that runs forever. But we only have one file in the list! Maybe there is something wrong with processing even a small sequence file for MapReduce.
- If there is anything wrong with MapReduce, we should check the configurations first.
- If all the configurations look good, we should run some sample MapReduce tasks that process sequence files.
After reviewing the configurations, including
mapred-site.xml, my elephant keeper found that the config
mapreduce.task.io.sort.mb had recently been set to 4096. The meaning of this config is “the total amount of buffer memory to use while sorting files, in megabytes”. Someone might have set that value globally for applications that use a large amount of per-task memory. After deleting that configuration (falling back to its default value of 100), DistCp works.
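For reference, the offending entry in mapred-site.xml would have looked something like the following (the property name is real; the value shown is the one that was set globally):

```xml
<!-- Global override in mapred-site.xml; any value above 2047 breaks MapReduce sorting -->
<property>
  <name>mapreduce.task.io.sort.mb</name>
  <value>4096</value>
</property>
```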
What is the root cause analysis (RCA)?
MapTask.MapOutputBuffer implements its buffer as a byte array of
mapreduce.task.io.sort.mb megabytes converted to bytes, and it sanity-checks the size before allocating the array. The problem is that Java arrays can't have more than
2^31 - 1 elements (even with a 64-bit JVM), which is a limitation of the Java language specification itself. This means the config
mapreduce.task.io.sort.mb is effectively limited to <= 2047. Specifying a value of 4096 will not help applications with large per-task memory; instead, it will break all MapReduce applications. There is a related discussion.
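The arithmetic can be checked directly. Here is a minimal, self-contained illustration (not Hadoop's actual code) of why values above 2047 cannot work when megabytes are converted to bytes in 32-bit int arithmetic:

```java
// Minimal illustration (not Hadoop's actual code) of the 2047 MB limit:
// converting megabytes to bytes with 32-bit int arithmetic overflows
// for any value above 2047.
public class SortMbLimit {
    public static void main(String[] args) {
        System.out.println(2047 << 20); // 2146435072 bytes -- still fits in an int
        System.out.println(2048 << 20); // overflows to Integer.MIN_VALUE (negative)
        System.out.println(4096 << 20); // wraps all the way around to 0
        // And even ignoring the overflow, a Java byte[] can hold at most
        // 2^31 - 1 elements, so a single buffer larger than 2047 MB
        // could never be allocated anyway.
    }
}
```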
So, misconfiguration, right? Unfortunately, this secret limit is not documented (please correct me if I missed anything), and DistCp does not fail fast because of it. It keeps retrying the sort pass in an infinite internal loop in the
SortPass.run() method and looks stuck to users. Perhaps we can file a JIRA to address this.
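To illustrate how a bad buffer size can turn into a silent infinite loop rather than a crash, here is a deliberately simplified sketch. This is hypothetical code, not Hadoop's actual SortPass implementation: a sort loop that batches records through an in-memory buffer makes zero progress per iteration once the computed capacity is 0.

```java
// Hypothetical sketch, NOT Hadoop's SortPass code: a sort pass that
// batches records through an in-memory buffer. If the computed buffer
// capacity is 0, every batch is empty and the loop never advances.
public class ZeroBufferSortSketch {
    public static void main(String[] args) {
        int capacity = 0;   // what an overflowed size computation can yield
        int total = 1;      // a single record, like our one-file copy listing
        int consumed = 0;
        while (consumed < total) {
            int batch = Math.min(capacity, total - consumed); // always 0
            if (batch == 0) { // guard added here so this demo terminates
                System.out.println("no progress; the real loop spins forever");
                break;
            }
            consumed += batch;
        }
    }
}
```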