One elephant keeper tells me, his DistCp between two HDFS clusters A and B works for push from A, but fails to pull from B. Both A and B are secure clusters with Kerberos enabled. The DistCP just gets stuck forever. He gets the console log of the DistCp process as following:

1
2
3
4
5
6
INFO tools.DistCp: Input Options: DistCpOptions ooxx
INFO client.AHSProxy: Connecting to Application History server at ooxx
INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 256 for oo at xx
INFO security.TokenCache: Got dt for hdfs://clusterA:8020; Kind: HDFS_DELEGATION_TOKEN, Service: ...
INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 1; dirCnt = 0
INFO tools.SimpleCopyListing: Build file listing completed.

The checklist for this kind of DistCp issues I’ve done is:

  1. The first is to check if HDFS service per se is tightly related: is cp working? Try hadoop fs cp clusterA://file clusterB://file. If this is not working, probably the HDFS is not working as expected. There may be many reasons including bad configurations (don’t laugh; I am always suspicious to my dear customers’ configurations), bad NameNode service state, bad networking gateway, bad Kerberos settings. My support told me, hadoop fs -cp works for both pull and push between both clusters.
  2. My second question is about the DistCp options. I have some cases where wired problems came up when copying preserved data, authentication fallback to simple problems from secure to insecure cluster, copying from HDFS to S3, running with diff option but snapshots were not correctly synced, and many more. My support told me it’s not using any options. Okey dokey.
  3. The equally import comment is to get the DistCp progress aka internal state: where is it stuck? Was the MapReduce jobs submitted to YARN? How was the application log? My elephant keeper then told me, sorry it’s not about YARN, because the DistCp has not submitted the job yet. And from the jstack, the main thread is like:

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    25
    26
    27
    "main" prio=10 tid=0xooxx nid=0xooxx runnable [0xooxx]
    java.lang.Thread.State: RUNNABLE
    at sun.reflect.Reflection.getCallerClass(Native Method)
    at java.lang.Class.getConstructor(Class.java:1730)
    at java.security.Provider$Service.newInstance(Provider.java:1239)
    at sun.security.jca.GetInstance.getInstance(GetInstance.java:236)
    at sun.security.jca.GetInstance.getInstance(GetInstance.java:164)
    at java.security.Security.getImpl(Security.java:695)
    at java.security.MessageDigest.getInstance(MessageDigest.java:167)
    at org.apache.hadoop.io.SequenceFile$Writer.<init>(SequenceFile.java:854)
    at org.apache.hadoop.io.SequenceFile.createWriter(SequenceFile.java:273)
    at org.apache.hadoop.io.SequenceFile$Sorter$SortPass.flush(SequenceFile.java:3024)
    at org.apache.hadoop.io.SequenceFile$Sorter$SortPass.run(SequenceFile.java:2968)
    at org.apache.hadoop.io.SequenceFile$Sorter.sortPass(SequenceFile.java:2866)
    at org.apache.hadoop.io.SequenceFile$Sorter.sort(SequenceFile.java:2814)
    at org.apache.hadoop.io.SequenceFile$Sorter.sort(SequenceFile.java:2855)
    at org.apache.hadoop.tools.util.DistCpUtils.sortListing(DistCpUtils.java:363)
    at org.apache.hadoop.tools.CopyListing.validateFinalListing(CopyListing.java:148)
    at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:93)
    at org.apache.hadoop.tools.GlobbedCopyListing.doBuildListing(GlobbedCopyListing.java:90)
    at org.apache.hadoop.tools.CopyListing.buildListing(CopyListing.java:86)
    at org.apache.hadoop.tools.DistCp.createInputFileListing(DistCp.java:398)
    at org.apache.hadoop.tools.DistCp.createAndSubmitJob(DistCp.java:190)
    at org.apache.hadoop.tools.DistCp.execute(DistCp.java:155)
    at org.apache.hadoop.tools.DistCp.run(DistCp.java:128)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.tools.DistCp.main(DistCp.java:462)
  4. For the jstack I asked him to monitor it periodically. We then found that the top of the stack are changing, but all are withing the SortPass.run method. Hmmmm, how come? DistCp is not able to sort the input file list? As it is not stuck with some deadlock, or some infinite waiting, we know there must be a loop that is running forever. But we only have one file in the list! Maybe there is something wrong to process even a small sequence file for MapReduce.

    • If there is anything wrong with MapReduce, we should check the configurations first.
    • If all the configuration works, we should run some sample MapReduce tasks to process sequetial files.

After reviewing the configurations including core-site.xml, hdfs-site.xml and mapred-site.xml, my elephant keeper found the config mapreduce.task.io.sort.mb value was added recently as 4096. The meaning of the configure is “the total amount of buffer memory to use while sorting files, in megabytes”. Someone might have set that value globally for applications that use a large amount of per-task memory. After deleting that configuration (and using its default value 100), DistCp works.

What is the root cause analysis (RCA)? MapTask.MapOutputBuffer implements its buffer with a byte[] of size mapreduce.task.io.sort.mb in bytes, and is sanity checking the size before allocating the array. The problem is that Java arrays can’t have more than 2^31 - 1 elements (even with a 64-bit JVM), which is a limitation of the Java language specificiation itself. This means the config mapreduce.task.io.sort.mb is actually limited to <= 2047. Specifying a value of 4096 will not help the applications of large per-task memory, instead, it will break all MapReduce applications. There is a related discussion.

So, misconfiguration, right? Unfortunately, this secret limit is not documented (please correct me if I missed anything), and the DistCp will not fail fast because of this. It keeps retrying the sort pass in an infinite internal loop at SortPass.run() method and gets stuck to users. Perhaps we can file a JIRA to address this.