One elephant keeper asked me whether he should be concerned that his standby NameNode occasionally hangs for 10 to 30 seconds. Sometimes he found it unresponsive to block reports, failover requests, and other operations; fortunately the standby NN was always able to recover afterwards. There may well have been other short hangs he was not even aware of.

The first simple question: are there long GC pauses? Did you check your NN heap settings per this beautiful doc?
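If you have not checked yet, a quick way is to watch the NN's GC live with jstat, or to enable GC logging in hadoop-env.sh. A minimal sketch (the pgrep pattern and log path are just examples; adjust them for your environment):

# Watch NameNode GC activity once per second; the GCT column is cumulative GC time.
jstat -gcutil $(pgrep -f 'org.apache.hadoop.hdfs.server.namenode.NameNode' | head -1) 1000

# Or enable GC logging for the NN in hadoop-env.sh (JDK 7/8 era flags):
export HADOOP_NAMENODE_OPTS="-Xloggc:/var/log/hadoop/nn-gc.log -XX:+PrintGCDetails -XX:+PrintGCTimeStamps $HADOOP_NAMENODE_OPTS"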

Of course, he said. The GC was fine.

OK, pal. Why not take a jstack periodically while the standby NameNode hangs?
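A minimal capture loop, assuming you can locate the standby NN's process ID with pgrep (the pattern is an example; adjust it for your environment):

# Dump all NN thread stacks every 5 seconds while the hang is in progress.
NN_PID=$(pgrep -f 'org.apache.hadoop.hdfs.server.namenode.NameNode' | head -1)
while true; do
  jstack "$NN_PID" > "jstack-$(date +%H%M%S).txt"
  sleep 5
done

The most frequent result looked like the following: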

"Edit log tailer" #601 prio=5 os_prio=0 tid=0xooxx nid=0xooxx runnable [0xooxx]
java.lang.Thread.State: RUNNABLE
at org.apache.hadoop.hdfs.server.namenode.FSImage.updateCountForQuotaRecursively(FSImage.java:887)
at org.apache.hadoop.hdfs.server.namenode.FSImage.updateCountForQuotaRecursively(FSImage.java:883)
at org.apache.hadoop.hdfs.server.namenode.FSImage.updateCountForQuotaRecursively(FSImage.java:883)
at org.apache.hadoop.hdfs.server.namenode.FSImage.updateCountForQuotaRecursively(FSImage.java:883)
at org.apache.hadoop.hdfs.server.namenode.FSImage.updateCountForQuotaRecursively(FSImage.java:883)
at org.apache.hadoop.hdfs.server.namenode.FSImage.updateCountForQuotaRecursively(FSImage.java:883)
at org.apache.hadoop.hdfs.server.namenode.FSImage.updateCountForQuotaRecursively(FSImage.java:883)
at org.apache.hadoop.hdfs.server.namenode.FSImage.updateCountForQuotaRecursively(FSImage.java:883)
at org.apache.hadoop.hdfs.server.namenode.FSImage.updateCountForQuotaRecursively(FSImage.java:883)
at org.apache.hadoop.hdfs.server.namenode.FSImage.updateCountForQuotaRecursively(FSImage.java:883)
at org.apache.hadoop.hdfs.server.namenode.FSImage.updateCountForQuota(FSImage.java:868)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:851)
at org.apache.hadoop.hdfs.server.namenode.FSImage.loadEdits(FSImage.java:818)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer.doTailEdits(EditLogTailer.java:232)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.doWork(EditLogTailer.java:331)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.access$200(EditLogTailer.java:284)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread$1.run(EditLogTailer.java:301)
at org.apache.hadoop.security.SecurityUtil.doAsLoginUserOrFatal(SecurityUtil.java:444)
at org.apache.hadoop.hdfs.server.namenode.ha.EditLogTailer$EditLogTailerThread.run(EditLogTailer.java:297)

So the standby NN was busy loading edit logs in FSImage#loadEdits(), which calls updateCountForQuota() to recalculate and verify quotas for the entire namespace. This method runs every minute as part of edit log tailing, and it is a time-consuming operation. No wonder the standby NN appeared to hang for a while: they have been using quotas heavily! I told him I was sorry; I didn't think I could help here, because the standby NN needs to keep the quotas up to date: it may be asked to transition to "active" at any time, and it cannot serve requests without this information.
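If you want to confirm how often the tailer kicks in on your cluster, the interval is controlled by dfs.ha.tail-edits.period, which defaults to 60 seconds. A quick check (note that hdfs getconf reads the local client-side configuration):

# Print the effective edit log tailing interval, in seconds.
hdfs getconf -confKey dfs.ha.tail-edits.period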

Wait, let's check whether someone else has complained about this before. After a quick search on the Apache JIRA, I found HDFS-6763. There, Yahoo! engineers reported a broader problem:

Even ANN will traverse the entire tree and update the quota each time it loads an edit segment on start-up. The only place it is needed is when a NN is transitioning to active. All other use cases of loadEdits() do not need quota to be updated: BackupImage/BackupNode, Checkpointer, EditLogTailer. Quota is used only when serving. We can simply make FSNamesystem do it once whenever it transitions to active.

That makes perfect sense to me. The fix here is quite simple: leverage what has already been done in the open-source community. I backported the fix to his HDP version, and he was good to go. If you maintain your own Hadoop version and release, it is probably based on Apache Hadoop 2.7.3, which unfortunately does not have this fix. Now that Hadoop 2.8 is out, you can certainly give it a try.
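If you are unsure whether your branch already carries the fix, one quick way to check from a checkout of your Hadoop source (assuming you track the Apache repo as a remote named apache):

# Does my branch already contain HDFS-6763?
git log --oneline --grep='HDFS-6763'
# If not, locate the upstream commit and backport it; expect to resolve some conflicts.
git log --oneline --grep='HDFS-6763' apache/branch-2.8
git cherry-pick <commit-hash>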