An elephant keeper told me he was worried about his NameNode because some users might abuse it. To protect the NameNode from too many, overly aggressive clients, we can work through the following checklist. Once you have done all of it, your NameNode should be more reliable; and if it still isn't, you will at least have more context for root cause analysis. The same steps also address common performance problems. However, if your cluster is small or idle, don't bother.

  1. Enable asynchronous HDFS audit logging, because log4j's internal synchronization causes heavy contention among the RPC call handlers.
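  2. Enable a dedicated service RPC port, so that DataNode and other internal service traffic does not compete with client calls. Arpit's plan is to always enable the service RPC port in Hadoop 3.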
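  3. Enable FairCallQueue on the client RPC port (the default NameNode RPC port), so that calls from the heaviest users are deprioritized instead of starving everyone else.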
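  4. Enable backoff on the client RPC port, so that when the call queue is full, clients are told to retry later instead of piling up more calls.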
  5. Enable RPC caller context to track down the “bad” jobs; the caller context is recorded in the HDFS audit log.
  6. Monitor the NameNode JMX metrics for client RPC call queue length and average queue time. See my list of important NameNode JMX metrics.
  7. Use NNtop to list the top users of the HDFS NameNode and find out which users send the majority of each type of traffic to it.
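Items 1 to 5 boil down to a handful of configuration properties. The Python (3.9+) sketch below renders them as Hadoop `<property>` elements for `hdfs-site.xml` and `core-site.xml`. It assumes the client RPC port is 8020 (the `ipc.<port>.` prefix must match your actual NameNode RPC port), and the service RPC host and port are hypothetical placeholders; in an HA cluster, `dfs.namenode.servicerpc-address` takes nameservice and NameNode ID suffixes. Check every property against the `hdfs-default.xml` and `core-default.xml` of your Hadoop version before applying it.

```python
import xml.etree.ElementTree as ET

# Assumption: the NameNode client RPC port is 8020. The ipc.<port>.* keys must
# use whatever port your fs.defaultFS / dfs.namenode.rpc-address actually uses.
CLIENT_RPC_PORT = 8020

# Hypothetical host:port for the dedicated service RPC endpoint (item 2).
SERVICE_RPC_ADDRESS = "nn1.example.com:8021"

# Properties for hdfs-site.xml.
HDFS_SITE = {
    # Item 1: make NameNode audit logging asynchronous.
    "dfs.namenode.audit.log.async": "true",
    # Item 2: separate RPC endpoint for DataNodes and other internal services.
    "dfs.namenode.servicerpc-address": SERVICE_RPC_ADDRESS,
}

# Properties for core-site.xml.
CORE_SITE = {
    # Item 3: FairCallQueue on the client RPC port.
    f"ipc.{CLIENT_RPC_PORT}.callqueue.impl": "org.apache.hadoop.ipc.FairCallQueue",
    # Item 4: tell clients to back off and retry when the call queue is full.
    f"ipc.{CLIENT_RPC_PORT}.backoff.enable": "true",
    # Item 5: propagate the caller context (recorded in the audit log).
    "hadoop.caller.context.enabled": "true",
}


def render_properties(props: dict[str, str]) -> str:
    """Render a dict as Hadoop <property> elements, one per line.

    The result is meant to be pasted inside the file's <configuration> element.
    """
    lines = []
    for name, value in props.items():
        prop = ET.Element("property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
        lines.append(ET.tostring(prop, encoding="unicode"))
    return "\n".join(lines)


if __name__ == "__main__":
    print("<!-- hdfs-site.xml -->")
    print(render_properties(HDFS_SITE))
    print("<!-- core-site.xml -->")
    print(render_properties(CORE_SITE))
```

Keeping the port in one constant matters because the `ipc.*` keys silently stop applying if the NameNode RPC port changes and the prefix does not.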
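For item 6, the NameNode publishes RPC metrics through its `/jmx` HTTP servlet under the bean `Hadoop:service=NameNode,name=RpcActivityForPort<port>`, which includes `CallQueueLength` and `RpcQueueTimeAvgTime` (in milliseconds). The Python (3.9+) sketch below fetches that bean and flags values above a threshold. The NameNode address is hypothetical, the port is assumed to be 8020, and the thresholds are illustrative placeholders rather than recommendations; derive real ones from your cluster's normal baseline.

```python
import json
import urllib.request

# Hypothetical NameNode web address. The default NameNode HTTP port is 9870
# in Hadoop 3 and 50070 in Hadoop 2.
NN_HTTP = "http://nn1.example.com:9870"
# Assumption: the client RPC port is 8020.
CLIENT_RPC_PORT = 8020

# Illustrative thresholds only; derive real ones from your cluster's baseline.
MAX_QUEUE_LENGTH = 100
MAX_QUEUE_TIME_MS = 50.0


def fetch_rpc_bean(nn_http: str, port: int) -> dict:
    """Fetch the RpcActivityForPort<port> bean from the NameNode /jmx servlet."""
    url = f"{nn_http}/jmx?qry=Hadoop:service=NameNode,name=RpcActivityForPort{port}"
    with urllib.request.urlopen(url, timeout=10) as resp:
        beans = json.load(resp)["beans"]
    if not beans:
        raise LookupError(f"no RpcActivityForPort{port} bean at {nn_http}")
    return beans[0]


def check_rpc_queue(bean: dict) -> list[str]:
    """Return warnings when the call queue is long or calls wait too long in it."""
    warnings = []
    queue_len = bean.get("CallQueueLength", 0)
    queue_time_ms = bean.get("RpcQueueTimeAvgTime", 0.0)
    if queue_len > MAX_QUEUE_LENGTH:
        warnings.append(f"CallQueueLength={queue_len} exceeds {MAX_QUEUE_LENGTH}")
    if queue_time_ms > MAX_QUEUE_TIME_MS:
        warnings.append(
            f"RpcQueueTimeAvgTime={queue_time_ms}ms exceeds {MAX_QUEUE_TIME_MS}ms"
        )
    return warnings
```

With a reachable NameNode, `check_rpc_queue(fetch_rpc_bean(NN_HTTP, CLIENT_RPC_PORT))` returns the list of warnings. Keeping the check separate from the fetch lets the same logic be fed from an existing monitoring agent instead of polling the NameNode directly.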
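For item 7, NNtop results are exposed through JMX as the `TopUserOpCounts` attribute of the `Hadoop:service=NameNode,name=FSNamesystemState` bean: a JSON string with one entry per reporting window (by default 1, 5 and 25 minutes, set by `dfs.namenode.top.windows.minutes`). The Python (3.9+) sketch below assumes the layout `windows[].ops[].topUsers[]` with `windowLenMs`, `opType`, `user` and `count` fields, and that the `opType` `"*"` aggregates all operations; verify this against your own NameNode's output. The NameNode address is hypothetical.

```python
import json
import urllib.request

# Hypothetical NameNode web address (Hadoop 3 default HTTP port).
NN_HTTP = "http://nn1.example.com:9870"


def fetch_top_user_op_counts(nn_http: str) -> str:
    """Return the TopUserOpCounts JSON string from the FSNamesystemState bean."""
    url = f"{nn_http}/jmx?qry=Hadoop:service=NameNode,name=FSNamesystemState"
    with urllib.request.urlopen(url, timeout=10) as resp:
        beans = json.load(resp)["beans"]
    return beans[0]["TopUserOpCounts"]


def top_users(
    top_user_op_counts: str, window_ms: int, op_type: str = "*"
) -> list[tuple[str, int]]:
    """Return (user, count) pairs for one window and op type, busiest first.

    Assumes windows[].ops[].topUsers[] with windowLenMs, opType, user and
    count fields; an unknown window or op type yields an empty list.
    """
    data = json.loads(top_user_op_counts)
    for window in data.get("windows", []):
        if window.get("windowLenMs") != window_ms:
            continue
        for op in window.get("ops", []):
            if op.get("opType") == op_type:
                users = [(u["user"], u["count"]) for u in op.get("topUsers", [])]
                return sorted(users, key=lambda uc: uc[1], reverse=True)
    return []
```

For example, `top_users(fetch_top_user_op_counts(NN_HTTP), 300000)` lists the heaviest users over the 5-minute window, and passing an operation type such as `"listStatus"` narrows the result to that kind of traffic.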

My colleague Xiaoyu Yao at Hortonworks wrote a very good article on this topic, HDFS Namenode Protection Checklist, on Hortonworks Community Connection; it includes configuration details.