So I asked an elephant keeper to check the NameNode status via JMX while looking into a problem with too many under-replicated blocks. He found the JMX output too verbose, and he was not sure which metrics were the most important. I remember the old naive days when I had just started working on HDFS and troubleshooting NameNode issues; I too wondered which JMX metrics were the major, general-purpose ones. I don’t keep my list secret, so here it is.

  • CallQueueLength
  • CapacityRemainingGB
  • CapacityTotalGB
  • CapacityUsedGB
  • DeadNodes
  • MemHeapMaxM
  • MemHeapUsedM
  • NumDecomLiveDataNodes
  • NumLiveDataNodes
  • NumberOfMissingBlocks
  • PendingDeletionBlocks
  • PendingReplicationBlocks
  • RpcQueueTimeAvgTime
  • Safemode
  • TotalBlocks
  • TotalFiles
  • UnderReplicatedBlocks
  • UpgradeFinalized

The full list is in the official Hadoop documentation. There is also a good article about how to collect Hadoop metrics; please refer to it.
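
If the raw JMX output is too verbose, one way to cut it down to the list above is a small filter script. The following is only a minimal sketch: it assumes the NameNode web UI serves JSON at `/jmx` in the usual `{"beans": [...]}` shape, and the names `pick_key_metrics` and `fetch_jmx` are made up for this example. The host and port in the usage comment are placeholders (the default HTTP port is typically 50070 on Hadoop 2.x and 9870 on 3.x, but check your own configuration).

```python
import json
from urllib.request import urlopen

# The metric names from the list above.
KEY_METRICS = {
    "CallQueueLength", "CapacityRemainingGB", "CapacityTotalGB",
    "CapacityUsedGB", "DeadNodes", "MemHeapMaxM", "MemHeapUsedM",
    "NumDecomLiveDataNodes", "NumLiveDataNodes", "NumberOfMissingBlocks",
    "PendingDeletionBlocks", "PendingReplicationBlocks",
    "RpcQueueTimeAvgTime", "Safemode", "TotalBlocks", "TotalFiles",
    "UnderReplicatedBlocks", "UpgradeFinalized",
}


def pick_key_metrics(jmx, wanted=KEY_METRICS):
    """Return {bean name: {metric: value}} keeping only the wanted metrics.

    Results are grouped by bean because the same attribute name can
    appear in more than one bean.
    """
    picked = {}
    for bean in jmx.get("beans", []):
        hits = {k: v for k, v in bean.items() if k in wanted}
        if hits:
            picked[bean.get("name", "<unnamed bean>")] = hits
    return picked


def fetch_jmx(url):
    """Download and parse the JSON served by a /jmx endpoint (assumed URL)."""
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


# Usage (placeholder host and port, adjust to your cluster):
#   metrics = pick_key_metrics(fetch_jmx("http://namenode.example.com:50070/jmx"))
#   print(json.dumps(metrics, indent=2))
```

Scanning every bean rather than hard-coding bean names keeps the sketch short, at the cost of sometimes printing the same metric from two beans.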