My HDFS mover is slow

One elephant keeper told me he needed to move 7PB of data into archival storage and ran into performance issues with the Mover tool. It was a good question because he’s using tiered storage: HOT, WARM, COLD, All_SSD, One_SSD, Lazy_Persist. I made the old balancer joke again: your Mover is supposed to run slowly in the background, so take it easy. Per the doc:

It periodically scans the files in HDFS to check if the block placement satisfies the storage policy. For the blocks violating the storage policy, it moves the replicas to a different storage type in order to fulfill the storage policy requirement. Note that it always tries to move block replicas within the same node whenever possible.

Haha, he agreed it was a nice one. However, he was not enjoying the humor, because he had to move the backup data to archival storage ASAP so that the SSDs could be freed up for production. It took around 10 minutes to move a 40GB file. At that rate, moving 7PB of data would take about 3 years. Oh man. Am I doing the math wrong (again)? Talk is cheap, show me the commands.
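His arithmetic checks out, by the way. A quick back-of-the-envelope sketch at the observed rate (decimal units; the real time depends on replication and block layout):

```shell
# 40GB per 10 minutes observed -> how long for 7PB?
TOTAL_GB=$((7 * 1000 * 1000))               # 7PB expressed in GB
GB_PER_MIN=4                                # 40GB / 10 minutes
DAYS=$(( TOTAL_GB / GB_PER_MIN / 60 / 24 ))
echo "${DAYS} days"                         # 1215 days, a bit over 3 years
```

For reference, the Mover itself is started with `hdfs mover -p <path>`, and DataNode-side throttles such as `dfs.datanode.balance.bandwidthPerSec` and `dfs.datanode.balance.max.concurrent.moves` usually gate how fast it can go.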

Read More

DistCp gets stuck with build listing

One elephant keeper tells me his DistCp between two HDFS clusters A and B works when pushing from A, but fails when pulling from B. Both A and B are secure clusters with Kerberos enabled. The DistCp just gets stuck forever. The console log of the DistCp process ends as follows:

INFO tools.DistCp: Input Options: DistCpOptions ooxx
INFO client.AHSProxy: Connecting to Application History server at ooxx
INFO hdfs.DFSClient: Created HDFS_DELEGATION_TOKEN token 256 for oo at xx
INFO security.TokenCache: Got dt for hdfs://clusterA:8020; Kind: HDFS_DELEGATION_TOKEN, Service: ...
INFO tools.SimpleCopyListing: Paths (files+dirs) cnt = 1; dirCnt = 0
INFO tools.SimpleCopyListing: Build file listing completed.
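If the hang turns out to be delegation-token handling between the two secure clusters, one common workaround is to exclude the remote cluster from token renewal. A sketch, with illustrative cluster names and paths:

```shell
# Run on cluster A, pulling from secure cluster B; the exclude property
# stops the job from trying to renew B's delegation token through A's
# ResourceManager, which can hang across Kerberos realms.
hadoop distcp \
  -Dmapreduce.job.hdfs-servers.token-renewal.exclude=clusterB \
  hdfs://clusterB:8020/data/src \
  hdfs://clusterA:8020/data/dst
```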

Read More

Five infrequently known HDFS commands

This is a blog post I wrote last year for Hortonworks Community Connection. It talks about five infrequently known commands for debugging HDFS cluster issues. Some of them are forgotten even by HDFS experts. The five commands are debug, triggerBlockReport, verify, metasave and getconf.
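For a taste, here is what each looks like on the command line (hosts, ports and paths are placeholders; run against your own cluster):

```shell
hdfs debug recoverLease -path /path/to/stuck/file       # force lease recovery on a file
hdfs dfsadmin -triggerBlockReport datanode-host:9867    # ask a DN (by IPC port) to report now
hdfs debug verify -meta /path/to/blk_123.meta           # verify a block's metadata file
hdfs dfsadmin -metasave nn-metasave.txt                 # dump NN internal state to a file
hdfs getconf -namenodes                                 # list the NameNode hosts
```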

Read More

Using S3Guard for Amazon S3 consistency

Today Steve, Rajesh and I co-published a blog post about the challenges that the Amazon S3 consistency model poses to Hadoop applications. Many Hadoop applications running in Amazon Web Services have been using S3 as the direct destination of work. The fact that the API gives it the appearance of a filesystem means that people try to use it to replace HDFS as the destination of Hive, Spark and MapReduce queries. This appears to work, albeit slowly, but it is insidiously dangerous, because S3’s eventual consistency model does not satisfy the requirements of a filesystem. In the blog, we introduce S3Guard and explain how it helps eliminate these consistency problems.
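As a hedged sketch of what enabling it looks like (the bucket and table names here are made up), S3Guard binds a DynamoDB table to an s3a bucket so that listings come from the consistent metadata store rather than raw S3:

```shell
# Create/bind a DynamoDB table as the metadata store for the bucket,
# then route s3a metadata operations through it.
hadoop s3guard init -meta dynamodb://s3guard-table s3a://my-bucket/
hadoop fs \
  -Dfs.s3a.metadatastore.impl=org.apache.hadoop.fs.s3a.s3guard.DynamoDBMetadataStore \
  -ls s3a://my-bucket/
```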

Read More

My NameNode is vulnerable to too many clients

An elephant keeper told me he was concerned about his NameNode server because some users may abuse it. To keep the NameNode from suffering under too many, too hasty clients, we can check the following action list. If you have done all of it, your NameNode should be more reliable; if it still isn’t, you will at least have more context for root cause analysis. This also addresses common potential performance problems. However, if your cluster is small or idle, don’t bother.
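Two items from that list can be sketched right here (the port numbers are illustrative; the config keys are the standard ones):

```shell
# 1. FairCallQueue on the client RPC port, so one heavy user cannot
#    monopolize the NameNode's RPC handlers:
#      ipc.8020.callqueue.impl = org.apache.hadoop.ipc.FairCallQueue
# 2. A separate service RPC port, so DataNode heartbeats and ZKFC calls
#    do not queue behind client traffic:
#      dfs.namenode.servicerpc-address = nn-host:8021
# Check what the cluster currently has configured:
hdfs getconf -confKey dfs.namenode.servicerpc-address
```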

Read More

What I talk about when I talk about NameNode JMX

So I asked an elephant keeper, for a too-many-under-replicated-blocks problem, to please check the NameNode status via JMX. He found the JMX output too verbose, and he was not sure which JMX metrics were the most important. I remember the naive old days when I had just started working on HDFS and troubleshooting NameNode issues; I also wondered which JMX metrics were major or general. I don’t keep my list secret, so here it is.
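To give the flavor of it, one way to cut the verbosity is to query a single MBean and pick out a few metrics (the host is a placeholder; 50070 is the Hadoop 2 default NameNode HTTP port):

```shell
# The qry parameter narrows the JMX servlet output to one MBean;
# grep then keeps a handful of commonly watched FSNamesystem metrics.
curl -s 'http://namenode-host:50070/jmx?qry=Hadoop:service=NameNode,name=FSNamesystem' \
  | grep -E 'UnderReplicatedBlocks|MissingBlocks|CapacityRemaining'
```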

Read More

My NameNodes are failing over too frequently

An elephant keeper tells me his HDFS NameNodes are failing over too frequently. He’s concerned about this because it’s a sign of something wrong in the High Availability (HA) cluster.

If you don’t want failover to happen that fast, I told him, you can simply increase the relevant timeout settings to whatever you want. Joking aside, that can help mitigate the symptom, but it is a temporary fix rather than a way to address the root cause. There are several common causes of NameNode stalls and failovers, and the first task is to find evidence of them in the NameNode logs.
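For example, long JVM pauses (often garbage collection) are a classic cause, and they leave a clear trail. A sketch, assuming a typical log location:

```shell
# JvmPauseMonitor logs whenever the NameNode's JVM appears to have stalled;
# a run of these right before a failover is strong evidence.
grep 'Detected pause in JVM or host machine' \
  /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log | tail -5
```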

Read More

My HDFS balancer is slow

An elephant keeper tells me his HDFS balancer is slow and he can’t sleep well at night. He asks me if I can help speed it up.

OK, by design the HDFS balancer runs slowly in the background, balancing the whole cluster periodically. It’s fine for it to be slow, I tell him, so that it does not affect normal cluster activities. Your users submit jobs, copy data in and out, and operate the cluster for fun, without knowing that a balancer is running in the meantime. So go to sleep and sleep well. Don’t worry about the slow balancer.
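That said, if he really must finish a round faster, there are a couple of knobs. A hedged sketch (the values are illustrative, not recommendations):

```shell
# The bandwidth override takes effect on DataNodes without a restart;
# a tighter threshold means more blocks qualify for moving.
hdfs dfsadmin -setBalancerBandwidth 104857600   # 100MB/s per DataNode
hdfs balancer -threshold 5                      # balance to within 5% of average utilization
```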

Read More

My Standby NameNode hangs from time to time

One elephant keeper asked me whether he should be concerned that his standby NameNode hangs occasionally, for 10 to 30 seconds at a time. Sometimes he found it unresponsive to block reports, failover requests, or other operations; fortunately the standby NameNode was able to recover afterwards. There may also be other short hangs he was not aware of.
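One place to start is the standby’s own log: long FSNamesystem lock holds and JVM pauses both get logged, and either can explain a multi-second hang. A sketch, assuming a typical log path:

```shell
# Look for long lock-hold reports and JVM stall warnings on the standby.
grep -E 'write lock held|Detected pause in JVM or host machine' \
  /var/log/hadoop/hdfs/hadoop-hdfs-namenode-*.log | tail -10
```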

Read More

DistCp to Amazon S3 reports FileNotFoundException

An elephant keeper told me that he was trying to copy data from his HDFS to S3 and he saw quite a few FileNotFoundExceptions. However, when he checked the failing files immediately in the Amazon S3 web console, he was able to see them in the S3 bucket. I then kindly asked him one question: did you use the -p option in your DistCp command line? He said, yes, ’cause he did not want to lose the file metadata, and he thought it good practice to preserve file attributes when copying files.
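The catch is that the s3a connector has no real HDFS permissions or replication to preserve, and the attribute-preservation pass that -p triggers after each copy is likely what trips over S3’s (then eventually consistent) listings. The simple fix is to drop -p when the destination is S3; a sketch with made-up paths:

```shell
# Without -p, DistCp skips the post-copy attribute updates that were
# raising FileNotFoundException against the eventually consistent store.
hadoop distcp hdfs://namenode:8020/backup/photos s3a://my-bucket/backup/photos
```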

Read More