One elephant keeper told me he needed to move 7 PB of data into archival storage and ran into performance issues with the Mover tool. It was a genuinely good question, because he's using tiered storage with the full set of policies: HOT, WARM, COLD, All_SSD, One_SSD, Lazy_Persist. I made the old balancer joke again: your Mover is supposed to run slowly in the background, so take it easy. Per the doc:
It periodically scans the files in HDFS to check if the block placement satisfies the storage policy. For the blocks violating the storage policy, it moves the replicas to a different storage type in order to fulfill the storage policy requirement. Note that it always tries to move block replicas within the same node whenever possible,
Haha, he agreed it's a nice one. However, he wasn't enjoying the humor, because he had to move the backup data to archival storage ASAP so that the SSDs could be freed up for production. It took around 10 minutes to move a 40 GB file. At that rate, moving 7 PB of data would take more than 3 years. Oh man. Am I doing the math wrong (again)? Talk is cheap, show me the commands.
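Before the commands, a quick sanity check on that estimate. This is only a back-of-the-envelope sketch: it assumes the Mover keeps the same throughput as that single 40 GB observation and uses decimal units (1 PB = 1,000,000 GB). The function and variable names are made up for illustration.

```python
# Rough estimate only: assumes a constant rate taken from one observation
# and decimal units (1 PB = 1,000,000 GB).
GB_PER_PB = 1_000_000


def estimate_days(total_pb: float, sample_gb: float, sample_minutes: float) -> float:
    """Days needed to move total_pb at the rate observed on the sample."""
    rate_gb_per_min = sample_gb / sample_minutes          # 40 GB / 10 min = 4 GB/min
    total_minutes = total_pb * GB_PER_PB / rate_gb_per_min
    return total_minutes / (60 * 24)                      # minutes -> days


days = estimate_days(total_pb=7, sample_gb=40, sample_minutes=10)
print(f"{days:.0f} days, about {days / 365:.1f} years")
# prints "1215 days, about 3.3 years"
```

With binary units (1 PB = 1024 × 1024 GB) the figure comes out closer to 3.5 years, so either way the math holds: three-plus years.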