- Coordinating Metadata Replication: Survival Strategy for Distributed Systems.
- Apache Hadoop: Foundations of Scalability.
- Hadoop and HBase on the Cloud: A Case Study on Performance and Isolation.
- Apache HDFS: Distributed Storage for Vast Quantities of Data. (Podcast)
- HDFS Design Principles and the Scale-out-Ability of Distributed Storage.
- Apache Hadoop 0.22 and Other Versions.
- Automatic-Hot HA for HDFS NameNode.
- Hadoop Gateway: Cluster Virtualization Framework.
- Distributed Computing with Apache Hadoop. Introduction to MapReduce.
- Distributed Computing with Apache Hadoop. Technology Overview.
- Scaling Storage and Computation with Hadoop. (Video, in Russian)
- Scalability of the Hadoop Distributed File System.
- Scaling Hadoop to 4000 nodes at Yahoo!
- The Hadoop Distributed File System requirements.
- Coordinated replication of the namespace using ConsensusNode. HDFS-6469.
- Introduce Coordination Engine. HADOOP-10641.
- Hadoop release 0.22.0 available.
- Warm HA NameNode going Hot. HDFS-2064.
- Stress Test and Live Data Verification (S-Live) design. HDFS-708.
- Sequential generation of block ids. HDFS-898.
- Appending to an HDFS file. HDFS-265.
- BackupNode maintains the up-to-date state of the namespace by receiving edits from the NameNode. HADOOP-4539.
- DFSIO: a MapReduce-based benchmark that measures the performance of writes, appends, and sequential and random reads. MAPREDUCE-4651, HDFS-663, HADOOP-193.
- Slot utilization measures the actual job load on a map-reduce cluster and characterizes the overall cluster productivity. The utilization is measured by analyzing job history logs. HDFS-459.
- File size distribution analysis. HDFS-461.
- Fourfold reduction of name-node memory size by redesigning memory data structures (HADOOP-1687) and removing checksum files from the name-node (HADOOP-1134).
- Distributed cluster upgrade framework. HADOOP-1286.
- Faster cluster startup. HADOOP-3022.
- File system snapshots. A snapshot of the previous state of the file system is taken during software upgrades to avoid data loss caused by software bugs or administrators' mistakes. HADOOP-702.
- NNThroughputBenchmark - a pure name-node benchmark.
- Chain reaction caused by simultaneous failure of a few DataNodes.
- Safe mode is a read-only state of the name-node. HADOOP-306.
- Integrity of HDFS cluster components. HADOOP-124.