Presentations
- Hadoop and HBase on the Cloud: A Case Study on Performance and Isolation.
- Apache HDFS: Distributed Storage for Vast Quantities of Data. (Podcast)
- HDFS Design Principles and the Scale-out-Ability of Distributed Storage.
- Apache Hadoop 0.22 and Other Versions.
- Automatic-Hot HA for HDFS NameNode.
- Hadoop Gateway: Cluster Virtualization Framework.
- Distributed Computing with Apache Hadoop. Introduction to MapReduce.
- Distributed Computing with Apache Hadoop. Technology Overview.
- Scaling Storage and Computation with Hadoop. (Video, in Russian)
HDFS texts
- Scalability of the Hadoop Distributed File System.
- Scaling Hadoop to 4000 nodes at Yahoo!
- The Hadoop Distributed File System requirements.
Favorite Issues
- Hadoop release 0.22.0 available.
- Warm HA NameNode going Hot. HDFS-2064.
- Stress Test and Live Data Verification (S-Live) design. HDFS-708.
- Sequential generation of block ids. HDFS-898.
- Appending to an HDFS file. HDFS-265.
- BackupNode maintains the up-to-date state of the namespace by receiving edits from the NameNode. HADOOP-4539.
- DFSIO - a MapReduce-based benchmark that measures the performance of writes, appends, and sequential and random reads. MAPREDUCE-4651, HDFS-663, HADOOP-193.
- Slot utilization measures the actual job load on a map-reduce cluster and characterizes the overall cluster productivity. The utilization is measured by analysing job history logs. HDFS-459.
- File size distribution analysis. HDFS-461.
- Quadruple memory size reduction for the name-node by redesigning memory data structures (HADOOP-1687) and removing checksum files from the name-node (HADOOP-1134).
- Distributed cluster upgrade framework. HADOOP-1286.
- Faster cluster startup. HADOOP-3022.
- File system snapshots. A snapshot of the previous state of the file system is taken during software upgrades in order to avoid data loss caused by software bugs or administrator mistakes. HADOOP-702.
- NNThroughputBenchmark - a pure name-node benchmark. HADOOP-2149, HADOOP-3860.
- Chain reaction caused by simultaneous failure of a few DataNodes. HADOOP-572.
- Safe mode is a read-only state of the name-node. HADOOP-306.
- Integrity of HDFS cluster components. HADOOP-124.
