Apache Hadoop is a software framework for running applications on large clusters built of commodity hardware. Hadoop provides a distributed file-system and a parallel processing framework based on the Map-Reduce programming paradigm. I've contributed to Apache Hadoop full-time since the inception of the project in early 2006, over 5 years now. I am a long-term Hadoop Committer and a member of the Apache Hadoop Project Management Committee.
I am a Founder and Architect of Hortonworks Inc., a software company that is helping to accelerate the development and adoption of Apache Hadoop. Hortonworks was formed in June 2011 by the key architects and core Hadoop committers from the Yahoo! Hadoop software engineering team. Funded by Yahoo! and Benchmark Capital, one of the preeminent technology investors, our goal is to ensure that Apache Hadoop becomes the standard platform for storing, processing, managing and analyzing big data.
Previously, I was the architect and lead of the Yahoo Hadoop Map-Reduce development team and was ultimately responsible, technically, for providing Hadoop Map-Reduce as a service for all of Yahoo - currently running on nearly 50,000 machines!
I've been a part of the Yahoo! Hadoop team from the very, very early days - 5 years and counting. I am really proud of what we have accomplished (we've gone from a single 20-node cluster to lots of clusters with ~4000 nodes each!), and excited to keep pushing the envelope further.
My responsibilities include architecture, planning, and interfacing with everyone connected to Hadoop (platforms like Pig, QA, Operations, and the Program/Product & Solutions teams), as well as capacity planning for the Yahoo Hadoop grids, i.e. hardware and queue/user management.
I'm responsible for every bit of Hadoop Map-Reduce code and configuration which hits any of the 50,000 machines at Yahoo. I manage to eke out the time to hack on Hadoop too! It's a heady life!
I'm currently leading the effort to build the Next Generation Hadoop Map-Reduce framework.
I jointly hold the 2009 world data-sorting records, set using Hadoop Map-Reduce: http://sortbenchmark.org/.
I've contributed to several open-source projects: Apache Hadoop, PHP (APC), Apache Pig, etc.
I love challenging myself, being around smart people and learning from them. I've been in love with programming since my early schoolboy days and can't think of anything better to do. I really enjoy the process of going from a blank slate to a full-fledged system which accomplishes tangible, useful stuff.