Ahmed Radwan

a.aboelela@gmail.com


INTERESTS

My interests span cloud computing, data and metadata management, semantics, and data integration, with an emphasis on applying advances in these areas to build solutions that are useful to customers.


 

WORK EXPERIENCE

Cloudera Inc., Nov. 2010 – present

Palo Alto, CA

Senior R&D Engineer – Platforms

  1. Designed and developed solutions for efficient transfer and processing of massive amounts of structured and unstructured data in distributed/cloud computing environments. This work involved devising efficient techniques for data extraction and loading using the optimized import/export interfaces supported by different databases and enterprise data warehouses. Other challenges included metadata management and data integration across such autonomous distributed systems, and analyzing and optimizing the performance and scalability of these solutions. Also investigated techniques for better resource utilization by MapReduce jobs through resource management and scheduling strategies.
  2. Contributed to various open-source projects, including Apache Hadoop Common, MapReduce and YARN, Apache Sqoop, and Apache Flume.
  3. Apache Sqoop committer and member of the Project Management Committee (PMC). Sqoop is an open-source tool designed for efficiently transferring bulk data between Hadoop and structured datastores such as relational databases.
  4. Apache Flume committer and PMC member. Flume is an open-source distributed, reliable, and available service for efficiently collecting, aggregating, and moving large amounts of log data.

Yahoo! Inc., Nov. 2008 – Nov. 2010

Sunnyvale, CA

Senior Software Engineer – Cloud Computing

  1. Built solutions for managing massive amounts of structured and unstructured data in distributed/cloud computing environments. Designed efficient models and techniques for metadata management, data processing, and performance optimization in the cloud.
  2. Conceived, designed, and led the development of MapReduce-Legos, a data processing abstraction layer on top of Hadoop MapReduce. This layer provides a refined model for MapReduce jobs, enabling an optimized way of describing and running Extract, Transform, and Load (ETL) workflows on Hadoop MapReduce clusters. The project is used by production systems to process petabytes of data on a daily basis at Yahoo! Inc. The work was also presented and demoed at a number of internal Yahoo! conferences, and a peer-reviewed article describing it was published in the International Journal of Cloud Computing.
  3. Designed and developed a declarative SQL query engine on top of Hadoop MapReduce and the distributed file system.

IBM Research, May – Aug. 2006, May – Aug. 2007

San Jose, CA

Graduate Research Intern – Information Integration (IBM Almaden Research Center)

  1. Designed a novel similarity measure and top-k enumeration algorithm that quantify the distance between schema concepts in the schema integration problem and efficiently calculate the best “k” candidate integrated schemas. This work is important for metadata management in cloud computing systems because it facilitates federating data from multiple autonomous data sources and generating a unified, non-redundant representation of the data. This work led to the publication of an article at the ACM SIGMOD conference.
  2. Studied the problem of expressing Extract, Transform, and Load (ETL) dataflows using declarative mapping semantics, and vice versa. This work was adopted by IBM and is being productized as the FastTrack component of IBM Information Server. These contributions led to the submission of a patent disclosure describing this work. The work was also demoed at the IBM Information On Demand (IOD) conference, and an article detailing it was published at IEEE ICDE.

 

University of Miami, Jan. 2005 – Oct. 2008

Miami, FL

Research/Teaching Assistant – Electrical and Computer Engineering

  1. Conducted research on information integration in a grid environment with applications in bioinformatics. Designed a web services-based data federation architecture for bioinformatics applications. The system is called “BioFederator” and was awarded the prestigious IBM Faculty Award in 2009. Through collaborations with bioinformatics researchers, several domain-specific data federation challenges and needs were identified. The BioFederator addresses these challenges with an architecture that incorporates a series of utility services, which handle issues such as automatic workflow composition, domain semantics, and the distributed nature of the data. It also incorporates a series of data-oriented services that facilitate the actual integration of data. The BioFederator is deployed in a grid environment over the web. An article describing the design was published at the AAAI IIWeb workshop; additional details and applications were presented in a book chapter published by IOS Press.
  2. Studied and developed novel data integration and processing techniques for data-intensive applications. This work was applied to a bioscience study in which, for the first time, we presented a whole-genome prediction of nucleosome exclusion regions for the human genome. The details of this work were published in the BMC Genomics journal in 2008 and featured at the 57th Annual Meeting of the American Society of Human Genetics (ASHG). The output results were also made available to the scientific community as part of the University of California at Santa Cruz (UCSC) Genome Browser custom data tracks.
  3. Research team member of the Latin American Grid (LAGrid-BioGrid) project. BioGrid addresses research issues in enabling grid computing technologies for bioinformatics applications. My work focused on studying, designing, and developing Grid/Web services for bioinformatics applications.

Electronics Research Institute (ERI), Feb. 1999 – Aug. 2004

Cairo, Egypt

Researcher – Electrical and Computer Engineering

  1. Member of the Parallel Processing Team and team leader for a number of projects sponsored by the U.S. National Science Foundation (NSF) and the European Union.
  2. Designed parallel/distributed texture segmentation techniques that were applied to real-time distributed surface inspection systems. Developed simple yet powerful parallel algorithms that advanced real-time distributed inspection systems. These contributions led to an article at the IEEE SMC conference and a more detailed article in Elsevier's Pattern Recognition Letters (PRL) journal.

Mentor Graphics Corporation, Jul. 2001 – Jul. 2002

Cairo, Egypt

R&D Engineer – Modeling and Interconnectix (ICX)

  1. Studied problems in Electronic Design Automation (EDA) and designed and developed EDA tools and simulation packages. These tools were used in the IBIS-to-SPICE converter to generate SPICE models from IBIS data sheet files. My studies and designs improved the modeling process in terms of time and accuracy.


PUBLICATIONS

Peer-reviewed Book Chapters:

  1. Rosa Badia, Gargi Dasgupta, Onyeka Ezenwoye, Liana Fong, Howard Ho, Sawsan Khuri, Yanbin Liu, Steve Luis, Anthony Praino, Jean-Pierre Prost, Ahmed Radwan, Seyed Masoud Sadjadi, Shivkumar Shivaji, Balaji Viswanathan, Patrick Welsh, and Akmal Younis, "Innovative Grid Technologies Applied to Bioinformatics and Hurricane Mitigation," chapter in High Performance Computing and Grids in Action, pp. 436-462, IOS Press, ISBN 978-1-58603-839-7, Amsterdam, 2008.

Peer-reviewed Articles in Journals:

  1. Ahmed Radwan, Akmal Younis, Santhosh Srinivasan and Abhay Gupta, “MR-LEGOS: A Refined MapReduce Model,” International Journal of Cloud Computing (IJCC) 1(1), 2011, pp. 58-80.
  2. Ahmed Radwan, Akmal Younis, Peter Luykx and Sawsan Khuri, "Prediction and analysis of nucleosome exclusion regions in the human genome," BMC Genomics, 9:186, 2008.
  3. Ahmed Abouelela Radwan, Hazem M. Abbas, Hesham Eldeeb, Abdelmonem A. Wahdan and Salwa M. Nassar, "Automated Vision System for Localizing Structural Defects in Textile Fabrics," Elsevier Pattern Recognition Letters, 26, 2005, pp. 1435-1443.

Peer-reviewed Articles in Conferences:

  1. Ahmed Radwan, Lucian Popa, Ioana Roxana Stanoi, Akmal A. Younis, “Top-k generation of integrated schemas based on directed and weighted correspondences,” ACM SIGMOD Conference, 2009, pp. 641-654.
  2. Stefan Dessloch, Mauricio A. Hernandez, Ryan Wisnesky, Ahmed Radwan, Jindan Zhou, “Orchid: Integrating Schema Mapping and ETL,” IEEE International Conference on Data Engineering (ICDE), 2008, pp. 1307-1316.
  3. Ahmed Radwan, Akmal Younis, Mauricio Hernandez, Howard Ho, Lucian Popa, Shivkumar Shivaji, and Sawsan Khuri, "BioFederator: A Data Federation System for Bioinformatics on the Web," Proc. AAAI Sixth Int. Workshop on Information Integration on the Web (IIWeb) 2007, pp. 92-97.
  4. A. Abouelela Radwan, H. Abbas, H. El deeb, S. Nassar, "A statistical approach for textile fault detection," Proc. IEEE International Conference on Systems, Man, and Cybernetics (SMC), 2000, pp. 2857-2861.

Presentations, Abstracts and Posters:

  1. Ahmed Radwan, Santhosh Srinivasan and Kalyan Ayloo, “MR-LEGOS: A Data Warehousing ETL Toolkit,” Yahoo! TechPulse conference, 2010.
  2. Ahmed Radwan, Ryota Egashira, Brian Keefe, “A MapReduce Approach for Efficient Data Extraction from Database Management Systems,” Yahoo! TechPulse conference, 2010.
  3. Ahmed Radwan and Abhay Gupta, “Lotus MapReduce Legos,” Yahoo! TechPulse conference, 2009.
  4. Sawsan Khuri, Ahmed Radwan, Peter Luykx and Akmal Younis, “Nucleosome Exclusion Regions across the Human Genome,” American Society of Human Genetics (ASHG) 57th Annual Meeting, San Diego, California, 23-27 October 2007.
  5. Ahmed Radwan, Lucian Popa and Ioana R. Stanoi, “Calculating Confidences and A Cost Function for Ranking Schema Integration Alternatives,” IBM Almaden Research Center Intern Showcase, 2007.
  6. Ahmed M. Radwan, Ryan Wisnesky, Jindan Zhou, Didier Garcia, Bo Shao, Stefan Dessloch, Mauricio A. Hernandez, Lucian Popa and Howard Ho, “Orchid: ETL ‹–› Mapping Transformation with Clio,” IBM Almaden Research Center Intern Showcase, 2006.


EDUCATION

• Doctor of Philosophy (Ph.D.) in Electrical and Computer Engineering.

Thesis title: Information Integration in a Grid Environment - Applications in the Bioinformatics Domain. University of Miami, U.S.A., Dec. 2010, GPA: 4.0.

• Master of Science (MS) in Electrical and Computer Engineering.

Thesis title: Image processing - Statistical Approach for Texture segmentation - An implementation on a parallel inspection system. Ain Shams University, Cairo, Egypt, Aug. 2002.

• Bachelor of Science (BS) in Electrical and Computer Engineering.

Ain Shams University, Cairo, Egypt, 1998. Graduation Project: V-CAD: An FPGA-based Design Flow (Grade: Distinction). An electronic design automation tool comprising a schematic capture tool, a VHDL netlister, an automatic test pattern generator, and a PLA synthesis tool. The tool was developed using Visual C++ and was featured at the Design Automation & Test in Europe (DATE) conference in 1999.


HONORS

- Member of the Eta Kappa Nu “HKN” International Honor Society for Electrical Engineers (2006 – Present).

- Member of the Institute of Electrical and Electronics Engineers “IEEE” (2008 – Present).

- Member of the Association for Computing Machinery “ACM” (2009 – Present).

- My BioFederator research work was awarded the prestigious IBM Faculty Award in 2009.


SERVICE TO PROFESSION

• Conference Reviewer: VLDB 2007 – The 33rd Very Large Data Bases Conference.

• Conference Reviewer: ICDE 2008 – The IEEE 24th International Conference on Data Engineering.

• Conference Reviewer: ICMT 2010 – International Conference on Model Transformation.


TECHNICAL SKILLS

• Programming using Java, C/C++, Visual C++, Pascal, Prolog, x86 assembly and network programming using sockets.

• Familiar with the following programming environments: MS Win95/2000/XP, Solaris UNIX, Red Hat Linux, PARIX (Parallel UNIX), and PVM (Parallel Virtual Machine).

• Special purpose languages: VHDL, JavaCC, Lex & Yacc, SQL, XQuery, SPARQL, PHP, JSP, and MATLAB.

• ETL and data warehousing tools (IBM DataStage).

• Eclipse, Rational Rose, UML, EMF, Apache Axis, web services, HTML, XML, RDF and OWL semantic web technologies, Hadoop MapReduce, Pig Latin.