
Skills

  • Amazon
  • Amazon Web Services
  • Analysis
  • Analytics
  • Apache
  • Azure
  • C
  • C++
  • Cassandra
  • Clustering
  • Collaborative Filtering
  • Data Warehousing
  • ERwin
  • ETL
  • Hadoop


Services

  • BIG DATA

    $125/hr · Starting at $95 · Ongoing

    Dedicated Resource

    Richard D. Hagedorn


About

I am an individual, but I work through my corporation on a corp-to-corp (C2C) basis.

Senior Hadoop & Data Lake Architect

Certifications:
• MapR Administration
• Hortonworks 2.0 Architect certified
• Cloudera CDH 5.8 Architect/Developer for Apache Hadoop
• Hadoop/MapR certified
• Enterprise NoSQL
• IBM certified
• Teradata certified

Summary:
• 10+ years' experience with Hadoop, in both development and architecture, including 6+ years as an architect. My initial work was at Berkeley in 2003, on the initial release of Hadoop, followed by work with AWS (Amazon Web Services), Cloudera CDH 5.8, and Hortonworks/Azure.
• ETL processes, including scripting: 14+ years. ZooKeeper, HMaster, HBase, HFile.
• Apache tools: Flume (log files) 2 years, Oozie (workflow scheduling) 1+ year, Sqoop (data transfer) 3 years.
• Python (2.7, with SPSS Statistics 23): 7 years.
• Development tools: Spark (with performance tuning and caching) 2 years, HBase 7 years, Pig 4 years.
• Analysis: Drill (SQL) 2 years, Hive (HQL) 4 years, Mahout (clustering, classification, collaborative filtering) 6 months.
• Performance tuning with Cassandra; additionally C, C++, and shell scripting.
• Extensive use of MDM tools and ERwin, as well as PowerDesigner and IBM's ER tool.
• Extensive work on Apache Hadoop, a highly scalable storage platform designed to process very large data sets across hundreds to thousands of computing nodes operating in parallel. Hadoop provides a cost-effective storage solution on commodity hardware for large data volumes with no format requirements.
• Extensive work with MapReduce, the programming paradigm at the heart of Hadoop that enables this massive scalability. The term MapReduce refers to two separate and distinct tasks that Hadoop programs perform: the map task and the reduce task (see the sketch below). Hadoop itself has two main components: HDFS and YARN.
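
To illustrate the map and reduce tasks mentioned above, here is a minimal word-count sketch, assuming Hadoop Streaming with Python; the file names mapper.py and reducer.py and the input/output paths are hypothetical examples, not part of this profile.

    # mapper.py (hypothetical example): emits one "word<TAB>1" pair per word of input.
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            print("%s\t%d" % (word, 1))

    # reducer.py (hypothetical example): sums the counts for each word.
    # Hadoop Streaming sorts the mapper output by key before the reduce
    # phase, so identical words arrive on consecutive lines.
    import sys

    current_word, current_count = None, 0
    for line in sys.stdin:
        line = line.rstrip("\n")
        if not line:
            continue
        word, count = line.split("\t", 1)
        if word == current_word:
            current_count += int(count)
        else:
            if current_word is not None:
                print("%s\t%d" % (current_word, current_count))
            current_word, current_count = word, int(count)
    if current_word is not None:
        print("%s\t%d" % (current_word, current_count))

Under Hadoop Streaming, these two scripts would typically be submitted with something like: hadoop jar hadoop-streaming-*.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input <hdfs-input> -output <hdfs-output> (the jar and HDFS paths are placeholders).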

Work Terms

I work either onsite or remotely. Payment terms are net 15. I am looking for long-term engagements.

Attachments