Home » Software Architecture

Cascading through Hadoop: A DSL for Simpler MapReduce

23 April 2012 No Comment

Hadoop is a MapReduce framework that has literally sprung into the vernacular of “big data” developers everywhere. But coding to the raw Hadoop APIs can be a real chore. Data analysts can express what they want in more English-like vocabularies, but it seems the Hadoop APIs require us to be the translator to a less comprehensible functional and data-centric DSL.

The Cascading framework gives developers a convenient higher level abstraction for querying and scheduling complex jobs on a Hadoop cluster. Programmers can think more holistically about the questions being asked of the data and the flow that such data will take without concern for the minutia.

This video discusses how to set up, code to, and leverage the Cascading API on top of a Hadoop cluster for a more effective way to code MapReduce applications all while being able to think in a more natural (less than fully MapReduce) way. During this presentation, we’ll also explore Cascading’s Clojure-based derivative, Cascalog, and how functional programming paradigms and language syntax are emerging as the next important step in big-data thinking and processing.

Video Producer: JavaZone Conference

Related Videos:

Comments:

Add your comment below, or trackback from your own site. You can also subscribe to these comments via RSS.

Be nice. Keep it clean. Stay on topic. No spam.

You can use these tags:
<a href="" title=""> <abbr title=""> <acronym title=""> <b> <blockquote cite=""> <cite> <code> <del datetime=""> <em> <i> <q cite=""> <strike> <strong>

This is a Gravatar-enabled weblog. To get your own globally-recognized-avatar, please register at Gravatar.

*