Managing Hadoop and Accumulo with Clojure
The role of the data engineer becomes more important as Hadoop clusters grow and multiply in the enterprise. The scripting languages currently used to manipulate data, files, and computer systems, such as Bash, Python, and Ruby, do not integrate easily with the Java-based Hadoop ecosystem, perform poorly, and do not scale easily. Clojure offers data engineers a better way to manage data sets, both within a single cluster and across multiple clusters, that is natural, composable, and concurrent. Because Clojure is homoiconic (its code is represented as data), it is a good fit for data manipulation in the cloud. The interactive development environment of the Clojure REPL can be turned into a distributed command line for efficiently manipulating HDFS, Accumulo, and other Hadoop software components. Demonstrations will show how to interact with data in HDFS, Accumulo, and legacy RDBMSs using Clojure.