Hadoop Tutorial

Hadoop Tutorial Hadoop MapReduce is a software framework for writing applications that process vast amounts of data (multi-petabyte datasets) in parallel on large clusters consisting of thousands of nodes of commodity hardware in a reliable, fault-tolerant manner. via Overview.

Hadoop Tutorial – YDN

Introduction Welcome to the Yahoo! Hadoop tutorial! This series of tutorial documents will walk you through many aspects of the Apache Hadoop system. You will be shown how to set up simple and advanced cluster configurations, use the distributed file system, and develop complex Hadoop MapReduce applications. Other related systems are also reviewed. via Hadoop …

More

Disco MapReduce

Disco is a lightweight, open-source framework for distributed computing based on the MapReduce paradigm. Disco is powerful and easy to use, thanks to Python. Disco distributes and replicates your data, and schedules your jobs efficiently. Disco even includes the tools you need to index billions of data points and query them in real-time. Disco was …

More