Hadoop in Practice, 2nd Edition
File size: 9.9 MB
Hadoop in Practice, Second Edition provides over 100 tested, instantly useful techniques that will help you conquer big data using Hadoop. This revised edition covers changes and new features in the Hadoop core architecture, including MapReduce 2. Brand new chapters cover YARN and integrating Kafka, Impala, and Spark SQL with Hadoop. You'll also get new and updated techniques for Flume, Sqoop...
Beginning Big Data with Power BI and Excel 2013
File size: 20.9 MB
In Beginning Big Data with Power BI and Excel 2013, you will learn to solve business problems by tapping the power of Microsoft's Excel and Power BI to import data from NoSQL and SQL databases and other sources, create relational data models, and analyze business problems through sophisticated dashboards and data-driven maps. While Beginning Big Data with Power BI and Excel 2013 covers prom...
Enterprise Data Workflows with Cascading
File size: 12.7 MB
There is an easier way to build Hadoop applications. With this hands-on book, you'll learn how to use Cascading, the open source abstraction framework for Hadoop that lets you easily create and manage powerful enterprise-grade data processing applications, without having to learn the intricacies of MapReduce. Working with sample apps based on Java and other JVM languages, you'll quickly le...
File size: 7.7 MB
You've heard the hype about Hadoop: it runs petabyte-scale data mining tasks insanely fast, it runs gigantic tasks on clouds for absurdly cheap, it's been heavily committed to by tech giants like IBM, Yahoo!, and the Apache Project, and it's completely open-source. But what exactly is it, and more importantly, how do you even get a Hadoop cluster up and running? From Apress, the name you'...
Fast Data Processing with Spark
File size: 11.0 MB
Spark is a framework for writing fast, distributed programs. Spark solves problems similar to those addressed by Hadoop MapReduce, but with a fast in-memory approach and a clean functional-style API. With its ability to integrate with Hadoop and built-in tools for interactive query analysis (Shark), large-scale graph processing and analysis (Bagel), and real-time analysis (Spark Streaming), it can be interactivel...
Programming Elastic MapReduce
File size: 19.2 MB
Although you don't need a large computing infrastructure to process massive amounts of data with Apache Hadoop, it can still be difficult to get started. This practical guide shows you how to quickly launch data analysis projects in the cloud by using Amazon Elastic MapReduce (EMR), the hosted Hadoop framework in Amazon Web Services (AWS). Authors Kevin Schmidt and Christopher Phillips demo...
Big Data Analytics with R and Hadoop
File size: 3.6 MB
Big data analytics is the process of examining large amounts of data of a variety of types to uncover hidden patterns, unknown correlations, and other useful information. Such information can provide competitive advantages over rival organizations and result in business benefits, such as more effective marketing and increased revenue. New methods of working with big data, such as Hadoop and MapRed...
Microsoft SQL Server 2012 with Hadoop
File size: 2.8 MB
With the explosion of data, open source Apache Hadoop is gaining traction, thanks to the huge ecosystem that has arisen around the core functionalities of its distributed file system (HDFS) and MapReduce. Having SQL Server talk to Hadoop has become increasingly important because the two are complementary. While petabytes of unstructured data can be...
Learning Apache Mahout
File size: 13.8 MB
In the past few years, the generation of data and our capability to store and process it have grown exponentially. There is a need for scalable analytics frameworks and people with the right skills to get the information needed from this Big Data. Apache Mahout is one of the first and most prominent Big Data machine learning platforms. It implements machine learning algorithms on top of distributed ...
Fast Data Processing with Spark, 2nd Edition
File size: 14.2 MB
Spark is a framework for writing fast, distributed programs. Spark solves problems similar to those addressed by Hadoop MapReduce, but with a fast in-memory approach and a clean functional-style API. With its ability to integrate with Hadoop and built-in tools for interactive query analysis (Spark SQL), large-scale graph processing and analysis (GraphX), and real-time analysis (Spark Streaming), it can be ...