Big Data: Principles and best practices of scalable realtime data systems

By Nathan Marz, James Warren

List Price: $49.99
Price: $34.83

Availability: Usually ships in 24 hours
Ships from and sold by Amazon.com

72 new or used available from $24.32

Average customer review: (34 customer reviews)

Product Description

Summary

Big Data teaches you to build big data systems using an architecture that takes advantage of clustered hardware along with new tools designed specifically to capture and analyze web-scale data. It describes a scalable, easy-to-understand approach to big data systems that can be built and run by a small team. Following a realistic example, this book guides readers through the theory of big data systems, how to implement them in practice, and how to deploy and operate them once they're built.

Purchase of the print book includes a free eBook in PDF, Kindle, and ePub formats from Manning Publications.

About the Book

Web-scale applications like social networks, real-time analytics, or e-commerce sites deal with a lot of data, whose volume and velocity exceed the limits of traditional database systems. These applications require architectures built around clusters of machines to store and process data of any size or speed. Fortunately, scale and simplicity are not mutually exclusive.

Big Data teaches you to build big data systems using an architecture designed specifically to capture and analyze web-scale data. This book presents the Lambda Architecture, a scalable, easy-to-understand approach that can be built and run by a small team. You'll explore the theory of big data systems and how to implement them in practice. In addition to discovering a general framework for processing big data, you'll learn specific technologies like Hadoop, Storm, and NoSQL databases.

This book requires no previous exposure to large-scale data analysis or NoSQL tools. Familiarity with traditional databases is helpful.

What's Inside

  • Introduction to big data systems
  • Real-time processing of web-scale data
  • Tools like Hadoop, Cassandra, and Storm
  • Extensions to traditional database skills

About the Authors

Nathan Marz is the creator of Apache Storm and the originator of the Lambda Architecture for big data systems. James Warren is an analytics architect with a background in machine learning and scientific computing.

Table of Contents

  1. A new paradigm for Big Data

  PART 1 BATCH LAYER
  2. Data model for Big Data
  3. Data model for Big Data: Illustration
  4. Data storage on the batch layer
  5. Data storage on the batch layer: Illustration
  6. Batch layer
  7. Batch layer: Illustration
  8. An example batch layer: Architecture and algorithms
  9. An example batch layer: Implementation

  PART 2 SERVING LAYER
  10. Serving layer
  11. Serving layer: Illustration

  PART 3 SPEED LAYER
  12. Realtime views
  13. Realtime views: Illustration
  14. Queuing and stream processing
  15. Queuing and stream processing: Illustration
  16. Micro-batch stream processing
  17. Micro-batch stream processing: Illustration
  18. Lambda Architecture in depth


Product Details

  • Amazon Sales Rank: #148205 in Books
  • Published on: 2015-05-10
  • Original language: English
  • Number of items: 1
  • Dimensions: 9.10" h x 0.60" w x 7.30" l
  • Binding: Paperback
  • 328 pages

Editorial Reviews

About the Author

Nathan Marz is currently working on a new startup. Previously, he was the lead engineer at BackType before it was acquired by Twitter in 2011. At Twitter, he started the streaming compute team, which develops and provides shared infrastructure to support many critical realtime applications throughout the company. Nathan is the creator of Cascalog and Storm, open-source projects relied upon by over 50 companies around the world, including Yahoo!, Twitter, Groupon, The Weather Channel, Taobao, and many more.

James Warren is an analytics architect at Storm8 with a background in big data processing, machine learning and scientific computing.


Customer Reviews

Most helpful customer reviews

4 of 4 people found the following review helpful.
5 stars: Written by a specialist
By Dimitri K
This book is written by a specialist in big data. I know that because I have worked on a big data pipeline myself, and reading the book I see that all of my problems are addressed in it. Virtually every problem discussed appeared in my pipeline too, as if the author had worked with me on my project.

Another feature I found very useful is that this is the first book where I could find a concise explanation of the Storm Trident framework, even though the book is not about Storm.

3 of 3 people found the following review helpful.
5 stars: If you are looking for a survey of different approaches ...
By Amazon Customer
If you are looking for a survey of different approaches to handling big data, you want to read "ELEMENTS OF SCALE: COMPOSING AND SCALING DATA PLATFORMS". ([...]) This book is dedicated to the Lambda Architecture (one of the approaches surveyed in that article).

The book is very well organized. The introduction in chapter 1 serves as a road map for the whole book. Starting from a simple web application based on an RDBMS, the author shows how the approaches used to scale it become undesirable. After enumerating a list of desired properties, he proposes the Lambda Architecture as an alternative to a fully incremental architecture (such as one built on an RDBMS).

The Lambda architecture is partitioned into three layers:
1. batch layer that computes different views on big data
2. serving layer that answers user queries using views from the batch layer and speed layer.
3. speed layer that compensates for the batch layer's latency, providing approximate answers for recent data while the batch layer is still computing the complete answers.
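The three-layer split described above can be sketched in a few lines of Python. This is a toy, in-memory illustration under stated assumptions, not code from the book: the `PageView` event type, the `batch_view`, `RealtimeView`, and `query` names, and the pageview-count use case are all hypothetical. The batch view is recomputed from scratch over every event before a cutoff timestamp, the realtime view incrementally counts only events at or after that cutoff, and a query merges the two.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PageView:
    # Hypothetical event schema: one immutable record per pageview.
    url: str
    timestamp: int


def batch_view(master_dataset, cutoff):
    """Batch layer (sketch): recompute counts from scratch over all events before `cutoff`."""
    return Counter(e.url for e in master_dataset if e.timestamp < cutoff)


class RealtimeView:
    """Speed layer (sketch): incrementally count only events the batch view has not covered."""

    def __init__(self, cutoff):
        self.cutoff = cutoff
        self.counts = Counter()

    def update(self, event):
        # Events before the cutoff are already reflected in the batch view.
        if event.timestamp >= self.cutoff:
            self.counts[event.url] += 1


def query(url, batch, realtime):
    """Query time (sketch): merge the batch view and the realtime view."""
    # Counter returns 0 for missing keys, so unseen URLs count as zero.
    return batch[url] + realtime.counts[url]


if __name__ == "__main__":
    events = [PageView("/a", 1), PageView("/a", 2), PageView("/b", 3), PageView("/a", 11)]
    cutoff = 10  # hypothetical: events with timestamp < cutoff are covered by the last batch run
    batch = batch_view(events, cutoff)
    realtime = RealtimeView(cutoff)
    for event in events:
        realtime.update(event)
    print(query("/a", batch, realtime))  # 3: two from the batch view, one from the realtime view
```

In a real deployment the batch and speed layers would be separate systems (the book covers tools such as Hadoop and Storm for them), and the realtime state would be discarded once a new batch run absorbs those events; this sketch only models the merge logic.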

In the remaining chapters, the author dives deep into the rationale and requirements of all the different pieces of the Lambda Architecture.

To understand the context of the Lambda Architecture, also refer to Wikipedia for criticism of it.

1 of 1 people found the following review helpful.
3 stars: Lambda
By Robert
Good theoretical review of Big Data architecture. Not so great for implementation details using current frameworks.

