Combine the incredible powers of Spark, Mesos, Akka, Cassandra, and Kafka to build data processing platforms that can take on even your toughest data problems!
* This highly practical guide shows you how to use the best of the big data technologies to solve your response-critical problems
* Learn the art of building cheap yet effective big data architectures without resorting to complex Greek-letter architectures such as Lambda or Kappa
* Use this easy-to-follow guide to build fast data processing systems for your organization
SMACK is an open source full stack for big data architecture. It is a combination of Spark, Mesos, Akka, Cassandra, and Kafka. This stack is the newest technique to tackle critical real-time analytics for big data. This highly practical guide will teach you how to integrate these technologies to create a highly efficient data analysis system for fast data processing.
We'll start off with an introduction to SMACK and show you when to use it. First you'll get to grips with functional thinking and problem solving using Scala. Next you'll come to understand the Akka architecture. Then you'll get to know how to improve the data structure architecture and optimize resources using Apache Spark.
Moving forward, you'll learn how to achieve linear scalability in databases with Apache Cassandra, and you'll grasp high-throughput distributed messaging with Apache Kafka. We'll show you how to build a cheap but effective cluster infrastructure with Apache Mesos. Finally, you will dive deep into the different aspects of SMACK and practice them through a few case studies.
By the end of the book, you will be able to integrate all the components of the SMACK stack and use them together to achieve highly effective and fast data processing.
WHAT YOU WILL LEARN
* Build an affordable yet powerful cluster infrastructure
* Make queries, reports, and graphs based on your business's demands
* Manage and exploit unstructured and NoSQL data sources
* Use tools to monitor the performance of your architecture
* Integrate all the technologies and decide which ones to replace and which ones to reinforce
This book fills the need for an easy and holistic book on essential Big Data technologies. Written in lucid, simple language free from jargon and code, it provides an intuition for Big Data from business as well as technological perspectives. The book is designed to give the reader the intuition behind this evolving area, along with a solid toolset of the major big data processing technologies such as Hadoop, MapReduce, Spark Streaming, and NoSQL databases. A complete case study of developing a web log analyzer is included. The book also contains two primers, on Cloud Computing and Data Mining, two tutorials on installing Hadoop and Spark, and caselets from real-world stories.
Students across a variety of academic disciplines, including business, computer science, statistics, and engineering, who are attracted to the idea of harnessing Big Data for new insights can use this as a textbook.
Professionals in various domains, including executives, managers, analysts, professors, doctors, accountants, and others can use this book to learn in a few hours how to make the most of Big Data to monitor their infrastructure, discover new insights, and develop new data-based products. It is a flowing book that one can finish in one sitting, or one can return to it again and again for insights and techniques.
Table of Contents
1. Wholeness of Big Data
2. Big Data Applications
3. Big Data Architectures
4. Distributed Systems with Hadoop
5. Parallel Programming with MapReduce
6. Advanced NoSQL Databases
7. Stream Programming with Spark
8. Data Ingest with Kafka
9. Cloud Computing Primer
10. Web Log Analyzer Development
11. Data Mining Primer
12. Appendix 1: Installing Hadoop on AWS Cloud
13. Appendix 2: Installing Spark
A Practical, Case-Study Approach
Build straightforward APIs to create services that are usable and maintainable. Although this book focuses on distributed services, it also emphasizes how the core principles apply even to pure OOD and OOP constructs.
The overall context of Creating Maintainable APIs is to classify the topics into four main areas: classes and interfaces, HTTP REST APIs, messaging APIs, and message payloads (XML, JSON, and JSON API, as well as Apache Avro).
What You Will Learn:
* Use object-oriented design constructs and their APIs
* Create and manage HTTP REST APIs
* Create and manage maintainable messaging APIs, including the use of Apache Kafka as a principal messaging hub
* Handle message payloads via JSON
Who This Book Is For: This book is for software engineers of any level, including very experienced programmers.
Start from scratch and learn how to administer Apache Kafka effectively for messaging
Kafka is one of those systems that is very simple to describe at a high level but has an incredible depth of technical detail when you dig deeper.
Learning Apache Kafka Second Edition provides you with step-by-step, practical examples that help you take advantage of the real power of Kafka and handle hundreds of megabytes of messages per second from multiple clients. This book teaches you everything you need to know, right from setting up Kafka clusters to understanding basic building blocks such as the producer, broker, and consumer. Once you are all set up, you will explore additional settings and configuration changes to achieve ever more complex goals. You will also learn how Kafka is designed internally and which configurations make it more effective. Finally, you will learn how Kafka works with other tools such as Hadoop and Storm.
Set up Apache Kafka clusters and develop custom message producers and consumers using practical, hands-on examples
Message publishing is a mechanism of connecting heterogeneous applications together with messages that are routed between them, for example by using a message broker like Apache Kafka. Such solutions deal with real-time volumes of information and route it to multiple consumers without letting information producers know who the final consumers are.
Apache Kafka is a practical, hands-on guide providing you with a series of step-by-step practical implementations, which will help you take advantage of the real power behind Kafka and give you a strong grounding for using it in your publisher-subscriber based architectures.
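The publisher-subscriber decoupling described above can be sketched with a toy in-memory broker. This is a conceptual illustration only, not the actual Kafka API; the `MiniBroker` class and its method names are hypothetical, invented for this sketch:

```python
from collections import defaultdict


class MiniBroker:
    """Toy message broker (hypothetical, for illustration): producers
    publish to a named topic without knowing who, or how many,
    consumers will receive each message."""

    def __init__(self):
        # topic name -> list of consumer callbacks
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, callback):
        # A consumer registers interest in a topic.
        self._subscribers[topic].append(callback)

    def publish(self, topic, message):
        # Route the message to every consumer subscribed to the topic;
        # the producer never references the consumers directly.
        for callback in self._subscribers[topic]:
            callback(message)


broker = MiniBroker()
received_a, received_b = [], []
broker.subscribe("page-views", received_a.append)
broker.subscribe("page-views", received_b.append)

# The producer only knows the broker and the topic name.
broker.publish("page-views", {"url": "/home", "user": 42})

print(received_a)  # [{'url': '/home', 'user': 42}]
print(received_b)  # [{'url': '/home', 'user': 42}]
```

Real Kafka adds durability, partitioning, and consumer groups on top of this idea, but the essential decoupling, producers addressing topics rather than consumers, is the same.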