Design and implement a series of Flume agents to send streamed data into Hadoop
ABOUT THIS BOOK
* Construct a series of Flume agents using the Apache Flume service to efficiently collect, aggregate, and move large amounts of event data
* Configure failover paths and load balancing to remove single points of failure
* Use this step-by-step guide to stream logs from application servers to Hadoop's HDFS
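As a taste of what such an agent looks like, here is a minimal sketch of a single-agent configuration that tails a log file into HDFS. The agent name (a1), file paths, and NameNode URL are illustrative assumptions, not values from the book:

```properties
# Name the components of agent "a1" (all names are illustrative)
a1.sources = r1
a1.channels = c1
a1.sinks = k1

# Source: follow an application log with the exec source
a1.sources.r1.type = exec
a1.sources.r1.command = tail -F /var/log/app/access.log
a1.sources.r1.channels = c1

# Channel: buffer events in memory between source and sink
a1.channels.c1.type = memory
a1.channels.c1.capacity = 10000

# Sink: write events into date-bucketed HDFS directories
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://namenode:8020/flume/events/%Y-%m-%d
a1.sinks.k1.hdfs.fileType = DataStream
# Needed so the %Y-%m-%d path escapes resolve without a timestamp interceptor
a1.sinks.k1.hdfs.useLocalTimeStamp = true
```

An agent like this would typically be launched with `flume-ng agent --conf conf --conf-file example.conf --name a1`.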
WHO THIS BOOK IS FOR
If you are a Hadoop programmer who wants to learn about Flume so you can move datasets into Hadoop in a timely and replicable manner, then this book is ideal for you. No prior knowledge of Apache Flume is necessary, but a basic knowledge of Hadoop and the Hadoop Distributed File System (HDFS) is assumed.
WHAT YOU WILL LEARN
* Understand the Flume architecture, and learn how to download and install open source Flume from Apache
* Follow a detailed example of transporting weblogs in Near Real Time (NRT) to Kibana/Elasticsearch and archiving them in HDFS
* Learn tips and tricks for transporting logs and data in your production environment
* Understand and configure the Hadoop Distributed File System (HDFS) Sink
* Use a morphline-backed Sink to feed data into Solr
* Create redundant data flows using sink groups
* Configure and use various sources to ingest data
* Inspect data records and move them between multiple destinations based on payload content
* Transform data en-route to Hadoop and monitor your data flows
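The redundant flows mentioned above are built with sink groups. The following sketch shows a failover sink processor that prefers one Avro sink and falls back to a second; the agent name, hostnames, and ports are made-up examples, not taken from the book:

```properties
# Two Avro sinks draining the same channel (names are illustrative)
a1.sinks = k1 k2
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = collector1.example.com
a1.sinks.k1.port = 4545
a1.sinks.k2.type = avro
a1.sinks.k2.channel = c1
a1.sinks.k2.hostname = collector2.example.com
a1.sinks.k2.port = 4545

# Group them under a failover processor: the higher priority
# sink (k1) is used until it fails, then k2 takes over
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = failover
a1.sinkgroups.g1.processor.priority.k1 = 10
a1.sinkgroups.g1.processor.priority.k2 = 5
a1.sinkgroups.g1.processor.maxpenalty = 10000
```

Swapping `processor.type` to `load_balance` turns the same group into the load-balancing configuration used to spread traffic rather than fail over.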
Apache Flume is a distributed, reliable, and available service used to efficiently collect, aggregate, and move large amounts of log data. It is used to stream logs from application servers to HDFS for ad hoc analysis.
This book starts with an architectural overview of Flume and its logical components. It then explores channels, sinks, and sink processors, followed by sources. By the end of this book, you will be fully equipped to construct a series of Flume agents to dynamically transport your streamed data and logs from your systems into Hadoop.
A step-by-step book that guides you through the architecture and components of Flume, covering different approaches that are then pulled together in a real-world, end-to-end use case, moving gradually from the simplest to the most advanced features.
Flexible, Scalable, and Reliable Data Streaming
How can you get your data from frontend servers to Hadoop in near real time? With this complete reference guide, you'll learn Flume's rich set of features for collecting, aggregating, and writing large amounts of streaming data to the Hadoop Distributed File System (HDFS), Apache HBase, SolrCloud, Elasticsearch, and other systems.
Using Flume shows operations engineers how to configure, deploy, and monitor a Flume cluster, and teaches developers how to write Flume plugins and custom components for their specific use cases. You'll learn about Flume's design and implementation, as well as various features that make it highly scalable, flexible, and reliable. Code examples and exercises are available on GitHub.
* Learn how Flume provides a steady rate of flow by acting as a buffer between data producers and consumers
* Dive into key Flume components, including sources that accept data and sinks that write and deliver it
* Write custom plugins to customize the way Flume receives, modifies, formats, and writes data
* Explore APIs for sending data to Flume agents from your own applications
* Plan and deploy Flume in a scalable and flexible way—and monitor your cluster once it's running