Books: 5

Hive

Cover | Title | Year
A Comprehensive Guide to Machine Learning
This book is inspired by the Machine Learning Model Building Process Flow, which gives the reader the ability to understand an ML algorithm and apply the entire process of building an ML model from raw data. This new paradigm of teaching Machine Learning will bring about a radical change in perception for many of those who think this subject is difficult to learn. Though theory sometimes looks difficult, especially when there is heavy mathematics involved, the seamless flow from the theoretical aspects to example-driven learning provided in this book makes it easy for someone to connect the dots. For every Machine Learning algorithm covered in this book, a 3-D approach of theory, case study and practice will be given. Where appropriate, the mathematics will be explained through visualization in R. All practical demonstrations will be explored in R, a powerful programming language and software environment for statistical computing and graphics. The various packages and methods available in R will be used to explain the topics. In the end, readers will learn some of the latest technological advancements in building a scalable machine learning model with Big Data.
Who This Book is For: Data scientists, data science professionals and researchers in academia who want to understand the nuances of Machine Learning approaches and algorithms along with ways to see them in practice using R. The book will also benefit readers who want to understand the technology behind implementing a scalable machine learning model using Apache Hadoop, Hive, Pig and Spark.
What you will learn:
1. ML model building process flow
2. Theoretical aspects of Machine Learning
3. Industry-based case study
4. Example-based understanding of ML algorithms using R
5. Building ML models using Apache Hadoop and Spark
2017
Designing and Building Effective Analytics at Scale
The Complete Guide to Data Science with Hadoop: For Technical Professionals, Businesspeople, and Students. Demand is soaring for professionals who can solve real data science problems with Hadoop and Spark. Practical Data Science with Hadoop® and Spark is your complete guide to doing just that. Drawing on immense experience with Hadoop and big data, three leading experts bring together everything you need: high-level concepts, deep-dive techniques, real-world use cases, practical applications, and hands-on tutorials. The authors introduce the essentials of data science and the modern Hadoop ecosystem, explaining how Hadoop and Spark have evolved into an effective platform for solving data science problems at scale. In addition to comprehensive application coverage, the authors also provide useful guidance on the important steps of data ingestion, data munging, and visualization. Once the groundwork is in place, the authors focus on specific applications, including machine learning, predictive modeling for sentiment analysis, clustering for document analysis, anomaly detection, and natural language processing (NLP). This guide provides a strong technical foundation for those who want to do practical data science, and also presents business-driven guidance on how to apply Hadoop and Spark to optimize ROI of data science initiatives.
Learn
* What data science is, how it has evolved, and how to plan a data science career
* How data volume, variety, and velocity shape data science use cases
* Hadoop and its ecosystem, including HDFS, MapReduce, YARN, and Spark
* Data importation with Hive and Spark
* Data quality, preprocessing, preparation, and modeling
* Visualization: surfacing insights from huge data sets
* Machine learning: classification, regression, clustering, and anomaly detection
* Algorithms and Hadoop tools for predictive modeling
* Cluster analysis and similarity functions
* Large-scale anomaly detection
* NLP: applying data science to human language
2016
Immerse yourself on a fantastic journey to discover the attributes of big data by using Hive
ABOUT THIS BOOK
* Discover how Hive can coexist and work with other tools in the Hadoop ecosystem to create big data solutions
* Grasp the skills needed, learn the best practices, and avoid the pitfalls in writing efficient Hive queries to analyze big data
* Create an environment to analyze big data using practical, example-oriented scenarios
WHO THIS BOOK IS FOR
If you are a data analyst, developer, or simply someone who wants to use Hive to explore and analyze data in Hadoop, this is the book for you. Whether you are new to big data or an expert, with this book you will be able to master both the basic and the advanced features of Hive. Since Hive is an SQL-like language, some previous experience with SQL and databases will help you get more out of this book.
WHAT YOU WILL LEARN
* Create and set up the Hive environment
* Discover how to use Hive's definition language to describe data
* Discover interesting data by joining and filtering datasets in Hive
* Transform data by using Hive sorting, ordering, and functions
* Aggregate and sample data in different ways
* Boost Hive query performance and enhance data security in Hive
* Customize Hive to your needs by using user-defined functions and integrate it with other tools
IN DETAIL
In this book, we prepare you for your journey into big data by first introducing you to the background of the big data domain, along with the process of setting up and getting familiar with your Hive working environment. Next, the book guides you through discovering and transforming the values of big data with the help of examples. It also hones your skill in using the Hive language in an efficient manner. Towards the end, the book focuses on advanced topics such as performance, security, and extensions in Hive, which will guide you on exciting adventures on this worthwhile big data journey.
By the end of the book, you will be familiar with Hive and able to work efficiently to find solutions to big data problems.
2015
Perform interactive, real-time in-memory analytics on large amounts of data using the massively parallel processing engine Cloudera Impala
Everything you need to know about Cloudera Impala is here, from installation onwards. Your raw data processing in Hadoop takes on new dimensions of speed and volume with this hands-on tutorial.
Overview
* Step-by-step guidance to get you started with Impala on your Hadoop cluster
* Manipulate your data rapidly by writing proper SQL statements
* Explore the concepts of Impala security, administration, and troubleshooting in detail to maintain your Impala cluster
In Detail
If you have always wanted to crunch billions of rows of raw data on Hadoop in a couple of seconds, then Cloudera Impala is the number one choice for you. Cloudera Impala provides fast, interactive SQL queries directly on your Apache Hadoop data stored in HDFS or HBase. In addition to using the same unified storage platform, Impala also uses the same metadata, SQL syntax (Hive SQL), ODBC driver, and user interface (Hue Beeswax) as Apache Hive. This provides a familiar and unified platform for batch-oriented or real-time queries. In this practical, example-oriented book, you will learn everything you need to know about Cloudera Impala so that you can get started on your very own project. The book covers Cloudera Impala from installation, administration, and query processing all the way to connectivity with third-party applications. With this book in your hand, you will find yourself empowered to play with your data in Hadoop. As a reader of this book, you will learn about the origin of Impala and the technology behind it that allows it to run on thousands of machines. You will learn how to install, run, manage, and troubleshoot Impala in your own Hadoop cluster using the step-by-step guidance provided in the book. The book covers tenets of data processing such as loading data stored in Hadoop into Impala tables and querying data using Impala SQL statements, all with various code illustrations and a real-world example.
The book is written to get you started with Impala by providing rich information so you can understand what Impala is, what it can do for you, and finally how you can use it to achieve your objective.
What you will learn from this book
* Understand the various ways of installing Impala in your Hadoop cluster
* Use the Impala shell API to interact with Impala components
* Utilize Impala Query Language and built-in functions to play with data
* Administer and fine-tune Impala for high availability
* Identify and troubleshoot problems in a variety of ways
* Get acquainted with various input data formats in Hadoop and how to use them with Impala
* Comprehend how third-party applications can connect with Impala to provide data visualization and various other enhancements
Approach
This book is an easy-to-follow, step-by-step tutorial where each chapter takes your knowledge to the next level. The book covers practical knowledge with tips to implement this knowledge in real-world scenarios. A chapter with a real-life example is included to help you understand the concepts in full.
Who this book is written for
Using Cloudera Impala is for those who really want to take advantage of their Hadoop cluster by processing extremely large amounts of raw data in Hadoop at real-time speed. Prior knowledge of Hadoop and some exposure to Hive and MapReduce is expected.
2013
Data Warehouse and Query Language for Hadoop
Need to move a relational database application to Hadoop? This comprehensive guide introduces you to Apache Hive, Hadoop's data warehouse infrastructure. You'll quickly learn how to use Hive's SQL dialect, HiveQL, to summarize, query, and analyze large datasets stored in Hadoop's distributed filesystem. This example-driven guide shows you how to set up and configure Hive in your environment, provides a detailed overview of Hadoop and MapReduce, and demonstrates how Hive works within the Hadoop ecosystem. You'll also find real-world case studies that describe how companies have used Hive to solve unique problems involving petabytes of data.
2012