Books: 47

Data Science

Cover | Title | Year
KEY FEATURES
* This book is based on the latest 2.0 version of Apache Spark and the 2.7 version of Hadoop, integrated with the most commonly used tools.
* Learn all Spark stack components, including the latest topics such as DataFrames, Datasets, GraphFrames, Structured Streaming, DataFrame-based ML Pipelines, and SparkR.
* Integrations with frameworks such as HDFS and YARN, and tools such as Jupyter, Zeppelin, NiFi, Mahout, HBase Spark Connector, GraphFrames, H2O, and Hivemall.
BOOK DESCRIPTION
Big Data Analytics aims at providing the fundamentals of Apache Spark and Hadoop. All Spark components (Spark Core, Spark SQL, DataFrames, Datasets, conventional Streaming, Structured Streaming, MLlib, and GraphX) and the Hadoop core components (HDFS, MapReduce, and YARN) are explored in depth with implementation examples on Spark + Hadoop clusters. The industry is moving away from MapReduce to Spark, so the advantages of Spark over MapReduce are explained in depth to help you reap the benefits of in-memory speeds. The DataFrames API, the Data Sources API, and the new Dataset API are explained for building Big Data analytical applications. Real-time data analytics using Spark Streaming with Apache Kafka and HBase is covered to help you build streaming applications. The new Structured Streaming concept is explained with an IoT (Internet of Things) use case. Machine learning techniques are covered using MLlib, ML Pipelines, and SparkR, and graph analytics are covered with the GraphX and GraphFrames components of Spark. Readers will also get an opportunity to get started with web-based notebooks such as Jupyter and Apache Zeppelin, and the data flow tool Apache NiFi, to analyze and visualize data.
WHAT YOU WILL LEARN
* Find out about and implement the tools and techniques of big data analytics using Spark on Hadoop clusters, with the wide variety of tools used with Spark and Hadoop
* Understand all the Hadoop and Spark ecosystem components
* Get to know all the Spark components: Spark Core, Spark SQL, DataFrames, Datasets, conventional and Structured Streaming, MLlib, ML Pipelines, and GraphX
* See batch and real-time data analytics using Spark Core, Spark SQL, and conventional and Structured Streaming
* Get to grips with data science and machine learning using MLlib, ML Pipelines, H2O, Hivemall, GraphX, and SparkR
ABOUT THE AUTHOR
Venkat Ankam has over 18 years of IT experience and over 5 years in big data technologies, working with customers to design and develop scalable big data applications. Having worked with multiple clients globally, he has extensive experience in big data analytics using Hadoop and Spark. He is a Cloudera Certified Hadoop Developer and Administrator and also a Databricks Certified Spark Developer. He is the founder and presenter of a few Hadoop and Spark meetup groups globally and loves to share knowledge with the community. Venkat has delivered hundreds of trainings, presentations, and white papers in the big data sphere. While this is his first attempt at writing a book, many more books are in the pipeline.
TABLE OF CONTENTS
1. Big Data Analytics at 10,000 foot view
2. Getting Started with Apache Hadoop and Apache Spark
3. Deep Dive into Apache Spark
4. Big Data Analytics with Spark SQL, DataFrames, and Datasets
5. Real-Time Analytics with Spark Streaming and Structured Streaming
6. Notebooks and Dataflows with Spark and Hadoop
7. Machine Learning with Spark and Hadoop
8. Building Recommendation Systems with Spark and Mahout
9. Graph Analytics with GraphX
10. Interactive Analytics with SparkR
2016
KEY FEATURES
* A quick way to get started with Spark – and reap the rewards
* From analytics to engineering your big data architecture, we've got it covered
* Bring your Scala and Java knowledge – and put it to work on new and exciting problems
BOOK DESCRIPTION
When people want a way to process big data at speed, Spark is invariably the solution. With its ease of development (in comparison to the relative complexity of Hadoop), it's unsurprising that it's becoming popular with data analysts and engineers everywhere. Beginning with the fundamentals, we'll show you how to get set up with Spark with minimum fuss. You'll then get to grips with some simple APIs before investigating machine learning and graph processing – throughout we'll make sure you know exactly how to apply your knowledge. You will also learn how to use the Spark shell and how to load data, before finding out how to build and run your own Spark applications. Discover how to manipulate your RDD and get stuck into a range of DataFrame APIs. As if that's not enough, you'll also learn some useful machine learning algorithms with the help of Spark MLlib, and how to integrate Spark with R. We'll also make sure you're confident and prepared for graph processing, as you learn more about the GraphX API.
WHAT YOU WILL LEARN
* Install and set up Spark in your cluster
* Prototype distributed applications with Spark's interactive shell
* Perform data wrangling using the new DataFrame APIs
* Get to know the different ways to interact with Spark's distributed representation of data (RDDs)
* Query Spark with a SQL-like query syntax
* See how Spark works with big data
* Implement machine learning systems with highly scalable algorithms
* Use R, the popular statistical language, to work with Spark
* Apply interesting graph algorithms and graph processing with GraphX
ABOUT THE AUTHOR
Krishna Sankar is a Senior Specialist, AI Data Scientist with Volvo Cars, focusing on Autonomous Vehicles. His earlier stints include Chief Data Scientist at http://cadenttech.tv/, Principal Architect/Data Scientist at Tata America Intl. Corp., Director of Data Science at a bioinformatics startup, and Distinguished Engineer at Cisco. He has spoken at various conferences, including ML tutorials at Strata SJC and London 2016, Spark Summit [goo.gl/ab30lD], Strata-Spark Camp, OSCON, PyCon, and PyData; writes about Robots Rules of Order [goo.gl/5yyRv6], Big Data Analytics - Best of the Worst [goo.gl/ImWCaz], predicting NFL, Spark [http://goo.gl/E4kqMD], Data Science [http://goo.gl/9pyJMH], Machine Learning [http://goo.gl/SXF53n], and Social Media Analysis [http://goo.gl/D9YpVQ]; and has been a guest lecturer at the Naval Postgraduate School. His occasional blog posts can be found at https://doubleclix.wordpress.com/. His other passions are flying drones (he is working towards a Drone Pilot License (FAA UAS Pilot)) and Lego Robotics: you will find him at the St. Louis FLL World Competition as Robots Design Judge.
TABLE OF CONTENTS
1. Installing Spark and Setting Up Your Cluster
2. Using the Spark Shell
3. Building and Running a Spark Application
4. Creating a SparkSession Object
5. Loading and Saving Data in Spark
6. Manipulating Your RDD
7. Spark 2.0 Concepts
8. Spark SQL
9. Foundations of Datasets/DataFrames – The Proverbial Workhorse for DataScientists
10. Spark with Big Data
11. Machine Learning with Spark ML Pipelines
12. GraphX
2016
Explore the world of data science from scratch with Julia by your side
KEY FEATURES
* An in-depth exploration of Julia's growing ecosystem of packages
* Work with the most powerful open-source libraries for deep learning, data wrangling, and data visualization
* Learn about deep learning using Mocha.jl and bring speed and high performance to data analysis on large data sets
BOOK DESCRIPTION
Julia is a fast, high-performing language that is well suited to data science, with a mature package ecosystem, and is now feature complete. It is a good tool for a data science practitioner. A famous Harvard Business Review article called data scientist the sexiest job of the 21st century (https://hbr.org/2012/10/data-scientist-the-sexiest-job-of-the-21st-century). This book will help you get familiar with Julia's rich ecosystem, which is continuously evolving, allowing you to stay on top of your game. It contains the essentials of data science and gives a high-level overview of advanced statistics and techniques. You will dive in and generate insights by performing inferential statistics, and reveal hidden patterns and trends using data mining. The book offers practical coverage of statistics and machine learning. You will develop the knowledge to build statistical models and machine learning systems in Julia with attractive visualizations. You will then delve into the world of deep learning in Julia and get to know the Mocha.jl framework, with which you can create artificial neural networks and implement deep learning. This book addresses the challenges of real-world data science problems, including data cleaning, data preparation, inferential statistics, statistical modeling, building high-performance machine learning systems, and creating effective visualizations using Julia.
WHAT YOU WILL LEARN
* Apply statistical models in Julia for data-driven decisions
* Understand the process of data munging and data preparation using Julia
* Explore techniques to visualize data using Julia and D3-based packages
* Use Julia to create self-learning systems using cutting-edge machine learning algorithms
* Create supervised and unsupervised machine learning systems using Julia, and explore ensemble models
* Build a recommendation engine in Julia
* Dive into Julia’s deep learning framework and build a system using Mocha.jl
ABOUT THE AUTHOR
Anshul Joshi is a data science professional with more than 2 years of experience, primarily in data munging, recommendation systems, predictive modeling, and distributed computing. He is a deep learning and AI enthusiast. Most of the time, he can be caught exploring GitHub or trying anything new he can get his hands on. He blogs on anshuljoshi.xyz.
TABLE OF CONTENTS
1. The Groundwork – Julia's Environment
2. Data Munging
3. Data Exploration
4. Deep Dive into Inferential Statistics
5. Making Sense of Data Using Visualization
6. Supervised Machine Learning
7. Unsupervised Machine Learning
8. Creating Ensemble Models
9. Time Series
10. Collaborative Filtering and Recommendation System
11. Introduction to Deep Learning
2016
A practical guide to obtaining, transforming, exploring, and analyzing data using Python, MongoDB, and Apache Spark
Key Features
* Learn to use various data analysis tools and algorithms to classify, cluster, visualize, simulate, and forecast your data
* Apply machine learning algorithms to different kinds of data such as social networks, time series, and images
* A hands-on guide to understanding the nature of data and how to turn it into insight
Book Description
Beyond buzzwords like Big Data or Data Science, there are great opportunities to innovate in many businesses by using data analysis to build data-driven products. Data analysis involves asking many questions about data in order to discover insights and generate value for a product or a service. This book explains the basic data algorithms without the theoretical jargon, and you'll get hands-on experience turning data into insights using machine learning techniques. We will perform data-driven innovation processing for several types of data such as text, images, social network graphs, documents, and time series, showing you how to implement large data processing with MongoDB and Apache Spark.
What you will learn
* Acquire, format, and visualize your data
* Build an image-similarity search engine
* Generate meaningful visualizations anyone can understand
* Get started with analyzing social network graphs
* Find out how to implement sentiment text analysis
* Install data analysis tools such as Pandas, MongoDB, and Apache Spark
* Get to grips with Apache Spark
* Implement machine learning algorithms such as classification or forecasting
About the Author
Hector Cuesta is founder and Chief Data Scientist at Dataxios, a machine intelligence research company. He holds a BA in Informatics and an M.Sc. in Computer Science. He provides consulting services for data-driven product design, with experience in a variety of industries including financial services, retail, fintech, e-learning, and human resources. He is a robotics enthusiast in his spare time.
2016
Find out how to build smarter machine learning systems with R. Follow this three module course to become a more fluent machine learning practitioner.
ABOUT THIS BOOK
* Build your confidence with R and find out how to solve a huge range of data-related problems
* Get to grips with some of the most important machine learning techniques being used by data scientists and analysts across industries today
* Don't just learn – apply your knowledge by following featured practical projects covering everything from financial modeling to social media analysis
WHO THIS BOOK IS FOR
Aimed at intermediate-to-advanced readers (especially data scientists) who already work in the field of data science.
WHAT YOU WILL LEARN
* Get to grips with R techniques to clean and prepare your data for analysis, and visualize your results
* Implement R machine learning algorithms from scratch and be amazed to see the algorithms in action
* Solve interesting real-world problems using machine learning and R as the journey unfolds
* Write reusable code and build complete machine learning systems from the ground up
* Learn specialized machine learning techniques for text mining, social network data, big data, and more
* Discover the different types of machine learning models and learn which is best to meet your data needs and solve your analysis problems
* Evaluate and improve the performance of machine learning models
IN DETAIL
R is the established language of data analysts and statisticians around the world. And you shouldn't be afraid to use it... This Learning Path will take you through the fundamentals of R and demonstrate how to use the language to solve a diverse range of challenges through machine learning. Accessible yet comprehensive, it provides you with everything you need to become a more fluent data professional, and more confident with R. In the first module you'll get to grips with the fundamentals of R. This means you'll be taking a look at some of the details of how the language works, before seeing how to put your knowledge into practice to build some simple machine learning projects that could prove useful for a range of real-world problems. For the following two modules we'll begin to investigate machine learning algorithms in more detail. To build upon the basics, you'll get to work on three different projects that will test your skills. Covering some of the most important algorithms and featuring some of the most popular R packages, they're all focused on solving real problems in different areas, ranging from finance to social media.
This Learning Path has been curated from three Packt products:
* R Machine Learning By Example, by Raghav Bali and Dipanjan Sarkar
* Machine Learning with R - Second Edition, by Brett Lantz
* Mastering Machine Learning with R, by Cory Lesmeister
STYLE AND APPROACH
This is an enticing learning path that starts from the very basics and gradually picks up pace as the story unfolds. Each concept is first defined succinctly in its larger context, followed by a detailed explanation of its application. Each topic is explained with the help of a project that solves a real-world problem and involves hands-on work, giving you a deep insight into the world of machine learning.
2016
Analyze your data and delve deep into the world of machine learning with the latest Spark version, 2.0
About This Book
* Perform data analysis and build predictive models on huge datasets that leverage Apache Spark
* Learn to integrate data science algorithms and techniques with the fast and scalable computing features of Spark to address big data challenges
* Work through practical examples on real-world problems with sample code snippets
Who This Book Is For
This book is for anyone who wants to leverage Apache Spark for data science and machine learning. If you are a technologist who wants to expand your knowledge to perform data science operations in Spark, or a data scientist who wants to understand how algorithms are implemented in Spark, or a newbie with minimal development experience who wants to learn about Big Data Analytics, this book is for you!
What You Will Learn
* Consolidate, clean, and transform your data acquired from various data sources
* Perform statistical analysis of data to find hidden insights
* Explore graphical techniques to see what your data looks like
* Use machine learning techniques to build predictive models
* Build scalable data products and solutions
* Start programming using the RDD, DataFrame and Dataset APIs
* Become an expert by improving your data analytical skills
In Detail
This is the era of Big Data. The term ‘Big Data’ implies big innovation and enables a competitive advantage for businesses. Apache Spark was designed to perform Big Data analytics at scale, and so Spark is equipped with the necessary algorithms and supports multiple programming languages. Whether you are a technologist, a data scientist, or a beginner to Big Data analytics, this book will provide you with all the skills necessary to perform statistical data analysis, data visualization, and predictive modeling, and to build scalable data products or solutions using Python, Scala, and R.
2016
Taking up where the bestselling "A Simple Introduction to Data Science" leaves off, Lars Nielsen's "A Simple Introduction to Data Science, BOOK TWO" expands on elementary concepts introduced in the first volume while at the same time embracing several new and key topics. Coverage includes the art and practice of introducing Data Science to the culture of the enterprise ... Data Science ethics and privacy concerns ... key concepts in data visualization ... the role of Artificial Intelligence, Machine Learning, and Deep Learning ... Data Curation and the "Tribal Knowledge" problem ... Hadoop, R, and Python ... and discussion of how the Data Scientist role will evolve in the future.
2015
Discover how data science can help you gain in-depth insight into your business – the easy way! Jobs in data science abound, but few people have the data science skills needed to fill these increasingly important roles in organizations. Data Science For Dummies is the perfect starting point for IT professionals and students interested in making sense of their organization’s massive data sets and applying their findings to real-world business scenarios. From uncovering rich data sources to managing large amounts of data within hardware and software limitations, ensuring consistency in reporting, merging various data sources, and beyond, you’ll develop the know-how you need to effectively interpret data and tell a story that can be understood by anyone in your organization.
* Provides a background in data science fundamentals before moving on to working with relational databases and unstructured data and preparing your data for analysis
* Details different data visualization techniques that can be used to showcase and summarize your data
* Explains both supervised and unsupervised machine learning, including regression, model validation, and clustering techniques
* Includes coverage of big data processing tools like MapReduce, Hadoop, Dremel, Storm, and Spark
It’s a big, big data world out there – let Data Science For Dummies help you harness its power and gain a competitive edge for your organization.
2015
First Principles with Python
Data science libraries, frameworks, modules, and toolkits are great for doing data science, but they’re also a good way to dive into the discipline without actually understanding data science. In this book, you’ll learn how many of the most fundamental data science tools and algorithms work by implementing them from scratch. If you have an aptitude for mathematics and some programming skills, author Joel Grus will help you get comfortable with the math and statistics at the core of data science, and with the hacking skills you need to get started as a data scientist. Today’s messy glut of data holds answers to questions no one’s even thought to ask. This book provides you with the know-how to dig those answers out.
* Get a crash course in Python
* Learn the basics of linear algebra, statistics, and probability, and understand how and when they're used in data science
* Collect, explore, clean, munge, and manipulate data
* Dive into the fundamentals of machine learning
* Implement models such as k-nearest neighbors, Naive Bayes, linear and logistic regression, decision trees, neural networks, and clustering
* Explore recommender systems, natural language processing, network analysis, MapReduce, and databases
2015
Hadoop is mostly written in Java, but that doesn't exclude the use of other programming languages with this distributed storage and processing framework, particularly Python. With this concise book, you'll learn how to use Python with the Hadoop Distributed File System (HDFS), MapReduce, the Apache Pig platform and Pig Latin script, and the Apache Spark cluster-computing framework. Authors Zachary Radtka and Donald Miner from the data science firm Miner & Kasch take you through the basic concepts behind Hadoop, MapReduce, Pig, and Spark. Then, through multiple examples and use cases, you'll learn how to work with these technologies by applying various Python tools.
* Use the Python library Snakebite to access HDFS programmatically from within Python applications
* Write MapReduce jobs in Python with mrjob, the Python MapReduce library
* Extend Pig Latin with user-defined functions (UDFs) in Python
* Use the Spark Python API (PySpark) to write Spark programs with Python
* Learn how to use the Luigi Python workflow scheduler to manage MapReduce jobs and Pig scripts
2015