Discover how data science can help you gain in-depth insight into your business – the easy way!
Jobs in data science abound, but few people have the data science skills needed to fill these increasingly important roles in organizations. Data Science For Dummies is the perfect starting point for IT professionals and students interested in making sense of their organization’s massive data sets and applying their findings to real-world business scenarios. From uncovering rich data sources to managing large amounts of data within hardware and software limitations, ensuring consistency in reporting, merging various data sources, and beyond, you’ll develop the know-how you need to effectively interpret data and tell a story that can be understood by anyone in your organization.
* Provides a background in data science fundamentals before moving on to working with relational databases and unstructured data and preparing your data for analysis
* Details different data visualization techniques that can be used to showcase and summarize your data
* Explains both supervised and unsupervised machine learning, including regression, model validation, and clustering techniques
* Includes coverage of big data processing tools like MapReduce, Hadoop, Dremel, Storm, and Spark
It’s a big, big data world out there – let Data Science For Dummies help you harness its power and gain a competitive edge for your organization.
Moving beyond MapReduce and Batch Processing with Apache Hadoop 2
Apache Hadoop is helping drive the Big Data revolution. Now, its data processing has been completely overhauled: Apache Hadoop YARN provides resource management at data center scale and easier ways to create distributed applications that process petabytes of data. And now in Apache Hadoop YARN, two Hadoop technical leaders show you how to develop new applications and adapt existing code to fully leverage these revolutionary advances.
YARN project founder Arun Murthy and project lead Vinod Kumar Vavilapalli demonstrate how YARN increases scalability and cluster utilization, enables new programming models and services, and opens new options beyond Java and batch processing. They walk you through the entire YARN project lifecycle, from installation through deployment.
Enabling Competitive Differentiation through Business Analytics
Your business generates reams of data, but what do you do with it? Reporting is only the beginning. Your data holds the key to innovation and growth – you just need the proper analytics. In Big Data, Big Innovation: Enabling Competitive Differentiation Through Business Analytics, author Evan Stubbs explores the potential gold hiding in your un-mined data. As Chief Analytics Officer for SAS Australia/New Zealand, Stubbs brings an industry insider's perspective to guide you through pattern recognition, analysis, and implementation.
Big Data, Big Innovation details a groundbreaking approach to ensuring your company's upward trajectory. Use this guide to leverage your customer information, financial reports, performance metrics, and more to build a rock-solid foundation for future growth.
Value Creation for Business Leaders and Practitioners
Big data is big business. But having the data and the computational power to process it isn't nearly enough to produce meaningful results. Big Data, Data Mining, and Machine Learning: Value Creation for Business Leaders and Practitioners is a complete resource for technology and marketing executives looking to cut through the hype and produce real results that hit the bottom line. Providing an engaging, thorough overview of the current state of big data analytics and the growing trend toward high performance computing architectures, the book is a detail-driven look into how big data analytics can be leveraged to foster positive change and drive efficiency.
C++ For Dummies, 7th Edition is the best-selling C++ guide on the market, fully revised for the 2014 update. With over 60% new content, this updated guide reflects the new standards, and includes a new Big Data focus that highlights the use of C++ among popular Big Data software solutions. The book provides step-by-step instruction from the ground up, helping beginners become programmers and allowing intermediate programmers to sharpen their skills. The companion website provides all code mentioned in the text, an updated GNU_C++, the new C++ compiler, and other applications. By the end of the first chapter, you will have programmed your first C++ application!
Hadoop in Practice, Second Edition provides over 100 tested, instantly useful techniques that will help you conquer big data, using Hadoop. This revised new edition covers changes and new features in the Hadoop core architecture, including MapReduce 2. Brand new chapters cover YARN and integrating Kafka, Impala, and Spark SQL with Hadoop. You'll also get new and updated techniques for Flume, Sqoop, and Mahout, all of which have seen major new versions recently. In short, this is the most practical, up-to-date coverage of Hadoop available anywhere.
It's always a good time to upgrade your Hadoop skills! Hadoop in Practice, Second Edition provides a collection of 104 tested, instantly useful techniques for analyzing real-time streams, moving data securely, machine learning, managing large-scale clusters, and taming big data using Hadoop. This completely revised edition covers changes and new features in Hadoop core, including MapReduce 2 and YARN. You'll pick up hands-on best practices for integrating Spark, Kafka, and Impala with Hadoop, and get new and updated techniques for the latest versions of Flume, Sqoop, and Mahout. In short, this is the most practical, up-to-date coverage of Hadoop available.
Readers need to know a programming language like Java and have basic familiarity with Hadoop.
* Thoroughly updated for Hadoop 2
* How to write YARN applications
* Integrate real-time technologies like Storm, Impala, and Spark
* Predictive analytics using Mahout and R
About the Author
Alex Holmes works on tough big-data problems. He is a software engineer, author, speaker, and blogger specializing in large-scale Hadoop projects.
Table of Contents
PART 1 BACKGROUND AND FUNDAMENTALS
1. Hadoop in a heartbeat
2. Introduction to YARN
PART 2 DATA LOGISTICS
3. Data serialization—working with text and beyond
4. Organizing and optimizing data in HDFS
5. Moving data into and out of Hadoop
PART 3 BIG DATA PATTERNS
6. Applying MapReduce patterns to big data
7. Utilizing data structures and algorithms at scale
8. Tuning, debugging, and testing
PART 4 BEYOND MAPREDUCE
9. SQL on Hadoop
10. Writing a YARN application
Pro Apache Hadoop, Second Edition brings you up to speed on Hadoop - the framework of big data. Revised to cover Hadoop 2.0, the book covers the very latest developments such as YARN (aka MapReduce 2.0), new HDFS high-availability features, and increased scalability in the form of HDFS Federations. All the old content has been revised too, giving the latest on the ins and outs of MapReduce, cluster design, the Hadoop Distributed File System, and more.
This book covers everything you need to build your first Hadoop cluster and begin analyzing and deriving value from your business and scientific data. Learn to solve big-data problems the MapReduce way, by breaking a big problem into chunks and creating small-scale solutions that can be flung across thousands upon thousands of nodes to analyze large data volumes in a short amount of wall-clock time. Learn how to let Hadoop take care of distributing and parallelizing your software - you just focus on the code; Hadoop takes care of the rest.
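The "MapReduce way" described above can be illustrated with a small sketch. This is not code from the book and does not use Hadoop; it is a single-process Python imitation of the map, shuffle, and reduce phases on a word count, and all function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are hypothetical, chosen only for illustration. On a real cluster, each chunk would be handled by a different node.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(chunk: str) -> Iterator[Tuple[str, int]]:
    """Map: turn one chunk of text into (word, 1) pairs."""
    for word in chunk.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all values that share the same key."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: collapse the values for one key into a single count."""
    return key, sum(values)


def word_count(chunks: Iterable[str]) -> Dict[str, int]:
    """Run the three phases in sequence over independent chunks."""
    mapped = (pair for chunk in chunks for pair in map_phase(chunk))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Each string stands in for a chunk that a separate node would process.
    print(word_count(["big data", "big cluster"]))
```

Because each chunk is mapped independently and each key is reduced independently, both phases can in principle be spread across many machines, which is the part a framework such as Hadoop handles for you.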
Querying and Updating with SPARQL 1.1
Gain hands-on experience with SPARQL, the RDF query language that's bringing new possibilities to semantic web, linked data, and big data projects. This updated and expanded edition shows you how to use SPARQL 1.1 with a variety of tools to retrieve, manipulate, and federate data from the public web as well as from private sources.
Author Bob DuCharme has you writing simple queries right away before providing background on how SPARQL fits into RDF technologies. Using short examples that you can run yourself with open source software, you'll learn how to update, add to, and delete data in RDF datasets.
Learn exciting new ways to build efficient, high performance enterprise search repositories for Big Data using Hadoop and Solr
As data grows exponentially day-by-day, extracting information becomes a tedious activity in itself. Technologies like Hadoop are trying to address some of the concerns, while Solr provides high-speed faceted search. Bringing these two technologies together is helping organizations resolve the problem of information extraction from Big Data by providing excellent distributed faceted search capabilities.
Scaling Big Data with Hadoop and Solr is a step-by-step guide that helps you build high performance enterprise search engines while scaling data. Starting with the basics of Apache Hadoop and Solr, this book then dives into advanced topics of optimizing search with some interesting real-world use cases and sample Java code.
Current Perspectives from O'Reilly Radar
This collection represents the full spectrum of data-related content we've published on O'Reilly Radar over the last year. Mike Loukides kicked things off in June 2010 with "What is data science?" and from there we've pursued the various threads and themes that naturally emerged. Now, roughly a year later, we can look back over all we've covered and identify a number of core data areas.