* Optimize your work flow with Spark in data science, and get solutions to all your big data problems
* Large-scale data science made easy with Spark
* Get recipes to make the most of Spark's power and speed in predictive analytics
Spark has emerged as the big data more » platform of choice for data scientists. The real power and value proposition of Apache Spark is its platform to execute data science tasks. Spark's unique use case is that it combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualizations to allow data scientists to tackle the complexities that come with raw unstructured data sets.
This hands-on, practical resource will allow you to dive in and become comfortable and confident in working with Spark for data science. We will walk you through various techniques to deal with simple and complex data science tasks with Spark. We'll effectively offer solutions to problematic concepts in data science using Spark's data science libraries. The book will help you derive intelligent information at every step of the way through simple yet efficient recipes that will not only show you how to implement algorithms, but also optimize your work.
WHAT YOU WILL LEARN
* Explore the topics of data mining, text mining, NLP, information retrieval, and machine learning
* Solve real-world analytical problems with large data sets
* Get the flavor of challenges in data science and address them with a variety of analytical tools on a distributed system like Spark (apt for iterative algorithms), which offers in-memory processing and more flexibility for data analysis at scale « less
A Practical Real-World Approach to Gaining Actionable Insights from your Data
Derive useful insights from your data using Python. Learn the techniques related to natural language processing and text analytics, and gain the skills to know which technique is best suited to solve a particular problem.
Text Analytics with Pythonteaches you both basic and advanced concepts, including more » text and language syntax, structure, semantics. You will focus on algorithms and techniques, such as text classification, clustering, topic modeling, and text summarization.
A structured and comprehensive approach is followed in this book so that readers with little or no experience do not find themselves overwhelmed. You will start with the basics of natural language and Python and move on to advanced analytical and machine learning concepts. You will look at each technique and algorithm with both a bird's eye view to understand how it can be used as well as with a microscopic view to understand the mathematical concepts and to implement them to solve your own problems.
* Provides complete coverage of the major concepts and techniques of natural language processing (NLP) and text analytics
* Includes practical real-world examples of techniques for implementation, such as building a text classification system to categorize news articles, analyzing app or game reviews using topic modeling and text summarization, and clustering popular movie synopses and analyzing the sentiment of movie reviews
* Shows implementations based on Python and several popular open source libraries in NLP and text analytics, such as the natural language toolkit (nltk), gensim, scikit-learn, spaCy and Pattern
What you will learn:
• Natural Language concepts
• Analyzing Text syntax and structure
• Text Classification
• Text Clustering and Similarity analysis
• Text Summarization
• Semantic and Sentiment analysisReadership
The book is for IT professionals, analysts, developers, linguistic experts, data scientists, and anyone with a keen interest in linguistics, analytics, and generating insights from textual data. « less
Designing and Building Effective Analytics at Scale
The Complete Guide to Data Science with Hadoop—For Technical Professionals, Businesspeople, and Students
Demand is soaring for professionals who can solve real data science problems with Hadoop and Spark. Practical Data Science with Hadoop® and Spark is your complete guide to doing just that. Drawing more » on immense experience with Hadoop and big data, three leading experts bring together everything you need: high-level concepts, deep-dive techniques, real-world use cases, practical applications, and hands-on tutorials.
The authors introduce the essentials of data science and the modern Hadoop ecosystem, explaining how Hadoop and Spark have evolved into an effective platform for solving data science problems at scale. In addition to comprehensive application coverage, the authors also provide useful guidance on the important steps of data ingestion, data munging, and visualization.
Once the groundwork is in place, the authors focus on specific applications, including machine learning, predictive modeling for sentiment analysis, clustering for document analysis, anomaly detection, and natural language processing (NLP).
This guide provides a strong technical foundation for those who want to do practical data science, and also presents business-driven guidance on how to apply Hadoop and Spark to optimize ROI of data science initiatives.
* What data science is, how it has evolved, and how to plan a data science career
* How data volume, variety, and velocity shape data science use cases
* Hadoop and its ecosystem, including HDFS, MapReduce, YARN, and Spark
* Data importation with Hive and Spark
* Data quality, preprocessing, preparation, and modeling
* Visualization: surfacing insights from huge data sets
* Machine learning: classification, regression, clustering, and anomaly detection
* Algorithms and Hadoop tools for predictive modeling
* Cluster analysis and similarity functions
* Large-scale anomaly detection
* NLP: applying data science to human language « less
Real-World Machine Learning is a practical guide designed to teach working developers the art of ML project execution. Without overdosing you on academic theory and complex mathematics, it introduces the day-to-day practice of machine learning, preparing you to successfully build and deploy more » powerful ML systems.
About the Technology
Machine learning systems help you find valuable insights and patterns in data, which you'd never recognize with traditional methods. In the real world, ML techniques give you a way to identify trends, forecast behavior, and make fact-based recommendations. It's a hot and growing field, and up-to-speed ML developers are in demand.
About the Book
Real-World Machine Learning will teach you the concepts and techniques you need to be a successful machine learning practitioner without overdosing you on abstract theory and complex mathematics. By working through immediately relevant examples in Python, you'll build skills in data acquisition and modeling, classification, and regression. You'll also explore the most important tasks like model validation, optimization, scalability, and real-time streaming. When you're done, you'll be ready to successfully build, deploy, and maintain your own powerful ML systems.
* Predicting future behavior
* Performance evaluation and optimization
* Analyzing sentiment and making recommendations
About the Reader
No prior machine learning experience assumed. Readers should know Python.
About the Authors
Henrik Brink, Joseph Richards and Mark Fetherolf are experienced data scientists engaged in the daily practice of machine learning.
Table of Contents
1. THE MACHINE-LEARNING WORKFLOW
2. What is machine learning?
3. Real-world data
4. Modeling and prediction
5. Model evaluation and optimization
6. Basic feature engineering
7. PRACTICAL APPLICATION
8. Example: NYC taxi data
9. Advanced feature engineering
10. Advanced NLP example: movie review sentiment
11. Scaling machine-learning workflows
12. Example: digital display advertising « less
Explore various approaches to organize and extract useful text from unstructured data using Java
Natural Language Processing (NLP) is an important area of application development and its relevance in addressing contemporary problems will only increase in the future. There has been a significant increase in the demand for natural language-accessible applications supported by NLP tasks.
Natural more » Language Processing with Java will explore how to automatically organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization. It covers concepts of NLP that even those of you without a background in statistics or natural language processing can understand.
***** Who This Book Is For *****
If you are a Java programmer who wants to learn about the fundamental tasks underlying natural language processing, this book is for you. You will be able to identify and use NLP tasks for many common problems, and integrate them in your applications to solve more difficult problems. Readers should be familiar/experienced with Java software development. « less
Over 60 effective recipes to develop your Natural Language Processing (NLP) skills quickly and effectively
NLP is at the core of web search, intelligent personal assistants, marketing, and much more, and LingPipe is a toolkit for processing text using computational linguistics.
This book starts with the foundational but powerful techniques of language identification, sentiment classifiers, and evaluation more » frameworks. It goes on to detail how to build a robust framework to solve common NLP problems, before ending with advanced techniques for complex heterogeneous NLP systems.
This is a recipe and tutorial book for experienced Java developers with NLP needs. A basic knowledge of NLP terminology will be beneficial. This book will guide you through the process of how to build NLP apps with minimal fuss and maximal impact.
***** Who This Book Is For *****
This book is for experienced Java developers with NLP needs, whether academics, industrialists, or hobbyists. A basic knowledge of NLP terminology will be beneficial. « less