Learn the techniques and math you need to start making sense of your data
* Enhance your knowledge of coding with data science theory for practical insight into data science and analysis
* More than just a math class, learn how to perform real-world data science tasks with R and Python
* Create actionable insights and transform raw data into tangible value
BOOK DESCRIPTION
Need to turn your programming skills into effective data science skills? Principles of Data Science is created to help you join the dots between mathematics, programming, and business analysis. With this book, you'll feel confident about asking, and answering, complex and sophisticated questions of your data to move from abstract and raw statistics to actionable ideas.
With a unique approach that bridges the gap between mathematics and computer science, this book takes you through the entire data science pipeline. Beginning with cleaning and preparing data, and effective data mining strategies and techniques, you'll move on to build a comprehensive picture of how every piece of the data science puzzle fits together. Learn the fundamentals of computational mathematics and statistics, as well as some pseudocode being used today by data scientists and analysts. You'll get to grips with machine learning, discover the statistical models that help you take control and navigate even the densest datasets, and find out how to create powerful visualizations that communicate what your data means.
WHAT YOU WILL LEARN
* Get to know the five most important steps of data science
* Use your data intelligently and learn how to handle it with care
* Bridge the gap between mathematics and programming
* Learn about probability, calculus, and how to use statistical models to control and clean your data and drive actionable results
* Build and evaluate baseline machine learning models
* Explore the most effective metrics to determine the success of your machine learning models
* Create data visualizations that communicate actionable insights
* Read and apply machine learning concepts to your problems and make actual predictions
ABOUT THE AUTHOR
Sinan Ozdemir is a data scientist, startup founder, and educator living in the San Francisco Bay Area with his dog, Charlie; cat, Euclid; and bearded dragon, Fiero. He spent his academic career studying pure mathematics at Johns Hopkins University before transitioning to education. He spent several years conducting lectures on data science at Johns Hopkins University and at the General Assembly before founding his own start-up, Legion Analytics, which uses artificial intelligence and data science to power enterprise sales teams.
After completing the Fellowship at the Y Combinator accelerator, Sinan has spent most of his days working on his fast-growing company, while creating educational material for data science.
TABLE OF CONTENTS
1. How to Sound Like a Data Scientist
2. Types of Data
3. The Five Steps of Data Science
4. Basic Mathematics
5. Impossible or Improbable – A Gentle Introduction to Probability
6. Advanced Probability
7. Basic Statistics
8. Advanced Statistics
9. Communicating Data
10. How to Tell If Your Toaster Is Learning – Machine Learning Essentials
11. Predictions Don't Grow on Trees – or Do They?
12. Beyond the Essentials
13. Case Studies
* Optimize your workflow with Spark in data science, and get solutions to all your big data problems
* Large-scale data science made easy with Spark
* Get recipes to make the most of Spark's power and speed in predictive analytics
Spark has emerged as the big data platform of choice for data scientists. The real power and value proposition of Apache Spark is its platform to execute data science tasks. Spark's unique use case is that it combines ETL, batch analytics, real-time stream analysis, machine learning, graph processing, and visualizations to allow data scientists to tackle the complexities that come with raw unstructured data sets.
This hands-on, practical resource will allow you to dive in and become comfortable and confident in working with Spark for data science. We will walk you through various techniques to deal with simple and complex data science tasks with Spark. We'll effectively offer solutions to problematic concepts in data science using Spark's data science libraries. The book will help you derive intelligent information at every step of the way through simple yet efficient recipes that will not only show you how to implement algorithms, but also optimize your work.
WHAT YOU WILL LEARN
* Explore the topics of data mining, text mining, NLP, information retrieval, and machine learning
* Solve real-world analytical problems with large data sets
* Get the flavor of challenges in data science and address them with a variety of analytical tools on a distributed system like Spark (apt for iterative algorithms), which offers in-memory processing and more flexibility for data analysis at scale
To tap into the power of Python's open data science stack, including NumPy, Pandas, Matplotlib, Scikit-learn, and other tools, you first need to understand the syntax, semantics, and patterns of the Python language. This report provides a brief yet comprehensive introduction to Python for engineers, researchers, and data scientists who are already familiar with another programming language.
Author Jake VanderPlas, an interdisciplinary research director at the University of Washington, explains Python’s essential syntax and semantics, built-in data types and structures, function definitions, control flow statements, and more, using Python 3 syntax.
- Python syntax basics and running Python code
- Basic semantics of Python variables, objects, and operators
- Built-in simple types and data structures
- Control flow statements for executing code blocks conditionally
- Methods for creating and using reusable functions
- Iterators, list comprehensions, and generators
- String manipulation and regular expressions
- Python’s standard library and third-party modules
- Python’s core data science tools
- Recommended resources to help you learn more
This textbook on practical data analytics unites fundamental principles, algorithms, and data. Algorithms are the keystone of data analytics and the focal point of this textbook. Clear and intuitive explanations of the mathematical and statistical foundations make the algorithms transparent. But practical data analytics requires more than just the foundations. Problems and data are enormously variable and only the most elementary of algorithms can be used without modification. Programming fluency and experience with real and challenging data are indispensable, and so the reader is immersed in Python and R and real data analysis. By the end of the book, the reader will have gained the ability to adapt algorithms to new problems and carry out innovative analyses.
This book has three parts:
(a) Data Reduction: Begins with the concepts of data reduction, data maps, and information extraction. The second chapter introduces associative statistics, the mathematical foundation of scalable algorithms and distributed computing. Practical aspects of distributed computing are the subject of the Hadoop and MapReduce chapter.
(b) Extracting Information from Data: Linear regression and data visualization are the principal topics of Part II. The authors dedicate a chapter to the critical domain of Healthcare Analytics for an extended example of practical data analytics. The algorithms and analytics will be of much interest to practitioners interested in utilizing the large and unwieldy data sets of the Centers for Disease Control and Prevention's Behavioral Risk Factor Surveillance System.
(c) Predictive Analytics: Two foundational and widely used algorithms, k-nearest neighbors and naive Bayes, are developed in detail. A chapter is dedicated to forecasting. The last chapter focuses on streaming data and uses publicly accessible data streams originating from the Twitter API and the NASDAQ stock market in the tutorials.
This book is intended for a one- or two-semester course in data analytics for upper-division undergraduate and graduate students in mathematics, statistics, and computer science. The prerequisites are kept low, and students with one or two courses in probability or statistics, an exposure to vectors and matrices, and a programming course will have no difficulty. The core material of every chapter is accessible to all with these prerequisites. The chapters often expand at the close with innovations of interest to practitioners of data science. Each chapter includes exercises of varying levels of difficulty. The text is eminently suitable for self-study and an exceptional resource for practitioners.
Learn Data Science Programming in Python including munging, aggregating, and visualizing data.
Over 60 practical recipes to help you explore Python and its robust data science capabilities
ABOUT THIS BOOK
* The book is packed with simple and concise Python code examples to effectively demonstrate advanced concepts in action
* Explore concepts such as programming, data mining, data analysis, data visualization, and machine learning using Python
* Get up to speed on machine learning algorithms with the help of easy-to-follow, insightful recipes
WHO THIS BOOK IS FOR
This book is intended for Data Science professionals at all levels, both students and practitioners, from novices to experts. Novices can spend their time in the first five chapters getting acquainted with Data Science. Experts can refer to Chapter 6 onward to understand how advanced techniques are implemented using Python. People from non-Python backgrounds can also use this book effectively, but some prior basic programming experience would be helpful.
WHAT YOU WILL LEARN
* Explore the complete range of Data Science algorithms
* Get to know the tricks used by industry engineers to create the most accurate data science models
* Manage and use Python libraries such as NumPy, SciPy, scikit-learn, and Matplotlib effectively
* Create meaningful features to solve real-world problems
* Take a look at Advanced Regression methods for model building and variable selection
* Get a thorough understanding of the underlying concepts and implementation of Ensemble methods
* Solve real-world problems using a variety of different datasets from numerical and text data modalities
* Get accustomed to modern state-of-the-art algorithms such as Gradient Boosting, Random Forest, Rotation Forest, and so on
Python is increasingly becoming the language for data science. It is overtaking R in terms of adoption, it is widely known among developers, and it has a strong set of libraries, such as NumPy, pandas, scikit-learn, Matplotlib, IPython, and SciPy, to support its usage in this field. Data Science is the emerging new hot tech field, which is an amalgamation of different disciplines including statistics, machine learning, and computer science. It's a disruptive technology changing the face of today's business and altering the economy of various verticals including retail, manufacturing, online ventures, and hospitality, to name a few, in a big way.
This book will walk you through the various steps, starting from simple to the most complex algorithms available in the Data Science arsenal, to effectively mine data and derive intelligence from it. At every step, we provide simple and efficient Python recipes that will not only show you how to implement these algorithms, but also clarify the underlying concept thoroughly.
The book begins by introducing you to using Python for Data Science, followed by working with Python environments. You will then learn how to analyse your data with Python. The book then teaches you the concepts of data mining, followed by extensive coverage of machine learning methods. It introduces you to a number of Python libraries available to help implement machine learning and data mining routines effectively. It also covers the principles of shrinkage, ensemble methods, random forest, rotation forest, and extreme trees, which are a must-have for any successful Data Science Professional.
STYLE AND APPROACH
This is a step-by-step recipe-based approach to Data Science algorithms, introducing the math philosophy behind these algorithms.
Along with these general skills, the authors illustrate several applications that are relevant to data scientists, such as reading and writing spreadsheet documents both locally and via Google Docs, creating interactive and dynamic visualizations, presenting spatial-temporal displays with Google Earth, and generating code from descriptions of data structures to read and write data. These topics demonstrate the rich possibilities and opportunities to do new things with these modern technologies. The book contains many examples and case studies that readers can use directly and adapt to their own work. The authors have focused on the integration of these technologies with the R statistical computing environment. However, the ideas and skills presented here are more general, and statisticians who use other computing environments will also find them relevant to their work.
Deborah Nolan is Professor of Statistics at University of California, Berkeley.
Duncan Temple Lang is Associate Professor of Statistics at University of California, Davis and has been a member of both the S and R development teams.