Pandas for Everyone

Python Data Analysis

Daniel Chen

Feb 2018, Paperback, 416 pages
ISBN13: 9780134546933
ISBN10: 0134546938
Special online offer - Save 30%
Was 29.99, Now 20.99 (Save: 9.00)

This tutorial teaches everything you need to get started with Python programming for the fast-growing field of data analysis. Daniel Chen tightly links each new concept with easy-to-apply, relevant examples from modern data analysis.

Unlike other beginner's books, this guide helps today's newcomers learn both Python and its popular Pandas data science toolset in the context of tasks they'll really want to perform. Following the proven Software Carpentry approach to teaching programming, Chen introduces each concept with a simple motivating example, slowly offering deeper insights and expanding your ability to handle concrete tasks.

Each chapter is illuminated with a concept map: an intuitive visual index of what you'll learn -- and an easy way to refer back to what you've already learned. An extensive set of easy-to-read appendices helps you fill knowledge gaps wherever they may exist. Coverage includes:

  • Setting up your Python and Pandas environment
  • Getting started with Pandas dataframes
  • Using dataframes to calculate and perform basic statistical tasks
  • Plotting in Matplotlib
  • Cleaning data, reshaping dataframes, handling missing values, working with dates, and more
  • Building basic data analytics models
  • Applying machine learning techniques: both supervised and unsupervised
  • Creating reproducible documents using literate programming techniques

Part I. Introduction
0. Setting Up
1. Introduction to Pandas Dataframes
2. Dataframe Components
3. Performing Statistics and Calculations on Sliced and Grouped Dataframes
4. Plotting in Matplotlib

Part II. Data Munging
5. Basic Data Cleaning
6. Reshaping Dataframes
7. Missing Values
8. Working with Dates
9. Working with Multiple Dataframes
10. Working with Databases

Part III. Modeling
11. Basic Statistics
12. Linear Models and Regression
13. Survival Analysis
14. Model Selection and Diagnostics
15. Time Series

Part IV. Machine Learning
16. Supervised Learning
17. Unsupervised Learning

Part V. Reproducible Documents (Literate Programming)
18. Jupyter Notebook
19. Pweave

Appendices

  • Establishes a solid foundation for all the Pandas basics needed to be effective
  • Covers dataframes, statistical calculations, data munging, modeling, machine learning, reproducible documents, and much more
  • Teaches step-by-step through easy, incremental examples, with plenty of opportunities to "code along"

Daniel Chen is a graduate student in the interdisciplinary PhD program in Genetics, Bioinformatics & Computational Biology (GBCB) at Virginia Tech. He is involved with Software Carpentry as an instructor and lesson maintainer. He completed his master’s degree in public health in Epidemiology at the Columbia University Mailman School of Public Health. He currently works at the Social and Decision Analytics Laboratory under the Biocomplexity Institute of Virginia Tech, where he works with data to inform policy decision-making. He is the author of Pandas for Everyone and Pandas Data Analysis with Python Fundamentals LiveLessons.