Wrangling Major League Baseball Pitchf/x Data with Python

Geek Alert! This is fundamentally a programming class in Python. We chart MLB GameDay PitchF/x data using Jupyter Notebooks, Python, Pandas and Matplotlib.

Beginner | 0 ratings | 5 students enrolled | English
created by Chaz Henry
Last updated Thu, 29-Sep-2022
Course overview

In the 2006 playoffs, Major League Baseball debuted a pitch-tracking camera system called PitchF/x. Now installed in every MLB stadium, the system has been continually extended and rebranded: from cameras to TrackMan radar, and from Statcast to GameDay, MLB now tracks every pitch and every player's movement on each pitch. The data are published on the MLB website, and SaberMetricians worldwide pore over every detail. The teams themselves employ, on average, five or more statisticians dedicated to analyzing the data to help select and improve players.

I'm Chaz Henry, a software engineer, 12-year Little League coach and founder of the PowerChalk dot com website. In this class, we're going to open a fresh Jupyter Notebook, grab the MLB game data from Clayton Kershaw's 2014 no-hitter and wrangle that data in Python. It's an introduction to SaberMetrics, the empirical study of baseball statistics.

We'll use Python's built-in libraries plus Pandas, and graph the pitches with Matplotlib and Pyplot. Along the way, we'll talk about best practices for Jupyter Notebook, Python coding, XML parsing and maybe a little baseball.
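As a rough illustration of the XML-parsing step, the sketch below uses only Python's built-in `xml.etree.ElementTree` module to flatten nested inning, at-bat and pitch elements into one dictionary per pitch. The element and attribute names (`inning`, `top`, `atbat`, `pitch`, `stand`, `type`, `pitch_type`, `start_speed`, `px`, `pz`) are assumptions modeled on the GameDay layout, `pitches_from_xml` is a hypothetical helper name, and the sample values are made up rather than taken from the 2014 no-hitter.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-made snippet shaped like a GameDay inning file.
# Element/attribute names are assumptions; values are illustrative only.
SAMPLE_XML = """
<game>
  <inning num="1">
    <top>
      <atbat num="1" batter="100001" stand="L" event="Strikeout">
        <pitch des="Called Strike" type="S" pitch_type="FF" start_speed="92.5" px="-0.38" pz="2.36"/>
        <pitch des="Ball" type="B" pitch_type="SL" start_speed="87.1" px="1.20" pz="1.10"/>
        <pitch des="Swinging Strike" type="S" pitch_type="CU" start_speed="74.0" px="0.10" pz="1.80"/>
        <pitch des="Swinging Strike" type="S" pitch_type="SL" start_speed="86.4" px="0.55" pz="1.40"/>
      </atbat>
    </top>
  </inning>
</game>
"""


def pitches_from_xml(xml_text):
    """Flatten every <pitch> into a dict that also carries its inning/at-bat context."""
    root = ET.fromstring(xml_text.strip())
    rows = []
    for inning in root.iter("inning"):
        for half in inning:  # the <top> and <bottom> children of each inning
            for atbat in half.iter("atbat"):
                for pitch in atbat.iter("pitch"):
                    row = dict(pitch.attrib)  # every XML attribute arrives as a string
                    row.update(
                        inning=int(inning.get("num")),
                        half=half.tag,
                        atbat_num=int(atbat.get("num")),
                        batter=atbat.get("batter"),
                        stand=atbat.get("stand"),
                    )
                    # convert the numeric fields we plan to plot or compare
                    for key in ("start_speed", "px", "pz"):
                        if key in row:
                            row[key] = float(row[key])
                    rows.append(row)
    return rows


pitches = pitches_from_xml(SAMPLE_XML)
print(len(pitches), pitches[0]["pitch_type"], pitches[0]["px"])  # prints: 4 FF -0.38
```

Keeping the inning and at-bat context on every pitch row means the resulting list can be passed straight to a DataFrame constructor later.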

So, if you're a coder, a SaberMetrician or just a baseball fan who wants to peek behind the curtain at what's driving Moneyball and the next wave of player development, sign up for the course and let's start scrubbing the pitch data from one of the greatest pitching performances in MLB history.

What will I learn?

  • How to create and program a Jupyter Notebook in Python.
  • How to extract XML pitch data from the MLB website.
  • How to coerce XML tree data into a Pandas DataFrame.
  • How to extract DataFrame slices into multiple views.
  • How to plot pitch data with Matplotlib and Pyplot graphs.
  • How to add data columns to a Pandas DataFrame.
  • How to plot pitch tendencies as pie charts by ball-strike count.
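A minimal Pandas sketch of several of the items above: building a DataFrame from per-pitch records, slicing it into views for right- and left-handed batters, adding ball-strike count columns, and tabulating pitch mix by count. The column names, the result codes (`B` ball, `S` strike including fouls, `X` ball in play) and the helper `add_count_columns` are assumptions for illustration, not necessarily how the course does it, and the rows are invented.

```python
import pandas as pd

# Invented rows in the shape a parsing step might produce: one row per pitch,
# tagged with its at-bat number and the batter's stance ("R" or "L").
rows = [
    {"atbat_num": 1, "stand": "L", "type": "S", "pitch_type": "FF", "px": -0.38, "pz": 2.36},
    {"atbat_num": 1, "stand": "L", "type": "B", "pitch_type": "SL", "px": 1.20, "pz": 1.10},
    {"atbat_num": 1, "stand": "L", "type": "S", "pitch_type": "CU", "px": 0.10, "pz": 1.80},
    {"atbat_num": 1, "stand": "L", "type": "S", "pitch_type": "SL", "px": 0.55, "pz": 1.40},
    {"atbat_num": 2, "stand": "R", "type": "S", "pitch_type": "FF", "px": 0.20, "pz": 2.90},
    {"atbat_num": 2, "stand": "R", "type": "S", "pitch_type": "FF", "px": -0.50, "pz": 3.10},
    {"atbat_num": 2, "stand": "R", "type": "S", "pitch_type": "SL", "px": 0.70, "pz": 1.60},  # foul at 0-2
    {"atbat_num": 2, "stand": "R", "type": "X", "pitch_type": "CU", "px": 0.00, "pz": 2.00},
]
df = pd.DataFrame(rows)


def add_count_columns(df):
    """Add the ball-strike count *before* each pitch, computed per at-bat.

    Assumes "B" = ball, "S" = strike (fouls included), "X" = in play.
    A foul with two strikes leaves the count at two strikes, so the
    running strike total is capped at 2.
    """
    out = df.copy()
    is_ball = out["type"].eq("B").astype(int)
    is_strike = out["type"].eq("S").astype(int)
    by_ab = out["atbat_num"]
    # running total within the at-bat, minus the current pitch = count before it
    out["balls"] = is_ball.groupby(by_ab).cumsum() - is_ball
    out["strikes"] = (is_strike.groupby(by_ab).cumsum() - is_strike).clip(upper=2)
    out["count"] = out["balls"].astype(str) + "-" + out["strikes"].astype(str)
    return out


df = add_count_columns(df)

# Slices: separate views for right- and left-handed batters
righties = df[df["stand"] == "R"]
lefties = df[df["stand"] == "L"]

# Pitch-type mix per count as row fractions (raw material for per-count pie charts)
tendency = pd.crosstab(df["count"], df["pitch_type"], normalize="index")
print(df[["atbat_num", "type", "count"]])
print(tendency)
```

Because `normalize="index"` makes each row of `tendency` sum to 1, a single row can be handed to a pie-chart call to show the pitch mix at that count.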
Requirements
  • Beginner or intermediate Python programmers.
  • SaberMetric baseball fans.
Curriculum for this course
30 Lessons 02:29:46
Introduction
9 Lessons 00:20:03
  • Intro
    preview 00:02:00
  • The Strike Zone
    preview 00:02:01
  • XML Addendum
    00:00:29
  • XML links (for copy and paste)
  • GameDay - Pitch f/x Data
    00:05:53
  • Jupyter Notebook Intro
    00:03:06
  • Installing/Running Jupyter Notebook
    00:01:31
  • Jupyter Notebook Basics
    00:02:50
  • Jupyter Notebook Rich Documentation
    00:02:13
Coding
9 Lessons 01:07:05
  • Imports
    00:06:10
  • Player XML to Element Tree
    00:05:25
  • XML Element Tree to Python Dictionary
    00:05:24
  • Inning and Pitch Data
    00:08:11
  • For Each At-Bat
    00:13:18
  • For Each Pitch
    00:06:11
  • Each Pitch to XML Element Tree
    00:04:10
  • Innings to Pandas Dataframe
    00:11:40
  • More Pandas Dataframe
    00:06:36
Plotting
5 Lessons 00:32:31
  • Plotting: Line/Scatter
    preview 00:01:00
  • Dataframe Slices
    00:03:11
  • Plotting the Strikezone
    00:10:43
  • Charting Pitches against the StrikeZone
    preview 00:06:49
  • Charting Pitch Location - R&L Handed Batters
    00:10:48
Plotting Variations
3 Lessons 00:07:41
  • Dickerson at Bat
    00:02:26
  • Labeled Pitches
    00:02:45
  • Legends
    00:02:30
Kershaw Pitch Tendencies
2 Lessons 00:19:51
  • Adding Ball/Strike Count Columns
    00:13:08
  • Plotting Kershaw Tendency by Count
    00:06:43
Wrap Up
2 Lessons 00:02:35
  • Recap
    00:00:58
  • Thank You!
    00:01:37
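The plotting lessons chart pitches against the strike zone, split by batter handedness. One way to sketch that with Matplotlib is shown here. It assumes GameDay-style `px`/`pz` coordinates in feet (horizontal offset from the center of home plate and height above the ground), draws the zone as a plain rectangle 17 inches wide with hypothetical default top and bottom limits, and uses made-up pitches; `plot_pitches` is a hypothetical helper name, not the course's code.

```python
import matplotlib

matplotlib.use("Agg")  # off-screen rendering for scripts; drop this line in a Jupyter Notebook
import matplotlib.pyplot as plt
from matplotlib.patches import Rectangle

# Home plate is 17 inches wide, so each edge sits about 0.708 ft from the center.
PLATE_HALF_WIDTH = 17 / 2 / 12


def plot_pitches(pitches, sz_bot=1.5, sz_top=3.5, title="Pitch locations"):
    """Scatter pitch locations over a simple strike-zone rectangle.

    pitches: list of dicts with "px", "pz" (feet) and "stand" ("R" or "L").
    sz_bot / sz_top: hypothetical default zone limits in feet; real data
    may report a separate top and bottom for each batter.
    """
    fig, ax = plt.subplots(figsize=(5, 6))
    zone = Rectangle(
        (-PLATE_HALF_WIDTH, sz_bot), 2 * PLATE_HALF_WIDTH, sz_top - sz_bot,
        fill=False, edgecolor="black", linewidth=2,
    )
    ax.add_patch(zone)
    # one scatter call per batter hand, so each gets its own legend entry
    for hand, color in (("R", "tab:blue"), ("L", "tab:red")):
        xs = [p["px"] for p in pitches if p["stand"] == hand]
        zs = [p["pz"] for p in pitches if p["stand"] == hand]
        if xs:
            ax.scatter(xs, zs, color=color, label=f"{hand}-handed batters")
    ax.set_xlim(-2.5, 2.5)
    ax.set_ylim(0, 5)
    ax.set_aspect("equal")
    ax.set_xlabel("px (ft from center of plate)")
    ax.set_ylabel("pz (ft above ground)")
    ax.set_title(title)
    ax.legend(loc="upper right")
    return fig, ax


sample = [  # made-up pitches, not real game data
    {"px": -0.38, "pz": 2.36, "stand": "L"},
    {"px": 1.20, "pz": 1.10, "stand": "L"},
    {"px": 0.20, "pz": 2.90, "stand": "R"},
    {"px": -0.50, "pz": 3.10, "stand": "R"},
]
fig, ax = plot_pitches(sample)
```

Drawing one scatter per batter hand, rather than one scatter with a color array, keeps the legend simple and makes it easy to split the same data into two side-by-side plots.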
About the instructor

Chaz Henry

In 2000, Chaz sold the company he built from his computer science master's thesis at NC State to a public company in Silicon Valley. Since then he has built an online video game (StMulligan), a Facebook chatbot for Ticketmaster, a cloud-based video analysis system used by the Los Angeles Dodgers (PowerChalk) and a Raspberry Pi-based sports camera system. He is a 12-year Little League baseball coach and an avid sports fan.

0 Reviews | 24 Students | 2 Courses
java programming python baseball hitting pitching
http://www.linkedin.com/in/chazhenry/

Free