Research Project – Best Pitch to Throw in Each Count (2022 Houston Astros)

What truly moves the needle and drives success for baseball teams? At all levels, it’s fairly common to turn to pitching, with the idea that hitting a baseball is so fundamentally hard that it can be nearly impossible against a well-executed pitch.

Stemming from this idea is the research question many baseball teams are asking right now: what type of pitch is the best to throw in each individual count?

There is too much situational data, and there are too many confounding variables, to answer that directly, so our question has to be a little more detailed: for each pitcher, which pitch had the best results in each individual count? (Even this doesn’t eliminate all confounding variables.)

Here is how I went about answering this question:

Data Collection

All data for this project was obtained via a custom filter tool, which I used to download the following pieces of information as a CSV file:

  • Team (I chose the Houston Astros, the defending champs, for their dominant pitching staff)
  • Pitcher name
  • Pitch type
  • Count
  • Spin rate
  • Velocity
  • wOBA allowed

The first file used for this research contains 332 rows of data for Houston Astros pitchers, with variables to distinguish the count and the pitch type.

The next file used has 3,423 rows of data for a large sample of all MLB pitchers (not representative yet) that includes the same information as the previous file but for every qualified pitcher across the league. It also contains more detailed data points such as velocity, spin rate & whiffs that should be used for further research or analysis.

In two count + pitch combos, 2-0 changeups and 2-0 curveballs, the wOBA allowed across every pitch in the sample was 0. The following combos also come with small-sample caveats:

n = number of pitches tracked, p = number of unique pitchers to throw the pitch at least once

  • 3-0 changeup
    • n = 5, p = 5
  • 3-0 curve
    • n = 3, p = 2
  • 3-0 slider
    • n = 6, p = 4
  • 3-1 curve
    • n = 10, p = 4
  • 3-1 changeup
    • n = 21, p = 7
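As a rough sketch of how these caveats could be applied in practice, the snippet below flags any count + pitch combo whose sample falls under a minimum pitch threshold. The n and p values are copied from the list above; the cutoff `MIN_PITCHES` and the function name are assumptions made for illustration, not part of the original analysis.

```python
# Sample sizes copied from the list above: (count, pitch) -> (n pitches, p pitchers).
SAMPLES = {
    ("3-0", "changeup"): (5, 5),
    ("3-0", "curve"): (3, 2),
    ("3-0", "slider"): (6, 4),
    ("3-1", "curve"): (10, 4),
    ("3-1", "changeup"): (21, 7),
}

MIN_PITCHES = 25  # hypothetical cutoff, not a value from the study


def small_samples(samples, min_pitches=MIN_PITCHES):
    """Return the (count, pitch) combos with fewer than min_pitches tracked."""
    return sorted(key for key, (n, _p) in samples.items() if n < min_pitches)


for count, pitch in small_samples(SAMPLES):
    n, p = SAMPLES[(count, pitch)]
    print(f"{count} {pitch}: n={n}, p={p} (treat wOBA with caution)")
```

Any threshold is a judgment call; the point is only that wOBA values built on a handful of pitches should be read separately from the rest.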

Define ‘Success’

Of the selected variables, I have designated wOBA allowed as the ‘success’ metric for the purpose of this research.

wOBA is used because it has the strongest correlation with runs scored for an offense, and it is easy to calculate, although the raw data is not easy to obtain. This is because the coefficients/multipliers used in the calculation change each season based on which outcomes lead to the most runs. Essentially, it assigns a different weight to each outcome of a plate appearance, reflecting the intuition that a walk is not worth the same to a baseball team as a home run, even though a traditional stat like OBP considers them equal.
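To make the weighting concrete, here is a minimal sketch of the standard wOBA formula: weighted outcomes divided by AB + BB - IBB + SF + HBP. The weights below are illustrative placeholders of roughly the size published in recent seasons, not the exact 2022 coefficients, and the `Outcomes` class and `woba` function are names invented for this example.

```python
from dataclasses import dataclass

# Illustrative weights only: roughly the size of recent published values,
# NOT the exact 2022 constants. Look up the season's real coefficients.
ILLUSTRATIVE_WEIGHTS = {
    "ubb": 0.69, "hbp": 0.72, "single": 0.88,
    "double": 1.26, "triple": 1.60, "hr": 2.07,
}


@dataclass
class Outcomes:
    """Counting stats allowed by a pitcher (or produced by a hitter)."""
    ab: int = 0
    bb: int = 0
    ibb: int = 0
    hbp: int = 0
    sf: int = 0
    singles: int = 0
    doubles: int = 0
    triples: int = 0
    hr: int = 0


def woba(o, w=ILLUSTRATIVE_WEIGHTS):
    """Weighted on-base average: weighted outcomes over AB + BB - IBB + SF + HBP."""
    denominator = o.ab + o.bb - o.ibb + o.sf + o.hbp
    if denominator == 0:
        return 0.0
    numerator = (
        w["ubb"] * (o.bb - o.ibb) + w["hbp"] * o.hbp
        + w["single"] * o.singles + w["double"] * o.doubles
        + w["triple"] * o.triples + w["hr"] * o.hr
    )
    return numerator / denominator


# A single walk and a single home run both count as 1.000 in OBP,
# but wOBA credits the home run with about three times the value.
walk = Outcomes(bb=1)
home_run = Outcomes(ab=1, hr=1)
print(woba(walk), woba(home_run))
```

With real season coefficients swapped in, the same function applies to the outcomes a pitcher allows on a given pitch type and count.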

I have to remind myself often that when we measure and analyze these types of statistics for pitchers, lower is better. Regardless, this is how FanGraphs classifies wOBA categories:

wOBA Rules of Thumb

Above Average: .340
Below Average: .310

However, with the available data we can arrive at a more conclusive categorization.
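Because lower wOBA allowed is better for a pitcher, finding the best pitch in each count comes down to a grouped minimum. The sketch below shows that idea under stated assumptions: the column names (`pitcher`, `count`, `pitch_type`, `woba`) are hypothetical, and the rows are invented placeholders rather than values from the Astros file.

```python
# Placeholder rows; the pitcher labels and wOBA values are invented for illustration.
ROWS = [
    {"pitcher": "Pitcher A", "count": "0-0", "pitch_type": "Fastball", "woba": 0.350},
    {"pitcher": "Pitcher A", "count": "0-0", "pitch_type": "Slider", "woba": 0.290},
    {"pitcher": "Pitcher A", "count": "1-2", "pitch_type": "Fastball", "woba": 0.210},
    {"pitcher": "Pitcher A", "count": "1-2", "pitch_type": "Slider", "woba": 0.240},
    {"pitcher": "Pitcher B", "count": "0-0", "pitch_type": "Curveball", "woba": 0.300},
]


def best_pitch_by_count(rows):
    """Map (pitcher, count) to the pitch type with the lowest wOBA allowed."""
    best = {}
    for row in rows:
        key = (row["pitcher"], row["count"])
        # Keep the row with the lowest wOBA seen so far for this pitcher and count.
        if key not in best or row["woba"] < best[key]["woba"]:
            best[key] = row
    return {key: row["pitch_type"] for key, row in best.items()}


for (pitcher, count), pitch in sorted(best_pitch_by_count(ROWS).items()):
    print(f"{pitcher}, best pitch on {count}: {pitch}")
```

Rows loaded from the downloaded CSV with `csv.DictReader` would need the `woba` field converted to `float` before being passed in.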

Findings + Interpretation

Below is a set of data visualizations created in IBM SPSS Statistics from the pitch data described above. These scatter plots illustrate which pitch type had the most and least success for 8 different HOU pitchers in the 2022 season.

For a more detailed view, the tabular representation of the data is below:

Luis Garcia

Best pitch on 2-2: Changeup

Justin Verlander

Best pitch on 0-0: Fastball

Jose Urquidy

Best pitch on 1-2: Fastball

Ryan Pressly

Best pitch on 0-0: Curveball

Future Research

Although not included in this project, I can envision two areas where future research can be directed:

  1. Compare more metrics such as whiff rate or velocity
  2. Obtain multiple seasons of data for individual pitchers

NFL QB Power Rankings: Week 18 – Statistical Data Model

To save time and debate, I created an objective answer to who the best QB in the NFL is right now. Below are the statistical ratings for each QB, and the categories/weights used to conduct the data analysis. For a more detailed and custom view, download the file below the embedded ratings.

The following chart illustrates the statistical relationship between variables and Win %, an important dependent variable in this case. These insights give us reason to use EPA, QBR, ANY/A, PFF grade and Passer Rating in our data model to measure QB performance.

Data visualization by Tej Seth & Joey DiCresce for Michigan Football Analytics Society

While measuring the relationship to wins in the same season is valuable, how well each metric predicts next-season results is extremely relevant in this case. This gives us ideas for the weight of each category (which can always be adjusted): PFF grade should clearly carry the highest weight, followed after a substantial gap by EPA and ANY/A, with QBR and Passer Rating finishing off the list.

Data visualization by Tej Seth & Joey DiCresce for Michigan Football Analytics Society. *The r-squared values are low across the board, since even the r-squared between win% and next-season win% is only .07. Football is hard to predict.

The data will be collected from Pro Football Reference (Passer Rating, QBR, ANY/A), PFF (PFF Grades), and ESPN (EPA).
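One way to combine the five categories into a single rating is to standardize each metric across quarterbacks and take a weighted sum. The sketch below is only an illustration of that approach: the weights are hypothetical values that follow the ordering described above (PFF grade highest, then EPA and ANY/A, then QBR and Passer Rating), and the quarterback stat lines are invented placeholders, not the actual ratings.

```python
from statistics import mean, pstdev

# Hypothetical weights that respect the ordering in the text; they sum to 1.0.
WEIGHTS = {"pff": 0.40, "epa": 0.20, "anya": 0.20, "qbr": 0.10, "passer_rating": 0.10}

# Placeholder stat lines, invented for illustration only.
QBS = {
    "QB A": {"pff": 90.0, "epa": 0.20, "anya": 7.5, "qbr": 70.0, "passer_rating": 105.0},
    "QB B": {"pff": 75.0, "epa": 0.10, "anya": 6.5, "qbr": 55.0, "passer_rating": 95.0},
    "QB C": {"pff": 60.0, "epa": 0.00, "anya": 5.5, "qbr": 40.0, "passer_rating": 85.0},
}


def composite_ratings(qbs, weights):
    """Weighted sum of per-metric z-scores, sorted from best to worst."""
    scores = {name: 0.0 for name in qbs}
    for metric, weight in weights.items():
        values = [stats[metric] for stats in qbs.values()]
        mu, sigma = mean(values), pstdev(values)
        for name, stats in qbs.items():
            # z-score puts every metric on the same scale before weighting.
            z = (stats[metric] - mu) / sigma if sigma else 0.0
            scores[name] += weight * z
    return dict(sorted(scores.items(), key=lambda kv: kv[1], reverse=True))


for name, score in composite_ratings(QBS, WEIGHTS).items():
    print(f"{name}: {score:+.2f}")
```

Standardizing first matters because the metrics live on very different scales (EPA per play near zero, Passer Rating near 100); without it, the largest-valued metric would dominate regardless of its weight.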

The TRUTH Behind ‘Featured’ Parlays or ‘Super Boosts’ on Sports Betting Apps

In the past two weeks, there were 61 total ‘boosted’ parlays across these major sportsbooks: BetMGM, FanDuel, Caesars and Barstool. 55 of those boosted parlays lost.

Now, we will acknowledge the odds of these parlays, because even though some of them are boosted, they are still relatively low-probability bets. Given the data imported and scraped from Sports Betting Dime, we see that the average odds for these parlays were +522.38.
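For context, American odds can be converted into the win rate a bettor needs just to break even. The sketch below applies the standard conversion to the average odds quoted above and compares it with the observed hit rate (6 winners out of 61). The function name is an assumption for this example, and converting an average of odds is only an approximation of the average break-even probability across the individual parlays.

```python
def implied_probability(american_odds):
    """Break-even win probability for American odds, ignoring the sportsbook's margin."""
    if american_odds > 0:
        return 100 / (american_odds + 100)
    return -american_odds / (-american_odds + 100)


break_even = implied_probability(522.38)  # average odds quoted above
observed = (61 - 55) / 61                 # 6 winners out of 61 boosted parlays

print(f"break-even win rate: {break_even:.1%}")
print(f"observed win rate:   {observed:.1%}")
```

If the observed win rate sits well below the break-even rate, the boosted prices were not generous enough to cover how rarely the parlays hit over this sample.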

If a user of these major sportsbooks placed $10 on each of these parlays over the two weeks, that unlucky user would have staked $610 in total and lost $416.98, for a return on investment of -68.36%.
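The return on investment can be checked with a few lines of arithmetic. This sketch assumes ROI is defined as net profit divided by the total amount staked, which means a bettor who only places straight wagers can never do worse than -100%. The stake, parlay count, and net loss come from the text; the function and variable names are made up for the example.

```python
def roi(net_profit, total_staked):
    """Return on investment: net profit (negative for a loss) divided by total stake."""
    if total_staked <= 0:
        raise ValueError("total_staked must be positive")
    return net_profit / total_staked


STAKE_PER_PARLAY = 10.00
NUM_PARLAYS = 61
NET_RESULT = -416.98  # net loss reported in the post

total_staked = STAKE_PER_PARLAY * NUM_PARLAYS
print(f"total staked: ${total_staked:.2f}")
print(f"ROI: {roi(NET_RESULT, total_staked):.2%}")
```

Dividing the loss by the amount returned instead of the amount staked would produce a figure beyond -100%, so it is worth being explicit about which denominator is used.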

These shocking results suggest a strategy: identify what the boosts are for specific games, and pinpoint whether there are any individual markets in which to ‘fade’ these boosted parlays. The sportsbooks offer these ‘boosts’ and ‘promotions’ for a reason!

Here is a copy of the scraped data and calculations:

    March Madness: Comparing Seeds, Analyzing Statistics with

    Deciding who to pick in each game of a 67-game tournament is exceedingly difficult, but I have created an interactive dashboard that displays offense and defense efficiency for each tournament team. I highly recommend checking it out near the end (it’s FREE).

    You can use this tool to decide which teams could be on upset alert, which games are evenly matched, and/or which top tier teams have what it takes to win it all. By being able to filter based on team, the analysis possibilities are endless with this dashboard.

    To use, simply scroll below, select two or more teams from the left-side filter, and interpret the data shown on the graph. Here is an example of what can be done, viewing the top five teams according to

    This unique comparison likely won’t be needed until the Final Four; however, it’s still evident that Gonzaga & Arizona are not just among the best with their offense, they also run their offenses extremely fast. Remember, a higher score is better for offense, but a lower score is better for defense, similar to an actual basketball game. That shows how dominant Gonzaga has been: their gap between offense and defense is noticeably wider than anyone else’s in the top five, so they’d be able to match the pace of Arizona while limiting more of Arizona’s opportunities.

    (Keep in mind it looks messy at first until you select two teams. To select multiple teams on Windows, hold ‘control’ as you select each team. On Mac, hold ‘command’ as you select each team. On mobile, swipe each team name box to the right (towards the graph) to add it to the visual.)

    Now, try it for yourself!
