It’s early in the minor league season, which causes one particular issue in prospect coverage: scouting reports and prospect lists are still in the works, so there is an information asymmetry between statistics and scouting information. This is problematic because, in the first place, minor league statistics do not carry the same contextual meaning that MLB statistics do; minor league players may be working on specific player development assignments from the front office, they may be at varying points in their development, and the leagues feature different levels of competition (for example, Class-A ball would not be viewed as the same type of development environment as Class-AA ball, and one might expect completely different things from prospects at each level). Second, scouting information attaches physical grades to players, which may provide insights about potential MLB roles that are not visible through statistics. One example here might be someone like Kyle Wren, who is advanced minor league depth and thus destroys the baseball, but does not necessarily receive acclaimed MLB roles in scouting reports; an alternate example might be someone like KJ Harrison, who has poor stats to start 2018, but whose slow start would not change his actual scouting role or his ascent toward a potential MLB role. For these reasons, minor league stats are meaningless on their own, as there is no published connection between physical grades, conceptual grades (such as the context for a hit tool or role risk), and statistical performance on the field.
2017 Minors: Bats
For this reason, I periodically publish statistical indices that link individual player performance to the statistical contexts of their respective leagues. This should not be viewed as a scouting exercise, or an assessment that has any validity in actually designing an MLB role. At best, players registering surprising index rankings due to the statistical context of their respective leagues should simply be viewed as opportunities to look for further scouting assessments or to dig into existing scouting assessments. These data should be viewed as important for outlining which prospects are facing tough competition and which prospects are facing weak competition; which prospects are old for their respective levels and which prospects are young; which prospects have faced favorable park environments, and which prospects have faced tough parks; and, new for this year’s indices, which players have produced “lucky” Batting Averages on Balls in Play and which players have not.
This post is presented in reverse order. I understand some will just want to see the results, so the results are published next. I’ve also included a spreadsheet with cleaned data for further manipulation or analysis. Following the table will be an explanation of methodology to accompany the spreadsheets.
Player (50+ PA): This is the name of the player, included where PA >= 50
Team: This is the Baseball Prospectus code for the player’s team
Age: This is the player’s age
OPS: This is the player’s harmonic On-Base-Percentage-plus-Slugging-Percentage (OPS) index, which assesses a BABIP-adjusted OPS against (a) the player’s opposing OPS (i.e., strength of opposing pitchers) scaled by the player’s age and park (i.e., difficulty of environment), and (b) the player’s opposing OPS compared to the league median opposing OPS (for players with 50+ PA).
TAv: This is the player’s True Average (TAv) index, which assesses the player’s TAv against (a) the player’s opposing TAv scaled by age and park, and (b) the player’s opposing TAv compared to the league median opposing TAv (for players with 50+ PA).
LeagueRank: This is a rough approximation of the player’s index “percentile” within their respective league (higher = better rank).
1.00 or greater means that the player’s performance based on contextual statistics is likely “better than median” within their respective league.
Less than 1.00 means that the player’s performance based on contextual statistics is likely “worse than median” within their respective league.
Brewers Minor League System
Below is a table of Brewers minor leaguers with 50 or more plate appearances through May 9, ranked by their contextual statistic league percentile.
|Player (50+ PA)||Team||Age||OPS||TAv||LeagueRank|
What does this mean? Let’s directly compare two minor leaguers to show the impact of minor league context. Consider Nate Orf, everyone’s favorite utility player (call up Orf!), and Wendell Rijo, a nearly-forgotten prospect acquired from Boston in the Aaron Hill deal. Rijo has never really gotten it going in Milwaukee’s system, while Orf has come on strong hitting the last few years. By both TAv and OPS, as well as level (Orf is one level beneath MLB), Orf appears ahead of Rijo on the surface:
|Player||League||Age||TAv||OPS|
|Nate Orf||Pacific Coast||28||.313||.966|
However, contextually, some key differences emerge. While neither Orf nor Rijo has faced difficult competition, Rijo is notably younger for Double-A Biloxi than Orf is for Triple-A Colorado Springs. Meanwhile, Rijo and Orf have had completely different luck in terms of BABIP, and Orf’s park impacts his production as well.
Thus, using these factors to provide context for each player’s TAv and OPS, the surface view no longer holds: Rijo’s .309 TAv and .849 OPS appear much more impressive given his age, park, and BABIP, and that assessment holds steady even given Rijo’s relatively mediocre competition. Note, once again, that none of this says anything about MLB role; Nate Orf should still be considered in MLB roster depth conversations if necessary, due to the merits of his flexible glove and overall hit tool. What this should say is that fans and analysts might want to give Rijo an extended look, or seek additional scouting information about Rijo (a good, early, pre-trade look at Rijo can be found at the excellent SoxProspects).
In order to construct contextual OPS and TAv, I used two different methods and took the harmonic mean of the two resulting indices. The goal here is to recognize that (a) there is no proper or easy way to measure context, and (b) outliers can have an extreme effect on data. To address (a), I used two completely different calculations to assess strength of opposition, and to address (b), I used the harmonic mean, which dampens the pull of an extreme high value more than the standard (arithmetic) average does. For example, using [(2*X*Y)/(X+Y)] with 1.5 and 1.0, the standard average is 1.25, while the harmonic mean is 1.2. The harmonic mean sits closer to the smaller of the two numbers, so an outlying high figure moves the result less.
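The arithmetic in that example can be checked with a few lines of Python. This is only a minimal sketch of the two averages, not the spreadsheet formula behind the published indices; the function names are illustrative.

```python
def arithmetic_mean(x: float, y: float) -> float:
    """Standard average of two numbers."""
    return (x + y) / 2


def harmonic_mean(x: float, y: float) -> float:
    """Harmonic mean of two positive numbers, (2*X*Y)/(X+Y)."""
    if x <= 0 or y <= 0:
        raise ValueError("inputs must be positive")
    return (2 * x * y) / (x + y)


# The 1.5 and 1.0 pair from the text.
standard = arithmetic_mean(1.5, 1.0)  # 1.25
harmonic = harmonic_mean(1.5, 1.0)    # approximately 1.2
print(standard, harmonic)
```

When the two inputs are equal, both averages return that same value; the gap between them grows as the inputs move further apart.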
After collecting a number of statistics from Baseball Prospectus CSV files (“Individual Stats – by Team” and “Batter Quality of Opponent Faced,” both retrieved May 10, 2018), I excluded players with fewer than 50 PA and collected TAv and OPS; oppOPS and oppTAv (the OPS and TAv, respectively, of pitchers faced) and opposing RPA+ (the opposing pitchers’ runs per plate appearance relative to the league); and BABIP, Age, and Park (Baseball Prospectus BPF, for the batter’s park environment). Once these statistics were collected, I calculated the median for each league, and then divided each player’s individual statistic by the league median. This provides a basic contextual index that could read, “Player’s Statistic Compared to League Median,” where 1.01 is larger than median and 0.99 is smaller than median. Note that this does not uniformly mean “better or worse”: an Age Index of 0.92 (younger than the league median age) carries a much different value than a BABIP Index of 0.92 (lower BABIP than the league median).
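As a rough illustration of this step, the sketch below filters to 50+ PA, takes each league’s median for a stat, and divides each player’s value by that median. The dictionary keys, the function name, and the sample rows are all assumptions made for the example; they are not the Baseball Prospectus column names or real player data.

```python
from collections import defaultdict
from statistics import median

MIN_PA = 50  # playing-time cutoff described in the text


def league_median_index(players, stat):
    """Map each qualified player's name to stat / league-median stat."""
    qualified = [p for p in players if p["PA"] >= MIN_PA]

    # Group the stat by league so each median uses only that league's players.
    values_by_league = defaultdict(list)
    for p in qualified:
        values_by_league[p["League"]].append(p[stat])
    medians = {lg: median(vals) for lg, vals in values_by_league.items()}

    return {p["Name"]: p[stat] / medians[p["League"]] for p in qualified}


# Made-up rows for illustration only; these are not real players or stats.
sample = [
    {"Name": "Hitter A", "League": "League X", "PA": 120, "Age": 21, "BABIP": 0.360},
    {"Name": "Hitter B", "League": "League X", "PA": 95, "Age": 23, "BABIP": 0.300},
    {"Name": "Hitter C", "League": "League X", "PA": 60, "Age": 25, "BABIP": 0.270},
    {"Name": "Hitter D", "League": "League X", "PA": 30, "Age": 19, "BABIP": 0.400},
]

age_index = league_median_index(sample, "Age")
babip_index = league_median_index(sample, "BABIP")
print(age_index)
print(babip_index)
```

In this made-up league, Hitter A is younger than the median (Age Index below 1.00) but has a BABIP above the median (BABIP Index above 1.00), which is the kind of case where “above or below 1.00” needs to be read stat by stat. Hitter D is dropped before the medians are taken because of the PA cutoff.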
From these basic index figures, I calculated several new stats:
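Luck: [OPS Index] / [BABIP Index]. This was an index statistic used to estimate the impact of a high BABIP on OPS. Here, [OPS Index] is the basic league-median index described above (player OPS divided by league median OPS), not the final OPS Index defined further below.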
oppOPS_1: [OppOPS] * [Age Index * Park Index]. This is an attempt to estimate what a player’s opposing OPS might be if corrected for Age and Park (i.e., higher Age Index suggests a player is older for their level, and higher Park Index suggests a player worked in a more favorable park).
oppOPS_2: [OppOPS] * [oppOPS Index]. This is a more straightforward attempt to express the player’s opposing OPS compared to the league median back into an opposing OPS scale.
oppTAV_1: [OppTAv] * [Age Index * oppRPA Index]. This is an attempt to estimate what a player’s opposing TAv might be if corrected for age and the opposing run production allowed by pitchers.
oppTAV_2: [OppTAv] * [oppTAv Index]. This is a more straightforward attempt to express the player’s opposing TAv compared to the league median back into an oppTAv scale.
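The five formulas above translate fairly directly into code. The sketch below follows them as written; the dictionary keys and the input values are hypothetical and chosen only to show the arithmetic.

```python
def derived_stats(p):
    """Compute the five derived stats from one player's raw values and basic indices."""
    return {
        # Basic OPS index relative to basic BABIP index.
        "Luck": p["OPS_index"] / p["BABIP_index"],
        # Opposing OPS adjusted by the player's age and park indices.
        "oppOPS_1": p["oppOPS"] * (p["Age_index"] * p["Park_index"]),
        # Opposing OPS rescaled by its own league-median index.
        "oppOPS_2": p["oppOPS"] * p["oppOPS_index"],
        # Opposing TAv adjusted by age and opposing run production indices.
        "oppTAV_1": p["oppTAv"] * (p["Age_index"] * p["oppRPA_index"]),
        # Opposing TAv rescaled by its own league-median index.
        "oppTAV_2": p["oppTAv"] * p["oppTAv_index"],
    }


# Hypothetical inputs, not taken from any real player.
example = {
    "OPS_index": 1.10,
    "BABIP_index": 1.20,
    "Age_index": 0.92,
    "Park_index": 1.05,
    "oppOPS": 0.700,
    "oppOPS_index": 0.98,
    "oppTAv": 0.250,
    "oppRPA_index": 1.02,
    "oppTAv_index": 0.99,
}

stats = derived_stats(example)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")
```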
Once I calculated these stats, I took the harmonic mean of each pair (oppOPS_1 with oppOPS_2, and oppTAV_1 with oppTAV_2) to produce a harmonic index for each stat. This allowed me to directly compare each player’s TAv and OPS to their harmonic index for both stats:
OPS Index: [OPS / BABIP Index] / [Harmonic Index of OPS]
TAv Index: [TAv] / [Harmonic Index of TAv]
I divided each player’s OPS by their BABIP Index to attempt to correct for the impact of their luck on their OPS. Thus, if a player had an inflated BABIP, the assumption would be that the OPS would partially be inflated as well, because hits on balls in play feed both components of OPS (on-base percentage and slugging percentage). So, this correction is meant as an additional contextual expression of a player’s performance.
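Putting the last two formulas together, a minimal sketch might look like the following. The function names and every input value are hypothetical; the point is only to show how the BABIP correction and the harmonic index interact.

```python
def harmonic_mean(x: float, y: float) -> float:
    """Harmonic mean of two positive numbers, (2*X*Y)/(X+Y)."""
    return (2 * x * y) / (x + y)


def ops_index(ops, babip_index, opp_ops_1, opp_ops_2):
    """[OPS / BABIP Index] / [Harmonic Index of OPS]."""
    return (ops / babip_index) / harmonic_mean(opp_ops_1, opp_ops_2)


def tav_index(tav, opp_tav_1, opp_tav_2):
    """[TAv] / [Harmonic Index of TAv]; no BABIP correction is applied here."""
    return tav / harmonic_mean(opp_tav_1, opp_tav_2)


# Two made-up hitters with the same raw OPS and the same opposition,
# differing only in BABIP Index.
neutral = ops_index(0.800, 1.00, 0.700, 0.700)  # BABIP at league median
lucky = ops_index(0.800, 1.20, 0.700, 0.700)    # BABIP 20% above median

tav_example = tav_index(0.280, 0.250, 0.250)

print(round(neutral, 3), round(lucky, 3), round(tav_example, 3))
```

In this made-up case, the same .800 OPS lands at roughly 1.14 with a league-median BABIP and roughly 0.95 with a BABIP 20 percent above the median, which shows how strongly the correction can move a player across the 1.00 line.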
It must be stated that these indices are quite problematic, given that they are simply expressions of one point in time. They are not scaled for historical importance, or regressed to establish any linear significance of the component statistics. This is a methodology that I am working through, and I opted to publish these stats anyhow because I do not believe there has been enough of a discussion about the impact of context on 2018 minor league stats thus far. So, this is meant to be the beginning of a work in progress, and for that reason any feedback is much appreciated.