Using Deserved Run Average

Baseball Prospectus officially released the new Deserved Run Average (DRA) this week, fresh with a new set of improvements, as always. The main site will have more information coming soon to highlight some of the specific methodological tweaks made for the latest DRA. In the meantime, the data are here to play with and analyze, and (arguably) the most exciting update to the statistic is the inclusion of error bars for both DRA and, by extension, Wins Above Replacement Player (WARP). This is an exciting update because the work of Jonathan Judge and the Baseball Prospectus stats team is arguably opening the newest door of the so-called “analytics movement” to the public, embracing a general statistical concept that ought to be discussed far more widely: uncertainty.


On Method:
When I run Twitter chats from BPMilwaukee, one of the most curious things to my mind is that followers of BPMilwaukee will not necessarily support general BP stats work. No concerns there, really; it’s not necessary to “brand” MLB stats analysis, and indeed when one begins supporting stats-as-brands, that’s just as problematic as how so-called Old School stats like Runs Batted In or Earned Run Average are used in orthodox baseball discussions. No, what I find curious is the general idea that a stat like WARP or DRA is faulty because it is “made up,” a concern that presumably arises because the BP stats team is extremely transparent about how the stats are constructed and about how (and why) they change. So folks actually know that DRA changes, which is different from how the vast majority of websites present baseball stats. What is problematic about this attitude toward DRA is that it ignores how other statistics are “constructs” in the very same way that DRA is a construct, and it also trades in the murky waters of false certainty.

For the past two years, I have worked in Community Development and Economic Development positions while completing a professional urban planning and policy degree. I used to believe that I was a “stats” guy or an “analytics” guy, but I never quite understood the importance of what actual statistical analysis means until I was forced to reckon with my biases while training for economic analysis. Before I learned and studied stats, and was required to use them on the job, I thought the “numbers” were most important. While fields aligned with statistics are concerned in some sense with “numbers” and thus with producing “numbers-oriented results” (i.e., sometimes your boss really wants the results of your analysis), by far the most important elements of statistical analysis are “concept validity,” methodology, and uncertainty. What is most important about statistical analysis, it turns out, is process: how an analyst reaches a conclusion is much more important than the concluding numbers on their own, for it is only by outlining methodology, and explaining what is at stake with a certain measurement, that anyone (including a consumer of those numbers) can understand the numerical results of statistical analysis.

It’s ironic that many victories of the so-called “analytics movement” are now enshrined in their own dangerous orthodoxy, for what everyone seems to have forgotten is that even if the debate was about numbers, the original controversy was to convince the “Guards of Baseball Knowledge and Value” that there were legitimately different ways of thinking about the game, and that this meant there were legitimately different measurements that could be presented. Somewhere along the line, we became obsessed with those measurements, rather than the process-oriented creed of focusing on how to think about baseball. This extends to statistical analysis, too: it is as though when many fans were convinced of the merits of WARP and other stats, they simply turned over the box containing ERA, RBI, etc., dumped out those contents, and stuck the new measurements into the box. That was never the point, and to the extent that many of us did not communicate the significance of process-oriented thinking about baseball stats, that was our problem (and I place myself in this camp, having only realized the significance of this issue over the last few years).

Anyway, “concept validity” is the most important thing I have learned about statistical analysis, aside from clearly stating your uncertainty in proper terms. “Concept validity” is basically the extent to which the methods you use to measure a phenomenon actually capture that phenomenon. Inherent in this process is an understanding that as an analyst’s approach to measuring phenomena changes, so too should their results change; one need not hold the numerical results of analysis sacred, for if new empirical evidence emerges, methodological research unearths a better way to measure something, or a literature review reveals a better way to define a concept, there is nothing wrong with the analytical results changing.

So, keep this in mind when you’re thinking about why DRA has “changed.” DRA doesn’t “hate” anyone on your team, or love them. It is not a mark against DRA, or WARP, that the stat is consistently updated and changed, because that is a sign that its authors are attempting to reach that mark of “concept validity.” If the goal of DRA is “to tease out the most likely contributions of pitchers to the run-scoring that occurs around them,” and updated methodological approaches, or an updated understanding of pitching-related data, help to accomplish that goal, then revising the stat is a methodological strength. That said, I can understand that within the statistics field one may disagree with some of the particular methodological approaches; but I don’t take such disagreements to dismiss the value of DRA’s overall methodological process.

This is why the new DRA is so important: it continues Baseball Prospectus’s commitment to presenting uncertainty (as has been done on Brooks Baseball, as one example) in publishing baseball statistics. Embrace this approach: insofar as DRA is “made up,” it is made according to a methodologically sound process that upholds honest and transparent thinking about uncertainty.


DRA Values
One of the approaches to constructing DRA is to assign a run value to pitching outcomes, and those values are published by Baseball Prospectus. These elements are arguably more important than the DRA output itself, for they show the balance of a pitcher’s performance: is a pitcher saving runs on hits, on balls not in play (e.g., home runs, strikeouts, walks, etc.), or on outs on balls in play?

My favorite Brewers pitcher, Zach Davies, is a “casualty” of the new DRA (h/t to Kyle Lesniewski for beating me to this realization). But we’re not going to say, “DRA hates Zach Davies.” On the contrary, it is possible to see from Davies’s Out Runs (-1.4), Not In Play (NIP) Runs (1.9), Hit Runs (1.4), and Framing Runs (-0.1) that Davies is not getting the job done in terms of limiting runs when the ball isn’t in play, and he’s not limiting runs that occur on hits, either. Here’s how the 2018 Brewers look, sorted by NIP Runs (Josh Hader is real!):

Pitcher IP NIP Runs Hit Runs Out Runs Framing
Hader 18 -4 -1.6 1.9 -0.1
Barnes 16 -1.9 -1.5 1.2 0
Williams 9.3 -1.6 -1.2 1.1 0
Drake 12.7 -1.1 -1 0.6 0
Woodruff 9.3 -0.5 0.3 -0.2 -0.1
Houser 2 -0.4 -0.2 0.2 0
Suter 30.3 -0.3 3.4 -1.6 -0.1
Knebel 2.7 -0.2 -0.1 0.1 0
Perez 0.3 -0.1 -0.1 0 0
Hoover 1.3 0 0.4 -0.2 0
Jeffress 14 0 0.8 -0.3 0
Albers 13.3 0.2 0.5 -0.2 0
Guerra 22 0.2 0.1 -0.1 0
Lopez 3 0.3 0.4 -0.1 0
Jennings 13 0.5 0.2 -0.5 0
Anderson 34.7 1.3 1 -1.6 -0.1
Davies 34 1.9 1.4 -1.4 -0.1
Chacin 33.7 2.7 0.1 -1.9 0

These run elements help to define DRA. At this point in the season, however, it’s important to note just how large the Standard Deviation appears for DRA. For example, Davies’s DRA is currently published at 6.02, but with a standard deviation of 1.00, approximately 68 percent of the time Davies could be expected to land between 5.02 DRA and 7.02 DRA. Tracking DRA against RA9 (Runs Allowed per 9 IP), something like a 5.02 mark gets Davies into respectable rotation territory, and it’s also possible the righty prevents runs to an even greater extent (i.e., serves as an even greater outlier).

Here are Brewers starters by variation, sorted by lowest Standard Deviation.

Pitcher DRA DRA SD DRA_Low DRA_High
Perez 0.75 0.1 0.65 0.85
Hader 0.92 0.19 0.73 1.11
Williams 1.22 0.38 0.84 1.6
Barnes 1.43 0.4 1.03 1.83
Drake 1.78 0.58 1.2 2.36
Guerra 3.71 0.68 3.03 4.39
Houser 1.25 0.74 0.51 1.99
Anderson 4.49 0.8 3.69 5.29
Jennings 4.24 0.84 3.4 5.08
Chacin 4.39 0.9 3.49 5.29
Davies 6.02 1 5.02 7.02
Albers 4.99 1.07 3.92 6.06
Jeffress 5.07 1.1 3.97 6.17
Suter 4.91 1.15 3.76 6.06
Woodruff 2.69 1.3 1.39 3.99
Knebel 2.05 1.75 0.3 3.8
Lopez 9.52 3.47 6.05 12.99
Hoover 8.31 4.8 3.51 13.11
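The bands in the table can be reproduced directly. Here's a minimal Python sketch, assuming (as the published columns suggest) that DRA_Low and DRA_High are simply DRA plus or minus one standard deviation, which covers roughly 68 percent of outcomes if the estimate is approximately normal. The three pitchers' values are copied from the table above.

```python
# Reconstructing the DRA_Low / DRA_High bands, assuming they are
# DRA +/- one standard deviation (a ~68% interval under normality).
pitchers = {
    # name: (DRA, DRA_SD), values copied from the table above
    "Davies": (6.02, 1.00),
    "Hader": (0.92, 0.19),
    "Chacin": (4.39, 0.90),
}

for name, (dra, sd) in pitchers.items():
    low, high = dra - sd, dra + sd
    print(f"{name}: DRA {dra:.2f}, range {low:.2f} to {high:.2f}")
```

Running this reproduces the published bands (e.g., Davies 5.02 to 7.02), which is a useful sanity check that the error bars are one-SD intervals rather than, say, 95 percent intervals.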

Now, let’s repeat this measurement with WARP, which should underscore the extent to which fans should avoid quoting Replacement Level stats with false certainty. Doesn’t this make you wonder what the error bars might be on Baseball Reference or FanGraphs WAR? Hopefully those websites follow suit and publish WAR error bars where possible.

Pitcher WARP WARP SD WARP_Low WARP_High
Perez 0.02 0 0.02 0.02
Houser 0.08 0.02 0.06 0.1
Hader 0.8 0.04 0.76 0.84
Williams 0.38 0.04 0.34 0.42
Knebel 0.08 0.05 0.03 0.13
Barnes 0.62 0.07 0.55 0.69
Hoover -0.05 0.07 -0.12 0.02
Drake 0.44 0.08 0.36 0.52
Lopez -0.15 0.12 -0.27 -0.03
Jennings 0.1 0.12 -0.02 0.22
Woodruff 0.25 0.13 0.12 0.38
Albers -0.01 0.16 -0.17 0.15
Jeffress -0.03 0.17 -0.2 0.14
Guerra 0.39 0.17 0.22 0.56
Anderson 0.31 0.31 0 0.62
Chacin 0.34 0.34 0 0.68
Davies -0.28 0.38 -0.66 0.1
Suter 0.13 0.39 -0.26 0.52

What I find extremely interesting about this exercise is the extent to which the Brewers starting pitchers exhibit variation in their potential WARP production. Almost to a man, the Brewers’ remaining rotation (after Brent Suter was moved to the bullpen to make room for Wade Miley) could range anywhere from replacement level to solid rotation piece (for reference, among 149 pitchers with 17.0 IP or more, 0.34 WARP is a median 2018 performance thus far). This will be a stat worth watching for the remainder of 2018.

Finally, it is worth watching whether the Brewers can continue to outperform their DRA. As of my last publication on Runs Prevented, the Brewers pitching staff was approximately 18 runs better than their DRA suggested. My hypothesis is that the Brewers’ groundball efficiency machine is leading this charge, but that is only one of many possible explanations, including random luck.

Pitcher DRA RA9 DRA-RA9
Lopez 9.52 3 6.52
Jeffress 5.07 0.64 4.43
Albers 4.99 1.35 3.64
Guerra 3.71 1.23 2.48
Anderson 4.49 2.86 1.63
Davies 6.02 4.5 1.52
Jennings 4.24 2.77 1.47
Houser 1.25 0 1.25
Perez 0.75 0 0.75
Hader 0.92 1.5 -0.58
Suter 4.91 5.64 -0.73
Barnes 1.43 2.25 -0.82
Chacin 4.39 5.35 -0.96
Woodruff 2.69 3.86 -1.16
Williams 1.22 2.89 -1.68
Drake 1.78 6.39 -4.62
Knebel 2.05 10.12 -8.07
Hoover 8.31 20.25 -11.94
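The columns in the table above are unlabeled in the source data I pulled, so as a check: my reading is that they are DRA, RA9, and DRA minus RA9, where a positive difference means the pitcher's actual runs allowed beat the DRA estimate. A quick sketch, with three rows copied from the table, confirms the arithmetic under that assumption.

```python
# Sanity check on the table's columns, assuming they are DRA, RA9,
# and DRA minus RA9 (positive = actual runs allowed beat the estimate).
staff = {
    # name: (DRA, RA9), values copied from the table above
    "Davies": (6.02, 4.50),
    "Hader": (0.92, 1.50),
    "Hoover": (8.31, 20.25),
}

for name, (dra, ra9) in staff.items():
    diff = round(dra - ra9, 2)
    print(f"{name}: {diff:+.2f}")
```

The differences (Davies +1.52, Hader -0.58, Hoover -11.94) match the table's third column, supporting that reading.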

These statistics provide a wide range of tools for Brewers fans and analysts. Ranges of DRA and WARP can be compared in order to assess both uncertainty and potentially overlapping fields of value. To my mind, the best aspect of this new presentation is that fans and analysts no longer need to feign false certainty about WARP, and this is great; one shouldn’t need to say “Zach Davies has a 6.02 DRA” right now, when one can say “Davies’s DRA ranges from 5.02 to 7.02.” This exercise can be repeated throughout the season, and perhaps through embracing uncertainty we can find better hypotheses about how and why a team is under-performing (or over-performing) its peripheral stats or DRA estimates.
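One way to make that "overlapping fields of value" idea concrete is to check whether two pitchers' one-SD intervals intersect; if they do, the data can't yet cleanly separate the two players. A minimal sketch, using the Anderson and Chacin WARP rows from the table above:

```python
# Do two pitchers' one-SD value intervals overlap? If so, the data
# cannot yet cleanly distinguish their performances.
def overlaps(lo1, hi1, lo2, hi2):
    return max(lo1, lo2) <= min(hi1, hi2)

# Anderson: 0.31 +/- 0.31 -> (0.00, 0.62)
# Chacin:   0.34 +/- 0.34 -> (0.00, 0.68)
print(overlaps(0.00, 0.62, 0.00, 0.68))  # the two ranges overlap almost entirely
```

By this measure, Anderson and Chacin are indistinguishable so far in 2018, even though their point estimates differ.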


Photo Credit: Patrick Gorski, USA Today Sports Images


2 comments on “Using Deserved Run Average”

Philip Schumacher

I have a question about how standard deviation is used. In the table above, it appears that the estimated DRA is assumed to have a normal distribution and that there is no skewness to the data. However, DRA has a minimum of zero and a maximum of infinity. So as a pitcher’s DRA approaches zero, the distribution of the estimate is likely no longer normal but may be a gamma distribution, for example. In that case the standard deviation may differ for values to the “left” of the most likely value vs. to the “right.” Would it be statistically more accurate to assume a non-normal distribution and have the standard deviation differ for DRA?

Thank you for the article and discussion – I think this is a great way to better evaluate players.

Nicholas Zettel

This is a really good question. I’m going to look into this.
