Fun With LevERA+ At The Think Factory
Thursday, May 6, 2010
Posted by Gator Guy at 7:13 AM
It was brought to my attention yesterday that my recent post on Clutch Pitchers and LevERA+ had been linked to by The Baseball Think Factory. I learned this from a friend who sent me an e-mail that quoted some of the more amusing misconceptions from the commenters at BBTF. I initially had no intention of addressing these misconceptions, reasoning that it was probably futile to reason with anyone who could have possibly understood the post to be arguing that Steve Trachsel was a great pitcher, or that Jeff Suppan was better than Jim Palmer and Tom Seaver. There were certain commenters at BBTF however who seemed to have at least a passing interest in the concept of LevERA+, and so I'll devote a few more words to the subject. I'll briefly address some of the more amusing misconceptions voiced at the BBTF and then discuss the calculation and conceptual underpinnings of the LevERA+ concept.
"No, I wasn't arguing that Steve Trachsel is better than Tom Seaver..."
Incredibly, more than a few BBTF commenters seemed to think that I was proposing the clutch adjustment factor as a measure of pitching prowess. Even more believed I was arguing that Trachsel or Jeff Suppan were "more clutch" than Tom Seaver or Jim Palmer or Ron Guidry. I was completely mystified as to how anyone could have made these extraordinary leaps of illogic until my buddy pointed out to me that the term "clutch adjustment factor" really is inapt. I have to admit he's correct, and it's possible that some of the more absurd misconceptions expressed at BBTF derive from this ill-advised term. The better term is of course "leverage adjustment factor."
The term "clutch adjustment factor" was inapt for a few reasons. First, although high-leverage situations can reasonably be termed "clutch" situations, they are clutch only within the context of that particular game, without any regard for the significance of the game in the larger context of the season, the standings, or the advancement of a team's pursuit of a pennant or world series championship. My prior posts on the subject of "clutch pitchers" and "big-game pitchers" have focused almost exclusively on the latter concept of clutch, hence the focus on September and hugely consequential games in the context of races for the post-season. A bases loaded, two-out situation in the ninth inning of a one-run game in April is a clutch situation, to be sure, but it's an aspect of clutch performance distinctly different (and, for me, less interesting) than a pitcher's performance in hugely consequential games.
In any event, the statistic I'll now refer to as "leverage adjustment factor" really wasn't the focus of the post anyway. LevERA+ was the focus, and the discussion of leverage adjustment factor was just a means of demonstrating that (i) for most pitchers the difference between ERA+ and LevERA+ isn't particularly material, and (ii) for some pitchers the difference is fairly significant, at least insofar as most of us stat geeks attach some significance to 4% and 5% differences in ERA+.
So, to be clear, I'm not proposing Trachsel or Suppan for the Hall of Fame, and I'm not suggesting that Guidry's 10th place ranking on the list of largest leverage adjustment factors is a qualification for the Hall. Guidry's 17th place ranking on the list of highest LevERA+s since 1952, however, is a fact worthy of consideration in assessing his qualifications for the Hall.
The Post had nothing to do with the subject of whether 'clutch performance' is an innate ability
I believe this was the misconception underlying some of the comments about the regression analyses discussed in the post. If it isn't, then the commenters at BBTF simply don't understand regression analysis, and perhaps this is the more likely explanation for some of their comments. Giving them the benefit of the doubt, however, their comments on the regression analyses, though misguided, at least make sense if the commenters were under the impression that I was arguing that the regression analyses established the existence of an identifiable "clutch ability." I wasn't arguing that, and the post was very clear in that regard. Although the issue of whether an innate "clutch ability" can be probabilistically verified has been a hot topic for sabermetricians for a number of years, I've always found the issue to be academic in the extreme. I basically agree with Bill James on the subject: I don't know, nor do I care, whether a particular player's excellent performance in big games and clutch situations is a function of some innate clutch ability (or coolness under pressure, heightened intensity, superior character or black magic, for that matter), but these performances did occur and shouldn't be dismissed in assessing a player's accomplishments merely because there is some prospect that these clutch performances were to some degree a function of luck or chance.
One particular commenter at BBTF who evinced an almost superhuman ability for misapprehension and misconception argued that the regression analyses were worthless without information about distributive and probabilistic characteristics - the standard deviation, measures of distribution relative to normal distribution, and probability functions. "What number of pitchers would we expect to be above 4% by random chance alone?", he asked. All of this information would be supremely relevant to the question of whether Steve Trachsel's 7.5% leverage adjustment factor was evidence of an innate clutch ability or merely a function of random chance. But all of it is completely irrelevant to whether LevERA+ is more highly correlated with pitcher winning percentage than ERA+.
The regression analyses were useful in testing the hypothesis that LevERA+, by weighting elements of a pitcher's performance by impact on win probability, would be a more accurate predictor of pitcher winning percentage than ERA+. This is the same exercise as that engaged in by the originators of the OPS stat. Although the conceptual validity of combining on-base percentage and slugging average as a measure of contribution to run scoring was manifest, OPS wouldn't be a particularly helpful statistic if it didn't correlate more highly with run scoring than both OBP and slugging average. In particular, the sabermetric pioneers were aware that slugging average attached exaggerated weights to extra-base hits, which rendered slugging average a less predictive measure of run scoring than OBP and could have resulted in OPS being a relatively useless stat. In other words, the addition of OBP and slugging average to create OPS is rather arbitrary, and it is only the fact that empirical evidence confirms its higher correlation with run scoring that makes it a more useful stat than OBP and slugging average.
This is what the commenters at BBTF should have been asking if they had understood the regression analyses: did the distribution of the winning percentage and LevERA+/ERA+ exhibit a basically linear relationship? (Yes, it did). Were the parameters for the pitcher populations analyzed by the regression fully described? (They were). Were the selection criteria for the pitcher populations free of criteria that may have introduced selection bias? (They were).
The analysis of the correlation between LevERA+ and winning percentage presented an almost paradigmatic utilization of regression analysis, and the comparison of the bivariate correlations of LevERA+ and ERA+ to winning percentage revealed a statistically significant advantage for LevERA+, just as the comparison of the bivariate correlation of OPS to run scoring revealed its advantages over on-base percentage and slugging average.
No, the LevERA+ statistic provides no advantage to pitchers who improve their performance with men on base, or demonstrate an ability to escape high-leverage situations resulting from their own failure to prevent baserunners.
This is an understandable misconception, but a misconception nonetheless. The LevERA+ statistic is based on the WPA and WPA/Li statistics, and neither statistic credits a pitcher for escaping his own jams. The incremental "win probability added" to a pitcher who loads the bases but escapes the inning without surrendering a run is no different than the win probability added to a pitcher who retires the side in order. As one commenter at BBTF accurately noted, the win probabilities "zero out" for the inning, with the pitcher's retirement of batters with runners on base offsetting the negative win probabilities attributed as a result of permitting batters to reach base. This is easily demonstrated by looking at the win probability and win expectancy increments in a boxscore at B-R.com. Here's the first inning of Steve Trachsel's start against the Dodgers on May 19, 1998, in which he allowed a double and a single but escaped without surrendering a run:
Here's the first inning of Trachsel's start against the Giants on September 29, 1998:
In each case, the summation of the win probability events for the inning reflected in the highlighted field above gave Trachsel the same WPA credit: he increased his team's win probability by 5% by pitching a scoreless top of the first. The fact that he allowed a double and single in the first game and retired the side in order in the second game made no difference.
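The "zeroing out" described above can be sketched in a few lines of Python. The per-play figures below are hypothetical numbers invented purely for illustration (they are not the actual B-R.com play-by-play values for either Trachsel start); only the 5% inning total comes from the discussion above. Win probability changes are expressed in whole percentage points from the pitcher's team's perspective.

```python
def inning_wpa(play_deltas):
    """Sum the per-play win probability changes (in percentage points)
    credited to the pitcher over one half-inning."""
    return sum(play_deltas)

# Hypothetical per-play values, for illustration only.
# Inning with a jam: double, out, single, out, out (no runs score).
jam_inning = [-4, +2, -3, +4, +6]

# Clean inning: three straight outs.
clean_inning = [+2, +1, +2]

print(inning_wpa(jam_inning))    # 5
print(inning_wpa(clean_inning))  # 5
```

The outs recorded with runners on base are worth more than the outs in the clean inning, but only by enough to cancel the damage done by the hits, so both scoreless innings end with the same net credit.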
Another commenter at BBTF queried whether bullpen performance has an impact on LevERA+. It doesn't, and this is another advantage of LevERA+ over ERA+. ERA+, like ERA, charges a run to a pitcher if he leaves the game with a runner on first and two out and the reliever permits the runner to score, and it weights that run the same as it would if the pitcher who walked the batter had surrendered a home run to that batter. WPA, and therefore LevERA+, charges the pitcher only with the probability of a runner scoring from first with two out. The reliever's performance is irrelevant, and the run allowed by the reliever has no effect on the LevERA+ of the pitcher who left the game.
The Calculation of LevERA+ and leverage adjustment factor
LevERA+ is simply the product of ERA+ and the leverage adjustment factor. The leverage adjustment factor is obtained by (i) multiplying a pitcher's clutch stat (listed in the "Win Probability" table at B-R.com) by 10 and then dividing the result by his runs allowed, and (ii) adding the quotient obtained in (i) to 1. LevERA+ is then obtained by (iii) multiplying the result obtained in (ii) by his ERA+. A pitcher with a negative clutch stat will accordingly have a negative result in (i), a figure lower than 1.0 in (ii), and a LevERA+ lower than his ERA+. Here's the formula (omitting the "+" from LevERA+ and ERA+ so as not to create confusion in the formula):
LevERA = ERA * (1 + ((clutch*10)/ RA))
As an example, assume a pitcher with an ERA+ of 120, a clutch stat of 1.0 and runs allowed of 100. The clutch stat is multiplied by 10 (i.e., the run conversion factor necessary to turn the wins-based clutch stat into an equivalent number of runs). The result - 10 - is then divided by 100 (the pitcher's runs allowed). The result - .1 - is then added to 1 to arrive at 1.1. The pitcher's LevERA+ is the product of 120 (his ERA+) and 1.1, or 132. If the pitcher's clutch stat had been -1.0 rather than 1.0, then the product of his clutch stat and 10 would be -10, -10 divided by his 100 runs allowed would be -.1, and the sum of -.1 and 1 would be .9. The product of his ERA+ of 120 and .9 yields a LevERA+ of 108.
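For anyone who wants to run the numbers themselves, here is a minimal Python sketch of the calculation described above. The function and parameter names are my own invention for illustration; the inputs are a pitcher's ERA+, his clutch stat and his runs allowed, and the default of 10 runs per win is the conversion factor discussed in this post.

```python
def leverage_adjustment_factor(clutch, runs_allowed, runs_per_win=10.0):
    """Return 1 + (clutch * runs_per_win) / runs_allowed."""
    if runs_allowed <= 0:
        raise ValueError("runs_allowed must be positive")
    return 1.0 + (clutch * runs_per_win) / runs_allowed


def lev_era_plus(era_plus, clutch, runs_allowed, runs_per_win=10.0):
    """Return LevERA+ as ERA+ times the leverage adjustment factor."""
    return era_plus * leverage_adjustment_factor(clutch, runs_allowed, runs_per_win)


# The two worked examples from the text: ERA+ of 120, 100 runs allowed.
print(round(lev_era_plus(120, 1.0, 100), 1))   # 132.0
print(round(lev_era_plus(120, -1.0, 100), 1))  # 108.0
```

The guard against zero runs allowed is there only because the formula divides by runs allowed; for any pitcher with a meaningful number of innings it should never come into play.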
The actual conversion unit for converting wins-based stats like WPA to runs varies depending upon the run scoring environment (which is itself a function of the park, general scoring levels in the league and, perhaps most significantly, the pitcher's own performance). However, these factors are already factored into the WPA stat, and therefore the general, historically-derived conversion of 10 runs per win is the appropriate multiplier for the pitcher's clutch stat. (I initially didn't realize this, and my first post on LevERA+ therefore used a run conversion figure calculated individually for each pitcher, which had the effect of double-counting the run-scoring environment factor already contained within WPA).
Conclusion
In sum, Steve Trachsel was a very mediocre pitcher who allowed far more runs on average than did Jim Palmer or Tom Seaver. Trachsel had a propensity, however, for allowing his runs in lower-leverage situations than other pitchers did on average, and the distribution of his runs allowed skewed more toward low-leverage situations than the distribution of Palmer's and Seaver's runs allowed. This obviously doesn't make Trachsel a better pitcher - he allowed far more runs than Palmer and Seaver - but the difference between Trachsel and Palmer/Seaver in terms of leverage-weighted runs allowed and LevERA+ was slightly narrower than the difference in terms of mere ERA+.
Again, the list in my prior post of leverage adjustment factors (or "clutch adjustment factors", as I termed it) was merely illustrative; it is the LevERA+ statistic that actually measures pitcher performance in a way more highly correlated with winning percentage than ERA+.
Thank you to the moderator of the BBTF discussion about LevERA+ for urging the commenters to be more open-minded in their consideration of the stat and more careful in their reading and interpretation of my prior post (although the moderator also apparently misunderstood the purpose of the regression analysis to be aimed at measuring the probability that leverage adjustment factors were a measure of innate clutch ability). BBTF is a good aggregator of baseball news and so I link to it at this website. I don't generally read the comments, but I found a lot of the snark exhibited in the discussion of LevERA+ to be first-rate and genuinely funny. I hope a few of the commenters find their way to this post and that some of the misconceptions about LevERA+ are cleared up. LevERA+ is really just an adjustment to ERA+, a small improvement on it. But as Sean Forman learned last month, any tinkering with the beloved ERA+ is a controversial and incendiary venture. It is a venerable and ground-breaking stat. I hope others recognize LevERA+ as merely a small refinement of it.
"No, I wasn't arguing that Steve Trachsel is better than Tom Seaver..."