Fun With LevERA+ At The Think Factory
It was brought to my attention yesterday that my recent post on Clutch Pitchers and LevERA+ had been linked to by The Baseball Think Factory. I learned this from a friend who sent me an e-mail that quoted some of the more amusing misconceptions from the commenters at BBTF. I initially had no intention of addressing these misconceptions, reasoning that it was probably futile to reason with anyone who could have possibly understood the post to be arguing that Steve Trachsel was a great pitcher, or that Jeff Suppan was better than Jim Palmer and Tom Seaver. There were certain commenters at BBTF however who seemed to have at least a passing interest in the concept of LevERA+, and so I'll devote a few more words to the subject. I'll briefly address some of the more amusing misconceptions voiced at the BBTF and then discuss the calculation and conceptual underpinnings of the LevERA+ concept.
"No, I wasn't arguing that Steve Trachsel is better than Tom Seaver..."
Incredibly, more than a few BBTF commenters seemed to think that I was proposing the clutch adjustment factor as a measure of pitching prowess. Even more believed I was arguing that Trachsel or Jeff Suppan were "more clutch" than Tom Seaver or Jim Palmer or Ron Guidry. I was completely mystified as to how anyone could have made these extraordinary leaps of illogic until my buddy pointed out to me that the term "clutch adjustment factor" really is inapt. I have to admit he's correct, and it's possible that some of the more absurd misconceptions expressed at BBTF derive from this ill-advised term. The better term is of course "leverage adjustment factor."
The term "clutch adjustment factor" was inapt for a few reasons. First, although high-leverage situations can reasonably be termed "clutch" situations, they are clutch only within the context of that particular game, without any regard for the significance of the game in the larger context of the season, the standings, or the advancement of a team's pursuit of a pennant or world series championship. My prior posts on the subject of "clutch pitchers" and "big-game pitchers" have focused almost exclusively on the latter concept of clutch, hence the focus on September and hugely consequential games in the context of races for the post-season. A bases loaded, two-out situation in the ninth inning of a one-run game in April is a clutch situation, to be sure, but it's an aspect of clutch performance distinctly different (and, for me, less interesting) than a pitcher's performance in hugely consequential games.
In any event, the statistic I'll now refer to as "leverage adjustment factor" really wasn't the focus of the post anyway. LevERA+ was the focus, and the discussion of leverage adjustment factor just a means of demonstrating that (i) for most pitchers the difference between ERA+ and LevERA+ isn't particularly material, and (ii) for some pitchers the difference is fairly significant, at least insofar as most of us stat geeks attach some significance to 4% and 5% differences in ERA+.
So, to be clear, I'm not proposing Trachsel or Suppan for the Hall of Fame, and I'm not suggesting that Guidry's 10th place ranking on the list of largest leverage adjustment factors is a qualification for the Hall. Guidry's 17th place ranking on the list of highest LevERA+s since 1952, however, is a fact worthy of consideration in assessing his qualifications for the Hall.
The Post had nothing to do with the subject of whether 'clutch performance' is an innate ability
I believe this was the misconception underlying some of the comments about the regression analyses discussed in the post. If it wasn't, then the commenters at BBTF simply don't understand regression analysis, and perhaps this is the more likely explanation for some of their comments. Giving them the benefit of the doubt, however, their comments on the regression analyses, though misguided, at least make sense if the commenters were under the impression that I was arguing that the regression analyses established the existence of an identifiable "clutch ability." I wasn't arguing that, and the post was very clear in that regard. Although the issue of whether an innate "clutch ability" can be probabilistically verified has been a hot topic for sabermetricians for a number of years, I've always found the issue to be academic in the extreme. I basically agree with Bill James on the subject: I don't know, nor do I care, whether a particular player's excellent performance in big games and clutch situations is a function of some innate clutch ability (or coolness under pressure, heightened intensity, superior character or black magic, for that matter), but these performances did occur and shouldn't be dismissed in assessing a player's accomplishments merely because there is some prospect that these clutch performances were a function to some degree of luck or chance.
One particular commenter at BBTF who evinced an almost superhuman ability for misapprehension and misconception argued that the regression analyses were worthless without information about distributive and probabilistic characteristics - the standard deviation, measures of distribution relative to normal distribution, and probability functions. "What number of pitchers would we expect to be above 4% by random chance alone?", he asked. All of this information would be supremely relevant to the question of whether Steve Trachsel's 7.5% leverage adjustment factor was evidence of an innate clutch ability or merely a function of random chance. But all of it is completely irrelevant to whether LevERA+ is more highly correlated with pitcher winning percentage than ERA+.
The regression analyses were useful in testing the hypothesis that LevERA+, by weighting elements of a pitcher's performance by impact on win probability, would be a more accurate predictor of pitcher winning percentage than ERA+. This is the same exercise as that engaged in by the originators of the OPS stat. Although the conceptual validity of combining on-base percentage and slugging average as a measure of contribution to run scoring was manifest, OPS wouldn't be a particularly helpful statistic if it didn't more highly correlate with run scoring than both OBP and slugging average. In particular, the sabermetric pioneers were aware that slugging average attached exaggerated weights to extra-base hits, which rendered slugging average a less predictive measure of run scoring than OBP and could have resulted in OPS being a relatively useless stat. In other words, the addition of OBP and slugging average to create OPS is rather arbitrary and it is only the fact that empirical evidence confirms its higher correlation with run scoring that makes it a more useful stat than OBP and slugging average.
This is what the commenters at BBTF should have been asking if they had understood the regression analyses: did the distribution of the winning percentage and LevERA+/ERA+ exhibit a basically linear relationship? (Yes, it did). Were the parameters for the pitcher populations analyzed by the regression fully described? (They were). Were the selection criteria for the pitcher populations free of criteria that may have introduced selection bias? (They were).
The analysis of the correlation between LevERA+ and winning percentage presented an almost paradigmatic utilization of regression analysis, and the comparison of the bivariate correlations of LevERA+ and ERA+ to winning percentage revealed a statistically significant advantage for LevERA+, just as the comparison of the bivariate correlation of OPS to run scoring revealed its advantages over on-base percentage and slugging average.
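For readers who want to try the comparison themselves, here is a minimal sketch of the two bivariate correlations being compared. The function name `pearson_r` and every number in the sample lists are placeholders invented for illustration; they are not the pitcher populations or results from the prior post. The sketch only computes the two coefficients and does not test whether the gap between them is statistically significant.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance and variances are left unnormalized; the 1/n terms cancel.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Placeholder values only (NOT real pitchers): one entry per pitcher.
win_pct = [0.450, 0.500, 0.550, 0.600, 0.650]
era_plus = [95, 108, 104, 118, 121]
lev_era_plus = [93, 103, 110, 119, 127]

r_era = pearson_r(era_plus, win_pct)
r_lev = pearson_r(lev_era_plus, win_pct)
print(f"ERA+ vs W%:    r = {r_era:.3f}")
print(f"LevERA+ vs W%: r = {r_lev:.3f}")
```

With real data, each list would hold one value per pitcher in the population being studied, and the stat with the higher r is the one more closely tied to winning percentage.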
No, the LevERA+ statistic provides no advantage to pitchers who improve their performance with men on base, or demonstrate an ability to escape high-leverage situations resulting from their own failure to prevent baserunners.
This is an understandable misconception, but a misconception nonetheless. The LevERA+ statistic is based on the WPA and WPA/Li statistics, and neither statistic credits a pitcher for escaping his own jams. The incremental "win probability added" to a pitcher who loads the bases but escapes the inning without surrendering a run is no different than the win probability added to a pitcher who retires the side in order. As one commenter at BBTF accurately noted, the win probabilities "zero out" for the inning, with the pitcher's retirement of batters with runners on base offsetting the negative win probabilities attributed as a result of permitting batters to reach base. This is easily demonstrated by looking at the win probability and win expectancy increments in a boxscore at B-R.com. Here's the first inning of Steve Trachsel's start against the Dodgers on May 19, 1998, in which he allowed a double and a single but escaped without surrendering a run:
Here's the first inning of Trachsel's start against the Giants on September 29, 1998:
In each case, the summation of the win probability events for the inning reflected in the highlighted field above gave Trachsel the same WPA credit: he increased his team's win probability by 5% by pitching a scoreless top of the first. The fact that he allowed a double and single in the first game and retired the side in order in the second game made no difference.
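The "zeroing out" is just arithmetic: the play-by-play WPA increments in a half-inning add up to the win expectancy at the end minus the win expectancy at the start, whatever happened in between. Here is a small sketch of that. The starting and ending values (50% and 55%) follow the 5% figure above, but the intermediate win expectancies and the order of plays are hypothetical numbers made up for illustration, not the actual B-R.com boxscore values.

```python
def inning_wpa(win_expectancies):
    """Sum the play-by-play WPA credited to the pitcher for one half-inning.

    win_expectancies holds the pitching team's win expectancy before the
    first batter, followed by its value after each play.
    """
    deltas = [after - before
              for before, after in zip(win_expectancies, win_expectancies[1:])]
    return sum(deltas)


# Hypothetical paths through a scoreless top of the first.
# Two baserunners, then three outs: the dips are recovered by the outs.
messy_inning = [0.50, 0.44, 0.40, 0.45, 0.50, 0.55]
# Three up, three down.
clean_inning = [0.50, 0.52, 0.535, 0.55]

print(round(inning_wpa(messy_inning), 3))
print(round(inning_wpa(clean_inning), 3))
```

Both calls give the same total because the intermediate terms cancel, leaving only the difference between the last and first values.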
Another commenter at BBTF queried whether bullpen performance has an impact on LevERA+. It doesn't, and this is another advantage of LevERA+ over ERA+. ERA+, like ERA, charges a run to a pitcher if he leaves the game with a runner on first and two out and the reliever permits the runner to score, and it weights that run the same as it would if the pitcher who walked the batter had surrendered a home run to that batter. WPA, and therefore LevERA+, charges the pitcher only with the probability of a runner scoring from first with two out. The reliever's performance is irrelevant, and the run allowed by the reliever has no effect on the LevERA+ of the pitcher who left the game.
The Calculation of LevERA+ and leverage adjustment factor
LevERA+ is simply the product of ERA+ and the leverage adjustment factor. The leverage adjustment factor is obtained by (i) multiplying a pitcher's clutch stat (listed in the "Win Probability" table at B-R.com) by 10 and then dividing the result by his runs allowed, and (ii) adding the quotient obtained in (i) to 1. LevERA+ is then obtained by (iii) multiplying the result obtained in (ii) by his ERA+. A pitcher with a negative clutch stat will accordingly have a negative result in (i), a figure lower than 1.0 in (ii), and a LevERA+ lower than his ERA+. Here's the formula (omitting the "+" from LevERA+ and ERA+ so as not to create confusion in the formula):
LevERA = ERA * (1 + ((clutch*10)/ RA))
As an example, assume a pitcher with an ERA+ of 120, a clutch stat of 1.0 and runs allowed of 100. The clutch stat is multiplied by 10 (i.e., the run conversion factor necessary to turn the wins-based clutch stat into an equivalent number of runs). The result - 10 - is then divided by 100 (the pitcher's runs allowed). The result - .1 - is then added to 1 to arrive at 1.1. The pitcher's LevERA+ is the product of 120 (his ERA+) and 1.1, or 132. If the pitcher's clutch stat had been -1.0 rather than 1.0, then the product of his clutch stat and 10 would be -10, -10 divided by his 100 runs allowed would be -.1, and the sum of -.1 and 1 would be .9. The product of his ERA+ of 120 and .9 yields a LevERA+ of 108.
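The same arithmetic can be written as a short sketch in Python. The function and parameter names are my own shorthand for illustration; the inputs are the ERA+, clutch stat and runs allowed from the example above.

```python
RUNS_PER_WIN = 10  # run conversion factor applied to the wins-based clutch stat


def leverage_adjustment_factor(clutch, runs_allowed, runs_per_win=RUNS_PER_WIN):
    """Return 1 + (clutch * runs_per_win) / runs_allowed."""
    if runs_allowed <= 0:
        raise ValueError("runs_allowed must be positive")
    return 1 + (clutch * runs_per_win) / runs_allowed


def lev_era_plus(era_plus, clutch, runs_allowed):
    """LevERA+ is ERA+ multiplied by the leverage adjustment factor."""
    return era_plus * leverage_adjustment_factor(clutch, runs_allowed)


# The worked example: ERA+ of 120, 100 runs allowed.
print(round(lev_era_plus(120, 1.0, 100), 1))   # 132.0
print(round(lev_era_plus(120, -1.0, 100), 1))  # 108.0
```

A clutch stat of 0 leaves the factor at exactly 1, so LevERA+ and ERA+ coincide for a pitcher with neutral leverage performance.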
The actual conversion unit for converting wins-based stats like WPA to runs varies depending upon the run scoring environment (which is itself a function of the park, general scoring levels in the league and, perhaps most significantly, the pitcher's own performance). However, these factors are already factored into the WPA stat, and therefore the general, historically-derived conversion of 10 runs per win is the appropriate multiplier for the pitcher's clutch stat. (I initially didn't realize this, and my first post on LevERA+ therefore used a run conversion figure calculated individually for each pitcher, which had the effect of double-counting the run-scoring environment factor already contained within WPA).
Conclusion
In sum, Steve Trachsel was a very mediocre pitcher who allowed far more runs on average than did Jim Palmer or Tom Seaver. Trachsel had a propensity, however, for allowing his runs in lower-leverage situations than other pitchers did, and the distribution of his runs allowed skewed more toward low-leverage situations than the distribution of Palmer's and Seaver's runs allowed. This obviously doesn't make Trachsel a better pitcher - he allowed far more runs than Palmer and Seaver - but the difference between Trachsel and Palmer/Seaver in terms of leverage-weighted runs allowed and LevERA+ was slightly narrower than the difference in terms of mere ERA+.
Again, the list in my prior post of leverage adjustment factors (or "clutch adjustment factors", as I termed it) was merely illustrative; it is the LevERA+ statistic that actually measures pitcher performance in a way more highly correlated with winning percentage than ERA+.
Thank you to the moderator of the BBTF discussion about LevERA+ for urging the commenters to be more open-minded in their consideration of the stat and more careful in their reading and interpretation of my prior post (although the moderator also apparently misunderstood the purpose of the regression analysis to be aimed at measuring the probability that leverage adjustment factors were a measure of innate clutch ability). BBTF is a good aggregator of baseball news and so I link to it at this website. I don't generally read the comments, but I found a lot of the snark exhibited in the discussion of LevERA+ to be first-rate and genuinely funny. I hope a few of the commenters find their way to this post and that some of the misconceptions about LevERA+ are cleared up. LevERA+ is really just an adjustment to ERA+, a small improvement on it. But as Sean Forman learned last month, any tinkering with the beloved ERA+ is a controversial and incendiary venture. It is a venerable and ground-breaking stat. I hope others recognize LevERA+ as merely a small refinement of it.
Clutch Pitchers and LevERA+
The Bully Factor
Bill James has a new article up entitled "The Bully Factor" that examines pitcher performance on the basis of quality of the opponent. The article was prompted by an inquiry from a subscriber to his website, but Bill says the idea of breaking down a pitcher's performance by quality of opponent originated in 1969, when Bill argued to a college buddy that Marichal was better than Gibson and his buddy (a big Cards fan) responded by insisting that Marichal tended to beat up on the weak sisters in the league (in the '60s NL, that would be teams like the Mets and Astros). Bill's response at the time, without knowing any of the actual facts, was "bullshit."
Bill finally got around to crunching the numbers and posted his spreadsheet online for downloading (clicking on the preceding link will automatically download the excel spreadsheet to your hard drive). I believe the article itself is only available at Bill's subscriber-only website. Basically, Bill divided teams into four quality categories based on their aggregate records by each decade and then broke down a pitcher's starts against teams in each category. The findings are interesting, if not all that significant. Bill himself makes no great claims as to the significance of his research in judging pitchers. Bill doesn't really make this point but I will: given two pitchers with identical records, one should prefer the pitcher who pitches better against A-list competition. Why? Because if the pitcher's team is in contention, games against other contenders are two-fers. A win against another contender is not only a win for your team but a loss for the other contender.
Here's how Bill described his methodology in arriving at a single metric he refers to as "the Bully Factor":
How do we measure the extent to which each pitcher dominated inferior competition? I looked at six factors relative to that issue, which were: 1) The percentage of the pitcher’s wins that came over “D” quality competition, 2) The difference in the pitcher’s winning percentage versus “A & B” teams and his winning percentage versus “C & D” teams, 3) The difference in the pitcher’s ERA versus “A & B” teams and his ERA versus “C & D” teams, 4) The difference in the pitcher’s overall effectiveness RANK (1 to 702) versus “A & B” teams and his overall effectiveness rank versus “C & D” teams, 5) The difference in the pitcher’s overall effectiveness rank (1 to 702) versus “A” teams compared to his overall effectiveness rank versus all teams, and 6) The player’s career win total versus “A & B” teams compared to his career wins versus “C & D” teams.
I made up an index of these six indicators, which I called the “Bully Factor”; a high Bully Factor indicates that the pitcher pitched much better against weak competition than against strong competition - much better, or in some cases much more. Later, I’ll list the pitchers at the top and bottom of the chart, but first, let’s look at the guys with the most “normal” data, the guys in the center of the chart.
So who are the biggest bullies among notable pitchers? Well, to begin with, Bill was pretty much on target with his "bullshit" response to his buddy's assertion that Marichal was a bully and Gibson wasn't: Marichal generally performed better against the quality competition, whereas Gibson had a greater tendency to beat up on the weak sisters in the league. Bill is careful not to draw any grand conclusions from this fact, as well he should be, because Gibson's spectacular big-game record certainly refutes any argument that Gibson couldn't step it up against good teams in big games. But the fact remains that as between the two Gibson did more padding of his stats against the bad teams than Marichal did.
Here are the biggest bullies among the more notable pitchers of the last 60 years (Bill's data covers pitchers with 100 or more starts since 1952): Bob Turley, Denny McLain, C.C. Sabathia, Early Wynn, Jack Morris, Justin Verlander, Roy Oswalt, Bob Lemon, Tim Wakefield, Ken Holtzman, Herb Score, Mel Parnell, Joe Niekro, Camilo Pascual, Derek Lowe and Mark Buehrle. Zack Greinke also has a pretty big Bully Factor so far in his brief career.
Some other notable pitchers who had Bully Factors well above average are Sam McDowell, Luis Tiant, Tim Hudson, Dave Stewart, Mike Hampton, J.R. Richard, Steve Rogers, Don Newcombe, Vida Blue, Bob Gibson, Andy Pettitte, David Wells, Randy Johnson and Bert Blyleven.
Notable pitchers with very low Bully Factors include Frank Lary (aka "the Yankee Killer"), Carlos Zambrano, A.J. Burnett, Kenny Rogers, Bartolo Colon, Jarrod Washburn, Phil Niekro, Dave Stieb, Floyd Bannister, Bob Welch, Frank Viola, Mel Stottlemyre, John Lackey, Al Leiter and Bret Saberhagen.
Some other notable pitchers who had Bully Factors distinctly below average are Bret Saberhagen, Fernando Valenzuela, Nolan Ryan, Tommy John, John Candelaria, Juan Marichal, Mickey Lolich, Cliff Lee, Robin Roberts, Sandy Koufax, John Smoltz, Tom Glavine, Mike Cuellar, Dennis Eckersley, Ron Guidry, Dwight Gooden, Dave McNally, John Tudor, Johan Santana, Curt Schilling and Frank Tanana.
Of most interest to me were the pitchers who performed particularly well against the A category teams. Generally speaking these teams had winning percentages over .550 for the decade. There are five pitchers who really stand out, compiling excellent winning percentages and ERAs against A category teams: Whitey Ford, Sandy Koufax, Bret Saberhagen, Pedro Martinez and Johan Santana. Against A category competition, each had a winning percentage above .630 and an ERA below their career ERA.
There are ten other pitchers who had a winning percentage above .570 against A quality competition (min. 25 wins against A competition): Dwight Gooden, Freddy Garcia, Roy Halladay, Jack Sanford, David Wells, Jim Maloney, Juan Marichal, Tom Glavine, Ron Guidry and John Candelaria.
Categorizing teams based on their records over a decade rather than annual records will produce some anomalies. Just for example, a pitcher who came into the AL within the last few years will have his games against the Tampa Bay Rays thrown into the D category of weak sisters even though the Rays have been anything but weak the last few years. As another example, Bill's data shows Saberhagen with a .570 winning percentage against teams with decade records above .500 and .601 against teams with decade records below .500. Splits based on annual team records, however, show that Saberhagen's numbers are flipped: he had a .606 W% against teams with records above .500 and a .571 W% against teams under .500.
Still, as always, James is provocative. And some of the findings are very striking. Ford and Koufax were great against top flight competition. Jack Morris and Justin Verlander really feasted on the worst teams. Draw your own conclusions as to the significance of these facts.
Andy Pettitte
Andy Pettitte: Hall of Famer, or just a good pitcher on great teams who was lucky to get great run support? If Andy retired today, I'd have to believe the BBWAA would come down decidedly in the latter camp. But another 15 win season in 2010 and some more October glory could change that.
I would have to admit that Andy is still a marginal HOF candidate, at least by the ostensible standards of recent HOF balloting. But if I had a vote, I'd have to ask myself: can I really vote to exclude a guy who has been such a large part of so much baseball history, and a crucial cog for so many world champions? And what if Pettitte tops the 250 win mark? He's a sure bet to top 240 wins and that's a formidable figure in this era. Who among the best active pitchers is a good bet to reach 240 wins? In an age where Cy Young award winners win 15 or 16 games, I'd venture that not even Sabathia and Halladay - the two most likely to hit 240 - are even-money bets.
The key to Andy's HOF chances is his reputation as a big-game pitcher, of course. But as staggering as his post-season numbers are, Pettitte critics are still loath to acknowledge Andy's big-game bona fides. As best I can tell, they regard Pettitte as a post-season version of the regular-season Jack Morris: a guy who won a lot but only because he had great run support. This is a myth.
A closer look reveals that Pettitte's excellent post-season record is more a function of his clutch pitching than his run support. It also reveals that Pettitte has been getting better in the post-season as he gets older. And if one looks at Pettitte's September record while his team is in contention for a playoff berth, one finds a record remarkably similar to his outstanding post-season record. Let's take a closer look and see if you don't agree that Andy Pettitte should already have his ticket punched for Cooperstown.
The Post-Season
Here's the post-season record:
Pettitte's been getting even better in recent years, compiling an 8-2 record and 2.98 ERA in 96.2 innings over his last four post-seasons ('03, '05, '07 and '09). If not for two egregious collapses by the Astros bullpen in October 2005, Pettitte's post-season record since '03 would be 10-2.
Considering the level of competition a pitcher faces in the post-season, these are great numbers. The stat geeks and ERA+ worshippers, however, aren't impressed by Pettitte's post-season ERA. They assume that his 18-9 record must be a function of autumnal thunder from the Bronx Bombers' bats. Not so. The Yankees have provided Pettitte with an average of 4.575 runs/game. That's slightly above the post-season average of 4.19 runs/game since 1995, but well below the regular-season major league average of 4.81 and still further below the A.L. average of 4.98. A pythagorean calculation based on Pettitte's post-season run support and his runs allowed/game projects a record of 15-12 for a .556 winning percentage. Pettitte has significantly outperformed his pythagorean record, however, by performing exceptionally well in high-leverage situations in the post-season and by pitching his best in those games where pitcher performance is most critical - games in which his team provided between three and five runs of support. These two factors render Pettitte's 3.90 post-season ERA extremely misleading.
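For anyone who wants to run the pythagorean projection themselves, here is a minimal sketch. It assumes the classic exponent of 2; other exponents are in common use and the one behind the figures above isn't stated. The function names are my own, and the runs-allowed value in the last line is a placeholder, since the actual figure isn't given above.

```python
def pythagorean_win_pct(runs_scored, runs_allowed, exponent=2.0):
    """Expected winning percentage from runs scored and allowed per game."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def projected_record(win_pct, decisions):
    """Turn a winning percentage into a (wins, losses) record."""
    wins = round(win_pct * decisions)
    return wins, decisions - wins


# A .556 projection over Pettitte's 27 post-season decisions (18-9).
print(projected_record(0.556, 27))  # (15, 12)

# Run support of 4.575 per game is from the text; 4.0 runs allowed per game
# is a PLACEHOLDER for illustration, not Pettitte's actual figure.
print(round(pythagorean_win_pct(4.575, 4.0), 3))
```

Equal runs scored and allowed always project to .500, which is a quick sanity check on the formula.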
Pettitte's "clutch" figure in his 16 post-season starts since 2002 is 1.21 according to Fangraphs.com. This means that Pettitte's clutch pitching in these 16 post-season starts has been worth an incremental 1.21 victories. This translates to approximately 12 fewer earned runs if the runs allowed (and not allowed) by Pettitte in the post-season are weighted in proportion to their impact on the Yankees' win expectancy. This means Pettitte's Leveraged ERA is therefore 2.17 in the 99.2 post-season innings he's pitched since the 2002 post-season, approximately 33% better than his nominal 3.25 ERA. Fangraphs doesn't have the post-season clutch statistics prior to 2002, but if Pettitte's clutch performance prior to 2002 were neutral (i.e., a clutch figure of 0) his Leveraged ERA for his entire post-season career would be 3.47, approximately 11% better than his nominal 3.90 ERA.
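The Leveraged ERA arithmetic can be sketched as follows. This is an illustration under stated assumptions: the helper names are mine, the earned-run total is backed out of the 3.25 ERA and 99.2 innings rather than taken from a game log, and 10 runs per win is the same conversion used for LevERA+.

```python
def innings_from_notation(ip_text):
    """Convert baseball notation ("99.2" = 99 innings plus 2 outs) to innings."""
    whole, _, outs_text = ip_text.partition(".")
    outs = int(outs_text) if outs_text else 0
    if outs not in (0, 1, 2):
        raise ValueError("the digit after the point must be 0, 1 or 2 outs")
    return int(whole) + outs / 3


def leveraged_era(era, innings, clutch_wins, runs_per_win=10):
    """ERA after crediting the pitcher with clutch_wins * runs_per_win runs."""
    earned_runs = era * innings / 9          # earned runs implied by the ERA
    runs_saved = clutch_wins * runs_per_win  # wins converted to runs
    return (earned_runs - runs_saved) * 9 / innings


ip = innings_from_notation("99.2")
# Using the full 1.21 clutch figure (12.1 runs):
print(round(leveraged_era(3.25, ip, 1.21), 2))  # 2.16
# Rounding the credit to 12 runs, as in the text:
print(round(leveraged_era(3.25, ip, 1.2), 2))   # 2.17
```

The one-hundredth difference between the two results comes only from rounding the run credit to "approximately 12" before applying it.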
Pettitte's clutch pitching within post-season games has been matched by his tendency to pitch his best in games where his performance is most critical in determining the outcome. In games in which the Yankees scored between 3 and 5 runs, Pettitte had a superlative 2.99 ERA and a record of 8-3. In games in which the Yankees scored between 2 and 4 runs Pettitte was even better: a 2.75 ERA and a record of 7-3. By contrast, since 1995 the record of home teams in League Championship Series when they score between 2 and 4 runs is 22-45. Pettitte's record in the post-season when receiving between 2 and 5 runs of support is a significant factor behind his 18-9 record and his ability to outperform his pythagorean projected record.
The World Series
Pettitte's World Series record has been a story of feast or famine. He's had two absolutely atrocious starts - game 1 of the '96 Series and game 6 of the '01 Series - in which he allowed a total of 13 earned runs in 4.1 innings. He's compiled a 2.70 ERA in his other 11 WS starts. Pettitte has actually experienced some pretty tough luck in the World Series, taking losses or no-decisions in four games in which he made quality starts and compiled a cumulative ERA of 2.02.
Perhaps most impressive about Pettitte's World Series record is the number of games in which he's turned in dominating performances. In the last 20 years there have been 24 World Series games in which a starting pitcher has pitched 7 or more innings and not allowed an earned run. Tom Glavine, Randy Johnson, Curt Schilling and Jack Morris each had one, for a combined total of four such games out of their cumulative 24 World Series starts. Greg Maddux and John Smoltz each turned the trick twice, for a combined total of 4 such games in their cumulative 13 World Series starts. Andy Pettitte has pitched four such games in his 13 World Series starts. Only four pitchers have had more than two 7 inning, 0 earned run World Series starts during the post-1920, live-ball era: Waite Hoyt and Bob Gibson, each of whom had three, and Whitey Ford and Andy Pettitte, each of whom had four. That's some pretty select company - only Hall of Famers need apply.
Pennant Races
Pettitte's September record when competing in a tight race for a division title or post-season berth is similarly exceptional. He's 26-9 in 51 starts with a 3.69 ERA (approx. a 125 ERA+). He's won four or more September starts in tight races three times: he was 5-1 for the Yankees in his rookie year of '95, 4-1 for the Yankees in '03, and 4-0 with a 1.86 ERA for the Astros in '05 as he and Roy Oswalt led Houston's charge into the World Series.
Pettitte's September record in title races is very similar to Seaver's and Palmer's:
Not much to choose between Andy Pettitte and these two first-ballot Hall of Famers when it came to pennant races. Pettitte wasn't a Palmer or Seaver from April to August, but in the two months of the baseball season that dominate the history books Andy Pettitte was the equal or better of many of the greatest pitchers in the history of the game. Not that Pettitte's regular season record is anything to sneeze at; it compares quite well with the careers of many recent HOF inductees, such as Hunter, Drysdale, Jenkins and Bunning, as well as pitchers like Jack Morris and Bert Blyleven who are within striking range of induction. An argument for Pettitte's elevation to the Hall is not an exercise in incrementally loosening the HOF criteria by setting the bar at the level of the most dubious previous inductee, unless one takes the untenable position that none of Hunter, Drysdale, Bunning, Jenkins, Haines, Pennock, Hoyt, Gomez, Sutter, Coveleski, Lyons, Bender and Chesbro really belong in the Hall. If this is the position of those who would oppose Pettitte's induction into the Hall, then the debate is not about whether to lower HOF standards but whether HOF standards should be radically raised.
Pettitte's argument for the Hall is the same as that for so many HOF inductees: he has remained for many years among the first rank of his contemporaries if not the top handful, he contributed significantly to great teams, and he distinguished himself in the September and October games that matter the most.
One last data point to consider. Pettitte will likely approach Morris's career win total, if not pass it. He will have a significantly better winning percentage, a significantly superior ERA+, and his pennant race and post-season records will be not only superior to Morris's but vastly deeper as well. If, as appears likely, Morris breaks the 50% mark in HOF balloting within the next five years, then Andy Pettitte deserves more than the 75% necessary to gain entry into Cooperstown.
Superchief
Allie Pierce Reynolds was the unquestioned ace of teams that won six World Series. He is not in the Hall of Fame. This has always struck me as extremely odd. Bizarre, even. If the ace starting pitcher on the team that won five consecutive World Series isn't a Hall of Famer then I'm missing something. But Superchief missed by just one vote in the December 2008 balloting by the pre-1943 Veterans Committee, garnering 8 of the 9 votes required for induction. That's good news for Reynolds, and it might be good news for Ron Guidry, too.
Yes, Reynolds played on great teams. Yes, he pitched to Yogi Berra for his entire Yankee career. Yes, the centerfielders patrolling Yankee Stadium's vast center-left expanse were Joe DiMaggio and then Mickey Mantle. But consider the following: Reynolds was 7-2 in 15 World Series appearances, nine of which were starts. His ERA was 2.79, which equates to an ERA+ of approximately 140.
Only one of the Yanks' pennants was won handily during their streak of five straight world championships; the 1953 team led the league by ten or more games for most of September. The '49 to '52 teams each prevailed in very tight races, generally besting Indians and Red Sox teams that were themselves stocked with all-stars and Hall of Famers. Reynolds pitched brilliantly down the stretch in those pennant races, winning four September games each of those years. The Yankees played 157 games in September in their six world championship years during Reynolds' career, a number almost exactly equivalent to one full season, and Reynolds won 22 games in those Septembers.
Reynolds lost the openers to the '51 and '52 World Series and each time followed with a complete game victory in Game 4 to even the Series, saving the Yankees from falling into an all but insurmountable deficit and sparking Yankee comebacks on their way to another world championship.
Quite simply, Allie Reynolds was the greatest big-game pitcher of his era. As great as those Yankee teams were, it was Reynolds and his rotation mates, Vic Raschi and Eddie Lopat, who were the key to the Yanks' success in the five World Series between '49 and '53. Whether it was the World Series or the heat of a September pennant race, Superchief was at his best, and without him the Yankees' historic world championship tally of the late '40s and early '50s would have been fewer by two and perhaps more.
Reynolds may have won only 182 regular season games in his career, but I'm willing to bet that the biggest winners and Hall of Famers of his era - Spahn, Wynn, Roberts, Lemon and Feller - would gladly trade a huge chunk of their career win totals for just a few of Superchief's World Series rings. And I'm willing to bet that the multitude of Hall of Famers on the Dodgers, Giants, Indians and Red Sox teams who competed against Superchief would agree that he is more than worthy of induction into the sacred Hall.
Reynolds' success in recent Veterans Committee balloting may be good news for Guidry because their careers are so remarkably similar. They had similar career lengths, with Reynolds pitching 100 more innings than Guidry. Each periodically pitched out of the bullpen. Each had a huge impact on numerous tight pennant races. Each was the ace of teams that won multiple world championships. Guidry's teams didn't have quite the level of success of Reynolds' Yankee teams, but on the other hand Guidry compiled a slightly superior regular season record in terms of ERA+, winning percentage and number of league leading performances.
If Reynolds is inducted it will plainly be because of his outstanding pennant race and post-season performances, particularly his role in leading the Yankees' stretch drives in the years '49 to '52. If these factors carry Superchief into the Hall then they should militate for Guidry's induction as well, because even Superchief must take a backseat to Ron Guidry when it comes to dominating pennant race performances. Reynolds' record in pennant races is notable for its consistency; like Guidry, Allie had five outstanding September performances in the midst of white hot pennant races. But Guidry's record exceeds Reynolds' in two respects: Guidry never stumbled in a pennant race, whereas even Superchief had a tough finish in '48 while the Yanks were chasing the Indians; and Reynolds' Septembers, while superlative, were never as dominating as Guidry's epic performances in '77 and '78.
The similarities don't end with the numbers. Both Reynolds and Guidry were quiet, stolid leaders, respected by their teammates for combining an unflinching competitive fire with an unflappable demeanor. Each let their play on the field do the talking. Neither liked talking about himself. As the Associated Press noted in its article on the subject of Reynolds' passing in 1994:
The late Dale Mitchell, who played with Cleveland, once said Reynolds might not have made the Hall because he refused to promote himself. "He's not that kind of guy," Mitchell said. "But I'll tell you one thing: In Yankee Stadium in September with that fastball, there wasn't anybody ever lived who was any tougher. With those shadows, we were like ducks in a shooting gallery."

Both Reynolds and Guidry exemplified the team-first ethic, the value of which can't be measured by statistics. Like Guidry, Reynolds graciously accommodated the spot-relief role periodically assigned to him, acceding to manager Stengel's strategy for the good of the team. When asked about his failure to make the Hall, Reynolds expressed his preference for winning over personal accolades.
"I'm kind of indifferent now about whether I make the Hall of Fame," he said. "If it happens, it happens. I'm pretty much laid back on that. They've got to have some kind of rules. I knew that was going to happen with all the relief work I did for the Yankees. That really was a career-shortener. But to me, that was important. Teamwork was more important than some kind of honor."

As was the case with Guidry, pitching from the bullpen when the situation demanded it probably cost Reynolds more than one 20-win season. Superchief averaged more than 10 relief appearances per season with the Yankees, as a consequence never making more than 31 starts in a season. Stengel's strategy was a huge success for the Yankees, less so for Reynolds personally.
It appears as if the Veterans Committee may be prepared to finally look beyond the sterile statistics and recognize Allie Reynolds' contribution to six World Series winners. And if they do, they should take a close look at Ron Guidry, too, whose career win total doesn't capture his role as the ace pitcher for the most successful American League teams of his era. As the plaque in Yankee Stadium's Monument Park puts it, Guidry was "a respected leader of the pitching staff for three American League pennants and two world championships. A true Yankee."
The plaque really says it all. If it's the only plaque he ever gets, there's no doubt Guidry will be just fine with that. But he deserves another plaque, one that will hang in Cooperstown.
ERA+: Looking Behind The Stat
ERA+ is a great analytical tool. It permits comparisons of ERAs across different eras and different run environments by adjusting for general league scoring levels and park factors. Its advantages over simple ERA are obvious. It is the single pitching statistic most often regarded as the definitive tool for analyzing pitching careers. Some stat geeks have become so enamored of ERA+ and its derivatives that they deny certain baseball truisms that might call into question the validity of judging pitchers primarily on the basis of ERA+. They tend to deny the concept of clutch pitching, despite the fact that certain pitchers evince a tendency to pitch measurably better or worse in high leverage situations (see this post for a discussion of Leveraged ERA+, or LevERA+, which weights runs allowed (and runs prevented) based on the impact on win expectancy). They also tend to discount the theory that most pitchers "pitch to the score" by changing their pitching approach depending on the game situation.
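For readers who want the mechanics, here is a minimal sketch of the adjustment ERA+ makes. It assumes the common simplified form, 100 times the park-adjusted league ERA divided by the pitcher's ERA. The function name, the `park_factor` convention (1.0 for a neutral park) and the sample numbers are illustrative assumptions, not figures from this post.

```python
def era_plus(era, league_era, park_factor=1.0):
    """Simplified ERA+: 100 * (league ERA * park factor) / pitcher ERA.

    park_factor is an assumed convention: 1.0 is a neutral park,
    above 1.0 a hitter-friendly park, below 1.0 a pitcher-friendly one.
    """
    if era <= 0:
        raise ValueError("era must be positive")
    return 100.0 * league_era * park_factor / era


# Hypothetical inputs: a 3.00 ERA in a league averaging 4.20, neutral park.
print(round(era_plus(3.00, 4.20)))
```

An ERA+ above 100 means the pitcher allowed earned runs at a lower rate than the adjusted league average. Note that nothing in the formula knows when the runs were allowed, which is the limitation discussed here.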
A host of statistics confirm that most pitchers do indeed pitch to the score. Pitchers as a group subscribe to the theory that when granted a big lead it is better to put the ball over the plate and make the opposition hit their way back into the game rather than risking a rally fueled by bases on balls. Virtually all successful pitchers walk fewer batters when working with a significant lead. Virtually all pitchers, successful or not, surrender more runs when working with tremendous run support from their teammates. Baseball-Reference.com recently added pitching splits based on team run support, showing a pitcher's performance in games in which they received between 0 and 2 runs of support, 3 to 5 runs of support and 6 or more runs of support. The vast majority of pitchers will surrender more runs on average when working with 6 or more runs than they do when working with 5 or fewer runs. The run-support splits further confirm that variations in ERA in high run-support scenarios have little or no impact on a pitcher's winning percentage in these scenarios, with good pitchers winning between 90% and 95% of these decisions regardless of how much their ERAs increase with great run support.
These statistics don't reveal defects in the ERA+ statistic but rather reveal the limitations of the statistic. They reveal that the ERA+ of pitchers who are blessed with generally superior run support, like Jack Morris, may be misleading. In games in which Morris received six or more runs of support he allowed 18% more earned runs than he did when working with 3 to 5 runs of support. This didn't prevent Morris from winning 93.3% of his decisions in these games, approximately the same percentage as pitchers who had much smaller increases in ERA in similar situations. The incremental runs allowed by Morris in high run-support games significantly inflated his ERA and ERA+ but had virtually no impact on game outcomes or his teams' fortunes.
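The "18% more earned runs" comparison is just a percentage change between two split ERAs. A minimal sketch follows, with made-up split lines chosen only to reproduce an 18% gap; they are not Morris's actual run-support splits, and the function names are my own.

```python
def era(earned_runs, innings_pitched):
    """Earned run average: earned runs allowed per nine innings."""
    return 9.0 * earned_runs / innings_pitched


def pct_change(era_high_support, era_mid_support):
    """Percent by which ERA with 6+ runs of support exceeds ERA with 3-5 runs."""
    return 100.0 * (era_high_support - era_mid_support) / era_mid_support


# Hypothetical split lines (NOT real data): same innings, more runs with big support.
mid_support_era = era(earned_runs=100, innings_pitched=300.0)
high_support_era = era(earned_runs=118, innings_pitched=300.0)
print(round(pct_change(high_support_era, mid_support_era), 1))
```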
Morris is representative of most elite starting pitchers in this regard. They tend to allow significantly more runs when they have good run support to work with. The following list shows the percentage by which these pitchers' ERAs increased or decreased in games in which they received 6 or more runs of support (relative to games in which they received 5 or fewer runs of support).
Obviously, for a given ERA (or ERA+) the optimal distribution of runs allowed by a pitcher would have the pitcher allowing the fewest runs in games in which his run support was weak and the most runs in games where his run support was strong. Pitchers who pitch relatively better where their run support is particularly weak or strong see little benefit to their winning percentages; even the best pitchers in the lowest run scoring environments will win less than 25% of their decisions when they receive 2 or fewer runs of support, and even average pitchers will generally win nearly 90% of their decisions in games in which they receive 6 or more runs of support. The impact of a pitcher's performance is greatest in those games where his run support is in the middle range - three to five runs of support - and those pitchers who pitch well in those games see the most beneficial impact on their winning percentages.
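The point about the middle range can be illustrated with a rough sketch. It borrows the Pythagorean expectation (runs scored squared over the sum of runs scored squared and runs allowed squared) as a crude stand-in for single-game win probability. That formula is designed for team seasons, not individual games, so this is an assumption made purely for illustration and not the method behind the figures in this post. The run totals are hypothetical.

```python
def pythag_win_pct(runs_for, runs_against, exponent=2.0):
    """Pythagorean expectation, used here as a rough proxy for win probability."""
    rf = runs_for ** exponent
    ra = runs_against ** exponent
    return rf / (rf + ra)


def swing(run_support, good_ra=3.0, bad_ra=4.0):
    """Win% gained by allowing good_ra runs instead of bad_ra at a given support level."""
    return pythag_win_pct(run_support, good_ra) - pythag_win_pct(run_support, bad_ra)


# Hypothetical support levels: weak, middling and strong.
for support in (1.5, 4.0, 8.0):
    print(support, round(swing(support), 3))
```

Under these assumptions, saving one run is worth roughly twice as much in win percentage with middling support as it is with either weak or strong support.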
Palmer pitched in a slightly lower run-scoring environment in Baltimore, and accordingly 3 to 5 runs represented slightly better run support than the same number of runs when scored in the parks Blyleven pitched in during the '70s. However, this potential mitigating factor is offset by the fact that Blyleven received better run support overall when receiving 3 to 5 runs of support, getting an average of 3.93 runs/game as compared to Palmer's 3.77 runs/game. After adjusting for the different scoring environments, the run support received by each within the 3 to 5 run category is almost precisely the same. The huge disparity in their winning percentages when receiving between 3 and 5 runs of support cannot be explained by disparate run support, and is almost solely a function of the fact that Palmer pitched significantly better when receiving middling run support.
Blyleven had a slightly better ERA+ than Palmer when receiving 6 or more runs of support, but winning percentage in this category is largely inelastic (meaning that it doesn't vary much even with significant fluctuations in ERA+). Palmer lost only one such game in the '70s, Blyleven lost two. Blyleven also had a better ERA+ than Palmer when receiving between 0 and 2 runs of support, but Palmer had a significantly better winning percentage, .267 to Blyleven's .211. Palmer's advantage when receiving weak run support can be explained by Palmer's far superior record in one-run games, which will constitute a significant percentage of games in which a pitcher receives two or fewer runs of support.
As the Palmer/Blyleven comparison demonstrates, relatively similar ERA+ figures can mask significant differences in pitcher performance. Although Palmer's ERA+ in the '70s was only marginally better than Blyleven's, Palmer's substantially better performance in high leverage situations and better performance in those games where pitcher performance is most likely to affect the outcome (i.e., the 3 to 5 run support category) produced a substantially better W-L record.
Ron Guidry. Guidry pitched much better in higher leverage situations, compiling a LevERA+ more than five points higher than his nominal ERA+. Guidry also pitched significantly better in games where he received 3 to 5 runs of support, compiling an ERA+ in those games of 130.5 as compared to an overall ERA+ of 119 and an ERA+ of 109.4 in games in which he had run support of 6 runs or more.
John Tudor. Tudor had nearly a 129 LevERA+ (as compared to a 124 ERA+). He also excelled in matching his performance to the game scoring environment, pitching his best in lower scoring games while allowing more runs in high run support scenarios.
Whitey Ford. Ford's LevERA+ of 137 was even more impressive than his outstanding 133 ERA+. Ford also allowed nearly 9% fewer runs when receiving 5 or fewer runs of support than he did with 6 or more runs of support.
Tommy John. John's 114 LevERA+ was approximately three points higher than his ERA+, and his ERA was nearly a full run higher when receiving support of 6 runs or more than when he was working with 5 runs or fewer. His ERA in high run support scenarios hurt his ERA and ERA+ but not his winning percentage, and accordingly his ERA+ is deceptively low.
Juan Marichal. Marichal had a slightly higher LevERA+ than ERA+, 125 to 123, and he allowed approximately half a run more when supported with 6 or more runs than he did when working with 4 to 5 runs. Between his fine clutch pitching and his tendency to allow insignificant runs when working with great run support, Marichal's 123 career ERA+ is deceptively low.
On the other end of the spectrum - the Blyleven end, so to speak - Dave Stieb, Curt Schilling, Orel Hershiser and Steve Rogers are notable examples of pitchers whose LevERA+s were lower than their ERA+ and who tended to pitch better when graced with huge run support than they did in games in the critical 3 to 5 run support category. Like Blyleven, their ERA+ figures don't tell the full story.
In short, any apparent comparability between Bert Blyleven's performance in the '70s and Jim Palmer's is illusory. Palmer was clearly the better pitcher and it's not even particularly close. This may not be apparent if one looks only at ERA+, but one doesn't have to look too hard behind the ERA+ stat to learn that while they may have allowed a similar number of runs, Palmer generally allowed them when he could afford to and Blyleven too frequently allowed them at the worst possible times. This fact, not disparate run support, accounts for the huge difference in their W-L records. ERA+ won't tell you that. It's still an important measure of pitching performance, but there are now statistics readily available that, when viewed together with ERA+, give a much fuller and more accurate picture of a pitcher's performance.
_____________________
* Koufax's +59% figure is an anomaly produced by the fact that Koufax played in wildly disparate scoring environments, pitching in distinctly hitter-favorable parks until '62, and then switching to the pitcher friendly Dodger Stadium just as he was hitting his stride. As a consequence, a disproportionate number of games in which Koufax received 6 or more runs of support occurred early in his career when he was not yet the Koufax of legend, and this significantly skews the numbers.
Clutch Septembers of the '20s and '30s
I've discussed the great pennant race performances of pitchers over the last 50 years. It's time to look at some of the legendary pennant race performances from long ago. These performances help to explain why certain pitchers with conspicuously thin career qualifications for the Hall were nonetheless inducted into the Hall of Fame. They also help to explain why some pitchers who were never seriously considered for the Hall are nonetheless revered by the oldtimers. Some names will be very familiar, others less so. But each of these pitchers put together performances in the heat of pennant races that lifted their teams to glory.
Dizzy Dean, 1934
Let's start with Dizzy Dean. Everybody knows about Dizzy's 30-win season for the Gashouse Gang in '34. They might not recall, however, that it was Dizzy's performance in August and September of '34 that made him a national figure and a baseball legend, as Dizzy led the Cards' comeback to catch the defending World Champion Giants.
From August 1 to the end of the season, Dizzy went 12-3 with 3 saves, posting an incredible 1.48 ERA in 155.1 innings and winning his last nine starts in a row. Dean's fast finish not only brought a pennant to the Gashouse Gang, it permitted him to win 30 games - the last time a National League pitcher would ever accomplish that feat.
Dean topped off his dream season by winning two of his three starts against the Tigers in the World Series, including the clincher in game seven. There is no question that Dean's superhuman achievements during the Cards' dash to the NL pennant in '34 form the bulk of the Dean legend and were a significant part of his elevation to the Hall. Without that performance, and the 30-win season that resulted from the Cards' decision to pitch Dean every other day down the stretch, it's likely Dizzy wouldn't be in the Hall.
Jesse Haines, 1928
The Cardinals were perennial contenders in the mid- and late '20s, and Jesse Haines was their ace. After spending most of 1926 as the Cards' No. 3 starter, Haines came to the fore in the legendary World Series matchup with the Yankees of the Murderers Row era, pitching a complete-game shutout in game 3 and winning the decisive 7th game with a 6.2 inning, two run effort against Ruth, Gehrig and Co.
Haines was the unquestioned ace of the Cards staff in 1927, going 24-10 as the Cards narrowly missed winning another NL pennant. Haines pitched brilliantly in August, helping the Cards keep pace with the Pirates and Giants in a torrid three-way race, but stumbled in September and the Cards came up short. Haines redeemed himself in 1928, however, putting together a pennant race performance that ranks among the best in baseball history.
The Cards were just half a game in front on August 24th when Haines took the mound against the Phillies. Haines' shutout against the Phils triggered a five-game winning streak that extended the Cards' lead to 5.5 games by August 28th. But the lead slowly dwindled through early and mid-September and remained between one and two games for much of the last two weeks of the season. As the Cards were trying to hang on, Jesse Haines was the Cards' personal life preserver. Beginning with his win against the Phils on August 24th, Haines reeled off eight consecutive complete game victories, compiling a 1.38 ERA over that stretch. Three of Haines' last four starts came with the Cards up by one game or less. Haines didn't allow as many as three earned runs in any of those eight starts until the last one, when he beat the Boston Braves to keep the Cards up by one with three games to go.
The Cards held on to win the NL pennant but were swept by the Murderers Row Yankees in the World Series. Haines started and lost game 3 of the Series, after two errors on one play by Cards catcher Jimmie Wilson led to three Yankees runs that broke a 3-3 tie in the sixth. Even with this loss, however, Haines' numbers against the great Yankee lineups in the '26 and '28 World Series are impressive: in four appearances against Murderers Row in those two World Series, Haines won two of his three starts and put up a 1.99 ERA. Haines added a complete-game four-hitter against the A's in the 1930 World Series, and finished his World Series career with a 3-1 record and 1.67 ERA, World Series stats virtually identical to Ron Guidry's.
Jesse Haines' 210 career wins and .571 winning percentage didn't much impress the BBWAA during the '50s and early '60s, but Jesse finally made the Hall in 1970 because enough Veterans Committee members remembered his central role on those Cardinals teams that fought Murderers Row to a draw in the '26 and '28 World Series.
Big Bill Lee, 1938
Big Bill's remarkable stretch drive in the great NL pennant race of 1938 has been largely overshadowed by Gabby Hartnett's legendary "homer in the gloamin'" that gave the Cubs a crucial victory over the Pirates just as umpires were preparing to call the game due to darkness. Hartnett's homer, however, wouldn't even be a footnote to history but for Lee's astounding September performance because the Cubs would have already been eliminated.
The Pirates entered September with a fairly comfortable lead over the Cubs, Giants and Reds, who appeared to be in a tight race for 2nd place. The Pirates faltered in early September, however, and by September 14 the four teams were separated by just 3.5 games. Lee began September by shutting out the Pirates. He then pitched shutouts against the Reds and Giants, helping to move the Cubs into 2nd place just 2.5 games behind the Pirates. Lee pitched a fourth consecutive shutout on Sept. 22nd against the Phillies, but the Cubs were still 3.5 games back with 13 to play. Lee's scoreless streak was finally snapped by the Cardinals on Sept. 26, but Lee pitched his fifth straight complete-game victory. The Pirates arrived in Chicago the next day with a 1.5 game lead to begin a three game series with the 2nd place Cubs. The stage was set for one of the most remarkable finishes in NL history.
A diminished but still formidable Dizzy Dean was tapped by the Cubs to pitch the first game against the Bucs. Dean's arm was no longer what it was, damaged as a result of his attempt to compensate by overthrowing after a line drive in the '37 All-Star game broke a toe on his landing foot and restricted his ability to follow through. Dizzy was only a once-a-week pitcher for the Cubs in '38, but when he pitched he was spectacular, taking a 6-1 record and 1.91 ERA to the mound to face the Pirates. Dean pitched brilliantly against the Pirates and took a 2-0 lead into the ninth. Dizzy had runners on 2nd and 3rd with two outs in the ninth when Hartnett, the Cubs manager, waved in Bill Lee. Lee promptly threw a wild pitch that allowed the runner to score from third, but with the tying run just 90 feet away Lee struck out Pirate catcher Al Todd to end the game. The Cubs were just half a game behind the Bucs.
The next day the Cubs and Pirates were tied 3-3 when the Pirates scored two runs in the 8th inning off Cub pitcher Larry French to take a 5-3 lead. Hartnett brought in Big Bill with no one out in the 8th to stem the rally and Lee managed to finish the inning without permitting further damage. It was Lee's third appearance in three days. The Cubs responded with two runs in the bottom of the 8th to tie the game 5-5, and Lee, who was slated to start the next day's game, was replaced by Charlie Root to pitch the top of the 9th. Root held the Pirates scoreless in the 9th, and the rest is history. Hartnett's bottom of the 9th shot in the gathering darkness at Wrigley Field remains one of the most famous home runs in baseball history.
The Cubs were now in first place for the first time since early June. Lee took the mound to make his fourth appearance in four days; his last start had been just three days prior. The Cubs, perhaps conscious of the fact that Lee was running on fumes, scored three runs in the bottom of the first to take a quick lead. By the end of the fifth inning the Cubs had an 8-1 lead, having pounded the Pirates pitching trio of Bauers, Brandt and Blanton. The Cubs won the game 10-1 to finish the series with the Bucs with a 1.5 game lead. Lee recorded his sixth complete game victory in September. For the month, Lee was 6-0 with two saves and a microscopic 0.64 ERA. He had started four games against the other contenders in the NL race and won them all, with wins over the Pirates bookending his month. The Cubs held on to win the pennant, maintaining their lead over the Pirates for the last three games of the season.
Lee started the first and fourth games of the World Series for the Cubs against the Yankee juggernaut manned by a roster of Hall of Famers. Lee pitched well but to no avail, surrendering just three earned runs in 11 innings against the likes of Gehrig, DiMaggio, Dickey, Gordon and Henrich, but losing both games. Ruffing and Gomez were too much for the Cubs batters, and the Yankees swept the Series.
When one considers Lee's iron-man performance against the Pirates in late September and the fact that three of his four September shutouts came against other contenders, Big Bill's pennant race performance for the Cubs in '38 might be the most spectacular in National League history.
A Recipe For Catfish
Catfish Hunter is frequently cited by the stat geeks as a prime example of an unworthy HOF inductee. He doesn't have a plaque at the Baseball Think Factory's Hall of Merit, where Dave Stieb, Bret Saberhagen and Wes Ferrell are enshrinees. Hunter's ERA+ is presumably the problem the HOM balloters have with Hunter. It can't be the 224 career wins, since Stieb, Saberhagen and Ferrell each have significantly fewer. I've offered my explanation for Hunter's induction into the HOF, an induction I believe was more than worthy. I thought I'd look at Catfish's Team Relative performance.
During his ten-year prime from '67 to '76 Catfish outperformed his team by 11.3%. That's not a very good figure for a Hall of Famer, and I wasn't particularly surprised by it. What I was surprised by was Hunter's Team Relative index for his five-year prime of '71 to '75, which covers the A's World Series years and his first season with the Yankees. I apparently had assimilated the argument of the stat geeks that Hunter's record during that period was purely a function of pitching for a great team and getting huge run support. Not true, as it turns out. Hunter's Team Relative index for that five-year period is 28%. If you remove the '75 season, Catfish outperformed his A's teams by 29.3%. And if you limit the analysis to just the three World Series championship years with the A's, Catfish's Team Relative index was 34.6%.
To be clear, I'm not arguing that Catfish didn't benefit from great run support. He did. And I'm not arguing that Catfish would've had five consecutive 20-win seasons if he'd played for Blyleven's Twins teams in the '70s. What I am arguing, however, is that the claim that Hunter's great record during this period was just a function of great run support from a great team is demonstrably untrue. Take away the great run support and Hunter was still outperforming his team by 28% over a five-year period and a robust 34.6% during the A's championship years. Those are Hall of Famer-type numbers, albeit for a relatively brief period. It is simply a myth to argue that any pitcher with a Team Relative index like Hunter's was merely a product of great run support and great teams.
Let's look at another pitcher generally dismissed by the stat geeks as a mere product of great run support: Jack Morris. Morris's Team Relative index during his peak nine-year period of '79-'87 was 15.8%, not much by HOF standards but right there with Bunning's 16% index for his 11-year peak. That means if Morris had played for an average hitting team with a .500 record he still would have posted a .579 win% over those nine years. I think it's fair to conclude therefore that Morris's actual winning percentage of .615 during his peak was perhaps 30% attributable to his run support; the bulk of the credit, however, has to go to Morris. If I'm not mistaken, Morris detractors would look at his 105 ERA+ and conclude that Morris's .577 career winning percentage was attributable 95% to his superior run support. This is plainly not the case. The Team Relative analysis demonstrates that Morris was able to perform far above the standard a career ERA+ of 105 would typically indicate.
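The arithmetic behind the .579 and "perhaps 30%" figures can be sketched as follows. The full definition of the Team Relative index isn't restated in this post, so the code assumes the simple reading implied by the paragraph above: the index scales a .500 baseline multiplicatively, and the share credited to run support is the gap between actual and projected winning percentage divided by the total margin over .500. Both function names are hypothetical.

```python
def projected_win_pct(team_relative_index, baseline=0.500):
    """Win% on a baseline team, assuming the index scales the baseline multiplicatively."""
    return baseline * (1.0 + team_relative_index)


def share_from_run_support(actual_win_pct, projected, baseline=0.500):
    """Fraction of the margin over the baseline NOT explained by the projection."""
    return (actual_win_pct - projected) / (actual_win_pct - baseline)


# Figures quoted in the text: 15.8% index, .615 actual peak winning percentage.
morris_projected = projected_win_pct(0.158)
morris_share = share_from_run_support(0.615, morris_projected)
print(round(morris_projected, 3), round(morris_share, 2))
```

Under these assumptions the projection lands on .579 and the run-support share comes out a little above 30%, in line with the estimate in the text.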
It's no mystery why Catfish is in the Hall. He's in for the same reason Waite Hoyt, Jesse Haines, Lefty Gomez, and Red Ruffing are in the Hall despite falling well short of 300 wins, and for the reason Curt Schilling will make the Hall. They excelled on the big stage and made a huge impact for great teams. They put their imprint on legendary pennant races and World Series contests. That counts for a lot in HOF balloting, and it should.
The Celebrated Mr. K
His blazing five-year stretch from '62-'66 has become the standard by which all other great pitchers are measured. The Gold Standard. The definition of pitching dominance. Anyone who considers a new mode of analyzing pitching greatness has to insert his five peak seasons into the formulas and see what comes out. If you plug into your formulas his stats from these five seasons, during which he won five straight ERA titles, three pitching triple crowns and three 25+ win seasons in four years, and a historic result doesn't come out the other end, then maybe you need to double check your methods and formulas.
From '62 to '66 Sandy Koufax outperformed his team by 41%. If you exclude the '62 season, where Koufax's injury and the Dodgers' decision to rush him back into the rotation in late September significantly skew the numbers, then Koufax outperformed his team by 49.5% from '63 to '66*. That's Randy Johnson territory. A 50% Team Relative performance over a period of years could be known as the Sandy-Randy Standard.
Randy Johnson's peak period by Team Relative analysis was actually the five-year stretch from '93 to '97, which includes his injury-shortened '96 season when he went 5-0. It also includes the strike-abbreviated '94 and '95 seasons. Over that five-year period Johnson's Team Relative performance was 58.2%. History suggests, however, that Johnson would not have maintained the .920 winning percentage he compiled in '95-'96 had he pitched full seasons. Johnson's true peak, as measured by wins, ERA+ and most other measures, actually occurred with the D'backs from '99 to '02, and he compiled a 49.9% Team Relative performance during that period.
Maddux compiled a 52.6% Team Relative performance from '94 to '97, but that period also included two strike-shortened seasons.
Guidry's Team Relative performance over his three-year peak from '77 to '79 was 40%. Seaver had a 44.2% Team Relative performance for four years between '68 and '71. If one excludes Gibson's injury-shortened '67 season, Gibson maintained a 41.1% Team Relative performance from '65 to '70. If one excludes Marichal's injury-shortened '67 season, he maintained a 33.8% Team Relative performance from '63 to '69.
For Schilling's three 20-win seasons - '01, '02 and '04 - he had a Team Relative performance of 48.2%. For Guidry's three 20-win seasons he had a Team Relative performance of 43.5%.
Team Relative analysis confirms that Koufax's great run was indeed among the very best four or five year stretches in baseball history. Throw in the huge innings totals Koufax put up in these years, the no-hitters, strikeout records, pennant race and post-season performances, and it's clear why Mr. K became a legend.
_________________________
* Koufax's best season by far as measured by Team Relative performance was his injury-shortened '64 season, when he posted a 19-5 record for a Dodger team that was truly terrible but for Koufax; the team compiled a .442 winning percentage in games in which Koufax was not the pitcher of record. Koufax outperformed that team by nearly 89%.