The Theory of Relativity
I love a lot of the new pitching stats. They're great analytical tools. Take FIP, for example ("fielding independent pitching"). It's based on the proposition that what happens on a ball put in play is frequently a function of random chance and team fielding. Bill James recognized its utility and cited Wally Bunker's 1964 season as an example of a pitcher apparently benefiting from some good luck insofar as his BAbip that year was .216. It turns out that Bunker in fact had a pretty good facility for generating low BAbip's in his career, presumably because, like Maddux in his prime, he was adept at keeping the ball away from the fat part of the bat and inducing batters to hit pitches outside the hitter's sweet spots in the strike zone. But Bunker never again came close to posting the .216 BAbip he posted in '64, despite being backed by the legendary team defense of the '60s Orioles.
As Bill James has noted regarding FIP and various other new and sophisticated measures of pitching performance, they have a tendency to throw out a lot of information in an effort to isolate and identify a pitcher's performance independent of non-pitching factors. Bill is a little unsettled by this, and so am I. As he's argued, W-L records are the antipode to FIP and similar stats, incorporating all information, including unfortunately things that have nothing to do with a pitcher's performance, like offensive support and team fielding. However, the inclination of the stat geeks to summarily dismiss W-L records is extremely misguided. It is possible to start with W-L records and make appropriate adjustments, and that's what I'm about to propose.
The Theory of Relativity, in contrast to FIP, throws out nothing but attempts to adjust for everything (or at least most things) that happens outside of the pitcher's performance. Simply put, it compares a pitcher's W-L record to his team's record in games where the pitcher was not the pitcher of record (i.e., it subtracts the pitcher's W-L record from the team's), adjusting for factors that affect the pitcher's and team's W-L records but are largely unrelated to the pitcher's own performance. If a pitcher received run support better or worse than the run support a team generally provided its pitchers, the pitcher's W-L record is adjusted (via the Pythagorean theorem) to reflect what his W-L record would have been had he received run support equal to his team's average. It also adjusts for the performance of the rest of the team's pitching staff, because even a good pitcher who receives excellent run support will appear to fare poorly relative to his team's W-L record if the rest of the starting pitching staff consists of Walter Johnson, Pete Alexander, Tom Seaver and Randy Johnson, with Gossage, Eckersley and Rivera coming out of the bullpen.
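To make the run-support step concrete, here is a minimal sketch in Python. The exact adjustment formula isn't spelled out above, so this assumes one plausible reading: compute the pitcher's Pythagorean W% with his actual run support and again with team-average support, then shift his actual W% by the difference. The function names, the exponent of 2 and the sample inputs are all illustrative assumptions, not the numbers used in this post.

```python
def pythagorean_wpct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Pythagorean expectation: RS^x / (RS^x + RA^x). Exponent 2 is assumed here."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def support_adjusted_wpct(actual_wpct: float,
                          support_per_game: float,
                          team_support_per_game: float,
                          runs_allowed_per_game: float) -> float:
    """Shift the pitcher's actual W% by the change in his Pythagorean W% when
    his own run support is swapped for the team-average run support.
    This shift rule is an assumption, not a confirmed formula."""
    with_actual = pythagorean_wpct(support_per_game, runs_allowed_per_game)
    with_average = pythagorean_wpct(team_support_per_game, runs_allowed_per_game)
    return actual_wpct + (with_average - with_actual)


# Hypothetical pitcher: .600 W%, 4.40 runs of support per game against a
# team average of 4.45, allowing 3.60 runs per game.
adjusted = support_adjusted_wpct(0.600, 4.40, 4.45, 3.60)
print(f"adjusted W%: {adjusted:.3f}")
```

A pitcher whose support was below team average comes out with a slightly higher adjusted W%, and one whose support was above average comes out slightly lower.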
These pitching staff adjustments are accomplished by taking the team's ERA+ (exclusive of the subject pitcher's own ERA+) and adjusting the team's W-L record to reflect what it would have been had the rest of the staff generated a 100 ERA+ (again, based on the Pythagorean theorem). It simply takes the team's ERA+ exclusive of the subject pitcher's ERA+, calculates the runs allowed or saved by the staff's performance above or below the assumed 100 ERA+, and adds or subtracts those incremental runs to the team's runs allowed. A Pythagorean record is then generated assuming a league-average staff.
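The staff adjustment can be sketched the same way. This assumes ERA+ is roughly 100 times league ERA over staff ERA (ignoring park effects), so a staff with a 106.9 ERA+ would allow about 6.9% more runs if it pitched at a 100 ERA+. Applying an earned-run ratio to total runs allowed is a simplifying assumption, and the run totals below are hypothetical.

```python
def pythagorean_wpct(runs_scored: float, runs_allowed: float, exponent: float = 2.0) -> float:
    """Pythagorean expectation: RS^x / (RS^x + RA^x). Exponent 2 is assumed here."""
    rs = runs_scored ** exponent
    ra = runs_allowed ** exponent
    return rs / (rs + ra)


def neutral_runs_allowed(runs_allowed: float, era_plus: float) -> float:
    """Scale runs allowed to what a 100 ERA+ staff would have allowed.
    Assumes ERA+ is a plain ratio of league ERA to staff ERA."""
    return runs_allowed * era_plus / 100.0


# Hypothetical season totals in games not decided by the subject pitcher.
rest_runs_scored = 600.0
rest_runs_allowed = 520.0
rest_era_plus = 106.9  # staff ERA+ excluding the subject pitcher

neutral_ra = neutral_runs_allowed(rest_runs_allowed, rest_era_plus)
print(f"incremental runs: {neutral_ra - rest_runs_allowed:.1f}")
print(f"actual-staff Pythagorean W%:  {pythagorean_wpct(rest_runs_scored, rest_runs_allowed):.3f}")
print(f"average-staff Pythagorean W%: {pythagorean_wpct(rest_runs_scored, neutral_ra):.3f}")
```

An above-average staff gets runs added, which lowers the team's adjusted W%. A below-average staff gets runs subtracted, which raises it.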
Once you've adjusted the pitcher's record for run support and adjusted the team's record for the rest of the pitching staff's performance, you compare the pitcher's adjusted W-L record to his team's adjusted W-L record. The impact of run support on the pitcher's W-L record relative to his team's is thereby eliminated, and the impact of the rest of the staff's performance on the team's W-L record is similarly eliminated. A good pitcher will have an adjusted W-L record much better than his team's adjusted record, and a poor pitcher will have a worse one. Measuring the difference between the adjusted records of the pitcher and the team provides a good measure of the pitcher's performance. It doesn't expressly adjust for team defense (a notoriously difficult aspect of team performance to measure), but it implicitly incorporates it because bad team defense will lower the denominator representing the team's W-L record and therefore increase the relative impact of the pitcher's W-L record (adjusted for run support) relative to his team's W-L record (adjusted for the performance of the rest of the pitching staff).
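The comparison itself is simple enough to sketch. Judging from the percentages quoted later in this post, the relative performance figure appears to be the ratio of the pitcher's W% to the team's W%, minus one; that reading is an assumption, as are the function names and the sample records below.

```python
def record_without_pitcher(team_w: int, team_l: int, pitcher_w: int, pitcher_l: int) -> tuple[int, int]:
    """Team W-L in games where the subject pitcher was not the pitcher of record."""
    return team_w - pitcher_w, team_l - pitcher_l


def wpct(wins: int, losses: int) -> float:
    """Winning percentage from a W-L record."""
    return wins / (wins + losses)


def relative_performance(pitcher_wpct: float, team_wpct: float) -> float:
    """Fraction by which the pitcher's W% exceeds the team's (0.20 means 20%).
    Treating the figure as a ratio is an assumption."""
    return pitcher_wpct / team_wpct - 1.0


# Hypothetical season: team goes 90-72, pitcher goes 20-10.
rest_w, rest_l = record_without_pitcher(90, 72, 20, 10)
edge = relative_performance(wpct(20, 10), wpct(rest_w, rest_l))
print(f"team without pitcher: {rest_w}-{rest_l}, relative performance: {edge:.1%}")
```

In the full method both inputs to `relative_performance` would be the adjusted W% figures rather than the raw ones used in this toy example.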
The concept of simply comparing a pitcher's record to his team's is not novel, but the defects in the system became apparent to me when I was comparing Phil Niekro's relative W-L record to Don Sutton's. Even if the records were adjusted for variations in run support, Sutton would still tend to fare poorly compared to Niekro because Niekro would benefit by being compared to the poor Braves pitching staffs of the '70s, while Sutton would suffer from being compared to the generally excellent Dodgers pitching staffs of the '70s. It was easy for Niekro to outperform the sub-average pitchers on the Braves staff, but more difficult for Sutton to outperform the Tommy Johns, Claude Osteens and Andy Messersmiths who generally populated the Dodger staffs. It's fairly easy, however, to adjust for this, and the conceptual validity of the adjustment should be obvious. Still, the process of collating the team pitching data from different years, incorporating it into the adjustment formulas and generating the Pythagorean adjustments is a little involved and so for the moment I'll only present an analysis of three pitchers: Tom Seaver, Ron Guidry and Dave Stieb.
I selected these three pitchers because I thought they would be illustrative. Bill James has noted how spectacular Seaver's winning percentage was given the generally mediocre nature of the Mets teams he pitched for in the late '60s/early and mid-70s. I selected Guidry because I knew that his record was spectacular even after accounting for the fact that the Yankees teams he pitched for were generally pretty good, but I didn't know how his relative record had been affected by his run support and the quality of the Yankee pitching staffs. And I selected Stieb because (i) I knew that he had significantly underperformed relative to Pythagorean projections during his prime years in the early and mid-80s, and (ii) I was tired of beating up on Bert Blyleven. (I knew Blyleven also underperformed his Pythagorean projections in his prime, but I genuinely like the guy and he was by many measures a borderline great pitcher - certainly better than Stieb - albeit not a Hall of Famer).
I compared nine-year peaks for each of the pitchers. This was convenient because both Guidry and Stieb had distinct nine-year peaks that account for all of their superior seasons. One could select various nine-year periods for Seaver, because his peak extended well beyond nine years, but I selected his first nine seasons, comprising substantially his entire Met career. I'll begin the comparison by noting some things that you probably already know. For instance, the Mets were not a good team once you subtract Seaver, notwithstanding their two NL pennants and their '69 World Series championship. Their team winning percentage from '67 to '75 was .495 (mediocre, but not bad), but was only .463 once you subtract Seaver's .636 winning percentage from the equation, and that obviously stinks. You probably also knew that the Mets' problem was poor hitting. They actually had very good pitching, even after stupidly trading Nolan Ryan, posting a team ERA+ of 108 from '67 to '75. Once you subtract Seaver's superlative ERAs, however, the team ERA+ was 102.2. That's not great, but it's pretty good considering the staff's ace pitcher is excluded. Another way to look at it is that the Mets staff was above average even without the great Seaver.
I was somewhat surprised by how good Stieb's winning percentage was from '82 to '90. He was 135-90 for a very good .600 winning percentage. But I was also slightly surprised by how good the Jays teams were in that period. They had a .548 winning percentage, and were generally a pretty good team even aside from the excellent '85 and '87 seasons, other than in '82. Even subtracting Stieb's W-L record the Jays still had a .539 winning percentage. I was very surprised, however, by how good the Jays pitching was in that period. They had a team ERA+ of 109.9 and an ERA+ of 106.9 even after subtracting Stieb. Even without Stieb the Jays staff in the '80s was as good as the Yankees pitching in the period '77 to '85 (primarily because the Yankees pitching sagged significantly from '82 to '84). Jimmy Key and Doyle Alexander were no slouches, and Jim Clancy was a pretty good No. 4 starter. And the Tom Henke-led bullpen was generally pretty solid and sometimes excellent.
The Yankees had a team winning percentage of .575 from '77 to '85, and were well over .500 every year other than '82. The Yanks' winning percentage drops to .552 without Guidry, still very good but not that much better than the Jays' .539 W% without Stieb. The Yanks pitching was better than the Mets but not as good as the Jays, posting an overall 106.3 ERA+ and a 103.7 ERA+ without Guidry. The period of '77 to '85 was really a tale of two Yankee pitching staffs: the excellent staff from '77 to '81 and the generally mediocre staff from '82 to '85.
On the offensive support side both Guidry and Seaver received run support slightly better than team average, in each case about 3%. Stieb's run support was 1.2% below team average. Accordingly Guidry's and Seaver's adjusted W% was slightly lower than their actual W% and Stieb's slightly higher. The adjustments were quite small in each case, with Stieb's W% going up from .600 to .606. Guidry's adjusted W% dropped 18 points to .679 and Seaver's dropped 14 points to .622.
The big beneficiary of the adjustment to team W% by assuming an average pitching staff was Stieb. The Jays W% (exclusive of Stieb) drops from .539 to .518. A Jays staff with a 100 ERA+ would have added about 36 runs per year to the Jays' runs allowed total.
The effects of these adjustments were essentially negligible for Guidry and Seaver, with the reduction in their personal W%'s being largely offset by the reduction in the team W% resulting from translating their good team pitching staffs into average staffs. Stieb, by contrast, saw a significant increase in his W% relative to his team's. Simply comparing Stieb's .600 W% to his team's .548 W% shows that Stieb outperformed his team by 9.5%. Adjusting for run support and pitching staff, however, increases Stieb's relative performance figure to 17%. That's a pretty good figure, and though I've not yet run the figures for various HOFers I'm willing to bet that it compares favorably to some of the more marginal inductees.
Seaver outperformed his team after adjusting for run support and pitching staff by a tremendous 37%, which is almost precisely the figure obtained by comparing his straight W% to his team's.
Guidry outperformed his team after adjusting for run support and pitching staff by 27%, which represents less than a one point increase over the approximately 26% figure obtained by comparing his .697 W% to his team's .552 W% without Guidry.
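As a quick check on the arithmetic, the percentages quoted above can be reproduced from the W% figures in this post, assuming the relative performance figure is the pitcher's W% divided by the team's W%, minus one. The helper name is illustrative.

```python
def relative_performance(pitcher_wpct: float, team_wpct: float) -> float:
    """Fraction by which the pitcher's W% exceeds the team's (ratio reading, assumed)."""
    return pitcher_wpct / team_wpct - 1.0


# (pitcher W%, team W%) pairs taken from the figures quoted in this post.
comparisons = {
    "Stieb, straight (.600 vs .548)": (0.600, 0.548),
    "Stieb, adjusted (.606 vs .518)": (0.606, 0.518),
    "Seaver, straight (.636 vs .463)": (0.636, 0.463),
    "Guidry, straight (.697 vs .552)": (0.697, 0.552),
}

for label, (pitcher, team) in comparisons.items():
    print(f"{label}: {relative_performance(pitcher, team):.1%}")
```

The four results come out at about 9.5%, 17%, 37% and 26%, in line with the figures quoted above.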
Just to give some idea of how astounding Seaver's figure is, my preliminary calculations appear to suggest that Koufax outperformed his team during his historic five-year run from '62 to '66 by slightly north of 40%. Seaver's 37% relative performance figure maintained over a nine-year period, therefore, appears to be a historic feat, and I'm willing to bet that few other pitchers since 1920, if any, can match it.
Stieb's figures demonstrate how a pitcher who had run support below team average and pitched on a good staff can have actually outperformed his team by a larger margin than a simple comparison of W% between pitcher and team would indicate. On the flip side, a pitcher whose performance relative to his team's at first glance appears to be superlative can be revealed as a fundamentally average pitcher if he both received great run support relative to his team's average run support and pitched on a team with an inferior pitching staff. Obviously neither Seaver nor Guidry is an example of this, and I'm not sure off the top of my head which pitcher might fit this profile. I know Andy Pettitte has received tremendous run support throughout his career, but he's also pitched on generally excellent pitching staffs. If anyone can suggest such a pitcher in the comments section I'd appreciate it. I'm going to start looking by first identifying poor pitching staffs from recent decades and then examining the run support received by their starting pitchers.
The performance of Seaver, Guidry and Stieb relative to each other was not a complete surprise. For one thing, Stieb slightly underperformed his Pythagorean record from '82 to '90, compiling a .600 W% relative to a .613 Pythagorean projection (Stieb significantly underperformed the Pythagorean projection during his very best years of '82 to '85, indicating that he slightly outperformed Pythagoras over the balance of his nine-year stretch). The Pythagorean comparison doesn't provide for any of the adjustments in the Relativity method I've described, but it does indicate that Stieb didn't make particularly good use of his run support. Guidry, by contrast, hugely outperformed his Pythagorean projection from '77 to '85, posting a .697 W%, more than 40 points higher than his .654 Pythagorean projection. That's a big difference. Seaver underperformed his Pythagorean projection but by an insignificant amount, posting a .636 W% from '67 to '75 as compared to a .641 Pythagorean projection, well within the margin of error in Pythagorean projections.
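For anyone who wants the gaps in points of winning percentage, here is a small sketch using only the actual and Pythagorean W% figures quoted above. The function name is illustrative.

```python
def points_vs_projection(actual_wpct: float, pythagorean_wpct: float) -> int:
    """Gap between actual and projected W%, in points (thousandths)."""
    return round((actual_wpct - pythagorean_wpct) * 1000)


# (actual W%, Pythagorean W%) as quoted in this post.
peaks = {
    "Stieb '82-'90": (0.600, 0.613),
    "Guidry '77-'85": (0.697, 0.654),
    "Seaver '67-'75": (0.636, 0.641),
}

for name, (actual, projected) in peaks.items():
    print(f"{name}: {points_vs_projection(actual, projected):+d} points")
```

That works out to 13 points under for Stieb, 43 over for Guidry and 5 under for Seaver.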
What did we learn by comparing a pitcher's performance to his team's after making the Relativity adjustments? Well, without having finished fully computing the figures for a meaningful number of other pitchers, I think we learned that Stieb was a pretty good pitcher; Seaver, as one must have expected, was a truly great pitcher and a worthy member of the inner sanctum in the Hall of Fame; and Guidry was precisely between Stieb and Seaver. My own takeaway is that the gap between Seaver and Guidry was about what I'd expected: it's significant, because Seaver is unquestionably among the very elite in the history of baseball, and Guidry, although deserving of HOF induction in my opinion, is admittedly a marginal candidate if one focuses solely on career statistics and ignores the astounding big-game record and his degree of dominance over a decade. I think the Relativity analysis also suggests strongly that the gap between Stieb and Guidry is about as big as the gap between Guidry and Seaver. It's significant, and it belies any comparison of the two based on nothing more than ERA+.
The results for Stieb and Guidry confirm a few things and dispense with a few myths. They confirm that Guidry's improved performance in high leverage situations translated into incremental wins, and Stieb's poor performance in high leverage situations translated into incremental losses. Stieb may have had the superior ERA+, but Guidry's LevERA+ was distinctly superior, and the difference explains in part the disparity in their ability to outperform their teams. The Relativity analysis also dispenses with the myth that Guidry's outstanding career winning percentage was just a product of good run support and great teams. Guidry did indeed get good run support and pitched for good teams, but the fact remains that he outperformed his teams by a huge margin. A .600 winning percentage for a Yankee pitcher in the years '77 to '85 would be good but not that much better than the Yanks' record for those years. A .697 winning percentage, however, is spectacular even after adjusting for run support and the quality of the Yankee teams.
Based on what we've seen so far I think it's clear that elite pitchers will outperform their teams by 17% after adjusting for run support and the quality of the rest of the pitching staff. All time greats - and I mean pitchers among the top six or eight of all time - may outperform their team on an adjusted basis by more than 35%. And it should be clear that pitchers who outperform their team on an adjusted basis by more than 25% are no doubt Hall of Famers. If there are any doubts about that, the Relativity analyses of pitchers like Drysdale, Bunning, Sutton, Niekro, Ryan, Palmer are likely to resolve those doubts.
UPDATE: I just ran the numbers for Greg Maddux for the period '92-'02. He's an interesting case, of course, because he pitched on such great pitching teams, and so his 15% outperformance of his team's record on a straight comparison of W% could be expected to rise significantly. But - wow. Maddux shoots up to a relative performance index of 42% when adjusted for run support and pitching staff. I didn't appreciate how poor Maddux's run support was relative to team average. The Braves scored 4.86 runs/game when Maddux wasn't pitching, but only 4.41 for Greg. Maddux is just north of Seaver's 37%. I guess no one should be surprised.
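To put Maddux's run-support gap on the same percentage footing used for the other pitchers, here is a small sketch based on the two runs-per-game figures quoted above. It treats the 4.86 figure (runs per game when Maddux wasn't pitching) as the baseline, which is an assumption about how the team average would be defined.

```python
def run_support_gap(pitcher_runs_per_game: float, baseline_runs_per_game: float) -> float:
    """Pitcher's run support relative to the baseline, as a fraction (negative = below)."""
    return pitcher_runs_per_game / baseline_runs_per_game - 1.0


# Figures quoted above: 4.41 runs/game behind Maddux, 4.86 otherwise.
gap = run_support_gap(4.41, 4.86)
print(f"Maddux run support vs. baseline: {gap:.1%}")
```

That is a shortfall of a bit more than 9%, far larger than the roughly 1% to 3% gaps seen for Stieb, Guidry and Seaver.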