If You Have a Hall of Fame Ballot You’re Probably Bad at Voting

Fred McGriff is a victim.

In fact, along with the Montreal Expos, he’s probably the biggest victim of the 1994 MLB players’ strike. That’s because it’s very possible—probable even—the strike is what’s keeping him from the Hall of Fame.

This all rests on one assumption: Fred McGriff was clean. How reasonable is that? His name wasn’t in the Mitchell Report and it has never been leaked in conjunction with the 2003 list of 104 players who tested positive under the old anonymous survey testing policy. He was out of the game by 2005 when the current policy was put in place. Additionally, his head didn’t explode to where it looked like an orange on a toothpick and he didn’t have a bizarre power surge in his late thirties1.

Is he above suspicion? No, but that’s mostly because nobody playing the game in the Nineties is above suspicion. But let’s make the not unreasonable assumption that McGriff wasn’t a PED user. That means the roll of people ahead of McGriff on the all-time home run list who aren’t in the Hall of Fame reads:

Barry Bonds
Alex Rodriguez
Ken Griffey, Jr.
Jim Thome
Sammy Sosa
Mark McGwire
Rafael Palmeiro
Manny Ramirez
Albert Pujols
David Ortiz
Gary Sheffield

McGwire publicly admitted to using PEDs off and on from 1989 until his retirement in 2001. Rodriguez’s name was said to be on the partially leaked ’03 list. He also served a 162-game suspension in connection with the Biogenesis of America scandal. Sosa’s name was leaked in connection with the same ’03 list, as was Ortiz’s. Palmeiro sat in front of Congress, defiantly pointed his finger and said, “Let me start by telling you this: I have never used steroids, period.” He then tested positive for stanozolol. Ramirez tested positive twice, then retired before serving the 100-game suspension mandated by the second failed test.

Bonds and Sheffield2 are both tied to PEDs through BALCO. There’s an entire book (Game of Shadows) detailing Bonds’ use.

That leaves Ken Griffey, Jr., Jim Thome and Albert Pujols as the only non-PED-stained players with more home runs than McGriff who aren’t already in the Hall. Those three players are pretty much locks, and probably first-ballot locks. So that means everyone ahead of McGriff is either in the Hall, going to be in the Hall, or not getting in the Hall because they used PEDs.

McGriff will be the greatest clean home run hitter not in the Hall of Fame. It’s gotta happen to someone. He has 493 home runs, just shy of the 500 mark. In the past that has generally been considered one of the milestones that earns a player automatic inclusion into the Hall. Yet McGriff doesn’t even seem to be part of the debate. Although when the debate is “He cheated” vs. “Yeah but I was entertained so I don’t care that he turned his body into a bovine-hormone science experiment” the chances of anything intelligent being said are pretty remote.

He’s short of the milestone; 493 is not 500. It’s like running 26.1 miles. That’s something that voters seem to acknowledge through his declining vote totals on the annual ballot (he peaked at 23.9% in 2012 and has been around half that the last two years). But remember the 1994 season was cut short by a strike. And if not for that, McGriff would almost certainly have gotten the seven home runs he needs to avoid being a casualty of the steroid debate. Okay, it’s impossible to know what would have actually happened in those 48 games he lost. But we can guess. We can even make really good guesses by modeling home runs.

So I did. I modeled them a couple of ways. One is pretty dumb. The other is a little more sophisticated.

The most obvious way to guess McGriff’s non-strike total would be to make a simple projection based on how many home runs McGriff had to that point in the ’94 season3. Through 114 games he hit 34 home runs. At that pace, he would have ended the season with just over 48 (assuming you could hit 3/10ths of a home run). But that doesn’t account for the fact that home run rates usually aren’t linear with time.
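A minimal sketch of that straight-line pace calculation, using only the figures quoted above (34 home runs through 114 games, 48 games lost). The function name `pace_projection` is made up for illustration; it isn’t taken from any actual analysis code.

```python
def pace_projection(home_runs, games_played, season_games=162):
    """Extend a season-to-date home run rate over a full schedule."""
    per_game = home_runs / games_played
    return per_game * season_games


# Figures from the text: 34 HR through 114 games; 114 + 48 lost games = 162.
full_season = pace_projection(34, 114)
missed = full_season - 34

print(f"Full-season pace: {full_season:.1f} HR")
print(f"Extra HR over the 48 lost games: {missed:.1f}")
```

Run as written, this gives a full-season pace of about 48.3 home runs, or roughly 14 more than the 34 he already had.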

For one thing temperatures cool off in September, so fly balls don’t carry as far. That’s simple physics. Moreover there is the generally accepted notion that players get tired as the long season grinds to an end (although, shouldn’t pitchers be getting tired as well?). But if you plot McGriff’s career home runs, you definitely see seasonality—fewer home runs in April; numbers jump in June and July, then peak around early August before declining in September.

Taking McGriff’s 15 seasons (1988 to 2002)4 we can put together a slightly better estimate (this is the ‘pretty dumb’ model). We take all the home runs he hit in each month, total up all the games he played in each month, calculate how many home runs he hit per game in each month, then just multiply that by the missed games in that month. Doing that, he probably gets nine additional home runs, enough to push him over the 500 threshold5.
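The steps in that paragraph can be sketched roughly as follows. The function name and the dictionary layout are assumptions for illustration, and the numbers in the example are invented placeholders, not McGriff’s real monthly splits; only the 48 missed games come from the text.

```python
def monthly_rate_projection(career_hr, career_games, missed_games):
    """Project home runs over missed games from career HR-per-game by month.

    All three arguments are dicts keyed by month name.
    """
    total = 0.0
    for month, games in missed_games.items():
        rate = career_hr[month] / career_games[month]  # HR per game that month
        total += rate * games
    return total


# Placeholder inputs, invented for illustration only (NOT real career splits).
career_hr = {"Aug": 60, "Sep": 50}
career_games = {"Aug": 400, "Sep": 400}
# Hypothetical split of the 48 lost games between the two months.
missed_games = {"Aug": 20, "Sep": 28}

extra = monthly_rate_projection(career_hr, career_games, missed_games)
print(f"Projected extra HR (placeholder data): {extra:.1f}")
```

Swapping in the real month-by-month totals is what would produce the estimate of about nine quoted above; the placeholder inputs here only show the mechanics.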

The better model isn’t based on McGriff’s own numbers. Instead it uses about 250 other player-months to predict what someone who looks like McGriff would hit in August and September. It’s based on some numbers (plate appearances, batting average, OPS, strikeouts, etc.) and some factor variables (month, league, age, decade)6.
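As a rough sketch of what turning age and decade into factor variables might look like: the footnote on this model says age becomes an age range and the year becomes a decade. The helper names, the five-year bucket width, and the 0/1 indicator encoding are all assumptions here; the text doesn’t say how the ranges were actually cut or encoded.

```python
def age_bucket(age, width=5):
    """Map an age to a range label such as '30-34' (width is an assumption)."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def decade_label(year):
    """Map a season year to its decade, e.g. 1994 -> '1990s'."""
    return f"{(year // 10) * 10}s"


def one_hot(value, levels):
    """Encode a categorical value as 0/1 indicators, one per level."""
    return {level: int(value == level) for level in levels}


# Example values chosen only to show the mechanics.
print(age_bucket(30))
print(decade_label(1994))
print(one_hot(decade_label(1994), ["1960s", "1970s", "1980s", "1990s"]))
```

The point of the conversion is that the model gets a separate effect for each range or decade instead of a single slope on the raw number.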

Doing prediction this way, McGriff projects to hit another seven home runs, exactly the number he’s short. And that’s probably the low end, as we used conservative numbers for inputs. In other words, we have to feed in some data to get predicted home run numbers out. So we average out McGriff’s monthly career numbers for all of the variables in the model, then shade each one down (or up, depending on the variable) by 0.25 standard deviations so the inputs err on the low side. This is sort of arbitrary but it guards against juicing the prediction. So even when the McGriff we feed into the model isn’t even average real-world McGriff over those missing games, he still gets to seven home runs.
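The input-shading step might look something like this. It is a sketch under assumptions: the `higher_helps` flag stands in for the sign of each variable’s effect in the fitted model (which the text doesn’t give), the sample standard deviation is a guess at which version was used, and the example values are invented.

```python
from statistics import mean, stdev


def conservative_input(monthly_values, higher_helps=True, shade=0.25):
    """Return the average of the monthly values, nudged `shade` standard
    deviations in the direction assumed to lower the home run prediction."""
    center = mean(monthly_values)
    spread = stdev(monthly_values)  # sample standard deviation (assumption)
    if higher_helps:
        return center - shade * spread  # shade down
    return center + shade * spread      # shade up


# Hypothetical monthly OPS values, invented for illustration.
ops_by_month = [0.80, 0.90, 1.00]
print(conservative_input(ops_by_month))

# Hypothetical input where a higher value is assumed to hurt the prediction.
other_input_by_month = [18, 20, 22]
print(conservative_input(other_input_by_month, higher_helps=False))
```

Each shaded value then goes into the fitted model in place of the plain career average.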

However, it’s worth pointing out that McGriff was robbed of nearly 50 games in what would have been his best power season ever. In fact in August ’94 he was pretty much hitting everything thrown in the general direction of home plate. Through 10 games he had 7 homers and was hitting .421 with an OPS of 1.503.

That pace would not have been sustainable for much longer, but mean reversion doesn’t mean that he necessarily goes in the tank and underperforms to make the math happy. Simply going back to the nominal levels of his best power season probably clears seven with a couplathree home runs to spare. To put that in terms of the model, the August/September OPS values we fed in were around .800/.900. Here are McGriff’s actual monthly numbers for the non-strike part of the 1994 season: .928, .941, 1.007, .971, and 1.503 (partial for August)7.

Barring injury or his falling into a Mystery Spot, McGriff easily hits seven and possibly as many as 10 or 11-ish more home runs in ’94. Then maybe he doesn’t bother playing in 2004 and giving voters the impression that he’s hanging around in a desperate attempt to hit the 500 mark (which is maybe being held against him), and maybe he already has his bust in Cooperstown.

That’s a lot of maybes. Even that’s being epistemologically generous, though, as this is all based on something we can’t know because it’s based on games that never happened. The strike happened instead. Plus there’s the assumption that McGriff was clean. And if he wasn’t then this is just a really elaborate fool’s errand.

There is certainly more to consider on a player’s resume than just home runs. This is one of the triumphs of analytics. We’re better at measuring what matters so that, in 2010, a 13-12 Felix Hernandez can win the Cy Young over a 21-7 CC Sabathia. The former had a WHIP of 1.057, an ERA+ of 174 and a WAR of 7.1 (compared to the latter’s 1.191, 136, and 4.6). Wins are a stupid measuring stick.

And McGriff’s case might be suffering from the fact that we’ve moved beyond the flow-chart nature of some milestones (e.g. if 500, then HoF). Still, he was really good. Again, that maybe gets overlooked (or not properly contextualized) because of the inflated-numbers era in which he played. But McGriff finished in the top 5 of his league in OPS seven times and he did that during Peak PED. He’s 100th all time in hits (the strike certainly kept him from crossing the 2500 mark on hits) and 51st all time in total bases. His career offensive WAR of 55.5 isn’t jaw-dropping but puts him a little above Andre Dawson and a little below Gary Carter. Statistically, he looks like a Willie McCovey who struck out more.

In fact for every major category there is someone worse than McGriff who is in the Hall. That’s not the most compelling argument (“Hey, I’m not as bad as Charlie Gehringer”), but for almost any statistic you pick, McGriff’s numbers put him in the discussion8. And it’s not like he’s leading only the marginal Hall of Famers. McGriff’s career numbers put him among some of the game’s greatest (and the following lists of who McGriff is ahead of in each of these categories are by no means exhaustive).

  • Total hits: He’s ahead of Frank Thomas, Ozzie Smith, Mickey Mantle, Ryne Sandberg, Pudge Fisk and Kirby Puckett.
  • Total bases: He’s ahead of Mike Schmidt, Eddie Mathews, Brooks Robinson, Tony Gwynn and Kirby Puckett.
  • Extra Base Hits: He’s ahead of Nap Lajoie, Willie McCovey, Jim Rice, Orlando Cepeda and Kirby Puckett.
  • RBIs: He’s ahead of Joe DiMaggio, Tris Speaker, Billy Williams, Yogi Berra and Kirby Puckett.
  • Win Probability Added: He’s ahead of Tony Perez, Dave Winfield, Paul Molitor, Ralph Kiner and Kirby Puckett.
  • Career OPS+: He’s ahead of Jackie Robinson, Wade Boggs, Rod Carew, Eddie Murray and Kirby Puckett.

Picking on Puckett isn’t an accident. That should be obvious from the fact that he occurs in each list. Puckett is almost universally below McGriff, mostly because he only played 12 seasons. Now, they were really, really good seasons. But one day Puckett woke up with glaucoma and his career was over. So for cumulative numbers, Puckett is well short of some 80% of the people in the Hall10. It’s like the voters just projected out games he didn’t play and gifted him a few additional seasons’ worth of production.

It’s not his fault he was stricken with glaucoma. So why punish him? Well, it’s not McGriff’s fault there was no labor agreement reached during the 1994 season11. To not extend the same courtesy of a pass to McGriff that Puckett gets seems like a crime, dog12.

By next season there will probably be 22 players on the ballot with a Bill James Hall of Fame Monitor number of 100 or better. It’s a rough measure that attempts to capture how likely a player is to make it into the HoF: over 100 means likely, under 100 less likely. Eight of those players are tied to either the Mitchell Report or a positive test (and another handful probably suffer under very heavy suspicion of use). McGriff’s HoF Monitor number? One hundred even.

At some point maybe voters should stop hand-wringing about what to do with guys who almost certainly used PEDs and instead focus more on evaluating those who maybe didn’t.

1 Although… At ages 33 and 34 his OPS dipped to .797 and .815 (he had back-to-back seasons over 1.000 three and four years prior), then he rebounded to where his OPS stayed between about .825 and .950 for another five years, until he fell off the cliff at ages 39 (.750) and 40 (.577). Still, it’s nothing like Bonds and his 1.422 OPS at age 39.

2 Somehow Sheffield escapes the overwhelming taint of guilt. While he admitted to using the cream supplied to him by Bonds’ trainer, Greg Anderson, he claims he had no idea what it was, and that it didn’t help him anyway. Ignorance is apparently the perfect excuse for Sheffield, who defended himself to Andrea Kremer on HBO thusly: “I don’t care what anybody says, steroids is something you shoot in your butt.” It’s hysterical both as a terrible denial and as a poor understanding of medical science. Regarding Sheffield, Patrick Arnold, the chemist behind ‘the clear’ used by many of Victor Conte’s BALCO clients, was a little more charitable if still blunt: “That’s some sort of weird rationalization. No, he took steroids.” I guess the takeaway is that in defending yourself, you should say something phenomenally stupid. The bigger the lie, etc. (Cf. Pettitte, Andy)

3 This is an extraordinarily dumb model, if it even qualifies as a model. It’s just a percentage calculation a grade schooler could do. Also, this is not to be confused with the ‘pretty dumb’ way alluded to in the paragraph just above it. That’s farther down the page.

4 So we’re cutting off two years on the front and back ends of his career. In ’86 and ’87 he played 3 then 107 games respectively. In ’03 and ’04 it was 86 and 27. In fact in no other non-strike season did McGriff play fewer than 144 games.

5 Probably because, again, you can’t hit fractional amounts of home runs. The nine total here breaks down as about 4.3 in what’s left of August and around 4.7 in September. So maybe 9 is really 8; still more than 7 though.

6 A couple of notes. It’s 263 player-months to be exact. Some of the factors (age, decade) look like numbers, but we convert them to factors. So age becomes an age range; similarly, we don’t want to put a coefficient on the number 1960, but we do want to capture some effects from the changes in the quality of players over time. Finally, what players look like McGriff? Other home run hitters. I looked for guys hitting between 35 and 45 home runs in a given season and randomly grabbed some of those seasons. This isn’t a perfect approach: you’re basically trying to run prediction on something that looks similar to what you’re trying to predict.

7 For comparison, Andrew McCutchen’s monthly splits for his 2013 MVP season were: .636, .985, .933, .914, 1.079, .743. I’m using 2013 because the 2014 NL MVP was a pitcher (Kershaw) and the 2015 MVP hasn’t been announced yet.

8 Okay, so his lifetime .284 average isn’t that good, but it is identical to Griffey’s lifetime batting average. And by the time some of you read this, Griffey will probably already have a guy carving his bust.

9 And Puckett had all of 207 career home runs (about 40% of McGriff’s 493), and that despite playing half his games in a building nicknamed ‘The Homerdome.’

10 I made up this number, but if I’m allowed some confirmation bias, it seems like it’s even higher.

11 Okay, it was a players’ strike, so collectively they are responsible to some extent. However, I imagine those players would have preferred to play those games. Especially the Montreal Expos and Tony Gwynn (who had a legit shot at hitting .400).

12 I’ll show myself out.

