New scorecards!

As promised yesterday! Most people know that I love keeping score. I score almost every game I attend in person and some I watch on TV. (This is part of my love of stats.) For some time I have used scorecards of my own design and last week I completed a re-design of the scorecards for this season. This post isn’t just pointless narcissism though; I produce these scorecards using Python code and this year I have put the code on GitHub for anyone to use.
The code can produce a blank scorecard, but it can also read in a text file to populate the teams and lineups. For example, my completed scorecard from today’s game is below. The Royals didn’t announce their lineup before I had to run it off the printer, so I filled that out manually, whereas the Twins side is done by the code. The default colour is red because it’s easiest to see red against both the white background and dark grey pencil marks. (It’s the same reason that red pens are used for editing.)

This is not my best code right now. I’m hoping to make further improvements to the code, including improving the documentation, making it more user-friendly and easier to tweak some parameters like size and colour, and if there’s an appetite for it, maybe making it easier for people to tailor it to their own scoring styles.

2023 MLB Predictions

Some people know this, but a few years ago I was given a full set of miniature MLB team helmets and a board to display them in the order of the standings. The board wore out quite quickly, but I replaced it with a couple of little cases from Amazon, and I usually update it about once a week during the season. (Less often in early April, since the standings are so volatile that time of year.) Before the season though, I set them in the order of my predictions for that year, and I usually write some breakdown of why.

This year, I strongly considered building a full ‘objective’ model to make the predictions for me, though ‘objective’ only in the sense that MLB Network’s Top 10 Right Now lists are advertised as being objective and unbiased. Yes, they came out of a computer, but there’s subjectivity and bias in what goes into the model. Brian Kenny’s abject inability to understand this and many other things will be a post at some point. Unfortunately, I had this idea at about eleven o’clock this morning and even if I didn’t have to work, that really isn’t enough time to build even a satirical model. That said, I did find a couple of interesting things that factor into these predictions and might go into future models:

  • Teams average eleven fewer wins the season after having a player hit sixty or more home runs without gratuitously cheating. Did I look this up just because I wanted an ‘objective’ way to lower the Yankees in the model? Yes. Admittedly, it’s not unreasonable to think there might actually be a connection there, as the team’s win total was probably boosted by a performance that is unlikely to be repeated, but also it’s only happened twice and both times the team really had nowhere to go but down the following year. (Which is why I looked this up in the first place.)
  • According to this paper, teams who change managers have a boost in winning percentage equal to about five wins over the course of a season. The teams with new managers this year are Miami, Texas, Kansas City, and Chicago (AL). I didn’t see anything for the effects of a new GM or pitching coach, but either way there’s something of a boost for the Royals.
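The eleven-win figure in the first bullet is just an average over two team-seasons, so it is easy to reproduce. The sketch below assumes those two seasons are the 1927 Yankees (Babe Ruth, 60 HR) and the 1961 Yankees (Roger Maris, 61 HR). The text above doesn't give their records, so the win totals here are supplied for illustration: 110 wins then 101, and 109 wins then 96.

```python
# Team-seasons assumed to be behind the "eleven fewer wins" figure.
# The records are supplied for illustration; they are not listed in the text above.
# Each entry: (wins in the 60-HR season, wins in the following season)
SIXTY_HR_SEASONS = {
    "1927 Yankees (Ruth, 60 HR)": (110, 101),
    "1961 Yankees (Maris, 61 HR)": (109, 96),
}


def average_drop(seasons):
    """Mean drop in wins from the 60-HR season to the next season."""
    drops = [before - after for before, after in seasons.values()]
    return sum(drops) / len(drops)


print(average_drop(SIXTY_HR_SEASONS))  # 11.0
```

With only two data points this is an anecdote rather than a trend, which is exactly the caveat in the bullet itself.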

Anyway, these predictions were all done by a neural net that has been trained inconsistently for more than thirty years and hates the Yankees. If they’re right, the neural net will be given positive feedback in the form of smugness and beer, and if they’re wrong the neural net will get negative feedback in the form of annoyance… tempered by beer.

American League


AL East

  1. Toronto (95-67)
  2. Tampa Bay (90-72)
  3. New York (85-77)
  4. Baltimore (82-80)
  5. Boston (80-82)

I didn’t want to put the Blue Jays in first. I really didn’t. Maybe it’s a lingering resentment for how ungraciously they took losing the 2015 ALCS, or maybe it’s how much they’ve been hyped for over three years without really doing anything to earn it, but they annoy me. However, John Schneider did seem to take them from a quasi-functional group of individuals with decent talent to an actual team that can win games. (Win games in the regular season, anyway.) So fine, they get first. The Rays should make it a race, as they usually do. I thought about putting them in first, but it really would have been one of those ‘thy wish was father to that thought’ placements and I try to limit myself to only applying that to teams I really like or hate. The Rays, by their standards, struggled a bit last year. I would not bet against that being an anomaly, but they did lose some of their coaching staff this offseason and are swimming against the tide of the new CBA. Between the Yankees’ 99 wins last year already being an overperformance for a team that looked hapless for a lot of the season and the aforementioned eleven-game penalty for the year after a sixty-home-run season, they get third and can thank me for it. The Orioles and Red Sox could really go in either order. They’ll both probably be at least decent, but both are fairly uncertain. Baltimore have the issue that teams which took as big a step as they did last year usually need a year just to consolidate, at best, while they react to other teams adjusting. Boston spent the first part of the winter trying to tank, then remembered they were the Red Sox and had an okay winter in the end. I don’t think it’s enough to really compete though, and I think the Orioles have a better chance.

AL Central

  1. Cleveland (92-70)
  2. Chicago (90-72)
  3. Kansas City (80-82)
  4. Minnesota (72-90)
  5. Detroit (57-105)

The AL Central is, as has been the case for several years now, a pretty weak division that really just needs one team to play well to finish first. Last year that was the Guardians and I don’t see any real reason why it shouldn’t be again. They were a young, pretty balanced team that should be at least at the same level this year. They do have the same issue I mentioned with Baltimore: it tends to be hard to take another big step after the one they took last year. Unlike Baltimore, however, they don’t need to do much more than consolidate the gains they made already. The White Sox could put some pressure on them though. The White Sox have spent years acquiring individually talented players who played as individuals, completely skipped the fundamentals, and could be relied upon to underperform. That might change under Pedro Grifol. He’s not a miracle worker, obviously, but he has an attention to detail that has been very lacking in Chicago. I could see him doing with the White Sox what Schneider did with the Blue Jays and if that happens the Guardians might actually have to take another step this year. I could write a whole section on the Royals, but the short version is that they should also benefit from a managerial change, and I really think the boost from getting rid of bad leadership, both in the coaching staff and in the form of toxic players, is underrated. The Twins are hard to call. They arguably had a better year last year than their record makes it look; they were in the race late until a horrific collapse down the stretch. But at the same time, the Twins have seemed to defy gravity for years. I’m not convinced that the collapse was the anomaly. They don’t look like a substantially improved team from last year, though they’ve been proactive about shoring up some possible weak spots. I’m just very, very unconvinced by them and that’s without even factoring in the possibility that Carlos Correa has a season-ending injury at some point.
I’m not that kind of doctor, but given how skittish his medical report made teams, I wouldn’t want to rely on him to carry the team. And then there’s the Tigers. I was tempted to leave it at ‘The Tigers also play in this division’, but I want to mention how completely baffled I am by their approach. They seemed to be really going in on a rebuild centred around some decent prospects. And then they also went and spent big money on the most over-rated player in the league, Javy Baez. I know I call a lot of players over-rated, because players in big markets do tend to be more highly rated, but no one really compares to Javy Baez. His season last year should not have been a shock. What was a shock was that the Tigers seemed to abandon their rebuild to give him over $20 million per annum. Now they’re stuck with that and with prospects who might still develop, but who are still at least a year away.

AL West

  1. Houston (104-58)
  2. Seattle (93-69)
  3. Texas (74-88)
  4. Los Angeles (70-92)
  5. Oakland (48-114)

Just after the All-Star Break in 2009, the Royals had a stretch of six games—all at home—in which they took a lead into the eighth inning four times and lost all of those games. There was a grim inevitability about the bullpen in those days. I mention this because there’s the same grim inevitability about the Astros winning the AL West. I don’t want it to happen; no one outside the Houston area wants it to happen. But although the Astros have weaknesses, they’re all the sort of weaknesses that might be a problem in a short playoff series, not a problem getting there in the first place. The race for second place is fairly open. The Mariners have been playing well for a few years and they finally turned that into a playoff berth last year. The system they built looks sustainable for a while, so they’re probably favourites for that runner-up spot just by default, but also I don’t see the other three teams as having improved enough to close the gap. The Rangers spent big in the offseason after having a miserable couple of years. They look like they’re really trying to test the extent to which a team can just buy their way into contention under the new CBA. I don’t think they spent particularly wisely though, and in any case they’d have to actually spend as much as the Mets to go from 68 wins to contention. Changing managers might help, but I’m also not sold on bringing someone out of retirement. I think they’ll be better than last year, but I’m not really predicting them moving into third place as much as I am predicting the Angels moving out of third place. The Angels have a couple of really good players you’ve probably heard of; the problem is the other 24 are average at best. More importantly, the Angels front office looks all at sea, though they’re hamstrung by the uncertainty around the ownership. Either way, I just don’t see them improving from last year, and they’ll probably sink further. And then there are the A’s.
I don’t even know where to begin with the A’s, partly because I don’t actually know without looking it up who is still on their team. I feel sorry for their fans and pretty furious at their owner, John Fisher, who is damaging the whole structure of major league baseball. The whole thing is a mess.

National League


NL East

  1. New York (104-58)
  2. Atlanta (102-60)
  3. Philadelphia (87-75)
  4. Miami (75-87)
  5. Washington (54-108)

Oh, while we’re on the topic of terrible owners making the sport worse, we have the Mets’ Steve Cohen. Just because he’s going about it the other way doesn’t make it any less damaging. Much as I hope his attempt to go full Monty Burns fails, it probably won’t. The Braves should also be very good again. I’m not convinced that they’re a better team than last year, but they also won 101 games last year, so basically carrying that forward would normally be enough. Then there are the Phillies. The Phillies are hard for me to judge. They’re the defending NL Champions, but they’re also coming off an 87-win, third-place regular season and only made the expanded playoffs on the last weekend of the season. On the one hand, I want to give them credit, and say that the grand strategy of buying every DH to slug their way into an expanded postseason did actually work. On the other hand, it’s really hard to know if that will work consistently. (And also, I hate that being a viable strategy.) They added even more hitting over the winter, but I don’t know how much that moved the needle. Within their division, it doesn’t really matter. They’re not finishing above third unless one of the teams above them collapses and they’re not finishing below third unless one of the teams below them makes a miracle run. But the divisions don’t really matter in MLB now, and a few games here and there might make all the difference in the wild card race. The Marlins always feel kinda irrelevant (which has often been fair) but I think they’ll be a little better than last year and better than the lack of attention would indicate. They still have the reigning NL Cy Young award winner and made some other minor upgrades over the winter. But they’re also going to be outclassed by the division and league. The Nationals should play a full 162 games and continue their tribute to the old ‘first in war, first in peace, last in the American League’ Washington Senators.

NL Central

  1. St Louis (91-71)
  2. Chicago (84-78)
  3. Milwaukee (80-82)
  4. Pittsburgh (63-99)
  5. Cincinnati (61-101)

There are five teams in this division. Five! It seems like a lot, given that none of them really seem like a division winner to me. I’m going with the Cardinals; they still have some good players and anyway they just seem to always find a way to win the division, no matter how annoying it is. But they also feel like the shell of a good team. I’m going with the Cubs to make a big leap and move into second place in the division. I’m not super comfortable with making that prediction, but it’s partly a reflection of the other teams in the division. The Cubs did improve though, and I think having a solid defensive shortstop will really help them. The Brewers are an extremely frustrating team. They’re talented and they’re not that far removed from being a game away from winning the NL pennant. But they just seem uninterested in winning. I can’t understand their front office at all and that dysfunction seems to bleed down to the team itself. The Pirates and Reds will prop up the table again; they finished tied last year and they seem very similar again this year. I think Pittsburgh have a bit more talent in the end, but they also have some leadership issues—Ke’Bryan Hayes taking his glove off to eat sunflower seeds whilst watching a play unfold around him is the prime example—so they might cancel out again.

NL West

  1. Los Angeles (98-64)
  2. San Diego (92-70)
  3. Arizona (80-82)
  4. San Francisco (77-85)
  5. Colorado (64-98)

Almost done! The Dodgers are not the team they were a year ago, but also the team they were a year ago won one hundred and eleven games. They won the division by twenty-two games. They don’t have to be the same team they were a year ago, at least in the regular season. The Padres are also basically the same team as last year and not really that different to the team they were the last few years. They won 89 games last year and haven’t won more than ninety games in a year since 1998. Even if they’re a little better than last year and even if the Dodgers are reasonably worse, it’s hard to think that San Diego aren’t playing for a wild card spot again. But there are a lot of those, so maybe it’ll work again. I actually put the Diamondbacks in third place, partly because it felt too much like phoning in the last division to just set everything the same as last year, but also because they really do have some decent young players. What I saw of them last year looked like a better team than the 74 wins they actually got. The Giants, by contrast, looked like a worse team than 81 wins last year. I do think they improved in the offseason more than the big near-misses made it seem, but there’s only so much their pitching staff can do. And then I had to put the Rockies in last. I wish I could just quote some of the Rockies fans I’ve talked to over the last few months here, because they tore apart their front office more thoroughly than I could.

I’m not going to try to predict the postseason, because short series are especially unpredictable, even once they’ve started. But even in the extremely unlikely scenario that these are actually the playoff teams, there’s really no way to know what the rosters will be by October. That, and it’s late and I’m tired and I want to work on my scorecard. (To be revealed tomorrow!)

Two Ways to Improve the World Baseball Classic

To start with, I like the World Baseball Classic. It’s fun, and it was genuinely tremendously exciting when Great Britain qualified. I also think most of the standard criticisms are a little overplayed. But at the same time, I am way more apathetic about it than I am about almost any other international tournament. It’s fun, but I was shocked when players were comparing it to the World Series. I would rather see the Royals win the World Series than… well, almost anything, actually. Insofar as the World Baseball Classic is meant to be a World Cup, it does not come close to the Football, Rugby, or Cricket versions. I was thinking about why that is, and how MLB can make it better. Obviously this is my personal take on it; clearly a lot of people love it as-is and I don’t want to dismiss that. But I think there are some lessons that MLB could learn from the three big World Cups (especially the Rugby World Cup, because I think that’s probably the closest analogue) that would make it better, even for the people who already love the tournament.

The biggest issue is that the World Baseball Classic is not localised. There is no host nation or nations, as there are for other World Cups. I get the reason why; this way teams like Japan and Korea can play in front of their home fans, which otherwise never happens. That’s pretty cool! But the big problem with this was highlighted by a Tweet this morning from JP Morosi:

Statistically this could be a coincidence, of course. Five straight is not many. But at the same time, it is a 13-hour flight between games. It would be like if the Cricket World Cup had a quarterfinal at Old Trafford and the winner had to play the semifinal at the SCG five days later. That’s ridiculous in any tournament, or even just a normal tour. At this stage no team should have a structural advantage. That doesn’t mean Japan won’t overcome it (though as I write this they trail 3-0), but it’s not a hurdle they should have to face.

I also think there’s an underrated aspect to how much having a host country makes it an event. The spectacle of seeing all the players converging on one spot, a big Opening Ceremony, and that sense of ‘the eyes of the world are on this’. There’s also the aspect of matches being on a pretty regular schedule. As a fan you can settle into a nice pattern of just looking to see what game is on in what time slot that day and I think that makes it a lot easier to watch and see new teams. For instance, during the Rugby World Cup if I look and see that the match on in the most watchable time slot is Romania v Tonga, then I’m watching Romania v Tonga, never mind that I don’t know any of the players. The distributed nature of the WBC doesn’t allow this; in this WBC I turned on Israel v Venezuela in the sixth inning because I had no idea what times the games were on.

This seems like the easiest place for the WBC to improve. The number of potential hosts is much closer to that of cricket or rugby, so hosting would be a once-in-a-generation thing, not a once-in-a-lifetime thing. You can grow the game and improve the tournament, and I really think MLB should do this for the next edition.

Another thing MLB might look at is the timing. I feel less strongly about this than the hosting of the tournament, but I do think running concurrently with Spring Training hampers the tournament a little bit. Not only are the tournament games not the only ones being played, you also still have things like pitch limits as players still build up for the regular season. If this is meant to be a World Cup, then there can’t be those sorts of restraints. Of course, the clubs who actually pay the players and have the biggest stake in their health would never buy into letting them go 100% in mid-March. So maybe play it in late November/early December instead. It’s not ideal, but baseball players are so professional now that if they wanted to—and it’s clear that a lot of them do—there’s no reason they couldn’t still be conditioned at that stage of the year.

The obvious counterpoint is that the weather sucks in November and the players will be tired after a long season. But there are domed stadia in the northern hemisphere and if someplace like Australia hosted then the weather would be fine. And I don’t think the players being tired is worse than the players being not yet fully conditioned. Tiredness is something that can be worked through, but a lack of conditioning gives the group-stage games a very Spring Training feel. It’s also worth noting that the Rugby World Cup does actually take place near the beginning of the northern hemisphere season and it’s not really an issue. Playing in April might actually be the long-term solution, but in the short-term I don’t think it’s feasible to adjust the MLB schedule enough—there’s not enough leeway—and there would certainly be a lot of pushback. I think November/December would be best; there’s no other baseball on then and the human rights abuses World Cup last year showed there is a niche for a big tournament at that time of the calendar.

This does get to the limit of what baseball can learn from other World Cups though. All World Cups are unique because all sports are unique, but there are a lot of things about baseball that just don’t have an analogue in any other sport. One is the marathon, every-day nature of the baseball season. This makes scheduling the tournament hard, but it’s also an integral part of the game that—much like Test cricket—does not lend itself to a knockout tournament. Also, rugby and cricket have many domestic leagues that all have about the same stature, which adds to the global feel of any international match. Baseball really just has MLB* and with all the best players already in the same competition it does take some of the uniqueness away. The WBC is way better than the All-Star Game, but also there’s a reason the All-Star Game is a shell of what it used to be**. Approached smartly, I don’t see either of those as being things that will make the WBC worse, just different. But MLB does not have a great track record of approaching things smartly, and one glance at the history of the Cricket World Cup shows a number of ways things could go very badly.

*There’s a case to be made for the Japanese leagues being comparable to MLB. Obviously this is something that can’t really be quantified, so your mileage may vary, but since the best Japanese players still come to MLB I don’t think the Japanese leagues currently have the same stature. It’s not obvious though.

**I actually think one of the cool things about the WBC is that it taps into some of what made the ASG cool before the differences between the leagues were eroded. But it does make for a bit of a caution for MLB to not let this go the same way.

Lastly, I do want to make it clear that I don’t think the complaints about injuries have legs. This is the kind of tired complaint I always hear and I think it’s pretty short-sighted. I’m sure Mets fans disagree and I don’t really blame them because there’s no reason they should care about anything other than what is good for the Mets; it’s in the definition of ‘fan’. I’m sure I would be furious if Bobby Witt Jr, Brady Singer, or Salvador Perez got injured. But I would hope I would also remember that Salvy also got injured slipping while carrying his luggage a few years ago. Someone—and I forget who—got injured falling through a roof one offseason. Weird things happen and weird injuries happen and you can’t blame the context. The WBC is a good tournament, and I think some small changes could take it from a good tournament to a more universally-recognised centrepiece of the game.

The new schedule is easier for the Royals, but MLB still somehow scored an own goal

This started as part of the post on the rule changes, but then I realised it was really its own category. As part of the new CBA, MLB released what I keep hearing described as ‘the new balanced schedule’ late last year, with expanded interleague play and fewer intra-division games. The first thing to note about it is that it isn’t actually balanced; it’s just less unbalanced than the schedule that’s been used since 2013. Teams still play their own division more than anyone else; for example, the Royals play 13 games against Detroit next year and 15 games total against the entire NL West. The fact that this seems to continually escape the notice of analysts is kind of baffling to me. The other day on MLB Network they were talking about how it would be harder for the Royals next year, playing fewer games against the AL Central and more against the big teams in the National League. I get why, on the face of it, one would think that, if the schedule were actually balanced. But it’s not. Yes, we play the Tigers six fewer times, but we also play Cleveland six fewer times and the games against big teams like Los Angeles and San Diego are balanced out by games against Pittsburgh and Cincinnati.

Of course, we can quantify this instead of just vaguely saying there are good teams and bad teams. This is a bit of a digression, but I like numbers and analysis, so if you want to skip the next couple of paragraphs I don’t blame you. Under the old schedule the Royals would be playing the NL East in interleague play, plus four games against St Louis, our ‘natural rival’. (Much more on this later, because it’s surprisingly important.) We’re still playing the NL East, and also playing three games each against the five teams in the NL West and the four teams besides St Louis in the NL Central. Those extra 27 games mostly come from playing six fewer games against each of the other four teams in the AL Central, though we actually also play one fewer game against the NL East than we would have under the old schedule. That leaves two games unaccounted for; as far as I can tell, they come from playing one fewer game against each of the AL East and West. There’s always a random element to which of those teams we played six or seven times, so there’s no way to know exactly which teams we would have played one extra time. That makes for a bit of uncertainty, but uncertainty always exists and it is important to acknowledge it and ideally quantify it, not just ignore it.
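The bookkeeping in the paragraph above can be checked in a few lines. This is a minimal sketch that uses only the counts stated there, and the dictionary labels are illustrative. It confirms that the 27 games gained and the 27 games lost balance out, so the season stays at 162.

```python
# Games gained and lost versus the old schedule, using the counts described above.
GAINED = {
    "NL West: 5 teams x 3 games": 5 * 3,
    "NL Central except St Louis: 4 teams x 3 games": 4 * 3,
}
LOST = {
    "AL Central: 4 teams x 6 fewer games each": 4 * 6,
    "NL East: 1 fewer game": 1,
    "AL East: 1 fewer game (opponent unknown)": 1,
    "AL West: 1 fewer game (opponent unknown)": 1,
}

gained_total = sum(GAINED.values())
lost_total = sum(LOST.values())
print(gained_total, lost_total)  # 27 27
assert gained_total == lost_total, "the 162-game total would change"
```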

The standard way to judge strength of schedule is to just aggregate the winning percentage of the opponents weighted by number of games and compare before and after. This has the advantage of being simple, but it doesn’t really work because it blurs the distinction between how good (or bad) a team is and how many times you play that team. For example, the Los Angeles Dodgers were 111-51 last year. The two teams at the bottom of the NL West, Colorado and Arizona, were 68-94 and 74-88, respectively. The combined winning percentage of those three teams is above .500, specifically .521, but if you played each team the same number of times—as the Royals do next year—that’s twice as many games against teams below .500 as above! Clearly averaging winning percentage doesn’t work. Instead you have to classify opponents (which you can kind of see in that above example—teams were classified into above and below .500) and see how the number of games against different classes changes. I’m going to use 70 wins as the threshold, as that’s decently close to the Royals’ win total last year. The only important thing is that a game against Cleveland or Detroit counts the same as a game against Los Angeles or San Diego. Of the 27 games lost from the old schedule, the Royals had at least six against teams with fewer than 70 wins last year: the six against Detroit. There might also have been two more, depending on which teams in the AL West and NL East we would have played one more time. (The AL East had no teams with fewer than 70 wins, so it doesn’t matter.) Of the 27 games gained under the new schedule, we actually have nine such games, against the Reds, Pirates and Rockies. It’s a pretty small difference, but the schedule is actually slightly easier for the Royals next year. Again, that 70-win number is an arbitrary one. But the answer actually doesn’t change even as you move the threshold around: the more balanced schedule is at worst the same and at best slightly easier for the Royals in 2023.
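The gap between averaging and classifying is easy to see in code. This is a sketch, not a full strength-of-schedule model. The three records are the 2022 figures quoted above, the function names are hypothetical, and the .500 line is expressed as 81 wins in a 162-game season. The last part skips raw records entirely and uses only the game counts from the paragraph: 6 games lost against Detroit plus up to 2 uncertain ones, and 9 games gained against the Reds, Pirates and Rockies.

```python
# 2022 records quoted above: team -> (wins, losses)
NL_WEST_EXAMPLE = {
    "Los Angeles": (111, 51),
    "Arizona": (74, 88),
    "Colorado": (68, 94),
}


def pooled_win_pct(records):
    """Combined winning percentage of all the listed opponents."""
    wins = sum(w for w, _ in records.values())
    games = sum(w + l for w, l in records.values())
    return wins / games


def games_by_class(records, games_each, threshold_wins):
    """Split games into (below threshold, at or above threshold) by opponent wins."""
    below = sum(games_each for w, _ in records.values() if w < threshold_wins)
    above = sum(games_each for w, _ in records.values() if w >= threshold_wins)
    return below, above


print(f"{pooled_win_pct(NL_WEST_EXAMPLE):.3f}")  # 0.521
print(games_by_class(NL_WEST_EXAMPLE, games_each=3, threshold_wins=81))  # (6, 3)

# Games against sub-70-win teams, using the counts in the text:
# lost = 6 vs Detroit plus 0-2 more, depending on the unknown AL West/NL East games;
# gained = 3 each vs Cincinnati, Pittsburgh and Colorado.
gained_sub70 = 3 * 3
net_changes = [gained_sub70 - lost for lost in range(6, 9)]
print(net_changes)  # [3, 2, 1]
```

The pooled percentage comes out above .500 even though two of the three opponents are below it. Every value in `net_changes` is positive, so whichever way the uncertain games fall, the Royals gain more games against weak teams than they lose.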

It’s also important to note that the Royals would normally play the NL East next year, which had two 100-win teams and an 87-win team, so the comparison might be different in 2024. But it’s useful to demonstrate two things. One is how superficially a lot of the analysts are approaching the new schedule, which bugs me. The other is how small the difference in strength actually is. It doesn’t really matter, and that’s without even getting into the fact that even a lopsided matchup in a single baseball game is a lot more even than in most other sports.

Anyway, I’m not nearly as annoyed about the change itself as I am with the discussion about it. I have nothing against playing all 29 teams in a year. I’m enough of a traditionalist that I don’t really like interleague play and now that the DH is universal (grumble) there’s no extra appeal to playing in an NL city. But at the same time, I remember when I was a kid and how excited I was to see teams and players that I had never seen before come to the K. I always insisted we go to the game when there was an NL team in town. And I still like seeing new teams come to town. I have a goal of seeing every team play in person, which I’ll be able to achieve a lot quicker now. So I’m fine with that.


MLB did not do a good job of actually implementing this change. To be fair, it is hard to build a good schedule for all thirty teams, especially without changing the total number of games, which at this point is probably a non-starter*. But this year the Royals have back-to-back off days at the end of May/start of June and a Sunday off day in August. Back-to-back off days are annoying, but not really an issue. The issue is a Sunday off day. I know that way back when there were Sunday off days and Monday doubleheaders, but that’s not what’s happening here. This is just a Sunday afternoon in August with no baseball, which ought to be illegal. (And I don’t mean against MLB rules; I mean there should be a federal law against this, along with the Constitutional amendment banning Astroturf and the designated hitter.)

*I say this because 162 feels like one of the game’s sacred numbers now, but emphasis on now. For almost sixty years the season was 154 games long, and it was only changed to 162 because of the change in the schedule necessitated by expansion to ten-team leagues—it was a balanced schedule of 18 games against each of the other nine teams. But when the leagues continued to expand the length stayed at 162 games. It’s probably not going to change again, but it might make things easier.

This is particularly frustrating because as difficult as schedule-making is in general, this one actually has a pretty easy solution. Both of those weird off days come about because of a two-game series against St Louis being put into a slot for a three-game series. But it would be very easy to make both into three-game series! First off, both are mutual off days; the Cardinals could play us without having to move another game. That would make for a 164-game schedule, which we don’t want, so we have to take away two games from elsewhere. Luckily, as mentioned previously, we have some ‘extra’ games against AL East and AL West opponents. Of those ten opponents, we play six of them six times (two three-game series) and four of them seven times (a three-game series and a four-game series). There is no reason we could not make two of those four-game series into three-game series, preserving both the 162-game schedule and the conventions of playing every Sunday and not having back-to-back off days. Hopefully the front offices complain about the lost revenue from weekend attendance and this gets fixed next year, because it is very easy.
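The proposed fix is simple arithmetic, and a short sketch makes it concrete. It models only the games the fix touches, using the series lengths described above: the two two-game St Louis series and the ten AL East/West opponents played six or seven times. The function name `proposed_fix` is hypothetical.

```python
# Games in the part of the schedule the fix touches, per the description above.
AL_EAST_WEST = [6] * 6 + [7] * 4  # six opponents x 6 games, four opponents x 7 games
ST_LOUIS_SERIES = [2, 2]          # the two short series that create the odd off days


def proposed_fix(al_games, stl_series):
    """Stretch each two-game St Louis series to three games, then trim the same
    number of seven-game opponents back to six (a four-game series becomes three)."""
    new_stl = [3 if g == 2 else g for g in stl_series]
    to_trim = sum(new_stl) - sum(stl_series)  # games added that must come off elsewhere
    new_al = list(al_games)
    for i, g in enumerate(new_al):
        if to_trim and g == 7:
            new_al[i] = 6
            to_trim -= 1
    return new_al, new_stl


new_al, new_stl = proposed_fix(AL_EAST_WEST, ST_LOUIS_SERIES)
print(sum(AL_EAST_WEST) + sum(ST_LOUIS_SERIES), sum(new_al) + sum(new_stl))  # 68 68
print(new_stl, new_al.count(7))  # [3, 3] 2
```

Because the games in this part of the schedule total 68 both before and after, the season stays at 162. St Louis goes to two three-game series, and two of the seven-game opponents drop to six.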

New Rules in MLB in 2023

After listening to a week’s worth of games and watching a few, I wanted to give my initial take on the new rules. Of course, it’s not just new rules this year, it’s also a new scheduling system that I have heard a lot about, but that’s a different post. For now, I’m just going to focus on the rules.

The big thing this year is the pitch clock, but I actually want to address the new bases first. These have mostly been an afterthought, because on the face of it they don’t really change much. I can see on TV that they are bigger, and yeah, sure, that means the distances are a little reduced. Maybe that means more steals or infield hits (although the distance is shorter for the throws too) or whatever. But I doubt that’ll be noticeable. The reason I am starting with these is that I am really hoping the new bases help with the one place MLB needed to change the rules and didn’t: the slow-motion replays of runners coming off the bag for a split second. This has been one of the most frustrating things about the sport in the last few years, mostly because every umpire interprets ‘clear and convincing’ differently and you could have the same play called two different ways on successive days. But this is also one of the few places where I think applying the letter of the rule is actually contrary to the spirit of the rule. There’s nothing that a runner can do differently or better to stay on the bag (an impact at that speed is going to jostle the runner no matter what), and I’ve never thought it was fair to punish them for being subject to the laws of physics. The flip side is that no one (that I know of) wants runners to be able to gratuitously overslide with no consequence. Ideally a rule change here would just restore the previous status quo. This probably reads like a bit of a digression, but it’s relevant because I really don’t know how the bigger bases will affect this, if at all. But there’s a reasonable chance that a bigger target gives runners more chance to keep contact during the impact, or more room to make it harder for a fielder to keep the tag on. It’s not a perfect solution (to be fair, I don’t think there is one*), but maybe this will help.

*The best idea I’ve had so far is simply to make that aspect of the play off-limits for review. If the umpire can see the runner come off the base in real time, fine. But if the effect is so small that it takes replay to see it, then there’s probably nothing the runner could have done and it should not be reviewed.

Okay, so the big noticeable changes this year: firstly, the pitch clock, of course. Most people who follow me on Twitter will know I have been in favour of this for years, because watching some relievers pitch is just painful. But some aspects of it, which are probably necessary for the concept to work, do introduce some unfortunate wrinkles. The basic premise (the pitcher has 15 seconds with no one on and 20 seconds with runners on, and the hitter must be ready with eight seconds remaining) is great. It’ll cut down on relievers taking forever and it’ll cut down on hitters faffing about with their gloves between every single pitch. But with this come the stipulations that the pitcher can only step off the rubber twice without recording an out and the hitter can only call time once. I understand the necessity of this, otherwise players could completely circumvent the rules at will. But the limits on stepping off the rubber and throwing over might have some huge knock-on effects. The onus is mostly on the pitcher to control the running game, and for all the talk about the larger bases being an incentive to steal, taking the threat of throwing over away from pitchers will do a lot more. (Even as I write this, I have just watched a player steal third almost unopposed because the pitcher wasn’t doing anything to hold him on.) MLB wants to increase stolen bases, so they probably see that more as a feature than a bug, but I am a little less convinced. Stolen bases are fun, but partly because of the difficulty and risk. Before games started, diminishing the pitcher’s ability to control the running game felt like tilting the scales too much, and maybe it will be, but it’s been okay so far. Through the first dozen or so games I’ve watched or listened to, I think it’s only been relevant once or twice.

I definitely think the pitch clock overall is a net positive, and certainly when I was planning an outing with some friends of mine who are more casual fans it was a selling point that a Saturday game starting at six would probably be over by nine.

The other big rule change is the shift, or lack thereof. I care less about this, partly because I don’t think it’ll make a huge difference. The argument about the shift usually centres on the batting average of left-handed pull hitters, but advanced analytics have basically meant that left-handed pull hitters aren’t judged on batting average anyway. (This is a topic I’ve been slowly and vaguely writing about, but it’s more time-consuming than I thought.) So what’s the point of having or not having the shift? Just from a fan’s perspective I think the biggest difference will be the end of the frustration of watching your pitcher make a great pitch, induce weak contact the other way and have it be a hit because the field was set for a bad pitch instead of a good one. But in practice it might just make pitchers even more single-minded about strikeouts. I suspect the most it’ll be talked about is if or when a team actually gets called for a violation early in the year.

It’s also technically a change that the extra-inning Manfred runner is now permanent. It’s a stupid change and I hate literally every aspect of it, not least that it’s ‘solving’ a problem that barely existed and that, to the extent it did exist, could have been solved in any number of better ways. I’m not going to dignify it with a lot of attention, but it is important in that it shows what a low bar MLB has for ‘success’ for these new rules. (Or, equivalently, what a high bar there is to actually dropping any of these rules.) Unless any of the important rules dramatically and unarguably backfire, I expect they will all be made permanent, and that’s the one aspect of all this that I really dislike. MLB does not seem interested in reconsidering at any stage; we all knew for months that these rules were coming in no matter what, and it’s clear that they are basically permanent.

Answering Questions With Data

No, not that Data. (Source: Wikipedia)

When we try to apply statistics to sports, or to anything else for that matter, what we are doing is trying to answer a question with data. The data part of that is obvious, but what’s usually less clear is what the question is. Yet there is always something we are trying to know or understand more clearly. This process is also at the heart of all scientific research, and deciding which questions should or can be meaningfully asked is actually very difficult; it often takes years of experience to do it correctly, so that your data are not fooling you. This is why research articles in science journals are written so weirdly and often with so much jargon. There are a lot of tiny distinctions that we easily conflate in everyday language that are vitally important when doing research. This is true in any data-based research, including analysis of sporting statistics. It is always an issue, and part of the conflict between old- and new-school statistics really boils down to misunderstandings of what questions the data are actually answering.

One of the most common distinctions that gets lost in sport (and in everyday life, really) is the difference between statistics that tell you how often something has happened in the past and the odds that something will happen in the future. All sporting statistics are the former. If a batter in baseball has a .300 average it tells us the rate at which he or she has got a hit so far in a season or career, specifically three times out of ten. If a cricketer averages 45 with the bat it is the same thing: in the past that cricketer has averaged 45 runs for every dismissal at the relevant level.
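Both of these stats are plain ratios of past events, which is exactly the point. As a minimal sketch (the function names are made up for illustration): a batting average is hits per official at bat, and a cricket batting average is runs per dismissal, so not-out innings add runs without adding to the denominator.

```python
def batting_average(hits, at_bats):
    """Baseball batting average: hits per official at bat (a past frequency)."""
    if at_bats == 0:
        raise ValueError("batting average is undefined with no at bats")
    return hits / at_bats


def cricket_batting_average(runs, dismissals):
    """Cricket batting average: runs scored per dismissal.

    Not-out innings add runs but no dismissal, so the denominator is
    dismissals rather than innings.
    """
    if dismissals == 0:
        raise ValueError("batting average is undefined with no dismissals")
    return runs / dismissals


def format_baseball_average(avg):
    """Display an average the baseball way, e.g. 0.3 -> '.300'."""
    return f"{avg:.3f}".lstrip("0")


print(format_baseball_average(batting_average(3, 10)))  # .300
print(cricket_batting_average(450, 10))                 # 45.0
```

Neither function says anything about the next at bat or the next innings; each just summarises what has already happened.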

Such stats tell stories, often extremely effectively. If I tell a baseball fan that a hitter had a .287 batting average, 12 home runs and 58 RBIs, that immediately gives a sense of the hitter, albeit an incomplete one. I could add to the story by saying they scored 93 runs, or had an OPS of .671 or some such. All those tell us about the player without ever having to watch a single plate appearance. Extremely importantly in the ongoing (and probably never ending) debate of old- versus new-school statistics, all of them fall into this same category of frequentist statistics. One person might understand a .287 batting average better than a .671 OPS and people might (okay, do) disagree about which is more important, but they both tell you about things that already happened.

The problem is, the question of what happened in the past isn’t usually the question we want answered. It’s great to know that our hypothetical hitter has got a hit in 28.7% of official at bats for the year, but usually what we want to know is something like the likelihood of that hitter getting a hit in their next at bat, or the rate at which they will get hits next year. It’s fairly obvious that the latter is not the same, but it’s less obvious, though just as true, that the former is not the same either. And this is why I started with the importance of knowing what question we are asking of the data: it is extremely common to see people, not just in sport but whenever they deal with probability, assume that the frequency of an event happening in the past is the same as the likelihood of that event happening in the future.
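One way to see the gap between past frequency and future likelihood is to simulate it. The sketch below assumes a hypothetical hitter whose chance of a hit is exactly .270 in every at bat, independent of everything else (a big simplification that real hitters do not obey), and plays out several 500-at-bat seasons. Even with the true probability held fixed, the observed averages scatter around it from season to season.

```python
import random


def simulate_seasons(true_p, at_bats, seasons, seed=2023):
    """Simulate batting averages for a hitter with a fixed hit probability.

    Assumes every at bat is an independent trial with the same chance of a
    hit (true_p), which is a deliberate simplification.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    averages = []
    for _ in range(seasons):
        hits = sum(1 for _ in range(at_bats) if rng.random() < true_p)
        averages.append(hits / at_bats)
    return averages


averages = simulate_seasons(true_p=0.270, at_bats=500, seasons=10)
for season, avg in enumerate(averages, start=1):
    print(f"season {season}: {avg:.3f}")
print(f"lowest {min(averages):.3f}, highest {max(averages):.3f}")
```

With 500 at bats the binomial standard deviation is about sqrt(0.27 × 0.73 / 500) ≈ 0.020, so seasons twenty points either side of .270 are routine for a hitter whose underlying ability never changes. Any single season’s frequency is only an estimate of the likelihood.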

This distinction is the impetus behind a lot of advanced stats and Sabermetrics. I do have some issues with Sabermetrics, but on the whole I quite like it. (This surprises a lot of people, but it is true.) Part of that is just having a natural affinity for playing with huge datasets from a sport I love, but also most people behind Sabermetrics understand this distinction and a lot of other important scientific principles of working with data. They are very good. The issue is that most people in the media and even a lot in front offices don’t understand that distinction, and then completely misapply advanced statistics.

This is why I have started here. In practice people are almost always going to use frequentist statistics to approximate likelihoods. The alternative is building a proper Bayesian formulation, and whilst that is increasingly feasible, it’s still well beyond what most people can do, or would even find worth doing. But what’s important is understanding when we are using frequentist statistics and what their limitations are.
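For contrast, here is about the simplest Bayesian version of the batting-average question: a beta-binomial model. This is a standard textbook technique swapped in for illustration, not something this post builds. The prior (a league-typical .260, weighted as if it were worth 200 at bats) is a hypothetical choice, and changing it changes the answers.

```python
def posterior_mean_average(hits, at_bats, prior_mean=0.260, prior_at_bats=200):
    """Posterior mean hit probability under a Beta prior (beta-binomial model).

    prior_mean and prior_at_bats are hypothetical: they encode a belief that the
    hitter is league-typical, held as strongly as if backed by prior_at_bats
    at bats of evidence.
    """
    alpha = prior_mean * prior_at_bats       # pseudo-hits
    beta = (1 - prior_mean) * prior_at_bats  # pseudo-outs
    return (alpha + hits) / (alpha + beta + at_bats)


for hits, at_bats in [(12, 30), (172, 600)]:
    observed = hits / at_bats
    estimate = posterior_mean_average(hits, at_bats)
    print(f"{hits}/{at_bats}: observed {observed:.3f}, posterior {estimate:.3f}")
# 12/30: observed 0.400, posterior 0.278
# 172/600: observed 0.287, posterior 0.280
```

The posterior mean blends the prior with the observed record: a hot month barely moves the estimate, while a full season of evidence pulls it much closer to the observed frequency.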

(Re-) Introduction

Hello. Or hello again, as the case may be. If you’re coming to this as a new blog and never read The Forward Defensive, you might wonder why a ‘new’ blog is clearly not that new. If you know I used to write a blog ten years ago, you might wonder what the hell happened and why that blog is here now. Either way, I felt I should put up some sort of re-introduction before just writing things.

About eleven years ago I decided to start blogging about sport, primarily cricket. I started The Forward Defensive because I was spending a lot of my time watching sport and had more opinions and analysis than could fit on Twitter. (This was in the days before threading.) I really enjoyed blogging; I enjoyed the excuse it gave me to watch even more cricket at even stranger hours in the middle of America, and I really enjoyed virtually meeting so many people with a shared interest. But when I started grad school in 2013 I stopped having the same amount of time to devote to writing, and in particular I stopped having the same amount of time to devote to watching. I still did watch sport, of course (anyone who has followed me on Twitter for the last decade will be keenly aware of this), but I didn’t watch as much and I didn’t have the same energy to devote to opinions about it. I don’t know if you’ve heard this, but grad school is exhausting. It also doesn’t pay well, and eventually I couldn’t justify continuing to pay for hosting fees, so I made a backup of the data and let the hosting expire. (I thought I had kept the relatively cheap domain name registration, but apparently not.)

Great, that explains why the post before this is a half-baked one from 2014 about Mike Moustakas being sent to Triple-A that I wrote from a room at a now-defunct radio observatory in eastern California. But you may ask yourself: ‘Why is there now this post?’ ‘Why is the name of the blog different?’ ‘How do I work this?’ ‘Where is that large automobile?’

The short answers are that I’m restarting (for lack of a better word) the blog now because I finished grad school in May and now have a job that affords me a lot more money and spare time. So I’ve gone back into watching sport in detail and I have gone back to having thoughts and opinions about sport again. I noticed this in earnest during the recent MLB postseason, when I realised I was texting unsolicited long-form analysis to family members every night, usually because someone in the broadcast media was completely failing to understand (or at least to convey an understanding of) the nuances of statistics and mathematics. Consider the return of this blog an effort to spare my loved ones from getting a series of paragraph-long texts from me every time I sit down to watch a game.

The name is changed for two reasons. Firstly and most importantly, it is because (as alluded to above) I accidentally let the domain name registration lapse. As soon as that happened it was picked up by a dodgy-looking reseller to whom I have no intention of giving money. I could have just changed the URL to some close variant of it, but at the same time the blog is not going to be quite what it was previously either. A new name seemed a better option. I’m still going to write about whatever sport I happen to have an opinion on (probably one of the first posts is going to be about the ongoing World Cup) and I’m still going to write some stuff that is just an opinion. But I’m going to focus more on deep analytical dives. I have almost a decade of experience as a professional scientist now, including a PhD in Astrophysics. I have tools and experience to apply that I couldn’t even imagine when I was blogging a decade ago. This also ties back into my motivation to start blogging again in the first place. There is no shortage of modern statistical analysis in either cricket or baseball, but despite the fact that every industry now has data scientists to perform customer analyses, there does not seem to be any effort to upgrade from advanced statistics to data science in sport. In practice, this also means I am probably going to end up focussing more on baseball than I used to. The data collection is a lot more thorough and organised in baseball and the records are easier to access. (Also, although both sports have inexplicably become harder to watch in the modern media landscape, cricket is more so, because I now have less flexibility to deal with the time zones involved.) This is why I chose the name Defensive Indifference. It is a similar style and has a direct nod back to The Forward Defensive, but it’s a baseball term.

Other than that, it’s the same as it ever was.

(NB: This post will be pinned for a little while, until there are enough new posts that it isn’t necessary any more. A summary of the relevant information is still in the About section.)

Moose sent down

I’m a day or two late to the party on this (see the end of the last post re: being in the mountains of California), but I did see that the Royals finally sent Mike Moustakas to Triple-A Omaha.

It had to happen. Although he has shown glimpses of breaking out, the fact remains that it is the last week of May and he is yet to have a stretch of consistently performing even passably well. Even worse, between the few games where he has looked like remembering how to bat (eg, the 3 RBI game against Colorado) he has looked completely lost. I lost count of the number of times in the past few weeks I saw him strike out swinging at a pitch well outside the strike zone. He has clearly been straining mentally and it was not getting better. Sending him to Omaha was certainly the best thing for him and the club at this point.

How long he will be there is the interesting question. He batted superbly in Spring Training and that did not carry over into the season, so I would not rush to recall him even if he hits .500 for a couple of weeks. Danny Valencia has performed decently and has plenty of experience playing every day at the major league level, so he can hold the third base spot for a while. I would let Moose play at Omaha for at least a month, even if he looks like he has got things back together right away. I suspect he needs time to really settle in, find his form again and just put the first seven weeks of this season out of his mind.

Roses preview

After a two-year wait, there is finally going to be a four-day Roses match this week. It has to be said, though, that unless Lancashire play a lot better than they have for most of the start of the year, and especially better than a fortnight ago against Middlesex, the match may not be worth the wait.

The match against Middlesex was not quite a shambles, but our batting effectively failed again. Whilst it was our highest first innings score this season, it was not nearly enough on a fairly flat wicket on which our bowlers toiled. Middlesex admittedly batted well, but it is worth remembering that going into that match they had fared little better than we had with the bat. Even with Kyle Hogg returning, scoring only 266 on a flat wicket in the first innings was simply not enough. Much as I hate to say it, Yorkshire look to be a strong side and they will provide just as much of a test as Middlesex did. The batsmen in particular will have to rise to this challenge much better.

Whether or not that will actually happen, we will have to see. The signs in the three-day match against Loughborough MCCU were mixed, but it did seem like more of the same: a poor first innings total bailed out by a good performance with the ball and then a better batting display in the second innings. I don’t think that will be good enough against Yorkshire. Unfortunately there isn’t really an obvious solution. Luis Reece and Karl Brown both scored second innings runs, but that isn’t really a cause for optimism as much as it is a reason not to drop them.

If we do manage to get some runs on the board, I would back our bowling to be able to make inroads, but as we saw against Middlesex, that isn’t a guarantee. Jimmy Anderson will be absent again, though an attack of Glen Chapple, Kyle Hogg, Tom Smith (on current form) and Simon Kerrigan should be quite capable. It might be worth playing Kabir Ali as he has looked fairly sharp over the start of the season, but ultimately I would prefer not to weaken the batting any more.

I won’t actually be able to follow this match very closely though; I am currently at a radio observatory in the mountains of eastern California and will be throughout the match. This post was actually supposed to go up days ago, but due to packing and travel I could not quite finish it. I’ll be seeing score updates and my fingers are crossed, but I do worry that the typical turgid draw of a Roses match may be the best case result this time.

Lancs’ batting woes

Lancashire have played a quarter of their Championship matches this season and, although it is certainly still early, there are some areas of concern. Although our record (one win, two draws, one loss) is not really dire on the face of it, both draws were losing draws. We were saved by bad light against Warwickshire (admittedly after putting up a good fight) and by rain against Sussex. The bowling has been decent so far; the problem has very much been the batting. The extent to which we have struggled with the bat is highlighted by a glance at the Division One table: we have just one batting point from four matches. That by itself has actually cost us a place; our record is better than that of Nottinghamshire, but they have managed ten batting points, which is enough for them to sit in sixth whilst we are in seventh.

Paul Horton has batted well at the top of the order, but then the entire middle order has consistently struggled and the fact that we scored enough runs to beat Northamptonshire was down largely to the efforts of Jos Buttler and Tom Smith down the order. Luis Reece still has promise, but he is yet to do in the first division what he did in the second last year. Andrea Agathangelou was dropped after the first three matches, but at least against Sussex Karl Brown and Steven Croft did not fare any better. Possibly most worrying is that Ashwell Prince has done very little to follow up his century in the opening match. Even before the season started it was clear that we were going to be relying on him to stabilise an inexperienced batting order and our struggles are directly tied to his struggles.

There isn’t an easy fix to this. It is reasonable to expect that a batsman of the potential of Reece will find some form as the season goes on and the same will likely be true of Prince. Brown and Croft have only had one innings and so might improve, but at the same time there is a reason they did not play at the start of the season. The only real active step Glen Chapple and Mike Watkinson can take right now is to try to find an overseas batsman for the remainder of the season. Simon Katich did an excellent job last year in that role; right now we really need someone who can do that again. There are unfortunately no obvious options and the fact that we are five weeks into the season with no overseas signing suggests that most of the less-obvious ones are not interested either. So it looks like we will be spending most or all of the summer hoping our current batsmen remember how to bat. Our bowling is good enough and there is enough promise in the batsmen that this isn’t a disaster, but I worry it will mean a pretty nervous (not to mention frustrating) summer in the bottom half of the table.

There is some good news ahead of tomorrow’s match against Middlesex, however: Kyle Hogg has recovered from the injury that kept him out of the first four matches of the season. Although Jimmy Anderson is unavailable after playing against Scotland this weekend, it does mean a return to something close to our first-choice attack against a Middlesex side whose batting has been almost as frail as ours. If we can bowl first we have a good chance to bowl them out cheaply and then we might be able to ease some of the pressure on our own middle order. Fingers crossed…