Sportsball uses machine learning to build teams for daily fantasy sports. It is capable of predicting the performance of NBA or MLB players on a day-to-day basis.
I developed it as a side project, and you should be advised that it's very rough around the edges.
It does OK, but not great. Here's a graph of Sportsball attempting to predict pitcher performance across a series of games for Major League Baseball during one stretch of the 2015 season.
To read this chart, take a look at the top right box first. That has Sportsball's predictions on the x-axis and the actual number of fantasy points earned by the pitcher on the y-axis. You can see there's a significant correlation, but of course it's not perfect.
In this graph, I've also included some (but not all) of the features that Sportsball uses to inform its predictions. You can see how well they correlate with each other. "NFPitcher: FP" is a consensus of expert predictions; as you can see, Sportsball's predictions tend to be heavily correlated with these.
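If you want to put a number on "how well do predictions track reality" for a single cell of a chart like this, a Pearson correlation coefficient is one reasonable choice. Here's a minimal sketch; the `pearson` helper and the sample numbers are made up for illustration and are not Sportsball code or Sportsball output.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Covariance and variances, all left unnormalized (the 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / math.sqrt(var_x * var_y)


# Made-up fantasy point values, for illustration only.
predicted = [8.0, 10.5, 12.0, 6.5, 9.0]
actual = [5.0, 14.0, 11.0, 7.0, 4.0]

r = pearson(predicted, actual)
print(f"correlation between predicted and actual: {r:.2f}")
```

A value near 1 means the predictions rank and scale with the actual results; a value near 0 means they tell you very little.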
Probably not. In practice, using teams generated by Sportsball to compete on daily fantasy sites (like FanDuel or DraftKings) results in slowly bleeding money. You will definitely win a fair amount of the time, but the expected value is negative.
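To see how you can win a fair amount of the time and still lose money, here's a rough worked example. The contest numbers are hypothetical (a $10 entry that pays $18 to each winner, with the site keeping the difference), not actual FanDuel or DraftKings payouts.

```python
def expected_profit(entry_fee, payout, win_probability):
    """Average profit per contest: expected winnings minus the entry fee."""
    return win_probability * payout - entry_fee


def break_even_win_rate(entry_fee, payout):
    """Win probability at which the expected profit is exactly zero."""
    return entry_fee / payout


# Hypothetical contest: $10 to enter, $18 paid out to each winner.
entry_fee = 10.0
payout = 18.0

# Winning half the time still loses money on average.
print(expected_profit(entry_fee, payout, win_probability=0.50))  # -1.0

# You would need to win more often than this just to break even.
print(round(break_even_win_rate(entry_fee, payout), 3))  # 0.556
```

Under these assumed numbers, a 50% win rate loses a dollar per contest on average, which is roughly what "slowly bleeding money" looks like.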
Some major challenges that Sportsball has a hard time with:
- Detailed injury information is critical (especially in the NBA), hard to source in real-time, and often difficult for a simple algorithm to interpret.
- Past performance may have human-visible explanations (meaningless games? game already out of hand?) that are hard for a program to pick up on. At the very least you'd have to hard-code a lot of special cases.
- Finally, and most importantly: DFS grinders (the humans who play these games all day every day) are actually quite good. They use automated tools to help themselves, so it's not like a program automatically has an edge, either. If you want your program to make money from playing DFS in an automated fashion, the bar is high.
When Sportsball misses on predictions, it's hard to see how much is due to random variance at a day-to-day level, and how much is due to missing out on additional features. That humans consistently beat Sportsball suggests more features and data are necessary, and that the program is far from as good as it could be.
It's true: Sportsball did do slightly better than the experts at predicting pitcher performance in the sample above. But remember, these experts publish their predictions online before the games are played. Their predictions are freely available to human players too, and you have to figure that if the "expert" predictions were actually that good, the experts would be making money off them instead of giving them away.
And again, the grinders are good at this. FanDuel shared an estimate of how many points it usually takes to win money in one of their NBA games (the columns correspond to how many real-life NBA games are included in the fantasy competition).
For comparison, Sportsball is pretty good at getting above 270-280 points, but struggles to consistently beat the line to cash.
Plus, pitcher performance is actually a lot easier for Sportsball to look at than batter performance. It's more consistent overall, since pitchers have more chances to affect each game. And because of how important getting a win is, the Vegas line (which is a good predictor of pitcher wins) is extremely useful in suggesting which pitchers are going to do well. Batters are harder:
Here "BatterTarget" is the actual points scored by the batter, and you can see the correlation is... not very good (so first column, second row is the cell you want to look at this time). I've included fewer features in this graph to highlight how the predictions struggle to deliver value over the experts. You can maybe talk yourself into believing that cell shows a correlation... but it's pretty minimal at best.
I haven't really put in the work to make this easy to use -- it's more of a personal project and now a bit of a demo. So be warned!
I'd start by editing dfs/__init__.py and setting up the directories where you're keeping this project.
Then (setting up a virtualenv if you want) run setup.py install and... you're built! (Realistically, you will probably have to fix some errors before this works.)
The basic flow is:
- Run scraping scripts to acquire data
- Run main pipeline to build model
- Run "live" pipeline to make predictions for a specific game
Looking at the contents of setup.py will show you some of the CLI tools and what they do. The main "pipeline" scripts like pl-mlb are used to build the models; scripts like pl-mlb-live will let you use a model to generate predictions for an upcoming gameday.
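If you'd rather check what actually got installed than read setup.py, something like the following may help. It's a sketch under two assumptions that I haven't verified against this repo: that the CLI tools are registered as console_scripts entry points, and that you're running Python 3.8 or later. The helper names `pipeline_scripts` and `installed_console_scripts` are made up for this example.

```python
from importlib.metadata import entry_points


def pipeline_scripts(names, prefix="pl-"):
    """Return the script names that start with the given prefix, sorted."""
    return sorted(name for name in names if name.startswith(prefix))


def installed_console_scripts():
    """Names of all console_scripts entry points in the current environment."""
    eps = entry_points()
    if hasattr(eps, "select"):
        # Python 3.10+ API.
        scripts = eps.select(group="console_scripts")
    else:
        # Python 3.8/3.9: entry_points() returns a dict keyed by group name.
        scripts = eps.get("console_scripts", [])
    return [ep.name for ep in scripts]


if __name__ == "__main__":
    # Assumes the pipeline tools share the "pl-" prefix, like pl-mlb.
    for name in pipeline_scripts(installed_console_scripts()):
        print(name)
```

If nothing prints, the install probably didn't register the scripts the way this sketch assumes, and setup.py is still the place to look.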
Good luck! Let me know if you find this useful or interesting.
License: MIT