Introducing Understat, a Python package for revolutionary football metrics






https://github.com/amosbastian/understat

What is Understat?

It's a Python wrapper for the website Understat, which provides revolutionary football metrics for multiple leagues. An example of this is expected goals (xG), the main new metric, which allows you to evaluate team and player performance. In a low-scoring game such as football, the final score does not really provide a clear picture of the teams' performances, which is why sports analytics increasingly turns to advanced models like xG, a statistical measure of the quality of chances created and conceded. Understat's goal was to create the most precise method for shot quality evaluation. They did this by training neural network prediction algorithms on a large dataset (>100,000 shots, over 10 parameters for each), and have now made this data available to the public!


This website has come up before in one of my previous posts, as some of its data was used in one of the features of my Reddit bot for /r/FantasyPL. Since it only used a part of the data that is available on the website, I decided it would be nice to create a Python package that makes everything available, and can easily be used by others who are also interested in this information.

https://github.com/amosbastian/understat/pull/1

Getting the data

As I mentioned in the previous post, they unfortunately do not have an API. Instead, the data is retrieved by scraping their website's <script> tags and using a regular expression to match the data we want. Most of the data looks something like the picture below.



The data embedded in their website

It's pretty consistent across most of the pages, with the biggest difference being the variable's name. Because of this it was easy to create a couple of utility functions that could be used in most of the functions!



A couple of the utility / helper functions
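To make the idea concrete, here is a minimal sketch of such a helper. It is not the package's actual code: the function name find_embedded_json is hypothetical, and it assumes the data is embedded as a JSON.parse('...') call whose argument is a hex-escaped string, which may not match the site's current markup.

```python
import json
import re


def find_embedded_json(html, variable_name):
    """Return the parsed data assigned to `variable_name`, or None if absent.

    Assumption: the page embeds data as
        var <name> = JSON.parse('<escaped JSON string>')
    """
    pattern = re.compile(
        r"var\s+" + re.escape(variable_name) + r"\s*=\s*JSON\.parse\('(.+?)'\)"
    )
    match = pattern.search(html)
    if match is None:
        return None
    # Turn escape sequences such as \x7B back into the characters they encode.
    decoded = match.group(1).encode("utf-8").decode("unicode_escape")
    return json.loads(decoded)


# Made-up snippet that imitates the assumed embedding format.
sample_html = (
    r"<script>var statData = "
    r"JSON.parse('\x5B\x7B\x22month\x22\x3A\x228\x22\x7D\x5D');</script>"
)

stats = find_embedded_json(sample_html, "statData")
print(stats)
```

Because only the variable name changes from page to page, one helper like this can be reused by most of the functions.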

Creating the functions

Once I had a way to actually get the data consistently, it was relatively straightforward to implement the functions. I would basically go to each page that has information, open Chrome's developer console, and use the following to log all the <script>s:

Array.from(document.getElementsByTagName('script')).forEach(script => console.log(script));

and then go through them manually to see if they contained some useful information. For each bit of useful information I found I created at least one function in the Understat class. On their home page, they have the following chart for example:



Average goals per match, split by month

In the <script> tag of this graph there is a variable called statData, which is retrieved, matched and parsed by the helper functions shown earlier when you call the get_stats() function in the Understat class. An example of its usage can be found below.



Usage example

This results in the following output (which is basically all information you see in the graph).



Example output

For some reason not all data on their website is available in the same format, and sometimes the format isn't really useful, so some of the data had to be cleaned up first. For example, a player's positional data has the position as the key and their performance as the value; instead, I changed it to return a list of dictionaries, with the position simply a key-value pair in each dictionary.
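The reshaping described above can be sketched as follows. The function name flatten_positions, the field names and the sample values are all made up for illustration; only the idea of moving the position from the key into each dictionary comes from the package.

```python
def flatten_positions(positions):
    """Turn {position: stats} into a list of dicts that each carry "position"."""
    return [
        {"position": position, **stats}
        for position, stats in positions.items()
    ]


# Made-up data in the "position as key" shape.
raw = {
    "FW": {"games": "10", "goals": "4"},
    "Sub": {"games": "3", "goals": "1"},
}

flattened = flatten_positions(raw)
print(flattened)
```

A flat list like this is easier to filter, since every entry has the same keys.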

Adding options

I didn't want to just return the data and let the user go through the trouble of filtering it afterwards. After thinking about it, I came up with a way to dynamically pass options (with the responsibility being left to the user) using either an optional dictionary with specific options, or keyword arguments. For example, if you wanted to get all players playing in the Premier League for Manchester United in 2018, you could call get_players() with the league, the season, and the team as a filter; the full call is shown in the usage section at the end of this post.



Basically how it works is that the keyword argument team_title="Manchester United" results in the same dictionary as passing {"team_title": "Manchester United"} directly. The filter_data() function then takes the data and returns all dictionaries that contain this key-value pair! It's a pretty nice way to let people decide how to filter stuff, without having to check everything. Of course, it can be improved, because sometimes you will need to pass a more complex dictionary to get the information you want, which can be tedious and difficult for the user. For now, it's great imo!
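A rough sketch of this filtering idea is below. The body of filter_data() is my own guess at a minimal version rather than the package's code; filter_players, the player_name field and the sample data are hypothetical.

```python
def filter_data(data, options):
    """Return the dictionaries in `data` that match every pair in `options`."""
    if not options:
        return data
    return [
        item for item in data
        if all(item.get(key) == value for key, value in options.items())
    ]


def filter_players(players, options=None, **kwargs):
    # Keyword arguments arrive as a dictionary, so both styles can be merged.
    filters = {**(options or {}), **kwargs}
    return filter_data(players, filters)


# Made-up sample data.
players = [
    {"player_name": "Player A", "team_title": "Manchester United"},
    {"player_name": "Player B", "team_title": "Arsenal"},
]

by_dict = filter_players(players, {"team_title": "Manchester United"})
by_kwargs = filter_players(players, team_title="Manchester United")
print(by_dict == by_kwargs)
```

Both calls go through the same code path, which is why the dictionary and keyword forms behave identically.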

Testing!

I wanted to make sure the output of all functions is exactly how I want it to be, so I also wrote loads of tests. Also, since the website isn't mine, and it could change at any moment, it's pretty important to know exactly what they changed, and hopefully the tests will help with this.

Roadmap

I'll be posting about this package on Reddit and seeing what kind of requests come in, as I think this can be really useful for people who don't even play Fantasy Premier League and are just interested in football in general. I'm hoping that this will mean that people come up with some good suggestions or even decide to contribute. Another thing I will be doing is writing some documentation, as the filtering is left to the user, so it's pretty important to know how and what you can actually filter the data by - look forward to a post about this in the future!

Usage & installation

The recommended way to install understat is via pip.

pip install understat

To install it directly from GitHub you can do the following:

git clone https://github.com/amosbastian/understat.git

You can also download a .tar file or .zip file:

$ curl -OL https://github.com/amosbastian/understat/tarball/master
$ curl -OL https://github.com/amosbastian/understat/zipball/master # Windows

Once it has been downloaded you can easily install it using pip:

$ cd understat
$ pip install .

Import Understat and call its functions like so:

import asyncio
import json

import aiohttp

from understat import Understat


async def main():
    async with aiohttp.ClientSession() as session:
        understat = Understat(session)
        data = await understat.get_players("epl", 2018, {"team_title": "Manchester United"})
        print(json.dumps(data))


if __name__ == "__main__":
    # asyncio.run() creates the event loop, runs main(), and closes the loop.
    asyncio.run(main())

Contributing

  1. Fork the repository on GitHub.
  2. Run the tests with pytest tests/ to confirm they all pass on your system.
    If the tests fail, then try and find out why this is happening. If you aren't
    able to do this yourself, then don't hesitate to either create an issue on
    GitHub, or send an email to [email protected].
  3. Either create your feature and then write tests for it, or do this the other
    way around.
  4. Run all tests again with pytest tests/ to confirm that everything
    still passes, including your newly added test(s).
  5. Create a pull request for the main repository's master branch.

Documentation

Coming soon!
