
Shot data for the top 5 European leagues - 2019/2020

Not to sound too nostalgic, but when I first started messing about with football event data I was manually plotting Oxford shot locations and visualising them in Tableau. Access to data was limited back then, whereas we now have more access to data than has ever previously been available.

A few resources:

- Statsbomb (covered in previous posts)

- Canadian Premier League (yet to dive in but a super interesting league and regularly updated data)

- Wyscout (top 5 European leagues 2017-2018)

- UnderstatR (will use this in this post)

This tweet from Sushruta Nandy prompted me to write this.

Anyway, the plan:

- Load up Tidyverse and UnderstatR

- Run 2019 team/player data 

- Extract 2019 x/y shot locations

- Save to .csv

Here we go then....

1) Load in Tidyverse and UnderstatR

2) Set the working directory (will need this later when saving the data to .csv - just select the path you wish for the file to be saved to)

At this point, you should have something pretty straightforward:
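As a rough sketch of those two steps (the package loads as `understatr`; the GitHub install path in the comment and the `~/Downloads` folder are my assumptions, so swap in your own):

```r
# install.packages("tidyverse")
# remotes::install_github("ewenme/understatr")  # assumed install route, check the package README

library(tidyverse)   # dplyr, purrr, readr, ggplot2 etc.
library(understatr)  # Understat scraping functions

# Assumed path: point this at wherever you want the .csv to end up
setwd("~/Downloads")
```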





Right, let's dig into the UnderstatR package:

We can use get_leagues_meta() to see the available leagues. Run this and you will see leagues are available from 2014-2019 (plus the start of a few 2020 seasons at the time of writing!). This includes the EPL, La Liga, Bundesliga, Ligue 1, Serie A and RFPL.

As we are focussed on the top 5 European leagues we can drop the RFPL using the dplyr filter. At this point we should have:
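A minimal sketch of this step, assuming the league column is called `league_name` (which is what the later `leagues$league_name` call relies on) and that the Russian league is labelled exactly "RFPL":

```r
library(tidyverse)
library(understatr)

# Pull the league/season metadata, then drop the Russian league
leagues <- get_leagues_meta() %>%
  filter(league_name != "RFPL")
```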












Check 'leagues' and you should see only the five remaining leagues (no RFPL) across the available seasons:
















Great - now to pull team data:

team_data<-map_dfr(unique(leagues$league_name), get_league_teams_stats, year = 2019)

This will use the purrr package to cycle through each unique league name and pull the team data for each match within the top 5 leagues in 2019. We end up with a data frame of 3450 rows including individual match xG, xGA, NPxG etc. 

Call team_data to have a little look - this is useful in itself but we can take this a step further to obtain player data:

player_data<-map_dfr(unique(team_data$team_name), get_team_players_stats, year = 2019) 

Once more this runs through each team, getting the 2019 player stats. This may take a minute, however once done this will create a data frame of 2732 players and their summary stats for 2019:




This is that good stuff. You can go to town filtering by minutes played, play around with xG etc etc. For example:





I quickly created the above in ggplot using mutate() to create some P90 stats. Very quick and basic but you get the idea!
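Roughly, the P90 step looks like the sketch below. It runs on a made-up toy table rather than the real player_data, and the column names (`time` for minutes played, `xG`, `xA`) are my assumptions about what get_team_players_stats() returns, so check with glimpse(player_data) first:

```r
library(tidyverse)

# Toy stand-in for player_data (made-up numbers, assumed column names)
toy_players <- tibble(
  player_name = c("Player A", "Player B", "Player C"),
  time        = c(2700, 1350, 450),   # minutes played
  xG          = c(15.0, 6.0, 1.0),
  xA          = c(6.0, 3.0, 0.5)
)

p90 <- toy_players %>%
  filter(time >= 900) %>%             # drop small samples (threshold is arbitrary)
  mutate(
    xG_p90 = xG / time * 90,          # per-90 = total / minutes * 90
    xA_p90 = xA / time * 90
  )

# Basic scatter of the two per-90 measures
p90_plot <- ggplot(p90, aes(x = xG_p90, y = xA_p90, label = player_name)) +
  geom_point() +
  geom_text(vjust = -1) +
  labs(x = "xG per 90", y = "xA per 90")
```

Swap toy_players for player_data (and adjust the column names if they differ) to run it on the real thing.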

Another example:


Nothing revolutionary, but in a few lines of code you've pulled Understat data and thrown it into a vis...obviously get as creative as you wish - this data set alone can give you plenty to dive into. 

Finally, the real juicy stuff - x/y shot locations. A warning up front: this will pull every shot by those 2732 players from 2014 to 2019/20...so once you hit run on this grab a drink and relax.

First, create a vector of player_id (this is required by the UnderstatR get_player_shots() function).

players<-c(player_data$player_id)

Yup yup - now to pull the event data:

shot_data <- players %>% 
  map_dfr(.,possibly(get_player_shots,otherwise=NULL))
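If the possibly() bit looks odd: it wraps a function so that an error returns a fallback value instead of stopping the whole loop. A toy sketch of the idea (fake_get_shots is a hypothetical stand-in I made up, not part of UnderstatR):

```r
library(tidyverse)

# Hypothetical stand-in for get_player_shots(): fails for one id
fake_get_shots <- function(player_id) {
  if (player_id == 2) stop("no page for this player")
  tibble(player_id = player_id, xG = c(0.1, 0.3))
}

# Without possibly(), the error for id 2 would abort the whole run.
# With otherwise = NULL, that id just contributes no rows.
toy_shots <- map_dfr(c(1, 2, 3), possibly(fake_get_shots, otherwise = NULL))
```

With a couple of thousand players to get through, one bad id killing the run halfway would be painful, hence the wrapper.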

Hit run and let's watch that baby goooooooo.

Okkkkk, we now have 212k shots in our shot_data data frame! Lovely stuff. 



Take a quick glimpse and you will see we have loads of useful stuff: the shot's x/y, outcome, individual shot xG and the player that assisted the shot. All pretty useful and good fun to play around with. At this point, I would save to .csv so you can simply load the file in future. To do this:

write_csv(shot_data, "top_5_shot_data.csv")

This will save shot_data as "top_5_shot_data.csv" in your earlier defined file path (for me, 'Downloads'). 

At this point you are into the fun stuff of plotting. As always, the ggsoccer package is a great little start for plotting this sort of thing. A few ideas:

Vardy outperforming his xG each season (you probably want to strip penalties out of this):
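The numbers behind a plot like this can be sketched as below. It runs on a made-up toy table, and the column names and values (`player`, `season`, `situation`, `result`, `xG`, "Penalty", "Goal") are my assumptions about how shot_data is laid out, so check them against your own pull:

```r
library(tidyverse)

# Toy stand-in for shot_data (made-up numbers, assumed column names/values)
toy_shot_data <- tribble(
  ~player,       ~season, ~situation, ~result,       ~xG,
  "Jamie Vardy", 2018,    "OpenPlay", "Goal",        0.30,
  "Jamie Vardy", 2018,    "Penalty",  "Goal",        0.76,
  "Jamie Vardy", 2018,    "OpenPlay", "MissedShots", 0.10,
  "Jamie Vardy", 2019,    "OpenPlay", "Goal",        0.25,
  "Jamie Vardy", 2019,    "OpenPlay", "Goal",        0.15,
  "Jamie Vardy", 2019,    "OpenPlay", "SavedShot",   0.20
)

vardy_by_season <- toy_shot_data %>%
  filter(player == "Jamie Vardy", situation != "Penalty") %>%  # strip penalties
  group_by(season) %>%
  summarise(
    np_goals = sum(result == "Goal"),  # non-penalty goals
    np_xG    = sum(xG),                # non-penalty xG
    .groups  = "drop"
  ) %>%
  mutate(goals_minus_xG = np_goals - np_xG)  # positive = outperforming xG
```

From there it is a geom_col() or geom_line() away from a season-by-season chart.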




Douglas Costa shot assist locations in 2019:
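A rough sketch of how a plot like this could be put together with ggsoccer. plot_shot_assists is a helper name I made up, the columns (`player_assisted`, `season`, `X`, `Y`, `xG`, `result`) are assumptions, and I am assuming X/Y come through on a 0-1 scale, hence the rescale to ggsoccer's 0-100 Opta pitch. Note it plots where the assisted shots were taken from, not where the pass was played:

```r
library(tidyverse)
library(ggsoccer)

# Hypothetical helper: plot the shots a given player assisted in one season
plot_shot_assists <- function(shots, assister, yr) {
  assisted <- shots %>%
    filter(player_assisted == assister, season == yr) %>%
    mutate(x = X * 100, y = Y * 100)   # assumed 0-1 coords -> 0-100 pitch

  ggplot(assisted, aes(x = x, y = y)) +
    annotate_pitch(dimensions = pitch_opta) +           # draw the pitch
    geom_point(aes(size = xG, colour = result), alpha = 0.7) +
    theme_pitch() +                                     # strip axes and grid
    labs(title = paste(assister, "shot assists,", yr))
}
```

Usage would then be something like plot_shot_assists(shot_data, "Douglas Costa", 2019).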



All pretty basic and stuff you see regularly but access to the data allows you to get creative. 

There we go, 10 lines of R code to grab a whole load of data. Full code below:
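The original screenshot is not reproduced here, so this is the script pieced back together from the steps in this post (the `~/Downloads` path and the exact filter line are my assumptions):

```r
library(tidyverse)
library(understatr)

setwd("~/Downloads")  # assumed path: wherever you want the .csv saved

# League metadata, minus the Russian league
leagues <- get_leagues_meta() %>%
  filter(league_name != "RFPL")

# 2019 team data, then 2019 player data
team_data <- map_dfr(unique(leagues$league_name), get_league_teams_stats, year = 2019)
player_data <- map_dfr(unique(team_data$team_name), get_team_players_stats, year = 2019)

# Every shot for each player id (slow!)
players <- c(player_data$player_id)
shot_data <- players %>%
  map_dfr(., possibly(get_player_shots, otherwise = NULL))

write_csv(shot_data, "top_5_shot_data.csv")
```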


Thanks to Sushruta Nandy for inspiring this, Saintsbynumbers, and ewen_ for the package!

Always credit any of your data and check usage permissions.

Let me know if you need anything, enjoy.

Comments

  2. Could you please share the code for plotting "Douglas Costa shot assist locations in 2019"?

