
Using StatsBomb data - part 3

I have more time on my hands, so I thought it would be good fun to show the steps from creating scatters to plotting event data for the players who show up favourably in initial searches.

The question that I posed myself was:

"How can we go beyond scatters to learn more about a player's output?"

Scatter plots are increasingly used to show which players excel, however they don't tell the whole story.

As always, I'm still learning code myself so I'm sure there is code in here that will upset experienced coders! Sorry!

Anyway, a rough idea of how this will look -

- download data from fbref.com
- plot a scatter using the data
- filter the top 9 performers for one parameter
- plot the event data
- compare the player outputs

Let's goooooooooooo

Let's fire up:

library(StatsBombR)
library(tidyverse)
library(ggsoccer)
library(ggrepel)

We could create the P90 data for the WSL ourselves by using the StatsBomb data, however I'm being lazy by downloading from fbref.com. I'm looking at two simple metrics - shots P90 and shots assisted P90.
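
For anyone who would rather build the P90 numbers themselves, a minimal sketch of the per-90 arithmetic is below. The players, minutes and totals are made up for illustration and the column names are assumptions; in practice the totals and minutes would have to be tallied from the event data first.

```r
library(dplyr)

# Hypothetical season totals - made-up numbers, not real WSL data
totals <- data.frame(
  Player     = c("Player A", "Player B", "Player C"),
  minutes    = c(900, 450, 1350),
  key_passes = c(20, 5, 45),
  shots      = c(30, 10, 15)
)

p90 <- totals %>%
  mutate(
    X90s  = minutes / 90,       # number of full 90s played
    KP.90 = key_passes / X90s,  # shot assists per 90
    Sh.90 = shots / X90s        # shots per 90
  )

p90
```

Dividing by the number of 90s played, rather than by appearances, puts substitutes and regular starters on the same scale.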

Once I've downloaded from fbref I remove the top row and save as .csv.

As the data we require sits in two different files, there is some preparation to do before plotting.

The below loads the key pass (df) and shot (df1) data into two data frames.

From here, I selected the columns we require from each data frame (df = Player, X90s and KP; df1 = Player and Sh.90). Finally, I combined the two data frames using left_join before filtering to keep only players that have played more than 5 90s.
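
A rough sketch of those steps is below. It is not the original code: the file names in the comments are hypothetical, and small made-up data frames stand in for the fbref downloads so the example runs on its own.

```r
library(dplyr)

# In practice these would come from the saved fbref files, e.g.
#   df  <- read.csv("passing.csv")   # hypothetical file name
#   df1 <- read.csv("shooting.csv")  # hypothetical file name
# read.csv() renames headers such as "90s" and "Sh/90" to X90s and Sh.90.
# Made-up data frames stand in for the downloads here.
df <- data.frame(
  Player = c("Player A", "Player B", "Player C"),
  X90s   = c(10.2, 3.5, 8.0),
  KP     = c(2.1, 3.0, 1.4),
  Pos    = c("MF", "FW", "DF")
)
df1 <- data.frame(
  Player = c("Player A", "Player B", "Player C"),
  Sh.90  = c(1.8, 4.2, 0.6),
  Gls    = c(3, 1, 0)
)

# keep only the columns we need from each file
df  <- df  %>% select(Player, KP, X90s)
df1 <- df1 %>% select(Player, Sh.90)

# join on Player, then keep players with more than 5 90s
all.data <- left_join(df, df1, by = "Player") %>%
  filter(X90s > 5)

all.data
```

left_join keeps every player in df even if they are missing from df1, in which case Sh.90 comes back as NA.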


View(all.data) should return 4 columns of Player, KP, X90s and Sh.90

Now that we have the numbers we require in one place, we can move on to plotting. Personally, I like to throw a quick scatter together as it gives a quick overview - plus I tend to need something visual for it to sink in!

Anyway, ggplot to throw a scatter together. You can make it as fancy as you like, I just use geom_point and ggrepel.
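
A minimal sketch of such a scatter is below, using a made-up stand-in for all.data. The axis titles are an assumption. Note that mapping label = Player in aes() does nothing on its own; the names only appear once a text layer such as geom_text_repel() is added.

```r
library(ggplot2)
library(ggrepel)

# made-up stand-in for all.data
all.data <- data.frame(
  Player = c("Player A", "Player B", "Player C", "Player D"),
  KP     = c(2.1, 3.0, 1.4, 0.8),
  Sh.90  = c(1.8, 0.9, 3.2, 2.5)
)

p <- ggplot(all.data, aes(x = KP, y = Sh.90, label = Player)) +
  geom_point() +
  geom_text_repel() +  # draws the player names and nudges them apart
  labs(x = "Shot assists P90", y = "Shots P90")

p
```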





Following this, we will just focus on shot assists. A quick look at the bottom right of the scatter shows the players who produce the most shot assists P90 in the league.

We can take this a step further by doing a quick dplyr filter of the top 9 players. The code:

##find top 9 key passers
all.data=all.data%>%
  select(Player, KP)%>%
  arrange(-KP)%>%
  top_n(9, KP) ##rank by KP explicitly; ties can return more than 9 rows

The above selects the 'Player' and 'KP' columns, arranges from highest to lowest (the minus sign arranges high to low), before only showing the top 9 players. You can obviously play around with the number, I just chose 9 for ease of plotting later. Your all.data should now look like the below:




So that's the preliminary filter complete. We now know the top 9 shot assist players in the WSL. You could obviously do a filter on the fbref site, but where's the coding fun in that?

Anyway, on to the event data. This is the same as the previous two tutorials, however I have removed the specific game filter. We will be loading in all events, from all matches in the 19/20 WSL season:

##load competitions - this season WSL
Comp<-FreeCompetitions()%>%filter(competition_id==37, season_name=="2019/2020")

##load all matches
Matches<-FreeMatches(Comp)

##load all events
StatsBombData<-StatsBombFreeEvents(MatchesDF = Matches, Parallel = T)

##clean data
StatsBombData = allclean(StatsBombData)

Using all.data that was created earlier, we have the top 9 shot assist players so can now filter the StatsBomb events.
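
A hedged sketch of that filter is below. It is not the original code: it assumes the cleaned events carry the logical columns pass.shot_assist and pass.goal_assist, and it uses a made-up events table and a hypothetical top.players vector so that it runs on its own. With the real data the vector would be all.data$Player, and it is worth checking that the fbref spelling of each name matches player.name in the StatsBomb events.

```r
library(dplyr)

# made-up stand-in for the cleaned StatsBombData events
events <- data.frame(
  player.name      = c("Player A", "Player A", "Player B", "Player C", "Player A"),
  type.name        = c("Pass", "Pass", "Pass", "Pass", "Shot"),
  pass.shot_assist = c(TRUE, NA, TRUE, TRUE, NA),
  pass.goal_assist = c(NA, TRUE, NA, NA, NA)
)

# hypothetical stand-in for all.data$Player (two names for brevity)
top.players <- c("Player A", "Player B")

# d1: passes by the selected players that led to a shot
d1 <- events %>%
  filter(type.name == "Pass",
         pass.shot_assist == TRUE,
         player.name %in% top.players)

# d2: passes by the selected players that led to a goal
d2 <- events %>%
  filter(type.name == "Pass",
         pass.goal_assist == TRUE,
         player.name %in% top.players)
```

filter() drops rows where the condition is NA, so passes that were neither a shot assist nor a goal assist fall away without any extra handling.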




We now have two new data frames, d1 and d2, which include the 9 players highlighted earlier along with all passes that assist a shot, including goal assists. We can now plot this using the ggsoccer package. I use this a lot as it is built on ggplot, so it is great for editing and over plotting. To draw the pitch:

ggplot()+
annotate_pitch(dimensions = pitch_statsbomb)

This will plot the pitch, however the grey plot theme remains. It can be removed by adding:

theme_pitch()

This will also adjust the aspect ratio.

You should have something that looks like:



We have the blank pitch, now to over plot the shot assists. I have just used geom_segment() to create the pass map, however we need to put 80- in front of the y values (both y and yend) to reverse the axis within geom_segment(). This was done in the previous 2 tutorials using scale_y_reverse(), however implementing this with the ggsoccer package removes some of the pitch annotations...therefore I used the code:

ggplot()+
  annotate_pitch(dimensions = pitch_statsbomb)+
  theme_pitch()+
  geom_segment(data = d1, aes(x = location.x, y = 80-location.y,
                              xend = pass.end_location.x, yend = 80-pass.end_location.y),
               alpha = 0.5, arrow = arrow(length = unit(0.08,"inches")))+
  geom_segment(data = d2, aes(x = location.x, y = 80-location.y,
                              xend = pass.end_location.x, yend = 80-pass.end_location.y),
               alpha = 0.5, arrow = arrow(length = unit(0.08,"inches")))

With the output of:



This already tells us a fair bit, however we can further break this down by the 9 players we are looking at by using facet_wrap(~player.name). This essentially repeats the above plot for each player so we can view all alongside one another:
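
A minimal sketch of the faceted version is below, using made-up passes for two players so it runs on its own. With the real data you would keep both geom_segment() layers (d1 and d2) and simply add the facet_wrap() line at the end.

```r
library(ggplot2)
library(ggsoccer)

# made-up shot assist passes (the StatsBomb pitch is 120 x 80)
d1 <- data.frame(
  player.name         = c("Player A", "Player A", "Player B"),
  location.x          = c(80, 95, 70),
  location.y          = c(20, 60, 40),
  pass.end_location.x = c(105, 110, 100),
  pass.end_location.y = c(35, 45, 30)
)

p <- ggplot() +
  annotate_pitch(dimensions = pitch_statsbomb) +
  theme_pitch() +
  geom_segment(data = d1,
               aes(x = location.x, y = 80 - location.y,
                   xend = pass.end_location.x, yend = 80 - pass.end_location.y),
               alpha = 0.5,
               arrow = arrow(length = unit(0.08, "inches"))) +
  facet_wrap(~player.name)  # one pitch per player

p
```

The pitch annotation has no player.name column, so it is repeated in every panel, while the passes are split by player.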




We now have all shot assist passes for the top 9 players (by shot assist P90) in the WSL. We can glean that many of Weir's shot assists come from corners, whilst those of Beckie and Groenen *appear* to come from open play situations. To further filter, we can add:

play_pattern.name == "Regular Play"

To the d1 and d2 filter to remove all set pieces, resulting in:
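
As a rough illustration of where that condition sits, the sketch below applies it to a made-up events table. The pass.shot_assist column is an assumption about the cleaned data, and the count() call is an optional extra for seeing how many passes each play pattern accounts for before they are removed.

```r
library(dplyr)

# made-up stand-in for the shot assist events
events <- data.frame(
  player.name       = c("Player A", "Player A", "Player B"),
  type.name         = "Pass",
  pass.shot_assist  = TRUE,
  play_pattern.name = c("From Corner", "Regular Play", "Regular Play")
)

# how many shot assists come from each play pattern, per player
pattern.counts <- events %>%
  count(player.name, play_pattern.name)

# open play only: the extra condition goes inside the same filter()
d1 <- events %>%
  filter(type.name == "Pass",
         pass.shot_assist == TRUE,
         play_pattern.name == "Regular Play")
```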



This changes things considerably, as Weir's shot assists look to come primarily from set pieces, whilst Beckie, Groenen and Wullaert create from open play situations. Reiten has a specific zone on the left side that she utilises, Staniforth mainly creates from outside the 18 yard box, and Beckie has decent volume from the final third, delivering between the penalty spot and 6 yard box. We could further break this down by the pitch locations where the passes start or end. Using the previous tutorial we could also add counts etc. to add further information.

This is a quick overview of how to elevate scatter plots. Scatters are great for an initial overview, however further filtering into event data can allow us to differentiate between playing styles, how players create and the main locations in which that creativity comes from/to.

In the next tutorial I will use the y axis of the initial scatter, shots, to plot the shot locations of 9 players. Yeeha.

As always, feedback, good or bad, is welcome.

The full code:


























Comments

  1. Thanks for this! Finding these posts real good fun to follow! Just a quick question... I seem to be struggling getting players names to be shown after the command:

    ggplot(all.data,aes(x=KP,y=Sh.90,label=Player))+...

    The graph and all the relevant points are created, but no label. Have you got any suggestions for a remedy?

    JB
