Shot Maps In R using StatsBomb Data

I'm not sure if anyone is following these, but I will do one more and see what happens!

I have covered some passing-based stuff, so I thought it might be useful to look into shots.

Therefore, the rough plan for this piece:
1) Total player xG in the WSL for this season
2) Find the top 9 players based on xG
3) Plot all shots taken including xG
4) Add labels
5) Plot the shot map of the 9 players against one another

As always, my coding is in the learning stage so this isn't a definitive way...just something that works for me and might help others!

Anyway, load in this season's WSL data as we have previously.

We want to extract 3 things from the data - the number of shots, the number of goals and total xG (initially including penalties).

To start - tallying player shots:

player_shots<-StatsBombData%>%
  filter(type.name == "Shot")%>% ##filter all shots in StatsBombData
  group_by(player.name)%>% ##group by player
  tally(name = "total_shots") ##tally

Once you've run the above you should have two columns - one with the player name, another with the total shots that player has taken. You essentially run the same as above for goals and xG:

player_goals<-StatsBombData%>%
  filter(shot.outcome.name == "Goal")%>% ##filter goals
  group_by(player.name)%>% ##group by player
  tally(name = "Goals") ##tally goals

player_xg<-StatsBombData%>%
  filter(type.name == "Shot")%>% ##filter shots
  group_by(player.name)%>% ##group by player name
  tally(shot.statsbomb_xg, name = "total_xg", sort = TRUE)%>% ##tally xG for each player and sort
  top_n(9, total_xg) ##pick top 9 players by total xG

You should have something like below:

Next challenge, to combine the three tables into one and calculate xg_per_shot:

summary <- left_join(player_xg, player_shots, by = "player.name")%>% ##join player_xg and player_shots
  mutate(xg_per_shot = sprintf("%0.2f",total_xg/total_shots)) ##calculate xg_per_shot

Finally, left_join() the goals:

summary<-left_join(summary, player_goals, by = "player.name")

view(summary) and we should get:


Lovely stuff, this table already gives us some reasonable insight...Bremer with 10 goals off 5.3xG!

Now that we have the top 9 players by total xG plus some additional numbers, we can filter the original StatsBombData data frame by the 9 players and all shots they have taken:
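A minimal sketch of that filter, in case it helps. The two tiny data frames are made up so the snippet runs on its own, and top9_shots is just a name I have picked; with the real data you would skip the toy tables and use the StatsBombData and summary objects from above.

```r
library(dplyr)

# Made-up stand-in for StatsBombData (only the columns needed here)
StatsBombData <- data.frame(
  player.name = c("Player A", "Player A", "Player B", "Player C"),
  type.name   = c("Shot", "Pass", "Shot", "Shot")
)

# Made-up stand-in for the summary table of top players by xG
summary <- data.frame(player.name = c("Player A", "Player B"))

# Keep shots only, and only those taken by players listed in summary
top9_shots <- StatsBombData %>%
  filter(type.name == "Shot",
         player.name %in% summary$player.name)
```

The %in% check keeps a row only when its player.name appears in the summary table, so the result should contain shots for the top players and nothing else.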





As always, give the data frame a check and have a look at the player.name and type.name columns - hopefully nothing unexpected in there!

With the above completed, we can move on to the plotting. Again, using the ggsoccer package we can build a pitch from the ground up, which allows us to plot shots on top.

ggplot()+
  annotate_pitch(dimensions = pitch_statsbomb)+
  theme_pitch()+
  coord_flip(xlim = c(55, 120),
             ylim = c(-12, 105))

Equates to:




Good start...now to add all shots with geom_point():
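Something along these lines should do it. This is a sketch rather than the definitive way: the shots are made up, top9_shots is my own name for the filtered data frame, and I am assuming the shot coordinates sit in location.x and location.y (as they should after the StatsBombR cleaning step).

```r
library(ggplot2)
library(ggsoccer)

# Made-up shots standing in for the filtered top 9 data frame
top9_shots <- data.frame(
  player.name = c("Player A", "Player A", "Player B"),
  location.x  = c(108, 114, 96),
  location.y  = c(36, 42, 50)
)

shot_map <- ggplot(top9_shots) +
  annotate_pitch(dimensions = pitch_statsbomb) +
  geom_point(aes(x = location.x, y = location.y)) + # one point per shot
  theme_pitch() +
  coord_flip(xlim = c(55, 120),
             ylim = c(-12, 105))

shot_map
```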




We appear to have a couple of shots from corners....fun...or something has gone wrong! Will check this later.
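One quick way to check is to look at the shot type. This is only a sketch on made-up rows: I am assuming the data has a shot.type.name column and that shots taken directly from a corner are recorded there as "Corner", so confirm both against your own data frame.

```r
library(dplyr)

# Made-up shots; shot.type.name values here are assumptions to verify
top9_shots <- data.frame(
  player.name    = c("Player A", "Player B", "Player B"),
  shot.type.name = c("Open Play", "Corner", "Penalty"),
  location.x     = c(110, 120, 108),
  location.y     = c(40, 0.1, 40)
)

# How many shots of each type are there?
shot_types <- top9_shots %>%
  count(shot.type.name)

# Pull out the shots recorded as coming straight from a corner
corner_shots <- top9_shots %>%
  filter(shot.type.name == "Corner")
```

If the odd-looking points all turn up in corner_shots, they are genuine attempts direct from the corner rather than a plotting mistake.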

We can adapt the size of the points by mapping the size aesthetic within the aes() of geom_point(). Setting size = shot.statsbomb_xg will result in:


This tells us what we know about xG...the closer you are to goal and the more central you are = higher xG. Unsurprisingly, the shot attempts from the corners don't score too highly on xG! Now to colour the points to highlight those shots that were goals:

colour = shot.outcome.name == "Goal"



Using scale_colour_manual we can alter the TRUE/FALSE legend:

scale_colour_manual(values = c("#ff4444", "#5e9a78"), labels = c("No-Goal", "Goal"), name = "Shot Outcome")

I have just picked random red and green colours to highlight no-goal or goal. You can obviously get playful altering this as you wish.
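Putting the size and colour pieces together, the point layer might look like the sketch below. The shots are made up and top9_shots is my own name for the filtered data frame. One thing to watch: with unnamed values and labels, the first entry goes to FALSE (no goal) and the second to TRUE (goal), so the data needs to contain both outcomes for the labels to line up.

```r
library(ggplot2)
library(ggsoccer)

# Made-up shots: one goal and two non-goals
top9_shots <- data.frame(
  location.x        = c(108, 114, 96),
  location.y        = c(36, 42, 50),
  shot.statsbomb_xg = c(0.35, 0.60, 0.04),
  shot.outcome.name = c("Saved", "Goal", "Off T")
)

shot_map <- ggplot(top9_shots) +
  annotate_pitch(dimensions = pitch_statsbomb) +
  geom_point(aes(x = location.x, y = location.y,
                 size = shot.statsbomb_xg,                 # bigger point = higher xG
                 colour = shot.outcome.name == "Goal")) +  # TRUE for goals
  scale_colour_manual(values = c("#ff4444", "#5e9a78"),    # FALSE first, then TRUE
                      labels = c("No-Goal", "Goal"),
                      name = "Shot Outcome") +
  theme_pitch() +
  coord_flip(xlim = c(55, 120),
             ylim = c(-12, 105))

shot_map
```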

We can glean a decent amount of information from the above (where goals are scored from, the xG values of those goals), but it would be handy to compare across the 9 players we filtered. Much like the passing, and an advantage of ggsoccer being built on top of ggplot2, we can use facet_wrap() to compare players:
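As a rough sketch on made-up shots (top9_shots is again my own name for the filtered data frame), the only new line is the facet_wrap() at the end, which draws one pitch per player:

```r
library(ggplot2)
library(ggsoccer)

# Made-up shots for two players
top9_shots <- data.frame(
  player.name       = c("Player A", "Player A", "Player B", "Player B"),
  location.x        = c(108, 114, 96, 110),
  location.y        = c(36, 42, 50, 38),
  shot.statsbomb_xg = c(0.35, 0.60, 0.04, 0.20)
)

shot_map <- ggplot(top9_shots) +
  annotate_pitch(dimensions = pitch_statsbomb) +
  geom_point(aes(x = location.x, y = location.y,
                 size = shot.statsbomb_xg)) +
  theme_pitch() +
  coord_flip(xlim = c(55, 120),
             ylim = c(-12, 105)) +
  facet_wrap(~player.name) # one panel per player

shot_map
```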




This now gives us loads of insight. If you are a scout/coach you can establish some decent information...just from a quick scan:
- Both Kelly and Williams scored from corners
- Miedema and England are the stand out for attempts in the six yard box
- Kelly favours the inside right on the edge of the 18 yard box - pretty unsuccessfully

Etc etc!

Next, to add some labels to give further information. You can use geom_text() or geom_label(), and the 'summary' table we compiled earlier can supply the data for them. Using geom_label():

geom_label(data=summary, size = 3, colour = "black", aes(x = 65, y=65, label = paste0("Goals: ", Goals)))

You can add any of the statistics that were calculated earlier and position wherever you like - mess about with it and see what works.
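For example, several of the numbers can go into a single label by pasting them together with line breaks. A sketch on made-up data: the summary table here only mimics the columns built earlier, and the label position is a guess to tweak.

```r
library(ggplot2)
library(ggsoccer)

# Made-up shots and a made-up summary table with the columns built earlier
top9_shots <- data.frame(
  player.name = c("Player A", "Player A", "Player B"),
  location.x  = c(108, 114, 96),
  location.y  = c(36, 42, 50)
)
summary <- data.frame(
  player.name = c("Player A", "Player B"),
  Goals       = c(1, 0),
  total_xg    = c(0.95, 0.04),
  xg_per_shot = c("0.48", "0.04") # text, as produced by sprintf()
)

shot_map <- ggplot(top9_shots) +
  annotate_pitch(dimensions = pitch_statsbomb) +
  geom_point(aes(x = location.x, y = location.y)) +
  geom_label(data = summary, size = 3, colour = "black",
             aes(x = 65, y = 65,
                 label = paste0("Goals: ", Goals,
                                "\nxG: ", round(total_xg, 2),
                                "\nxG/shot: ", xg_per_shot))) +
  theme_pitch() +
  coord_flip(xlim = c(55, 120),
             ylim = c(-12, 105)) +
  facet_wrap(~player.name)

shot_map
```

Because summary has a player.name column, each label lands in the matching player's panel.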

Finally, I will add labs() to rename the xG legend and add a title/subtitle, then you can go mad on the theme() settings to change colours, fonts etc!
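A small sketch of what that could look like. The title and subtitle wording and the theme() choices are placeholders of mine, not the ones used in the final plot, and the shots are made up.

```r
library(ggplot2)
library(ggsoccer)

# Made-up shots
top9_shots <- data.frame(
  location.x        = c(108, 114, 96),
  location.y        = c(36, 42, 50),
  shot.statsbomb_xg = c(0.35, 0.60, 0.04)
)

shot_map <- ggplot(top9_shots) +
  annotate_pitch(dimensions = pitch_statsbomb) +
  geom_point(aes(x = location.x, y = location.y,
                 size = shot.statsbomb_xg)) +
  theme_pitch() +
  coord_flip(xlim = c(55, 120),
             ylim = c(-12, 105)) +
  labs(title = "Shot map",               # placeholder title
       subtitle = "Top players by xG",   # placeholder subtitle
       size = "xG") +                    # renames the size legend
  theme(plot.title = element_text(face = "bold", hjust = 0.5),
        plot.subtitle = element_text(hjust = 0.5),
        legend.position = "bottom")

shot_map
```

The theme() call comes after theme_pitch() so that its settings are not overwritten.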

The final plot should look like:





Hopefully this was a logical and sensible walkthrough of how to produce an xG shot map. As always, feel free to message me if you have any questions or queries! (Equally if I can do better - I'm still learning myself!)

To take this further you can remove penalties and create a NPxG shot map also!
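A rough sketch of the NPxG tally, on made-up rows. I am assuming penalties are flagged as "Penalty" in a shot.type.name column, so check that against your own data; player_npxg and total_npxg are names I have made up.

```r
library(dplyr)

# Made-up stand-in for StatsBombData
StatsBombData <- data.frame(
  player.name       = c("Player A", "Player A", "Player B", "Player B"),
  type.name         = c("Shot", "Shot", "Shot", "Shot"),
  shot.type.name    = c("Open Play", "Penalty", "Open Play", "Free Kick"),
  shot.statsbomb_xg = c(0.30, 0.76, 0.10, 0.05)
)

player_npxg <- StatsBombData %>%
  filter(type.name == "Shot",
         shot.type.name != "Penalty") %>%  # drop penalties
  group_by(player.name) %>%
  summarise(total_npxg = sum(shot.statsbomb_xg, na.rm = TRUE)) %>%
  arrange(desc(total_npxg))                # highest NPxG first
```

The same penalty filter can be applied to the shots data frame before plotting to get the NPxG shot map.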

The final code:

Comments

  1. Hello there, I tried using this StatsBomb package, and I'm using it for a project regarding the debate between Messi and Ronaldo, but for some reason, there is so little data for Ronaldo. I've made various attempts to solve the issue, but it just seems that the data simply does not include a lot of Real Madrid data (basically exclusively showing Barcelona data) and thus Ronaldo has basically no data. Do you happen to know why this is happening?
