Getting started in R with StatsBomb Data

As always, I should caveat that I'm not an expert in either football or programming...I started learning R in December and have gradually reached a 'mildly competent' level.

This will go through installing R, loading the StatsBomb data, then plotting a pass map - something like this:



Anyway, away we go.

Thing number 1 - install R. There are two things to install...base R and RStudio.

You can download RStudio here:
https://rstudio.com/products/rstudio/download/

The first 3 minutes of the below shows the process:
https://www.youtube.com/watch?v=BuaTLZyg0xs&list=PL6cDc8Xxld162nSsZ14bQnFn1cYStsrtk&index=2&t=0s

Hopefully R is now installed. Open RStudio and you should be greeted with something like this:



Press the arrow areas to reveal:



Under the 'Packages' tab, select 'Install', search for 'devtools' and install the package. Repeat the previous step, this time searching for 'tidyverse'.
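
If you prefer typing to clicking, the same two installs can be run from the Console. This is the standard install.packages() call and should do the same job as the Packages tab:

```r
# Console equivalent of Packages > Install for the two CRAN packages
install.packages(c("devtools", "tidyverse"))
```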

Next steps are to install the StatsBombR package and the FCrSTATS 'SBpitch' package from GitHub:

devtools::install_github("statsbomb/StatsBombR")
devtools::install_github("FCrSTATS/SBpitch")




If you check the packages, there should now be 'SBpitch' and 'StatsBombR'.

Now the fun stuff starts...

Load in the packages 'StatsBombR', 'tidyverse' and 'SBpitch'.
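
For reference, loading the three packages would look something like this (a sketch, assuming all three installed without errors):

```r
# Attach the packages so their functions can be called by name
library(StatsBombR)  # FreeCompetitions(), FreeMatches(), StatsBombFreeEvents(), allclean()
library(tidyverse)   # %>%, filter() and the ggplot2 plotting functions
library(SBpitch)     # create_Pitch()
```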


Once this is typed, hit 'Run' to load the packages. You do this after each section of code.

Now to start loading in the data...

Input:

Comp<-FreeCompetitions()

Again, after the 'Comp' code hit run and it should show in the window on the right.

This loads all the available StatsBomb competitions. By clicking 'Comp' you can view the competitions including the Messi data (competition_id = 11) and the FAWSL data (competition_id = 37, season_id = 42). We will use the FAWSL 2019/2020 data.


We now want to filter that competition data specifying the 'competition_id' and 'season_name'...

Comp<-FreeCompetitions()%>%filter(competition_id==37, season_name=="2019/2020")

Once again, if you click 'Comp' on the right it should display just the FAWSL 2019/20 season.

We now want to load all FAWSL matches -

Matches<-FreeMatches(Comp)

Click 'Matches' on the right and have a look around the match data:


Next, to load the free event data associated with the Matches:

StatsBombData<-StatsBombFreeEvents(MatchesDF = Matches, Parallel = TRUE)

Click StatsBombData on the right and have a look. Lots of scary numbers but StatsBomb have created an 'allclean' function that makes it less scary:

StatsBombData<-allclean(StatsBombData)

At this point, your screen should look like the following:


Now to filter the above for a single match:

d1<-StatsBombData%>%
  filter(match_id == 2275096, type.name == "Pass", team.name == "Arsenal WFC")

This essentially reads as:

 'take StatsBombData, filter (%>%filter) to the match (match_id == 2275096), keep the passes (type.name == "Pass") made by Arsenal (team.name == "Arsenal WFC") and assign (<-) the result to d1'.
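
If the pipe and filter are new to you, it can help to try them on something tiny first. Below is a minimal sketch on a made-up four-row table (toy_events and its rows are invented for illustration; only the column names match the real data):

```r
library(dplyr)

# Toy stand-in for StatsBombData: same column names, made-up rows
toy_events <- data.frame(
  match_id  = c(1, 1, 1, 2),
  type.name = c("Pass", "Shot", "Pass", "Pass"),
  team.name = c("Arsenal WFC", "Arsenal WFC", "West Ham United LFC", "Arsenal WFC")
)

# A row is kept only if it meets every condition inside filter()
toy_d1 <- toy_events %>%
  filter(match_id == 1, type.name == "Pass", team.name == "Arsenal WFC")

nrow(toy_d1)  # 1: only the first row meets all three conditions
```

The commas inside filter() behave like 'and', which is why the shot, the West Ham pass and the pass from match 2 all drop out.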

Click d1 on the right and you should have all the Arsenal WFC pass events from Arsenal WFC vs West Ham United LFC.

Yay.

Now to plot!

Let's get a pitch loaded. With the FCrSTATS 'SBpitch' package this is easy. Add:

create_Pitch()


Great - now to add the passes. The best way to think of this is simply as an elaborate scatter plot where you add layers. We have the base layer with the pitch, now to add where passes occurred:

geom_point(data = d1, aes(x = location.x, y = location.y))

This reads as:

 "create a point (geom_point) using the filtered match data (data=d1) and plot the x and y coordinate (aes(x = location.x, y = location.y))"
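
One thing worth spelling out: a geom on its own is only a layer, so it has to be joined to the base plot with a +. A minimal sketch of that layering, using a few made-up coordinates (toy_passes is hypothetical) and a plain ggplot() standing in for create_Pitch() so it runs with only ggplot2 installed:

```r
library(ggplot2)

# Made-up pass start points; a StatsBomb pitch runs 0-120 in x and 0-80 in y
toy_passes <- data.frame(
  location.x = c(30, 55, 70),
  location.y = c(40, 20, 60)
)

# Base layer first, then + to stack the points on top of it
p <- ggplot() +
  geom_point(data = toy_passes, aes(x = location.x, y = location.y))
p
```

For the real plot, swap ggplot() for create_Pitch() and toy_passes for d1.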

You should hopefully now have:



Woi oi. You have now plotted the start points of every Arsenal WFC pass vs West Ham United LFC. Go grab a biscuit and treat yourself.

Refreshed and fuelled, let's build on this plot. Let's add some lines so we can establish where the pass started and ended:

geom_segment(data = d1, aes(x = location.x, y = location.y, xend = pass.end_location.x, yend = pass.end_location.y))

Again, this calls for a line (geom_segment) to be drawn using the match data (d1) from the x/y start point to the x/y end point.


Great! Final few steps...let's add some arrows so the pass direction is clearer. This goes inside geom_segment, after the aes(...) bracket:

arrow = arrow(length = unit(0.08,"inches"))
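
To make the placement concrete, here is a small sketch with made-up coordinates (toy_passes is hypothetical, and a plain ggplot() again stands in for create_Pitch() so it runs with only ggplot2 installed). The arrow argument sits outside aes() because it is a fixed setting rather than a column in the data:

```r
library(ggplot2)

# Made-up pass start and end points standing in for d1
toy_passes <- data.frame(
  location.x = c(30, 55, 70),
  location.y = c(40, 20, 60),
  pass.end_location.x = c(55, 70, 100),
  pass.end_location.y = c(20, 60, 45)
)

p <- ggplot() +
  geom_segment(data = toy_passes,
               # columns from the data go inside aes()
               aes(x = location.x, y = location.y,
                   xend = pass.end_location.x, yend = pass.end_location.y),
               # fixed settings such as the arrow head go outside aes()
               arrow = arrow(length = unit(0.08, "inches")))
p
```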


Looks decent, however the arrows are pretty dominant and cover the pitch. You can use the 'alpha' argument to adjust both the point and line transparency:

alpha = 0.5

You need to add this to both the geom_point and geom_segment sections - have a play with differing transparency from 0.1 - 0.9 and see what you like!

Final two steps! You can change the colour of the passes by adding the colour argument:

colour = "red"

You can replace the 'red' with hex codes to customise further. Your full code should now look like this:
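
Assembled from the snippets above rather than copied from the original code, it would be along these lines. This sketch assumes d1 already exists from the filter step; putting alpha and colour on both layers is also an assumption:

```r
library(tidyverse)
library(SBpitch)

# d1 is the filtered Arsenal WFC pass data created earlier in the post
create_Pitch() +
  # start point of each pass
  geom_point(data = d1, aes(x = location.x, y = location.y),
             alpha = 0.5, colour = "red") +
  # line from the start to the end of each pass, with an arrow head
  geom_segment(data = d1,
               aes(x = location.x, y = location.y,
                   xend = pass.end_location.x, yend = pass.end_location.y),
               alpha = 0.5, colour = "red",
               arrow = arrow(length = unit(0.08, "inches")))
```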




You should now have a lively plot, looking like this:



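The y axis is flipped when plotting with the create_Pitch function: StatsBomb's y coordinates count down from the top of the pitch, whereas the plot counts up from the bottom. Therefore, if you plot the passes of a left back they will show up on the right.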

To correct this you need to add:

scale_y_reverse()

You can add this after the geom_segment layer, joined with a + like the other layers.

Now, to finally add a title.

labs(title = "Arsenal WFC",
       subtitle = "vs West Ham United LFC")

I have removed the colour = "red" argument but your final plot should now look like this:


Bang! Final little tinker...you can filter the above further by specifying which player you wish to see the passes of. Click 'd1' and scroll across to the 'player.name' column. From here, you can pick a player; I have chosen Leah Williamson. You can add this to your 'd1' filter:

d1<-StatsBombData%>%
  filter(match_id == 2275096, type.name == "Pass", team.name == "Arsenal WFC", player.name == "Leah Williamson")

Run the plot once more and you should have:


Your final code should now be:
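
Pulling every step in this post together gives something like the script below. It is assembled from the snippets above, so treat it as a sketch rather than the exact original; it needs the three packages installed and an internet connection to download the data:

```r
library(StatsBombR)
library(tidyverse)
library(SBpitch)

# FAWSL 2019/2020 competition, its matches and the cleaned event data
Comp <- FreeCompetitions() %>%
  filter(competition_id == 37, season_name == "2019/2020")
Matches <- FreeMatches(Comp)
StatsBombData <- StatsBombFreeEvents(MatchesDF = Matches, Parallel = TRUE)
StatsBombData <- allclean(StatsBombData)

# Leah Williamson's passes for Arsenal WFC vs West Ham United LFC
d1 <- StatsBombData %>%
  filter(match_id == 2275096, type.name == "Pass",
         team.name == "Arsenal WFC", player.name == "Leah Williamson")

# Pitch, pass start points, pass lines with arrows, flipped y axis, title
create_Pitch() +
  geom_point(data = d1, aes(x = location.x, y = location.y), alpha = 0.5) +
  geom_segment(data = d1,
               aes(x = location.x, y = location.y,
                   xend = pass.end_location.x, yend = pass.end_location.y),
               alpha = 0.5,
               arrow = arrow(length = unit(0.08, "inches"))) +
  scale_y_reverse() +
  labs(title = "Arsenal WFC",
       subtitle = "vs West Ham United LFC")
```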




Hopefully this is helpful and has got you started on your R football coding journey! I'm a few months in, but doing small amounts each day, twinned with breaking things and trying to fix them, seems to be how I progress fastest.

StatsBomb created a primer which is a must look to get a great overview: http://statsbomb.com/wp-content/uploads/2019/07/Using-StatsBomb-Data-In-R-English.pdf

If you need any help, let me know...I will do my best!






Comments

  1. Hi Mark!

    I follow you on Twitter and I am fairly new at R too. I like your heatmap you posted on March 31st. Do you mind sharing your code on that one?

    Best,

    Simon from Denmark.

