Posts

Shot data for the top 5 European leagues - 2019/2020

Not to sound too nostalgic, but when I first started messing about with football event data, I was manually plotting Oxford shot locations and visualising them in Tableau. Access to data was limited back then, but we now have more data available than ever before. A few resources:
- StatsBomb (covered in previous posts)
- Canadian Premier League (yet to dive in, but a super interesting league with regularly updated data)
- Wyscout (top 5 European leagues, 2017-2018)
- UnderstatR (will use this in this post)

This tweet from Sushruta Nandy prompted me to write this. Anyway, the plan:
- Load up Tidyverse and UnderstatR
- Run 2019 team/player data
- Extract 2019 x/y shot locations
- Save to .csv

Here we go then....

1) Load in Tidyverse and UnderstatR
2) Set the working directory (you will need this later when saving the data to .csv; just select the path where you want the file to be saved)

At this point, you should have something pretty straightforward: Right, let's dig

Recent posts

Application of PCA in data driven recruitment

When I first started to learn R, after 4/5 weeks I decided to answer a recruitment-based question concerning Oxford United and the right back position. This led me to create a piece utilising Principal Component Analysis (PCA) at a very basic level, to see if there is a quick and efficient way to categorise and analyse player styles. Can this then form the basis of an indicator highlighting players with similar playing styles and, as such, play a role in replacing players or finding players to fit a specific system? My original piece is here. It's always weird to read stuff back, but I will try to build on this! There is a quick and brief explanation of PCA there, along with a few other links to PCA within football. Since I produced the above, Mark Carey has done some great work applying PCA to midfielders in the top 5 leagues. This is an area that has always intrigued me, and after some limited work in professional football I'm certain PCA can play a large role in gui

Using Wyscout in R

It's pretty clear that within a football setting, clubs are largely using the same data. Most clubs will be using Wyscout/Instat; others may have access to StatsBomb and Metrica. Nonetheless, data quality discussion aside, Wyscout is used predominantly to quickly gain an overview of players (both from a video and a data perspective). This dovetails with people up-skilling through the lockdown, taking various courses and becoming increasingly proficient in languages such as R and Python. This is a big asset within football! Those who have read previous posts know that I am self-teaching R and sharing anything I learn that may be of interest to others in football analytics. By no means am I an authority on this; I've just found something that works and that might help others...I'm always happy to be corrected! Anyway, the aim is to:
- Download Wyscout data
- Import into R
- Clean the headers
- Re-format the data from "wide" to "long" format
- Some e

Shot Maps In R using StatsBomb Data

I'm not sure if anyone is following these, but I will do one more and see what happens! I have covered some passing-based stuff, so I thought it might be useful to look into shots. Therefore, the rough plan for this piece:

1) Total player xG in the WSL for this season
2) Find the top 9 players based on xG
3) Plot all shots taken, including xG
4) Add labels
5) Plot the shot maps of the 9 players against one another

As always, my coding is in the learning stage, so this isn't a definitive way...just something that works for me and might help others! Anyway, load in this season's WSL data as we have previously. We want to extract 3 things from the data: the number of shots, the number of goals and total xG (initially including penalties). To start, tallying player shots:

player_shots <- StatsBombData %>%
  filter(type.name == "Shot") %>% ## filter all shots in StatsBombData
  group_by(player.name) %>% ## group by player
  tally(name = "total_shots"

Using StatsBomb data - part 3

I have more time on my hands, so I thought it would be good fun to show some steps from creating scatters to plotting event data for the players who show up favourably in initial searches. The question that I posed myself was: "How can we go beyond scatters to learn more about a player's output?" There has been an increase in scatter plots to show players that excel; however, they don't tell the whole story. As always, I'm still learning to code myself, so I'm sure there is code in here that will upset experienced coders! Sorry! Anyway, a rough idea of how this will look:
- download data from fbref.com
- plot a scatter using the data
- filter the top 9 performers for one parameter
- plot the event data
- compare the player outputs

Let's goooooooooooo

Let's fire up:

library(StatsBombR)
library(tidyverse)
library(ggsoccer)
library(ggrepel)

We could create the P90 data for the WSL ourselves by using the StatsBomb data, however I'm being l