
Searching for a Right Back - Part 2

Since first writing about Oxford's search for a Right Back in January, I've had a feeling that whilst the steps are logical, they could probably be better. Ram Srinivas outlined a framework on the Purefitbaw podcast that led to me creating the first piece, but also to thinking about how it could be improved.

I started researching further and found this piece - it relates to the NBA, but why not adapt it and see if it can be applied to football too? On first reading I didn't have a clue what was going on, so I attempted to break down each element and see if it had a logical football implication. This led me to Will Gurpinar-Morgan's 2+2=11 blog; he initially presented a process at Opta in 2015 and presented on it further this year. (You should follow his work and watch his presentations!)


This could provide a blueprint within recruitment when sourcing players of a specific skillset to fulfil a specific role within the squad.

In Oxford's example, Chris Cadden was a creative Right Back who was an important part of chance creation in the squad. His alternative, Sam Long, whilst being a decent squad player, had differing attributes and as such did (does) not represent a like-for-like replacement.

I should add: I have been self-teaching R for the last few months, and this seemed like a fun project to get involved with whilst learning some more programming. As with all my stuff, it could be good, it could be awful, but I learnt stuff along the way and it was a fun process.

Anyway - the interesting stuff.

Principal Component Analysis (PCA)

I won't go into the mechanics as most probably don't have a huge interest, but in short, it finds patterns within a data set and lets us compress a large number of dimensions (here, player metrics) onto a couple of axes, so everything can be shown on a single plot. Traditionally I have plotted multiple scatter plots, one pair of metrics at a time.



Taking the 80 players that have played >450 minutes in League One so far this season at LB/RB, we get the basic plot. Firstly, the red arrows (one per metric) appear grouped: Aerial Duels, Shots blocked etc. are in one grouping, with the passing metrics grouped together elsewhere. This provides a decent starting point, but to take it to the next level it would be useful to establish the differing styles of players within the data.
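For anyone wanting to try the PCA step, here is a minimal sketch in base R. It is not the exact code behind the plot: the data is randomly generated and the column names are made up for illustration, so swap in your own per-90 metrics.

```r
# Synthetic stand-in for full-back metrics (hypothetical columns and values)
set.seed(42)
n_players <- 80
metrics <- data.frame(
  aerial_duels  = rnorm(n_players, mean = 3,   sd = 1),
  interceptions = rnorm(n_players, mean = 5,   sd = 1.5),
  passes        = rnorm(n_players, mean = 40,  sd = 8),
  crosses       = rnorm(n_players, mean = 3,   sd = 1),
  shot_assists  = rnorm(n_players, mean = 0.8, sd = 0.3)
)

# Centre and scale first so high-volume metrics (e.g. passes) do not dominate
pca <- prcomp(metrics, center = TRUE, scale. = TRUE)

# Share of the total variance captured by each component
var_explained <- pca$sdev^2 / sum(pca$sdev^2)

# Each player's position on the first two components (the plot coordinates)
scores <- pca$x[, 1:2]

# Biplot: players drawn as points, metrics drawn as arrows
if (interactive()) biplot(pca)
```

Metrics whose arrows point the same way tend to rise and fall together, which is what the grouping of arrows is showing.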

Cluster Analysis
Again, I won't go over the boring stuff, but essentially, from assessing the data, there were 3 primary clusters.




Cluster 1 contains 31 players
Cluster 2 contains 28 players
Cluster 3 contains 21 players

On the above, Cadden is #8 with Sam Long being #59. First observation: they both fall into cluster 3 (yet to know what that is!). This is where team style plays a role and should be a discriminatory factor from the start (highlighted in the first blog!). We are now starting to get a little closer to establishing some player similarity, with an understanding of their playing style.
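A rough sketch of how the clustering step could look in base R. The algorithm isn't named above, so k-means is an assumption here, as is the "elbow" check for picking the number of clusters. The scores are randomly generated stand-ins for the PCA output.

```r
set.seed(1)

# Hypothetical PCA scores for 80 players: three loose groups
scores <- rbind(
  matrix(rnorm(60, mean = -3), ncol = 2),
  matrix(rnorm(60, mean =  0), ncol = 2),
  matrix(rnorm(40, mean =  3), ncol = 2)
)
colnames(scores) <- c("PC1", "PC2")

# Total within-cluster sum of squares for k = 1..6;
# the point where the curve flattens suggests a sensible k
wss <- sapply(1:6, function(k) {
  kmeans(scores, centers = k, nstart = 25)$tot.withinss
})

# Fit the chosen number of clusters (assumed to be 3 here)
km <- kmeans(scores, centers = 3, nstart = 25)

# Number of players in each cluster
cluster_sizes <- table(km$cluster)
```

`km$cluster` holds one label per player, in the same row order as the input, so it can be bound straight back onto the player table.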

The next phase, establishing the dominant features of each cluster.

Cluster 1 - primary features = Shots blocked // Aerial Duels // Interceptions // Long Passes

Cluster 2 - primary features = Dribbles // Shots // Touches in Box // xG // Crosses

Cluster 3 - primary features = Passes // short/medium/forward passes // shot assists (key passes) // crosses // crosses to box // through balls


With the above information, we can now describe:
Cluster 1 = Defensive
Cluster 2 = Attacking/goal threat
Cluster 3 = Creative

Marvellous. Through the above process we now have some clear playing styles, with players attributed to each of them.
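One possible way to pull out the dominant features of each cluster is to z-score every metric and look at the cluster averages. This is only a sketch under that assumption; the numbers and column names below are made up.

```r
# Hypothetical metrics and cluster labels for nine players
metrics <- data.frame(
  aerial_duels = c(6, 5, 7, 2, 3, 2, 3, 2, 3),
  dribbles     = c(1, 2, 1, 6, 5, 7, 2, 3, 2),
  shot_assists = c(0.3, 0.4, 0.2, 0.6, 0.5, 0.7, 1.5, 1.4, 1.6)
)
cluster <- rep(1:3, each = 3)

# z-score each metric so they sit on a comparable scale
z <- as.data.frame(scale(metrics))

# Average z-score of every metric within each cluster
profile <- aggregate(z, by = list(cluster = cluster), FUN = mean)

# The metric with the highest average z-score in each cluster
top_metric <- names(metrics)[apply(profile[, -1], 1, which.max)]
```

Sorting each row of `profile` rather than taking only the maximum gives the fuller list of primary features per cluster.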

The next challenge was to visualise the individual players against one another. This still needs further work; however, I have normalised all values against those within the data set, which allows a quick overview comparison (check MrktInisghts for a more polished version!)
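Normalising against the data set can be done in a few ways. Here is a small sketch of two common options, min-max scaling and percentile rank. Which one sits behind the profiles isn't specified above, and the values are invented.

```r
# Hypothetical per-90 crossing values for five players
crosses <- c(1.2, 3.4, 2.0, 5.1, 0.8)

# Min-max scaling: 0 = lowest in the data set, 1 = highest
min_max <- function(x) {
  rng <- range(x, na.rm = TRUE)
  (x - rng[1]) / (rng[2] - rng[1])
}
crosses_scaled <- min_max(crosses)

# Alternative: percentile rank within the data set
crosses_pct <- rank(crosses) / length(crosses)
```

Min-max keeps the gaps between players, whereas percentile rank only keeps the order, so it is less affected by one extreme value.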

To take a player from each cluster:

Defensive

Harry Brockbank:


Attacking

Ryan Giles:


Creative

Brandon Haunstrup:



From the above, having had a quick scan, the initial clustering seems to be pretty spot on. It obviously needs further investigation, but I will move beyond this and start looking into Cadden replacements using the above process (with League 1, League 2 and Scotland data).


For reference, the profile of Cadden:




Now the fun bit....finding players in the same cluster as Cadden (I have already filtered out players such as Perry Ng who would cost serious £££):

Callum Brittain - MK Dons:



Stephen O'Donnell - Kilmarnock:



Lewie Coyle - Fleetwood:



Brad Halliday - Doncaster:



Tom James - Hibernian:




As always the above comes with the caveat that this is a filtering process that would lead to further traditional scouting processes. Interestingly, in my first piece Coyle was flagged for similarity...that obviously still holds having gone through the above process. Another interesting addition is that of Stephen O'Donnell who Oxford extensively attempted to bring in during the January window - a clear indication that the recruitment team and I are on the same page ;)
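The filtering step above (same cluster as the target player) could be sketched as follows. Ranking the remaining players by Euclidean distance to the target is an extra assumption added for illustration, and every name and value is made up.

```r
# Hypothetical normalised metrics and cluster labels
players <- data.frame(
  name    = c("Target", "Player A", "Player B", "Player C", "Player D"),
  cluster = c(3, 3, 3, 1, 3),
  passes  = c(0.8, 0.7, 0.2, 0.9, 0.9),
  crosses = c(0.9, 0.8, 0.5, 0.1, 0.6),
  stringsAsFactors = FALSE
)
metric_cols <- c("passes", "crosses")

target <- players[players$name == "Target", ]

# Keep only players in the target's cluster, excluding the target himself
pool <- players[players$cluster == target$cluster &
                  players$name != "Target", ]

# Euclidean distance from each remaining player to the target
diffs <- sweep(as.matrix(pool[, metric_cols]), 2, unlist(target[, metric_cols]))
pool$distance <- sqrt(rowSums(diffs^2))

# Closest profiles first
shortlist <- pool[order(pool$distance), c("name", "distance")]
```

Cost, age or contract filters (like removing Perry Ng) would be applied to `pool` before ranking.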

There will possibly be a part 3 to this using event data, going beyond the initial numbers and drilling down into the location of crosses, shots, shot assists etc (will just have to use Statsbomb WSL data instead!)










Comments

  1. Hi Mark,

    Great post - I am starting out my football analytics journey with a keen interest in the EFL. Would you be able to share which data source you use for the EFL and/or in this post specifically?

    Also do you have any recommendations for event data in the EFL?

    Keep up with the posts, they are really interesting!

    Thanks

    Ryan

    1. Hi Ryan,

      I got the above from Wyscout but event data for the EFL is pretty tricky to come by!

      I would suggest manually plotting data to start using: https://torvaney.github.io/projects/tracker

      I would initially start with shots, as it gives you data to play with! Alternatively you can use the free StatsBomb data.

      Thanks,
      Mark
