When I first started to learn R, after four or five weeks I decided to answer a recruitment-based question concerning Oxford United and the right-back position. This led me to create a piece utilising Principal Component Analysis (PCA) at a very basic level, to see whether there was a quick and efficient way to categorise and analyse player styles.
Can this then form the basis of an indicator that highlights players with similar playing styles and, as such, play a role in replacing players or finding players to fit a specific system?
My original piece is here. It's always weird to read stuff back, but I will try to build on it! It includes a quick explanation of PCA, along with links to a few other uses of PCA in football.
Since I produced the above, Mark Carey has done some great work applying PCA to midfielders in the top 5 leagues. This is an area that has always intrigued me, and after some limited work in professional football I'm certain PCA can play a large role in guiding recruitment (certainly as an initial step!).
To check my process, I thought it would be best to replicate some of Mark's work: similar output = I'm doing something broadly right!
Sorry for the basic boy Excel, but I wanted to get this down quickly. Like Mark, I found 5 components for central midfielders who played >500 minutes in 2019/2020. The names of these components could be altered, but from a quick overview there is good crossover with Mark's findings. At this point it's probably handy to point out that I have simply thrown in Wyscout data, whereas Mark used the StatsBomb-powered fbref.
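The post doesn't say how the number of components (5) was chosen. One common heuristic is the Kaiser criterion: keep the components whose correlation-matrix eigenvalue exceeds 1, and check how much of the total variance they explain together. This is an assumption for illustration, not necessarily what was done here. A minimal sketch with made-up eigenvalues:

```python
def explained_variance(eigenvalues):
    """Share of total variance explained by each component, largest first."""
    total = sum(eigenvalues)
    return [ev / total for ev in sorted(eigenvalues, reverse=True)]


def kaiser_count(eigenvalues, threshold=1.0):
    """Number of components whose eigenvalue exceeds `threshold`.

    On standardised data the eigenvalues sum to the number of metrics, so an
    eigenvalue above 1 means the component carries more variance than any
    single original metric does on its own.
    """
    return sum(1 for ev in eigenvalues if ev > threshold)


# Made-up eigenvalues for 10 standardised metrics (they sum to 10).
# Illustrative only: NOT the values behind this post.
eigs = [3.0, 2.0, 1.5, 1.2, 1.05, 0.4, 0.3, 0.25, 0.2, 0.1]
print(kaiser_count(eigs))                           # 5
print(round(sum(explained_variance(eigs)[:5]), 3))  # 0.875
```

Other rules of thumb exist, such as a scree plot or a cumulative-variance target. The final choice is usually a judgement call, and the components also need to be interpretable as football styles.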
As my process has done a pretty reasonable job, I will apply it to the 'second' leagues (the second tiers) of the top 5 European leagues. It could be applied to any league where data is available, and to any position on the pitch: the process that follows would work just as well for wingers or defenders.
For the purpose of this blog the leagues included were:
- England - Championship
- Spain - Segunda
- France - Ligue 2
- Italy - Serie B
- Germany - Buli.2
This amounts to 632 players, all of whom satisfy the following criteria:
- Position is Centre Midfield (as deemed by Wyscout) - there will be some positional anomalies
- Have played >500 minutes in 2019/2020
- Play in the above leagues
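The three criteria above act as a simple filter on the player pool. Below is a minimal sketch of that filter. The `Player` record, its field names, the `"CM"` position code and the league labels are all hypothetical, because a real Wyscout export has its own column names and position labels.

```python
from dataclasses import dataclass


@dataclass
class Player:
    # Hypothetical record layout; a real Wyscout export uses its own columns.
    name: str
    league: str
    position: str
    minutes: int
    age: int


SECOND_TIERS = {"Championship", "Segunda", "Ligue 2", "Serie B", "Buli.2"}


def select_midfielders(players, leagues=SECOND_TIERS, position="CM", min_minutes=500):
    """Keep central midfielders from the chosen leagues with more than `min_minutes`."""
    return [
        p for p in players
        if p.position == position and p.league in leagues and p.minutes > min_minutes
    ]


pool = [
    Player("Player A", "Serie B", "CM", 1450, 22),
    Player("Player B", "Ligue 2", "CM", 480, 27),        # too few minutes
    Player("Player C", "Championship", "RB", 2100, 25),  # not a midfielder
    Player("Player D", "Eredivisie", "CM", 1900, 23),    # league not included
]
print([p.name for p in select_midfielders(pool)])  # ['Player A']
```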
The metrics used (again, sorry for the basic table!):
All performance metrics are per 90 minutes (P90). The first column is made up of filters that will be applied later in the process. I avoided xA/xG etc., as I wanted to capture player style and felt these are more of a by-product (happy to be corrected though!)
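If an export only contains season totals, the per 90 (P90) conversion is just the total divided by minutes played, multiplied by 90. A minimal sketch follows. The metric names and numbers are invented for illustration.

```python
def per90(total, minutes):
    """Convert a season total into a per-90-minutes rate."""
    if minutes <= 0:
        raise ValueError("minutes must be positive")
    return total * 90 / minutes


def to_per90(totals, minutes):
    """Apply per90 to every metric in a {metric: season total} dict."""
    return {metric: per90(value, minutes) for metric, value in totals.items()}


# Hypothetical season totals for one player with 1,800 minutes played.
print(to_per90({"Key passes": 40, "Interceptions": 90}, 1800))
# {'Key passes': 2.0, 'Interceptions': 4.5}
```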
Lovely job, away we go. I have scaled all the data prior to performing the PCA, so that high-volume metrics such as passes don't get weird (outsized) weightings simply because their numbers are bigger... let's check the result:
Ah look, another Excel table. Anyway, from this we can establish which features dominate each style.
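The post doesn't show code for the scaling and PCA step. In R, `prcomp(x, scale. = TRUE)` standardises the columns and runs the PCA in one call. The sketch below is a dependency-free Python illustration, not the author's implementation. It z-scores each metric, builds the correlation matrix, and extracts the top components by power iteration with deflation. For real data a library eigen-solver is the more robust choice. Note that the sign of each component is arbitrary, so you may need to flip one so that "high" means more of that style.

```python
import math
import random


def standardise(rows):
    """Z-score every column (metric) so each has mean 0 and variance 1."""
    n = len(rows)
    cols = list(zip(*rows))
    means = [sum(c) / n for c in cols]
    sds = [math.sqrt(sum((x - m) ** 2 for x in c) / n) for c, m in zip(cols, means)]
    sds = [s if s > 0 else 1.0 for s in sds]  # a constant column would divide by zero
    return [[(x - m) / s for x, m, s in zip(row, means, sds)] for row in rows]


def correlation_matrix(z):
    """Covariance of standardised data, i.e. the correlation matrix of the raw data."""
    n, p = len(z), len(z[0])
    return [[sum(r[i] * r[j] for r in z) / n for j in range(p)] for i in range(p)]


def principal_components(matrix, k, iters=1000, seed=0):
    """Top-k (eigenvalue, loading vector) pairs via power iteration with deflation."""
    rng = random.Random(seed)
    p = len(matrix)
    a = [row[:] for row in matrix]
    components = []
    for _ in range(k):
        v = [rng.random() + 0.1 for _ in range(p)]
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
        for _ in range(iters):
            w = [sum(a[i][j] * v[j] for j in range(p)) for i in range(p)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: the eigenvalue belonging to the converged vector.
        eig = sum(v[i] * a[i][j] * v[j] for i in range(p) for j in range(p))
        components.append((eig, v))
        # Deflate: remove this component so the next pass finds the next-largest.
        a = [[a[i][j] - eig * v[i] * v[j] for j in range(p)] for i in range(p)]
    return components


def component_scores(z, components):
    """Project each standardised player row onto every component."""
    return [[sum(x * w for x, w in zip(row, vec)) for _, vec in components] for row in z]


# Toy data: 5 players x 3 metrics (metric 1 is just metric 0 doubled).
rows = [[1, 2, 5], [2, 4, 3], [3, 6, 4], [4, 8, 1], [5, 10, 2]]
z = standardise(rows)
comps = principal_components(correlation_matrix(z), k=2)
print([round(eig, 3) for eig, _ in comps])  # [2.737, 0.263]
scores = component_scores(z, comps)
```

In the toy data the first two metrics are perfectly correlated, so they load equally on the first component. Without scaling, metric 1 would simply dominate because its raw numbers are twice as large.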
Primary metrics within each style:
Creator:
- Passes to penalty area
- Deep completions
- Key passes
- Through passes
- Crosses
Engine:
- Passes
- Short/medium passes
- Lateral passes
- Forward passes
- Progressive passes
Carrier:
- Offensive duels
- Dribbles
- Progressive runs
Playmaker (* = weaker contribution):
- Defensive duels
- Interceptions
- *Long passes
- *Forward passes
- *Progressive passes
Defensive:
- Aerial duels
- Shots blocked
- Interceptions
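Lists like these come from reading each component's loadings and keeping the metrics with the largest absolute values. The post doesn't state a cut-off. The thresholds below (0.3 for primary, 0.2 for the asterisked weaker metrics) are assumptions, and the loadings are invented for illustration.

```python
def tiered_metrics(loadings, strong=0.3, weak=0.2):
    """List metrics by |loading|: strong ones first, weaker ones marked with '*'."""
    ranked = sorted(loadings.items(), key=lambda mw: abs(mw[1]), reverse=True)
    primary = [m for m, w in ranked if abs(w) >= strong]
    secondary = ["*" + m for m, w in ranked if weak <= abs(w) < strong]
    return primary + secondary


# Invented loadings for one component, for illustration only.
playmaker = {
    "Defensive duels": 0.45,
    "Interceptions": 0.33,
    "Long passes": 0.24,
    "Forward passes": 0.22,
    "Progressive passes": 0.21,
    "Dribbles": -0.05,
}
print(tiered_metrics(playmaker))
# ['Defensive duels', 'Interceptions', '*Long passes', '*Forward passes', '*Progressive passes']
```

Absolute values are used because a strongly negative loading is just as informative as a strongly positive one.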
The 4th style, playmaker, is a weird mix that probably needs further investigation: only defensive duels have a strong influence, with expansive passing (partly) correlating after that. I'll leave it as is and see what happens!
Now that we have the 5 styles and know which metrics contribute to each, we can investigate which style each player falls into. Assessing all players, here are the top 20 in each category:
A few notes on the above. I have ranked each player against all 600+ players in the data set, so Hernandez is the top creator and the top midfield engine (passer) across all 5 leagues analysed. If a player crops up in two styles, they are probably worth looking at!
The above essentially creates a 20-man shortlist to look into when searching for a specific style. Obviously this is only an initial filter, but it gives a good indication of a player's style and strengths.
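Ranking by component score and spotting players who appear in more than one top 20 can be sketched as follows. This assumes each style's scores are oriented so that higher means more of that style. The player names and scores are invented.

```python
from collections import Counter


def top_n(scores, n=20):
    """Names of the n players with the highest score on one component."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:n]]


def multi_style_players(style_scores, n=20, min_styles=2):
    """Players who make the top n in at least `min_styles` styles."""
    counts = Counter(name for scores in style_scores.values() for name in top_n(scores, n))
    return sorted(name for name, c in counts.items() if c >= min_styles)


# Invented component scores for three players.
style_scores = {
    "Creator": {"A": 2.1, "B": 1.5, "C": -0.3},
    "Engine": {"A": 1.8, "B": -0.9, "C": 0.7},
    "Carrier": {"A": -1.0, "B": 0.2, "C": 1.2},
}
print(top_n(style_scores["Creator"], n=2))    # ['A', 'B']
print(multi_style_players(style_scores, n=1))  # ['A']
```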
To take this further, a data-driven club will probably be looking for players with resale value. As such, the above lists can be filtered by age, market value and contract expiry. I will simply filter by age and minutes... let's go 24 or under with over 900 minutes played (this reduces the pool to 195 players):
We are starting to pick up some decent young talent here including:
- D'Arpino (crops up in 3 of the styles!)
- Frattesi, linked with Everton
- Gueye, who appears to be moving to Watford
- Fein, who looks to be returning to the Bayern first team this summer
- Julien Ponceau, linked with Swansea and Sevilla
- Samuele Ricci, currently linked with Napoli
The PCA appears to have picked out some players of pedigree...always a good sign!
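The age and minutes filter, followed by re-ranking within the smaller pool, can be sketched like this. The dict keys (`name`, `age`, `minutes`) and all values are hypothetical.

```python
def young_shortlist(players, scores, max_age=24, min_minutes=900, n=20):
    """Re-rank one style's scores among players meeting the age and minutes filters.

    players: list of dicts with hypothetical 'name', 'age' and 'minutes' keys.
    scores:  {name: component score} for one style.
    """
    eligible = {
        p["name"] for p in players
        if p["age"] <= max_age and p["minutes"] > min_minutes
    }
    ranked = sorted(
        (name for name in scores if name in eligible),
        key=lambda name: scores[name],
        reverse=True,
    )
    return ranked[:n]


players = [
    {"name": "A", "age": 29, "minutes": 2500},  # too old
    {"name": "B", "age": 21, "minutes": 1400},
    {"name": "C", "age": 23, "minutes": 700},   # too few minutes
    {"name": "D", "age": 24, "minutes": 2000},
]
creator_scores = {"A": 2.4, "B": 1.1, "C": 3.0, "D": 1.6}
print(young_shortlist(players, creator_scores))  # ['D', 'B']
```

Market value or contract expiry could be added as further conditions in the same set comprehension.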
The playmaker dimension is a strange one, with Krystian Bielik cropping up. Looking back at the contributing metrics, defensive duels and interceptions have a reasonable influence, along with progressive passing metrics (to a lesser extent!). Whilst PCA does a good job of assigning players to a playing style, some will be misplaced.
Finally, we can validate some of the findings by looking at a player's individual metrics, creating a (very basic!) dashboard.
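A dashboard like this usually shows each metric as a percentile within the data set. The sketch below counts tied values as half below, which is one convention among several. The data layout and values are invented.

```python
def percentile_rank(values, x):
    """Percentile (0-100) of x within values, counting ties as half below."""
    below = sum(1 for v in values if v < x)
    ties = sum(1 for v in values if v == x)
    return 100 * (below + 0.5 * ties) / len(values)


def player_profile(dataset, player, metrics):
    """Percentile of one player on each metric, relative to everyone in dataset.

    dataset: {player: {metric: P90 value}} (hypothetical layout)
    """
    return {
        m: percentile_rank([row[m] for row in dataset.values()], dataset[player][m])
        for m in metrics
    }


# Invented P90 values for four midfielders.
dataset = {
    "P1": {"Aerial duels": 6.0, "Interceptions": 7.5, "Shots blocked": 0.9},
    "P2": {"Aerial duels": 2.0, "Interceptions": 5.0, "Shots blocked": 0.3},
    "P3": {"Aerial duels": 3.5, "Interceptions": 6.0, "Shots blocked": 0.5},
    "P4": {"Aerial duels": 1.0, "Interceptions": 4.0, "Shots blocked": 0.2},
}
print(player_profile(dataset, "P1", ["Aerial duels", "Interceptions", "Shots blocked"]))
# {'Aerial duels': 87.5, 'Interceptions': 87.5, 'Shots blocked': 87.5}
```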
For example, Tommaso Pobega ranks second in the Defensive midfielder style overall, and first amongst players aged 24 and under with over 900 minutes. Looking back, we would expect Pobega to rank highly in aerial duels, interceptions and shots blocked (as these are strongly correlated with the playing style)...
Nice. Turns out Pobega is on loan from AC Milan...here he is with some lo-fi backing track - https://www.youtube.com/watch?v=butbGPKJAa0
To double-check this, we can look at other styles, such as creator. We would expect to see high rankings for:
- Passes to penalty area
- Deep completions
- Key passes
- Through passes
- Crosses
An example: Lilian Egloff (ranks 4th overall and 2nd for the U24s):
Egloff ranks in the top 25% for every influential creative metric... pretty encouraging for a 17-year-old! Looking into this further, Wyscout has combined Egloff's Stuttgart U19 minutes with his 25 minutes in Buli. 2, so the majority of this is based on youth football. Oh. Nonetheless, probably one to keep an eye on!
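The "top 25% in every creative metric" check can be written as a simple test on each metric. Here "top 25%" is taken to mean that at least 75% of the pool sits strictly below the player, which is one possible reading. The data layout and values are invented.

```python
CREATOR_METRICS = [
    "Passes to penalty area", "Deep completions", "Key passes",
    "Through passes", "Crosses",
]


def top_quartile_everywhere(dataset, player, metrics=CREATOR_METRICS):
    """True if at least 75% of players sit strictly below `player` on every metric.

    dataset: {player: {metric: P90 value}} (hypothetical layout).
    """
    for m in metrics:
        values = [row[m] for row in dataset.values()]
        share_below = sum(1 for v in values if v < dataset[player][m]) / len(values)
        if share_below < 0.75:
            return False
    return True


# Invented data: eight players whose P90 value on every creator metric is 0..7.
dataset = {f"P{i}": {m: float(i) for m in CREATOR_METRICS} for i in range(8)}
print(top_quartile_everywhere(dataset, "P6"))  # True  (6 of 8 below)
print(top_quartile_everywhere(dataset, "P5"))  # False (5 of 8 below)
```

As the Egloff case shows, a check like this cannot tell you which competition the minutes came from. That still needs a look at the underlying data.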
We could take this several steps further, but I will leave it there!
As always, this makes up just a small part of recruitment, but at the very least it can inform decisions about the style of player you are scouting. It can be used to validate video and live scouting, or to flag players who weren't otherwise on the radar. By applying a variety of filters, you can find players who fit the club model, limiting errors in the transfer market and potentially finding talent early. Match this with team playing styles (you could perform PCA on clubs too) and it forms the basis of a powerful recruitment tool. It can be applied to all positions; next, I will look to perform a similar analysis to solve a specific recruitment problem.
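One way to tackle the question posed at the start (finding players with a similar style, for example to replace someone) is to treat each player's component scores as coordinates and look for the nearest neighbours. This sketch uses Euclidean distance, but cosine similarity is an equally reasonable choice. The names and scores are invented.

```python
import math


def most_similar(profiles, target, k=5):
    """Players closest to `target` in component-score space (Euclidean distance).

    profiles: {player: [score on each PCA component]} (hypothetical layout)
    """
    ref = profiles[target]
    distances = [
        (name, math.dist(ref, vec)) for name, vec in profiles.items() if name != target
    ]
    distances.sort(key=lambda item: item[1])
    return [name for name, _ in distances[:k]]


profiles = {
    "Target": [1.0, 0.0, 2.0],
    "A": [1.1, 0.1, 1.9],
    "B": [-1.0, 2.0, 0.0],
    "C": [0.8, -0.2, 2.3],
}
print(most_similar(profiles, "Target", k=2))  # ['A', 'C']
```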
If you have any feedback, just let me know! I'm always happy to help, or to receive guidance or criticism!
Hi Mark,
Fantastic analysis!
How did you generate the ranking value for the x axis?
Many thanks.