Sutter Kane wrote:
Mmmm I have a relevant question that tests formula vs hunch. Formula says Milner is the correct pick (Chelsea H) against Bale (W.Ham A) by over a ppg average - one of them will be benched this GW, my hunch says I should go Bale in my first xi, is this right?
hancockjr wrote:
Your (the) formula presumably uses Milner's stats from AV last season, hence is useless.
When a player changes clubs, or changes his FPL position, obviously his potential is affected. So maybe last season's stats aren't worth much.
But we do have this season's stats (albeit for 6 weeks only)...
Bale (7.1m) 30 points from 540 minutes
Milner (9.1m) 30 points from 534 minutes
My formula is fairly easy to understand...
predicted score next match (FPL points) = 0.34*price + 41*historicalYield + 0.053*opposition + 1.01*venue + 0.04*form - 1.63
where:
price = the player's price we have
historicalYield = points per minute played
opposition = the rank of the opposing team...let's take my 1-20 ranking above: Bale v Villa (rank 6), Milner v Newcastle (rank 15)
venue = home advantage: Bale at home = 1, Milner at home = 1
form = points scored in last game: Bale 2, Milner 3
-1.63 = the regression intercept (don't worry about it, just subtract it)
Thus...
Bale, predicted score next match (FPL points) = 0.34*7.1 + 41*(30/540) + 0.053*6 + 1.01*1 + 0.04*2 - 1.63 = 4.5
Milner, predicted score next match (FPL points) = 0.34*9.1 + 41*(30/534) + 0.053*15 + 1.01*1 + 0.04*3 - 1.63 = 5.7
Thus my formula predicts that Milner will score more points this weekend. Note that, for this particular comparison, the factors historicalYield, venue, and form are nearly the same or else trivial.
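For anyone who wants to check the arithmetic, here is a minimal Python sketch of the formula as posted. The function name and argument names are my own, and coding an away game as venue = 0 is an assumption (the post only states that home = 1).

```python
def predicted_score(price, points, minutes, opposition_rank, at_home, last_score):
    """Predicted FPL points next match, using the coefficients quoted in the post."""
    historical_yield = points / minutes  # points per minute played
    venue = 1 if at_home else 0          # assumption: away is coded as 0
    return (0.34 * price
            + 41 * historical_yield
            + 0.053 * opposition_rank
            + 1.01 * venue
            + 0.04 * last_score
            - 1.63)                      # regression intercept


# Inputs taken from the worked example in the post.
bale = predicted_score(7.1, 30, 540, opposition_rank=6, at_home=True, last_score=2)
milner = predicted_score(9.1, 30, 534, opposition_rank=15, at_home=True, last_score=3)

print(f"Bale:   {bale:.1f}")    # Bale:   4.5
print(f"Milner: {milner:.1f}")  # Milner: 5.7
```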
Sutter Kane wrote:
All ideas in this whole thread have been insufficiently rigorous because it's near impossible to be rigorous enough. But Wyld can maybe 'prove' his formula's worth by explaining why Milner will outscore Bale this week (or correcting me on a calculation error)...
My formula says that the main difference between these two players, in this instance, is their price. Simply by being a more expensive player, Milner is expected to score more points.
Along with this, Milner also has a slight advantage this week in facing weaker competition.
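One way to see where the gap comes from is to split the difference between the two predictions term by term. This is only a sketch: the dictionary layout and the helper `term_gaps` are made up for illustration, and the inputs are the Bale and Milner figures quoted earlier in the thread. The intercept is the same for both players, so it cancels.

```python
# Coefficients as quoted in the formula.
COEFFS = {
    "price": 0.34,
    "historicalYield": 41,
    "opposition": 0.053,
    "venue": 1.01,
    "form": 0.04,
}

# Inputs from the worked example in the thread.
bale = {"price": 7.1, "historicalYield": 30 / 540, "opposition": 6, "venue": 1, "form": 2}
milner = {"price": 9.1, "historicalYield": 30 / 534, "opposition": 15, "venue": 1, "form": 3}


def term_gaps(a, b):
    """Per-term contribution to (prediction for a) minus (prediction for b)."""
    return {name: coef * (a[name] - b[name]) for name, coef in COEFFS.items()}


gaps = term_gaps(milner, bale)
for name, gap in sorted(gaps.items(), key=lambda kv: -abs(kv[1])):
    print(f"{name:16s} {gap:+.2f}")
print(f"{'total':16s} {sum(gaps.values()):+.2f}")
```

With these inputs, price accounts for about 0.68 of the roughly 1.2 point gap and opposition for about 0.48; the other three terms contribute less than 0.1 between them.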
hancockjr wrote:
Wyld - did you consider % ownership as a variable? This would represent how much people agree with the price.
Also "ranking" teams 1-20 is silly - they are not spaced like that in reality - better use spread "season points" prices
Home advantage matters more to poor teams than good - Che will get 60% of points from home games, strugglers closer to 70% maybe more.
Ownership: no I didn't consider that. That is an interesting suggestion. Where is the data?
Ranking: the 1-20 ranking is, I think, good enough. Of course the order in which you rank the teams is important. I have toyed with the idea of using bookies' tables for this.
Home advantage: please show the data which support your contention.
Sutter Kane wrote:
...Anyway... if you asked me which of two players was going to make a higher score this week, and provided me with the formula (from above in the thread), I would have a set of coefficients based on (I think) least-squares multiple linear regression (and let's assume that each of the explanatory variables included in the model meets the distributional requirements of this analysis). Together with the relevant data for each player, I am now able to calculate a point estimate for each player's score. But this isn't really enough, because even if the true values were identical, it is most unlikely that the estimates (which is what I have) would be. So I need to calculate whether the difference between the estimates is sufficiently large that I can say with a stated (small) probability (say, 5%) that this difference could not have arisen by chance alone (the true values being identical). Now I have a real problem, because the data on which the estimates were based were time sequences so we must recognise the possibility of internal serial correlations. Having used least-squares multiple regression to obtain the model coefficients, I can't now calculate valid estimates for the standard errors of the coefficients, so have no way of calculating a valid test of the difference between the two estimates. Neither can we calculate a valid prediction interval for the predicted score of each player. At this stage (being honest) I return your consultancy fee to you and tell you that your question can't be answered with what you have.
hancockjr wrote:
No (to SK) - his formula assumes a linear relationship between the variables, which it is not.
Bale is better value than Milner - As shown by % ownership, but formula assumes price is correct
The test of my formula should simply be this: does it predict the score of a player (over a reasonable number of games) significantly better than chance? If it does, it has some worth. (Wasn't anyone impressed, by the way, that it spat out some reasonable looking values for the points each player is likely to score?)
In fact I will go further and say that I believe that my formula is the most accurate formula for predicting a player's FPL score next gameweek ever posted on FISO. I challenge everyone and anyone to come up with a more accurate predictive formula. I'll go even further than that and say that I believe my formula will do better even than a human being who just says what he thinks the player's score will be based on his knowledge of the game.
(Again, of course, based on several players over a reasonable number of games.)
Least-squares multiple regression is quite a robust statistical method, and works well even when the data is a little skewed. I took the trouble to make sure that my variables were independent.
The Woolster wrote:
Whilst I do love a good multiple regression, and without wanting to piss on anyone's parade as some value can be gained from this type of thing, my memory from statistics (which was gained sat at the back of the lecture theatre whilst half asleep) is that a regression with an adjusted R Square of 0.17 is not a very good model for predicting future outcomes. Is my memory correct?
For this kind of model, an adjusted R Square of 0.17 is neither high nor low. It means that my formula accounts for 17% of the variation in player scores, or to put it another way it doesn't account for 83% of the variation. An important question to ask might be, what proportion do we actually expect luck to play? Some weeks a player scores 1 point and other weeks he scores 13.
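For reference, R Square and its adjusted version can be computed from actual and predicted scores along the following lines. This is a generic sketch of the standard definitions, not the code behind the 0.17 figure; the function names are my own, and the number of observations used in the original regression is not given in the thread.

```python
def r_squared(actual, predicted):
    """Share of the variation in `actual` that the predictions account for."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))  # unexplained
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)           # total
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Penalise R Square for the number of explanatory variables in the model."""
    return 1 - (1 - r2) * (n_obs - 1) / (n_obs - n_predictors - 1)
```

The formula in this thread has five explanatory variables (price, historicalYield, opposition, venue, form), so `n_predictors` would be 5. A value of 0.17 leaves 1 - 0.17 = 0.83 of the variation unexplained, which is the 83% mentioned above.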
cincirollers wrote:
Although I have no idea at how Wyld got to his formula, the one thing I did notice is that he uses form as the player's last score. This is a player's form only if they are playing under similar conditions (home/away, strength of opponent & position). Is Bale's prior week's points away to Chelsea as a left back his form for an upcoming match at home against Wigan & playing outside mid? (Anyone else notice how Etuhu's shots went from 2/game to 0 when moved from left to right mid?).
That's a good point about the weak measure for form that I used. Do you have any data to back up the contention that a player's form over, say, the last two games or four correlates with his actual score?
By the way, I built the Packing Solver.
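On the form question above, one way to test it would be to pair each player's average over his previous k games with the score that followed, then measure the correlation across all such pairs. The sketch below assumes per-player score histories are available as plain lists keyed by player name; that data layout and the function names are hypothetical, and no real scores are included.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)  # undefined if either series is constant


def form_pairs(scores, window):
    """Pair the mean of the previous `window` scores with the score that followed."""
    pairs = []
    for i in range(window, len(scores)):
        recent = scores[i - window:i]
        pairs.append((sum(recent) / window, scores[i]))
    return pairs


def form_correlation(scores_by_player, window):
    """Correlation between `window`-game form and next score, pooled over players."""
    pairs = [pair
             for scores in scores_by_player.values()
             for pair in form_pairs(scores, window)]
    forms, next_scores = zip(*pairs)
    return pearson(forms, next_scores)
```

Running `form_correlation` with `window` set to 1, 2 and 4 on the same data would show whether a longer form window tracks the next score any better than the single last game used in the formula.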