
Horse racing, greyhound and snooker specialist with thirty years' experience of writing about sport across multiple platforms. A QPR and snooker fan.
Statistics and Trends
Well, it is the time of year when I break left from the charging herd and run towards the same target from a different direction.
In the wilds of the betting jungle, maybe that leaves me vulnerable to a predator. Maybe I am the predator.
So what excites and infuriates me at this rich and fertile time in the racing calendar?
Two different words widely judged to have the same meaning: ‘stats’ and ‘trends’.
On OLBG there is an incredibly long article on Stats and Trends in the Systems and Statistics section, so I know many disagree with me.
Those who know me may want to stop reading, as this is a rehash of articles past, but as I say, this is the time of year I take a look at such things.
Of course, we all know what is meant by these terms but no harm in asking the Oxford English Dictionary to remind us:
Statistics: The collection and analysis of large amounts of information shown in numbers; a fact or piece of data obtained from a study of statistics.
Trends: A general direction in which something is developing or changing.
Cheltenham Trends
I suggest that most peddlers of Cheltenham stats have been doing very little analysis on what is almost certainly not a large amount of data.
Trends are clearly quite different from stats, and it is possible to find a trend with some meaning, but a simple 8/10 result does not constitute a trend.
So let me pull out my magnifying glass and examine the crime scene.
A good starting point is to reiterate something I have written many times before.
Stats and trends are utterly meaningless at best and downright misleading at worst unless compared against the ‘expected probability’.
It really is a crime to present a stat without a qualifier, so when you read a statistic on a website or in the press, ignore it unless it is qualified.
If I have landed safely on the trampoline 8 times out of 10 when jumping out of my bedroom window after one beer too many, that would be a dreadful stat, but say something is 8/10 and it sounds positive.
When reading the racing press on a Saturday, I will often see the big 25-runner handicap described as a poor or even dreadful race for favourites, with only one winning favourite in the past 10 years.
Pretty much universally, the expectation in these races would be one-point-something winning favourites every 10 years.
So why do we consider a 1/10 stat dreadful? Because the author implies it, and we tend to accept it.
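To put a number on ‘one-point-something’, here is a minimal Python sketch. It assumes, purely for illustration, that the favourite in a big-field handicap has a 12% chance of winning each year and that the years are independent; neither figure comes from real results. It works out the expected number of winning favourites over 10 renewals and the binomial probability of seeing one or none.

```python
from math import comb


def binomial_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n independent trials, each with chance p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_at_most(k: int, n: int, p: float) -> float:
    """Probability of k or fewer successes in n independent trials."""
    return sum(binomial_pmf(i, n, p) for i in range(k + 1))


# Hypothetical assumption: the favourite wins this kind of race 12% of the time.
FAV_WIN_CHANCE = 0.12
YEARS = 10

expected_fav_wins = YEARS * FAV_WIN_CHANCE  # expected winning favourites per decade
p_one_or_fewer = prob_at_most(1, YEARS, FAV_WIN_CHANCE)

print(f"Expected winning favourites in {YEARS} years: {expected_fav_wins:.1f}")
print(f"Chance of seeing 1 or fewer: {p_one_or_fewer:.0%}")
```

Under that assumption, one winning favourite (or none) in 10 years happens roughly two times in three, so ‘1/10’ is close to what we should expect rather than a warning sign.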
The question has to be twofold:
What should have happened?
What did happen?
In the previous NH season, the Racing Post made great play of the fact that Paul Nicholls had won a particular novice chase 5 times in the past 7 years.
This was delivered with the clear intention of indicating a strong pointer to him winning the race this time around.
Is there anything wrong with that?
Harmless, if lazy, journalism, some might say, but a couple of things are not in order here.
i) Why 7 years? Why not 4, 6 or the commonly used 10-year trend?
Very simply, because 7 years suited the author: he chose the window in the data stream that best illustrated something which may not really have been there.
In actual fact, the trend was 5 wins in the past 9 years, or 6 wins in the past 10 years.
This is a classic case of manipulating the data set to achieve a preconceived result.
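The window-shopping described above is easy to reproduce. This Python sketch uses a hypothetical 10-year record whose totals match the figures quoted (5 of the last 7, 5 of the last 9, 6 of the last 10); the year-by-year order is invented for illustration. It shows the strike rate for every window from 4 to 10 years and then picks whichever looks best, which is exactly what a trend-seller can do after seeing the data.

```python
# Hypothetical year-by-year record, most recent renewal first:
# 1 = a Nicholls winner, 0 = not. Only the totals come from the text;
# the order within the 10 years is invented for illustration.
RESULTS = [1, 1, 0, 0, 1, 1, 1, 0, 0, 1]


def wins_in_window(results: list[int], years: int) -> int:
    """Count wins in the most recent `years` renewals."""
    return sum(results[:years])


# Every window length from 4 to 10 years, and the wins each one shows.
windows = {n: wins_in_window(RESULTS, n) for n in range(4, len(RESULTS) + 1)}
for n, wins in windows.items():
    print(f"last {n:>2} years: {wins}/{n} = {wins / n:.2f}")

# An author free to choose the window after the fact will headline this one.
most_flattering = max(windows, key=lambda n: windows[n] / n)
print(f"most flattering window: last {most_flattering} years")
```

With this made-up ordering the 7-year window gives the highest strike rate, while 9 or 10 years tell a noticeably weaker story from the same results.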
ii) Whilst it was clearly a fair achievement by Mr Nicholls to provide 5 of the last 7 winners, from a punting point of view, this information is largely worthless.
Let's suppose the 5 winners were all unfancied and returned a huge level-stakes profit (LSP) over the 7-year window.
Would that be any more interesting than if all 5 winners had been returned odds-on?
I think it fair to say it would be. But crunching the numbers, taking the SP as a percentage of the book for all Nicholls entries (often 2 or more in the race), the mathematical expectation across this 7-year period was that the stable should have had more than 4 winners and fewer than 5.
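The ‘SP as a percentage of the book’ calculation can be sketched in Python. All the SPs and book totals below are hypothetical, chosen only for illustration, and the function names are my own. For each renewal, each Nicholls runner's implied probability is divided by the total book for that race (stripping out the overround); adding those shares across his entries gives his chance that year, since only one of his runners can win, and adding across years gives the expected number of winners. Treating the renewals as independent, a short dynamic-programming loop then gives the chance of 5 or more wins.

```python
def implied_probability(sp: str) -> float:
    """Turn a fractional SP such as '4/6' or 'evs' into an implied probability."""
    if sp.lower() in ("evs", "evens"):
        return 0.5
    num, den = (int(part) for part in sp.split("/"))
    return den / (num + den)


# Hypothetical renewals: (SPs of the Nicholls entries, total book for the race; 1.10 = 110%).
RENEWALS = [
    (["4/9"], 1.08),
    (["8/11", "10/1"], 1.10),
    (["1/2"], 1.06),
    (["4/6", "12/1"], 1.12),
    (["evs", "9/2"], 1.10),
    (["5/6"], 1.05),
    (["4/7"], 1.08),
]


def stable_chance(sps: list[str], book: float) -> float:
    """Stable's share of the book in one race, with the overround removed."""
    return sum(implied_probability(sp) for sp in sps) / book


def prob_at_least(k: int, probs: list[float]) -> float:
    """P(at least k wins) across independent renewals with different chances."""
    dist = [1.0]  # dist[j] = probability of exactly j wins so far
    for p in probs:
        nxt = [0.0] * (len(dist) + 1)
        for j, pj in enumerate(dist):
            nxt[j] += pj * (1 - p)  # stable loses this renewal
            nxt[j + 1] += pj * p    # stable wins this renewal
        dist = nxt
    return sum(dist[k:])


yearly = [stable_chance(sps, book) for sps, book in RENEWALS]
expected_wins = sum(yearly)
print(f"Expected Nicholls winners over {len(RENEWALS)} years: {expected_wins:.2f}")
print(f"Chance of 5 or more: {prob_at_least(5, yearly):.0%}")
```

With these made-up prices the stable is expected to win a little over 4 of the 7 renewals, so 5 winners is broadly what the market predicted rather than evidence of a hidden edge.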
So 5 winners is actually nothing more than market expectation, and portraying this trend as a positive is misleading at best.
Returning to the comment above about manipulating the data set, the most common example we see of this at Cheltenham time is ‘the last 10 winners all ran between 21 and 56 days before the race’ or ‘9/10 had been off the track for no more than x days’.
Dear oh dear, how easy it is to draw a window around the last 10 years of data and say ‘all 10 winners were somewhere between the lowest value and the highest value’!
A quick Google search on Cheltenham stats immediately pulled up a batch of stats for the Supreme Novices using 10-, 13-, 14- and 16-year windows, so the ‘artist’ is drawing round the data set to best suit his needs for each individual parameter.
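There is a neat way to see why this is empty. Draw a window from the lowest to the highest value among 10 winners and all 10 are inside it by construction; on average that window also covers about (n - 1)/(n + 1) = 9/11, roughly 82%, of comparable runners, whatever the underlying distribution. The Python sketch below checks this by simulation, assuming (hypothetically) that days since last run across the field follow an exponential distribution with a 35-day mean, and that winners are drawn at random from the field, so the variable has no real effect at all.

```python
import random
from bisect import bisect_left, bisect_right


def average_window_coverage(n_winners: int, trials: int, seed: int = 2024) -> float:
    """Average share of the field inside a min-to-max window drawn round random 'winners'."""
    rng = random.Random(seed)
    # Hypothetical field: days since last run, exponential with a 35-day mean.
    field = sorted(rng.expovariate(1 / 35) for _ in range(5000))
    total = 0.0
    for _ in range(trials):
        winners = rng.sample(field, n_winners)  # picked at random: no real effect
        lo, hi = min(winners), max(winners)
        # Every winner lies in [lo, hi] by construction, so the stat always reads 10/10.
        assert all(lo <= d <= hi for d in winners)
        inside = bisect_right(field, hi) - bisect_left(field, lo)
        total += inside / len(field)
    return total / trials


coverage = average_window_coverage(n_winners=10, trials=2000)
print(f"Window covers {coverage:.0%} of the field on average (theory: {9 / 11:.0%})")
```

In other words, a window drawn after the event would typically include around four in five of all the runners even if recency made no difference whatsoever.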
Politicians will understand all about this, manipulating a data set to portray the point they want to make, even though the reality might be somewhat different to a neutral pair of eyes.
Other classic examples are ‘no 11-y-o has won this big handicap for 20 years’, where only 3 have tried and all were 33/1+, and ‘9/10 were first or second last time out’ in a race like the Supreme Novices, where about 97% of the book each year meets this criterion.
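Both examples can be checked with back-of-envelope arithmetic. The Python sketch below assumes, for illustration, that each of the three 11-year-olds was priced at exactly 33/1 (the shortest price allowed by ‘33/1+’, so this flatters their chances) and ignores the overround; for the Supreme Novices it takes the roughly 97% share of the book at face value and treats the ten renewals as independent.

```python
# (a) 'No 11-y-o has won for 20 years': three runners, each assumed to be 33/1.
p_each = 1 / (33 + 1)                 # implied chance at 33/1, overround ignored
expected_11yo_wins = 3 * p_each       # about 0.09 winners expected
p_no_11yo_winner = (1 - p_each) ** 3  # chance that none of the three wins

# (b) '9/10 were first or second last time out': about 97% of the book qualifies.
share_qualifying = 0.97
expected_qualifiers = 10 * share_qualifying  # expected qualifying winners out of 10
p_ten_of_ten = share_qualifying ** 10        # chance that all 10 winners qualify

print(f"11-y-o: expected {expected_11yo_wins:.2f} winners, P(none) = {p_no_11yo_winner:.0%}")
print(f"Supreme: expected {expected_qualifiers:.1f}/10, P(10/10) = {p_ten_of_ten:.0%}")
```

On those assumptions, ‘no 11-y-o winner’ is what happens more than nine times in ten anyway, and 9/10 qualifiers is marginally below the 9.7 the market implies.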
Finally, 10/10 is one thing, but what about 9/10 or 8/10?
Well, if 8/10 winners made all in a race, that might be interesting.
If 8/10 carried less than a weight I pick to suit my needs, say 11st 2lb (the other 2 carried 11st 7lb and 11st 10lb), then that is clearly pretty worthless as a trend.
Again, it is quite easy to work out the expected number of winners below or above a certain weight, but you won't see that done anywhere unless I do it myself!
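Here is one way to do that sum, as a minimal Python sketch. For each race it takes every runner's implied probability from its SP, divides by the race's total book to remove the overround, and adds up the shares of runners carrying less than the chosen weight (11st 2lb, or 156lb, in the example above). Summed across races, that is the number of winners below the weight the market expected. The race cards and function names are hypothetical placeholders; in practice you would load ten years of real fields and SPs.

```python
def implied_probability(sp: str) -> float:
    """Fractional SP such as '9/4' or 'evs' to implied probability."""
    if sp.lower() in ("evs", "evens"):
        return 0.5
    num, den = (int(part) for part in sp.split("/"))
    return den / (num + den)


def to_lb(stones: int, pounds: int) -> int:
    """Convert a weight such as 11st 2lb into pounds (14lb to the stone)."""
    return stones * 14 + pounds


def expected_winners_below(races: list[list[tuple[int, str]]], limit_lb: int) -> float:
    """Market-expected number of winners carrying less than limit_lb."""
    expected = 0.0
    for runners in races:
        book = sum(implied_probability(sp) for _, sp in runners)
        below = sum(implied_probability(sp) for weight, sp in runners if weight < limit_lb)
        expected += below / book  # share of the book carrying under the limit
    return expected


# Hypothetical fields: (weight carried in lb, SP) for each runner.
RACES = [
    [(to_lb(11, 10), "5/2"), (to_lb(11, 7), "4/1"), (to_lb(11, 0), "3/1"),
     (to_lb(10, 8), "9/4"), (to_lb(10, 4), "6/1")],
    [(to_lb(11, 12), "7/2"), (to_lb(11, 3), "3/1"), (to_lb(10, 13), "2/1"),
     (to_lb(10, 6), "5/1"), (to_lb(10, 0), "4/1")],
]

limit = to_lb(11, 2)
expected_below = expected_winners_below(RACES, limit)
print(f"Expected winners under 11st 2lb: {expected_below:.2f} of {len(RACES)}")
```

The observed count (8 of 10 in the example above) should be compared with this expected figure, not with 10.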
Stats and trends must be measured against ‘expected probability’ if they are to be of any value to the punter at all.
One other point: the concept of data shift is also usually ignored, even though this is where true trends might be found.
Over time, things change and there is a natural shift of data.
In horse racing, one such shift might be in the age of racehorses.
Not so long ago, a NH horse might not have been seen in a bumper until the age of 6, never mind over a fence, so we might expect a downward shift in the average age of winners.
Go back far enough and the ground might often have been on the firm side, whereas now that is unlikely, so changing ground conditions might cause data shifts in areas like weight carried.
Another very important point: when betting, your opinion is worth nothing.
How your opinion compares with the opinion of others is everything.
So while other punters cross off a horse because it finished third last time out and the stats say 9/10 winners finished first or second on their previous start, the savvy punter starts off with a mocking view of such stats.
Value is likely to open up where the fewest eyes and minds are trained.
Conclusion
In conclusion, I feel that the significance of many stats and trends is massively overstated, and that they are adopted by those looking to ‘shortcut’ the hard work needed to be successful as a punter.
I have been there, believe me: there is nothing better than being able to cross off half the field or more because someone somewhere says 9/10 winners carried less than 11st.
Is it sensible to blindly put a pen through those carrying 11st or more? I suggest it is not.
If time allows, I will be looking at a few of the festival races and analysing with you the maths behind the stats, with a view to highlighting which may be significant and which may not.
The OLBG betting school has a detailed section on statistics and trends which may assist you alongside my blog.