“Timeo Danaos et dona ferentes” – Beware the Greeks bearing gifts
– Virgil, Aeneid (II, 49)
The more articles I begin with a Latin aphorism, the more I can feel myself slowly starting to recoup my expensive ancient history degree – one shoehorned semi-relevant Virgil quote at a time. I’ll try not to overdo it.
Back in the modern day, I deal with plenty of investors as part of my day job: retail investors, fund managers, asset allocators, financial advisers, investment consultants – and am often surprised at the number of people I see invest on the basis of one graph.
The simple performance graph is incredibly powerful. But just so we’re on the same page right from the start, when I mention performance graphs during the article, this is what I mean:
At its core, it’s just two lines – one showing the performance of a fund/portfolio, and one showing the benchmark. Nothing more complicated than that.
Although these graphs can make an investment look attractive at first glance, just like the Greeks’ Trojan Horse, they can hide all sorts of nasty surprises.
I can understand why retail investors might choose to base their decisions on a performance chart – it’s hard to know what to look for if you’re an inexperienced investor, after all. But the number of experienced and qualified professional investors using these graphs as the basis for their investment decisions surprises me. Very few investors interrogate the data behind the charts with any sort of rigour.
This post is my attempt at demonstrating a few ways in which performance graphs can be manipulated to tell a story, as well as exploring a few things investors should bear in mind when making decisions based on past performance alone.
It’s worth stating up front that it’s not my intent to portray every investment/fund manager as a morally bankrupt huckster intent on misleading you into handing over your money. Most managers I’ve met have been good people genuinely trying to do their best for their clients. But when you combine vast amounts of informational asymmetry between seller and buyer with incentive-based pay and bonuses for the seller, it’s no surprise to see many managers frolicking in the grey areas of investment advice.
All I’m trying to do is even the playing field a bit.
Chart crimes
Having touted my own experiences with mostly ethical managers, I have, however, had plenty of dealings with those operating in the darker shades of grey. Manipulating performance graphs is probably on the more vanilla end of ethical grey-ness, but still, it’s a common tool for the more nefarious managers out there.
So let’s start by taking a look at how outperformance graphs can be manipulated.
As a demonstration, I’ve created a fictional fund with a fictional set of returns for both the portfolio and the benchmark. Let’s call it the Occam Capital Management Balanced Fund (OCMBF).
The OCMBF has a 3 year performance history, and its performance graph looks like this:
That’s a pretty good-looking performance graph.
If I were the OCMBF manager, I’d be pretty happy with that. But we can do better.
By changing the scale on the Y-axis, we can turn that mediocre outperformance into something that looks much more impressive:
That looks much more convincing. Comparing the two graphs, this one certainly looks like the fund has done a better job at outperforming.
But the data used for this graph is exactly the same – the only difference is that the Y-axis starts at £100 rather than £0.
Given that the quickest way to judge outperformance on a graph is to look at the space between the two lines, the smaller the range of the Y-axis, the larger the outperformance looks. Obviously I’ve made it extreme here in order to prove a point (I’ve never seen a ‘Growth of £100’ chart actually start at £100), but it’s not uncommon to see truncated Y-axes used to maximise apparent outperformance.
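If you want to see just how much the axis alone changes the picture, here’s a minimal sketch in Python. The fund and benchmark series are invented (like the OCMBF’s) – the point is that both panels plot identical data:

```python
import matplotlib.pyplot as plt

# Invented 'growth of £100' series for a fund and its benchmark,
# standing in for the OCMBF data: ~8.7% p.a. vs ~6.2% p.a.
months = range(37)  # 3 years of monthly data points
fund = [100 * 1.007 ** m for m in months]
benchmark = [100 * 1.005 ** m for m in months]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))

for ax, title, ylim in (
    (ax1, "Y-axis from £0", (0, 140)),
    (ax2, "Truncated Y-axis", (100, 132)),
):
    ax.plot(months, fund, label="Fund")        # identical data in both panels
    ax.plot(months, benchmark, label="Benchmark")
    ax.set_title(title)
    ax.set_ylim(*ylim)
    ax.legend()

plt.show()
```

Same data, two very different first impressions – and all it took was one set_ylim call.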
I don’t know why anyone would want to do it, but it works the other way, too.
By extending the Y-axis, you can make the same level of outperformance look incredibly pedestrian. Nobody at first glance would want to invest in anything with returns this boring:
But again, it’s the same data. The Y-axis has just been expanded.
It’s possible to do something similar with the X-axis.
If you remember, the OCMBF 3 year returns looked like this:
If this were the only graph you were shown, you’d be pretty impressed. That’s a decent amount of outperformance.
But what you can’t tell from the graph is that the fund has been in existence for longer than 3 years.
If we look at returns over the last 5 years, the story looks quite different:
Not looking so good now.
And what the 5 year graph doesn’t tell you is that the fund has actually been in existence for 10 years:
The fund has gone from looking excellent, to looking average, to looking terrible – all based on the time period.
Obviously this is an extreme example (I’ve created the fictional returns to prove a point, after all), but it’s not uncommon for a fund pitch to show visually the most impressive time period they have. The graph may show 5-year returns versus benchmark, when the 10-year returns are poor. Or vice-versa – they’ll show you 10-year returns when the most recent 5 year returns have been poor. Or they’ll show you 3 years, or since inception, or really over whatever time period they feel like.
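To see the time-period effect in numbers rather than pictures, here’s a short sketch. The returns are invented, engineered (like the OCMBF’s) so that the last 3 years look great, the last 5 look poor, and the full 10 look terrible:

```python
# Invented annual returns (%) for a fund and benchmark over 10 years,
# oldest first - engineered to mirror the OCMBF example above.
fund_returns = [-8, -5, -6, -4, -3, -15, -10, 9, 11, 12]
bench_returns = [4, 5, 5, 6, 5, 5, 6, 5, 5, 5]

def growth(returns_pct):
    """Turn a list of % returns into a cumulative growth factor."""
    total = 1.0
    for r in returns_pct:
        total *= 1 + r / 100
    return total

for years in (3, 5, 10):
    f, b = growth(fund_returns[-years:]), growth(bench_returns[-years:])
    verdict = "outperforms" if f > b else "underperforms"
    print(f"Last {years:>2} years: fund x{f:.2f} vs benchmark x{b:.2f} -> {verdict}")
```

One return history, three different verdicts – it all depends on which window the marketing department chooses to show you.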
When you’re looking at an outperformance graph (or really, any graph – the points are equally valid regardless of what’s being displayed) make sure to check the axes. Have the scales been cherry-picked? And if so, how can you find the data which isn’t being displayed?
Of course, the full performance figures will be presented somewhere else in the pitch or on the factsheet, but they’ll usually be in a less visual table. Pictures always speak louder than words, and a well-chosen time period displayed graphically can do a great job at making up for lacklustre performance elsewhere.
While we’re on the subject of performance tables, it’s worth taking a slight diversion at the end of this section to touch on one way performance tables can be displayed to ‘tell a story’.
When performance numbers are displayed using tables, rather than charts, it’s always worth considering whether you’re looking at rolling or discrete performance. Rolling performance is performance looking back from now – i.e. 1 year/3 year/5 year performance figures. Discrete performance is during a discrete period – say, the year 2020.
Be careful when you’re only presented with rolling figures. Because recent strong outperformance affects every rolling performance statistic, a recent period of outperformance can mask significant discrete periods of underperformance.
For example, if you had a heroic year of outperformance last year, that’ll make your 1 year, 3 year, 5 year, 10 year, and since inception performance numbers all look exceptional – even if your performance in previous years was terrible.
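Here’s a quick sketch of how a single heroic year can dominate every rolling figure. The returns are invented – including a deliberately absurd +120% year – purely to make the arithmetic stark:

```python
# Invented calendar-year returns (%), oldest first: nine losing years
# followed by one heroic +120% year.
fund = [-2, -3, -1, -2, -4, -1, -3, -2, -1, 120]
benchmark = [5, 6, 5, 7, 5, 6, 5, 6, 5, 6]

def annualised(returns_pct):
    """Annualised return (%) from a list of calendar-year % returns."""
    total = 1.0
    for r in returns_pct:
        total *= 1 + r / 100
    return (total ** (1 / len(returns_pct)) - 1) * 100

for years in (1, 3, 5, 10):
    f, b = annualised(fund[-years:]), annualised(benchmark[-years:])
    print(f"{years:>2}y rolling: fund {f:6.1f}% p.a. vs benchmark {b:4.1f}% p.a.")
```

Every rolling period shows outperformance, despite the fund losing money in 9 of its 10 calendar years while the benchmark made money in all 10.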
These are the rolling performance figures for a different fictional fund – the Occam Capital Management Aggressive Fund (OCMAF). Returns for periods greater than 1 year are annualised:
Based on these numbers, the fund looks fantastic. The fund outperformed over every rolling period. Looks like a good investment.
But if we break that data down into calendar year returns over the last 10 years, it tells a different story:
Now the fund doesn’t look incredible – in fact, it looks pretty horrendous. The huge outperformance in 2020 masks 9 successive years of underperformance. And not just underperformance, but negative absolute returns every year, against positive returns every year for the benchmark.
The first table looks excellent, the second looks awful – but the data used is exactly the same, it’s just presented differently.
Obviously I’ve created a set of fictional returns here to illustrate a point, but it’s still worth bearing in mind that while looking at rolling time periods is useful when making investing decisions, it’s also important to consider discrete periods. Because recent performance skews every rolling statistic, you never know what you might find if you dig into discrete periods.
Benchmarks
Not only can managers choose how their graphs and tables are displayed, they’re also able to choose their own benchmark. Obviously there are some regulations about what constitutes an appropriate benchmark, but there’s still a surprising amount of leeway given to investment managers in their choice of benchmarks.
To get some jargon out of the way first, benchmarks are broadly split into two camps – market benchmarks and peer group benchmarks.
Market benchmarks tend to be used for single asset class funds rather than multi-asset portfolios, and compare a fund to its investable universe. For example, an active US technology fund might compare itself to the S&P 500 Technology Index. A UK equity fund might compare itself to the FTSE 100.
Peer group benchmarks are more commonly used for multi-asset portfolios rather than funds. They compare the portfolio against a peer group of similar multi-asset portfolios. The peer group benchmark represents the performance of the average manager in the peer group. It’s often presented by the investment manager as a representation of what your portfolio’s performance would have been if you’d invested with a competitor.
When assessing whether something has outperformed, an important question is “Outperformed what?”
A simple way for a manager to make their fund/portfolio look like it’s outperformed is to choose an easy benchmark. And because managers are able to choose their own benchmark, it’s in their interests to do exactly that.
Market benchmarks
Let’s say you’re a fund manager and you’re trying to make your balanced fund look as good as possible against a benchmark. An easy way to do so would be to choose cash as your benchmark. Cash (especially now) produces almost nothing in terms of returns, so even the worst balanced fund would look incredible by comparison.
Using cash is an extreme example, and would likely never happen in the real world. It’s so obviously a bad benchmark for a balanced portfolio that the manager wouldn’t be able to get away with it.
But what about cash +2%? Is that an appropriate benchmark? How about inflation +2%? Or inflation +5%?
Things start to get a bit trickier now. With a ‘+X%’ benchmark, it’s difficult for the investor to judge whether the benchmark is appropriate or not. It’s therefore easy for the manager to choose the lowest X% they can get away with.
Let’s say the manager doesn’t use cash or inflation in their benchmark at all. Let’s say they’re a fund manager running a US small cap fund, and they want to use a market benchmark. A great market benchmark for them to choose would be the S&P 500. At first glance it looks like a good benchmark – both are US equities, so it’s representative of their investable universe, isn’t it?
Well, not really.
Small caps are riskier than large caps, and have historically outperformed large caps over the long run (see this post on factor investing if you’re interested in which factors have well-documented evidence of long-run outperformance).
Because small caps are likely to outperform large caps over the long run, a large cap benchmark like the S&P 500 is a great benchmark if you’re trying to sell a small/midcap fund, as you’re highly likely to outperform it over the long run. The small cap fund will likely have much higher drawdowns and take much more risk than the benchmark, but if you want to get the line on your performance graph looking higher than the other one, that’s a great way to do it.
It’s easy to select market benchmarks which you’re likely to outperform over the long run if you simply choose one which takes less risk.
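A rough simulation makes the point. All the figures below are invented for illustration – a riskier fund with a couple of percent of extra expected return, against a lower-risk benchmark:

```python
import numpy as np

rng = np.random.default_rng(42)

# Invented illustrative figures: the fund has a higher expected (log)
# return AND far higher volatility than its lower-risk benchmark.
n_paths, years = 10_000, 20
fund_log = rng.normal(0.08, 0.22, (n_paths, years))   # riskier 'small cap' fund
bench_log = rng.normal(0.06, 0.14, (n_paths, years))  # lower-risk benchmark

fund_growth = np.exp(fund_log.sum(axis=1))
bench_growth = np.exp(bench_log.sum(axis=1))

beat = np.mean(fund_growth > bench_growth)
print(f"Fund ends above benchmark in {beat:.0%} of 20-year paths")
print(f"Worst fund path: x{fund_growth.min():.2f} "
      f"vs worst benchmark path: x{bench_growth.min():.2f}")
```

The riskier fund ends up ahead in most 20-year paths simply because it takes more risk – but its worst outcomes are far uglier than the benchmark’s.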
Peer group benchmarks
It’s also very simple to select (and convincingly present) an easy peer group benchmark.
Because most retail investors will be choosing between multi-asset portfolios from investment managers (rather than single asset class funds from a fund manager), this is an especially important one to understand.
As managers are able to select their own peer group benchmark, it’s tempting for managers to pick an easy peer group. I don’t mean a peer group of poor performing investment managers (I don’t think a peer group containing only terrible managers exists), but it’s very possible to select a peer group which is taking less risk than your multi-asset portfolio.
For this example, let’s say I’m an investment manager trying to sell you a balanced portfolio – the Occam Capital Management Balanced Fund from before. Let’s say my portfolio is about 65%-70% as risky as the global index. Slightly on the high side for a balanced portfolio, but still very possible.
If I want to select a peer group benchmark, I’d know exactly which one I’d choose.
The ARC benchmarks are probably the most well-known family of peer group benchmarks, and are used extensively by investment managers. If I were trying to sell this balanced portfolio to you, I’d select the ARC Balanced Asset benchmark. It’s got ‘Balanced’ in the name, and my portfolio is balanced. So surely it’s an appropriate benchmark?
Predictably, no.
The reason I’d select this benchmark is because it contains managers whose portfolios are 40% to 60% as risky as the global market. Given that my portfolio takes 65%-70% risk, it’s almost guaranteed that my performance graph will look better than the peer group. After all, the peer group is taking much less risk. But they both say ‘Balanced’ in the name, so this is incredibly difficult for investors to spot.
So what makes a good benchmark?
There are all sorts of technical details around what makes a good benchmark, but I think there are a few key points worth bearing in mind:
- If you’re being pitched a fund, make sure the benchmark is representative of the fund’s investable universe. Equally importantly, make sure the risk levels are roughly equivalent.
- If you’re being pitched a multi-asset portfolio, make sure the peer group is appropriate. The risk levels for each of the ARC benchmarks can be found here.
However, as useful as this advice is, its relevance is dwarfed by a larger problem when assessing performance graphs.
Even if the benchmark is the most appropriate benchmark imaginable – it matches the portfolio’s risk levels perfectly, the investable universe is appropriate, and it’s presented in a clear and transparent manner – the real difficulty comes in knowing that you’ll simply never be presented with a graph which shows underperformance.
When it comes to portfolio selection, survivorship bias is an incredibly difficult and under-appreciated problem for investors to crack.
Survivorship bias
The idea of survivorship bias is so pervasive in the world of investing that I’ll be dedicating at least one full post to it eventually.
To summarise here, survivorship bias is the logical error of only concentrating on the things which have already made it past some form of selection process, and overlooking those things which didn’t.
As a non-investing example, the most famous survivorship bias story revolves around this image:
During World War II, the statistician Abraham Wald used the idea of survivorship bias when considering how to improve the armour on Allied bombers.
The red dots in the picture above represent damage from enemy fire sustained by aircraft returning from their missions. The conventional wisdom at the time was to reinforce the planes in the red areas. After all, if planes are taking damage in those areas, shouldn’t they be more heavily protected?
But Wald recommended the opposite. He wanted to add armour to the areas that showed the least damage. His reasoning was that the military only considered the aircraft which had survived their missions. They weren’t considering any bombers which had been shot down.
The bullet holes in the returning aircraft, then, represented areas where a bomber could take damage and still fly well enough to return safely to base. It was the other areas, then, which required extra reinforcement.
This idea is just as applicable to investing today as it was to WWII bombers.
Its relevance when assessing performance comes from one simple fact: you will never see a graph showing underperformance. I’ll repeat that, for emphasis. You will never see a graph showing underperformance.
But believe me, they exist.
Just like the surviving planes during WWII were the only ones being analysed, it’s only the surviving funds/portfolios which are being presented to clients.
When you’re looking at a fund’s outperformance graph, you’re seeing one cherry-picked graph out of the many the firm has – some of which will show underperformance, and some of which won’t even exist any more because the fund closed down. Without knowing how many other graphs the firm has, it’s impossible to know how likely this one is to be indicative of strong future performance.
If the firm has 99 other graphs of funds locked away in their marketing department which show significant underperformance and you’re seeing the one which shows outperformance, then the chances are the fund doesn’t have a bright future. But you can’t tell that from looking at the graph. All you’re seeing is a fund with great outperformance.
A popular, and successful, strategy by fund houses – especially the larger ones – is to launch as many funds as physically possible, and then only market the ones which are doing well at that point in time. Essentially throwing everything against the wall and seeing what sticks.
A great way to understand this is to think about flipping coins.
If we flip 1,000 coins 5 times, we expect 500 of the 1,000 coins to come out heads on the first toss. If we flip those coins which came up heads again, we’d expect 250 to come up heads again. We’d expect 125 on the third flip, 63 on the fourth, and 31 on the fifth. So we expect 31 coins to come out heads all 5 times. If you’re in the marketing department for coin-flippers, all you need to do is take one of those 31 coins which came out heads all five times and proclaim that your coin is the best at flipping heads and is therefore much more likely to be heads in the future. Just look at its track record!
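If you want to check the marketing department’s arithmetic, here’s the coin-flip experiment in a few lines of Python:

```python
import random

random.seed(1)

# Flip 1,000 fair coins 5 times each; keep only the coins that came up
# heads every time - the 'funds' that survive to be marketed.
coins = 1_000
survivors = sum(
    all(random.random() < 0.5 for _ in range(5)) for _ in range(coins)
)
print(f"{survivors} of {coins} coins came up heads 5 times in a row")
# Expect ~31 (1000 * 0.5**5) - and none of them is any more likely than
# the rest to come up heads on the next flip.
```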
And it’s the same with investments. Obviously some funds will perform well due to luck, and some will perform poorly due to luck. But by only marketing the ones which perform well, you can easily present the illusion that all your funds are incredibly successful, as the potential clients never see the ones which are underperforming.
If you’re a fund house and your artificial intelligence fund is starting to underperform, the best thing to do is to stop recommending it to potential clients. Quietly reduce the marketing efforts behind the fund, and redeploy them towards another fund which has been a perennial underperformer, but is now entering the limelight and starting to outperform – an emerging market equities fund, perhaps.
Without knowing how successful all the firm’s funds have been since they launched, it’s impossible to determine from looking at one graph whether the results are from genuine investment skill, or pure luck.
Which brings me neatly to the next difficulty with gauging future performance from past performance.
Luck vs skill
Distinguishing between luck and skill is probably one of the most difficult problems investors face. It’s so difficult to get right, and the downsides of getting it wrong can be so severe, that it’s one of the reasons I favour passive investing. In my opinion, the easiest way to win this game is simply not to play.
After all, if you launch enough funds (which is common for marketing purposes, as we saw in the previous section), some of them are going to be terrible funds, but will outperform their benchmark due to luck. On the flipside, there will also be plenty of great funds which have terrible performance just because they’re unlucky.
When an investor’s reviewing a fund’s history of outperformance, one of the many questions they need to ask themselves is “How likely is it that this performance history demonstrates skill rather than luck?”
And this comes with a host of additional difficult questions.
How long does it take to demonstrate luck vs skill? 1 year? 5 years? 10 years? Is there a threshold of outperformance which needs to be generated to prove skill? Does it make a difference whether outperformance is incremental, or can magnitude make up for consistency? Can skill even be proved? If not, is, say, a 90% confidence level high enough?
Taking the easiest question to answer out of all those, there’s been plenty of research into how long it takes to be confident that a manager’s returns are due to skill. Results vary depending on how the analysis is conducted, but the consensus seems to be that it takes at least 20 years to be 90% confident that a manager’s returns are due to skill. Anything less than that and you’re running a much higher risk of picking a manager whose performance has been nothing but luck.
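As a rough illustration of where figures like that come from, here’s a back-of-the-envelope t-test. The alpha and tracking error numbers are invented for the sketch, not taken from any of the studies linked below:

```python
# Back-of-the-envelope: how many years before a manager's average
# outperformance is statistically distinguishable from zero?
# Both inputs are invented assumptions, not measured figures.
alpha = 0.02           # assumed true outperformance: 2% p.a.
tracking_error = 0.06  # assumed volatility of that outperformance: 6% p.a.
z = 1.645              # ~90% one-sided confidence

# Significance requires alpha / (tracking_error / sqrt(n)) >= z,
# which rearranges to:
years_needed = (z * tracking_error / alpha) ** 2
print(f"Years needed: {years_needed:.0f}")  # ~24
```

Shrink the assumed alpha, or add more noise, and the required horizon balloons well past 20 years.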
A few interesting links containing supporting evidence for the minimum 20 year time horizon are here, here, here, here, here, and here.
20 years is an eternity in investing. I can’t stress that enough. There’s not a single person on the planet who would stick with an underperforming fund for 10 years, let alone 20. Most investing decisions are based on 5 year performance numbers – frequently even 3 year numbers.
Timeframes are even shorter in the institutional world. If you think your investment manager is able to “invest for the long term” and “avoid the noise” then you’d be disappointed. Because almost all institutionally run money is intermediated, there’s significant pressure on short term performance. Two successive quarters of underperformance puts you on the investment consultant/IFA’s watch list. Three quarters and you’ll have to explain to the client why you’re underperforming. That’s not even 1 year of performance data!
If you’re an investment manager, there’s therefore significant pressure to shorten your time horizon and invest in assets based on short-term performance. After all, it’s much easier to justify an investment based on recent strong performance than to explain that it’s been underperforming for 5 years, but you believe it’ll recover because you’re a long-term investor.
Unfortunately, in the increasingly intermediated world of investing, more and more emphasis is placed on defensive decision-making: “Which decision can I most easily justify if things go wrong”, rather than “Which decision is most likely to improve client outcomes?” And it’s much easier to justify something which has strong recent performance than an underperforming long-term buy and hold strategy.
Even without an investment consultant or IFA involved, an investment manager has an incentive to base decisions on short term performance. After all, managers are hired for their fund selection and are ultimately accountable to the client. No client is going to be happy sticking with the manager who sticks with a fund which has been underperforming for 5 years, let alone 10 years.
Partly to make their conversations with clients easier, partly to demonstrate their own value, investment managers are also encouraged to switch funds regularly – which more often than not is based on recent performance.
The point is that if you’re not the one making your investment decisions, the person you’ve nominated is more than likely making decisions based on insufficient data. This is one of the reasons I’m a fan of DIY investing. Perversely, the person with the highest chance of making long-term investing decisions is likely to be the one who’s least qualified.
Again, I have more than enough research to put together at least one post on the luck vs skill debate, so won’t dive into the details here.
The key point is that when you’re presented with a performance graph containing anything less than 20 years of data (many even say 20 years isn’t enough), then remember that it’s not enough time to draw any conclusions about the manager’s level of skill. And if you can’t draw conclusions about a manager’s skill, then you equally can’t draw conclusions about whether that performance is likely to continue or not.
If you think a manager will outperform based on a history of recent outperformance (recent here is anything less than 20 years), then you’re going to have a bad time.
Which brings me to the next section – why past outperformance is a poor basis for making investment decisions.
Mean reversion
Here we are again, with another crucial investing concept which will later have its own post.
Again, to be clear on jargon, mean reversion is the idea that if something is far away from its average, it has a tendency to move towards the average in the future.
For example, Leicester winning the Premier League in 2016 was incredibly unlikely. A first-place finish was far away from their average place in the league. Because the result was somewhat due to luck, their future performances were likely to revert towards their average place in the table.
Genuine skill is likely to persist, while luck is random and fleeting. So the more luck is involved in the activity, the faster the results revert to the mean.
In investing, because there’s an element of luck in performance (as we’ve seen), there’s also an element of mean reversion. This is useful for assessing the performance of active fund managers, because managers with a history of outperformance tend to revert to the mean and underperform in the future.
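A toy simulation shows why. Give every manager a tiny persistent ‘skill’ swamped by year-to-year luck (all figures invented), and top-quartile status barely persists from one 5-year period to the next:

```python
import numpy as np

rng = np.random.default_rng(0)

# Invented setup: each manager has a small persistent 'skill', but
# year-to-year luck is an order of magnitude larger.
n_managers = 10_000
skill = rng.normal(0.0, 0.005, n_managers)  # persistent edge, tiny spread
noise_sd = 0.05                             # annual luck, much larger

def five_year_excess():
    """Average annual excess return over one 5-year period."""
    return skill + rng.normal(0, noise_sd / np.sqrt(5), n_managers)

period1, period2 = five_year_excess(), five_year_excess()
top1 = period1 >= np.quantile(period1, 0.75)
top2 = period2 >= np.quantile(period2, 0.75)
print(f"Top-quartile managers staying top-quartile: {np.mean(top2[top1]):.0%}")
# Lands close to the 25% pure chance would give - not far off the
# real-world SPIVA figures below.
```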
Again, there’s plenty of evidence to document this phenomenon, but today I’ll stick with one of my favourite sources of data.
The SPIVA Persistence Scorecard attempts to distinguish luck from skill by measuring the consistency of active managers’ success. It tracks the degree to which historical relative performance is predictive of future relative performance.
It basically tracks the data to help answer the question of “Is a manager likely to outperform in the future if they’ve outperformed in the past?”
The most recent report (mid-year 2020) shows that, regardless of asset class or style focus, active management outperformance is typically short-lived:
The chart shows that, of the equity funds which finished in the top half in terms of cumulative returns for the period from June 2010 to June 2015, 38.6% replicated that accomplishment during the period from June 2015 to June 2020.
If you chose to invest in a top-half fund, you’d have had less than a 40% chance of that fund remaining in the top-half after 5 years. In fact, it was more likely for a top-half fund to close its doors or change its style (41.5% combined) than repeat its performance in the top half.
Breaking this down into quartiles (rather than halves), the same picture emerges.
Of the equity funds which finished in the top quartile in terms of cumulative returns for the period from June 2010 to June 2015, 30% replicated that accomplishment during the period from June 2015 to June 2020.
If you chose to invest in a top-quartile fund, you’d have had a 30% chance of that fund remaining top-quartile after 5 years. So there’s a roughly 70% chance of a top-quartile performing fund not remaining top-quartile over the next 5 years. In fact, the chances of it changing its style are almost as high as the chances of it remaining top quartile (24% vs 30%). Those sound like pretty poor odds to me.
Another interesting stat from the latest scorecard is that the chances of a top-quartile fund remaining top-quartile each year for the next four years were 1.6%. That’s astonishingly low.
And it’s the same for bond funds, too. In 8 of the 13 categories considered, no top-quartile fund from June 2016 maintained that status annually through June 2020. In 6 categories, no top-quartile fund from June 2018 could repeat that performance for even the next two years.
So SPIVA does a good job in suggesting that performance doesn’t persist. Again, there’s plenty more evidence where that came from, and I’ll go through it in more detail in a later post. The main takeaway for now is that performance isn’t persistent and mean reversion is a powerful force in investing.
When you’re presented with a graph showing outperformance over the last 5 years, remember the power of mean reversion. Active fund performance is rarely persistent, and the larger the outperformance, the faster the performance is likely to revert to the mean in the future.
Summary
- Check the axes. Have the scales been cherry-picked? And if so, how can you find the data which isn’t being displayed?
- If you’re looking at tabular performance data, check whether it’s discrete or rolling – rolling returns can mask significant underperformance over discrete periods.
- Check the benchmark it’s being compared to – managers are able to select their own benchmarks, so make sure it’s appropriate.
- Remember: you will never see a graph showing underperformance. Survivorship bias means you’ll only ever be presented with graphs showing outperformance – are you merely seeing one of the 1,000 coins which happened to flip heads five times?
- Consider how likely it is that the time period you’re assessing performance over is due to luck or skill – the shorter the time period, the more likely it is the results are due to luck.
- Remember the power of mean reversion – there’s a roughly 70% chance of a top-quartile performing fund falling out of the top-quartile in the next 5 years.
Thank you very much for this excellent article. I knew every bias and trick you mention, beginning with having read Huff’s “How to Lie with Statistics” while at school decades ago – and yet the years have dulled my sense of danger.
You have reminded me, in a very clear way, of the real threat of allowing myself to be misled by flattering pictures of past performance.
Thanks Jonathan. If you ever needed a job, it sounds like you’d find a home in a fund manager’s marketing department in no time! That’s a great book, but as you say, it’s so easy to forget these things. The allure of high returns also helps dull the critical faculties! I’ve seen plenty of people far smarter than me fall victim to these sorts of things.
Glad you are starting to recoup an expensive ancient history degree!
With this qualification, I am sure you appreciate the importance of nuances when it comes to translation, and I hope I’ll be forgiven if I am being pedantic.
Timeo Danaos et dona ferentes – Beware the Greeks even if they bear gifts
Hi Ludo,
Thanks for the correction. I did consider using that translation, but it made for a much less catchy title!
All the best,
Occam
I completely agree, Ludo! Everyone forgets, or mis-translates that et. One becomes so used to et as ‘and’ that one forgets its usage as ‘even’.
The reversion to the mean is predictable because the usually quoted arithmetic mean/average eventually withers to the geometric mean.
Great post!
I’ve had a few of the same thoughts myself but you’ve tied it all together really well.
The trick of how to present lacklustre performance as being benchmark beating requires more skill, training and insight from the marketing department than those in the fund management department have in picking stocks.
Name-drop a superstar fund manager (Woodford anyone?) and you have the recipe to make millions!
Best post yet….good arguments and well presented!