The original text was written by a person with one of the highest IQs in the world (not me), and access to it is currently either paid or restricted to members of a high-IQ society.
For @quepasta @N30N @Kaari
-----
WHY MOST ECONOMY TEACHINGS ARE BULLSHIT / WHY PEOPLE LOSE IN INVESTMENTS.
SUMMARY.
A few years ago, a study was conducted on the efficiency of the strategies used by 100 professional Wall Street managers. To do so, their performances were compared to the performance of a chimpanzee operating blindfolded.
The results were published in the Wall Street Journal and several other publications[1]. The risk-adjusted return obtained by the chimpanzee was better than the average of the 100 professional managers.
How should we interpret this fact and how should we view the validity of the certificates issued by regulatory entities? Is there anything that needs to be changed in the criteria that determine the issuance of these certificates, to prevent approved managers from continuing to generate results for their clients comparable to those obtained by chimpanzees? This is what we will discuss in this article.
INTRODUCTION.
When someone buys a TV or a blender, they turn it on and check if it works. The test does not require in-depth knowledge of electronics or mechanics. Practically anyone can check if a device like this works, and the risk of making a mistake in this assessment is low.
If the product does not work, it should quickly show anomalies that are quite easy to identify. The person can try to turn it on 2 or 3 times, and in less than 1 minute the person will have a rough idea if the product is defective. It could be a strange noise, a stain, some asymmetry, a scratch, etc.
Even in the case of complex products made up of many components, such as a car, small faults may go unnoticed in the test, but they do not compromise overall functionality: such defects usually amount to less than 1% of the product's value, very rarely reach 10%, and do not cause major losses.
However, this only applies to products whose operation is predominantly deterministic. In such cases, there is no need for an expert to guide the consumer. The consumer himself, equipped with common sense and some basic knowledge, is able to assess whether a product is working well or whether it shows signs of a defect. In addition, there are laws (in Brazil, the Consumer Protection Code, CDC) that regulate the conditions for returns upon detection of hidden defects, as well as possible compensation if the malfunction causes losses to the consumer.
However, when it comes to an investment product, the situation changes completely, because the results generated are almost completely random, with a small amount of determinism. This means that an efficient product can generate losses for months or years, while another that is inefficient can remain positive for months or years. This characteristic, inherent to any investment product, makes it extremely difficult to assess its quality and requires long periods of data collection, as well as analysis of this data, in order to make valid inferences.
An analogy can be drawn with poker players. A world-class player can be in the red for 2,000 tournaments, like André Akkari, while a beginner can have several consecutive wins by sheer luck. [see the article Yin Yang]
To assess whether a product of this class delivers what is expected, a considerably high level of knowledge of Statistics, Logic and Scientific Methodology is required.
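To give a feel for how long a genuinely good product can stay negative, here is a minimal Python sketch. It is not from the original article. It assumes returns are independent and normally distributed (a strong simplification), and the annual Sharpe ratio of 0.5 is a hypothetical figure chosen only for illustration.

```python
from statistics import NormalDist


def prob_negative(annual_sharpe: float, years: float) -> float:
    """Probability that the cumulative return over `years` is below zero.

    Assumes i.i.d. normal returns, so the cumulative return over T years
    has mean mu*T and standard deviation sigma*sqrt(T). Only the ratio
    mu/sigma (the annual Sharpe ratio) matters for the sign.
    """
    z = -annual_sharpe * years ** 0.5
    return NormalDist().cdf(z)


if __name__ == "__main__":
    # Hypothetical product with a real edge: annual Sharpe ratio of 0.5.
    for years in (0.25, 1, 4, 10):
        print(f"{years:>5} years: P(negative) = {prob_negative(0.5, years):.1%}")
```

Under these assumptions, a product with a real edge still has a sizeable chance of being in the red after a full year, which is why short track records say little about quality.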
Pharmaceutical products resemble financial products in some aspects, which is why some analogies presented in this article will be through comparisons with items of this type. A drug administered to hundreds of people may work in some of them but not in others, just like a placebo. What distinguishes a placebo from an active ingredient is a statistical difference that most of the population is not able to measure or interpret.
This is why it is common for healers to become successful and attract crowds of unsuspecting people, thanks to the placebo effect, which is not understood by laypeople. This success lasts until they are duly prosecuted by regulatory agencies. Ironically, people harmed by these charlatans still tend to defend the very people who harmed them, because they are unfamiliar with the concept of placebo, are unfamiliar with the fundamentals of the scientific method, and are susceptible to believing that if something worked in a specific case, then it must always work.
In the health sector, a fairly large percentage (perhaps more than 99%) of those working in this area are professionals whose qualifications range from reasonable to excellent, and all of them are better prepared to perform their duties than an average person randomly selected from the population. Occasionally, a charlatan emerges and ends up being denounced and prevented from continuing to practice. However, under normal circumstances, if two groups, A and B, group A with 100 doctors and group B with 100 people without medical training, are brought together, and these 200 people from groups A and B are asked to examine 50 patients with different diseases, it is likely that even the worst of the 100 doctors will make more correct diagnoses than the best of the 100 people without medical training.
On the other hand, in the investment sector, the opposite is true: more than 99% of those who work as professionals in this area do not have any qualifications that distinguish them from a layperson. In fact, they do not even have any qualifications that distinguish them from a chimpanzee[1] in terms of risk-adjusted performance. Occasionally, a truly competent professional emerges, but ends up going unnoticed, in addition to being boycotted by the charlatans, who prevent such professionals from emerging in an environment that the charlatans already dominate.
Given this morbid reality, it would be expected that regulatory entities would make an effort to reverse this situation and establish a minimum level of seriousness in this environment.
In this article, we will try to take the first step towards informing investors, as well as encouraging regulatory entities to promote a reformulation of the laws that regulate the Financial Market, in order to filter the investment products that are offered to the public, avoiding the sale of snake oil as if it were medicine. It is also expected that the criteria for issuing certificates to managers and analysts will be reviewed, so that those distinguished with a certification of this category are entitled to it, and do not continue to be surpassed by a chimpanzee[1].
1. FUNCTION EXPECTED FROM A REGULATORY AUTHORITY.
ANVISA and the Federal Council of Medicine (CFM) are entities that look after the health of the population, supervising the practice of medicine and regulating which supposedly therapeutic substances can be marketed by the pharmaceutical industry. In this context, it is understood that the general population does not have the knowledge and discernment necessary to judge whether:
a. A supposed health professional is qualified to correctly assess a person's health status and offer them appropriate treatment.
b. A certain medication and the respective treatment using this medication produce a superior effect to that of a placebo.
The evaluation and selection of qualified professionals, as well as of scientifically tested and validated medications, is a task that goes beyond the domain of the average citizen. For this reason, entities responsible for promoting this filtering were created, in order to prevent the practice of medicine by quacks and other charlatans, as well as to prevent water and flour from being sold to the population as a medication for the treatment of cancer and other diseases.
Neither ANVISA nor the CFM is immune to failures, but, in general terms, ANVISA adopts criteria that are well-founded in contemporary scientific methodology. It requires studies that corroborate the efficiency of the active ingredients present in the medicines, which need to be rigorously tested, following appropriate protocols, first on guinea pigs and then on sufficiently numerous and diverse samples of humans, in studies conducted by independent researchers, producing statistically valid results.
In general, when it is assumed that a certain active ingredient can produce a certain effect in the treatment of a certain disease, a double-blind experiment is carried out, in which a sample of people is divided into equivalent groups: one group receives the active ingredient and the other receives a placebo. The evolution of the disease in these two groups is monitored and the efficiency of the active ingredient is verified in comparison to the placebo, and this contrast is measured by appropriate statistical tools.
Without meeting these requirements, a drug cannot be put into circulation, as there would be no reasonable reason to believe that the alleged drug could produce any effect beyond that which would be obtained by a placebo.
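One of the "appropriate statistical tools" for such a comparison is a two-proportion z-test. The Python sketch below is only an illustration: the article does not say which test the agencies require, the function name is hypothetical, and the patient counts are invented for the example.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(success_a: int, n_a: int, success_b: int, n_b: int):
    """One-sided test of H0: p_a <= p_b against H1: p_a > p_b.

    Returns (z statistic, p-value) using the pooled normal approximation,
    which is reasonable only when both groups are fairly large.
    """
    p_a = success_a / n_a
    p_b = success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 1 - NormalDist().cdf(z)
    return z, p_value


if __name__ == "__main__":
    # Hypothetical trial: 120 of 200 improve on the drug, 90 of 200 on placebo.
    z, p = two_proportion_z_test(120, 200, 90, 200)
    print(f"z = {z:.2f}, one-sided p-value = {p:.4f}")
```

A small p-value means the difference between the groups is unlikely to be a placebo-level fluctuation. The same logic applies to an investment product: a few good months prove as little as a few patients who got better.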
The Federal Council of Psychology adopts similar procedures to select the psychometric instruments that are approved for use in clinics and companies.
These procedures may not be 100% accurate, in the sense that they cannot guarantee that all approved drugs and treatments are effective, nor that 100% of those not approved are ineffective. But at least they are correct in more than 90% of cases. Therefore, this practice contributes greatly to separating the wheat from the chaff, through fair and scientifically recognized criteria.
2. CAPITAL MARKET REGULATORY ENTITIES.
(...)
The Financial Market is permeated with products of the lowest quality, offered without any restrictions, and consumed uncritically in alarming quantities. These include magazines, books, courses, handouts, lectures, robots, indicators, strategies, etc., none of which presents the slightest evidence of being capable of generating profits consistently. In fact, the vast majority present clear evidence that they generate losses.
Between 2006 and 2007, more than 500 quantitative strategies that are widely recommended in books, courses, brokerage websites, specialized magazines and other sources were examined[2]. To this end, these strategies were automated and executed on historical tick-by-tick series of EURUSD between April 2000 and October 2006.
Each strategy had its parameters optimized with the help of a genetic algorithm in the period from 4/20/2000 to 12/31/2002 and, then, the champion genotype among 10,000 generations in each strategy was put to operate in the period from 1/1/2003 to 10/1/2006. The result was that all the strategies tested were negative.
In 2014, new tests were carried out, with historical series from 1986 to 2014, and more than 1000 strategies were tested, with optimization between 12/4/1986 and 12/31/2005. Then, the champion genotype of each optimization was tested between 1/1/2006 and 3/1/2014. All were negative.
Considering that each strategy was tested with 10,000 different configurations, and the best configuration was used in the following period, then the 1,000 best among 10,000,000 configurations were tested and all were negative.
This provides an idea of the rarity of truly profitable strategies and also shows that all of the most consecrated and extolled strategies in books and courses are losers, but continue to be offered and consumed by millions of people.
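The procedure described above (optimize on one period, then run the champion on a later, unseen period) can be sketched in Python as follows. This is not the author's code. Plain random search stands in for the genetic algorithm, a toy moving-average crossover stands in for the tested strategies, the price series is a synthetic random walk rather than EURUSD ticks, and every function name and parameter value is hypothetical.

```python
import random


def random_walk(n: int, rng: random.Random) -> list[float]:
    """Synthetic price levels with no exploitable structure (assumption)."""
    prices = [100.0]
    for _ in range(n - 1):
        prices.append(prices[-1] + rng.gauss(0.0, 1.0))
    return prices


def ma_crossover_pnl(prices: list[float], fast: int, slow: int,
                     cost: float = 0.05) -> float:
    """P&L of a toy moving-average crossover rule.

    Goes long when the fast average is above the slow one, short otherwise,
    and pays `cost` per unit of position changed.
    """
    pnl, position = 0.0, 0
    for t in range(slow, len(prices) - 1):
        fast_ma = sum(prices[t - fast + 1:t + 1]) / fast
        slow_ma = sum(prices[t - slow + 1:t + 1]) / slow
        target = 1 if fast_ma > slow_ma else -1
        if target != position:
            pnl -= cost * abs(target - position)
            position = target
        pnl += position * (prices[t + 1] - prices[t])
    return pnl


def optimize(prices: list[float], n_configs: int, rng: random.Random):
    """Random search over (fast, slow); returns (best P&L, fast, slow)."""
    best = None
    for _ in range(n_configs):
        fast = rng.randint(2, 20)
        slow = rng.randint(fast + 1, 60)
        score = ma_crossover_pnl(prices, fast, slow)
        if best is None or score > best[0]:
            best = (score, fast, slow)
    return best


if __name__ == "__main__":
    rng = random.Random(0)
    prices = random_walk(3000, rng)
    in_sample, out_of_sample = prices[:2000], prices[2000:]
    score, fast, slow = optimize(in_sample, 50, rng)
    print(f"champion: fast={fast}, slow={slow}, in-sample P&L={score:.1f}")
    print(f"out-of-sample P&L={ma_crossover_pnl(out_of_sample, fast, slow):.1f}")
```

The point of the split is that the in-sample champion is selected partly by luck. Only the out-of-sample figure says anything about whether the rule captures something real.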
In the case of medicines, if an inefficient product eventually enters the market, it is quickly reported, investigated and resolved. But in the area of investments, the market is so shamefully saturated with inefficient products that this has come to be accepted as “normal”. To get a quantitative idea of the problem, one can analyze an online fund ranking[3].
When this ranking was consulted at the end of 2014, there were 383 funds listed. By the end of 2015, several had gone bankrupt and the list was reduced to 241 funds, of which only 27 were nominally positive over the previous 24 months. Of these 27, only 4 funds had matched inflation over the previous 2 years, and probably none of them had matched inflation over the previous 3 years. Therefore, out of 383 funds, only 4 (about 1%) remained positive in real terms for 2 years, that is, about 99% of the funds were negative. If the period considered were longer, the percentage of negative funds would be higher, since the few positive ones were generally benefiting from luck. With a larger data sample, the luck factor would dissipate and reveal that even those few positive ones would not last for longer periods.
This situation causes a serious loss of confidence in investment products among the population, because the average citizen does not have the statistical and scientific knowledge necessary to distinguish between what is good and what is bad, and so ends up generalizing, as if all investment products belonged to the same class of extremely high-risk products, since the products they have come to know are nothing more than “water and flour”. This situation is very sad, because there are some really good investment products, but these rarely reach the general public.
What the average investor expects is for regulatory entities to filter the products that are made available on the Financial Market and monitor the legitimacy of the information disclosed by the suppliers of these products, in order to eliminate inefficient products from the “shelves”, as well as to ensure that the information about each product is properly documented and representative of the facts.
When a person goes to a pharmacy and buys a medicine, the probability that it is an ineffective substance is so low that the person practically ignores this risk and trusts the characteristics stated on the packaging, because they know that there is an official entity that filters the products made available to the public and supervises the accuracy of the information disclosed. Fraud is occasionally discovered in this sector, which is immediately reported and the offenders are punished, with irregular products being removed from circulation. For this reason, even with these isolated occurrences, there is still great confidence in the quality of health products.
On the other hand, a person who has had multiple negative experiences with investments, when faced with a new investment product that they are not yet familiar with, practically disregards all the information presented about the characteristics of this product, and assumes that it is as fraudulent as that of other investment products that they have already purchased and with which they have suffered substantial losses. There is no protective figure of an entity that the investor can trust, which looks after their safety, preventing the circulation of harmful products and inspecting the accuracy of the information disclosed.
As a result, the very few truly efficient products are penalized, as they get lost in the trash, and even if people find them, they end up not believing what they read, due to the trauma they suffered from their previous bad experiences, which led them to cultivate a very strong generalized rejection to protect themselves against the bad investment products that pollute the market.
In this context, not only the investor is harmed, but also the suppliers of good quality products, who are viewed with extreme distrust.
Another serious problem is the illusion of risk-free investments, such as real estate, government bonds or savings, which are among the favorites of less experienced investors, because in their eagerness to completely avoid risks, they fall into the trap of believing that these alternatives would protect them.
The fact is that there are no risk-free investments. As the writer Guimarães Rosa said, “living is a very dangerous business”. Anyone who knows a little about the history of defaults by the Brazilian government, with Eletrobrás bonds, public debt bonds and others, knows that the false security usually attributed to these bonds actually hides a very high and incalculable risk of total loss. Real estate involves risks comparable to those of any other low-liquidity investment, and is therefore worse than stocks, and the confiscation of savings accounts in the 1990s was an exemplary lesson in dispelling the illusion of risk-free investment.
Regulatory entities should establish a standardized, objective and empirically validated scale to be adopted as a reference in risk classifications. Current risk classification methods fail so badly that almost all funds classified as low risk have been negative for 8 years, while some of the funds classified as high or very high risk (Sparta, Tempo Capital, etc.) have much smaller real losses than those classified as low risk, and some are even positive, such as the Verde fund.
How can a fund be classified as “low risk” and remain negative for 8 years? And how can a fund be classified as very high risk and remain positive for 23 years and never have been negative for more than 2 years? This shows how inefficient and distorted the scores used by banks and rating agencies are.
Such disparities between the risk classification and the result actually observed are frequent and indicate the imminent need to review the criteria for risk classification.
To make this situation even worse, even large financial institutions offer a huge variety of terrible investment products. Banks, in which people tend to place more trust and whose managers' recommendations they follow almost blindly, under the illusion that larger institutions should offer more solidly founded products, have some of the worst investment products on their menus.
Bad investments are distributed across all levels, affecting private, prime and retail clients, but the main victims are retail investors, who are generally less educated and less well advised; when they do receive advice, it is usually of low quality. As if that were not enough, they are offered only the worst products in a market where many are already bad. Without understanding the concept of “opportunity cost”, without knowing the difference between “real profits” and “nominal profits”, without knowing how to calculate compound interest, these people are easily fooled by an avalanche of incorrect and distorted information.
In a report published in 2014 in the magazine InfoMoney[4], for example, the author comments that if a person invested R$105 million (MegaSena) in savings, they could use the income to buy an apartment per month, and he does the calculations to “prove” his statement using nominal profits! The real fact is that since the changes of 2012, savings have never managed to keep up with inflation and have generated real losses, so by investing in savings, a person would lose an apartment every six months, instead of gaining one per month.
If a specialized magazine publishes several articles with errors as basic as this one, what can we expect from the way most of the population analyzes investments? Magazines that should inform, guide and clarify, instead mislead the reader, feeding the illusion that savings allow for some gain, when in fact it only partially mitigates the loss produced by inflation.
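The difference between nominal and real profit, and the compound interest calculation mentioned above, fit in a few lines of Python. The rates below are hypothetical round numbers chosen for illustration. They are not the actual savings yield or inflation figures for any year.

```python
def real_return(nominal: float, inflation: float) -> float:
    """Real return implied by a nominal return and an inflation rate.

    Uses the exact relation (1 + real) = (1 + nominal) / (1 + inflation),
    not the rough shortcut nominal - inflation.
    """
    return (1 + nominal) / (1 + inflation) - 1


def compound(rate_per_period: float, periods: int) -> float:
    """Total return after compounding a per-period rate over `periods`."""
    return (1 + rate_per_period) ** periods - 1


if __name__ == "__main__":
    # Hypothetical year: the account pays 0.5% per month, inflation is 8%.
    nominal_year = compound(0.005, 12)
    print(f"nominal: {nominal_year:.2%}")
    print(f"real:    {real_return(nominal_year, 0.08):.2%}")
```

With these illustrative numbers the nominal yearly gain is positive while the real result is negative, which is exactly the trap of quoting nominal profits.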
Another point that requires attention is the criteria for certifying managers, which involve a course lasting a few days and then a test whose content does not have the slightest possibility of assessing the aspiring manager's ability to perform his or her role. As a result, most professional managers, certified by regulatory entities and hired by large banks and brokerages, perform worse than chimpanzees[1]. This does not mean that chimpanzees are more capable than human managers. This simply reflects the immense difficulty of the problem of modeling the Financial Market, to the point that educated and intelligent people who have dedicated decades to studying the subject cannot achieve better results than random purchases and sales.
The use of the chimpanzee in the experiment published in the Wall Street Journal served to dramatize and satirize the situation, but what is intended to show is that the results obtained by professional managers are indistinguishable from those that could be obtained by randomly selecting the components of the portfolio.
This fact should cause great concern and embarrassment to the US regulatory bodies that certify these managers and recognize the validity of these certificates.
Furthermore, one should not think that this is a problem exclusive to the US. Brazil is no better. The number of Brazilian managers with positive performances over the last 15 years and who have continued to perform well over the last 5 years, in real terms (discounting inflation), can be counted on the fingers of one hand. However, the number of people authorized and accredited to market investment products and services, including purchase and sale recommendations and portfolio management, is thousands of times greater.
3. LOGIC AND MATHEMATICS OF THE CHIMPANZEE PHENOMENON.
To get a better idea of why the chimpanzee performed better than the average human manager, suppose an exam consisting of 100 multiple-choice items, with 2 alternatives for each item. Let's say that the difficulty level of the items is compatible with the skill level of the people being examined. Then it is expected that the scores obtained by the people will be a combination of the skill level and the luck factor.
A person who can answer 60 of the 100 items correctly should get these 60 right plus part of the other 40. Since each item has 2 alternatives, there is a 50% chance of getting a guessed item right by luck, so the number of correct guesses among the 40 follows a binomial distribution whose peak is at 20. The most likely result is therefore a total of 60 + 20 = 80 correct answers.
This does not mean that all people who manage to solve 60 will score exactly 80, because although the 60 “guaranteed” points are fixed among all people who know 60 of the answers, there is a part that varies with luck, which are the 40 “guessed” items. Therefore, some of the people who manage to solve 60 will score 79 points, others will score 78, 77, 81, 82, etc.
Similarly, people who know 56 answers should get these 56 right plus half of the 44 guessed, that is, they are expected to have an average score of 78. However, some who manage to solve 56 may have a little more luck and get 25 right out of the 44 guessed, scoring a total of 81, and, with that, some who manage to solve 56 are above the average of those who manage to solve 60.
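The overlap between the two groups can be computed exactly from the binomial distribution. The Python sketch below follows the setup in the text (100 items, 2 alternatives each, guessed items right with probability 0.5). The function names are hypothetical.

```python
from math import comb


def score_pmf(known: int, total: int, score: int, p_guess: float = 0.5) -> float:
    """P(final score == score) for someone who knows `known` of `total` items
    and guesses the rest, each guess being right with probability p_guess."""
    guessed = total - known
    k = score - known  # number of lucky guesses needed to reach `score`
    if k < 0 or k > guessed:
        return 0.0
    return comb(guessed, k) * p_guess ** k * (1 - p_guess) ** (guessed - k)


def prob_score_above(known: int, total: int, threshold: int) -> float:
    """P(final score > threshold)."""
    return sum(score_pmf(known, total, s) for s in range(threshold + 1, total + 1))


if __name__ == "__main__":
    # Chance that someone who knows 56 items beats the typical score (80)
    # of someone who knows 60 items.
    print(f"P(knows 56, scores > 80) = {prob_score_above(56, 100, 80):.3f}")
    # Most likely score of someone who knows 60 items.
    print(max(range(101), key=lambda s: score_pmf(60, 100, s)))
```

A gap of 4 known items is thus easily hidden by luck in a single sitting. The weaker the link between skill and score, the larger the overlap becomes.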
Now imagine a test so difficult that the people examined cannot get any question right. It could be a test from the International Mathematical Olympiad, for example, or the last 10 items of the Sigma Test. In a situation like this, it is said that the test is not suitable for discriminating between the different skill levels of the people being examined, because both the people with the highest skill level and those with the lowest skill level would have a score of 0 if the test required written (open-ended) answers.
Since the test is multiple choice, instead of everyone scoring zero, the scores generated by random guesses will have a binomial distribution, but these scores do not represent skill levels. They are simply the result of statistical fluctuations and will be distributed randomly. This is exactly the situation with the chimpanzee and the 100 professional managers. Therefore, although managers are much more skilled than chimpanzees, and some managers are much more competent than others, the difficulty level of the “test” is so high that if there were no multiple choice (up/down), all of them, including the most skilled, would have a score of 0.
Since the prices can move in basically two groups of directions (up or down), and the probabilities are almost equivalent, the choices are like a test with 2 alternatives for each item. The extremely high difficulty in modeling the market means that all managers, including the most skilled, are unable to make any correct decisions, and this “flattens” all scores to the minimum level, which is the level defined by luck, so the differences in performance are the result of chance and do not reflect any relevant information about the ability of these managers. Therefore, no statistically significant difference was observed between the chimpanzee and the average of the 100 human managers.
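The "flattened scores" argument can be checked with a small simulation. The Python sketch below encodes the article's premise directly: every up/down call, whether made by a manager or by the chimpanzee, is a fair coin flip. That is an assumption of the model, not an empirical finding. The numbers of calls and trials are arbitrary.

```python
import random


def random_score(n_calls: int, rng: random.Random) -> int:
    """Correct up/down calls out of n_calls when every call is a coin flip."""
    # Each of the n_calls random bits is 1 (a correct call) with probability 1/2.
    return bin(rng.getrandbits(n_calls)).count("1")


def chimp_beats_average(n_managers: int, n_calls: int,
                        n_trials: int, seed: int = 0) -> float:
    """Fraction of trials in which the chimpanzee's score is strictly above
    the average score of the managers, with zero skill on both sides."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(n_trials):
        chimp = random_score(n_calls, rng)
        managers = [random_score(n_calls, rng) for _ in range(n_managers)]
        if chimp > sum(managers) / n_managers:
            wins += 1
    return wins / n_trials


if __name__ == "__main__":
    frac = chimp_beats_average(n_managers=100, n_calls=100, n_trials=2000)
    print(f"chimp beats the managers' average in {frac:.1%} of trials")
```

Under this premise the chimpanzee beats the managers' average in roughly half of the trials, so a single published result in which it did so is exactly what chance alone would be expected to produce about half the time.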
4. SCOPE OF THE PROBLEM.
The chimpanzee study covers a small part of a much larger problem. Some investment professors claim to have “prepared” more than 100,000 students. The question is: prepared for what?
If just 1 professor spreads his superstitions among 100,000 unsuspecting people, what is the total number of victims? If you add up the students of all the professors, it must total somewhere between 200,000 and 300,000 (not much more due to redundancy, since many students participate in courses with several professors).
This is an alarming number, because if there are fewer than 10 people in Brazil who can actually make money with financial operations consistently, and 300,000 who buy some “educational” investment product in the expectation that they will become capable of making a profit using what they have learned, something is not right.
There is no 1:1 ratio between those who enroll in an Engineering, Medical or Law school and those who actually finish their degree, because a large proportion drop out without completing it. According to a report published in 2013[5], 44% of students enrolled in Engineering schools completed their degree. Engineering is considered one of the most difficult courses and has the highest dropout rate, yet almost 50% stayed until they finished.
Let's say that not everyone who finishes an Engineering course is sufficiently qualified to diligently perform the activities expected of their profession. In a very pessimistic estimate, we can assume that only 10% become successful professionals, in the sense that they are able to earn money practicing their profession.
Now let's compare this to participants in investment courses: with fewer than 10 consistently profitable people out of roughly 300,000 participants, about 0.003% manage to produce positive results and generate some profit from what they learned. Why is it that 99.997% of participants in these courses fail to make money? Are the students missing something? Or is it that the course content does not provide enough support for them to be able to make a profit? To understand this phenomenon, we can analyze the results of an experiment carried out in the 1980s by one of the world's greatest traders.
In 1983, Richard Dennis made a bet with William Eckhardt about whether it was possible to train anyone to become a professional trader[6]. Dennis argued that it was possible, while Eckhardt argued that it was necessary to have specific innate skills, honed with training.
Dennis published an advertisement in the Wall Street Journal, recruiting candidates for training. About 15,000 people signed up. He selected them by written exam, then interviewed the candidates who obtained the best results on the exam and, in the end, selected 23 people, to whom he offered training for 2 weeks and then put them to trade. This group became known worldwide as the “Turtle Traders”.
In the second half of the 1980s, with the stock market boom, the turtles made a lot of money and enjoyed temporary success, but it soon became clear that they lacked long-term consistency. Of the 23 turtles, only 1 had long-term success: Jerry Parker, who is still active today, with an average performance of 12.17% per year since 1988, in his Chesapeake Capital fund[7].
Incredibly, Dennis thought he had won the bet, and several people on websites, blogs and forums cite this story as if Dennis had won. However, it is clear that Dennis was wrong, while Eckhardt's opinion was much closer to reality. The newspaper ad reached millions of people, among whom self-selection led 15,000 to become interested. Of these 15,000, he selected the 23 he found most qualified and, after training them, only 1 was successful. The result therefore suggests that about 1 in 1,000,000 people are able to be positive in the Market consistently in the long term.
In order for Dennis to be able to correctly test his thesis that anyone can win in the Market, as long as they receive adequate training, he would have to randomly select 100 people from the street. Not even the telephone directory would be valid, because in 1984 the possession of a telephone already represented a more select public and the sample would be biased.
In addition, it is likely that the 2-week “training” was almost irrelevant. Parker would probably have been just as successful if he had not received such training (perhaps it would have taken him longer to learn on his own).
What can be inferred from this? The facts are quite clear: Richard Dennis, one of the best traders in the world, with proven results for decades, selected some people who, in his opinion, had the most important attributes to have any chance of success in the Market, and trained them personally. Even so, only one of them was successful, and not very significantly (average annual profit of 12%).
This is part of the answer to the problem. It shows that even with an exceptional teacher, it is extremely difficult to learn to operate in a way that generates consistent profits. Therefore, even if the courses were offered by competent traders and they taught something useful, only a very small portion of the population would have the necessary attributes to be able to benefit from the teachings, while approximately 99.999% of people would waste time and money on the courses and more money trying to use what they believed they had learned.
But in the real scenario it is worse, because it is not Richard Dennis who offers the courses. Practically all sellers of courses, books, lectures, etc. are negative in the market. If teachers themselves are persistently losing, what is the probability that these people's students will be able to learn something useful and make a profit?
Since there is a 50% chance of winning by chance, it is clear that luck ends up contributing a lot to deceiving the masses, and that is precisely why filtering should be carried out by a regulatory entity, since the population does not have the knowledge to do so.
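How far luck alone can go in producing impressive-looking track records is easy to quantify. The Python sketch below assumes that each person has an independent 50% chance of a winning period, as stated in the text. The group size of 1,000 and the 5-period streak are hypothetical, and the function names are illustrative.

```python
def prob_streak_by_luck(n_people: int, n_periods: int, p_win: float = 0.5) -> float:
    """P(at least one of n_people wins all n_periods purely by chance),
    assuming independent periods and independent people."""
    p_single = p_win ** n_periods
    return 1 - (1 - p_single) ** n_people


def expected_lucky(n_people: int, n_periods: int, p_win: float = 0.5) -> float:
    """Expected number of people with a perfect streak by chance alone."""
    return n_people * p_win ** n_periods


if __name__ == "__main__":
    # Hypothetical: 1,000 course sellers, each showing 5 periods of results.
    print(f"P(at least one perfect streak) = {prob_streak_by_luck(1000, 5):.6f}")
    print(f"expected perfect streaks       = {expected_lucky(1000, 5):.2f}")
```

With enough candidates, some flawless records are practically guaranteed even when nobody has any skill, and those are the records that get advertised.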
In addition to managers who operate with results worse than those of chimpanzees and sellers of courses that are like water and flour, the problem extends much further. In recent years, robots have been in fashion. People think that all they need to do is use cutting-edge technology, even if it is empty of content, to have some success.
Although robots are in the only group that has some possibility of enabling scientific investigations into the quality of the strategies used, what we observe in practice is that there is no seriousness whatsoever on the part of the developers. Some really do not know the basics about the studies that would need to be carried out to validate a strategy. Others may know, but they disguise the results in different ways.
(...)
Both Buffett in 1956 and Soros in 1968 would not have been approved as managers or for any of the other roles listed above based on these criteria. This led our friend J.A.L.J. to suggest that the criteria be changed to make them compatible with the approval of these legendary managers.
This creates some difficulties because, although Soros and Buffett were notable at that time, at the beginning of their careers there were not enough elements to predict the success they would achieve. These elements only became known as their gains began to materialize. Therefore, although a retrospective assessment might suggest that Buffett and Soros should be approved, this is a post facto assessment.
For example, in 1905, Einstein was as brilliant as he would be in the following decades, but he was still not recognized. He was an anonymous figure lost in the crowd, and he was not taken seriously. Even his “friend” Marcel Grossmann did not want to “waste” his precious time helping Einstein with the mathematics needed to formalize the General Theory of Relativity. Instead of dealing with the mathematical part, Grossmann simply made some bibliographical suggestions to Einstein, so that, if Einstein wished, he could learn about tensors himself and do the work.
As a result, Grossmann went down in history as a semi-anonymous figure with a minor supporting role. In a letter to Lorentz, Einstein suggested that he would have named Grossmann as a co-author of the Theory of Relativity if Grossmann had helped him with the formalization and the calculations. When Einstein finally mastered the mathematical tools he needed, he was able to revise some of his 1911 calculations and make new predictions for the angle by which light is deflected as it passes near the Sun. It was only in 1919, with the eclipse observed in Sobral and Príncipe, that his work was recognized.
Who could be blamed for not recognizing Einstein's merits in 1905? There were several other attempts to explain the results obtained by Michelson and Morley, each starting from different premises and arriving at different results. Einstein's theory was just one of them. To know which of the theories was best supported by the experimental data, each had to be tested through observation and measurement. Thus several years passed before the tests were carried out and Einstein's hypotheses were experimentally verified and gained the status of "theory".
With Buffett and Soros the situation is similar. Today we know that they are extraordinary managers, just as they already were in 1956 and 1968, but at that time there was not enough data to know it. The most that could have been done was to apply tests capable of predicting these results. The tests of the time proved to be extremely inefficient: not only would both of them have failed, but many less qualified managers passed, and almost all of those who passed did not deliver results better than those of a chimpanzee.
A test capable of correctly identifying the potential of Buffett and Soros while, at the same time, filtering out the vast majority of insufficiently qualified managers would be a kind of backtest based on fundamental analysis. This will be the subject of a future article.
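The inefficiency described above has two sides: skilled managers who fail the test, and unskilled managers who pass it. A minimal Python sketch, assuming a cohort whose later results are already known, shows how both sides could be measured. The `Candidate` class, the `evaluate_test` function and the cohort itself are illustrative assumptions, not data from this article.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    passed_test: bool   # outcome of the certification test
    beat_random: bool   # known only later: did real results beat random picks?


def evaluate_test(candidates):
    """Return (sensitivity, precision) of a certification test.

    sensitivity: share of truly skilled candidates that the test approved
    precision:   share of approved candidates that were truly skilled
    """
    skilled = [c for c in candidates if c.beat_random]
    approved = [c for c in candidates if c.passed_test]
    hits = [c for c in candidates if c.passed_test and c.beat_random]
    sensitivity = len(hits) / len(skilled) if skilled else float("nan")
    precision = len(hits) / len(approved) if approved else float("nan")
    return sensitivity, precision


# Hypothetical cohort (made-up entries for illustration only):
# two skilled candidates rejected, eight unskilled candidates approved.
cohort = (
    [Candidate("skilled_A", False, True), Candidate("skilled_B", False, True)]
    + [Candidate(f"approved_{i}", True, False) for i in range(8)]
    + [Candidate(f"rejected_{i}", False, False) for i in range(10)]
)

sensitivity, precision = evaluate_test(cohort)
print(f"sensitivity={sensitivity:.2f} precision={precision:.2f}")
```

In this made-up cohort both numbers are zero, which is the pattern the text attributes to the tests of the time. A useful test would push both values toward 1.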
For @quepasta @N30N @Kaari
-----
WHY MOST ECONOMY TEACHINGS ARE BULLSHIT / WHY PEOPLE LOSE IN INVESTMENTS.
SUMMARY.
A few years ago, a study was conducted on the efficiency of the strategies used by 100 professional Wall Street managers. To do so, their performances were compared to the performance of a chimpanzee operating blindfolded.
The results were published in the Wall Street Journal and several other magazines[1]. The risk-adjusted return obtained by the chimpanzee was better than the average of the 100 professional managers.
How should we interpret this fact and how should we view the validity of the certificates issued by regulatory entities? Is there anything that needs to be changed in the criteria that determine the issuance of these certificates, to prevent approved managers from continuing to generate results for their clients comparable to those obtained by chimpanzees? This is what we will discuss in this article.
INTRODUCTION.
When someone buys a TV or a blender, they turn it on and check if it works. The test does not require in-depth knowledge of electronics or mechanics. Practically anyone can check if a device like this works, and the risk of making a mistake in this assessment is low.
If the product does not work, it should quickly show anomalies that are quite easy to identify. The person can try to turn it on 2 or 3 times, and in less than 1 minute the person will have a rough idea if the product is defective. It could be a strange noise, a stain, some asymmetry, a scratch, etc.
Even in the case of complex products, made up of many components, such as a car, there may be small faults that go unnoticed in the test, but do not compromise its overall functionality, so that defective items do not reach 1% of the product's value, and very rarely reach 10%, and do not cause major losses.
However, this only applies to products whose operation is predominantly deterministic. In such cases, there is no need for an expert to guide the consumer. The consumer himself, equipped with common sense and some basic knowledge, is able to assess whether a product is working well or whether it shows signs of a defect. In addition, there are laws (CDC) that regulate the conditions for returns, upon detection of hidden defects, as well as possible compensation, if the malfunction causes losses to the consumer.
However, when it comes to an investment product, the situation changes completely, because the results generated are almost completely random, with a small amount of determinism. This means that an efficient product can generate losses for months or years, while another that is inefficient can remain positive for months or years. This characteristic, inherent to any investment product, makes it extremely difficult to assess its quality and requires long periods of data collection, as well as analysis of this data, in order to make valid inferences.
An analogy can be drawn with poker players. A world-class player can be in the red for 2,000 tournaments, like André Akkari, while a beginner can have several consecutive wins by sheer luck. [see the article Yin Yang]
To assess whether a product of this class delivers what is expected, a considerably high level of knowledge of Statistics, Logic and Scientific Methodology is required.
Pharmaceutical products resemble financial products in some aspects, which is why some analogies presented in this article will be through comparisons with items of this type. A drug administered to hundreds of people may work in some of them but not in others, just like a placebo. What distinguishes a placebo from an active ingredient is a statistical difference that most of the population is not able to measure or interpret.
This is why it is common for healers to become successful and attract crowds of unsuspecting people, thanks to the placebo effect, which is not understood by laypeople. This success lasts until they are duly prosecuted by regulatory agencies. Ironically, people harmed by these charlatans still tend to defend their detractors, because they are unfamiliar with the concept of placebo, are unfamiliar with the fundamentals of the scientific method, and are susceptible to believing that if something worked in a specific case, then it must always work.
In the health sector, a fairly large percentage (perhaps more than 99%) of those working in this area are professionals whose qualifications range from reasonable to excellent, and all of them are better prepared to perform their duties than an average person randomly selected from the population. Occasionally, a charlatan emerges and ends up being denounced and prevented from continuing to practice. However, under normal circumstances, if two groups, A and B, group A with 100 doctors and group B with 100 people without medical training, are brought together, and these 200 people from groups A and B are asked to examine 50 patients with different diseases, it is likely that even the worst of the 100 doctors will make more correct diagnoses than the best of the 100 people without medical training.
On the other hand, in the investment sector, the opposite is true: more than 99% of those who work as professionals in this area do not have any qualifications that distinguish them from a layperson, in fact, they do not even have any qualifications that distinguish them from a chimpanzee[1] in terms of risk-adjusted performance. Occasionally, a truly competent professional emerges, but ends up going unnoticed, in addition to being boycotted by the charlatans who prevent their emergence in an environment that they have already dominated.
Given this morbid reality, it would be expected that regulatory entities would make an effort to reverse this situation and establish a minimum level of seriousness in this environment.
In this article, we will try to take the first step towards informing investors, as well as encouraging regulatory entities to promote a reformulation of the laws that regulate the Financial Market, in order to filter the investment products that are offered to the public, avoiding the sale of snake oil as if it were medicine. It is also expected that the criteria for issuing certificates to managers and analysts will be reviewed, so that those distinguished with a certification of this category are entitled to it, and do not continue to be surpassed by a chimpanzee[1].
1. FUNCTION EXPECTED FROM A REGULATORY AUTHORITY.
ANVISA and the Federal Council of Medicine (CFP) are entities that look after the health of the population, supervising the practice of medicine and regulating which supposedly therapeutic substances can be marketed by the pharmaceutical industry. In this context, it is understood that the general population does not have the knowledge and discernment necessary to judge whether:
a. A supposed health professional is qualified to correctly assess a person's health status and offer them appropriate treatment.
b. A certain medication and the respective treatment using this medication produce a superior effect to that of a placebo.
The evaluation and selection of qualified professionals, as well as of scientifically tested and validated medications, is a task that goes beyond the domain of the average citizen. For this reason, entities responsible for promoting this filtering were created, in order to prevent the practice of medicine by quacks and other charlatans, as well as to prevent water and flour from being sold to the population as a medication for the treatment of cancer and other diseases.
Neither ANVISA nor CFM are immune to failures, but, in general terms, ANVISA adopts criteria that are well-founded in contemporary scientific methodology, requiring studies that corroborate the efficiency of the active ingredients present in the medicines, which need to be rigorously tested, following appropriate protocols, first on guinea pigs and then on sufficiently numerous and diverse samples of humans, in studies conducted by independent researchers, producing statistically valid results.
In general, when it is assumed that a certain active ingredient can produce a certain effect in the treatment of a certain disease, a double-blind experiment is carried out, in which a sample of people is divided into equivalent groups: one group receives the active ingredient and the other receives a placebo. The evolution of the disease in these two groups is monitored and the efficiency of the active ingredient is verified in comparison to the placebo, and this contrast is measured by appropriate statistical tools.
Without meeting these requirements, a drug cannot be put into circulation, as there would be no reasonable reason to believe that the alleged drug could produce any effect beyond that which would be obtained by a placebo.
The Federal Council of Psychology adopts similar procedures to select the psychometric instruments that are approved for use in clinics and companies.
These procedures may not be 100% accurate, in the sense that they are not capable of guaranteeing that all approved drugs and treatments are effective, nor are they capable of guaranteeing that 100% of those not approved are ineffective. But at least they are correct in more than 90%. Therefore, this practice contributes greatly to separating the wheat from the chaff, through fair and scientifically recognized criteria.
2. CAPITAL MARKET REGULATORY ENTITIES.
(...)
The Financial Market is permeated with products of the lowest quality, offered without any restrictions, and consumed inadvertently in alarming quantities. These are magazines, books, courses, handouts, lectures, robots, indicators, strategies, etc., none of which presents the slightest evidence that they are capable of generating profits consistently. In fact, the vast majority present clear evidence that they generate losses.
Between 2006 and 2007, more than 500 quantitative strategies that are widely recommended in books, courses, brokerage websites, specialized magazines and other sources were examined[2]. To this end, these strategies were automated and executed on historical tick-by-tick series of EURUSD between April 2000 and October 2006.
Each strategy had its parameters optimized with the help of a genetic algorithm in the period from 4/20/2000 to 12/31/2002 and, then, the champion genotype among 10,000 generations in each strategy was put to operate in the period from 1/1/2003 to 10/1/2006. The result was that all the strategies tested were negative.
In 2014, new tests were carried out, with historical series from 1986 to 2014, and more than 1000 strategies were tested, with optimization between 12/4/1986 and 12/31/2005. Then, the champion genotype of each optimization was tested between 1/1/2006 and 3/1/2014. All were negative.
Considering that each strategy was tested with 10,000 different configurations, and the best configuration was used in the following period, then the 1,000 best among 10,000,000 configurations were tested and all were negative.
This provides an idea of the rarity of truly profitable strategies and also shows that all of the most consecrated and extolled strategies in books and courses are losers, but continue to be offered and consumed by millions of people.
In the case of medicines, if an inefficient product eventually enters the market, it is quickly reported, investigated and resolved. But in the area of investments, the market is so shamefully saturated with inefficient products that this has come to be accepted as “normal”. To get a quantitative idea of the problem, one can analyze an online fund ranking[3]
When consulting this ranking, at the end of 2014, there were 383 funds listed. By the end of 2015, several had gone bankrupt and the list was reduced to 241 funds, of which only 27 were nominally positive in the last 24 months. Of these 27, only 4 funds had matched inflation in the last 2 years and probably none of them had matched inflation in the last 3 years. Therefore, out of 383 funds, only 4 remained positive, in real terms, for 2 years, that is, 99% of negative funds. If the period considered were longer, the percentage of negative funds would be higher, since the few positive ones were generally benefiting from luck, and increasing the data sample, the luck factor would dissipate and reveal that even those few positive ones would not last for longer periods.
This situation causes a serious loss of confidence in investment products among the population, because the average citizen does not have the statistical and scientific knowledge necessary to distinguish between what is good and what is bad, and so ends up generalizing, as if all investment products belonged to the same class of extremely high-risk products, since the products they have come to know are nothing more than “water and flour”. This situation is very sad, because there are some really good investment products, but these rarely reach the general public.
What the average investor expects is for regulatory entities to filter the products that are made available on the Financial Market and monitor the legitimacy of the information disclosed by the suppliers of these products, in order to eliminate inefficient products from the “shelves”, as well as to ensure that the information about each product is properly documented and representative of the facts.
When a person goes to a pharmacy and buys a medicine, the probability that it is an ineffective substance is so low that the person practically ignores this risk and trusts the characteristics stated on the packaging, because they know that there is an official entity that filters the products made available to the public and supervises the accuracy of the information disclosed. Fraud is occasionally discovered in this sector, which is immediately reported and the offenders are punished, with irregular products being removed from circulation. For this reason, even with these isolated occurrences, there is still great confidence in the quality of health products.
On the other hand, a person who has had multiple negative experiences with investments, when faced with a new investment product that they are not yet familiar with, practically disregards all the information presented about the characteristics of this product, and assumes that it is as fraudulent as that of other investment products that they have already purchased and with which they have suffered substantial losses. There is no protective figure of an entity that the investor can trust, which looks after their safety, preventing the circulation of harmful products and inspecting the accuracy of the information disclosed.
As a result, the very few truly efficient products are penalized, as they get lost in the trash, and even if people find them, they end up not believing what they read, due to the trauma they suffered from their previous bad experiences, which led them to cultivate a very strong generalized rejection to protect themselves against the bad investment products that pollute the market.
In this context, not only the investor is harmed, but also the suppliers of good quality products, who are viewed with extreme distrust.
Another serious problem is the illusion of risk-free investments, such as real estate, government bonds or savings, which are among the favorites of less experienced investors, because in their eagerness to completely avoid risks, they fall into the trap of believing that these alternatives would protect them.
The fact is that there are no risk-free investments. As the poet Guimarães Rosa said, “living is a very dangerous business”. Anyone who knows a little about the history of defaults by the Brazilian government, with Eletrobrás bonds, public debt bonds and others, knows that the false security that is usually portrayed in these bonds actually hides a very high and incalculable risk of total loss. Real estate involves risks comparable to any other low-liquidity investment, therefore they are worse than stocks, and the confiscation of savings accounts in the 1990s was an exemplary lesson in dispelling the illusion of risk-free investment.
Regulatory entities should establish a standardized, objective and empirically validated scale to be adopted as a reference in risk classifications. Current risk classification methods fail so badly that almost all funds classified as low risk have been negative for 8 years, while some of the funds classified as high or very high risk (Sparta, Tempo Capital, etc.) have much smaller real losses than those classified as low risk, and some are even positive, such as the Verde fund.
How can a fund be classified as “low risk” and remain negative for 8 years? And how can a fund be classified as very high risk and remain positive for 23 years and never have been negative for more than 2 years? This shows how inefficient and distorted the scores used by banks and rating agencies are.
Such disparities between the risk classification and the result actually observed are frequent and indicate the imminent need to review the criteria for risk classification.
To make this situation even worse, even large financial institutions offer a huge variety of terrible investment products. Even banks, in which people tend to place more trust and almost blindly follow the recommendations of their managers, under the illusion that, as they are larger institutions, they should offer more solidly founded products, are crammed with some of the worst investment products on their menus.
Bad investments are distributed across all levels, affecting private, prime and retail clients, but the main victims are retail investors, who are generally less educated, less advised and, when they receive some advice, it is usually of low quality. As if that were not enough, they are also exposed exclusively to the worst products among which there are already many bad ones. Without understanding the concept of “opportunity cost”, without knowing the difference between “real profits” and “nominal profits”, without knowing how to calculate compound interest, these people are easily fooled by an avalanche of incorrect and distorted information.
In a report published in 2014 in the magazine InfoMoney[4], for example, the author comments that if a person invested R$105 million (MegaSena) in savings, they could use the income to buy an apartment per month, and he does the calculations to “prove” his statement using nominal profits! The real fact is that since the changes of 2012, savings have never managed to keep up with inflation and have generated real losses, so by investing in savings, a person would lose an apartment every six months, instead of gaining one per month.
If a specialized magazine publishes several articles with errors as basic as this one, what can we expect from the way most of the population analyzes investments? Magazines that should inform, guide and clarify, instead mislead the reader, feeding the illusion that savings allow for some gain, when in fact it only partially mitigates the loss produced by inflation.
Another point that requires attention is the criteria for certifying managers, which involve a course lasting a few days and then a test whose content does not have the slightest possibility of assessing the aspiring manager's ability to perform his or her role. As a result, most professional managers, certified by regulatory entities and hired by large banks and brokerages, perform worse than chimpanzees[1]. This does not mean that chimpanzees are more capable than human managers. This simply reflects the immense difficulty of the problem of modeling the Financial Market, to the point that educated and intelligent people who have dedicated decades to studying the subject cannot achieve better results than random purchases and sales.
The use of the chimpanzee in the experiment published in the Wall Street Journal served to dramatize and satirize the situation, but what is intended to show is that the results obtained by professional managers are indistinguishable from those that could be obtained by randomly selecting the components of the portfolio.
This fact should cause great concern and embarrassment to the US regulatory bodies that certify these managers and recognize the validity of these certificates.
Furthermore, one should not think that this is a problem exclusive to the US. Brazil is no better. The number of Brazilian managers with positive performances over the last 15 years and who have continued to perform well over the last 5 years, in real terms (discounting inflation), can be counted on the fingers of one hand. However, the number of people authorized and accredited to market investment products and services, including purchase and sale recommendations and portfolio management, is thousands of times greater.
3. LOGIC AND MATHEMATICS OF THE CHIMPANZEE PHENOMENON.
To get a better idea of why the chimpanzee performed better than the average human manager, suppose an exam consisting of 100 multiple-choice items, with 2 alternatives for each item. Let's say that the difficulty level of the items is compatible with the skill level of the people being examined. Then it is expected that the scores obtained by the people will be a combination of the skill level and the luck factor.
A person who can answer 60 of the 100 items correctly should get these 60 right plus part of the other 40. This part of the 40 will be defined by a binomial distribution, whose peak frequency will be 20, so the most likely result is that in total he will get 20 + 60 = 80 right. Since each item has 2 alternatives, there is therefore a 50% chance of getting it right by luck.
This does not mean that all people who manage to solve 60 will score exactly 80, because although the 60 “guaranteed” points are fixed among all people who know 60 of the answers, there is a part that varies with luck, which are the 40 “guessed” items. Therefore, some of the people who manage to solve 60 will score 79 points, others will score 78, 77, 81, 82, etc.
Similarly, people who know 56 answers should get these 56 right plus half of the 44 guessed, that is, they are expected to have an average score of 78. However, some who manage to solve 56 may have a little more luck and get 25 right out of the 44 guessed, scoring a total of 81, and, with that, some who manage to solve 56 are above the average of those who manage to solve 60.
Now imagine a test so difficult that the people examined cannot get any question right. It could be a test from the International Mathematical Olympiad, for example, or the last 10 items of the Sigma Test. In a situation like this, it is said that the test is not suitable for discriminating between the different skill levels of the people being examined, because both the people with the highest skill level and those with the lowest skill level would have a score of 0 if the test were discursive.
Since the test is multiple choice, instead of everyone scoring zero, the scores generated by random guesses will have a binomial distribution, but these scores do not represent skill levels. They are simply the result of statistical fluctuations and will be distributed randomly. This is exactly the situation with the chimpanzee and the 100 professional managers. Therefore, although managers are much more skilled than chimpanzees, and some managers are much more competent than others, the difficulty level of the “test” is so high that if there were no multiple choice (up/down), all of them, including the most skilled, would have a score of 0.
Since the prices can move in basically two groups of directions (up or down), and the probabilities are almost equivalent, the choices are like a test with 2 alternatives for each item. The extremely high difficulty in modeling the market means that all managers, including the most skilled, are unable to make any correct decisions, and this “flattens” all scores to the minimum level, which is the level defined by luck, so the differences in performance are the result of chance and do not reflect any relevant information about the ability of these managers. Therefore, no statistically significant difference was observed between the chimpanzee and the average of the 100 human managers.
4. SCOPE OF THE PROBLEM.
The chimpanzee study covers a small part of a much larger problem. Some investment professors claim to have “prepared” more than 100,000 students. The question is: prepared for what?
If just 1 professor spreads his superstitions among 100,000 unsuspecting people, what is the total number of victims? If you add up the students of all the professors, it must total somewhere between 200,000 and 300,000 (not much more due to redundancy, since many students participate in courses with several professors).
This is an alarming number, because if there are less than 10 people in Brazil who can actually make money with financial operations consistently, and 300,000 who buy some “educational” investment product in the expectation that they will become capable of making a profit using what they have learned, something is not right.
There is no 1:1 ratio between those who enroll in an Engineering, Medical or Law school and those who actually finish their degree, because a large proportion drop out without completing it. According to a report published in 2013[5], 44% of students enrolled in Engineering schools completed their degree. Engineering is considered one of the most difficult courses and has the highest dropout rate, yet almost 50% stayed until they finished.
Let's say that not everyone who finishes an Engineering course is sufficiently qualified to diligently perform the activities expected of their profession. In a very pessimistic estimate, we can assume that only 10% become successful professionals, in the sense that they are able to earn money practicing their profession.
Now let's compare this to participants in investment courses: 0.003% manage to produce positive results and generate some profit from what they learned. Why is it that 99.997% of participants in these courses fail to make money? Are the students missing something? Or is it that the course content does not provide enough support for them to be able to make a profit? To understand this phenomenon, we can analyze the results of an experiment carried out in the 1980s by one of the world's greatest traders.
In 1983, Richard Dennis made a bet with William Eckhardt about whether it was possible to train anyone to become a professional trader[6]. Dennis argued that it was possible, while Eckhardt argued that it was necessary to have specific innate skills, honed with training.
Dennis published an advertisement in the Wall Street Journal, recruiting candidates for training. About 15,000 people signed up. He selected them by written exam, then interviewed the candidates who obtained the best results on the exam and, in the end, selected 23 people, to whom he offered training for 2 weeks and then put them to trade. This group became known worldwide as the “Turtle Traders”.
In the second half of the 1980s, with the stock market boom, the turtles made a lot of money and enjoyed temporary success, but it soon became clear that they lacked long-term consistency. Of the 23 turtles, only 1 had long-term success: Jerry Parker, who is still active today, with an average performance of 12.17% per year since 1988, in his Chesapeake Capital fund[7].
Incredibly, Dennis thought he won the bet, and several people on websites, blogs and forums cite this story as if Dennis had won. However, it is clear that Dennis was wrong, while Eckhardt's opinion was much closer to reality. Considering that the newspaper ad reached millions of people, among whom there was a self-selection that led to 15,000 becoming interested, and of these 15,000 he selected the 23 he found most qualified and, after training them, only 1 of them was successful, then the result clearly suggests that about 1 in 1,000,000 people are able to be positive in the Market consistently in the long term.
In order for Dennis to be able to correctly test his thesis that anyone can win in the Market, as long as they receive adequate training, he would have to randomly select 100 people from the street. Not even the telephone directory would be valid, because in 1984 the possession of a telephone already represented a more select public and the sample would be biased.
In addition, it is likely that the 2-week “training” was almost irrelevant. Parker would probably have been just as successful if he had not received such training (perhaps it would have taken him longer to learn on his own).
What can be inferred from this? The facts are quite clear: Richard Dennis, one of the best traders in the world, with proven results for decades, selected some people who, in his opinion, had the most important attributes to have any chance of success in the Market, and trained them personally. Even so, only one of them was successful, and not very significantly (average annual profit of 12%).
This is part of the answer to the problem. It shows that even with an exceptional teacher, it is extremely difficult to learn to operate in a way that generates consistent profits. Therefore, even if the courses were offered by competent traders and they taught something useful, only a very small portion of the population would have the necessary attributes to be able to benefit from the teachings, while approximately 99.999% of people would waste time and money on the courses and more money trying to use what they believed they had learned.
But in the real scenario it is worse, because it is not Richard Dennis who offers the courses. Practically all sellers of courses, books, lectures, etc. are negative in the market. If teachers themselves are persistently losing, what is the probability that these people's students will be able to learn something useful and make a profit?
Since there is a 50% chance of winning by chance, it is clear that luck ends up contributing a lot to deceiving the masses, and that is precisely why filtering should be carried out by a regulatory entity, since the population does not have the knowledge to do so.
In addition to managers who operate with results worse than those of chimpanzees and course sellers who are like water and flour, the problem extends much further. In recent years, trading robots have been in fashion. People think that using cutting-edge technology, even when it is empty of content, is all they need to have some success.
Although robots belong to the only group that allows, at least in principle, a scientific investigation of the quality of the strategies used, what we observe in practice is that there is no seriousness whatsoever on the part of the developers. Some really do not know the basics of the studies that would need to be carried out to validate a strategy. Others may know, but they disguise the results in different ways.
(...)
Neither Buffett in 1956 nor Soros in 1968 would have been approved as a manager, or for any of the other roles listed above, based on these criteria. This led our friend J.A.L.J. to suggest that the criteria be changed to make them compatible with the approval of these legendary managers.
This creates some difficulties because, although Soros and Buffett were notable at that time, at the beginning of their careers there were not enough elements to predict the success they would achieve. These elements only became known as their gains began to materialize. Therefore, although a retrospective assessment might suggest that Buffett and Soros should be approved, this is a post facto assessment.
For example, in 1905, Einstein was as brilliant as he would be in the following decades, but he was still not recognized. He was an anonymous figure lost in the crowd, and he was not taken seriously. Even his “friend” Marcel Grossmann did not want to “waste” his precious time helping Einstein with the mathematics needed to formalize the General Theory of Relativity. Instead of dealing with the mathematical part, Grossmann simply made some bibliographical suggestions to Einstein, so that, if Einstein wished, he could learn about tensors himself and do the work.
With this, Grossmann entered history as a semi-anonymous figure with a small supporting role. In a letter to Lorentz, Einstein suggested that he would have named Grossmann as a co-author of the Theory of Relativity if Grossmann had helped him with the formalization and calculations. When Einstein finally achieved the necessary mastery of the mathematical tools, he was able to revise some of his 1911 calculations and make new predictions about the angle of deflection of light as it passes near the Sun, and it was only in 1919, with the eclipse observed in Sobral and Príncipe, that his work was recognized.
Who could be blamed for not recognizing Einstein's merits in 1905? There were several other attempts to explain the results obtained by Michelson and Morley, which assumed different premises and arrived at different results. Einstein's theory was just one of them. To know which of the theories would find better support in the experimental data, it was necessary to test them through observation and measurement. Thus, several years passed until the tests were carried out and Einstein's hypotheses were experimentally verified and gained the status of "theory".
With Buffett and Soros the situation is similar. Today we know that they are extraordinary managers, just as they already were in 1956 and 1968, but at that time there was not enough data to know it. The most that could be done would be to apply tests capable of predicting these results. The tests of the time proved to be extremely inefficient: both of them failed, several less qualified managers passed, and almost all of those who passed did not present results better than those of a chimpanzee.
A test that would correctly identify the potential of Buffett and Soros and, at the same time, correctly filter out the vast majority of insufficiently qualified managers would be a kind of fundamentals-based backtest. This will be the subject of a future article.