<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Montecarlo Methods &#8211; The Financial Hacker</title>
	<atom:link href="https://financial-hacker.com/tag/montecarlo-methods/feed/" rel="self" type="application/rss+xml" />
	<link>https://financial-hacker.com</link>
	<description>A new view on algorithmic trading</description>
	<lastBuildDate>Tue, 05 Apr 2022 13:26:46 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	

<image>
	<url>https://financial-hacker.com/wp-content/uploads/2017/07/cropped-mask-32x32.jpg</url>
	<title>Montecarlo Methods &#8211; The Financial Hacker</title>
	<link>https://financial-hacker.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Why 90% of Backtests Fail</title>
		<link>https://financial-hacker.com/why-90-of-backtests-fail/</link>
					<comments>https://financial-hacker.com/why-90-of-backtests-fail/#comments</comments>
		
		<dc:creator><![CDATA[jcl]]></dc:creator>
		<pubDate>Mon, 04 Apr 2022 15:55:47 +0000</pubDate>
				<category><![CDATA[System Development]]></category>
		<category><![CDATA[System Evaluation]]></category>
		<category><![CDATA[Backtest]]></category>
		<category><![CDATA[Montecarlo Methods]]></category>
		<category><![CDATA[SPY]]></category>
		<category><![CDATA[Walk forward analysis]]></category>
		<category><![CDATA[White's reality check]]></category>
		<guid isPermaLink="false">https://financial-hacker.com/?p=4373</guid>

					<description><![CDATA[About 9 out of 10 backtests produce wrong or misleading results. This is the number one reason why carefully developed algorithmic trading systems often fail in live trading. Even with out-of-sample data and even with cross-validation or walk-forward analysis, backtest results are often way off to the optimistic side. The majority of trading systems with &#8230; <a href="https://financial-hacker.com/why-90-of-backtests-fail/" class="more-link">Continue reading<span class="screen-reader-text"> "Why 90% of Backtests Fail"</span></a>]]></description>
										<content:encoded><![CDATA[<p>About 9 out of 10 backtests produce wrong or misleading results. This is the number one reason why carefully developed algorithmic trading systems often fail in live trading. Even with out-of-sample data and even with cross-validation or walk-forward analysis, backtest results are often way off to the optimistic side. The majority of trading systems with a positive backtest are in fact unprofitable. In this article I&#8217;ll discuss the cause of this phenomenon, and how to fix it.<span id="more-4373"></span></p>
<p>Suppose you&#8217;re developing an <a href="https://zorro-project.com/algotrading.php" target="_blank" rel="noopener">algorithmic trading strategy</a>, following all rules of proper <a href="https://financial-hacker.com/build-better-strategies-part-3-the-development-process/">system development</a>. But you are not aware that your trading algorithm has no statistical edge. The strategy is worthless, the trading rules are equivalent to random trading, and the profit expectancy – aside from transaction costs – is zero. The problem: you will rarely get a zero result in a backtest. A random trading strategy will produce a negative backtest result in about 50% of cases, and a positive result in the other 50%. But if the result is negative, you&#8217;re normally tempted to tweak the code or select assets and time frames until you finally get a profitable backtest. This will happen relatively soon, even when the modifications are purely random. That&#8217;s why there are so many unprofitable strategies around that nevertheless show great backtest performance.</p>
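<p>The effect of this selection can be illustrated with a small simulation. The C sketch below is a hypothetical illustration, not Zorro code: every strategy variant is a sequence of fair coin-flip trades with zero expectancy, and the developer keeps the best of several variants. The LCG random generator and all function names are assumptions made for this example.</p>

```c
/* Hypothetical illustration, not part of Zorro: each "variant" is a
   zero-expectancy strategy whose trades are fair coin flips (+1 or -1). */

static unsigned long lcg_state = 1;

/* Tiny deterministic LCG, returns a value in [0, 1). */
static double lcg_uniform(void)
{
    lcg_state = (lcg_state * 1103515245UL + 12345UL) % 2147483648UL;
    return (double)lcg_state / 2147483648.0;
}

/* Net result of one random backtest with `trades` coin-flip trades. */
static int random_backtest(int trades)
{
    int net = 0;
    for (int i = 0; i < trades; i++)
        net += (lcg_uniform() < 0.5) ? 1 : -1;
    return net;
}

/* "Tweak until profitable": test `variants` random variants, keep the best. */
int best_of_variants(int variants, int trades, unsigned long seed)
{
    lcg_state = seed;
    int best = random_backtest(trades);
    for (int v = 1; v < variants; v++) {
        int r = random_backtest(trades);
        if (r > best)
            best = r;
    }
    return best;
}

/* Average best result over `repeats` independent developers (seeds 1..repeats). */
double mean_best_of_variants(int variants, int trades, int repeats)
{
    double sum = 0.0;
    for (int s = 1; s <= repeats; s++)
        sum += best_of_variants(variants, trades, (unsigned long)s);
    return sum / repeats;
}
```

<p>A single variant (<strong>mean_best_of_variants(1, 100, 50)</strong>) averages close to zero, as it should. Keeping the best of 20 variants produces a clearly positive average result, although not a single variant has an edge.</p>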
<p>Does this mean that backtests are worthless? Not at all. But it is essential to know whether you can trust the test, or not.</p>
<h3><strong>The test-the-backtest experiment<br />
</strong></h3>
<p>There are several methods for verifying a backtest. None of them is perfect, but all give insights from different viewpoints. We&#8217;ll use the <a href="https://zorro-project.com" target="_blank" rel="noopener">Zorro algo trading software</a>, and run our experiments with the following test system that is optimized and backtested with walk-forward analysis:</p>
<pre class="prettyprint">function run()
{
  set(PARAMETERS,TESTNOW,PLOTNOW,LOGFILE);
  BarPeriod = 1440;
  LookBack = 100;
  StartDate = 2012;
  NumWFOCycles = 10;

  assetList("AssetsIB");
  asset("SPY");

  vars Signals = series(LowPass(seriesC(),optimize(10,2,20,2)));
  vars MMIFast = series(MMI(seriesC(),optimize(50,40,60,5)));
  vars MMISlow = series(LowPass(MMIFast,100));

  MaxLong = 1;
  if(falling(MMISlow)) {
    if(valley(Signals))
      enterLong();
    else if(peak(Signals))
      exitLong();
  }
}</pre>
<p>This is a classic trend following algorithm. It uses a lowpass filter for trading at the peaks and valleys of the smoothed price curve, and an MMI filter (<a href="https://financial-hacker.com/the-market-meanness-index/">Market Meanness Index</a>) for distinguishing trending from non-trending market periods. It only trades when the market has switched to a trend regime, indicated by a falling smoothed MMI, which is essential for profitable trend following systems. It opens only long positions. Lowpass and MMI filter periods are <a href="https://zorro-project.com/manual/en/optimize.htm" target="_blank" rel="noopener">optimized</a>, and the backtest is a <a href="https://zorro-project.com/manual/en/numwfocycles.htm" target="_blank" rel="noopener">walk-forward analysis</a> with 10 cycles.</p>
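<p>For readers who have not seen the linked MMI article, here is a minimal C sketch of the idea behind the index, assuming a price series in chronological order (oldest value first). It counts how often the price moves back toward the median of the series: down after a price above the median, or up after a price below it. According to the linked article, this fraction is around 75% for random prices and lower for trending prices. The function names and the fixed buffer size are hypothetical, and Zorro's built-in <strong>MMI</strong> function may be implemented differently.</p>

```c
#define MMI_MAX 1024  /* hypothetical limit for the fixed work buffer */

/* Median of data[0..n-1]; insertion-sorts a copy into buf. */
static double median_of(const double *data, int n, double *buf)
{
    for (int i = 0; i < n; i++) {
        double v = data[i];
        int j = i - 1;
        while (j >= 0 && buf[j] > v) {   /* shift larger values right */
            buf[j + 1] = buf[j];
            j--;
        }
        buf[j + 1] = v;
    }
    return (n % 2) ? buf[n / 2] : 0.5 * (buf[n / 2 - 1] + buf[n / 2]);
}

/* Market Meanness Index sketch, data[0] = oldest price.
   Returns the percentage of steps that move back toward the median,
   or -1 for invalid input. */
double mmi(const double *data, int n)
{
    double buf[MMI_MAX];
    if (n < 2 || n > MMI_MAX)
        return -1.0;
    double m = median_of(data, n, buf);
    int mean_moves = 0;
    for (int i = 1; i < n; i++) {
        if (data[i - 1] > m && data[i] < data[i - 1])
            mean_moves++;                 /* fell back from above the median */
        else if (data[i - 1] < m && data[i] > data[i - 1])
            mean_moves++;                 /* rose back from below the median */
    }
    return 100.0 * mean_moves / (n - 1);
}
```

<p>An alternating series such as 1, 3, 1, 3, 1, 3 gets an MMI of 100, a steadily rising series such as 1, 2, 3, 4, 5 an MMI of 50. The strategy above smooths the MMI with a lowpass filter and trades only while the smoothed value is falling, i.e. while the market is becoming more trending.</p>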
<h3><strong>The placebo trading system<br />
</strong></h3>
<p>It is standard for experiments to compare the real stuff with a placebo. For this we&#8217;re using a trading system that has obviously no edge, but was tweaked with the evil intention to appear profitable in a walk-forward analysis. This is our placebo system:</p>
<pre class="prettyprint">void run()
{
  set(PARAMETERS,TESTNOW,PLOTNOW,LOGFILE);
  BarPeriod = 1440;
  StartDate = 2012;
  setf(TrainMode,BRUTE);
  NumWFOCycles = 9;

  assetList("AssetsIB");
  asset("SPY");

  int Pause = optimize(5,1,15,1);
  LifeTime = optimize(5,1,15,1);

// trade after a pause...
  static int NextEntry;
  if(Init) NextEntry = 0;
  if(NextEntry-- &lt;= 0) {
    NextEntry = LifeTime+Pause;
    enterLong();
  }
}</pre>
<p>This system opens a position, holds it for a while, then closes it and pauses for a while. The trade and pause durations are walk-forward optimized between 1 day and 3 weeks (1 to 15 daily bars). <a href="https://zorro-project.com/manual/en/timewait.htm" target="_blank" rel="noopener">LifeTime</a> is a predefined variable that closes the position after the given number of bars. If you don&#8217;t believe in lucky trade patterns, you can rightfully assume that this system is equivalent to random trading. Let&#8217;s see how it fares in comparison to the trend trading system.</p>
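<p>The <strong>NextEntry</strong> counter can be traced with a short C replay of the same logic. This is a sketch that assumes <strong>run</strong> is called once per bar, starting at bar 0; <strong>placebo_nth_entry</strong> is a hypothetical helper, not a Zorro function.</p>

```c
/* Replays the placebo script's NextEntry counter, assuming run() is
   called once per bar starting at bar 0, and returns the bar index of
   the n-th call to enterLong() (n = 0 for the first entry). */
int placebo_nth_entry(int lifetime, int pause, int n)
{
    int next_entry = 0;                   /* if(Init) NextEntry = 0; */
    int entries = 0;
    for (int bar = 0; ; bar++) {
        if (next_entry-- <= 0) {          /* same post-decrement test as the script */
            next_entry = lifetime + pause;
            if (entries++ == n)
                return bar;
        }
    }
}
```

<p>Read this way, consecutive entries are <strong>LifeTime+Pause+1</strong> bars apart, because the counter is tested before it is decremented. With <strong>LifeTime</strong> and <strong>Pause</strong> both at 5, entries happen at bars 0, 11, 22 and so on. The extra bar merely shifts the pattern and does not affect the point of the experiment.</p>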
<h3><strong>Trend trading vs. placebo trading<br />
</strong></h3>
<p>This is the equity curve of the trend trading system from a walk-forward analysis from 2012 up to 3/2022:</p>
<p><img decoding="async" src="https://financial-hacker.com/wp-content/uploads/2022/04/040422_1441_Why90ofBack1.png" alt="" /></p>
<p>The plot begins in 2015 because the preceding 3 years are used for the training and lookback periods. SPY follows the S&amp;P500 index and rises in the long term, so we could expect some profit from a long-only system anyway. But this system, with a profit factor of 3 and an R2 coefficient of 0.65, appears a lot better than random trading. Let&#8217;s compare it with the placebo system:</p>
<p><img decoding="async" src="https://financial-hacker.com/wp-content/uploads/2022/04/040422_1441_Why90ofBack2.png" alt="" /></p>
<p>The placebo system produced a profit factor of 2 and an R2 coefficient of 0.77: a lower profit factor than the real system, but a higher R2, so both systems are in the same performance range. This result also came from a walk-forward analysis, although with 9 cycles, which is why the test period starts later. Aside from that, it seems impossible to determine solely from the equity curve and performance data which system is for real, and which is a placebo.</p>
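<p>Both metrics are simple to compute. The following C sketch uses the profit factor definition from the <strong>evaluate</strong> functions in this article: gross win divided by gross loss, capped at 10 when there are no losing trades. For R2, it assumes the coefficient of determination of a straight-line fit to the equity curve, which measures how steadily the equity grows; Zorro&#8217;s exact R2 calculation may differ in details. The function names are hypothetical.</p>

```c
/* Profit factor as in the article's evaluate() functions: gross win
   divided by gross loss, or 10 if there were no losing trades. */
double profit_factor(const double *trade_results, int n)
{
    double win = 0.0, loss = 0.0;
    for (int i = 0; i < n; i++) {
        if (trade_results[i] > 0)
            win += trade_results[i];
        else
            loss -= trade_results[i];     /* accumulate losses as a positive sum */
    }
    return loss > 0 ? win / loss : 10.0;
}

/* Assumed R2 definition: squared correlation between the equity values
   and the bar index, i.e. the R2 of a least-squares straight-line fit. */
double equity_r2(const double *equity, int n)
{
    double sx = 0, sy = 0, sxx = 0, sxy = 0, syy = 0;
    for (int i = 0; i < n; i++) {
        double x = i, y = equity[i];
        sx += x; sy += y; sxx += x * x; sxy += x * y; syy += y * y;
    }
    double cov = n * sxy - sx * sy;
    double var_x = n * sxx - sx * sx;
    double var_y = n * syy - sy * sy;
    if (var_x <= 0 || var_y <= 0)
        return 0.0;                       /* flat curve or too few points */
    return cov * cov / (var_x * var_y);
}
```

<p>Note that this R2 sketch is also 1 for a perfectly straight falling equity curve, so it should be read together with the profit factor.</p>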
<h3><strong>Checking the reality<br />
</strong></h3>
<p>Methods to verify backtest results are called &#8216;reality checks&#8217;. They are specific to the asset and algorithm; in a multi-asset, multi-algo portfolio, you need to enable only the component you want to test. Let&#8217;s first see how the WFO split affects the backtest. This way we can find out whether our backtest result was just due to lucky trading in a particular WFO cycle. We&#8217;re going to plot a <strong>WFO profile</strong> that displays the effect of the number of walk-forward cycles on the result. For this we comment out the <strong>NumWFOCycles = …</strong> line in the code, and run it in training mode with the <strong>WFOProfile.c</strong> script:</p>
<pre class="prettyprint">#define run strategy
#include "trend.c" // &lt;= your script
#undef run
#define CYCLES 20 // max WFO cycles

function run()
{
  set(TESTNOW);
  NumTotalCycles = CYCLES-1;
  NumWFOCycles = TotalCycle+1;
  strategy();
}

function evaluate()
{
  var Perf = ifelse(LossTotal &gt; 0,WinTotal/LossTotal,10);
  if(Perf &gt; 1)
    plotBar("WFO+",NumWFOCycles,NumWFOCycles,Perf,BARS,BLACK);
  else
    plotBar("WFO-",NumWFOCycles,NumWFOCycles,Perf,BARS,RED);
}</pre>
<p>We&#8217;re redefining the <strong>run</strong> function to a different name. This allows us to just include the tested script and train it with WFO cycles from 2 up to the number defined by CYCLES. A backtest is executed after training. If an <strong>evaluate</strong> function is present, Zorro runs it automatically after any backtest. Here it plots a histogram bar of the resulting profit factor (y axis) for each number of WFO cycles (x axis). First, the WFO profile of the trend trading system:</p>
<p><img decoding="async" src="https://financial-hacker.com/wp-content/uploads/2022/04/040422_1441_Why90ofBack3.png" alt="" /></p>
<p>We can see that the performance rises with the number of cycles. This is typical for a system that adapts to the market. All results are positive, with a profit factor &gt; 1. Our arbitrary choice of 10 cycles produced a below-average result. So we can at least be sure that this backtest result was not caused by a particularly lucky number of WFO cycles.</p>
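<p>The WFO profile script, like the MRC script further below, reuses the tested strategy through a preprocessor trick: <strong>#define run strategy</strong> renames the strategy&#8217;s <strong>run</strong> function while its file is included, and <strong>#undef run</strong> frees the name for the wrapper&#8217;s own <strong>run</strong>. The trick works the same way in plain C. In this self-contained sketch the included file is replaced by an inline stand-in, and the call counter exists only for demonstration.</p>

```c
/* --- stand-in for the included strategy file ("trend.c" in the article) --- */
#define run strategy          /* from here on, the token run reads strategy */

static int strategy_calls = 0;

int run(void)                 /* compiled as: int strategy(void) */
{
    /* ...the strategy's own trading logic would be here... */
    return ++strategy_calls;  /* count calls, for demonstration only */
}

#undef run                    /* free the name run again */

/* --- the wrapper's own entry point --- */
int run(void)
{
    /* a wrapper would set up its test parameters here, e.g. the WFO cycles */
    return strategy();        /* then execute the original strategy */
}
```

<p>Without the <strong>#undef</strong>, the wrapper&#8217;s <strong>run</strong> would also be renamed and clash with the included one.</p>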
<p>The WFO profile of the placebo system:</p>
<p><img decoding="async" src="https://financial-hacker.com/wp-content/uploads/2022/04/040422_1441_Why90ofBack4.png" alt="" /></p>
<p>This time the number of WFO cycles had a strong random effect on the performance. And it is now obvious why I used 9 WFO cycles for that system. For the same reason I used brute force optimization, since it increases WFO variance and thus the chance to get lucky WFO cycle numbers. That&#8217;s the opposite of what we normally do when developing algorithmic trading strategies.</p>
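<p>Why does the cycle count matter so much? Changing it moves every boundary between training and test periods, so a brute-force-fitted pattern of pause and trade durations ends up evaluated on different data. The C sketch below computes such a layout for a rolling walk-forward analysis. It is a simplified model, not Zorro&#8217;s internal code: it assumes equally long frames, each consisting of a training window that takes a fixed fraction of the frame followed by a test segment, with all test segments adjacent. The struct, function, and parameter names are hypothetical.</p>

```c
/* Hypothetical rolling walk-forward layout (not Zorro's internal code).
   The period [0, total) holds one training window followed by `cycles`
   adjacent test segments; every frame (training + test) has the same
   length, and training takes the fraction train_frac of a frame. */
typedef struct {
    double train_start;   /* start of the training window for this cycle */
    double test_start;    /* out-of-sample segment begins here ... */
    double test_end;      /* ... and ends here */
} WfoCycle;

/* Layout of cycle k (0-based). */
WfoCycle wfo_cycle(double total, int cycles, double train_frac, int k)
{
    /* total = train_len + cycles * test_len, with train_len = frame * train_frac */
    double frame = total / (train_frac + cycles * (1.0 - train_frac));
    double train_len = frame * train_frac;
    double test_len = frame - train_len;
    WfoCycle c;
    c.test_start = train_len + k * test_len;
    c.test_end = c.test_start + test_len;
    c.train_start = c.test_start - train_len;
    return c;
}
```

<p>With a 10-year period and a hypothetical training fraction of 85%, the first test segment starts roughly 3.6 years in with 10 cycles, but roughly 3.9 years in with 9 cycles. Fewer cycles therefore mean a later start of the test period, as in the placebo backtest, while more cycles mean shorter test segments and more frequent re-training.</p>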
<p>WFO profiles give insight into WFO cycle dependency, but not into randomness or overfitting by other means. For this, more in-depth tests are required. Zorro supports two methods, the Montecarlo Reality Check (MRC) with randomized price curves, and <a href="https://financial-hacker.com/whites-reality-check/">White&#8217;s Reality Check</a> (WRC) with detrended and bootstrapped equity curves of strategy variants. Both methods have their advantages and disadvantages. But since strategy variants from optimizing can only be created without walk-forward analysis, we&#8217;re using the MRC here.</p>
<h3><strong>The Montecarlo Reality Check<br />
</strong></h3>
<p>First we test both systems with random price curves. Randomizing removes short-term price correlations and market inefficiencies, but keeps the long-term trend. Then we compare our original backtest result with the randomized results. This yields a <strong>p-value</strong>: the probability that randomness alone would have produced a better result than our backtest. The lower the p-value, the more confidence we can have in the backtest result. In statistics, a result is normally considered significant when its p-value is below 5%.</p>
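<p>As a minimal C sketch of this comparison, assuming the performance metric of the original and of each randomized backtest is already available, the p-value is just the percentage of randomized results that beat the original. The significance classes use the same thresholds as the MRC script in this article; the function names are hypothetical.</p>

```c
/* Percentage of randomized results that are better than the original. */
double p_value_percent(double original, const double *randomized, int n)
{
    int better = 0;
    for (int i = 0; i < n; i++)
        if (randomized[i] > original)
            better++;
    return 100.0 * better / n;
}

/* Verbal classification with the MRC script's thresholds (in percent). */
const char *significance(double p_percent)
{
    if (p_percent <= 1)
        return "highly significant";
    if (p_percent <= 5)
        return "significant";
    if (p_percent <= 15)
        return "maybe significant";
    return "statistically insignificant";
}
```

<p>For instance, if 2 of 4 randomized backtests beat the original profit factor, the p-value is 50%, which is statistically insignificant.</p>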
<p>The basic algorithm of the Montecarlo Reality Check (MRC):</p>
<ol>
<li>Train your system and run a backtest. Store the profit factor (or any other <a href="https://zorro-project.com/manual/en/performance.htm" target="_blank" rel="noopener">performance metric</a> that you want to compare).</li>
<li><a href="https://zorro-project.com/manual/en/detrend.htm" target="_blank" rel="noopener">Randomize</a> the price curve by randomly swapping price changes (shuffle without replacement).</li>
<li>Train your system again with the randomized data and run a backtest. Store the performance metric.</li>
<li>Repeat steps 2 and 3 1000 times.</li>
<li>Determine the number N of randomized tests that have a better result than the original test. The p-value is N/1000.</li>
</ol>
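<p>Step 2 can be sketched in C as follows. This is a hypothetical illustration of shuffling without replacement, not Zorro&#8217;s <strong>SHUFFLE</strong> implementation: the bar-to-bar price changes are permuted with a Fisher-Yates shuffle and added back up from the original first price. Since only their order changes, the last price stays the same, which is why the randomized curves keep the long-term trend. A fixed seed makes the shuffle reproducible, similar in spirit to the <strong>seed(12345)</strong> call in the MRC script.</p>

```c
/* Hypothetical sketch of MRC step 2 (not Zorro's SHUFFLE code). */
static unsigned long shuffle_state;

static double shuffle_uniform(void)          /* LCG, value in [0, 1) */
{
    shuffle_state = (shuffle_state * 1103515245UL + 12345UL) % 2147483648UL;
    return (double)shuffle_state / 2147483648.0;
}

/* Writes a shuffled version of the n prices in `in` to `out`. */
void shuffle_prices(const double *in, double *out, int n, unsigned long seed)
{
    shuffle_state = seed;                    /* same seed, same shuffle */
    if (n < 1)
        return;
    out[0] = in[0];                          /* keep the first price */
    for (int i = 1; i < n; i++)
        out[i] = in[i] - in[i - 1];          /* out[1..n-1] = price changes */
    for (int i = n - 1; i > 1; i--) {        /* Fisher-Yates over out[1..n-1] */
        int j = 1 + (int)(i * shuffle_uniform());   /* j in [1, i] */
        double t = out[i];
        out[i] = out[j];
        out[j] = t;
    }
    for (int i = 1; i < n; i++)
        out[i] += out[i - 1];                /* cumulate changes back into prices */
}
```

<p>Shuffling with replacement (bootstrapping) would instead draw changes at random with repetitions, so the end price and the overall trend would no longer be preserved exactly.</p>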
<p>If our backtest result was affected by an overall upwards trending price curve, which is certainly the case for this SPY system, the randomized tests will be likewise affected. The MRC code:</p>
<pre class="prettyprint">#define run strategy
#include "trend.c" // &lt;= your script
#undef run
#define CYCLES 1000

function run()
{
  set(PRELOAD,TESTNOW);
  NumTotalCycles = CYCLES;
  if(TotalCycle == 1) // first cycle = original
    seed(12345); // always same random sequence
  else
    Detrend = SHUFFLE;
  strategy();
  set(LOGFILE|OFF); // don't export files
}

function evaluate()
{
  static var OriginalProfit, Probability;
  var PF = ifelse(LossTotal &gt; 0,WinTotal/LossTotal,10);
  if(TotalCycle == 1) {
    OriginalProfit = PF;
    Probability = 0;
  } else {
    if(PF &lt; 2*OriginalProfit) // clip image at double range
      plotHistogram("Random",PF,OriginalProfit/50,1,RED);
    if(PF &gt; OriginalProfit)
      Probability += 100./(NumTotalCycles-1); // percent of the shuffled runs (cycles 2..N)
  }
  if(TotalCycle == NumTotalCycles) { // last cycle
    plotHistogram("Original",
     OriginalProfit,OriginalProfit/50,sqrt(NumTotalCycles),BLACK);
    printf("\n-------------------------------------------");
    printf("\nP-Value %.1f%%",Probability);
    printf("\nResult is ");
    if(Probability &lt;= 1)
      printf("highly significant") ;
    else if(Probability &lt;= 5)
      printf("significant");
    else if(Probability &lt;= 15)
      printf("maybe significant");
    else
      printf("statistically insignificant");
    printf("\n-------------------------------------------");
  }
}</pre>
<p>This code sets up the Zorro platform to train and test the system 1000 times. The <a href="https://zorro-project.com/manual/en/random.htm" target="_blank" rel="noopener"><strong>seed</strong></a> setting ensures that you get the same result on any MRC run. From the second cycle on, the historical data is shuffled without replacement. For calculating the p-value and plotting a histogram of the MRC, we use the <strong>evaluate</strong> function again. It calculates the p-value by counting the backtests resulting in higher profit factors than the original system. Depending on the system, training and testing the strategy a thousand times will take several minutes with Zorro. The resulting MRC histogram of the trend following system:</p>
<p><img decoding="async" src="https://financial-hacker.com/wp-content/uploads/2022/04/040422_1441_Why90ofBack5.png" alt="" /></p>
<p>The height of a red bar represents the number of shuffled backtests that ended at the profit factor shown on the x axis. The black bar on the right (its height is irrelevant, only its x axis position matters) is the profit factor with the original price curve. We can see that most shuffled tests came out positive, due to the long-term upwards trend of the SPY price. But our test system came out even more positive. The p-value is below 1%, meaning that our backtest result is highly significant. This gives us some confidence that the simple trend follower can achieve a similar result in real trading.</p>
<p>This cannot be said from the MRC histogram of the placebo system:</p>
<p><img decoding="async" src="https://financial-hacker.com/wp-content/uploads/2022/04/040422_1441_Why90ofBack6.png" alt="" /></p>
<p>The backtest profit factors now extend over a wider range, and many were more profitable than the original system. The backtest with the real price curve is indistinguishable from the randomized tests, with a p-value in the 40% area. The original backtest result of the placebo system, even though achieved with walk-forward analysis, is therefore meaningless.</p>
<p>It should be mentioned that the MRC cannot detect all invalid backtests. A system that was explicitly fitted to a particular price curve, for instance by knowing its peaks and valleys in advance, would get a low p-value from the MRC. No reality check could distinguish such a system from a system with a real edge. Therefore, neither MRC nor WRC can give an absolute guarantee that a system works when it passes the check. But when it does not pass, you&#8217;re well advised not to trade it with real money.</p>
<p>I have uploaded the strategies to the 2022 script repository. The MRC and WFOProfile scripts are included in Zorro version 2.47.4 and above. You will need Zorro S for the brute force optimization of the placebo system.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://financial-hacker.com/why-90-of-backtests-fail/feed/</wfw:commentRss>
			<slash:comments>22</slash:comments>
		
		
			</item>
	</channel>
</rss>
