<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Zorro &#8211; The Financial Hacker</title>
	<atom:link href="https://financial-hacker.com/tag/zorro/feed/" rel="self" type="application/rss+xml" />
	<link>https://financial-hacker.com</link>
	<description>A new view on algorithmic trading</description>
	<lastBuildDate>Mon, 16 Feb 2026 16:53:44 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	

<image>
	<url>https://financial-hacker.com/wp-content/uploads/2017/07/cropped-mask-32x32.jpg</url>
	<title>Zorro &#8211; The Financial Hacker</title>
	<link>https://financial-hacker.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Build Better Strategies, Part 6: Evaluation</title>
		<link>https://financial-hacker.com/build-better-strategies-part-6-evaluation/</link>
					<comments>https://financial-hacker.com/build-better-strategies-part-6-evaluation/#comments</comments>
		
		<dc:creator><![CDATA[jcl]]></dc:creator>
		<pubDate>Thu, 05 Feb 2026 16:54:41 +0000</pubDate>
				<category><![CDATA[No Math]]></category>
		<category><![CDATA[System Development]]></category>
		<category><![CDATA[System Evaluation]]></category>
		<category><![CDATA[Experiment]]></category>
		<category><![CDATA[Indicator]]></category>
		<category><![CDATA[Walk forward analysis]]></category>
		<category><![CDATA[White's reality check]]></category>
		<category><![CDATA[Zorro]]></category>
		<guid isPermaLink="false">https://financial-hacker.com/?p=4901</guid>

					<description><![CDATA[Developing a successful strategy is a process with many steps, described in the Build Better Strategies article series. At some point you have coded a first, raw version of the strategy. At that stage you&#8217;re usually experimenting with different functions for market detection or trade signals. The problem: How can you determine which indicator, filter, &#8230; <a href="https://financial-hacker.com/build-better-strategies-part-6-evaluation/" class="more-link">Continue reading<span class="screen-reader-text"> "Build Better Strategies, Part 6: Evaluation"</span></a>]]></description>
										<content:encoded><![CDATA[<p>Developing a successful strategy is a process with many steps, described in the <a href="https://financial-hacker.com/build-better-strategies/">Build Better Strategies</a> article series. At some point you have coded a first, raw version of the strategy. At that stage you&#8217;re usually experimenting with different functions for market detection or trade signals. The problem: How can you determine which indicator, filter, or machine learning method works best with which markets and which time frames? Manually testing all combinations is very time consuming, close to impossible. Here&#8217;s a way to automate that process with a single mouse click.<span id="more-4901"></span></p>
<p>A robust trading strategy has to meet several criteria:</p>
<ul>
<li>It must exploit a real and significant market inefficiency. Random-walk markets cannot be algo traded.</li>
<li>It must work in all market situations. A trend follower must survive a mean reverting regime.</li>
<li>It must work under many different optimization settings and parameter ranges.</li>
<li>It must be unaffected by random events and price fluctuations.</li>
</ul>
<p>There are metrics and algorithms to test all this. The robustness under different market situations can be determined through the <strong>R2 coefficient</strong> or the deviations between the <strong>walk forward cycles</strong>. The parameter range robustness can be tested with a WFO profile (aka <strong>cluster analysis</strong>), the price fluctuation robustness with <strong><a href="https://financial-hacker.com/better-tests-with-oversampling/">oversampling</a></strong>. A <strong>Montecarlo analysis</strong> finds out whether the strategy is based on a real market inefficiency.</p>
<p>Some platforms, such as <strong>Zorro</strong>, have functions for all this. But they require dedicated code in the strategy, often more than for the algorithm itself. In this article I&#8217;m going to describe an <strong>evaluation framework</strong> &#8211; a &#8216;shell&#8217; &#8211; that skips the coding part. The evaluation shell is included in the latest Zorro version. It can be simply attached to any strategy script. It makes all strategy variables accessible in a panel and adds stuff that&#8217;s common to all strategies &#8211; optimization, money management, support for multiple assets and algos, cluster and montecarlo analysis. It evaluates all strategy variants in an automated process and builds the optimal portfolio of combinations from different algorithms, assets, and timeframes. </p>
<p>The process involves these steps: </p>
<p><img decoding="async" src="https://financial-hacker.com/wp-content/uploads/2026/02/eval.png" alt="" /></p>
<p>The first step of strategy evaluation is generating sets of parameter settings, named <strong><span class="tast">jobs</span></strong>. Any job is a variant of the strategy that you want to test and possibly include in the final portfolio. Parameters can be switches that select between different indicators, or variables (such as timeframes) with optimization ranges. All parameters can be edited in the user interface of the shell, then saved with a mouse click as a job. </p>
<p>The next step is an automated process that runs through all previously stored jobs, trains and tests any of them with different asset, algo, and time frame combinations, and stores their results in a <strong><span class="tast">summary</span></strong>. The summary is a CSV list with the performance metrics of all jobs. It is automatically sorted &#8211; the best performing job variants are at the top &#8211; and looks like this:</p>
<p><img decoding="async" src="https://financial-hacker.com/wp-content/uploads/2026/02/shellsummary.png" alt="" /></p>
<p>So you can see at a glance which parameter combinations work with which assets and time frames, and which are not worth examining further. You can repeat this step with different global settings, such as bar period or optimization method, and generate multiple summaries in this way. </p>
<p>The next step in the process is <strong><span class="tast">cluster analysis</span></strong>. Every job in a selected summary is optimized multiple times with different walk-forward settings. The result of each job variant is stored in WFO profiles or heatmaps:</p>
<p><img decoding="async" src="https://financial-hacker.com/wp-content/uploads/2026/02/RangerMatrix.png" alt="" /></p>
<p>After this process, you will likely end up with a couple of survivors at the top of the summary. The surviving jobs all have a positive return, a steadily rising equity curve, shallow drawdowns, and robust parameter ranges, since they passed the cluster analysis. But any selection process generates <strong><span class="tast">selection bias</span></strong>. Your perfect portfolio will likely produce a great backtest, but will it perform equally well in live trading? To find out, you run a <span class="tast"><strong>Montecarlo analysis</strong>, aka &#8216;Reality Check&#8217;</span>.</p>
<p><img decoding="async" src="https://financial-hacker.com/wp-content/uploads/2026/02/RealityCheck_s1.png" alt="" /></p>
<p>This is the most important test of all, since it can determine whether your strategy exploits a real market inefficiency. If the Montecarlo analysis fails with the final portfolio, it will likely also fail with any other parameter combination, so you need to run it only close to the end. If your system passes Montecarlo with a <strong><span class="tast">p-value</span></strong> below 5%, you can be relatively confident that the system will return good and steady profit in live trading. Otherwise, back to the drawing board.</p>
<h3>The use case</h3>
<p>For a real-life use case, we generated algorithms for the Z12 system that comes with Zorro. Z12 is a portfolio of several trend and counter-trend algorithms that all trade simultaneously. The trading signals are generated with spectral analysis filters. The system trades a subset of Forex pairs and index CFDs on a 4-hour timeframe. The timeframe was chosen for best performance, as were the traded Forex pairs and CFDs. </p>
<p>We used the evaluation shell to create new algos, not from a selected subset, but from all major Forex pairs and major index CFDs, with three different time frames of 60, 120, and 240 minutes. 29 algorithms passed the cluster and Montecarlo analysis; the least correlated of them were put into the final portfolio. This is the equity curve of the new Z12 system:</p>
<p><img decoding="async" src="https://financial-hacker.com/wp-content/uploads/2026/02/z12perf.png" alt="" /></p>
<p>Other performance parameters, such as Profit Factor, Sharpe ratio, Calmar Ratio, and R2 also improved by more than 30%. The annual return almost doubled, compared with the average of the previous years. Nothing in the basic Z12 algorithms has changed. Only new combinations of algos, assets, and timeframes are now traded.</p>
<p>The evaluation shell is included in Zorro version 3.01 or above. Usage and details are described under <a href="https://zorro-project.com/manual/en/shell.htm" target="_blank" rel="noopener">https://zorro-project.com/manual/en/shell.htm</a>.  Attaching the shell to a strategy is described under <a href="https://zorro-project.com/manual/en/shell2.htm" target="_blank" rel="noopener">https://zorro-project.com/manual/en/shell2.htm</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://financial-hacker.com/build-better-strategies-part-6-evaluation/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>Pimp your performance with key figures</title>
		<link>https://financial-hacker.com/pimp-your-performance-with-key-figures/</link>
					<comments>https://financial-hacker.com/pimp-your-performance-with-key-figures/#comments</comments>
		
		<dc:creator><![CDATA[jcl]]></dc:creator>
		<pubDate>Wed, 08 Jan 2025 13:34:40 +0000</pubDate>
				<category><![CDATA[No Math]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Money]]></category>
		<category><![CDATA[Performance]]></category>
		<category><![CDATA[Zorro]]></category>
		<guid isPermaLink="false">https://financial-hacker.com/?p=4782</guid>

					<description><![CDATA[Not all scripts we’re hired to write are trading strategies. Some are for data analysis or event prediction – for instance: Write me a script that calculates the likeliness of a stock market crash tomorrow. Some time ago a client ordered a script for improving the performance of their company. This remarkable script was very &#8230; <a href="https://financial-hacker.com/pimp-your-performance-with-key-figures/" class="more-link">Continue reading<span class="screen-reader-text"> "Pimp your performance with key figures"</span></a>]]></description>
										<content:encoded><![CDATA[
<p>Not all scripts we’re hired to write are trading strategies. Some are for data analysis or event prediction – for instance: <em>Write me a script that calculates the likelihood of a stock market crash tomorrow</em>. Some time ago a client ordered a script for improving the performance of their company. This remarkable script was very different from a trading system. Its algorithm can in fact improve companies, but also your personal performance. How does this work?<span id="more-4782"></span></p>



<p>Just like the performance of a stock, the performance of a company is measured by numerical indicators. They are named <strong>key figures</strong>. Key figures play a major role in quality management, such as the <strong>ISO 9000</strong> standards. An ISO 9000 key figure is a detail indicator, such as the number of faults in a production line, the number of bugs found in a beta test, the number of new clients acquired per day, the total of positive minus negative online reviews, and so on. Aside from being essential for an ISO 9000 certification, these key figures have two purposes:</p>


<ul class="wp-block-list">
	<li>They give <strong>detailed insight</strong> and expose strengths and weaknesses.</li>


	<li>And they give a <strong>strong motivation</strong> for reaching a certain goal.</li>
</ul>


<p>The script below opens a user interface for entering various sorts of key figures. It calculates an <strong>overall score</strong> that reflects the current performance of a company &#8211; or of a person &#8211; and displays it in a chart. If you use it not for a company, but for yourself, it helps improve your personal life. And the short script shows how to create relatively complex software with relatively few lines of code.</p>


<p>Hackers like the concept of key figures. They are plain numbers that you can work with. Anyone can define key figures for herself. If you’re a writer, an important key figure is the number of words you&#8217;ve written today; if you&#8217;re an alcoholic, it&#8217;s the number of drinks you had today. Many self-improvement books tell you precisely what you need to do for living a healthier, wealthier, happier life &#8211; but they all suffer from the same problem: <strong>long-term motivation</strong>. If you lack the iron will to keep up your daily exercises, reduce smoking, stay away from fast food, and so on &#8211; all good resolutions will eventually fall into oblivion. If you had <strong>resolutions for 2025</strong>, you&#8217;ll soon know what I mean.</p>


<p>Failure is less likely when you can observe your progress every day and see its immediate effect on your overall performance score. This score is a direct measure of your success in life. Whether you’re a company or a person, you want to keep this score rising. This feedback produces a strong motivation, every day anew.</p>


<figure class="wp-block-image"><img fetchpriority="high" decoding="async" width="701" height="405" class="wp-image-4783" src="https://financial-hacker.com/wp-content/uploads/2025/01/word-image-4782-1.png" alt="" srcset="https://financial-hacker.com/wp-content/uploads/2025/01/word-image-4782-1.png 701w, https://financial-hacker.com/wp-content/uploads/2025/01/word-image-4782-1-300x173.png 300w" sizes="(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 984px) 61vw, (max-width: 1362px) 45vw, 600px" /></figure>



<p><br />
The above performance chart is plotted by the key figure management script in C for the Zorro platform. The red line is the overall score, derived from all key figures in a way explained below. The blue line is the key figure for which you just entered a new value (in the example it&#8217;s the number of large or small features implemented in the last 6 months in the Zorro platform). The X axis is the date in YYMMDD format.</p>


<p>Of course, key figures can be very different. Some may have a daily goal, some not, some shall sum up over time, others (like your bank account value) are just taken as they are. The idea is that your overall score rises when you exceed the daily goals, and goes down otherwise. All key figures and their parameters can be freely defined in a CSV file, which can be edited with a text editor or with Excel. It looks like this:</p>


<div class="wp-block-group"><div class="wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained">
<div class="wp-block-group"><div class="wp-block-group__inner-container is-layout-constrained wp-block-group-is-layout-constrained">
<p><code>Name, Decimals, Unit, Offset, Growth, Sum<br />
Appr,1,0.1,0,0,0<br />
Praise,0,10,0,-30,1<br />
Weight,1,-0.1,-250,0.033,0<br />
Duck,1,0.5,0,-0.5,1<br />
Worth,0,1,-6000,0,0</code></p>
</div></div>
</div></div>


<p>In the first column you can assign a name. The second column is the number of decimals in the display, the third is their unit in the overall score, the fourth is an offset in the score, the fifth is the daily goal, and the last tells if the figures shall sum up over time (1) or not (0).</p>


<p>An example. Suppose you’re the president of a large country and want to pimp up your personal performance. What are your key figures? First, of course, the approval rate. Every tenth of a percent adds one point to your score. So the first entry is simple:</p>


<p><code>Name, Decimals, Unit, Offset, Growth, Sum </code><br />
<code>Appr, 1, 0.1, 0, 0, 0</code></p>


<p>Next, fame. Key figure is the daily number of praises on Fox News, OneAmerica, and Newsmax. You’ve ordered a White House department to count the praises; of course you’re personally counting them too, just in case. Less than 30 praises per day would be bad and reduce your score, more will improve it. So 30 praises are daily subtracted from your score. Any 10 further praises add one point. This is an accumulative key figure:</p>


<p><code>Praise, 0, 10, 0, -30, 1</code></p>


<p>Next, health. Key figure is weight. Your enemies spread rumors that you&#8217;re unfit and obese. Your doctors urge you to shed weight. So you want to lose one pound every month, which (you have your mathematicians for calculating difficult things) is about 0.033 per day. Any lost 0.1 pound adds one point to your score. The numbers are negative since you want your weight to go down, not up. The offset is your current weight.</p>


<p><code>Weight, 1, -0.1, -250, 0.033, 0</code></p>


<p>Next, literacy. Your enemies spread rumors you&#8217;re illiterate. To prove them wrong, you&#8217;ve decided to read at least half a page per day in a real book (you’ve chosen <a href="https://www.amazon.de/-/en/Doreen-Cronin/dp/0689863772">Duck for President</a> to begin with). Any further half page adds one point to your score. This is also an accumulative figure.</p>


<p><code>Duck, 1, 0.5, 0, -0.5, 1</code></p>


<p>Finally, net worth. You’ve meanwhile learned that it&#8217;s better to avoid business ventures. Let your net worth grow through the value increase of your inherited real estate, which is currently at 6 billion. Every further million of growth adds one point to your score (numbers given in millions):</p>


<p><code>Worth, 0, 1, -6000, 0, 0</code></p>


<p>For improving your personal performance, download the script from the 2025 repository. Copy the files <strong>KeyFigures.csv</strong> and <strong>KeyFigures.c</strong> into your Strategy folder. Edit <strong>KeyFigures.csv</strong> to enter your personal key figures, as in the above example (you can later add or remove key figures and use Excel to add or remove the corresponding columns in the data file). This is the script:</p>
<pre class="prettyprint">// Pimp Your Performance with Key Figures //////////////////////

string Rules = "Strategy\\KeyFigures.csv";
string Data = "Data\\KeyData.csv"; // key figures history
string Format = "0%d.%m.%Y,f1,f,f,f,f,f,f,f,f,f,f,f,f,f,f";
int Records,Fields;

var value(int Record,int Field,int Raw)
{
	var Units = dataVar(1,Field,2);
	var Offset = dataVar(1,Field,3);
	var Growth = dataVar(1,Field,4);
	var Value = 0;
	int i;
	for(i=0; i&lt;=Record; i++) {
		if(dataVar(2,i,Field+1) &lt; 0.) continue; // ignore negative entries
		Value += Growth;
		if(i == Record || (dataInt(1,Field,5)&amp;1)) // Sum up? 
			Value += dataVar(2,i,Field+1)+Offset;
	}
	if(Raw) return Value-Offset;
	else return Value/Units;
}

var score(int Record)
{
	int i,Score = 0;
	for(i=0; i&lt;Fields; i++) 
		Score += value(Record,i,0);
	panelSet(Record+1,Fields+1,sftoa(Score,0),YELLOW,16,4);
	return Score;
}


void click(int Row,int Col)
{
	dataSet(2,Row-1,Col,atof(panelGet(Row,Col)));
	score(Row-1);
	if(dataSaveCSV(2,Format,Data)) sound("Click.wav");
	int i;
	for(i=0; i&lt;Records; i++) {
		var X = ymd(dataVar(2,i,0)) - 20000000;
		plotBar("Score",i,X,score(i),LINE,RED);
		plotBar(dataStr(1,Col-1,0),i,NIL,value(i,Col-1,1),AXIS2,BLUE);
	}
	if(Records &gt;= 2) plotChart("");
}

void main()
{
	int i = 0, j = 0;
	printf("Today is %s",strdate("%A, %d.%m.%Y",NOW));
	ignore(62);
	PlotLabels = 5;
// File 1: Rules
	Fields = dataParse(1,"ssss,f1,f,f,f,i",Rules);
// File 2: Content
	Records = dataParse(2,Format,Data);
	int LastDate = dataVar(2,Records-1,0);
	int Today = wdate(NOW);
	if(LastDate &lt; Today) { // no file or add new line
		dataAppendRow(2,16);
		for(i=1; i&lt;=Fields; i++)
			if(!(dataInt(1,i-1,5)&amp;1))
				dataSet(2,Records,i,dataVar(2,Records-1,i));
		Records++;
	}
	dataSet(2,Records-1,0,(var)Today);

// display in panel
	panel(Records+1,Fields+2,GREY,-58);
	panelFix(1,0);
	print(TO_PANEL,"Key Figures");
	for(i=0; i&lt;Fields; i++)
		panelSet(0,i+1,dataStr(1,i,0),ColorPanel[0],16,1);
	panelSet(0,i+1,"Score",ColorPanel[0],16,1);
	panelSet(0,0,"Date",ColorPanel[0],16,1);
	for(j=0; j&lt;Records; j++) {
		panelSet(j+1,0,strdate("%d.%m.%y",dataVar(2,j,0)),ColorPanel[0],0,1);
		score(j);
		for(i=0; i&lt;Fields; i++) 
			panelSet(j+1,i+1,sftoa(dataVar(2,j,i+1),-dataVar(1,i,1)),ColorPanel[2],0,2);
	}
	panelSet(-1,0,"Rules",0,0,0);
}</pre>
<p>The file locations and the CSV format of the key figures history are defined at the beginning. The <strong>value</strong> function calculates the contribution of a particular key figure to the overall score. The <strong>score</strong> function updates the overall score. The <strong>click</strong> function, which is called when you enter a new value, calculates the score of that day, updates the spreadsheet, and plots the chart. The <strong>main</strong> function imports the data and key figures from their CSV files into datasets, prints the current day, and displays a spreadsheet of your key figures and score history, like this:</p>


<figure class="wp-block-image"><img decoding="async" width="484" height="163" class="wp-image-4784" src="https://financial-hacker.com/wp-content/uploads/2025/01/ein-bild-das-text-screenshot-schrift-zahl-enth.png" alt="Ein Bild, das Text, Screenshot, Schrift, Zahl enthält.

Automatisch generierte Beschreibung" srcset="https://financial-hacker.com/wp-content/uploads/2025/01/ein-bild-das-text-screenshot-schrift-zahl-enth.png 484w, https://financial-hacker.com/wp-content/uploads/2025/01/ein-bild-das-text-screenshot-schrift-zahl-enth-300x101.png 300w" sizes="(max-width: 484px) 85vw, 484px" /> </figure>



<p> <br />
You will need <a href="https://zorro-project.com/download.php" target="_blank" rel="noopener">Zorro S</a> because the spreadsheet function is not available in the free version. Start the script any morning. It will open the spreadsheet, where you can click on any of the white fields and enter a new key figure value for today. You can enter new figures at any time, for today or for past days. At any entry the score is calculated and &#8211; if the history spans more than 2 days – a chart is plotted as in the above example.</p>


<p>Normally, personal performance depends on about 5-10 key figures (the maximum is 15). For instance, miles you’ve jogged today, steps walked, exercises done, pages read, words written, words learned in a new language, value of your bank account, value of your stock portfolio, burgers eaten, cigarettes smoked, enemies killed, or number of old ladies you helped crossing the street. If you’re a president, consider the script a free present (we hope for generous tax exemptions in exchange). If you’re an ISO 9000 certified company and want to use the script for your quality management, please contact oP group to pay your fee. For personal use, the script is free. Pimp your performance and make the world a better place!</p>
]]></content:encoded>
					
					<wfw:commentRss>https://financial-hacker.com/pimp-your-performance-with-key-figures/feed/</wfw:commentRss>
			<slash:comments>2</slash:comments>
		
		
			</item>
		<item>
		<title>Better Tests with Oversampling</title>
		<link>https://financial-hacker.com/better-tests-with-oversampling/</link>
					<comments>https://financial-hacker.com/better-tests-with-oversampling/#comments</comments>
		
		<dc:creator><![CDATA[jcl]]></dc:creator>
		<pubDate>Mon, 23 Nov 2015 14:14:52 +0000</pubDate>
				<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[System Development]]></category>
		<category><![CDATA[System Evaluation]]></category>
		<category><![CDATA[Price action]]></category>
		<category><![CDATA[Time series oversampling]]></category>
		<category><![CDATA[Walk forward analysis]]></category>
		<category><![CDATA[Zorro]]></category>
		<guid isPermaLink="false">http://www.financial-hacker.com/?p=1055</guid>

					<description><![CDATA[The more data you use for testing or training your strategy, the less bias will affect the test result and the more accurate will be the training. The problem: price data is always in short supply. Even shorter since you must put aside some part for out-of-sample tests. Extending the test or training period far &#8230; <a href="https://financial-hacker.com/better-tests-with-oversampling/" class="more-link">Continue reading<span class="screen-reader-text"> "Better Tests with Oversampling"</span></a>]]></description>
										<content:encoded><![CDATA[<p>The more data you use for testing or training your strategy, the less bias will affect the test result and the more accurate will be the training. The problem: <strong>price data is always in short supply</strong>. Even shorter since you must put aside some part for out-of-sample tests. Extending the test or training period far into the past is not always a solution. The markets of the 1990s or 1980s were very different from today, so their price data can cause misleading results.<br />
   In this article I&#8217;ll describe a simple method to <strong>produce more trades</strong> for testing and training from the same amount of price data. As a side effect, you&#8217;ll get an additional metric for the robustness of your strategy. <span id="more-1055"></span></p>
<p>The price curve is normally divided into equal sections, named <strong>bars</strong>. Any bar has an associated <strong>candle</strong> with an open, close, high, low, and average price which are used by the system for detecting patterns and generating trade signals. When the raw price data has a higher time resolution than a bar, which is normally the case, the candle prices are sampled like this:</p>
<p><span class="wp-katex-eq" data-display="false">Open ~=~ y_{t-dt} </span><br />
<span class="wp-katex-eq" data-display="false">High ~=~ max(y_{t-dt}~...~y_t)</span><br />
<span class="wp-katex-eq" data-display="false">Low ~=~ min(y_{t-dt}~...~y_t)</span><br />
<span class="wp-katex-eq" data-display="false">Close ~=~ y_t</span><br />
<span class="wp-katex-eq" data-display="false">Avg ~=~ 1/n \sum_{t-dt}^{t}{y_i}</span></p>
<p>where <em><strong>y<sub>t</sub></strong></em> is the raw price tick at time <em><strong>t</strong></em>, and <em><strong>dt</strong></em> is the bar period. If we now subdivide the bar period into <em><strong>m</strong></em> partitions and resample the bars with the time shifted by <em><strong>dt/m</strong></em>, we can produce <em><strong>m</strong></em> slightly different price curves from the same high resolution curve:</p>
<p><span class="wp-katex-eq" data-display="false">Open_j ~=~ y_{t-j/m dt-dt} </span><br />
<span class="wp-katex-eq" data-display="false">High_j ~=~ max(y_{t-j/m dt-dt}~...~y_{t-j/m dt})</span><br />
<span class="wp-katex-eq" data-display="false">Low_j ~=~ min(y_{t-j/m dt-dt}~...~y_{t-j/m dt})</span><br />
<span class="wp-katex-eq" data-display="false">Close_j ~=~ y_{t-j/m dt}</span><br />
<span class="wp-katex-eq" data-display="false">Avg_j ~=~ 1/n \sum_{t-j/m dt-dt}^{t-j/m dt}{y_i}</span></p>
<p><em><strong>m</strong></em> is the oversampling factor. The price curve index <em><strong>j</strong></em> runs from <em><strong>0</strong></em> to <em><strong>m-1</strong></em>. Any curve <em><strong>j</strong></em> obviously has the same statistical properties as the original price curve, but slightly different candles. Some candles can even be extremely different in volatile market situations. Testing a strategy will thus normally produce a different result on any curve.</p>
<p><strong>Time series oversampling</strong> can also be used with price-based bars, such as Renko bars or range bars, although the above equations then change to the composition formula of the specific bar type. I found that the profit factors of strategies can differ by up to 30% between oversampled price curves. A large variance of results hints that something with the strategy may be wrong &#8211; maybe it&#8217;s too sensitive to randomness and thus subject to improvement. A strategy that produces large losses on some curves should better be discarded, even if the overall result is positive. And you can safely assume that live trading results are best represented by the worst of all the oversampled price curves.</p>
<h3>Price action example</h3>
<p>Time series oversampling is supported by the <a href="http://www.financial-hacker.com/hackers-tools-zorro-and-r/">Zorro platform</a>. This allows us to quickly check its pros and cons with example strategies. We&#8217;ll look into a simple price action strategy with candle patterns. This is a strategy of the <a href="http://www.financial-hacker.com/build-better-strategies/">data mining flavor</a>. It is not based on a market model, since no good model can explain a predictive power of candle patterns (if you know one, please let me know too!). This trading method therefore has an irrational touch, although it&#8217;s said to have worked for Japanese rice traders 300 years ago, maybe due to trading habits or behavior patterns of large market participants. Whatever the reason: while trading the old rice candle patterns in today&#8217;s markets cannot really be recommended, tests indeed hint at a <strong>weak and short-lived predictive power</strong> of 3-candle patterns in some currency pairs, such as EUR/USD. The emphasis is on short-lived: Trading habits change and thus predictive candle patterns expire within a few years, while new patterns emerge. </p>
<p>Here&#8217;s the Zorro script of such a strategy. In the training run it generates trading rules with 3-candle patterns that preceded profitable trades. In testing and live trading, a position is opened whenever the generated rule detects such a potentially profitable pattern. A walk forward test is used for curve fitting prevention, which is mandatory for all data mining systems:</p>
<p><!--?prettify linenums=true?--></p>
<pre class="prettyprint">function run()
{
  BarPeriod = 60; // 1-hour bars
  set(RULES+ALLCYCLES);
  NumYears = 10;
  NumWFOCycles = 10;

  if(Train) {
    Hedge = 2;	  // allow simultaneous long + short trades 
    Detrend = TRADES; // detrend on trade level
    MaxLong = MaxShort = 0; // no limit
  } else {
    Hedge = 1;	// long trade closes short and vice versa
    Detrend = 0;
    MaxLong = MaxShort = 1; // only 1 open position	
  }
	
  LifeTime = 3; // 3 hours trade time
  if(between(lhour(CET),9,13))  // European business hours
  {
    if(adviseLong(PATTERN+FAST+2,0, // train patterns with trade results
      priceHigh(2),priceLow(2),priceClose(2),
      priceHigh(1),priceLow(1),priceClose(1),
      priceHigh(1),priceLow(1),priceClose(1), // middle candle repeated: +2 splits the signals in 2 groups
      priceHigh(0),priceLow(0),priceClose(0)) &gt; 50)
        enterLong();	
			
    if(adviseShort(PATTERN+FAST+2) &gt; 50)
      enterShort();
  }
}</pre>
<p>The core of the script is the adviseLong/adviseShort call, Zorro&#8217;s machine learning function (details are better explained in the <a href="http://manual.zorro-project.com/tutorial_pre.htm" target="_blank" rel="noopener noreferrer">Zorro tutorial</a>). The function is fed with patterns of 3 candles; the high, low, and close prices of adjacent candles are compared with each other (the open price is not used as it&#8217;s identical to the previous close in 24-hour traded assets). The training target is the return of a 3-hour trade after the appearance of a pattern. We&#8217;re using a 3-hour trade time because the patterns consist of 3 bars, and it makes some sense to have a prediction horizon similar to the pattern length. Since we&#8217;re trading EUR/USD, we&#8217;re limiting the trades to European business hours. So the last trade must be entered at 13:00 so that it is closed at 16:00.</p>
<p>But when we train and test the above script with EUR/USD, we get no profitable strategy &#8211; at least not with realistic trading costs (an FXCM microlot account is simulated by default):</p>
<p><figure id="attachment_1139" aria-describedby="caption-attachment-1139" style="width: 879px" class="wp-caption alignnone"><a href="http://www.financial-hacker.com/wp-content/uploads/2015/11/priceaction2.png"><img decoding="async" class="wp-image-1139 size-full" src="http://www.financial-hacker.com/wp-content/uploads/2015/11/priceaction2.png" alt="" width="879" height="321" srcset="https://financial-hacker.com/wp-content/uploads/2015/11/priceaction2.png 879w, https://financial-hacker.com/wp-content/uploads/2015/11/priceaction2-300x110.png 300w" sizes="(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px" /></a><figcaption id="caption-attachment-1139" class="wp-caption-text">Price action without oversampling, P&amp;L curve</figcaption></figure></p>
<p>We can see that the script seems to enter trades mostly at random, so the equity drops continuously at about the rate of the trading costs. The script performs a 10-year walk-forward test in 10 cycles. The default training/test split is 85%, so each test period is about 9 months, after a training period of 4 years. 4 years are roughly equivalent to 4*250*24 = 24,000 patterns to check. That&#8217;s apparently not enough to significantly distinguish profitable from random patterns.</p>
<p>The problem: <strong>We cannot simply extend the training time.</strong> When we do, we&#8217;ll find that the result does not get better. The reason is the limited pattern lifetime: it makes no sense to train past the half-life of the found patterns. So this is not the solution. But what happens when we train and test the same strategy with 4-fold oversampling?<br />
<!--?prettify linenums=true?--></p>
<pre class="prettyprint">NumSampleCycles = 4;</pre>
<p>When we add this line to the script, the training process gets four times more patterns. Although many of them are similar, the amount of data is now enough to distinguish profitable from random patterns with some accuracy. We can see this in the now positive P&amp;L curve over the likewise extended test periods:</p>
<p><figure id="attachment_1117" aria-describedby="caption-attachment-1117" style="width: 879px" class="wp-caption alignnone"><a href="http://www.financial-hacker.com/wp-content/uploads/2015/11/priceaction.png"><img loading="lazy" decoding="async" class="wp-image-1117 size-full" src="http://www.financial-hacker.com/wp-content/uploads/2015/11/priceaction.png" alt="" width="879" height="321" srcset="https://financial-hacker.com/wp-content/uploads/2015/11/priceaction.png 879w, https://financial-hacker.com/wp-content/uploads/2015/11/priceaction-300x110.png 300w" sizes="auto, (max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px" /></a><figcaption id="caption-attachment-1117" class="wp-caption-text">Price action with 4-fold oversampling, P&amp;L curve</figcaption></figure></p>
<p>You&#8217;ll also now get an additional section in the performance report, like this:</p>
<pre><code>Sample Cycles    Best    Worst    Avg  StdDev
Net Profit       5362$   4001$   4730$   4094$
Profit Factor     1.58    1.45    1.51    0.05
Num Trades        1256    1273    1240
Win Rate           41%     39%     40%</code></pre>
<p>Large deviations between the sample cycles will tell you that your strategy is unstable against random price curve fluctuations.</p>
<p>I&#8217;ve added the above script to the 2015 repository. But although it generates some profit, be aware that it&#8217;s for demonstration only and not of &#8216;industrial quality&#8217;. Sharpe ratio and R2 are not good, drawdowns are long, and essential ingredients such as stops, trailing, money management, portfolio diversification, filters, and data mining bias (DMB) measurement are not included. So you&#8217;d better not trade it live.</p>
<h3>Conclusion</h3>
<p>Admittedly, the price action system is a drastic and somewhat dubious example of the benefits of oversampling. But I found that 4-fold or 6-fold oversampling improves the optimization and training of almost all strategies, and also increases the quality of backtests by making them less susceptible to extreme candles and outliers. </p>
<p>Oversampling is certainly not a one-size-fits-all solution. It will not work when the system relies on a specific time for opening and closing trades, as in gap trading or systems based on daily bars. And it does not help either when single candles have little effect on the result, for instance when trade signals are generated from moving averages with very long time periods. But in most cases it noticeably improves testing and trading.</p>
<p>&nbsp;</p>
]]></content:encoded>
					
					<wfw:commentRss>https://financial-hacker.com/better-tests-with-oversampling/feed/</wfw:commentRss>
			<slash:comments>13</slash:comments>
		
		
			</item>
		<item>
		<title>The Cold Blood Index</title>
		<link>https://financial-hacker.com/the-cold-blood-index/</link>
					<comments>https://financial-hacker.com/the-cold-blood-index/#comments</comments>
		
		<dc:creator><![CDATA[jcl]]></dc:creator>
		<pubDate>Mon, 26 Oct 2015 12:50:51 +0000</pubDate>
				<category><![CDATA[3 Most Useful]]></category>
		<category><![CDATA[Indicators]]></category>
		<category><![CDATA[System Evaluation]]></category>
		<category><![CDATA[Cold Blood Index]]></category>
		<category><![CDATA[Data mining bias]]></category>
		<category><![CDATA[Drawdown]]></category>
		<category><![CDATA[Grid trading]]></category>
		<category><![CDATA[Zorro]]></category>
		<guid isPermaLink="false">http://www.financial-hacker.com/?p=83</guid>

					<description><![CDATA[You&#8217;ve developed a new trading system. All tests produced impressive results. So you started it live. And are down by $2000 after 2 months. Or you have a strategy that worked for 2 years, but recently went into a seemingly endless drawdown. Such situations are all too familiar to any algo trader. What now? Carry on in cold blood, &#8230; <a href="https://financial-hacker.com/the-cold-blood-index/" class="more-link">Continue reading<span class="screen-reader-text"> "The Cold Blood Index"</span></a>]]></description>
										<content:encoded><![CDATA[<p>You&#8217;ve developed a new trading system. All tests produced impressive results. So you started it live. And are down by $2000 after 2 months. Or you have a strategy that worked for 2 years, but recently went into a seemingly endless drawdown. Such situations are all too familiar to any algo trader. What now? <strong>Carry on in cold blood, or pull the brakes in panic?</strong> <br />     Several reasons can cause a strategy to lose money right from the start. It can have already <strong>expired</strong> because the market inefficiency has disappeared. Or the system is worthless and the test falsified by some <strong>bias</strong> that survived all reality checks. Or it&#8217;s a <strong>normal drawdown</strong> that you just have to sit out. In this article I propose an algorithm for deciding very early whether or not to abandon a system in such a situation.<span id="more-83"></span></p>
<p>When you start a trading strategy, you&#8217;re almost always under water for some time. This is a normal consequence of <strong>equity curve volatility</strong>. It is the very reason why you need initial capital at all for trading (aside from covering margins and transaction costs). Here you can see the typical bumpy start of a trading system:</p>
<p><figure id="attachment_252" aria-describedby="caption-attachment-252" style="width: 735px" class="wp-caption alignnone"><img loading="lazy" decoding="async" class="wp-image-252 size-full" src="http://www.financial-hacker.com/wp-content/uploads/2015/09/z5zulu3.png" alt="z5zulu3" width="735" height="323" srcset="https://financial-hacker.com/wp-content/uploads/2015/09/z5zulu3.png 735w, https://financial-hacker.com/wp-content/uploads/2015/09/z5zulu3-300x132.png 300w" sizes="auto, (max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 984px) 61vw, (max-width: 1362px) 45vw, 600px" /><figcaption id="caption-attachment-252" class="wp-caption-text">CHF grid trader, initial live equity curve</figcaption></figure></p>
<p>You can estimate from the live equity curve that this system was rather profitable (it was a grid trader exploiting the CHF price cap). It started in July 2013 and had earned about 750 pips by January 2014, 7 months later. Max drawdown was ~400 pips from September until November. So the raw return of that system was about 750/400 ~= 180%. Normally an excellent value for a trading system. But you can also see from the curve that you were down 200 pips about six weeks into trading, and thus had lost almost half of your minimum initial capital. And if you had started the system in September, you would even have stayed under water for more than 3 months! This is a psychologically difficult situation. Many traders panic, pull out, and this way <strong>lose money even with highly profitable systems</strong>. Algo trading unaffected by emotions? Not true.</p>
<h3>Not so out of sample</h3>
<p>The basic problem: you can never fully trust your test results. No matter how out-of-sample you test it, a strategy still suffers from a certain amount of <strong>Data-Snooping Bias</strong>. The standard method of measuring bias &#8211; <strong><a href="http://www.financial-hacker.com/whites-reality-check/">White&#8217;s Reality Check</a></strong> &#8211; works well for simple mechanically generated systems, as in the <strong><a href="http://www.financial-hacker.com/trend-and-exploiting-it/">Trend Experiment</a></strong>. But all human decisions about algorithms, asset selection, filters, training targets, stop/take-profit mechanisms, WFO windows, money management and so on add new bias, since they are normally affected by testing. The out-of-sample data is then not so out-of-sample anymore. While the bias by training or optimization can be measured and even eliminated with walk forward methods, the <strong>bias introduced by the mere development process is unknown</strong>. The strategy might still be profitable, or not anymore, or not at all. You can only find out by permanently comparing live results with test results.</p>
<p>You could do that with no risk by trading on a demo account. But if the system is really profitable, demo time is sacrificed profit and thus expensive. Often very expensive, as you must demo trade for a long time to get statistically significant results, and many strategies have a limited lifetime anyway. So you normally demo trade a system for only a few weeks to make sure that the script is bug-free, then go live with real money.</p>
<h3>Pull-out conditions</h3>
<p>The simplest method of comparing live results is based on the <strong>maximum drawdown</strong> in the test. This is the pull-out inequality:</p>
<p style="text-align: center;"><em><strong>[pmath size=18]E ~&lt;~ C + G t/y - D[/pmath]</strong></em></p>
<p><em><strong>E</strong></em> = Current account equity<br /> <em><strong>C</strong></em> = Initial account capital<br /> <em><strong>G</strong></em> = Test profit<br /> <em><strong>t</strong></em> = Live trading period<br /> <em><strong>y</strong></em> = Test period<br /> <em><strong>D</strong></em> = Test maximum drawdown</p>
<p>This formula means simply that you should pull out when the live trading drawdown exceeds the maximum drawdown from the test. Traders often check their live results this way, but there are many problems involved with this method:</p>
<ul style="list-style-type: square;">
<li>The maximum backtest drawdown is more or less random.</li>
<li>Drawdowns grow with the test period, thus longer test periods produce worse maximum drawdowns and later pull-out signals.</li>
<li>The drawdown time is not considered.</li>
<li>The method does not work when profits are reinvested by some money management algorithm.</li>
<li>The method does not consider the unlikeliness that the maximum drawdown happens already at live trading start.</li>
</ul>
<p>For those reasons, the above pull-out inequality is often modified to take drawdown length and growth into account. The maximum drawdown is then assumed to <strong>grow with the square root of time</strong>, leading to this modified formula:</p>
<p style="text-align: center;"><strong><em>[pmath size=18]E ~&lt;~ C + G t/y - D sqrt{{t+l}/y}[/pmath]</em></strong></p>
<p><em><strong>E</strong></em> = Current account equity<br /> <em><strong>C</strong></em> = Initial account capital<br /> <em><strong>G</strong></em> = Test profit<br /> <em><strong>t</strong></em> = Live trading period<br /> <b><i>y</i></b> = Test period<br /> <em><strong>D</strong></em> = Maximum drawdown depth<br /> <b>l</b> = Maximum drawdown length</p>
<p> This was in fact the algorithm that I often suggested to clients for supervising their live results. It puts the drawdown in relation to the test period and also considers the drawdown length, as the probability of being inside the worst drawdown right at live trading start is <em><strong>l/y</strong></em>. Still, the method does not work with a profit-reinvesting system. And it is dependent on the rather random test drawdown. You could address the latter issue by taking the drawdown from a Monte Carlo shuffled equity curve, but this produces new problems, since trading results often have serial correlation.</p>
<p>After this lengthy introduction for motivation, here&#8217;s the proposed algorithm that overcomes the mentioned issues.</p>
<h3>Keeping cold blood</h3>
<p>For finding out if we really must immediately stop a strategy, we calculate the deviation of the current live trading situation from the strategy behavior in the test. For this we do not use the maximum drawdown, but the backtest equity or balance curve:</p>
<ol>
<li>Determine a time window of length <em><strong>l </strong></em>(in days) that you want to check. It&#8217;s normally the length of the current drawdown; if your system is not in a drawdown, you&#8217;re probably in cold blood anyway. Determine the drawdown depth <em><strong>D</strong></em>,  i.e. the net loss during that time.</li>
<li>Place a time window of same size <em><strong>l </strong></em>at the start of the test balance curve.</li>
<li>Determine the balance difference <em><strong>G</strong></em> from end to start of the window. Increase a counter <em><strong>N</strong></em> when <em><strong>G &lt;= D</strong></em>. </li>
<li>Move the window forward by 1 day.</li>
<li>Repeat steps 3 and 4 until the window arrived at the end of the balance curve. Count the steps with a counter <em><strong>M</strong></em>.</li>
</ol>
<p>Any window movement takes a sample out of the curve. We have <em><strong>N</strong></em> samples that are similar or worse, and <em><strong>M-N</strong></em> samples that are better than the current trading situation. The probability to <strong>not</strong> encounter such a drawdown in <em><strong>T</strong></em> out of <em><strong>M</strong></em> samples is given by a simple combinatorial equation:</p>
<p style="text-align: center;"><em><strong>[pmath size=18]1-P ~=~ {(M-N)!(M-T)! }/ {M!(M-N-T)!}[/pmath]</strong></em></p>
<p><em><strong>N</strong></em> = Number of  <em><strong>G &lt;= D</strong></em> occurrences<br /> <em><strong>M</strong></em> = Total samples = <em><strong>y-l+1</strong></em><br /> <em><strong>l </strong></em>= Window length in days<em><strong><br /> </strong></em><em><strong>y</strong></em> = Test time in days<br /> <em><strong>T</strong></em> = Samples taken = <em><strong>t-l+1<br /> </strong><strong>t</strong></em> = Live trading time in days</p>
<p><em><strong>P</strong></em> is the <strong>cold blood index</strong> &#8211; the similarity of the live situation with the backtest. As long as <em><strong>P</strong></em> stays above 0.1 or 0.2, probably all is still fine. But if <em><strong>P</strong></em> is very low or zero, either the backtest was strongly biased or the market has significantly changed. The system can still be profitable, just less profitable than in the test. But when the current loss <em><strong>D</strong></em> is large in comparison to the gains so far, we should stop.</p>
<p>Often we want to calculate <strong>P</strong> soon after the beginning of live trading. The window size <strong><em>l</em> </strong>is then identical to our trading time <em><strong>t</strong></em>,<em><strong> </strong></em>hence <em><strong>T == 1</strong></em>. This simplifies the equation to: </p>
<p style="text-align: center;"><em><strong>[pmath size=18]P ~=~ N/M[/pmath]</strong></em></p>
<p>In such a situation I&#8217;d give up and pull out of a painful drawdown as soon as <em><strong>P</strong></em> drops below 5%.</p>
<p>The slight disadvantage of this method is that you must perform a backtest with the same capital allocation, and store its balance or equity curve in a file for later evaluation during live trading. However, this should take only a few lines of code in a strategy script. </p>
<p>Here&#8217;s a small example script for Zorro that calculates <em><strong>P</strong></em> (in percent) from a stored balance curve when a trading time <strong>t</strong> and drawdown of length <em><strong>l</strong></em> and depth <em><strong>D</strong></em> is given:</p>
<pre>int TradeDays = 40;    <em>// t, Days since live start</em>
int DrawDownDays = 30; <em>// l, Days since you're in drawdown</em>
var DrawDown = 100;    <em>// D, Current drawdown depth in $</em>

string BalanceFile = "Log\\BalanceDaily.dbl"; <em>// stored double array</em>

var logsum(int n)
{
  if(n &lt;= 1) return 0;
  return log(n)+logsum(n-1);
}

void main()
{
  int CurveLength = file_length(BalanceFile)/sizeof(var);
  var *Balances = file_content(BalanceFile);

  int M = CurveLength - DrawDownDays + 1;
  int T = TradeDays - DrawDownDays + 1;
 
  if(T &lt; 1 || M &lt;= T) {
    printf("Not enough samples!");
    return;
  }
 
  var GMin=0., N=0.;
  int i=0;
  for(; i &lt; M-1; i++)
  {
    var G = Balances[i+DrawDownDays] - Balances[i];
    if(G &lt;= -DrawDown) N += 1.;
    if(G &lt; GMin) GMin = G;
  } 

  var P;
  if(TradeDays &gt; DrawDownDays)
    P = 1. - exp(logsum(M-N)+logsum(M-T)-logsum(M)-logsum(M-N-T));
  else
    P = N/M;

  printf("\nTest period: %i days",CurveLength);
  printf("\nWorst test drawdown: %.f",-GMin);
  printf("\nM: %i N: %i T: %i",M,(int)N,T);
  printf("\nCold Blood Index: %.1f%%",100*P);
}</pre>
<p>Since my computer is unfortunately not good enough for calculating the factorials of several thousand samples, I&#8217;ve summed up the logarithms instead &#8211; hence the strange <strong>logsum</strong> function in the script.</p>
<h3>Conclusion</h3>
<ul style="list-style-type: square;">
<li>Finding out early whether a live trading drawdown is &#8216;normal&#8217; or not can be essential for your wallet.</li>
<li>The backtest drawdown is a late and inaccurate criterion.</li>
<li>The Cold Blood Index calculates the precise probability of such a drawdown based on the backtest balance curve.</li>
</ul>
<p>I&#8217;ve added the script above to the 2015 scripts collection. I have also suggested to the Zorro developers that they implement this method for automatically analyzing drawdowns while live trading, and issue warnings when <em><strong>P</strong></em> gets dangerously low. This can also be done separately for the components of a portfolio system. This feature will probably appear in a future Zorro version. </p>
]]></content:encoded>
					
					<wfw:commentRss>https://financial-hacker.com/the-cold-blood-index/feed/</wfw:commentRss>
			<slash:comments>27</slash:comments>
		
		
			</item>
		<item>
		<title>Is &#8220;Scalping&#8221; Irrational?</title>
		<link>https://financial-hacker.com/is-scalping-irrational/</link>
					<comments>https://financial-hacker.com/is-scalping-irrational/#comments</comments>
		
		<dc:creator><![CDATA[jcl]]></dc:creator>
		<pubDate>Fri, 09 Oct 2015 16:45:41 +0000</pubDate>
				<category><![CDATA[Indicators]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[System Evaluation]]></category>
		<category><![CDATA[Entropy]]></category>
		<category><![CDATA[Experiment]]></category>
		<category><![CDATA[HFT]]></category>
		<category><![CDATA[Information]]></category>
		<category><![CDATA[Shannon]]></category>
		<category><![CDATA[Zorro]]></category>
		<guid isPermaLink="false">http://www.financial-hacker.com/?p=483</guid>

					<description><![CDATA[Clients often ask for strategies that trade on very short time frames. Some are possibly inspired by &#8220;I just made $2000 in 5 minutes&#8221; stories on trader forums. Others have heard of High Frequency Trading: the higher the frequency, the better must be the trading! The Zorro developers had been pestered for years until they &#8230; <a href="https://financial-hacker.com/is-scalping-irrational/" class="more-link">Continue reading<span class="screen-reader-text"> "Is &#8220;Scalping&#8221; Irrational?"</span></a>]]></description>
										<content:encoded><![CDATA[<p>Clients often ask for strategies that trade on <strong>very short time frames</strong>. Some are possibly inspired by &#8220;I just made $2000 in 5 minutes&#8221; stories on trader forums. Others have heard of <a href="http://www.financial-hacker.com/hacking-hft-systems/" target="_blank" rel="noopener"><strong>High Frequency Trading</strong></a>: the higher the frequency, the better must be the trading! The <strong><a href="http://www.financial-hacker.com/hackers-tools-zorro-and-r/">Zorro</a></strong> developers had been pestered for years until they finally implemented tick histories and millisecond time frames. <strong>Totally useless features?</strong> Or has short term algo trading indeed some quantifiable advantages? An experiment for looking into that matter produced a <strong>surprising result</strong>.<span id="more-483"></span></p>
<p>It is certainly tempting to earn profits within minutes. Additionally, short time frames produce more bars and trades &#8211; a great advantage for strategy development. The quality of test and training depends on the amount of data, and timely price data is always in short supply. Still, scalping &#8211; opening and closing trades in minutes or seconds &#8211; is largely considered nonsense and irrational by algo traders. Four main reasons are given:</p>
<ol>
<li>Short time frames cause high <strong>trading costs</strong> &#8211; slippage, spread, commission &#8211; in relation to the expected profit.</li>
<li>Short time frames expose more <strong>&#8216;noise&#8217;,</strong> <strong>&#8216;randomness&#8217;</strong> and <strong>&#8216;artifacts&#8217;</strong> in the price curve, which reduces profit and increases risk.</li>
<li>Any algorithm has to be individually adapted to the broker or price data provider due to <strong>price feed dependency</strong> in short time frames.</li>
<li>Algorithmic strategies usually <strong>cease working</strong> below a certain time frame.</li>
</ol>
<p>Higher costs, less profit, more risk, feed dependency, no working strategies &#8211; seemingly good arguments against scalping (HFT is a very different matter). But never trust common wisdom, especially not in trading. That&#8217;s why I had not yet added scalping to my <a href="http://www.financial-hacker.com/seventeen-popular-trade-strategies-that-i-dont-really-understand/">list of irrational trade methods</a>. I can confirm reasons number 3 and 4 from my own experiences: Below bar periods of about 10 minutes, backtests with price histories from different brokers began to produce noticeably different results. And I never managed to develop a strategy with a significantly positive walk-forward test on bar periods less than 30 minutes. But this does not mean that such a strategy does not exist. Maybe short time frames just need special trade methods?</p>
<p>So I&#8217;ve programmed an experiment for finding out once and for all if scalping is really as bad as it&#8217;s rumored to be. Then I can at least give some reasoned advice to the next client who desires a tick-triggered short-term trading strategy.</p>
<h3>Trading costs examined</h3>
<p>The first part of the experiment is easily done: a statistic of the impact of trading costs. Higher costs obviously require more profits for compensation. How many trades must you win for overcoming the trading costs at different time frames? Here&#8217;s a short script (in C, for Zorro) for answering this question:</p>
<pre class="prettyprint">function run()
{
  BarPeriod = 1;
  LookBack = 1440;
  Commission = 0.60;
  Spread = 0.5*PIP;

  int duration = 1, i = 0;
  if(!is(LOOKBACK))
    while(duration &lt;= 1440)
  { 
    var Return = abs(priceClose(0)-priceClose(duration))*PIPCost/PIP;
    var Cost = Commission*LotAmount/10000. + Spread*PIPCost/PIP;
    var Rate = ifelse(Return &gt; Cost, Cost/(2*Return) + 0.5, 1.);

    plotBar("Min Rate",i++,duration,100*Rate,AVG+BARS,RED); 
 
    if(duration &lt; 10) duration += 1;
    else if(duration &lt; 60) duration += 5;
    else if(duration &lt; 180) duration += 30;
    else duration += 60;
  }
  Bar += 100; // hack!
}</pre>
<p>This script calculates the minimum win rate needed to compensate the trade costs for different trade durations. We assume here a spread of <strong>0.5 pips</strong> and a round-turn commission of <strong>60 cents</strong> per 10,000 contracts &#8211; the average costs of a Forex trade. <strong>PIPCost/PIP</strong> in the above script is the conversion factor from a price difference to a win or loss on the account. We&#8217;re also assuming no win/loss bias: trades shall win or lose on average the same amount. This allows us to split the <strong>Return</strong> of any trade into a win and a loss, determined by <strong>WinRate</strong>. The win is <strong>WinRate * Return</strong> and the loss is <strong>(1-WinRate) * Return</strong>. For breaking even, the win minus the loss must cover the cost. The required win rate for this is</p>
<p style="padding-left: 30px; text-align: center;"><em><strong>WinRate = Cost/(2*Return) + 0.5</strong></em></p>
<p>The win rate is averaged over all bars and plotted in a histogram of trade durations from 1 minute up to 1 day. The duration is varied in steps of 1, 5, 30, and 60 minutes. We&#8217;re entering a trade for any duration every 101 minutes (<strong>Bar += 100</strong> in the script is a hack for running the simulation in steps of 101 minutes, while still maintaining the 1-minute bar period).</p>
<p>The script needs a few seconds to run, then produces this histogram (for EUR/USD and 2015):</p>
<p><figure id="attachment_524" aria-describedby="caption-attachment-524" style="width: 889px" class="wp-caption alignnone"><a href="http://www.financial-hacker.com/wp-content/uploads/2015/10/scalp11.png"><img loading="lazy" decoding="async" class="wp-image-524 size-full" src="http://www.financial-hacker.com/wp-content/uploads/2015/10/scalp11.png" alt="" width="889" height="513" srcset="https://financial-hacker.com/wp-content/uploads/2015/10/scalp11.png 889w, https://financial-hacker.com/wp-content/uploads/2015/10/scalp11-300x173.png 300w" sizes="auto, (max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px" /></a><figcaption id="caption-attachment-524" class="wp-caption-text">Required win rate in percent vs. trade duration in minutes</figcaption></figure></p>
<p>You need about a <strong>53% win rate</strong> to cover the costs of 1-day trades (rightmost bar), but a <strong>90% win rate </strong>for 1-minute trades! Or alternatively, a 9:1 reward-to-risk ratio at a 50% win rate. This exceeds the best performances of real trading systems by a large margin, and seems to convincingly confirm the first reason why you&#8217;d better take tales by scalping heroes on trader forums with a grain of salt.</p>
<p>But what about reason number two &#8211; that short time frames are plagued with &#8216;noise&#8217; and &#8216;randomness&#8217;? Or is it maybe the other way around and some effect makes short time frames even more predictable? That&#8217;s a little harder to test.</p>
<h3>Measuring randomness</h3>
<p>&#8216;Noise&#8217; is often identified with the high-frequency components of a signal. Naturally, short time frames produce more high-frequency components than long time frames. They could be detected with a highpass filter, or eliminated with a lowpass filter. Only problem: <strong>Price curve noise</strong> is not always related to high frequencies. Noise is just the part of the curve that does not carry information about the trading signal. For cycle trading, the high frequencies are the signal and the low-frequency trend is the noise. So the jaggies and ripples of a short time frame price curve might be just the very inefficiencies that you want to exploit. What counts as noise depends on the strategy; there is no &#8216;general price noise&#8217;.</p>
<p>Thus we need a better criterion for determining the tradeability of a price curve. That criterion is <strong>randomness</strong>. You cannot trade a random market, but you can potentially trade anything that deviates from randomness. Randomness can be measured through the <strong>information content</strong> of the price curve. A good measure of information content is the <strong>Shannon Entropy</strong>. It is defined this way:</p>
<p><a href="http://www.financial-hacker.com/wp-content/uploads/2015/10/shannon.png"><img loading="lazy" decoding="async" class="wp-image-529 size-full aligncenter" src="http://www.financial-hacker.com/wp-content/uploads/2015/10/shannon.png" alt="" width="260" height="47" /></a></p>
<p>This formula basically measures disorder. A very ordered, predictable signal has low entropy. A random, unpredictable signal has high entropy. In the formula, <em><strong>P(s<sub>i</sub>)</strong></em> is the relative frequency of a certain pattern <em><strong>s<sub>i </sub></strong></em>in the signal <em><strong>S</strong></em>. The entropy is at maximum when all patterns are evenly distributed and all <em><strong>P(s<sub>i</sub>)</strong></em> have about the same value. If some patterns appear more frequently than others, the entropy goes down. The signal is then less random and more predictable. The Shannon Entropy is measured in <strong>bits</strong>.</p>
<p>The problem: Zorro has tons of indicators, even the Shannon Gain, but not the Shannon Entropy! So I have no choice but to write a new indicator, which fortunately is my job anyway. This is the source code of the Shannon Entropy of a char string:</p>
<pre class="prettyprint">var ShannonEntropy(char *S,int Length)
{
  static var Hist[256];
  memset(Hist,0,256*sizeof(var));
  var Step = 1./Length;
  int i;
  for(i=0; i&lt;Length; i++) 
    Hist[(unsigned char)S[i]] += Step; // cast avoids a negative index if char is signed
  var H = 0;
  for(i=0;i&lt;256;i++) {
    if(Hist[i] &gt; 0.)
      H -= Hist[i]*log2(Hist[i]);
  }
  return H;
}</pre>
<p>A char has 8 bits, so 2<sup>8</sup> = 256 different chars can appear in a string. The frequency of each char is counted and stored in the <strong>Hist</strong> array. So this array contains the <em><strong>P(s<sub>i</sub>)</strong> </em>of the above entropy formula. Each is multiplied by its binary logarithm and summed up; the result is <em><strong>H(S)</strong></em>, the Shannon Entropy.</p>
<p>In the above code, a char is a pattern of the signal. So we need to convert our price curve into char patterns. This is done by a second <strong>ShannonEntropy</strong> function that calls the first one:</p>
<pre class="prettyprint">var ShannonEntropy(var *Data,int Length,int PatternSize)
{
  static char S[1024]; // hack!
  int i,j;
  int Size = min(Length-PatternSize-1,1024);
  for(i=0; i&lt;Size; i++) {
    int C = 0;
    for(j=0; j&lt;PatternSize; j++) {
    if(Data[i+j] &gt; Data[i+j+1])
      C += 1&lt;&lt;j;
    }
    S[i] = C;
  }
  return ShannonEntropy(S,Size);
}</pre>
<p><strong>PatternSize</strong> determines the partitioning of the price curve. A pattern is defined by a number of price changes. Each price is either higher than the previous price, or it is not; this is a binary information and constitutes one bit of the pattern. A pattern can consist of up to 8 bits, equivalent to 256 combinations of price changes. The patterns are stored in a char string. Their entropy is then determined by calling the first <strong>ShannonEntropy</strong> function with that string (both functions have the same name, but the compiler can distinguish them by their different parameters). A pattern is generated from each price and the subsequent <strong>PatternSize</strong> prices; the procedure is then repeated with the next price, so the patterns overlap.</p>
<h3>An unexpected result</h3>
<p>Now we only need to produce a histogram of the Shannon Entropy, similar to the win rate in our first script:</p>
<pre class="prettyprint">function run()
{
  BarPeriod = 1;
  LookBack = 1440*300;
  StartWeek = 10000;
 
  int Duration = 1, i = 0;
  while(Duration &lt;= 1440)
  { 
    TimeFrame = frameSync(Duration);
    var *Prices = series(price(),300);

    if(!is(LOOKBACK) &amp;&amp; 0 == (Bar%101)) {
      var H = ShannonEntropy(Prices,300,3);
      plotBar("Randomness",i++,Duration,H,AVG+BARS,BLUE);	
    }
    if(Duration &lt; 10) Duration += 1;
    else if(Duration &lt; 60) Duration += 5;
    else if(Duration &lt; 240) Duration += 30;
    else if(Duration &lt; 720) Duration += 120;
    else Duration += 720;
  }
}</pre>
<p>The entropy is calculated for all time frames at every 101st bar, determined with the modulo operation. (Why 101? In such cases I&#8217;m using odd numbers to prevent synchronization effects.) I cannot use the hack of skipping the next 100 bars here, as in the previous script, because skipping bars would prevent proper shifting of the price series. That&#8217;s why this script must really grind through every minute of 3 years, and needs several minutes to complete.</p>
<p>Two code lines should be explained because they are critical for measuring the entropy of daily candles using less-than-a-day bar periods:</p>
<p><strong>StartWeek = 10000;</strong></p>
<p>This starts the week at Monday midnight (<strong>1</strong> = Monday, <strong>0000</strong> = midnight) instead of Sunday 11 pm. This line was missing at first, and I wondered why the entropy of daily candles was higher than I expected. Reason: the single Sunday hour at 11 pm counted as a full day and noticeably increased the randomness of daily candles.</p>
<p><strong>TimeFrame = frameSync(Duration);</strong></p>
<p>This synchronizes the time frame to full hours or full days, respectively. If it is missing, the Shannon Entropy of daily candles again comes out too high, since the candles are no longer in sync with a day. A day often has fewer than 1440 one-minute bars due to weekends and irregularities in the historical data.</p>
<p>The Shannon Entropy is calculated with a pattern size of 3 price changes, resulting in 8 different patterns. 3 bits is the maximum entropy for 8 patterns. As price changes are not completely random, I expected an entropy value slightly smaller than 3, steadily increasing as the time frame decreases. However, I got this interesting histogram (EUR/USD, 2013-2015, FXCM price data):</p>
<p><figure id="attachment_569" aria-describedby="caption-attachment-569" style="width: 637px" class="wp-caption alignnone"><a href="http://www.financial-hacker.com/wp-content/uploads/2015/10/scalp2_32.png"><img loading="lazy" decoding="async" class="wp-image-569 size-full" src="http://www.financial-hacker.com/wp-content/uploads/2015/10/scalp2_32.png" alt="" width="637" height="513" srcset="https://financial-hacker.com/wp-content/uploads/2015/10/scalp2_32.png 637w, https://financial-hacker.com/wp-content/uploads/2015/10/scalp2_32-300x242.png 300w" sizes="auto, (max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 984px) 61vw, (max-width: 1362px) 45vw, 600px" /></a><figcaption id="caption-attachment-569" class="wp-caption-text">Entropy vs. time frame (minutes)</figcaption></figure></p>
<p>The entropy is almost, but not quite, 3 bits. This confirms that price patterns are not absolutely random. We can see that the 1440-minute time frame has the lowest Shannon Entropy, at about 2.9 bits. This was expected, as the daily cycle has a strong effect on the price curve, and daily candles are thus more regular than candles of other time frames. For this reason price action or price pattern algorithms often use daily candles. The entropy increases as the time frame decreases, but only down to about ten minutes. Even lower time frames are actually less random!</p>
<p>This is an unexpected result. The lower the time frame, the fewer price quotes it contains, so the impact of chance should in fact be higher. But the opposite is the case. I could reproduce similar results with other patterns of 4 and 5 bits, and also with other assets. To make sure, I continued the experiment with a different, tick-based price history and even shorter time frames of 2, 5, 10, 15, 30, 45, and 60 seconds (Zorro&#8217;s &#8220;useless&#8221; micro time frames came in handy after all):</p>
<p><figure id="attachment_601" aria-describedby="caption-attachment-601" style="width: 205px" class="wp-caption alignnone"><a href="http://www.financial-hacker.com/wp-content/uploads/2015/10/scalp2_41.png"><img loading="lazy" decoding="async" class="wp-image-601 size-full" src="http://www.financial-hacker.com/wp-content/uploads/2015/10/scalp2_41.png" alt="" width="205" height="513" srcset="https://financial-hacker.com/wp-content/uploads/2015/10/scalp2_41.png 205w, https://financial-hacker.com/wp-content/uploads/2015/10/scalp2_41-120x300.png 120w" sizes="auto, (max-width: 205px) 85vw, 205px" /></a><figcaption id="caption-attachment-601" class="wp-caption-text">Entropy vs. time frame (seconds)</figcaption></figure></p>
<p>The x axis is now in second units instead of minutes. We see that price randomness continues to drop with the time frame.</p>
<p>There are several possible explanations. Price granularity is higher at low time frames due to the smaller number of ticks. High-volume trades are often split into many small parts (&#8216;<strong>iceberg trades</strong>&#8217;) and may cause a sequence of similar price quotes in short intervals. All this reduces the price entropy of short time frames. But it does not necessarily increase trade opportunities: a series of identical quotes has zero entropy and is 100% predictable, but cannot be traded. Of course, iceberg trades are still an interesting inefficiency that could theoretically be exploited &#8211; if it weren&#8217;t for the high trading costs. So that&#8217;s something to look further into only when you have direct market access and no broker fees.</p>
<p>I have again uploaded the scripts to the 2015 scripts collection. You&#8217;ll need Zorro 1.36 or above for reproducing the results. Zorro S and tick based data are needed for the second time frames.</p>
<h3>Conclusions</h3>
<ul style="list-style-type: square;">
<li>Scalping is not completely nuts. Very low time frames expose some regularity.</li>
<li>Whatever the reason, this regularity cannot be exploited by retail traders due to the high costs of short-term trades.</li>
<li>On time frames above 60 minutes, prices become less random and more regular. This argues for long time frames in algo trading.</li>
<li>The most regular price patterns appear with 1-day bars. They also incur the lowest trading costs.</li>
</ul>
<h3>Papers</h3>
<p>Shannon Entropy: <a href="http://www.ueltschi.org/teaching/chapShannon.pdf" target="_blank" rel="noopener noreferrer">Lecture</a></p>
]]></content:encoded>
					
					<wfw:commentRss>https://financial-hacker.com/is-scalping-irrational/feed/</wfw:commentRss>
			<slash:comments>24</slash:comments>
		
		
			</item>
		<item>
		<title>Hacker&#8217;s Tools</title>
		<link>https://financial-hacker.com/hackers-tools-zorro-and-r/</link>
					<comments>https://financial-hacker.com/hackers-tools-zorro-and-r/#comments</comments>
		
		<dc:creator><![CDATA[jcl]]></dc:creator>
		<pubDate>Sat, 03 Oct 2015 08:01:30 +0000</pubDate>
				<category><![CDATA[3 Most Useful]]></category>
		<category><![CDATA[Introductory]]></category>
		<category><![CDATA[No Math]]></category>
		<category><![CDATA[Programming]]></category>
		<category><![CDATA[Aronson]]></category>
		<category><![CDATA[C]]></category>
		<category><![CDATA[Hacking]]></category>
		<category><![CDATA[Python]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[TSSB]]></category>
		<category><![CDATA[Vector-based test]]></category>
		<category><![CDATA[Zorro]]></category>
		<guid isPermaLink="false">http://www.financial-hacker.com/?p=89</guid>

					<description><![CDATA[For our financial hacking experiments (and for harvesting their financial fruits) we need some software machinery for research, testing, training, and live trading financial algorithms. There are many tools for algo trading, but no existing software platform today is really up to all those tasks. You have to put together your system from different software &#8230; <a href="https://financial-hacker.com/hackers-tools-zorro-and-r/" class="more-link">Continue reading<span class="screen-reader-text"> "Hacker&#8217;s Tools"</span></a>]]></description>
										<content:encoded><![CDATA[<p>For our financial hacking experiments (and for harvesting their financial fruits) we need some software machinery for research, testing, training, and live trading financial algorithms. There are many <a href="https://zorro-project.com/algotradingtools.php" target="_blank" rel="noopener">tools for algo trading</a>, but no existing software platform today is really up to all those tasks. You have to put together your system from different software packages. Fortunately, two are normally sufficient. I&#8217;ll use <strong>Zorro</strong> and <strong>R</strong> for most articles on this blog, but will also occasionally look into other tools.<span id="more-89"></span></p>
<h3>Choice of languages</h3>
<p>Algo trading systems are normally based on a script in some programming language. You can avoid writing scripts entirely by using a visual &#8216;strategy builder&#8217;, &#8216;code wizard&#8217; or spreadsheet program for defining your strategy. But this is also some sort of programming, just in a different language that you have to master. And visual builders can only create rather simple &#8216;indicator soup&#8217; systems that are unlikely to produce consistent trading profit. For serious algo trading systems, real development, and real research, there&#8217;s no stepping around &#8216;real programming&#8217;.</p>
<p>You&#8217;re also not free to select the programming language with the nicest or easiest syntax. One of the best compromises between simplicity and object orientation is probably <strong>Python</strong>. It also offers libraries with useful statistics and indicator functions. Consequently, many strategy developers start with programming their systems in Python&#8230; and soon run into serious limitations. There&#8217;s another criterion that is more relevant for system development than syntax: <strong>execution speed</strong>.</p>
<p>Speed mostly depends on whether a computer language is <strong>compiled</strong> or <strong>interpreted</strong>. <strong>C</strong>, <strong>Pascal</strong>, and <strong>Java</strong> are compiled languages, meaning that the code runs directly on the processor (C, C++, Pascal) or on a &#8216;virtual machine&#8217; (Java). <strong>Python</strong>, <strong>R</strong>, and <strong>Matlab</strong> are interpreted: the code won&#8217;t run by itself, but is executed by an interpreter software. Interpreted languages are much slower and need more CPU and memory resources than compiled languages. The fast C-programmed libraries they offer won&#8217;t help much for trading strategies: all backtests and optimization processes must still run through the bottleneck of interpreted trading logic. Theoretically the slowness can be worked around with &#8216;vectorized coding&#8217; &#8211; see below &#8211; but that has little practical use.</p>
<p>R and Python have other advantages. They are <strong>interactive</strong>: you can enter commands directly at a console. This allows quick code or function testing. Some languages, such as <strong>C#</strong>, are in between: they are compiled to a machine-independent interim code that is then, depending on the implementation, either interpreted or converted to machine code. C# is about 4 times slower than C, but still 30 times faster than Python.</p>
<p>Here&#8217;s a benchmark table of the same two test programs written in several languages: a sudoku solver and a loop with a 1000 x 1000 matrix multiplication (in seconds):</p>
<table>
<tbody>
<tr>
<td>Language</td>
<td>Sudoku</td>
<td>Matrix</td>
</tr>
<tr>
<td>C, C++</td>
<td>1.0</td>
<td>1.8</td>
</tr>
<tr>
<td>Java</td>
<td>1.7</td>
<td>2.6</td>
</tr>
<tr>
<td>Pascal</td>
<td>&#8212;</td>
<td>4</td>
</tr>
<tr>
<td>C#</td>
<td>3.8</td>
<td>9</td>
</tr>
<tr>
<td>JavaScript</td>
<td>18.1</td>
<td>16</td>
</tr>
<tr>
<td>Basic (VBA)</td>
<td>&#8212;</td>
<td>25</td>
</tr>
<tr>
<td>Erlang</td>
<td>18</td>
<td>31</td>
</tr>
<tr>
<td>Python</td>
<td>119</td>
<td>121</td>
</tr>
<tr>
<td>Ruby</td>
<td>98</td>
<td>628</td>
</tr>
<tr>
<td>Matlab</td>
<td>&#8212;</td>
<td>621</td>
</tr>
<tr>
<td>R</td>
<td>&#8212;</td>
<td>1738</td>
</tr>
</tbody>
</table>
<p>Speed becomes important as soon as you want to develop a short-term trading system. In the development process you&#8217;re testing system variants all the time. A 10-year backtest with M1 historical data executes the strategy about 3 million times. If a C-written strategy needs 1 minute for this, the same strategy in EasyLanguage would need about 30 minutes, in Python 2 hours, and in R more than 10 hours! And that&#8217;s only a backtest, no optimization or WFO run. If I had coded the <a href="http://www.financial-hacker.com/trend-and-exploiting-it/">trend experiment</a> in Python or R, I would still be waiting for the results today. You can see why trade platforms normally use a C variant or a proprietary compiled language for their strategies. <a href="http://www.financial-hacker.com/hacking-hft-systems/">HFT systems</a> are anyway written in C or directly in machine language.</p>
<p>Even compiled languages can have large speed differences due to different implementations of trading and analysis functions. When we compare not Sudoku or a matrix multiplication, but a real trading system &#8211; the small RSI strategy from <a href="http://manual.zorro-project.com/conversion.htm">this page</a> &#8211; we find very different speeds on different trading platforms (10-year backtest, tick resolution):</p>
<ul style="list-style-type: square;">
<li>Zorro: ~ 0.5 seconds (compiled C)</li>
<li>MT4:  ~ 110 seconds (MQL4, a C variant)</li>
<li>MultiCharts: ~ 155 seconds (EasyLanguage, a C/Pascal mix)</li>
</ul>
<p>However, the differences are not as bad as suggested by the benchmark table. In most cases the slow language speed is partially compensated by fast vector function libraries. A script that does not step through historical data bar by bar, but only calls library functions that process all data simultaneously, would run with comparable speed in all languages. Indeed some trading systems can be coded in this <strong>vectorized</strong> way, but unfortunately it works only with simple systems and requires entirely different scripts for backtests and live trading.</p>
<h3>Choice of tools</h3>
<p><strong>Zorro</strong> is a software package for financial analysis and algo trading &#8211; a sort of Swiss Army knife, since you can use it not only for live trading, but also for all sorts of quick tests. It&#8217;s my software of choice for financial hacking because:</p>
<ul style="list-style-type: square;">
<li>It&#8217;s free (unless you&#8217;re rich).</li>
<li>Scripts are in C, event driven and very fast. You can code a system or an idea in 5 minutes.</li>
<li>Open architecture &#8211; you can add anything with DLL plugins.</li>
<li>Minimalistic &#8211; just a frontend to a programming language.</li>
<li>Can be automated for experiments.</li>
<li>Very stable &#8211; I rarely found bugs and they were fixed fast.</li>
<li>Very accurate, realistic trading simulation, including HFT.</li>
<li>Supports also options and futures, and portfolios of multiple assets.</li>
<li>Has a library with 100s of indicators, statistics and machine learning functions, most with source code.</li>
<li>Is continuously developed and supported (new versions usually come out every 2..3 months).</li>
<li>Last but not least: I know it quite well, as I&#8217;ve written its tutorial&#8230;</li>
</ul>
<p><a href="http://www.financial-hacker.com/wp-content/uploads/2015/09/Zorro.png"><img loading="lazy" decoding="async" class="aligncenter wp-image-117 size-full" src="http://www.financial-hacker.com/wp-content/uploads/2015/09/Zorro-e1441536629470.png" alt="Zorro" width="294" height="582" srcset="https://financial-hacker.com/wp-content/uploads/2015/09/Zorro-e1441536629470.png 294w, https://financial-hacker.com/wp-content/uploads/2015/09/Zorro-e1441536629470-152x300.png 152w" sizes="auto, (max-width: 294px) 85vw, 294px" /></a></p>
<p>A strategy example coded in C, the classic SMA crossover:</p>
<pre class="prettyprint">void run()
{
  double* Close = series(priceClose());
  double* MA30 = series(SMA(Close,30));	
  double* MA100 = series(SMA(Close,100));
	
  Stop = 4*ATR(100);
  if(crossOver(MA30,MA100))
    enterLong();
  if(crossUnder(MA30,MA100))
    enterShort();
}</pre>
<p>More code can be found among the <a href="https://zorro-project.com/code.php" target="_blank" rel="noopener">script examples</a> on the Zorro website. You can see that Zorro offers a relatively easy trading implementation. But here comes the drawback of the C language: you cannot drop in external libraries as easily as in Python or R. Using a C/C++ based data analysis or machine learning package sometimes involves a lengthy integration. Fortunately, Zorro can also call R and Python functions for those purposes.</p>
<p><strong>R</strong> is a script interpreter for data analysis and charting. It is not a real language with consistent syntax, but more a conglomerate of operators and data structures that has grown over 20 years. It&#8217;s harder to learn than a normal computer language, but offers some unique advantages. I&#8217;ll use it in this blog when it comes to complex analysis or machine learning tasks. It&#8217;s my tool of choice for financial hacking because:</p>
<ul style="list-style-type: square;">
<li>It&#8217;s free. (&#8220;Software is like sex: it&#8217;s better when it&#8217;s free.&#8221;)</li>
<li>R scripts can be very short and effective (once you get used to the syntax).</li>
<li>It&#8217;s the global standard for data analysis and machine learning.</li>
<li>Open architecture &#8211; you can add modules for almost anything.</li>
<li>Minimalistic &#8211; just a console with a language interpreter.</li>
<li>Very stable &#8211; I found a few bugs in external libraries, but so far never in the main program.</li>
<li>Has tons of &#8220;packages&#8221; for all imaginable mathematical and statistical tasks, and especially for machine learning.</li>
<li>Is continuously developed and supported by the global scientific community (about 15 new packages usually come out every day).</li>
</ul>
<p><a href="http://www.financial-hacker.com/wp-content/uploads/2015/09/r.jpg"><img loading="lazy" decoding="async" class="alignnone wp-image-115 size-full" src="http://www.financial-hacker.com/wp-content/uploads/2015/09/r-e1441536432531.jpg" alt="r" width="693" height="576" srcset="https://financial-hacker.com/wp-content/uploads/2015/09/r-e1441536432531.jpg 693w, https://financial-hacker.com/wp-content/uploads/2015/09/r-e1441536432531-300x249.jpg 300w" sizes="auto, (max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 984px) 61vw, (max-width: 1362px) 45vw, 600px" /></a></p>
<p>This is the SMA crossover in R for a &#8216;vectorized&#8217; backtest:</p>
<pre class="prettyprint">require(quantmod)
require(PerformanceAnalytics)

Data &lt;- xts(read.zoo("EURUSD.csv", tz="UTC", format="%Y-%m-%d %H:%M", sep=",", header=TRUE))
Close &lt;- Cl(Data)
MA30 &lt;- SMA(Close,30)
MA100 &lt;- SMA(Close,100)
 
Dir &lt;- ifelse(MA30 &gt; MA100,1,-1) # calculate trade direction
Dir.1 &lt;- c(NA,Dir[-length(Dir)]) # shift by 1 for avoiding peeking bias
Return &lt;- ROC(Close)*Dir.1 
charts.PerformanceSummary(na.omit(Return))</pre>
<p>You can see that the vectorized code consists only of function calls. It runs almost as fast as the C equivalent. But it is difficult to read, it cannot be used for live trading, and many parts of a trading logic &#8211; even a simple stop loss &#8211; cannot be coded for a vectorized test. Thus, as good as R is for interactive data analysis, it is hopeless for writing trade strategies &#8211; although some R packages (for instance, <strong>quantstrat</strong>) even offer rudimentary optimization and test functions. They all require an awkward coding style, do not simulate trading very realistically, and are still too slow for serious backtests.</p>
<p>Although R cannot replace a serious backtest and trading platform, Zorro and R complement each other perfectly: <a href="http://www.financial-hacker.com/build-better-strategies-part-5-developing-a-machine-learning-system/" target="_blank" rel="noopener noreferrer">Here</a> is an example of a machine learning system built with a deep learning package from R and the training and trading framework from Zorro.</p>
<h3>More hacker&#8217;s tools</h3>
<p>Aside from languages and platforms, you&#8217;ll often need auxiliary tools that may be small, simple, cheap, but all the more important since you&#8217;re using them all the time. For editing not only scripts, but even short CSV lists I use <a href="https://notepad-plus-plus.org/" target="_blank" rel="noopener noreferrer"><strong>Notepad++</strong></a>. For working interactively with R I recommend <a href="https://www.rstudio.com" target="_blank" rel="noopener noreferrer"><strong>RStudio</strong></a>. Very helpful for strategy development is a <strong>file comparison</strong> tool: you often have to compare trade logs of different system variants and check which variant opened which trade a little earlier or later, and what consequences that had. For this I use <a href="http://www.scootersoftware.com/" target="_blank" rel="noopener noreferrer"><strong>Beyond Compare</strong></a>.</p>
<p>Aside from Zorro and R, there&#8217;s also a relatively new system development tool that I plan to examine more closely at some time in the future, <strong><a href="http://www.tssbsoftware.com/" target="_blank" rel="noopener noreferrer">TSSB</a></strong> for generating and testing bias-free trading systems with advanced machine learning algorithms. David Aronson and Timothy Masters were involved in its development, so it certainly won&#8217;t be as useless as most other &#8220;trade system generating&#8221; software. However, there&#8217;s again a limitation: TSSB cannot trade or export, so you cannot actually use the ingenious systems that you developed with it. Maybe I&#8217;ll find a solution to combine TSSB with Zorro.</p>
<h3>References</h3>
<p><a href="https://www.tiobe.com/tiobe-index/">TIOBE index</a> of top programming languages</p>
<p><a href="http://attractivechaos.github.io/plb/">Speed comparison</a> of programming languages</p>
<hr />
<p><strong>Update (November 2017).</strong> The release of new deep learning packages has made TSSB sort of obsolete. For instance, the H2O package natively supports several ways of feature filtering and dimensionality reduction, as well as ensembles &#8211; so far the strengths of TSSB. H2O is supported by Zorro&#8217;s <strong>advise</strong> function. Still, the TSSB book by David Aronson is a valuable source of methods, approaches, and tips about machine learning for financial prediction.</p>
<p>Download links to the latest versions of Zorro and R are placed on the sidebar. A brief tutorial to both Zorro and R is contained in the Zorro manual; a more comprehensive introduction to working with Zorro can be found in the <a href="https://www.createspace.com/7147886" target="_blank" rel="noopener noreferrer">Black Book</a>.</p>
]]></content:encoded>
					
					<wfw:commentRss>https://financial-hacker.com/hackers-tools-zorro-and-r/feed/</wfw:commentRss>
			<slash:comments>19</slash:comments>
		
		
			</item>
		<item>
		<title>Trend Indicators</title>
		<link>https://financial-hacker.com/trend-delusion-or-reality/</link>
					<comments>https://financial-hacker.com/trend-delusion-or-reality/#comments</comments>
		
		<dc:creator><![CDATA[jcl]]></dc:creator>
		<pubDate>Fri, 04 Sep 2015 16:48:13 +0000</pubDate>
				<category><![CDATA[Indicators]]></category>
		<category><![CDATA[System Evaluation]]></category>
		<category><![CDATA[Decycler]]></category>
		<category><![CDATA[Ehlers]]></category>
		<category><![CDATA[Low-Lag]]></category>
		<category><![CDATA[Momentum]]></category>
		<category><![CDATA[Zorro]]></category>
		<guid isPermaLink="false">http://www.lotter.org/hacker/?p=19</guid>

					<description><![CDATA[The most common trade method is &#8216;going with the trend&#8216;. While it&#8217;s not completely clear how one can go with the trend without knowing it beforehand, most traders believe that &#8216;trend&#8217; exists and can be exploited. &#8216;Trend&#8217; is supposed to manifest itself in price curves as a sort of momentum or inertia that continues a price &#8230; <a href="https://financial-hacker.com/trend-delusion-or-reality/" class="more-link">Continue reading<span class="screen-reader-text"> "Trend Indicators"</span></a>]]></description>
										<content:encoded><![CDATA[<p>The most common trade method is &#8216;<strong>going with the trend</strong>&#8216;. While it&#8217;s not completely clear how one can go with the trend without knowing it beforehand, most traders believe that &#8216;trend&#8217; exists and can be exploited. &#8216;Trend&#8217; is supposed to manifest itself in price curves as a sort of <strong>momentum</strong> or <strong>inertia</strong> that continues a price movement once it started. This inertia effect does not appear in random walk curves.<span id="more-19"></span> One can speculate about possible causes of trend, such as:</p>
<ul style="list-style-type: square;">
<li>Information spreads slowly, thus producing a delayed, prolonged reaction and price movement. (There are examples where it took the majority of traders <strong>more than a year</strong> to become aware of essential, fundamental information about a certain market!)</li>
<li>Events trigger other events with a similar effect on the price, causing a sort of avalanche effect.</li>
<li>Traders see the price move, shout &#8220;Ah! Trend!&#8221; and jump onto the bandwagon, thus creating a self fulfilling prophecy.</li>
<li>Trade algorithms detect a trend, attempt to exploit it and amplify it this way.</li>
<li>All of this, but maybe on different time scales.</li>
</ul>
<p>Whatever the cause, hackers prefer experiment over speculation, so let&#8217;s test once and for all if trend in price curves really exists and can be exploited in an algorithmic strategy. Exploiting trend obviously means buying at the beginning of a trend, and selling at the end. For this, we simply assume that trend begins at the bottom of a price curve valley and ends at a peak, or vice versa. Obviously this assumption makes no distinction between &#8216;real&#8217; trends and random price fluctuations. Such a difference could anyway only be determined in hindsight. However, trade profits and losses by random fluctuations should cancel each other out in the long run, at least when trade costs are disregarded. When trends exist in price curves, this method should produce an overall profit from the remaining trades that are triggered by real trends and last longer than in random walk curves. And hopefully this profit will exceed the costs of the random trades. At least &#8211; that&#8217;s the theory. So we&#8217;re just looking for peaks and valleys of the price curve. The script of the trend following experiment would look like this (all scripts here are in C for the <a href="http://www.financial-hacker.com/hackers-tools-zorro-and-r/">Zorro</a> platform; <strong>var</strong> is typedef&#8217;d to <strong>double</strong>):</p>
<pre>void run()
{
  var *Price = series(price());
  var *Smoothed = series(Filter(Price,Period));
  
  if(valley(Smoothed))
    enterLong();
  else if(peak(Smoothed))
    enterShort();
}</pre>
<p>This strategy is simple enough. It is basically the first strategy from the <a href="http://manual.zorro-trader.com/tutorial_var.htm" target="_blank" rel="noopener noreferrer">Zorro tutorial</a>. The <strong>valley</strong> function returns <strong>true</strong> when the last price was below the current price and also below the last-but-one price; the <strong>peak</strong> function returns <strong>true</strong> when the last price was above the current and the last-but-one. Trades are thus entered at all peaks and valleys and closed by reversal only. To avoid too many random trades and excessive trade costs, we remove the high frequencies from the price curve with some smoothing indicator, named <strong>Filter</strong> in the code above. Trend, after all, is supposed to be longer-lasting and should manifest itself in the lower frequencies. This is an example trade produced by this system: <a href="http://www.financial-hacker.com/wp-content/uploads/2015/09/work4_detail.png"><img loading="lazy" decoding="async" class="alignnone size-full wp-image-195" src="http://www.financial-hacker.com/wp-content/uploads/2015/09/work4_detail.png" alt="work4_detail" width="680" height="321" srcset="https://financial-hacker.com/wp-content/uploads/2015/09/work4_detail.png 680w, https://financial-hacker.com/wp-content/uploads/2015/09/work4_detail-300x142.png 300w" sizes="auto, (max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 984px) 61vw, (max-width: 1362px) 45vw, 600px" /></a> The black jagged line in the chart above is the raw price curve, the red line is the filtered curve. You can see that it lags behind the black curve a little, which is typical of smoothing indicators. It has a peak at the end of September, and the system thus entered a short trade (tiny green dot). The red line continues going down all the way until November 23, when a valley was reached. 
A long trade (not shown in this chart) was then entered, and the short trade was closed by reversal (the other tiny green dot). The green straight line connects the entry and exit points of the trade. It was open for almost 2 months and in that time made a profit of ~13 cents per unit, or 1300 pips.</p>
<p>The only remaining question is &#8211; which indicator shall we use for filtering out the high frequencies, the ripples and jaggies, from the price curve? We&#8217;re spoilt for choice.</p>
<h3><strong>The Smoothing Candidates</strong></h3>
<p>Smoothing a curve in some way is a common task of all trend strategies. Consequently, there&#8217;s a multitude of smoothing, averaging, low-lag and spectral filter indicators at our disposal. We have to test them. The following candidates were selected for the experiment, traditional indicators as well as fancier algorithms:</p>
<ul>
<li><strong>SMA</strong>, the simple moving average, the sum of the last <strong>n</strong> prices divided by <strong>n</strong>.</li>
<li><strong>EMA</strong>, exponential moving average, the current price multiplied with a small factor plus the last EMA multiplied with a large factor. The sum of both factors must be <strong>1</strong>. This is a first order lowpass filter.</li>
<li><strong>LowPass</strong>, a second order lowpass filter with the following source code:
<pre>var LowPass(var *Data,int Period)
{
  var* LP = series(Data[0]);
  var a = 2.0/(1+Period);
  return LP[0] = (a-0.25*a*a)*Data[0]
    + 0.5*a*a*Data[1]
    - (a-0.75*a*a)*Data[2]
    + 2*(1.-a)*LP[1]
    - (1.-a)*(1.-a)*LP[2];
}</pre>
</li>
<li><strong>HMA</strong>, the Hull Moving Average, with the following source code:
<pre>var HMA(var *Data,int Period)
{
  return WMA(series(2*WMA(Data,Period/2)-WMA(Data,Period)),sqrt(Period));
}</pre>
</li>
<li><strong>ZMA</strong>, Ehler&#8217;s Zero-Lag Moving Average, an EMA with a correction term for removing lag. This is the source code:
<pre>var ZMA(var *Data,int Period)
{
  var *ZMA = series(Data[0]);
  var a = 2.0/(1+Period);
  var Ema = EMA(Data,Period);
  var Error = 1000000;
  var Gain, GainLimit=5, BestGain=0;
  for(Gain = -GainLimit; Gain &lt; GainLimit; Gain += 0.1) {
     ZMA[0] = a*(Ema + Gain*(Data[0]-ZMA[1])) + (1-a)*ZMA[1];
     var NewError = abs(Data[0] - ZMA[0]);
     if(NewError &lt; Error) {
       Error = NewError;
       BestGain = Gain;
     }
  }
  return ZMA[0] = a*(Ema + BestGain*(Data[0]-ZMA[1])) + (1-a)*ZMA[1];
}</pre>
</li>
<li><strong>ALMA</strong>, Arnaud Legoux Moving Average, based on a shifted Gaussian distribution (described in <a href="http://www.forexfactory.com/attachment.php?attachmentid=1123528&amp;d=1359078631" target="_blank" rel="noopener noreferrer">this paper</a>):
<pre>var ALMA(var *Data, int Period)
{
  var m = floor(0.85*(Period-1));
  var s = Period/6.0;
  var alma = 0., wSum = 0.;
  int i;
  for (i = 0; i &lt; Period; i++) {
    var w = exp(-(i-m)*(i-m)/(2*s*s));
    alma += Data[Period-1-i] * w;
    wSum += w;
  }
  return alma / wSum;
}</pre>
</li>
<li><strong>Laguerre</strong>, a 4-element Laguerre filter:
<pre>var Laguerre(var *Data, var alpha)
{
  // even elements hold this bar's filter stages,
  // odd elements their values from the previous bar
  var *L = series(Data[0]);
  L[0] = alpha*Data[0] + (1-alpha)*L[1];
  L[2] = -(1-alpha)*L[0] + L[1] + (1-alpha)*L[3];
  L[4] = -(1-alpha)*L[2] + L[3] + (1-alpha)*L[5];
  L[6] = -(1-alpha)*L[4] + L[5] + (1-alpha)*L[7];
  return (L[0]+2*L[2]+2*L[4]+L[6])/6;
}</pre>
</li>
<li><strong>Linear regression</strong>, fits a straight line through the data points such that the sum of the squared distances between each data point and the line is minimized (the least-squares rule).</li>
<li><strong>Smooth</strong>, John Ehlers&#8217; &#8220;Super Smoother&#8221;, a 2-pole Butterworth filter combined with a 2-bar SMA that suppresses the Nyquist frequency:
<pre>var Smooth(var *Data,int Period)
{
  var f = (1.414*PI) / Period;
  var a = exp(-f);
  var c2 = 2*a*cos(f);
  var c3 = -a*a;
  var c1 = 1 - c2 - c3;
  var *S = series(Data[0]);
  return S[0] = c1*(Data[0]+Data[1])*0.5 + c2*S[1] + c3*S[2];
}</pre>
</li>
<li><strong>Decycle</strong>, another low-lag indicator by John Ehlers. His decycler is simply the difference between the price and its fluctuation, retrieved with a highpass filter:
<pre>var Decycle(var* Data,int Period)
{
  return Data[0]-HighPass2(Data,Period);
}</pre>
</li>
</ul>
<p>The above source codes have been taken from the <strong>indicators.c</strong> file of the Zorro platform, which contains source codes of all supplied indicators.</p>
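<p>As a quick plausibility check, the 2-pole <strong>LowPass</strong> filter above can be ported to plain C and fed a unit step. Its input coefficients sum to <strong>a&#178;</strong> and its feedback coefficients to <strong>1&#8211;a&#178;</strong>, so the DC gain is 1 and the output must settle at the input level. This is a hypothetical stand-alone port; the original lite-C version stores its previous outputs via Zorro's <strong>series()</strong> mechanism instead of the explicit buffers used here:</p>

```c
#include <math.h>

/* One LowPass step: Data[0..2] = current and two previous inputs,
   LP[0..1] = two previous outputs. Returns the new output. */
double lowpass_step(const double *Data, const double *LP, int Period)
{
    double a = 2.0/(1+Period);
    return (a-0.25*a*a)*Data[0]
         + 0.5*a*a*Data[1]
         - (a-0.75*a*a)*Data[2]
         + 2*(1.-a)*LP[0]
         - (1.-a)*(1.-a)*LP[1];
}

/* run a unit step (0 -> 1) through the filter for n bars; return last output */
double lowpass_step_response(int Period, int n)
{
    double Data[3] = {0,0,0}, LP[2] = {0,0};
    int i;
    for (i = 0; i < n; i++) {
        Data[2] = Data[1]; Data[1] = Data[0]; Data[0] = 1.0; /* step input */
        double out = lowpass_step(Data, LP, Period);
        LP[1] = LP[0]; LP[0] = out;
    }
    return LP[0];
}
```

<p>With <strong>Period = 50</strong> the response approaches 1.0 after a few hundred bars, confirming the unit DC gain; only the speed of convergence and the transient shape differ between the filter candidates.</p>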
<h3>Comparing Step Responses</h3>
<p>What is the best of all those indicators? For a first impression, here&#8217;s a chart showing them reacting to a simulated sudden price step. You can see that some react slowly, some very slowly, some quickly, and some overshoot: <a href="http://www.financial-hacker.com/wp-content/uploads/2015/09/LowLag_EURUSD.png"><img loading="lazy" decoding="async" class="alignnone wp-image-139 size-full" src="http://www.financial-hacker.com/wp-content/uploads/2015/09/LowLag_EURUSD.png" alt="" width="795" height="351" srcset="https://financial-hacker.com/wp-content/uploads/2015/09/LowLag_EURUSD.png 795w, https://financial-hacker.com/wp-content/uploads/2015/09/LowLag_EURUSD-300x132.png 300w" sizes="auto, (max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 984px) 61vw, (max-width: 1362px) 45vw, 600px" /></a> You can generate such a step response diagram with this Zorro script:</p>
<pre>// compare the step responses of some low-lag MAs
function run()
{
  set(PLOTNOW);
  BarPeriod = 60;
  MaxBars = 1000;
  LookBack = 150;
  asset("");   // don't load an asset
  ColorUp = ColorDn = 0;  // don't plot a price curve
  PlotWidth = 800;
  PlotHeight1 = 400;

  var *Impulse = series(ifelse(Bar&gt;200 &amp;&amp; Bar&lt;400,1,0)); // 0-&gt;1 price step at bar 200, back down at bar 400
  int Period = 50;
  plot("Impulse",Impulse[0],0,GREY);
  plot("SMA",SMA(Impulse,Period),0,BLACK);
  plot("EMA",EMA(Impulse,Period),0,0x808000);
  plot("ALMA",ALMA(Impulse,Period),0,0x008000);
  plot("Laguerre",Laguerre(Impulse,4.0/Period),0,0x800000);
  plot("Hull MA",HMA(Impulse,Period),0,0x00FF00);
  plot("Zero-Lag MA",ZMA(Impulse,Period),0,0x00FFFF); 
  plot("Decycle",Decycle(Impulse,Period),0,0xFF00FF);
  plot("LowPass",LowPass(Impulse,Period),0,0xFF0000);
  plot("Smooth",Smooth(Impulse,Period),0,0x0000FF);
}</pre>
<p>We can see that of all the indicators, the <strong>ALMA</strong> seems to reproduce the step best, while the <strong>ZMA</strong> produces the fastest response with no overshoot, the <strong>Decycler</strong> the fastest with some overshoot, and <strong>Laguerre</strong> lags far behind everything else. Admittedly, this comparison has limited meaning, because the time period parameters of the different indicators are not directly comparable. So the step response per se does not reveal whether an indicator is well suited for trend detection. We have no choice: We must use them all.</p>
<h3>The Trend Experiment</h3>
<p>For the experiment, all smoothing indicators above will be applied to a currency, a stock index, and a commodity, and will compete in exploiting trend. Because different time frames can represent different trader groups and thus different markets, the indicators will be applied to price curves made of 15-minutes, 1-hour, and 4-hours bars. And we&#8217;ll also try different indicator time periods in 10 steps from 10 to 10000 bars.</p>
<p>So we&#8217;ll have to program 10 indicators * 3 assets * 3 bar sizes * 10 time periods = 900 systems. Some of the 900 results will be positive, some negative, and some zero. If there are no significant positive results, we&#8217;ll have to conclude that trend in price curves either does not exist, or at least cannot be exploited with this all-too-obvious method. However, we&#8217;ll likely get some winning systems, if only for statistical reasons.</p>
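<p>How easily winning systems arise for purely statistical reasons can be demonstrated with a quick simulation: generate many zero-edge &#8220;strategies&#8221; whose trades are plain coin flips, and look only at the best of them. This is a hypothetical illustration (with a small deterministic random generator), not part of the experiment itself:</p>

```c
#include <stdint.h>

/* tiny deterministic xorshift generator, for reproducibility */
static uint32_t rng_state = 12345;
static uint32_t xorshift32(void)
{
    rng_state ^= rng_state << 13;
    rng_state ^= rng_state >> 17;
    rng_state ^= rng_state << 5;
    return rng_state;
}

/* best total profit among nSystems random systems, each with nTrades
   coin-flip trades of +1/-1; the true expectancy of every system is zero */
int best_random_profit(int nSystems, int nTrades)
{
    int best = -nTrades, s, t;
    for (s = 0; s < nSystems; s++) {
        int profit = 0;
        for (t = 0; t < nTrades; t++)
            profit += (xorshift32() & 1) ? 1 : -1;
        if (profit > best) best = profit;
    }
    return best;
}
```

<p>With 900 such systems of 250 trades each, the best one typically ends up several standard deviations above zero, although no system has any edge at all. This is exactly the bias that the next step has to account for.</p>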
<p>The next step will be checking if their profits are for real or just caused by a statistical effect dubbed <strong>Data Mining Bias</strong>. There is a method to measure Data Mining Bias from the resulting equity curves: the notorious <strong>White&#8217;s Reality Check</strong>. We&#8217;ll apply that check to the results of our experiment. If some systems survive White&#8217;s Reality Check, we&#8217;ll have the once-and-for-all proof that markets are inefficient, trends really exist, and algorithmic trend trading works.</p>
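<p>The principle behind such a check can be sketched as follows. This is a strongly simplified illustration using a plain i.i.d. bootstrap; White&#8217;s original test uses a stationary bootstrap and a slightly different statistic. Center each system&#8217;s returns so that the null hypothesis of zero expectancy holds, resample the time series many times with one common index set for all systems, and count how often the best resampled mean beats the best observed mean:</p>

```c
#include <stdint.h>
#include <stdlib.h>

static uint32_t wrc_rng = 99;
static uint32_t wrc_rand(void) /* deterministic xorshift generator */
{
    wrc_rng ^= wrc_rng << 13; wrc_rng ^= wrc_rng >> 17; wrc_rng ^= wrc_rng << 5;
    return wrc_rng;
}

/* Returns[s*nT+t]: return of system s at time t (nSystems x nT, row-major).
   Result: bootstrap p-value of the best system's mean return. */
double reality_check(const double *Returns, int nSystems, int nT, int nBoot)
{
    int s, t, b, count = 0;
    double bestObs = -1e30;
    double *Mean = malloc(nSystems * sizeof(double));
    for (s = 0; s < nSystems; s++) {           /* observed mean returns */
        double sum = 0;
        for (t = 0; t < nT; t++) sum += Returns[s*nT + t];
        Mean[s] = sum / nT;
        if (Mean[s] > bestObs) bestObs = Mean[s];
    }
    for (b = 0; b < nBoot; b++) {
        double bestBoot = -1e30;
        int *Idx = malloc(nT * sizeof(int));
        for (t = 0; t < nT; t++)               /* same indices for all systems */
            Idx[t] = wrc_rand() % nT;
        for (s = 0; s < nSystems; s++) {
            double sum = 0;
            for (t = 0; t < nT; t++)           /* centered: zero-mean null */
                sum += Returns[s*nT + Idx[t]] - Mean[s];
            if (sum/nT > bestBoot) bestBoot = sum/nT;
        }
        if (bestBoot >= bestObs) count++;
        free(Idx);
    }
    free(Mean);
    return (double)count / nBoot;
}
```

<p>A small p-value means the best observed performance is unlikely to be a pure selection artifact; a large one means the &#8220;winner&#8221; is indistinguishable from the best of many random systems.</p>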
<p><a href="http://www.financial-hacker.com/trend-and-exploiting-it/">Next Step</a>&#8230;</p>
]]></content:encoded>
					
					<wfw:commentRss>https://financial-hacker.com/trend-delusion-or-reality/feed/</wfw:commentRss>
			<slash:comments>15</slash:comments>
		
		
			</item>
	</channel>
</rss>
