<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Entropy &#8211; The Financial Hacker</title>
	<atom:link href="https://financial-hacker.com/tag/entropy/feed/" rel="self" type="application/rss+xml" />
	<link>https://financial-hacker.com</link>
	<description>A new view on algorithmic trading</description>
	<lastBuildDate>Fri, 07 Oct 2022 11:02:51 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	

<image>
	<url>https://financial-hacker.com/wp-content/uploads/2017/07/cropped-mask-32x32.jpg</url>
	<title>Entropy &#8211; The Financial Hacker</title>
	<link>https://financial-hacker.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Better Strategies 4: Machine Learning</title>
		<link>https://financial-hacker.com/build-better-strategies-part-4-machine-learning/</link>
					<comments>https://financial-hacker.com/build-better-strategies-part-4-machine-learning/#comments</comments>
		
		<dc:creator><![CDATA[jcl]]></dc:creator>
		<pubDate>Thu, 31 Mar 2016 13:43:37 +0000</pubDate>
				<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[System Development]]></category>
		<category><![CDATA[Autoencoder]]></category>
		<category><![CDATA[Boltzmann machine]]></category>
		<category><![CDATA[Classification]]></category>
		<category><![CDATA[Data mining bias]]></category>
		<category><![CDATA[Decision tree]]></category>
		<category><![CDATA[Entropy]]></category>
		<category><![CDATA[Indicator soup]]></category>
		<category><![CDATA[K-Means]]></category>
		<category><![CDATA[K-Nearest Neighbor]]></category>
		<category><![CDATA[Naive Bayes]]></category>
		<category><![CDATA[Neural network]]></category>
		<category><![CDATA[Price action]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Regression]]></category>
		<category><![CDATA[Shannon]]></category>
		<category><![CDATA[Support vector machine]]></category>
		<guid isPermaLink="false">http://www.financial-hacker.com/?p=931</guid>

					<description><![CDATA[Deep Blue was the first computer to defeat a reigning chess world champion in a match. That was 1997, and it took almost 20 years until another program, AlphaGo, could defeat the best human Go player. Deep Blue was a model-based system with hardwired chess rules. AlphaGo is a data-mining system, a deep neural network trained with thousands of &#8230; <a href="https://financial-hacker.com/build-better-strategies-part-4-machine-learning/" class="more-link">Continue reading<span class="screen-reader-text"> "Better Strategies 4: Machine Learning"</span></a>]]></description>
										<content:encoded><![CDATA[<p><strong>Deep Blue</strong> was the first computer to defeat a reigning chess world champion in a match. That was 1997, and it took almost 20 years until another program, <strong>AlphaGo</strong>, could defeat the best human Go player. Deep Blue was a model-based system with hardwired chess rules. AlphaGo is a data-mining system, a deep neural network trained with thousands of Go games. Not improved hardware, but a breakthrough in software was essential for the step from beating top chess players to beating top Go players.<br />
In this 4th part of the <a href="http://www.financial-hacker.com/build-better-strategies/" target="_blank" rel="noopener noreferrer">mini-series</a> we&#8217;ll look into the <strong>data mining approach</strong> for developing trading strategies. This method does not care about market mechanisms. It just scans price curves or other data sources for predictive patterns. Machine learning or &#8220;Artificial Intelligence&#8221; is not always involved in data-mining strategies. In fact the most popular &#8211; and surprisingly profitable &#8211; data mining method works without any fancy neural networks or support vector machines.<span id="more-931"></span></p>
<h3>Machine learning principles</h3>
<p>A learning algorithm is fed with data <strong>samples</strong>, normally derived in some way from historical prices. Each sample consists of <em><strong>n</strong></em> variables <em><strong>x<sub>1</sub> .. x<sub>n</sub></strong></em>, commonly named <strong>predictors</strong>,&nbsp;<strong>features</strong>, <strong>signals</strong>, or simply <strong>input</strong>. These predictors can be the price returns of the last <em><strong>n</strong></em> bars, or a collection of classical indicators, or any other imaginable functions of the price curve (I&#8217;ve even seen the pixels of a price chart image used as predictors for a neural network!). Each sample also normally includes a <strong>target variable</strong> <em><strong>y</strong></em>, like the return of the next trade after taking the sample, or the next price movement. In the literature you can find <strong>y</strong> also named <strong>label</strong> or <strong>objective</strong>. In a <strong>training process</strong><em>,</em> the algorithm learns to predict the target&nbsp;<em><strong>y</strong></em> from the predictors <em><strong>x<sub>1</sub> .. x<sub>n</sub></strong></em>. The learned &#8216;memory&#8217; is stored in a data structure named <strong>model</strong> that is specific to the algorithm (not to be confused with a financial model for <a href="http://www.financial-hacker.com/build-better-strategies-part-2-model-based-systems/">model based strategies</a>!). A machine learning model can be a function with prediction rules in C code, generated by the training process. Or it can be a set of connection weights of a neural network.</p>
<p><strong>Training</strong>: &nbsp; &nbsp;&nbsp;<em><strong>x<sub>1</sub> .. x<sub>n</sub></strong></em>, <em><strong>y</strong></em>&nbsp; =&gt; &nbsp;model</p>
<p><strong>Prediction</strong>: &nbsp;&nbsp;<em><strong>x<sub>1</sub> .. x<sub>n</sub></strong></em>, model &nbsp;=&gt; &nbsp;<em><strong>y</strong></em></p>
<p>The predictors, features, or whatever you call them, must carry information sufficient to predict the target <em><strong>y</strong></em> with some accuracy. They must also often fulfill two formal requirements. First, all predictor values should be in the same range, like -1 .. +1 (for most R algorithms) or -100 .. +100 (for Zorro or TSSB algorithms). So you need to <a href="http://manual.zorro-project.com/norm.htm" target="_blank" rel="noopener noreferrer"><strong>normalize</strong></a> them in some way before sending them to the machine. Second, the samples should be <strong>balanced</strong>, i.e. equally distributed over all values of the target variable. So there should be about as many winning as losing samples. If you do not observe these two requirements, you&#8217;ll wonder why you&#8217;re getting bad results from the machine learning algorithm.</p>
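<p>The two requirements can be checked with a few lines of C. This is only a minimal sketch: the function names <strong>normalize_column</strong> and <strong>win_fraction</strong> are made up for illustration, and the simple min-max scaling used here is an assumption, not necessarily what Zorro's normalization function does.</p>

```c
#include <assert.h>
#include <stddef.h>

/* Rescale one predictor column in place to the range -1 .. +1
   (min-max scaling). A constant column is set to 0. */
void normalize_column(double *x, size_t n)
{
    assert(x != NULL && n > 0);
    double lo = x[0], hi = x[0];
    for (size_t i = 1; i < n; i++) {
        if (x[i] < lo) lo = x[i];
        if (x[i] > hi) hi = x[i];
    }
    for (size_t i = 0; i < n; i++)
        x[i] = (hi > lo) ? 2.0 * (x[i] - lo) / (hi - lo) - 1.0 : 0.0;
}

/* Fraction of samples with a positive target; a value of about 0.5
   means the set is balanced between winners and losers. */
double win_fraction(const double *y, size_t n)
{
    assert(y != NULL && n > 0);
    size_t wins = 0;
    for (size_t i = 0; i < n; i++)
        if (y[i] > 0) wins++;
    return (double)wins / (double)n;
}
```

<p>In a real system the minimum and maximum would have to come from the training data only. Otherwise the scaling leaks information from the test period into the model.</p>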
<p><strong>Regression</strong> algorithms predict a numeric value, like the magnitude and sign of the next price move. <strong>Classification</strong> algorithms predict a qualitative sample class, for instance whether it&#8217;s preceding a win or a loss. Some algorithms, such as neural networks, decision trees, or support vector machines, can be run in both modes.</p>
<p>A few algorithms learn to divide samples into classes without needing any target <em><strong>y</strong></em>. That&#8217;s <strong>unsupervised learning</strong>, as opposed to <strong>supervised learning</strong> using a target. Somewhere in between is <strong>reinforcement learning</strong>, where the system trains itself by running simulations with the given features, and using the outcome as training target. AlphaZero, the successor of AlphaGo, used reinforcement learning by playing millions of Go games against itself. In finance there are few applications for unsupervised or reinforcement learning. 99% of machine learning strategies use supervised learning.</p>
<p>Whatever signals we&#8217;re using for predictors in finance, they will most likely contain much noise and little information, and will be nonstationary on top of it. Therefore financial prediction is <strong>one of the hardest tasks</strong> in machine learning. More complex algorithms do not necessarily achieve better results. The selection of the predictors is critical to success. It is not a good idea to use lots of predictors, since this simply causes overfitting and failure in out-of-sample operation. Therefore data mining strategies often apply a <strong>preselection algorithm </strong>that determines a small number of predictors out of a pool of many. The preselection can be based on correlation between predictors, on significance, on information content, or simply on prediction success with a test set. Practical experiments with feature selection can be found in a recent article on the <a href="http://robotwealth.com/machine-learning-financial-prediction-david-aronson/" target="_blank" rel="noopener noreferrer">Robot Wealth</a> blog.</p>
<p>Here&#8217;s a list of the most popular data mining methods used in finance.</p>
<h3>1. Indicator soup</h3>
<p>Most trading systems we&#8217;re programming for clients are not based on a financial model. The client just wanted trade signals from certain technical indicators, filtered with other technical indicators in combination with more technical indicators. When asked how this hodgepodge of indicators could be a profitable strategy, he normally answered: &#8220;Trust me. I&#8217;m trading it manually, and it works.&#8221;</p>
<p>It did indeed. At least sometimes. Although most of those systems did not pass a WFA test (and some not even a simple backtest), a surprisingly large number did. And those were also often profitable in real trading. The client had systematically experimented with technical indicators until he found a combination that worked in live trading with certain assets. This way of trial-and-error technical analysis is a classical data mining approach, just executed by a human and not by a machine. I cannot really recommend this method &#8211; and a lot of luck, not to speak of money, is probably involved &#8211; but I can testify that it sometimes leads to profitable systems.</p>
<h3>2. Candle patterns</h3>
<p>Not to be confused with those&nbsp;<a href="http://www.financial-hacker.com/seventeen-popular-trade-strategies-that-i-dont-really-understand/">Japanese Candle Patterns</a> that had their best-before date long, long&nbsp;ago. The modern equivalent is&nbsp;<strong>price action trading</strong>. You&#8217;re still looking at the open, high, low, and close of candles. You&#8217;re still hoping to find a pattern that predicts a price direction. But you&#8217;re now data mining contemporary price curves for collecting those patterns. There are software packages for that purpose. They search for patterns that are profitable by some user-defined criterion, and use them to build a specific pattern detection function. It could look like this one (from Zorro&#8217;s <a href="http://manual.zorro-project.com/advisor.htm" target="_blank" rel="noopener noreferrer">pattern analyzer</a>):</p>
<pre class="prettyprint">int detect(double* sig)
{
  if(sig[1]&lt;sig[2] &amp;&amp; sig[4]&lt;sig[0] &amp;&amp; sig[0]&lt;sig[5] &amp;&amp; sig[5]&lt;sig[3] &amp;&amp; sig[10]&lt;sig[11] &amp;&amp; sig[11]&lt;sig[7] &amp;&amp; sig[7]&lt;sig[8] &amp;&amp; sig[8]&lt;sig[9] &amp;&amp; sig[9]&lt;sig[6])
      return 1; 
  if(sig[4]&lt;sig[1] &amp;&amp; sig[1]&lt;sig[2] &amp;&amp; sig[2]&lt;sig[5] &amp;&amp; sig[5]&lt;sig[3] &amp;&amp; sig[3]&lt;sig[0] &amp;&amp; sig[7]&lt;sig[8] &amp;&amp; sig[10]&lt;sig[6] &amp;&amp; sig[6]&lt;sig[11] &amp;&amp; sig[11]&lt;sig[9])
      return 1;
  if(sig[1]&lt;sig[4] &amp;&amp; eqF(sig[4]-sig[5]) &amp;&amp; sig[5]&lt;sig[2] &amp;&amp; sig[2]&lt;sig[3] &amp;&amp; sig[3]&lt;sig[0] &amp;&amp; sig[10]&lt;sig[7] &amp;&amp; sig[8]&lt;sig[6] &amp;&amp; sig[6]&lt;sig[11] &amp;&amp; sig[11]&lt;sig[9])
      return 1;
  if(sig[1]&lt;sig[4] &amp;&amp; sig[4]&lt;sig[5] &amp;&amp; sig[5]&lt;sig[2] &amp;&amp; sig[2]&lt;sig[0] &amp;&amp; sig[0]&lt;sig[3] &amp;&amp; sig[7]&lt;sig[8] &amp;&amp; sig[10]&lt;sig[11] &amp;&amp; sig[11]&lt;sig[9] &amp;&amp; sig[9]&lt;sig[6])
      return 1;
  if(sig[1]&lt;sig[2] &amp;&amp; sig[4]&lt;sig[5] &amp;&amp; sig[5]&lt;sig[3] &amp;&amp; sig[3]&lt;sig[0] &amp;&amp; sig[10]&lt;sig[7] &amp;&amp; sig[7]&lt;sig[8] &amp;&amp; sig[8]&lt;sig[6] &amp;&amp; sig[6]&lt;sig[11] &amp;&amp; sig[11]&lt;sig[9])
      return 1;
  ....
  return 0;
}
</pre>
<p>This C function returns 1 when the signals match one of the patterns, otherwise 0. You can see from the lengthy code that this is not the fastest way to detect patterns. A better method, used by Zorro when the detection function need not be exported, is sorting the signals by their magnitude and checking the sort order. An example of such a&nbsp;system can be found <a href="http://www.financial-hacker.com/better-tests-with-oversampling/" target="_blank" rel="noopener noreferrer">here</a>.</p>
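<p>The sort order idea can be sketched as follows. Every pattern is stored as the rank of each signal, and detection reduces to comparing small integer arrays. This is my own illustration with hypothetical names (<strong>NSIG</strong>, <strong>rank_signals</strong>, <strong>detect_by_rank</strong>) and only 4 signals per pattern. It is not Zorro's internal implementation.</p>

```c
#include <assert.h>
#include <stddef.h>

#define NSIG 4  /* signals per pattern; hypothetical size for this sketch */

/* Write the rank of every signal into rank[]: 0 for the smallest
   value, NSIG-1 for the largest. Ties are broken by index. */
void rank_signals(const double *sig, int *rank)
{
    assert(sig != NULL && rank != NULL);
    for (size_t i = 0; i < NSIG; i++) {
        rank[i] = 0;
        for (size_t j = 0; j < NSIG; j++)
            if (sig[j] < sig[i] || (sig[j] == sig[i] && j < i))
                rank[i]++;
    }
}

/* Return 1 when the rank order of sig matches one of the stored
   patterns, otherwise 0. */
int detect_by_rank(const double *sig, const int patterns[][NSIG], size_t npatterns)
{
    int rank[NSIG];
    rank_signals(sig, rank);
    for (size_t p = 0; p < npatterns; p++) {
        size_t i = 0;
        while (i < NSIG && rank[i] == patterns[p][i])
            i++;
        if (i == NSIG)
            return 1;
    }
    return 0;
}
```

<p>With the signals 0.3, 0.1, 0.9, 0.5 the ranks are 1, 0, 3, 2, so a stored pattern {1,0,3,2} would fire. A production version would sort once and look the rank order up in a hash table instead of scanning all patterns.</p>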
<p>Can price action trading really work? Just like the indicator soup, it&#8217;s not based on any rational financial model. One can at best imagine that sequences of price movements cause market participants to react in a certain way, this way establishing a temporary predictive pattern. However the number of patterns is quite limited when you only look at sequences of a few adjacent candles. The next step is comparing candles that are not adjacent, but arbitrarily selected within a longer time period. This way you&#8217;re getting an almost unlimited number of patterns &#8211; but at the cost of finally leaving the realm of the rational. It is hard to imagine how a price move can be predicted by some candle patterns from weeks ago.</p>
<p>Still, a lot of effort is going into that. A fellow blogger, Daniel Fernandez, runs a subscription website (<a href="https://asirikuy.com/newsite/" target="_blank" rel="noopener noreferrer">Asirikuy</a>) specializing in data mining candle patterns. He refined pattern trading down to the smallest details, and if anyone would ever achieve any profit this way, it would be him. But to his subscribers&#8217; disappointment, trading his patterns live (<a href="https://quriquant.com/pages/view/livetrading" target="_blank" rel="noopener noreferrer">QuriQuant</a>) produced very different results than his wonderful backtests. If profitable price action systems really exist, apparently no one has found them yet.</p>
<h3>3. Linear regression</h3>
<p>The simple basis of many complex machine learning algorithms: Predict the target variable <em><strong>y</strong></em> by a linear combination of the predictors &nbsp;<em><strong>x<sub>1&nbsp;</sub>.. x<sub>n</sub></strong></em>.</p>
<p style="text-align: left; padding-left: 30px;">[latex]y = a_0 + a_1 x_1 + \dots + a_n x_n[/latex]</p>
<p>The coefficients&nbsp;<em><strong>a<sub>n</sub></strong></em>&nbsp;are the model. They are calculated to minimize the sum of squared differences between the true <em><strong>y</strong></em> values from the training samples and their predicted <em><strong>y</strong></em> from the above formula:</p>
<p style="padding-left: 30px;">[latex]Minimize(\sum (y_i-\hat{y_i})^2)[/latex]</p>
<p>This least-squares minimum can be found directly with some matrix arithmetic, so no iterations are required. In the case <em><strong>n = 1</strong></em>&nbsp;&#8211; with only one predictor variable <em><strong>x</strong></em> &#8211; the regression formula is reduced to</p>
<p style="padding-left: 30px;">[latex]y = a + b x[/latex]</p>
<p>which is <strong>simple linear regression</strong>, as opposed to <strong>multivariate linear regression</strong> where&nbsp;<em><strong>n &gt; 1</strong></em>. Simple linear regression is available in most trading platforms, for instance with the <a href="http://manual.zorro-project.com/ta.htm" target="_blank" rel="noopener noreferrer">LinReg</a> indicator in the TA-Lib. With <em><strong>y</strong></em> = price and <em><strong>x</strong></em> = time it&#8217;s often used as an alternative to a moving average. Multivariate linear regression is available in the R platform through the <strong>lm(..)</strong> function that comes with the standard installation. A variant is <strong>polynomial regression</strong>. Like simple regression it uses only one predictor variable <em><strong>x</strong></em>, but also its square and higher degrees, so that&nbsp;<em><strong>x<sub>n</sub> == x<sup>n</sup></strong></em>:</p>
<p style="padding-left: 30px;">[latex]y = a_0 + a_1 x + a_2 x^2 + \dots + a_n x^n[/latex]</p>
<p style="text-align: left;">With <em><strong>n = 2</strong></em> or <em><strong>n = 3</strong></em>, polynomial regression is often used to predict the next average price from the smoothed prices of the last bars. The <a href="http://manual.zorro-project.com/polyfit.htm" target="_blank" rel="noopener noreferrer">polyfit</a> function of MATLAB, R, Zorro, and many other platforms can be used for polynomial regression.</p>
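<p>For the simple case with one predictor, the least-squares coefficients have a closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. A minimal sketch in C, with the hypothetical function name <strong>fit_line</strong>:</p>

```c
#include <assert.h>
#include <stddef.h>

/* Least-squares fit of y = a + b*x for n samples.
   Stores the intercept in *a and the slope in *b; returns 0 on success,
   -1 when all x values are identical (slope undefined). */
int fit_line(const double *x, const double *y, size_t n, double *a, double *b)
{
    assert(x != NULL && y != NULL && a != NULL && b != NULL && n >= 2);
    double mx = 0.0, my = 0.0;
    for (size_t i = 0; i < n; i++) { mx += x[i]; my += y[i]; }
    mx /= (double)n;
    my /= (double)n;

    double sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n; i++) {
        sxx += (x[i] - mx) * (x[i] - mx);
        sxy += (x[i] - mx) * (y[i] - my);
    }
    if (sxx == 0.0)
        return -1;
    *b = sxy / sxx;
    *a = my - *b * mx;
    return 0;
}
```

<p>With x = bar number and y = price, the value a + b*x at the last bar is the end point of the regression line, which is how such a fit serves as a moving average replacement.</p>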
<h3>4. Perceptron</h3>
<p>Often referred to as a neural network with only one neuron. In fact a perceptron is a regression function like above, but with a binary result, thus called <strong>logistic regression</strong>. It&#8217;s not regression though, it&#8217;s a classification algorithm. Zorro&#8217;s <strong>advise(PERCEPTRON, &#8230;)</strong> function generates C code that returns either 100 or -100, dependent on whether the predicted result is above a threshold or not:<br />
<!--?prettify linenums=true?--></p>
<pre class="prettyprint">int predict(double* sig)
{
  if(-27.99*sig[0] + 1.24*sig[1] - 3.54*sig[2] &gt; -21.50)
    return 100;
  else
    return -100;
}</pre>
<p>You can see that the <strong>sig</strong> array is equivalent to the features&nbsp;<em><strong>x<sub>n</sub></strong></em> in the regression formula, and the numeric factors are the coefficients&nbsp;<em><strong>a<sub>n</sub></strong></em>.</p>
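<p>How could such coefficients be found? One possibility is the classic perceptron learning rule, sketched below. Whether Zorro trains its perceptron this way is not stated here, so take the rule as an assumption. The names <strong>NFEAT</strong>, <strong>perceptron_predict</strong>, and <strong>perceptron_train</strong> are hypothetical.</p>

```c
#include <assert.h>
#include <stddef.h>

#define NFEAT 2  /* number of predictors; hypothetical size for this sketch */

/* Perceptron output: +100 when the weighted sum exceeds the threshold
   -w[NFEAT] (the threshold is stored as a bias weight), otherwise -100. */
int perceptron_predict(const double *w, const double *sig)
{
    double sum = w[NFEAT];              /* bias */
    for (size_t i = 0; i < NFEAT; i++)
        sum += w[i] * sig[i];
    return sum > 0.0 ? 100 : -100;
}

/* Perceptron learning rule: for every misclassified sample, move the
   weights toward the correct side. target[] holds +1 or -1.
   Returns the number of passes used, or -1 if max_passes was reached. */
int perceptron_train(double *w, const double sig[][NFEAT], const int *target,
                     size_t nsamples, double rate, int max_passes)
{
    assert(w != NULL && sig != NULL && target != NULL && nsamples > 0);
    for (size_t i = 0; i <= NFEAT; i++)
        w[i] = 0.0;
    for (int pass = 1; pass <= max_passes; pass++) {
        int errors = 0;
        for (size_t s = 0; s < nsamples; s++) {
            int predicted = perceptron_predict(w, sig[s]) > 0 ? 1 : -1;
            if (predicted != target[s]) {
                for (size_t i = 0; i < NFEAT; i++)
                    w[i] += rate * target[s] * sig[s][i];
                w[NFEAT] += rate * target[s];
                errors++;
            }
        }
        if (errors == 0)
            return pass;
    }
    return -1;
}
```

<p>The rule only terminates without errors when the samples are linearly separable. With noisy financial data that is normally not the case, so the pass limit is what stops the training.</p>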
<h3>5. Neural networks</h3>
<p>Linear or logistic regression can only solve linear problems. Many do not fall into this category &#8211; a famous example is predicting the output of a simple XOR function. And most likely also predicting prices or trade returns. An artificial neural network (ANN) can tackle nonlinear problems. It&#8217;s a bunch of perceptrons that are connected together in an array of <strong>layers</strong>. Any perceptron is a <strong>neuron</strong> of the net. Its output goes to the inputs of all neurons of the next layer, like this:</p>
<p><a href="http://www.financial-hacker.com/wp-content/uploads/2016/03/neural-network.png"><img fetchpriority="high" decoding="async" class="alignnone wp-image-1752 size-full" src="http://www.financial-hacker.com/wp-content/uploads/2016/03/neural-network.png" width="500" height="309" srcset="https://financial-hacker.com/wp-content/uploads/2016/03/neural-network.png 500w, https://financial-hacker.com/wp-content/uploads/2016/03/neural-network-300x185.png 300w" sizes="(max-width: 500px) 85vw, 500px" /></a></p>
<p>Like the perceptron, a neural network also learns by determining the coefficients that minimize the error between sample prediction and sample target.&nbsp;But this now requires an approximation process, normally by <strong>backpropagating</strong> the error from the output to the inputs, optimizing the weights on its way. This process imposes two restrictions. First, the neuron outputs must now be continuously differentiable functions instead of the simple perceptron threshold. Second, the network must not be too deep &#8211; it must not have too many &#8216;hidden layers&#8217; of neurons between inputs and output.&nbsp;This second restriction limits the complexity of problems that&nbsp;a standard neural network&nbsp;can solve.</p>
<p>When using a neural network for predicting trades, you have a lot of parameters with which you can play around and, if you&#8217;re not careful, produce a lot of <strong>selection bias</strong>:</p>
<ul style="list-style-type: square;">
<li>Number of hidden layers</li>
<li>Number of neurons&nbsp;per hidden layer</li>
<li>Number of backpropagation cycles, named <strong>epochs</strong></li>
<li>Learning rate, the step width of an epoch</li>
<li>Momentum, an inertia factor for the weight adaptation</li>
<li>Activation function</li>
</ul>
<p>The <strong>activation function</strong>&nbsp;emulates the perceptron threshold. For the backpropagation you need a continuously differentiable function that generates a &#8216;soft&#8217; step at a certain x value. Normally a <strong>sigmoid</strong>,&nbsp;<strong>tanh</strong>, or <strong>softmax</strong> function is used. Sometimes it&#8217;s also a <strong>linear</strong> function that just returns the weighted sum of all inputs. In this case the network can be used for regression, for predicting a numeric value instead of a binary outcome.</p>
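<p>To see why layers help, here is a tiny network with two hidden neurons that reproduces the XOR function, which no single perceptron can do. The weights are picked by hand, not trained, and the function names are hypothetical. As activation I use a soft step built from x/(1+|x|) instead of the usual sigmoid, only to keep the sketch free of math library calls.</p>

```c
#include <assert.h>

/* Soft step in 0 .. 1 built from the softsign function x/(1+|x|).
   Assumption: it stands in for the usual sigmoid 1/(1+exp(-x)). */
double soft_step(double x)
{
    double ax = x < 0.0 ? -x : x;
    return 0.5 + 0.5 * x / (1.0 + ax);
}

/* 2-2-1 network with hand-picked (not trained) weights.
   Hidden neuron 1 acts like OR, hidden neuron 2 like NAND, and the
   output neuron combines them like AND, which yields XOR.
   Inputs are expected in the range 0 .. 1. */
double xor_net(double x1, double x2)
{
    assert(x1 >= 0.0 && x1 <= 1.0 && x2 >= 0.0 && x2 <= 1.0);
    double h1 = soft_step( 20.0 * x1 + 20.0 * x2 - 10.0);
    double h2 = soft_step(-20.0 * x1 - 20.0 * x2 + 30.0);
    return soft_step(20.0 * h1 + 20.0 * h2 - 30.0);
}
```

<p>The output is above 0.5 when exactly one input is 1, and below 0.5 otherwise. Backpropagation would have to find weights of this kind by itself, which is where the parameters listed above come into play.</p>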
<p>Neural networks are available in the standard <strong>R</strong> installation (<strong>nnet</strong>, a single hidden layer network) and in many packages, for instance <strong>RSNNS&nbsp;</strong>and <strong>FCNN4R</strong>.</p>
<h3>6. Deep learning</h3>
<p>Deep learning methods use neural networks with many hidden layers and thousands of neurons, which could not be effectively trained anymore by conventional backpropagation. Several methods became popular in recent years for training such huge networks. They usually pre-train the hidden neuron layers for achieving a more effective learning process. A <strong>Restricted Boltzmann Machine</strong> (<strong>RBM</strong>) is an unsupervised classification algorithm with a special network structure that has no connections between the hidden neurons. A&nbsp;<strong>Sparse Autoencoder</strong> (<strong>SAE</strong>) uses a conventional network structure, but pre-trains the hidden layers in a clever way by reproducing the input signals on the layer outputs with as few active connections as possible. Those methods allow very complex networks for tackling very complex learning tasks. Such as beating the world&#8217;s best human Go player.</p>
<p>Deep learning networks are available in the <strong>deepnet</strong> and <strong>darch</strong> R packages. Deepnet provides an autoencoder, Darch a restricted Boltzmann machine. I have not yet experimented with Darch, but here&#8217;s an example R script using the Deepnet autoencoder with 3 hidden layers for trade signals through Zorro&#8217;s <strong>neural()</strong> function:</p>
<pre class="prettyprint">library('deepnet', quietly = T) 
library('caret', quietly = T)

# called by Zorro for training
neural.train = function(model,XY) 
{
  XY &lt;- as.matrix(XY)
  X &lt;- XY[,-ncol(XY)] # predictors
  Y &lt;- XY[,ncol(XY)]  # target
  Y &lt;- ifelse(Y &gt; 0,1,0) # convert -1..1 to 0..1
  Models[[model]] &lt;&lt;- sae.dnn.train(X,Y,
      hidden = c(50,100,50), 
      activationfun = "tanh", 
      learningrate = 0.5, 
      momentum = 0.5, 
      learningrate_scale = 1.0, 
      output = "sigm", 
      sae_output = "linear", 
      numepochs = 100, 
      batchsize = 100,
      hidden_dropout = 0, 
      visible_dropout = 0)
}

# called by Zorro for prediction
neural.predict = function(model,X) 
{
  if(is.vector(X)) X &lt;- t(X) # transpose horizontal vector
  return(nn.predict(Models[[model]],X))
}

# called by Zorro for saving the models
neural.save = function(name)
{
  save(Models,file=name) # save trained models
}

# called by Zorro for initialization
neural.init = function()
{
  set.seed(365)
  Models &lt;&lt;- vector("list")
}

# quick OOS test for experimenting with the settings
Test = function() 
{
  neural.init()
  XY &lt;&lt;- read.csv('C:/Project/Zorro/Data/signals0.csv',header = F)
  splits &lt;- floor(nrow(XY)*0.8) # whole number of training rows
  XY.tr &lt;&lt;- head(XY,splits) # training set
  XY.ts &lt;&lt;- tail(XY,-splits) # test set
  neural.train(1,XY.tr)
  X &lt;&lt;- XY.ts[,-ncol(XY.ts)]
  Y &lt;&lt;- XY.ts[,ncol(XY.ts)]
  Y.ob &lt;&lt;- ifelse(Y &gt; 0,1,0)
  Y &lt;&lt;- neural.predict(1,X)
  Y.pr &lt;&lt;- ifelse(Y &gt; 0.5,1,0)
  confusionMatrix(factor(Y.pr,levels=c(0,1)),factor(Y.ob,levels=c(0,1))) # display prediction accuracy; current caret versions expect factors
}</pre>
<h3>7. Support vector machines</h3>
<p>Like a neural network, a support vector machine (SVM) is another extension of linear regression. When we look at the regression formula again,</p>
<p style="padding-left: 30px;">[latex]y = a_0 + a_1 x_1 + \dots + a_n x_n[/latex]</p>
<p>we can interpret the features <em><strong>x<sub>n</sub></strong></em>&nbsp;as coordinates of a <em><strong>n</strong></em>-dimensional <strong>feature space</strong>. Setting the target variable <em><strong>y</strong></em>&nbsp;to a fixed value determines a plane in that space, called a <strong>hyperplane</strong> since it has more than two (in fact,&nbsp;<em><strong>n-1</strong></em>) dimensions. The hyperplane separates the samples with <em><strong>y &gt; 0</strong></em> from the samples with <em><strong>y &lt; 0</strong></em>. The <em><strong>a<sub>n</sub></strong></em> coefficients can be calculated in a way that the distance of the plane to the nearest samples &#8211; which are called the &#8216;support vectors&#8217; of the plane,&nbsp;hence the algorithm name &#8211; is maximal. This way we have a binary classifier with optimal separation of winning and losing samples.</p>
<p>The problem: normally those samples are not <strong>linearly separable</strong> &#8211; they are scattered around irregularly in the feature space. No flat plane can be squeezed between winners and losers. If it could, we would have simpler methods to calculate that plane, for instance&nbsp;<strong>linear discriminant analysis</strong>. But for the common case we need the SVM trick: Adding more dimensions to the feature space. For this the SVM algorithm produces more features with a <strong>kernel function</strong> that combines any two existing predictors to a new feature.&nbsp;This is analogous to the step above from the simple regression to polynomial regression, where also more features are added by taking the sole predictor to the n-th power. The more dimensions you add, the easier it is to separate the samples with a flat hyperplane. This plane is then transformed back to the original n-dimensional space, getting wrinkled and crumpled on the way. By cleverly selecting the kernel function, the process can be performed without actually computing the transformation.</p>
<p>Like neural networks, SVMs can be used not only for classification, but also for regression. They also offer some parameters for optimizing and possibly overfitting the prediction process:</p>
<ul style="list-style-type: square;">
<li>Kernel function. You normally use an RBF kernel (radial basis function, a symmetric kernel), but you also have the choice of other kernels, such as sigmoid, polynomial, and linear.</li>
<li>Gamma, the width of the RBF kernel</li>
<li>Cost parameter C, the &#8216;penalty&#8217; for wrong classifications in the training samples</li>
</ul>
<p>An&nbsp;often used SVM is the <strong>libsvm</strong> library. It&#8217;s also available in R in the <strong>e1071</strong> package. In the next and final part of this series I plan to describe a trading strategy using this SVM.</p>
<h3>8. K-Nearest neighbor</h3>
<p>Compared with the heavy ANN and SVM stuff, that&#8217;s a nice simple algorithm with a unique property: It needs no training. So the samples are the model. You could use this algorithm for a trading system that learns permanently by simply adding more and more samples. The nearest neighbor algorithm computes the distances in feature space from the current feature values to the <em><strong>k</strong></em> nearest samples. A&nbsp;distance in n-dimensional space between two feature sets <em><strong>(x<sub>1</sub> .. x<sub>n</sub>)</strong></em> and <em><strong>(y<sub>1</sub> .. y<sub>n</sub>)</strong></em> is calculated just as in 2 dimensions:</p>
<p>[latex display="true"]D = \sqrt{(x_1-y_1)^2 + (x_2-y_2)^2 + \dots + (x_n-y_n)^2}[/latex]</p>
<p>The algorithm simply predicts the target from the average of the <em><strong>k</strong></em> target variables of the nearest samples, weighted by their inverse distances. It can be used for classification as well as for regression. Software tricks borrowed from computer graphics, such as an <strong>adaptive binary tree</strong> (ABT), can make the nearest neighbor search pretty fast. In my past life as a computer game programmer, we used such methods in games for tasks like self-learning enemy intelligence. You can call the <strong>knn</strong> function in R for nearest neighbor prediction &#8211; or write a simple function in C for that purpose.</p>
<h3>9. K-Means</h3>
<p>This is an approximation algorithm for unsupervised classification. It has some similarity, not only in its name, to k-nearest neighbor. For classifying the samples, the algorithm first places <em><strong>k</strong></em> random points in the feature space. Then it assigns each sample to the point nearest to it. Each point is then moved to the <strong>mean</strong> of the samples assigned to it. This will generate a new sample assignment, since some samples are now closer to another point. The process is repeated until&nbsp;the assignment does not change anymore by moving the points, i.e. each point lies exactly at the mean of its nearest samples. We now have <em><strong>k</strong></em> classes of samples, each in the neighborhood of one of the <em><strong>k</strong></em> points.</p>
<p>This simple algorithm can produce surprisingly good results. In R, the <strong>kmeans</strong> function does the trick. An example of the k-means algorithm for classifying candle patterns can be found here: <a href="http://robotwealth.com/unsupervised-candlestick-classification-for-fun-and-profit-part-1/" target="_blank" rel="noopener noreferrer">Unsupervised candlestick classification for fun and profit</a>.</p>
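<p>The loop of assigning samples and moving points fits into a short C function. The following is a minimal sketch under some assumptions: two features per sample (<strong>DIM</strong>), start points supplied by the caller instead of a random generator, and an empty class simply keeping its old point. All names are hypothetical.</p>

```c
#include <assert.h>
#include <stddef.h>

#define DIM 2  /* feature space dimensions; hypothetical size for this sketch */

/* Squared Euclidean distance; sufficient for finding the nearest point. */
static double dist2(const double *a, const double *b)
{
    double d = 0.0;
    for (size_t i = 0; i < DIM; i++)
        d += (a[i] - b[i]) * (a[i] - b[i]);
    return d;
}

/* Run k-means on n samples. centers[] must be preset by the caller
   (the text uses random start points) and receives the final means;
   label[] receives the class index of every sample. Returns the number
   of iterations used (max_iter if the assignment never became stable). */
int kmeans(const double samples[][DIM], size_t n,
           double centers[][DIM], size_t k, int *label, int max_iter)
{
    assert(n > 0 && k > 0 && k <= n);
    for (size_t s = 0; s < n; s++)
        label[s] = -1;
    for (int iter = 1; iter <= max_iter; iter++) {
        int changed = 0;
        /* Assignment step: attach each sample to its nearest center. */
        for (size_t s = 0; s < n; s++) {
            int best = 0;
            for (size_t c = 1; c < k; c++)
                if (dist2(samples[s], centers[c]) < dist2(samples[s], centers[best]))
                    best = (int)c;
            if (label[s] != best) { label[s] = best; changed = 1; }
        }
        if (!changed)
            return iter;
        /* Update step: move each center to the mean of its samples. */
        for (size_t c = 0; c < k; c++) {
            double sum[DIM] = {0.0};
            size_t count = 0;
            for (size_t s = 0; s < n; s++)
                if (label[s] == (int)c) {
                    for (size_t i = 0; i < DIM; i++)
                        sum[i] += samples[s][i];
                    count++;
                }
            if (count > 0)   /* an empty class keeps its old center */
                for (size_t i = 0; i < DIM; i++)
                    centers[c][i] = sum[i] / (double)count;
        }
    }
    return max_iter;
}
```

<p>The result depends on the start points, so practical implementations usually run the algorithm several times with different random starts and keep the best clustering.</p>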
<h3>10. Naive Bayes</h3>
<p>This algorithm uses <strong>Bayes&#8217; Theorem</strong> for classifying samples of non-numeric features (i.e. <strong>events</strong>), such as the above mentioned <strong>candle patterns</strong>. Suppose that an event&nbsp;<em><strong>X</strong></em> (for instance, that the Open of the previous bar is below the Open of the current bar) appears in 80% of all winning samples. What is then the probability that a sample is winning when it contains event&nbsp;<em><strong>X</strong></em>? It&#8217;s not 0.8 as you might think. The probability can be calculated with Bayes&#8217; Theorem:</p>
<p>[latex display="true"]P(Y \vert X) = \frac{P(X \vert Y) P(Y)}{P(X)}[/latex]</p>
<p><em><strong>P(Y|X)</strong></em> is the probability that event <em><strong>Y</strong></em> (for instance, winning) occurs in all samples containing event <em><strong>X</strong></em>&nbsp;(in our example, <em><strong>Open(1) &lt; Open(0)</strong></em>). According to the formula, it is equal to the probability of <em><strong>X</strong></em>&nbsp;occurring in all winning samples (here, 0.8), multiplied by the probability of <em><strong>Y</strong></em> in all samples (around 0.5 when you were following my above advice of balanced samples) and divided by the probability of <em><strong>X</strong></em> in all samples.</p>
<p>If we are naive and assume that all events <em><strong>X</strong></em> are independent of each other, we can calculate the overall probability that a sample is winning by simply multiplying the probabilities <em><strong>P</strong><strong>(X|winning)</strong></em> for every event <em><strong>X</strong></em>. This way we end up with this formula:</p>
<p>[latex display=&#8221;true&#8221;]P(Y | X_1 .. X_n) ~=~ s~P(Y) \prod_{i}{P(X_i | Y)}[/latex]</p>
<p>with a scaling factor <em><strong>s</strong></em>. For the formula to work, the features should be selected in a way that they are as independent as possible, which poses an obstacle for using Naive Bayes in trading. For instance, the two events <em><strong>Close(1) &lt; Close(0)</strong></em> and <em><strong>Open(1) &lt; Open(0)</strong></em> are most likely not independent of each other. Numerical predictors can be converted to events by dividing their value range into separate intervals.</p>
<p>The Naive Bayes algorithm is available in the ubiquitous <strong>e1071</strong> R package.</p>
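<p>The product formula can be sketched in a few lines of Python. This is only an illustration, not the e1071 implementation: the function name <strong>naive_bayes</strong>, the event names and all frequencies are hypothetical, and the scaling factor <em><strong>s</strong></em> is obtained here by normalizing over the two outcomes.</p>

```python
from math import prod

def naive_bayes(priors, likelihoods, events):
    """Return P(class | events) for every class.

    priors:      {class: P(class)}
    likelihoods: {class: {event: P(event | class)}}
    events:      names of the events observed in the sample
    """
    # Unscaled score per class: P(Y) * product of P(X_i | Y)
    scores = {
        cls: priors[cls] * prod(likelihoods[cls][e] for e in events)
        for cls in priors
    }
    # The scaling factor s makes the probabilities sum up to 1.
    s = 1.0 / sum(scores.values())
    return {cls: s * score for cls, score in scores.items()}

# Hypothetical event frequencies, invented for illustration.
priors = {"win": 0.5, "loss": 0.5}
likelihoods = {
    "win":  {"open_up": 0.8, "close_up": 0.7},
    "loss": {"open_up": 0.6, "close_up": 0.4},
}
result = naive_bayes(priors, likelihoods, ["open_up", "close_up"])
print(result)
```

<p>Note that the sketch treats the two events as independent, which is exactly the naive assumption criticized above.</p>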
<h3>11. Decision and regression trees</h3>
<p>These trees predict an outcome or a numeric value based on a series of yes/no decisions, in a structure like the branches of a tree. Each decision is either the presence of an event or not (in the case of non-numerical features) or a comparison of a feature value with a fixed threshold. A typical tree function, generated by Zorro&#8217;s tree builder, looks like this:</p>
<pre class="prettyprint">int tree(double* sig)
{
  if(sig[1] &lt;= 12.938) {
    if(sig[0] &lt;= 0.953) return -70;
    else {
      if(sig[2] &lt;= 43) return 25;
      else {
        if(sig[3] &lt;= 0.962) return -67;
        else return 15;
      }
    }
  }
  else {
    if(sig[3] &lt;= 0.732) return -71;
    else {
      if(sig[1] &gt; 30.61) return 27;
      else {
        if(sig[2] &gt; 46) return 80;
        else return -62;
      }
    }
  }
}</pre>
<p>How is such a tree produced from a set of samples? There are several methods; Zorro uses the <strong>Shannon information entropy</strong>, which already appeared on this blog in the <a href="http://www.financial-hacker.com/is-scalping-irrational/" target="_blank" rel="noopener noreferrer">Scalping</a> article. First it checks one of the features, let&#8217;s say&nbsp;<em><strong>x<sub>1</sub></strong></em>. It places a hyperplane with the plane formula&nbsp;<em><strong>x<sub>1</sub>&nbsp;= t&nbsp;</strong></em>into the feature space. This hyperplane separates the samples with <em><strong>x<sub>1</sub> &gt; t</strong></em> from the samples with <em><strong>x<sub>1</sub> &lt; t</strong></em>. The dividing threshold <strong><em>t</em></strong> is selected so that the <strong>information gain</strong> &#8211; the difference between the information entropy of the whole space and the sum of the information entropies of the two sub-spaces &#8211; is at its maximum. This is the case when the samples in the subspaces are more similar to each other than the samples in the whole space.</p>
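<p>The threshold search for a single feature can be sketched as follows. This is not Zorro&#8217;s tree builder: the function names and the toy data are made up, and the sketch uses the common textbook form of the information gain, in which the entropies of the two subsets are weighted by their sample counts.</p>

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return sum(c / n * log2(n / c) for c in Counter(labels).values())

def best_threshold(values, labels):
    """Find the threshold t on one feature that maximizes the information gain.

    Gain = entropy of all samples minus the size-weighted entropies
    of the two subsets x <= t and x > t (textbook definition).
    """
    n = len(labels)
    base = entropy(labels)
    best_t, best_gain = None, 0.0
    xs = sorted(set(values))
    # Candidate thresholds: midpoints between neighboring feature values.
    for lo, hi in zip(xs, xs[1:]):
        t = (lo + hi) / 2
        left = [y for x, y in zip(values, labels) if x <= t]
        right = [y for x, y in zip(values, labels) if x > t]
        gain = base - len(left) / n * entropy(left) - len(right) / n * entropy(right)
        if gain > best_gain:
            best_t, best_gain = t, gain
    return best_t, best_gain

# Toy data, invented for illustration: small feature values tend to lose.
x1 = [0.2, 0.4, 0.5, 0.7, 0.9, 1.1]
y = ["loss", "loss", "loss", "win", "win", "win"]
t, gain = best_threshold(x1, y)
print(t, gain)
```

<p>In this toy set a single threshold separates winners from losers completely, so the gain equals the full entropy of the unsplit samples. Real price data will of course never split that cleanly.</p>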
<p>This process is then repeated with the next feature&nbsp;<em><strong>x<sub>2</sub></strong></em>&nbsp;and two hyperplanes splitting the two subspaces. Each split is equivalent to a comparison of a feature with a threshold. By repeated splitting, we soon get a huge tree with thousands of threshold comparisons. Then the process is run backwards by <strong>pruning</strong> the tree and removing all decisions that do not lead to substantial information gain. Finally we end up with a relatively small tree as in the code above.</p>
<p>Decision trees have a wide range of applications. They can produce excellent predictions superior to those of neural networks or support vector machines. But they are not a one-size-fits-all solution, since their splitting planes are always parallel to the axes of the feature space. This somewhat limits their predictions. They can be used not only for classification, but also for regression, for instance by returning the percentage of samples contributing to a certain branch of the tree. Zorro&#8217;s tree is a regression tree. The best known classification tree algorithm is <strong>C5.0</strong>, available in the <strong>C50</strong> package for R.</p>
<p>For improving the prediction even further or overcoming the parallel-axis limitation, an ensemble of trees can be used, called a <strong>random forest</strong>. The prediction is then generated by averaging the predictions of the single trees or by letting them vote. Random forests are available in R packages <strong>randomForest</strong>, <strong>ranger</strong> and <strong>Rborist</strong>.</p>
<h3>Conclusion</h3>
<p>There are many different data mining and machine learning methods at your disposal. The critical question: what is better, a <a href="http://www.financial-hacker.com/build-better-strategies-part-2-model-based-systems/" target="_blank" rel="noopener noreferrer">model-based</a>&nbsp;or a machine learning strategy? There is no doubt that machine learning has a lot of advantages. You don&#8217;t need to care about market microstructure, economy, trader psychology, or similar soft stuff. You can concentrate on pure mathematics. Machine learning is a much more elegant, more attractive way to generate trade systems. It has all advantages on its side but one. Despite all the enthusiastic threads on trader forums, it&nbsp;tends to mysteriously fail in live trading.</p>
<p>Every second week a new paper about trading with machine learning methods is published (a few can be found below). Please take all those publications with a grain of salt. According to some papers, <strong>fantastic win rates</strong> in the range of 70%, 80%, or even 85% have been achieved. Although win rate is not the only relevant criterion &#8211; you can lose even with a high win rate &#8211; 85% accuracy in predicting trades is normally equivalent to a profit factor above 5. With such a system the involved scientists should be billionaires by now. Unfortunately I never managed to reproduce those win rates with the described method, and didn&#8217;t even come close. So maybe a lot of selection bias went into the results. Or maybe I&#8217;m just too stupid.</p>
<p>Compared with model-based strategies, I have not seen many successful machine learning systems so far. And from what one hears about the algorithmic methods used by successful hedge funds, machine learning still seems to be rarely used. But maybe this will change in the future with the availability of more processing power and the arrival of new algorithms for deep learning.</p>
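<p>The relation between win rate and profit factor can be checked with a one-liner. The sketch assumes that the average win and the average loss have the same size, which is an assumption of this example and not something the cited papers state.</p>

```python
def profit_factor(win_rate, avg_win=1.0, avg_loss=1.0):
    """Gross profit divided by gross loss for a given win rate."""
    return (win_rate * avg_win) / ((1.0 - win_rate) * avg_loss)

# Win rates mentioned in the text.
for rate in (0.70, 0.80, 0.85):
    print(rate, round(profit_factor(rate), 2))
```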
<h3>Papers</h3>
<ol>
<li>Classification using deep neural networks: <a href="http://papers.ssrn.com/sol3/papers.cfm?abstract_id=2756331" target="_blank" rel="noopener noreferrer">Dixon.et.al.2016</a></li>
<li>Predicting price direction using&nbsp;ANN &amp; SVM: &nbsp;<a href="https://www.researchgate.net/profile/Melek_Boyacioglu/publication/222043783_Predicting_direction_of_stock_price_index_movement_using_artificial_neural_networks_and_support_vector_machines_The_sample_of_the_Istanbul_Stock_Exchange/links/02e7e51815672d58af000000.pdf" target="_blank" rel="noopener noreferrer">Kara.et.al.2011</a></li>
<li>Empirical comparison of learning algorithms: <a href="https://www.cs.cornell.edu/~caruana/ctp/ct.papers/caruana.icml06.pdf" target="_blank" rel="noopener noreferrer">Caruana.et.al.2006</a></li>
<li>Mining stock market tendency using GA &amp; SVM: <a href="http://ww.svms.org/finance/YuWangLai2005.pdf" target="_blank" rel="noopener noreferrer">Yu.Wang.Lai.2005</a></li>
</ol>
<p>The next part of this series will deal with the practical development of a machine learning strategy.</p>
<p style="text-align: right;"><strong>⇒&nbsp;<a href="http://www.financial-hacker.com/build-better-strategies-part-5-developing-a-machine-learning-system/">Build Better Strategies – Part 5</a></strong></p>
]]></content:encoded>
					
					<wfw:commentRss>https://financial-hacker.com/build-better-strategies-part-4-machine-learning/feed/</wfw:commentRss>
			<slash:comments>44</slash:comments>
		
		
			</item>
		<item>
		<title>Is &#8220;Scalping&#8221; Irrational?</title>
		<link>https://financial-hacker.com/is-scalping-irrational/</link>
					<comments>https://financial-hacker.com/is-scalping-irrational/#comments</comments>
		
		<dc:creator><![CDATA[jcl]]></dc:creator>
		<pubDate>Fri, 09 Oct 2015 16:45:41 +0000</pubDate>
				<category><![CDATA[Indicators]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[System Evaluation]]></category>
		<category><![CDATA[Entropy]]></category>
		<category><![CDATA[Experiment]]></category>
		<category><![CDATA[HFT]]></category>
		<category><![CDATA[Information]]></category>
		<category><![CDATA[Shannon]]></category>
		<category><![CDATA[Zorro]]></category>
		<guid isPermaLink="false">http://www.financial-hacker.com/?p=483</guid>

					<description><![CDATA[Clients often ask for strategies that trade on very short time frames. Some are possibly inspired by &#8220;I just made $2000 in 5 minutes&#8221; stories on trader forums. Others have heard of High Frequency Trading: the higher the frequency, the better must be the trading! The Zorro developers had been pestered for years until they &#8230; <a href="https://financial-hacker.com/is-scalping-irrational/" class="more-link">Continue reading<span class="screen-reader-text"> "Is &#8220;Scalping&#8221; Irrational?"</span></a>]]></description>
										<content:encoded><![CDATA[<p>Clients often ask for strategies that trade on <strong>very short time frames</strong>. Some are possibly inspired by &#8220;I just made $2000 in 5 minutes&#8221; stories on trader forums. Others have heard of <a href="http://www.financial-hacker.com/hacking-hft-systems/" target="_blank" rel="noopener"><strong>High Frequency Trading</strong></a>: the higher the frequency, the better must be the trading! The <strong><a href="http://www.financial-hacker.com/hackers-tools-zorro-and-r/">Zorro</a></strong> developers had been pestered for years until they finally implemented tick histories and millisecond time frames. <strong>Totally useless features?</strong> Or does short-term algo trading indeed have some quantifiable advantages? An experiment for looking into that matter produced a <strong>surprising result</strong>.<span id="more-483"></span></p>
<p>It is certainly tempting to earn profits within minutes. Additionally, short time frames produce more bars and trades &#8211; a great advantage for strategy development. The quality of test and training depends on the amount of data, and timely price data is always in short supply. Still, scalping &#8211; opening and closing trades in minutes or seconds &#8211; is largely considered nonsense and irrational by algo traders. Four main reasons are given:</p>
<ol>
<li>Short time frames cause high <strong>trading costs</strong> &#8211; slippage, spread, commission &#8211; in relation to the expected profit.</li>
<li>Short time frames expose more <strong>&#8216;noise&#8217;,</strong> <strong>&#8216;randomness&#8217;</strong> and <strong>&#8216;artifacts&#8217;</strong> in the price curve, which reduces profit and increases risk.</li>
<li>Algorithms have to be individually adapted to the broker or price data provider due to <strong>price feed dependency</strong> in short time frames.</li>
<li>Algorithmic strategies usually <strong>cease working</strong> below a certain time frame.</li>
</ol>
<p>Higher costs, less profit, more risk, feed dependency, no working strategies &#8211; seemingly good arguments against scalping (HFT is a very different matter). But never trust common wisdom, especially not in trading. That&#8217;s why I had not yet added scalping to my <a href="http://www.financial-hacker.com/seventeen-popular-trade-strategies-that-i-dont-really-understand/">list of irrational trade methods</a>. I can confirm reasons number 3 and 4 from my own experiences: Below bar periods of about 10 minutes, backtests with price histories from different brokers began to produce noticeably different results. And I never managed to develop a strategy with a significantly positive walk-forward test on bar periods less than 30 minutes. But this does not mean that such a strategy does not exist. Maybe short time frames just need special trade methods?</p>
<p>So I&#8217;ve programmed an experiment for finding out once and for all if scalping is really as bad as it&#8217;s rumored to be. Then I can at least give some reasoned advice to the next client who desires a tick-triggered short-term trading strategy.</p>
<h3>Trading costs examined</h3>
<p>The first part of the experiment is easily done: a statistic of the impact of trading costs. Higher costs obviously require more profits for compensation. How many trades must you win for overcoming the trading costs at different time frames? Here&#8217;s a short script (in C, for Zorro) for answering this question:</p>
<pre class="prettyprint">function run()
{
  BarPeriod = 1;     // 1-minute bars
  LookBack = 1440;   // one day of price history
  Commission = 0.60; // round turn, per 10,000 contracts
  Spread = 0.5*PIP;

  int duration = 1, i = 0;
  if(!is(LOOKBACK))
    while(duration &lt;= 1440)
    { 
      // absolute price change over the trade duration, converted to account currency
      var Return = abs(priceClose(0)-priceClose(duration))*PIPCost/PIP;
      var Cost = Commission*LotAmount/10000. + Spread*PIPCost/PIP;
      // break-even win rate; 100% when the return does not exceed the cost
      var Rate = ifelse(Return &gt; Cost, Cost/(2*Return) + 0.5, 1.);

      plotBar("Min Rate",i++,duration,100*Rate,AVG+BARS,RED); 

      // the step width grows with the trade duration
      if(duration &lt; 10) duration += 1;
      else if(duration &lt; 60) duration += 5;
      else if(duration &lt; 180) duration += 30;
      else duration += 60;
    }
  Bar += 100; // hack!
}</pre>
<p>This script calculates the minimum win rate needed to compensate for the trade costs at different trade durations. We assumed here a spread of <strong>0.5 pips</strong> and a round turn commission of <strong>60 cents</strong> per 10,000 contracts &#8211; roughly the average cost of a Forex trade. <strong>PIPCost/PIP</strong> in the above script is the conversion factor from a price difference to a win or loss on the account. We&#8217;re also assuming no win/loss bias: Trades shall win or lose on average the same amount. This allows us to split the <strong>Return</strong> of any trade into a win and a loss, determined by <strong>WinRate</strong>. The win is <strong>WinRate * Return</strong> and the loss is <strong>(1-WinRate) * Return</strong>. For breaking even, the win minus the loss must cover the cost: <strong>WinRate * Return - (1-WinRate) * Return = Cost</strong>. The required win rate for this is</p>
<p style="padding-left: 30px; text-align: center;"><em><strong>WinRate = Cost/(2*Return) + 0.5</strong></em></p>
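<p>The break-even formula is easy to try out with a few numbers. The Python sketch here mirrors the <strong>ifelse</strong> line of the script for a single trade; the function name and the sample values for cost and return are hypothetical and not taken from the EUR/USD data.</p>

```python
def min_win_rate(cost, ret):
    """Break-even win rate for a trade with absolute return `ret` and cost `cost`.

    Solving W*ret - (1-W)*ret = cost for W gives W = cost/(2*ret) + 0.5.
    When the return does not exceed the cost, the rate is capped at 1,
    as in the script.
    """
    if ret > cost:
        return cost / (2.0 * ret) + 0.5
    return 1.0

# Hypothetical values in account currency: same cost, shrinking return.
for ret in (20.0, 5.0, 1.25, 0.5):
    print(ret, min_win_rate(1.0, ret))
```

<p>The smaller the typical price move of a trade, the more the fixed cost dominates and the closer the required win rate gets to 100%.</p>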
<p>The win rate is averaged over all bars and plotted in a histogram of trade durations from 1 minute up to 1 day. The duration is varied in steps of 1, 5, 30, and 60 minutes. We&#8217;re entering a trade for any duration every 101 minutes (<strong>Bar += 100</strong> in the script is a hack for running the simulation in steps of 101 minutes, while still maintaining the 1-minute bar period).</p>
<p>The script needs a few seconds to run, then produces this histogram (for EUR/USD and 2015):</p>
<figure id="attachment_524" aria-describedby="caption-attachment-524" style="width: 889px" class="wp-caption alignnone"><a href="http://www.financial-hacker.com/wp-content/uploads/2015/10/scalp11.png"><img decoding="async" class="wp-image-524 size-full" src="http://www.financial-hacker.com/wp-content/uploads/2015/10/scalp11.png" alt="" width="889" height="513" srcset="https://financial-hacker.com/wp-content/uploads/2015/10/scalp11.png 889w, https://financial-hacker.com/wp-content/uploads/2015/10/scalp11-300x173.png 300w" sizes="(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px" /></a><figcaption id="caption-attachment-524" class="wp-caption-text">Required win rate in percent vs. trade duration in minutes</figcaption></figure>
<p>You need about <strong>53% win rate</strong> for covering the costs of 1-day trades (rightmost bar), but <strong>90% win rate </strong>for 1-minute trades! Or alternatively, a 9:1 reward to risk ratio at 50% win rate. This exceeds the best performances of real trading systems by a large amount, and seems to convincingly confirm the first reason why you had better take tales by scalping heroes on trader forums with a grain of salt.</p>
<p>But what about reason number two &#8211; that short time frames are plagued with &#8216;noise&#8217; and &#8216;randomness&#8217;? Or is it maybe the other way around and some effect makes short time frames even more predictable? That&#8217;s a little harder to test.</p>
<h3>Measuring randomness</h3>
<p>&#8216;Noise&#8217; is often identified with the high-frequency components of a signal. Naturally, short time frames produce more high-frequency components than long time frames. They could be detected with a highpass filter, or eliminated with a lowpass filter. Only problem: <strong>Price curve noise</strong> is not always related to high frequencies. Noise is just the part of the curve that does not carry information about the trading signal. For cycle trading, high frequencies are the signal and low-frequency trend is the noise. So the jaggies and ripples of a short time frame price curve might be just the very inefficiencies that you want to exploit. It depends on the strategy what noise is; there is no &#8216;general price noise&#8217;.</p>
<p>Thus we need a better criterion for determining the tradeability of a price curve. That criterion is <strong>randomness</strong>. You cannot trade a random market, but you can potentially trade anything that deviates from randomness. Randomness can be measured through the <strong>information content</strong> of the price curve. A good measure of information content is the <strong>Shannon Entropy</strong>. It is defined this way:</p>
<p><a href="http://www.financial-hacker.com/wp-content/uploads/2015/10/shannon.png"><img decoding="async" class="wp-image-529 size-full aligncenter" src="http://www.financial-hacker.com/wp-content/uploads/2015/10/shannon.png" alt="" width="260" height="47" /></a></p>
<p>This formula basically measures disorder. A very ordered, predictable signal has low entropy. A random, unpredictable signal has high entropy. In the formula, <em><strong>P(s<sub>i</sub>)</strong></em> is the relative frequency of a certain pattern <em><strong>s<sub>i</sub></strong></em> in the signal <em><strong>S</strong></em>. The entropy is at maximum when all patterns are evenly distributed and all <em><strong>P(s<sub>i</sub>)</strong></em> have about the same value. If some patterns appear more frequently than other patterns, the entropy goes down. The signal is then less random and more predictable. The Shannon Entropy is measured in <strong>bits</strong>.</p>
<p>The problem: Zorro has tons of indicators, even the Shannon Gain, but not the Shannon Entropy! So I have no choice but to write a new indicator, which fortunately is my job anyway. This is the source code of the Shannon Entropy of a char string:</p>
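<pre class="prettyprint">var ShannonEntropy(char *S,int Length)
{
  static var Hist[256];
  memset(Hist,0,256*sizeof(var));
  var Step = 1./Length;
  int i;
  // relative frequency of every char value
  for(i=0; i&lt;Length; i++) 
    Hist[S[i] &amp; 255] += Step; // mask keeps the index in 0..255 even if char is signed
  var H = 0;
  // sum up -P*log2(P) over all chars that occur in the string
  for(i=0;i&lt;256;i++) {
    if(Hist[i] &gt; 0.)
      H -= Hist[i]*log2(Hist[i]);
  }
  return H;
}</pre>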
<p>A char has 8 bits, so 2<sup>8</sup> = 256 different chars can appear in a string. The frequency of each char is counted and stored in the <strong>Hist</strong> array. So this array contains the <em><strong>P(s<sub>i</sub>)</strong> </em>of the above entropy formula. They are multiplied by their binary logarithm and summed up; the negated sum is <em><strong>H(S)</strong></em>, the Shannon Entropy.</p>
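<p>For readers without Zorro, the same calculation can be sketched in Python. This is a rough port for experimenting, not the indicator itself; it accepts any sequence of hashable symbols, not only chars.</p>

```python
from collections import Counter
from math import log2

def shannon_entropy(symbols):
    """Shannon entropy in bits of a sequence of hashable symbols."""
    n = len(symbols)
    if n == 0:
        return 0.0
    # P * log2(1/P) is the same as -P * log2(P), summed over all symbols.
    return sum((c / n) * log2(n / c) for c in Counter(symbols).values())

print(shannon_entropy(bytes(range(8))))  # 8 equally frequent symbols
print(shannon_entropy(b"aaaaaaaa"))      # one symbol only
```

<p>Eight equally frequent symbols yield the maximum of 3 bits, while a string of identical symbols has zero entropy.</p>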
<p>In the above code, a char is a pattern of the signal. So we need to convert our price curve into char patterns. This is done by a second <strong>ShannonEntropy</strong> function that calls the first one:</p>
<pre class="prettyprint">var ShannonEntropy(var *Data,int Length,int PatternSize)
{
  static char S[1024]; // hack!
  int i,j;
  int Size = min(Length-PatternSize-1,1024);
  for(i=0; i&lt;Size; i++) {
    int C = 0; // bit pattern starting at price i
    for(j=0; j&lt;PatternSize; j++) {
      if(Data[i+j] &gt; Data[i+j+1])
        C += 1&lt;&lt;j; // set bit j when the price is higher than the previous one
    }
    S[i] = C;
  }
  return ShannonEntropy(S,Size);
}</pre>
<p><strong>PatternSize</strong> determines the partitioning of the price curve. A pattern is defined by a number of price changes. Each price is either higher than the previous price, or it is not; this is binary information and constitutes one bit of the pattern. A pattern can consist of up to 8 bits, equivalent to 256 combinations of price changes. The patterns are stored in a char string. Their entropy is then determined by calling the first <strong>ShannonEntropy</strong> function with that string (both functions have the same name, but the compiler can distinguish them by their different parameters). Patterns are generated from each price and the subsequent <strong>PatternSize</strong> prices; then the procedure is repeated with the next price. So the patterns overlap.</p>
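<p>The pattern encoding can be tried out in Python as well. The sketch follows the bit logic of the second function, with two deliberate differences that are my own simplifications: it has no 1024 limit, and it uses all patterns that fit into the data. The function names are hypothetical.</p>

```python
from collections import Counter
from math import log2

def encode_patterns(data, pattern_size):
    """Turn a price list into overlapping bit patterns.

    Bit j of pattern i is set when data[i+j] > data[i+j+1],
    the same comparison as in the lite-C function.
    """
    codes = []
    for i in range(len(data) - pattern_size):
        code = 0
        for j in range(pattern_size):
            if data[i + j] > data[i + j + 1]:
                code += 1 << j
        codes.append(code)
    return codes

def pattern_entropy(data, pattern_size):
    """Shannon entropy in bits of the pattern distribution."""
    codes = encode_patterns(data, pattern_size)
    n = len(codes)
    return sum(c / n * log2(n / c) for c in Counter(codes).values())

zigzag = [1, 2, 1, 2, 1, 2]
print(encode_patterns(zigzag, 2))
print(pattern_entropy(zigzag, 2))
```

<p>A perfectly regular zigzag produces only two alternating patterns and thus 1 bit of entropy, far below the 2 bits that four equally frequent 2-bit patterns would give.</p>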
<h3>An unexpected result</h3>
<p>Now we only need to produce a histogram of the Shannon Entropy, similar to the win rate in our first script:</p>
<pre class="prettyprint">function run()
{
  BarPeriod = 1;
  LookBack = 1440*300;
  StartWeek = 10000;
 
  int Duration = 1, i = 0;
  while(Duration &lt;= 1440)
  { 
    TimeFrame = frameSync(Duration);
    var *Prices = series(price(),300);

    if(!is(LOOKBACK) &amp;&amp; 0 == (Bar%101)) {
      var H = ShannonEntropy(Prices,300,3);
      plotBar("Randomness",i++,Duration,H,AVG+BARS,BLUE);	
    }
    if(Duration &lt; 10) Duration += 1;
    else if(Duration &lt; 60) Duration += 5;
    else if(Duration &lt; 240) Duration += 30;
    else if(Duration &lt; 720) Duration += 120;
    else Duration += 720;
  }
}</pre>
<p>The entropy is calculated for all time frames at every 101st bar, determined with the modulo function. (Why 101? In such cases I&#8217;m using odd numbers for preventing synchronization effects). I cannot use here the hack with skipping the next 100 bars as in the previous script, as skipping bars would prevent proper shifting of the price series. That&#8217;s why this script must really grind through every minute of 3 years, and needs several minutes to complete.</p>
<p>Two code lines should be explained because they are critical for measuring the entropy of daily candles using less-than-a-day bar periods:</p>
<p><strong>StartWeek = 10000;</strong></p>
<p>This starts the week at Monday midnight (<strong>1</strong> = Monday, <strong>00 00</strong> = midnight) instead of Sunday 11 pm. This line was missing at first and I wondered why the entropy of daily candles was higher than I expected. Reason:  The single Sunday hour at 11 pm counted as a full day and noticeably increased the randomness of daily candles.</p>
<p><strong>TimeFrame = frameSync(Duration);</strong></p>
<p>This synchronizes the time frame to full hours or days. If this is missing, the Shannon Entropy of daily candles is again too high since the candles are not in sync with a day anymore. A day often has fewer than 1440 one-minute bars due to weekends and irregularities in the historical data.</p>
<p>The Shannon Entropy is calculated with a pattern size of 3 price changes, resulting in 8 different patterns. 3 bits is the maximum entropy for 8 patterns. As price changes are not completely random, I expected an entropy value slightly smaller than 3, steadily increasing as time frames decrease. However, I got this interesting histogram (EUR/USD, 2013-2015, FXCM price data):</p>
<figure id="attachment_569" aria-describedby="caption-attachment-569" style="width: 637px" class="wp-caption alignnone"><a href="http://www.financial-hacker.com/wp-content/uploads/2015/10/scalp2_32.png"><img loading="lazy" decoding="async" class="wp-image-569 size-full" src="http://www.financial-hacker.com/wp-content/uploads/2015/10/scalp2_32.png" alt="" width="637" height="513" srcset="https://financial-hacker.com/wp-content/uploads/2015/10/scalp2_32.png 637w, https://financial-hacker.com/wp-content/uploads/2015/10/scalp2_32-300x242.png 300w" sizes="(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 984px) 61vw, (max-width: 1362px) 45vw, 600px" /></a><figcaption id="caption-attachment-569" class="wp-caption-text">Entropy vs. time frame (minutes)</figcaption></figure>
<p>The entropy is almost, but not quite, 3 bits. This confirms that price patterns are not absolutely random. We can see that the 1440 minutes time frame has the lowest Shannon Entropy at about 2.9 bits. This was expected, as the daily cycle has a strong effect on the price curve, and daily candles are thus more regular than candles of other time frames. For this reason price action or price pattern algorithms often use daily candles. The entropy increases with decreasing time frames, but only down to time frames of about ten minutes. Even lower time frames are actually less random!</p>
<p>This is an unexpected result. The lower the time frame, the fewer price quotes it contains, so the impact of chance should in fact be higher. But the opposite is the case. I could produce similar results with other patterns of 4 and 5 bits, and also with other assets. For making sure I continued the experiment with a different, tick-based price history and even shorter time frames of 2, 5, 10, 15, 30, 45, and 60 seconds (Zorro&#8217;s &#8220;useless&#8221; micro time frames now came in handy, after all):</p>
<figure id="attachment_601" aria-describedby="caption-attachment-601" style="width: 205px" class="wp-caption alignnone"><a href="http://www.financial-hacker.com/wp-content/uploads/2015/10/scalp2_41.png"><img loading="lazy" decoding="async" class="wp-image-601 size-full" src="http://www.financial-hacker.com/wp-content/uploads/2015/10/scalp2_41.png" alt="" width="205" height="513" srcset="https://financial-hacker.com/wp-content/uploads/2015/10/scalp2_41.png 205w, https://financial-hacker.com/wp-content/uploads/2015/10/scalp2_41-120x300.png 120w" sizes="(max-width: 205px) 85vw, 205px" /></a><figcaption id="caption-attachment-601" class="wp-caption-text">Entropy vs. time frame (seconds)</figcaption></figure>
<p>The x axis is now in second units instead of minutes. We see that price randomness continues to drop with the time frame.</p>
<p>There are several possible explanations. Price granularity is higher at low time frames due to the smaller number of ticks. High-volume trades are often split into many small parts (&#8216;<strong>iceberg trades</strong>&#8217;) and may cause a sequence of similar price quotes in short intervals. All this reduces the price entropy of short time frames. But it does not necessarily increase trade opportunities: A series of identical quotes has zero entropy and is 100% predictable, but cannot be traded. Of course, iceberg trades are still an interesting inefficiency that could theoretically be exploited &#8211; if it weren&#8217;t for the high trading costs. So that&#8217;s something to look further into only when you have direct market access and no broker fees.</p>
<p>I have again uploaded the scripts to the 2015 scripts collection. You&#8217;ll need Zorro 1.36 or above for reproducing the results. Zorro S and tick based data are needed for the second time frames.</p>
<h3>Conclusions</h3>
<ul style="list-style-type: square;">
<li>Scalping is not completely nuts. Very low time frames expose some regularity.</li>
<li>Whatever the reason, this regularity cannot be exploited by retail traders due to the high costs of short term trades.</li>
<li>On time frames above 60 minutes prices become less random and more regular. This speaks for long time frames in algo trading.</li>
<li>The most regular price patterns appear with 1-day bars. They also cause the least trading costs.</li>
</ul>
<h3>Papers</h3>
<p>Shannon Entropy: <a href="http://www.ueltschi.org/teaching/chapShannon.pdf" target="_blank" rel="noopener noreferrer">Lecture</a></p>
]]></content:encoded>
					
					<wfw:commentRss>https://financial-hacker.com/is-scalping-irrational/feed/</wfw:commentRss>
			<slash:comments>24</slash:comments>
		
		
			</item>
	</channel>
</rss>
