<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Deepnet &#8211; The Financial Hacker</title>
	<atom:link href="https://financial-hacker.com/tag/deepnet/feed/" rel="self" type="application/rss+xml" />
	<link>https://financial-hacker.com</link>
	<description>A new view on algorithmic trading</description>
	<lastBuildDate>Sat, 01 Jun 2019 15:31:13 +0000</lastBuildDate>
	<language>en-US</language>
	<sy:updatePeriod>
	hourly	</sy:updatePeriod>
	<sy:updateFrequency>
	1	</sy:updateFrequency>
	

<image>
	<url>https://financial-hacker.com/wp-content/uploads/2017/07/cropped-mask-32x32.jpg</url>
	<title>Deepnet &#8211; The Financial Hacker</title>
	<link>https://financial-hacker.com</link>
	<width>32</width>
	<height>32</height>
</image> 
	<item>
		<title>Deep Learning Systems for Bitcoin 1</title>
		<link>https://financial-hacker.com/deep-learning-systems-for-bitcoins-part-1/</link>
					<comments>https://financial-hacker.com/deep-learning-systems-for-bitcoins-part-1/#comments</comments>
		
		<dc:creator><![CDATA[jcl]]></dc:creator>
		<pubDate>Wed, 27 Dec 2017 10:51:27 +0000</pubDate>
				<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[No Math]]></category>
		<category><![CDATA[System Development]]></category>
		<category><![CDATA[Autoencoder]]></category>
		<category><![CDATA[Bitcoin]]></category>
		<category><![CDATA[Blockchain]]></category>
		<category><![CDATA[Boltzmann machine]]></category>
		<category><![CDATA[Cryptocurrency]]></category>
		<category><![CDATA[Deepnet]]></category>
		<category><![CDATA[Economy]]></category>
		<category><![CDATA[H2O]]></category>
		<category><![CDATA[Keras]]></category>
		<category><![CDATA[Money]]></category>
		<category><![CDATA[MxNet]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Tensorflow]]></category>
		<guid isPermaLink="false">http://www.financial-hacker.com/?p=2899</guid>

					<description><![CDATA[Since December 2017, bitcoins can not only be traded at more or less dubious exchanges, but also as futures at the CME and CBOE. And already several trading systems popped up for bitcoin and other cryptocurrencies. None of them can claim big success, with one exception. There is a very simple strategy that easily surpasses &#8230; <a href="https://financial-hacker.com/deep-learning-systems-for-bitcoins-part-1/" class="more-link">Continue reading<span class="screen-reader-text"> "Deep Learning Systems for Bitcoin 1"</span></a>]]></description>
										<content:encoded><![CDATA[<p>Since December 2017, bitcoins can not only be traded at more or less dubious exchanges, but also as futures at the CME and CBOE. And already several trading systems popped up for bitcoin and other cryptocurrencies. None of them can claim big success, with one exception. There is a very simple strategy that easily surpasses all other bitcoin systems and probably also all known historical trading systems. Its name: <strong>Buy and Hold</strong>. In the light of the extreme success of that particular bitcoin strategy, do we really need any other trading system for cryptos?<span id="more-2899"></span></p>
<h3>Bitcoin &#8211; hodl??</h3>
<p>A buy and hold strategy works extremely well when a price bubble grows, and extremely bad when it bursts. And indeed, apparently all finance and economy gurus (well, all but <a href="https://www.rt.com/news/411379-john-mcafee-bitcoin-prediction/" target="_blank" rel="noopener noreferrer">John McAfee</a>) tell you that the cryptocurrency market, and especially bitcoin, is a bubble, even a &#8220;scam with no substantial worth&#8221;, and will soon experience a crash &#8220;worse than the 17th century tulip mania&#8221; or the &#8220;18th century South Sea Company fraud&#8221;.</p>
<figure id="attachment_2917" aria-describedby="caption-attachment-2917" style="width: 681px" class="wp-caption alignnone"><a href="http://www.financial-hacker.com/wp-content/uploads/2017/12/bitcoin.png"><img fetchpriority="high" decoding="async" class="wp-image-2917 size-full" src="http://www.financial-hacker.com/wp-content/uploads/2017/12/bitcoin.png" alt="" width="681" height="321" srcset="https://financial-hacker.com/wp-content/uploads/2017/12/bitcoin.png 681w, https://financial-hacker.com/wp-content/uploads/2017/12/bitcoin-300x141.png 300w" sizes="(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 984px) 61vw, (max-width: 1362px) 45vw, 600px" /></a><figcaption id="caption-attachment-2917" class="wp-caption-text">Bubble or not?</figcaption></figure>
<p>By definition, a bubble is a price largely above the &#8216;real value&#8217; or &#8216;fair value&#8217; of an asset, and it bursts when people realize that. So what is the fair value of a bitcoin? Obviously not zero, since blockchain based currencies have (aside from their disadvantages) several advantages over traditional currencies, on the economy level as well as on the private level. Such as:</p>
<ul style="list-style-type: square;">
<li>They break the <a href="http://www.financial-hacker.com/money-and-how-to-get-it/" target="_blank" rel="noopener noreferrer">link of money and debt</a>. Cryptocurrencies don&#8217;t require the bank credit mechanism for money creation.</li>
<li>They can be used where normal money would be impractical, such as fee transfers between machines or trading in multiplayer games.</li>
<li>They allow low-cost and anonymous money transactions. At least in theory.</li>
<li>They replace banks for storing and mattresses for stashing money.</li>
</ul>
<p>I&#8217;m ready to believe that blockchain is the future of money transfer and storage. But that does not mean an ever-rising bitcoin price. Hundreds of cryptocurrencies came out in the last two years, any single of them with a better blockchain technology than bitcoin, and any good programmer can add a new coin anytime. Few will survive. Countries or big companies might sooner or later issue their own crypto tokens, as Venezuela already is attempting. The release of an official blockchain Dollar, Yuan, or Euro would leave the old bitcoin with its energy hungry transaction algorithm in thin air. Thus, when investing in bitcoin, we should not hope for a rosy future, but look for its present &#8216;real value&#8217;.</p>
<p>Due to its extreme volatility, bitcoin can not replace bank tresors. But it is already used in some situations for reducing money transfer costs, since the miners get any transaction rewarded in bitcoin. And above all, anonymity can be a substantial motive to own it. When you need a hacker to delete your drunk driving record, pay her in bitcoin. But how big is the online market for illegal hacker jobs, kill contracts, money laundering, drugs, weapons, or pro-Trump facebook advertisements? No one knows, but when we compare it with cash, another form of anonymous payment, we get interesting results.</p>
<p>The current cash in circulation in the US is approximately $1.5 trillion dollars. And the current bitcoin supply, about 17 million bitcoins, represents a total value of about $250 billion. Which means that you can already replace 15% of all US cash with bitcoin! Not to mention all the other cryptos. I fear that this supply already exceeds the demand of anonymous online payment for today and also the next future.</p>
<p>For those reasons, a bitcoin &#8220;hodl&#8221; system, despite its extreme historical performance, is high risk. We don&#8217;t know when and how the bubble will burst &#8211; maybe bitcoin will go up to $100,000 before &#8211; but we have some reason to suspect that at some point sooner or later the bitcoin price might drop like a stone down to its &#8216;real value&#8217;. Which is unknown, but for practical purposes is probably not in the $15,000 area, but more like $15.</p>
<p>So we need some other method to tackle the cryptocurrency trading problem. The first question: Has the crypto market already developed price curve inefficiencies that can be exploited in a trading system? In <strong>(1)</strong> we see some tests with basic bitcoin strategies. Our own tests came to the same results. Momentum based strategies can work, and <a href="http://www.financial-hacker.com/get-rich-slowly/" target="_blank" rel="noopener noreferrer">mean-variance optimizing</a>&nbsp;portfolio systems can achieve even extreme returns with crytrocurrencies &#8211; up to 10 times higher than &#8220;hodl&#8221;. But that&#8217;s not really surprising due to the high momentums and volatilities of crypto coins. The problem is that all crypto portfolios are exposed to high risk. Other conventional model-based strategies don&#8217;t work well anyway with cryptos.</p>
<p>When we concentrate on bitcoin, our proposed system must be a fast trading, trend-agnostic strategy. That means it holds positions only a few minutes, and is not exposed to the bubble risk. I can already tell that short-term mean reversion &#8211; even with a more sophisticated system as in <strong>(1)</strong> &#8211; produces no good result with cryptos. So only a few possibilities remain. One of them is exploiting short-term price patterns. This is the strategy that we will develop. And I can already tell that it works. But for this we&#8217;ll need a deep machine learning system for detecting the patterns and determining their rules.</p>
<h3>Selecting a machine learning library</h3>
<p>The basic structure of such a machine learning system is described <a href="http://www.financial-hacker.com/build-better-strategies-part-5-developing-a-machine-learning-system/" target="_blank" rel="noopener noreferrer">here</a>. Due to the low signal-to-noise ratio and to ever-changing market conditions, analyzing price series is one of the most ambitious tasks for machine learning. Compared with other AI algorithms, deep learning systems have the highest success rate. Since we can connect any <a href="http://www.financial-hacker.com/hackers-tools-zorro-and-r/" target="_blank" rel="noopener noreferrer">Zorro</a> based trading script to the data analysis software R, we&#8217;ll use a R based deep learning package. There are meanwhile many available. Here&#8217;s the choice:</p>
<ul style="list-style-type: square;">
<li><strong>Deepnet</strong>, a lightweight and straightforward neural net library with a stacked autoencoder and a Boltzmann machine. Produces good results when the feature set is not too complex. The basic train and predict functions for using a deepnet autoencoder in a Zorro strategy:<!--?prettify linenums=true?-->
<pre class="prettyprint">library('deepnet') 

neural.train = function(model,XY) 
{
  XY &lt;- as.matrix(XY)
  X &lt;- XY[,-ncol(XY)]
  Y &lt;- XY[,ncol(XY)]
  Y &lt;- ifelse(Y &gt; 0,1,0)
  Models[[model]] &lt;&lt;- sae.dnn.train(X,Y,
      hidden = c(30), 
      learningrate = 0.5, 
      momentum = 0.5, 
      learningrate_scale = 1.0, 
      output = "sigm", 
      sae_output = "linear", 
      numepochs = 100, 
      batchsize = 100)
}

neural.predict = function(model,X) 
{
  if(is.vector(X)) X &lt;- t(X)
  return(nn.predict(Models[[model]],X))
}
</pre>
</li>
<li><strong>H2O</strong>, an open-source software package with the ability to run on distributed computer systems. Coded in Java, so the latest version of the JDK is required. Aside from deep autoencoders, many other machine learning algorithms are supported, such as random forests. Features can be preselected, and ensembles can be created. Disadvantage: While batch training is fast, predicting a single sample, as usually needed in a trading strategy, is relatively slow due to the server/client concept. The basic <strong>H2O</strong> train and predict functions for Zorro:<!--?prettify linenums=true?-->
<pre class="prettyprint">library('h2o') 
# also install the Java JDK

neural.train = function(model,XY) 
{
  XY &lt;- as.h2o(XY)
  Models[[model]] &lt;&lt;- h2o.deeplearning(
    -ncol(XY),ncol(XY),XY,
    hidden = c(30),  seed = 365)
}

neural.predict = function(model,X) 
{
  if(is.vector(X)) X &lt;- as.h2o(as.data.frame(t(X)))
  else X &lt;- as.h2o(X)
  Y &lt;- h2o.predict(Models[[model]],X)
  return(as.vector(Y))
}</pre>
</li>
<li><strong>Tensorflow</strong> in its <strong>Keras</strong> incarnation, a neural network kit by Google. Supports CPU and GPU and comes with all needed modules for tensor arithmetics, activation and loss functions, covolution kernels, and backpropagation algorithms. So you can build your own neural net structure. <strong>Keras</strong> offers a simple interface for that.
<p>Keras is available as a R library, but installing it requires also a Python environment. First install Anaconda from <a href="https://www.anaconda.com">www.anaconda.com</a>. Open the Anaconda Navigator and install the RStudio application (installing Keras outside an Anaconda environment fails on some PCs with an error message). Then open Rstudio inside the Navigator, install the Keras package, then finally execute library(&#8216;keras&#8217;) and install_keras(). These steps usually succeed.</p>
<p>The <strong>Keras</strong> train and predict functions for Zorro:<!--?prettify linenums=true?--></p>
<pre class="prettyprint">library('keras')
#needs Python 3.6 and Anaconda
#call install_keras() after installing the package

neural.train = function(model,XY) 
{
  X &lt;- data.matrix(XY[,-ncol(XY)])
  Y &lt;- XY[,ncol(XY)]
  Y &lt;- ifelse(Y &gt; 0,1,0)
  Model &lt;- keras_model_sequential() 
  Model %&gt;% 
    layer_dense(units=30,activation='relu',input_shape = c(ncol(X))) %&gt;% 
    layer_dropout(rate = 0.2) %&gt;% 
    layer_dense(units = 1, activation = 'sigmoid')
  
  Model %&gt;% compile(
    loss = 'binary_crossentropy',
    optimizer = optimizer_rmsprop(),
    metrics = c('accuracy'))
  
  Model %&gt;% fit(X, Y, 
    epochs = 20, batch_size = 20, 
    validation_split = 0, shuffle = FALSE)
  
  Models[[model]] &lt;&lt;- Model
}

neural.predict = function(model,X) 
{
  if(is.vector(X)) X &lt;- t(X)
  X &lt;- as.matrix(X)
  Y &lt;- Models[[model]] %&gt;% predict_proba(X)
  return(ifelse(Y &gt; 0.5,1,0))
}
</pre>
</li>
<li><strong>MxNet</strong>, Amazon&#8217;s answer on Google&#8217;s Tensorflow. Offers also tensor arithmetics and neural net building blocks on CPU and GPU, as well as high level network functions similar to Keras (the next Keras version will also support MxNet). Just as with Tensorflow, CUDA is supported, but not (yet) OpenCL, so you&#8217;ll need a Nvidia graphics card to enjoy GPU support. In direct comparison <strong>(2)</strong>, MxNet was reported to be less resource hungry and a bit faster than Tensorflow, but so far I could not confirm this. The standard train and predict functions:<!--?prettify linenums=true?-->
<pre class="prettyprint"># how to install the CPU version:
#cran &lt;- getOption("repos")
#cran["dmlc"] &lt;- "https://s3-us-west-2.amazonaws.com/apache-mxnet/R/CRAN/"
#options(repos = cran)
#install.packages('mxnet')
library('mxnet')

neural.train = function(model,XY) 
{
  X &lt;- data.matrix(XY[,-ncol(XY)])
  Y &lt;- XY[,ncol(XY)]
  Y &lt;- ifelse(Y &gt; 0,1,0)
  Models[[model]] &lt;&lt;- mx.mlp(X,Y,
       hidden_node = c(30), 
       out_node = 2, 
       activation = "sigmoid",
       out_activation = "softmax",
       num.round = 20,
       array.batch.size = 20,
       learning.rate = 0.05,
       momentum = 0.9,
       eval.metric = mx.metric.accuracy)
}

neural.predict = function(model,X) 
{
  if(is.vector(X)) X &lt;- t(X)
  X &lt;- data.matrix(X)
  Y &lt;- predict(Models[[model]],X)
  return(ifelse(Y[1,] &gt; Y[2,],0,1))
}
</pre>
</li>
</ul>
<p>By replacing the <strong>neural.train</strong> and <strong>neural.predict</strong> functions, and other functions for saving and loading models that are not listed here, you can run the same strategy with different deep learning packages and compare. We&#8217;re currently using Keras for most machine learning strategies, and I&#8217;ll also use it for the short-term bitcoin trading system presented in the upcoming 2nd part of this article. There is no bitcoin futures data available yet, so tick based price data from several bitcoin exchanges will have to do for the backtest.</p>
<p>I&#8217;ve uploaded the interface scripts for Deepnet, H2O, Tensorflow/Keras, and MxNet to the 2018 script repository, so you can run your own deep learning experiments and compare the packages. Here&#8217;s a Zorro script for downloading bitcoin prices from Quandl &#8211; EOD only, though, since the exchanges demand dear payment for their tick data.</p>
<pre class="prettyprint">void main()
{
  assetHistory("BITFINEX/BTCUSD",FROM_QUANDL);
}</pre>
<p>You can also get Bitcoin M1 data from Kaggle in CSV format. Here&#8217;s a Zorro script for converting it to a Zorro T6 dataset:</p>
<pre class="prettyprint">void main()
{
	string InName = "History\\bitstampUSD_1-min_data_2012-01-01_to_2019-03-13.csv";
	string Format = "+%t,f3,f1,f2,f4,f6";
	dataParse(1,Format,InName); 
	dataSave(1,"History\\BTCUSD.t6");
}</pre>
<h3>Further reading</h3>
<p>(1) Nicolas Rabener, <a href="https://www.factorresearch.com/research-quant-strategies-in-the-cryptocurrency-space" target="_blank" rel="noopener noreferrer">Quant Strategies in the Cryptocurrency Space</a></p>
<p>(2) Julien Simon, <a href="https://medium.com/@julsimon/keras-shoot-out-tensorflow-vs-mxnet-51ae2b30a9c0" target="_blank" rel="noopener noreferrer">Tensorflow vs MxNet</a></p>
<p>(3) Zachary Lipton et al, <a href="https://github.com/zackchase/mxnet-the-straight-dope" target="_blank" rel="noopener noreferrer">MxNet &#8211; The Straight Dope</a><br />
(Good introduction in deep learning with MxNet / Gluon examples)</p>
<p>(4) F.Chollet/J.J.Allaire, Deep Learning with R<br />
(Excellent introduction in Keras)</p>
]]></content:encoded>
					
					<wfw:commentRss>https://financial-hacker.com/deep-learning-systems-for-bitcoins-part-1/feed/</wfw:commentRss>
			<slash:comments>42</slash:comments>
		
		
			</item>
		<item>
		<title>Better Strategies 5: A Short-Term Machine Learning System</title>
		<link>https://financial-hacker.com/build-better-strategies-part-5-developing-a-machine-learning-system/</link>
					<comments>https://financial-hacker.com/build-better-strategies-part-5-developing-a-machine-learning-system/#comments</comments>
		
		<dc:creator><![CDATA[jcl]]></dc:creator>
		<pubDate>Fri, 12 Aug 2016 09:42:38 +0000</pubDate>
				<category><![CDATA[3 Most Clicked]]></category>
		<category><![CDATA[Machine Learning]]></category>
		<category><![CDATA[Research]]></category>
		<category><![CDATA[System Development]]></category>
		<category><![CDATA[Autoencoder]]></category>
		<category><![CDATA[Boltzmann machine]]></category>
		<category><![CDATA[Classification]]></category>
		<category><![CDATA[Confusion matrix]]></category>
		<category><![CDATA[Data mining bias]]></category>
		<category><![CDATA[Deepnet]]></category>
		<category><![CDATA[Experiment]]></category>
		<category><![CDATA[Price action]]></category>
		<category><![CDATA[R]]></category>
		<category><![CDATA[Sharpe ratio]]></category>
		<category><![CDATA[Walk forward analysis]]></category>
		<guid isPermaLink="false">http://www.financial-hacker.com/build-better-strategies-part-3-the-development-process-copy/</guid>

					<description><![CDATA[It&#8217;s time for the 5th and final part of the Build Better Strategies series. In part 3 we&#8217;ve discussed the development process of a model-based system, and consequently we&#8217;ll conclude the series with developing a data-mining system. The principles of data mining and machine learning have been the topic of part 4. For our short-term &#8230; <a href="https://financial-hacker.com/build-better-strategies-part-5-developing-a-machine-learning-system/" class="more-link">Continue reading<span class="screen-reader-text"> "Better Strategies 5: A Short-Term Machine Learning System"</span></a>]]></description>
										<content:encoded><![CDATA[<p>It&#8217;s time for the 5th and final part of the <a href="http://www.financial-hacker.com/build-better-strategies/">Build Better Strategies</a> series. In <a href="http://www.financial-hacker.com/build-better-strategies-part-3-the-development-process/" target="_blank" rel="noopener">part 3</a> we&#8217;ve discussed the development process of a model-based system, and consequently we&#8217;ll conclude the series with developing a data-mining system. The principles of data mining and machine learning have been the topic of <a href="http://www.financial-hacker.com/build-better-strategies-part-4-machine-learning/">part 4</a>. For our short-term trading example we&#8217;ll use a <strong>deep learning algorithm</strong>, a stacked autoencoder, but it will work in the same way with many other machine learning algorithms. With today&#8217;s software tools, only about <strong>20 lines of code</strong> are needed for a machine learning strategy. I&#8217;ll try to explain all steps in detail. <span id="more-1872"></span></p>
<p>Our example will be a <strong>research project</strong> &#8211; a machine learning experiment for answering two questions. Does a more complex algorithm &#8211; such as, more neurons and deeper learning &#8211; produce a better prediction? And are short-term price moves predictable by short-term price history? The last question came up due to my scepticism about <strong>price action trading</strong> in the <a href="http://www.financial-hacker.com/build-better-strategies-part-4-machine-learning/" target="_blank" rel="noopener">previous part</a> of this series. I got several emails asking about the &#8220;trading system generators&#8221; or similar price action tools that are praised on some websites. There is no hard evidence that such tools ever produced any profit (except for their vendors) &#8211; but does this mean that they all are garbage? We&#8217;ll see.</p>
<p>Our experiment is simple: We collect information from the last candles of a price curve, feed it in a deep learning neural net, and use it to predict the next candles. My hypothesis is that a few candles don&#8217;t contain any useful predictive information. Of course, a nonpredictive outcome of the experiment won&#8217;t mean that I&#8217;m right, since I could have used wrong parameters or prepared the data badly. But a predictive outcome would be a hint that I&#8217;m wrong and price action trading can indeed be profitable.</p>
<h3>Machine learning strategy development<br />
Step 1: The target variable</h3>
<p>To recap the <a href="http://www.financial-hacker.com/build-better-strategies-part-4-machine-learning/">previous part</a>: a supervised learning algorithm is trained with a set of <strong>features</strong> in order to predict a <strong>target variable</strong>. So the first thing to determine is what this target variable shall be. A popular target, used in most papers, is the sign of the price return at the next bar. Better suited for prediction, since less susceptible to randomness, is the price difference to a more distant <strong>prediction horizon</strong>, like 3 bars from now, or same day next week. Like almost anything in trading systems, the prediction horizon is a compromise between the effects of randomness (less bars are worse) and predictability (less bars are better).</p>
<p>Sometimes you&#8217;re not interested in directly predicting price, but in predicting some other parameter &#8211; such as the current leg of a Zigzag indicator &#8211; that could otherwise only be determined in hindsight. Or you want to know if a certain <strong>market inefficiency</strong> will be present in the next time, especially when you&#8217;re using machine learning not directly for trading, but for filtering trades in a <a href="http://www.financial-hacker.com/build-better-strategies-part-3-the-development-process/" target="_blank" rel="noopener">model-based system</a>. Or you want to predict something entirely different, for instance the probability of a market crash tomorrow. All this is often easier to predict than the popular tomorrow&#8217;s return.</p>
<p>In our price action experiment we&#8217;ll use the return of a short-term price action trade as target variable. Once the target is determined, next step is selecting the features.</p>
<h3>Step 2: The features</h3>
<p>A price curve is the worst case for any machine learning algorithm. Not only does it carry <strong>little signal and mostly noise</strong>, it is also nonstationary and the signal/noise ratio changes all the time. The exact ratio of signal and noise depends on what is meant with &#8220;signal&#8221;, but it is normally too low for any known machine learning algorithm to produce anything useful. So we must derive features from the price curve that contain more signal and less noise. Signal, in that context, is any information that can be used to predict the target, whatever it is. All the rest is noise.</p>
<p>Thus, <strong>selecting the features is critical for success</strong> &#8211; even more critical than deciding which machine learning algorithm you&#8217;re going to use. There are two approaches for selecting features. The first and most common is extracting as much information from the price curve as possible. Since you do not know where the information is hidden, you just generate a wild collection of indicators with a wide range of parameters, and hope that at least a few of them will contain the information that the algorithm needs. This is the approach that you normally find in the literature. The problem of this method: Any machine learning algorithm is easily confused by nonpredictive predictors. So it won&#8217;t do to just throw 150 indicators at it. You need some <strong>preselection algorithm </strong>that determines which of them carry useful information and which can be omitted. Without reducing the features this way to maybe eight or ten, even the deepest learning algorithm won&#8217;t produce anything useful.</p>
<p>The other approach, normally for experiments and research, is using only limited information from the price curve. This is the case here: Since we want to examine price action trading, we only use the last few prices as inputs, and must discard all the rest of the curve. This has the advantage that we don&#8217;t need any preselection algorithm since the number of features is limited anyway. Here are the two simple predictor functions that we use in our experiment (in C):</p>
<pre class="prettyprint">var change(int n)
{
	return scale((priceClose(0) - priceClose(n))/priceClose(0),100)/100;
}

var range(int n)
{
	return scale((HH(n) - LL(n))/priceClose(0),100)/100;
}</pre>
<p>The two functions are supposed to carry the necessary information for price action: per-bar movement and volatility. The <strong>change </strong>function is the difference of the current price to the price of <strong>n</strong> bars before, divided by the current price. The <strong>range</strong> function is the total high-low distance of the last <strong>n</strong> candles, also in divided by the current price. And the <strong>scale</strong> function centers and compresses the values to the <strong>+/-100</strong> range, so we divide them by 100 for getting them normalized to <strong>+/-1</strong>. We remember that normalizing is needed for machine learning algorithms.</p>
<h3>Step 3: Preselecting predictors</h3>
<p>When you have selected a large number of indicators or other signals as features for your algorithm, you must determine which of them is useful and which not. There are many methods for reducing the number of features, for instance:</p>
<ul style="list-style-type: square;">
<li>Determine the correlations between the signals. Remove those with a strong correlation to other signals, since they do not contribute to the information.</li>
<li>Compare the information content of signals directly, with algorithms like information entropy or decision trees.</li>
<li>Determine the information content indirectly by comparing the signals with randomized signals; there are some software libraries for this, such as the R Boruta package.</li>
<li>Use an algorithm like Principal Components Analysis (PCA) for generating a new signal set with reduced dimensionality.</li>
<li>Use genetic optimization for determining the most important signals just by the most profitable results from the prediction process. Great for curve fitting if you want to publish impressive results in a research paper.</li>
</ul>
<p>Reducing the number of features is important for most machine learning algorithms, including shallow neural nets. For deep learning it&#8217;s less important, since deep nets  with many neurons are normally able to process huge feature sets and discard redundant features. For our experiment we do not preselect or preprocess the features, but you can find useful information about this in articles (1), (2), and (3) listed at the end of the page.</p>
<h3>Step 4: Select the machine learning algorithm</h3>
<p>R offers many different ML packages, and any of them offers many different algorithms with many different parameters. Even if you already decided about the method &#8211; here, deep learning &#8211; you have still the choice among different approaches and different R packages. Most are quite new, and you can find not many empirical information that helps your decision. You have to try them all and gain experience with different methods. For our experiment we&#8217;ve choosen the <strong>Deepnet</strong> package, which is probably the simplest and easiest to use deep learning library. This keeps our code short. We&#8217;re using its <strong>Stacked Autoencoder</strong> (<strong>SAE</strong>) algorithm for pre-training the network. Deepnet also offers a <strong>Restricted Boltzmann Machine</strong> (<strong>RBM</strong>) for pre-training, but I could not get good results from it. There are other and more complex deep learning packages for R, so you can spend a lot of time checking out all of them.</p>
<p><em>How</em> pre-training works is easily explained, but <em>why</em> it works is a different matter. As to my knowledge, no one has yet come up with a solid mathematical proof that it works at all. Anyway, imagine a large neural net with many hidden layers:</p>
<p><a href="http://www.financial-hacker.com/wp-content/uploads/2016/10/deepnet.png"><img decoding="async" class="alignnone wp-image-2026 size-full" src="http://www.financial-hacker.com/wp-content/uploads/2016/10/deepnet.png" width="560" height="279" srcset="https://financial-hacker.com/wp-content/uploads/2016/10/deepnet.png 560w, https://financial-hacker.com/wp-content/uploads/2016/10/deepnet-300x149.png 300w" sizes="(max-width: 560px) 85vw, 560px" /></a></p>
<p>Training the net means setting up the connection weights between the neurons. The usual method is error backpropagation. But it turns out that the more hidden layers you have, the worse it works. The backpropagated error terms get smaller and smaller from layer to layer, causing the first layers of the net to learn almost nothing. Which means that the predicted result becomes more and more dependent of the random initial state of the weights. This severely limited the complexity of layer-based neural nets and therefore the tasks that they can solve. At least until 10 years ago.</p>
<p>In 2006 scientists in Toronto first published the idea to pre-train the weights with an unsupervised learning algorithm, a restricted Boltzmann machine. This turned out a revolutionary concept. It boosted the development of artificial intelligence and allowed all sorts of new applications from Go-playing machines to self-driving cars. Meanwhile, several new improvements and algorithms for deep learning have been found. A stacked autoencoder works this way:</p>
<ol>
<li>Select the hidden layer to train; begin with the first hidden layer. Connect its outputs to a temporary output layer that has the same structure as the network&#8217;s input layer.</li>
<li>Feed the network with the training samples, but without the targets. Train it so that the first hidden layer reproduces the input signal &#8211; the features &#8211; at its outputs as exactly as possible. The rest of the network is ignored. During training, apply a &#8216;weight penalty term&#8217; so that as few connection weights as possible are used for reproducing the signal.</li>
<li>Now feed the outputs of the trained hidden layer to the inputs of the next untrained hidden layer, and repeat the training process so that the input signal is now reproduced at the outputs of the next layer.</li>
<li>Repeat this process until all hidden layers are trained. We have now a &#8216;sparse network&#8217; with very few layer connections that can reproduce the input signals.</li>
<li>Now train the network with backpropagation for learning the target variable, using the pre-trained weights of the hidden layers as a starting point.</li>
</ol>
<p>The hope is that the unsupervised pre-training process produces an internal noise-reduced abstraction of the input signals that can then be used for easier learning the target. And this indeed appears to work. No one really knows why, but several theories &#8211; see paper (4) below &#8211; try to explain that phenomenon.</p>
<h3>Step 5: Generate a test data set</h3>
<p>We first need to produce a data set with features and targets so that we can test our prediction process and try out parameters. The features must be based on the same price data as in live trading, and for the target we must simulate a short-term trade. So it makes sense to generate the data not with R, but with our trading platform, which is anyway a lot faster. Here&#8217;s a small <a href="http://www.financial-hacker.com/hackers-tools-zorro-and-r/" target="_blank" rel="noopener">Zorro</a> script for this, <strong>DeepSignals.c</strong>:</p>
<pre class="prettyprint">function run()
{
	StartDate = 20140601; // start two years ago
	BarPeriod = 60; // use 1-hour bars
	LookBack = 100; // needed for scale()

	set(RULES);   // generate signals
	LifeTime = 3; // prediction horizon
	Spread = RollLong = RollShort = Commission = Slippage = 0;
	
	adviseLong(SIGNALS+BALANCED,0,
		change(1),change(2),change(3),change(4),
		range(1),range(2),range(3),range(4));
	enterLong(); 
}
</pre>
<p>We&#8217;re generating 2 years of data with features calculated by our above defined <strong>change </strong>and <strong>range</strong> functions. Our target is the result of a trade with 3 bars life time. Trading costs are set to zero, so in this case the result is equivalent to the sign of the price difference at 3 bars in the future. The <strong>adviseLong</strong> function is described in the <a href="http://manual.zorro-project.com/advisor.htm" target="_blank" rel="noopener">Zorro manual</a>; it is a mighty function that automatically handles training and predicting and allows to use any R-based machine learning algorithm just as if it were a simple indicator.</p>
<p>In our code, the function uses the next trade return as target, and the price changes and ranges of the last 4 bars as features. The <strong>SIGNALS</strong> flag tells it not to train the data, but to export it to a .csv file. The <strong>BALANCED</strong> flag makes sure that we get as many positive as negative returns; this is important for most machine learning algorithms. Run the script in [Train] mode with our usual test asset EUR/USD selected. It generates a spreadsheet file named <strong>DeepSignalsEURUSD_L.csv</strong> that contains the features in the first 8 columns, and the trade return in the last column.</p>
<h3>Step 6: Calibrate the algorithm</h3>
<p>Complex machine learning algorithms have many parameters to adjust. Some of them offer great opportunities to curve-fit the algorithm for publications. Still, we must calibrate parameters since the algorithm rarely works well with its default settings. For this, here&#8217;s an R script that reads the previously created data set and processes it with the deep learning algorithm (<strong>DeepSignal.r</strong>): </p>
<pre class="prettyprint">library('deepnet', quietly = T) 
library('caret', quietly = T)

neural.train = function(model,XY) 
{
  XY &lt;- as.matrix(XY)
  X &lt;- XY[,-ncol(XY)]
  Y &lt;- XY[,ncol(XY)]
  Y &lt;- ifelse(Y &gt; 0,1,0)
  Models[[model]] &lt;&lt;- sae.dnn.train(X,Y,
      hidden = c(50,100,50), 
      activationfun = "tanh", 
      learningrate = 0.5, 
      momentum = 0.5, 
      learningrate_scale = 1.0, 
      output = "sigm", 
      sae_output = "linear", 
      numepochs = 100, 
      batchsize = 100,
      hidden_dropout = 0, 
      visible_dropout = 0)
}

neural.predict = function(model,X) 
{
  if(is.vector(X)) X &lt;- t(X)
  return(nn.predict(Models[[model]],X))
}

neural.init = function()
{
  set.seed(365)
  Models &lt;&lt;- vector("list")
}

TestOOS = function() 
{
  neural.init()
  XY &lt;&lt;- read.csv('C:/Zorro/Data/DeepSignalsEURUSD_L.csv',header = F)
  splits &lt;- nrow(XY)*0.8
  XY.tr &lt;&lt;- head(XY,splits);
  XY.ts &lt;&lt;- tail(XY,-splits)
  neural.train(1,XY.tr)
  X &lt;&lt;- XY.ts[,-ncol(XY.ts)]
  Y &lt;&lt;- XY.ts[,ncol(XY.ts)]
  Y.ob &lt;&lt;- ifelse(Y &gt; 0,1,0)
  Y &lt;&lt;- neural.predict(1,X)
  Y.pr &lt;&lt;- ifelse(Y &gt; 0.5,1,0)
  confusionMatrix(Y.pr,Y.ob)
}</pre>
<p>We&#8217;ve defined three functions <strong>neural.train</strong>, <strong>neural.predict</strong>, and <strong>neural.init</strong> for training, predicting, and initializing the neural net. The function names are not arbitrary, but follow the convention used by Zorro&#8217;s advise(NEURAL,..) function. It doesn&#8217;t matter now, but will matter later when we use the same R script for training and trading the deep learning strategy. A fourth function, <strong>TestOOS</strong>, is used for out-of-sample testing our setup.</p>
<p>The function <strong>neural.init</strong> seeds the R random generator with a fixed value (365 is my personal lucky number). Otherwise we would get a slightly different result any time, since the neural net is initialized with random weights. It also creates a global R list named &#8220;Models&#8221;. Most R variable types don&#8217;t need to be created beforehand, some do (don&#8217;t ask me why). The &#8216;&lt;&lt;-&#8216; operator is for accessing a global variable from within a function.</p>
<p>The function <strong>neural.train</strong> takes as input a model number and the data set to be trained. The model number identifies the trained model in the &#8220;<strong>Models</strong>&#8221; list. A list is not really needed for this test, but we&#8217;ll need it for more complex strategies that train more than one model. The matrix containing the features and target is passed to the function as second parameter. If the <strong>XY</strong> data is not a proper matrix, which frequently happens in R depending on how you generated it, it is converted to one. Then it is split into the features (<strong>X</strong>) and the target (<strong>Y</strong>), and finally the target is converted to <strong>1</strong> for a positive trade outcome and <strong>0</strong> for a negative outcome. </p>
<p>The network parameters are then set up. Some are obvious, others are free to play around with:</p>
<ul style="list-style-type: square;">
<li>The network structure is given by the <strong>hidden</strong> vector:  <strong>c(50,100,50)</strong> defines 3 hidden layers, the first with 50, second with 100, and third with 50 neurons. That&#8217;s the parameter that we&#8217;ll later modify for determining whether deeper is better.</li>
<li>The <strong>activation function </strong>converts the sum of neuron input values to the neuron output; most often used are <strong>sigmoid</strong> that saturates to 0 or 1, or <strong>tanh</strong> that saturates to -1 or +1.</li>
</ul>
<p><a href="http://www.financial-hacker.com/wp-content/uploads/2016/08/sigmoid_tanh.png"><img decoding="async" class="alignnone wp-image-2111 " src="http://www.financial-hacker.com/wp-content/uploads/2016/08/sigmoid_tanh.png" width="523" height="197" srcset="https://financial-hacker.com/wp-content/uploads/2016/08/sigmoid_tanh.png 960w, https://financial-hacker.com/wp-content/uploads/2016/08/sigmoid_tanh-300x113.png 300w, https://financial-hacker.com/wp-content/uploads/2016/08/sigmoid_tanh-768x289.png 768w" sizes="(max-width: 523px) 85vw, 523px" /></a></p>
<p>We use <strong>tanh</strong> here since our signals are also in the +/-1 range. The <strong>output</strong> of the network is a sigmoid function since we want a prediction in the 0..1 range. But the <strong>SAE output</strong> must be &#8220;linear&#8221; so that the Stacked Autoencoder can reproduce the analog input signals on the outputs. Recently in fashion came RLUs, Rectified Linear Units, as activation functions for internal layers. RLUs are faster and partially overcome the above mentioned backpropagation problem, but are not supported by deepnet.</p>
<ul style="list-style-type: square;">
<li>The <strong>learning rate</strong> controls the step size for the gradient descent in training; a lower rate means finer steps and possibly more precise prediction, but longer training time.</li>
<li><strong>Momentum</strong> adds a fraction of the previous step to the current one. It prevents the gradient descent from getting stuck at a tiny local minimum or saddle point.</li>
<li>The <strong>learning rate scale</strong> is a multiplication factor for changing the learning rate after each iteration (I am not sure for what this is good, but there may be tasks where a lower learning rate on higher epochs improves the training).</li>
<li>An <strong>epoch</strong> is a training iteration over the entire data set. Training will stop once the number of epochs is reached. More epochs mean better prediction, but longer training.</li>
<li>The <strong>batch size</strong> is a number of random samples &#8211; a <strong>mini batch</strong> &#8211; taken out of the data set for a single training run. Splitting the data into mini batches speeds up training since the weight gradient is then calculated from fewer samples. The higher the batch size, the better is the training, but the more time it will take.</li>
<li>The <strong>dropout</strong> is a number of randomly selected neurons that are disabled during a mini batch. This way the net learns only with a part of its neurons. This seems a strange idea, but can effectively reduce overfitting.</li>
</ul>
<p>All these parameters are common for neural networks. Play around with them and check their effect on the result and the training time. Properly calibrating a neural net is not trivial and might be the topic of another article. The parameters are stored in the model together with the matrix of trained connection weights. So they need not to be given again in the prediction function, <strong>neural.predict</strong>. It takes the model and a vector <strong>X</strong> of features, runs it through the layers, and returns the network output, the predicted target <strong>Y</strong>. Compared with training, prediction is pretty fast since it only needs a couple thousand multiplications. If <strong>X</strong> was a row vector, it is transposed and this way converted to a column vector, otherwise the <strong>nn.predict</strong> function won&#8217;t accept it.</p>
<p>Use RStudio or some similar environment for conveniently working with R. Edit the path to the <strong>.csv</strong> data in the file above, source it, install the required R packages (deepnet, e1071, and caret), then call the <strong>TestOOS</strong> function from the command line. If everything works, it should print something like that:</p>
<pre class="prettyprint">&gt; TestOOS()
begin to train sae ......
training layer 1 autoencoder ...
####loss on step 10000 is : 0.000079
training layer 2 autoencoder ...
####loss on step 10000 is : 0.000085
training layer 3 autoencoder ...
####loss on step 10000 is : 0.000113
sae has been trained.
begin to train deep nn ......
####loss on step 10000 is : 0.123806
deep nn has been trained.
Confusion Matrix and Statistics

          Reference
Prediction    0    1
         0 1231  808
         1  512  934
                                          
               Accuracy : 0.6212          
                 95% CI : (0.6049, 0.6374)
    No Information Rate : 0.5001          
    P-Value [Acc &gt; NIR] : &lt; 2.2e-16       
                                          
                  Kappa : 0.2424          
 Mcnemar's Test P-Value : 4.677e-16       
                                          
            Sensitivity : 0.7063          
            Specificity : 0.5362          
         Pos Pred Value : 0.6037          
         Neg Pred Value : 0.6459          
             Prevalence : 0.5001          
         Detection Rate : 0.3532          
   Detection Prevalence : 0.5851          
      Balanced Accuracy : 0.6212          
                                          
       'Positive' Class : 0               
                                          
&gt; </pre>
<p><strong>TestOOS</strong> reads first our data set from Zorro&#8217;s Data folder. It splits the data in 80% for training (<strong>XY.tr</strong>) and 20% for out-of-sample testing (<strong>XY.ts</strong>). The training set is trained and the result stored in the <strong>Models</strong> list at index 1. The test set is further split in features (<strong>X</strong>) and targets (<strong>Y</strong>). <strong>Y</strong> is converted to binary 0 or 1 and stored in <strong>Y.ob</strong>, our vector of observed targets. We then predict the targets from the test set, convert them again to binary 0 or 1 and store them in <strong>Y.pr</strong>. For comparing the observation with the prediction, we use the <strong>confusionMatrix</strong> function from the caret package.</p>
<p>A confusion matrix of a binary classifier is simply a 2&#215;2 matrix that tells how many 0&#8217;s and how many 1&#8217;s had been predicted wrongly and correctly. A lot of metrics are derived from the matrix and printed in the lines above. The most important at the moment is the <strong>62% prediction accuracy</strong>. This may hint that I bashed price action trading a little prematurely. But of course the 62% might have been just luck. We&#8217;ll see that later when we run a WFO test.</p>
<p>A final advice: R packages are occasionally updated, with the possible consequence that previous R code suddenly might work differently, or not at all. This really happens, so test carefully after any update.</p>
<h3>Step 7: The strategy</h3>
<p>Now that we&#8217;ve tested our algorithm and got some prediction accuracy above 50% with a test data set, we can finally code our machine learning strategy. In fact we&#8217;ve already coded most of it, we just must add a few lines to the above Zorro script that exported the data set. This is the final script for training, testing, and (theoretically) trading the system (<strong>DeepLearn.c</strong>):</p>
<pre class="prettyprint">#include &lt;r.h&gt;

function run()
{
	StartDate = 20140601;
	BarPeriod = 60;	// 1 hour
	LookBack = 100;

	WFOPeriod = 252*24; // 1 year
	DataSplit = 90;
	NumCores = -1;  // use all CPU cores but one

	set(RULES);
	Spread = RollLong = RollShort = Commission = Slippage = 0;
	LifeTime = 3;
	if(Train) Hedge = 2;
	
	if(adviseLong(NEURAL+BALANCED,0,
		change(1),change(2),change(3),change(4),
		range(1),range(2),range(3),range(4)) &gt; 0.5) 
		enterLong();
	if(adviseShort() &gt; 0.5) 
		enterShort();
}</pre>
<p>We&#8217;re using a WFO cycle of one year, split in a 90% training and a 10% out-of-sample test period. You might ask why I have earlier used two year&#8217;s data and a different split, 80/20, for calibrating the network in step 5. This is for using differently composed data for calibrating and for walk forward testing. If we used exactly the same data, the calibration might overfit it and compromise the test. </p>
<p>The selected WFO parameters mean that the system is trained with about 225 days data, followed by a 25 days test or trade period. Thus, in live trading the system would retrain every 25 days, using the prices from the previous 225 days. In the literature you&#8217;ll sometimes find the recommendation to retrain a machine learning system after any trade, or at least any day. But this does not make much sense to me. When you used almost 1 year&#8217;s data for training a system, it can obviously not deteriorate after a single day. Or if it did, and only produced positive test results with daily retraining, I would strongly suspect that the results are artifacts by some coding mistake.</p>
<p>Training a deep network takes really a long time, in our case about 10 minutes for a network with 3 hidden layers and 200 neurons. In live trading this would be done by a second Zorro process that is automatically started by the trading Zorro. In the backtest, the system trains at any WFO cycle. Therefore using multiple cores is recommended for training many cycles in parallel. The <strong>NumCores</strong> variable at <strong>-1</strong> activates all CPU cores but one. Multiple cores are only available in Zorro S, so a complete walk forward test with all WFO cycles can take several hours with the free version.</p>
<p>In the script we now train both long and short trades. For this we have to allow hedging in Training mode, since long and short positions are open at the same time. Entering a position is now dependent on the return value from the <strong>advise</strong> function, which in turn calls either the <strong>neural.train</strong> or the <strong>neural.predict</strong> function from the R script. So we&#8217;re here entering positions when the neural net predicts a result above 0.5. </p>
<p>The R script is now controlled by the Zorro script (for this it must have the same name, <strong>DeepLearn.r</strong>, only with different extension). It is identical to our R script above since we&#8217;re using the same network parameters. Only one additional function is needed for supporting a WFO test:</p>
<pre class="prettyprint">neural.save = function(name)
{
  save(Models,file=name)  
}</pre>
<p>The <strong>neural.save</strong> function stores the <strong>Models</strong> list &#8211; it now contains 2 models for long and for short trades &#8211; after every training run in Zorro&#8217;s Data folder. Since the models are stored for later use, we do not need to train them again for repeated test runs.</p>
<p>This is the WFO equity curve generated with the script above (EUR/USD, without trading costs):</p>
<figure id="attachment_2037" aria-describedby="caption-attachment-2037" style="width: 879px" class="wp-caption alignnone"><a href="http://www.financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD.png"><img loading="lazy" decoding="async" class="wp-image-2037 size-full" src="http://www.financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD.png" width="879" height="341" srcset="https://financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD.png 879w, https://financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-300x116.png 300w, https://financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-768x298.png 768w" sizes="(max-width: 709px) 85vw, (max-width: 909px) 67vw, (max-width: 1362px) 62vw, 840px" /></a><figcaption id="caption-attachment-2037" class="wp-caption-text">EUR/USD equity curve with 50-100-50 network structure</figcaption></figure>
<p>Although not all WFO cycles get a positive result, it seems that there is some predictive effect. The curve is equivalent to an annual return of 89%, achieved with a 50-100-50 hidden layer structure. We&#8217;ll check in the next step how different network structures affect the result.</p>
<p>Since the <strong>neural.init</strong>, <strong>neural.train</strong>, <strong>neural.predict</strong>, and <strong>neural.save</strong> functions are automatically called by Zorro&#8217;s adviseLong/adviseShort functions, there are no R functions directly called in the Zorro script. Thus the script can remain unchanged when using a different machine learning method. Only the <strong>DeepLearn.r</strong> script must be modified and the neural net, for instance, replaced by a support vector machine. For trading such a machine learning system live on a VPS, make sure that R is also installed on the VPS, the needed R packages are installed, and the path to the R terminal set up in Zorro&#8217;s ini file. Otherwise you&#8217;ll get an error message when starting the strategy.</p>
<h3>Step 8: The experiment</h3>
<p>If our goal had been developing a strategy, the next steps would be the reality check, risk and money management, and preparing for live trading just as described under <a href="http://www.financial-hacker.com/build-better-strategies-part-3-the-development-process/" target="_blank" rel="noopener">model-based strategy development</a>. But for our experiment we&#8217;ll now run a series of tests, with the number of neurons per layer increased from 10 to 100 in 3 steps, and 1, 2, or 3 hidden layers (deepnet does not support more than 3). So we&#8217;re looking into the following 9 network structures: c(10), c(10,10), c(10,10,10), c(30), c(30,30), c(30,30,30), c(100), c(100,100), c(100,100,100). For this experiment you need an afternoon even with a fast PC and in multiple core mode. Here are the results (SR = Sharpe ratio, R2 = slope linearity): </p>
<table cellspacing="0" cellpadding="0">
<tbody>
<tr style="height: 28px;">
<td style="width: 40px;"> </td>
<td>* 10 neurons</td>
<td>* 30 neurons</td>
<td>* 100 neurons</td>
</tr>
<tr style="height: 28.75px;">
<td style="width: 40px;">1</td>
<td>
<figure id="attachment_2047" aria-describedby="caption-attachment-2047" style="width: 300px" class="wp-caption alignnone"><a href="http://www.financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-1.png"><img loading="lazy" decoding="async" class="wp-image-2047 size-medium" src="http://www.financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-1-300x116.png" width="300" height="116" srcset="https://financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-1-300x116.png 300w, https://financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-1-768x298.png 768w, https://financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-1.png 879w" sizes="(max-width: 300px) 85vw, 300px" /></a><figcaption id="caption-attachment-2047" class="wp-caption-text">SR = 0.55 R2 = 0.00</figcaption></figure>
</td>
<td>
<figure id="attachment_2048" aria-describedby="caption-attachment-2048" style="width: 300px" class="wp-caption alignnone"><a href="http://www.financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-2.png"><img loading="lazy" decoding="async" class="wp-image-2048 size-medium" src="http://www.financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-2-300x116.png" width="300" height="116" srcset="https://financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-2-300x116.png 300w, https://financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-2-768x298.png 768w, https://financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-2.png 879w" sizes="(max-width: 300px) 85vw, 300px" /></a><figcaption id="caption-attachment-2048" class="wp-caption-text">SR = 1.02 R2 = 0.51</figcaption></figure>
</td>
<td>
<figure id="attachment_2049" aria-describedby="caption-attachment-2049" style="width: 300px" class="wp-caption alignnone"><a href="http://www.financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-3.png"><img loading="lazy" decoding="async" class="wp-image-2049 size-medium" src="http://www.financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-3-300x116.png" width="300" height="116" srcset="https://financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-3-300x116.png 300w, https://financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-3-768x298.png 768w, https://financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-3.png 879w" sizes="(max-width: 300px) 85vw, 300px" /></a><figcaption id="caption-attachment-2049" class="wp-caption-text">SR = 1.18 R2 = 0.84</figcaption></figure>
</td>
</tr>
<tr style="height: 28px;">
<td style="width: 40px;">2</td>
<td>
<figure id="attachment_2050" aria-describedby="caption-attachment-2050" style="width: 300px" class="wp-caption alignnone"><a href="http://www.financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-5.png"><img loading="lazy" decoding="async" class="wp-image-2050 size-medium" src="http://www.financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-5-300x116.png" width="300" height="116" /></a><figcaption id="caption-attachment-2050" class="wp-caption-text">SR = 0.98 R2 = 0.57</figcaption></figure>
</td>
<td>
<figure id="attachment_2052" aria-describedby="caption-attachment-2052" style="width: 300px" class="wp-caption alignnone"><a href="http://www.financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-6.png"><img loading="lazy" decoding="async" class="wp-image-2052 size-medium" src="http://www.financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-6-300x116.png" width="300" height="116" srcset="https://financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-6-300x116.png 300w, https://financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-6-768x298.png 768w, https://financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-6.png 879w" sizes="(max-width: 300px) 85vw, 300px" /></a><figcaption id="caption-attachment-2052" class="wp-caption-text">SR = 1.22 R2 = 0.70</figcaption></figure>
</td>
<td>
<figure id="attachment_2054" aria-describedby="caption-attachment-2054" style="width: 300px" class="wp-caption alignnone"><a href="http://www.financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-8.png"><img loading="lazy" decoding="async" class="wp-image-2054 size-medium" src="http://www.financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-8-300x116.png" width="300" height="116" srcset="https://financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-8-300x116.png 300w, https://financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-8-768x298.png 768w, https://financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-8.png 879w" sizes="(max-width: 300px) 85vw, 300px" /></a><figcaption id="caption-attachment-2054" class="wp-caption-text">SR = 0.84 R2 = 0.60</figcaption></figure>
</td>
</tr>
<tr style="height: 28px;">
<td style="width: 40px;">3</td>
<td>
<figure id="attachment_2051" aria-describedby="caption-attachment-2051" style="width: 300px" class="wp-caption alignnone"><a href="http://www.financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-4.png"><img loading="lazy" decoding="async" class="wp-image-2051 size-medium" src="http://www.financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-4-300x116.png" width="300" height="116" /></a><figcaption id="caption-attachment-2051" class="wp-caption-text">SR = 1.24 R2 = 0.79</figcaption></figure>
</td>
<td>
<figure id="attachment_2053" aria-describedby="caption-attachment-2053" style="width: 300px" class="wp-caption alignnone"><a href="http://www.financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-7.png"><img loading="lazy" decoding="async" class="wp-image-2053 size-medium" src="http://www.financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-7-300x116.png" width="300" height="116" srcset="https://financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-7-300x116.png 300w, https://financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-7-768x298.png 768w, https://financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-7.png 879w" sizes="(max-width: 300px) 85vw, 300px" /></a><figcaption id="caption-attachment-2053" class="wp-caption-text">SR = 1.28 R2 = 0.87</figcaption></figure>
</td>
<td>
<figure id="attachment_2060" aria-describedby="caption-attachment-2060" style="width: 300px" class="wp-caption alignnone"><a href="http://www.financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-9.png"><img loading="lazy" decoding="async" class="wp-image-2060 size-medium" src="http://www.financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-9-300x116.png" width="300" height="116" srcset="https://financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-9-300x116.png 300w, https://financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-9-768x298.png 768w, https://financial-hacker.com/wp-content/uploads/2016/10/DeepLearn_EURUSD-9.png 879w" sizes="(max-width: 300px) 85vw, 300px" /></a><figcaption id="caption-attachment-2060" class="wp-caption-text">SR = 1.33 R2 = 0.83</figcaption></figure>
</td>
</tr>
</tbody>
</table>
<p>We see that a simple net with only 10 neurons in a single hidden layer won&#8217;t work well for short-term prediction. Network complexity clearly improves the performance, however only up to a certain point. A good result for our system is already achieved with 3 layers x 30 neurons. Even more neurons won&#8217;t help much and sometimes even produce a worse result. This is no real surprise, since for processing only 8 inputs, 300 neurons can likely not do a better job than 100.  </p>
<h3>Conclusion</h3>
<p>Our goal was determining if a few candles can have predictive power and how the results are affected by the complexity of the algorithm. The results seem to suggest that short-term price movements can indeed be predicted sometimes by analyzing the changes and ranges of the last 4 candles. The prediction is not very accurate &#8211; it&#8217;s in the 58%..60% range, and most systems of the test series become unprofitable when trading costs are included. Still, I have to reconsider my opinion about price action trading. The fact that the prediction improves with network complexity is an especially convincing argument for short-term price predictability.</p>
<p>It would be interesting to look into the long-term stability of predictive price patterns. For this we had to run another series of experiments and modify the training period (<strong>WFOPeriod</strong> in the script above) and the 90% IS/OOS split. This takes longer time since we must use more historical data. I have done a few tests and found so far that a year seems to be indeed a good training period. The system deteriorates with periods longer than a few years. Predictive price patterns, at least of EUR/USD, have a limited lifetime.</p>
<p>Where can we go from here? There&#8217;s a plethora of possibilities, for instance:</p>
<ul style="list-style-type: square;">
<li>Use inputs from more candles and process them with far bigger networks with thousands of neurons.</li>
<li>Use <a href="http://www.financial-hacker.com/better-tests-with-oversampling/">oversampling</a> for expanding the training data. Prediction always improves with more training samples.</li>
<li>Compress time series f.i. with spectal analysis and analyze not the candles, but their frequency representation with machine learning methods.</li>
<li>Use inputs from many candles &#8211; such as, 100 &#8211; and pre-process adjacent candles with one-dimensional convolutional network layers.</li>
<li>Use recurrent networks. Especially LSTM could be very interesting for analyzing time series &#8211; and as to my knowledge, they have been rarely used for financial prediction so far.</li>
<li>Use an ensemble of neural networks for prediction, such as Aronson&#8217;s &#8220;oracles&#8221; and &#8220;comitees&#8221;.</li>
</ul>
<h3>Papers / Articles</h3>
<p>(1) <a href="http://home.iitk.ac.in/~ayushmn/mail/pre-train.pdf" target="_blank" rel="noopener">A.S.Sisodiya, Reducing Dimensionality of Data</a> <br />
(2) <a href="http://robotwealth.com/machine-learning-financial-prediction-david-aronson/" target="_blank" rel="noopener">K.Longmore, Machine Learning for Financial Prediction</a> <br />
(3) <a href="https://www.mql5.com/en/articles/2029" target="_blank" rel="noopener">V.Perervenko, Selection of Variables for Machine Learning<br />
(</a>4) <a href="http://jmlr.org/papers/volume11/erhan10a/erhan10a.pdf" target="_blank" rel="noopener">D.Erhan et al, Why Does Pre-training Help Deep Learning?</a></p>
<hr />
<p>I&#8217;ve added the C and R scripts to the 2016 script repository. You need both in Zorro&#8217;s Strategy folder. Zorro version 1.474, and R version 3.2.5 (64 bit) was used for the experiment, but it should also work with other versions. </p>
]]></content:encoded>
					
					<wfw:commentRss>https://financial-hacker.com/build-better-strategies-part-5-developing-a-machine-learning-system/feed/</wfw:commentRss>
			<slash:comments>114</slash:comments>
		
		
			</item>
	</channel>
</rss>
