Deep Learning Systems for Bitcoin 1

Since December 2017, bitcoins can be traded not only at more or less dubious exchanges, but also as futures at the CME and CBOE. And several trading systems have already popped up for bitcoin and other cryptocurrencies. None of them can claim big success, with one exception. There is a very simple strategy that easily surpasses all other bitcoin systems and probably also all known historical trading systems. Its name: Buy and Hold. In the light of the extreme success of that particular bitcoin strategy, do we really need any other trading system for cryptos?

Bitcoin – hodl??

A buy and hold strategy works extremely well when a price bubble grows, and extremely badly when it bursts. And indeed, apparently all finance and economy gurus (well, all but John McAfee) tell you that the cryptocurrency market, and especially bitcoin, is a bubble, even a “scam with no substantial worth”, and will soon experience a crash “worse than the 17th century tulip mania” or the “18th century South Sea Company fraud”.

Bubble or not?

By definition, a bubble is a price largely above the ‘real value’ or ‘fair value’ of an asset, and it bursts when people realize that. So what is the fair value of a bitcoin? Obviously not zero, since blockchain based currencies have (aside from their disadvantages) several advantages over traditional currencies, on the economy level as well as on the private level. Such as:

  • They break the link of money and debt. Cryptocurrencies don’t require the bank credit mechanism for money creation.
  • They can be used where normal money would be impractical, such as fee transfers between machines or trading in multiplayer games.
  • They allow low-cost and anonymous money transactions. At least in theory.
  • They replace banks for storing and mattresses for stashing money.

I’m ready to believe that blockchain is the future of money transfer and storage. But that does not mean an ever-rising bitcoin price. Hundreds of cryptocurrencies came out in the last two years, every single one of them with a better blockchain technology than bitcoin, and any good programmer can add a new coin anytime. Few will survive. Countries or big companies might sooner or later issue their own crypto tokens, as Venezuela is already attempting. The release of an official blockchain Dollar, Yuan, or Euro would leave the old bitcoin with its energy-hungry transaction algorithm hanging in the air. Thus, when investing in bitcoin, we should not hope for a rosy future, but look for its present ‘real value’.

Due to its extreme volatility, bitcoin cannot replace bank vaults. But it is already used in some situations for reducing money transfer costs, since the miners are rewarded in bitcoin for processing transactions. And above all, anonymity can be a substantial motive to own it. When you need a hacker to delete your drunk driving record, pay her in bitcoin. But how big is the online market for illegal hacker jobs, kill contracts, money laundering, drugs, weapons, or pro-Trump facebook advertisements? No one knows, but when we compare it with cash, another form of anonymous payment, we get interesting results.

The current cash in circulation in the US is approximately $1.5 trillion. And the current bitcoin supply, about 17 million bitcoins, represents a total value of about $250 billion. Which means that you could already replace about one sixth of all US cash with bitcoin! Not to mention all the other cryptos. I fear that this supply already exceeds the demand for anonymous online payment, today and in the near future.
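
The comparison is plain arithmetic and easy to redo with current figures. A quick check in R, using only the rounded numbers quoted above (the variable names are mine, chosen for this sketch):

```r
# Rounded figures from the text; replace them with current values.
CashUS    <- 1.5e12   # US cash in circulation, in dollars
BtcSupply <- 17e6     # bitcoins in circulation
BtcValue  <- 250e9    # total value of all bitcoins, in dollars

ImpliedPrice <- BtcValue / BtcSupply   # dollars per bitcoin implied by these figures
Share        <- BtcValue / CashUS      # bitcoin value as a fraction of US cash

print(round(ImpliedPrice))
print(round(100 * Share, 1))
```

With these inputs the share comes out at about one sixth of the cash in circulation, and it scales linearly with the bitcoin price.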

For those reasons, a bitcoin “hodl” system, despite its extreme historical performance, is high risk. We don’t know when and how the bubble will burst – maybe bitcoin will go up to $100,000 before – but we have some reason to suspect that at some point sooner or later the bitcoin price might drop like a stone down to its ‘real value’. Which is unknown, but for practical purposes is probably not in the $15,000 area, but more like $15.

So we need some other method to tackle the cryptocurrency trading problem. The first question: Has the crypto market already developed price curve inefficiencies that can be exploited in a trading system? In (1) we see some tests with basic bitcoin strategies. Our own tests came to the same results. Momentum based strategies can work, and mean-variance optimizing portfolio systems can achieve even extreme returns with cryptocurrencies, up to 10 times higher than “hodl”. But that’s not really surprising given the high momentum and volatility of crypto coins. The problem is that all crypto portfolios are exposed to high risk. Other conventional model-based strategies don’t work well with cryptos anyway.

When we concentrate on bitcoin, our proposed system must be a fast trading, trend-agnostic strategy. That means it holds positions only a few minutes, and is not exposed to the bubble risk. I can already tell that short-term mean reversion – even with a more sophisticated system as in (1) – produces no good result with cryptos. So only a few possibilities remain. One of them is exploiting short-term price patterns. This is the strategy that we will develop. And I can already tell that it works. But for this we’ll need a deep machine learning system for detecting the patterns and determining their rules.

Selecting a machine learning library

The basic structure of such a machine learning system is described here. Due to the low signal-to-noise ratio and to ever-changing market conditions, analyzing price series is one of the most ambitious tasks for machine learning. Compared with other AI algorithms, deep learning systems have the highest success rate. Since we can connect any Zorro based trading script to the data analysis software R, we’ll use an R based deep learning package. Many are available by now. Here’s the choice:

  • Deepnet, a lightweight and straightforward neural net library with a stacked autoencoder and a Boltzmann machine. Produces good results when the feature set is not too complex. The basic train and predict functions for using a deepnet autoencoder in a Zorro strategy:
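    library('deepnet', quietly = TRUE)

    neural.train = function(model,XY) 
    {
      XY <- as.matrix(XY)
      X <- XY[,-ncol(XY)]     # features: all columns but the last
      Y <- XY[,ncol(XY)]      # target: the last column
      Y <- ifelse(Y > 0,1,0)  # convert the target to binary classes
      Models[[model]] <<- sae.dnn.train(X,Y,
          hidden = c(30), 
          learningrate = 0.5, 
          momentum = 0.5, 
          learningrate_scale = 1.0, 
          output = "sigm", 
          sae_output = "linear", 
          numepochs = 100, 
          batchsize = 100)
    }

    neural.predict = function(model,X) 
    {
      if(is.vector(X)) X <- t(X)  # single sample: make it a one-row matrix
      return(nn.predict(Models[[model]],X))
    }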
  • H2O, an open-source software package with the ability to run on distributed computer systems. Coded in Java, so the latest version of the JDK is required. Aside from deep autoencoders, many other machine learning algorithms are supported, such as random forests. Features can be preselected, and ensembles can be created. Disadvantage: While batch training is fast, predicting a single sample, as usually needed in a trading strategy, is relatively slow due to the server/client concept. The basic H2O train and predict functions for Zorro:
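    # also install the Java JDK
    library('h2o', quietly = TRUE)
    h2o.init()  # start or connect to the local H2O server
    neural.train = function(model,XY) 
    {
      XY <- as.h2o(XY)
      # x, y, training_frame: assumed, with the last column as target
      Models[[model]] <<- h2o.deeplearning(
        x = 1:(ncol(XY)-1), y = ncol(XY), training_frame = XY,
        hidden = c(30),  seed = 365)
    }
    neural.predict = function(model,X) 
    {
      if(is.vector(X)) X <- as.h2o(as.data.frame(t(X)))
      else X <- as.h2o(X)
      Y <- h2o.predict(Models[[model]],X)
      return(as.data.frame(Y)$predict)
    }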
  • Tensorflow in its Keras incarnation, a neural network kit by Google. Supports CPU and GPU and comes with all needed modules for tensor arithmetic, activation and loss functions, convolution kernels, and backpropagation algorithms. So you can build your own neural net structure. Keras offers a simple interface for that.

    Keras is available as an R library, but installing it also requires a Python environment. First install Anaconda. Open the Anaconda Navigator and install the RStudio application (installing Keras outside an Anaconda environment fails on some PCs with an error message). Then open RStudio inside the Navigator, install the Keras package, and finally execute library('keras') and install_keras(). These steps usually succeed.

    The Keras train and predict functions for Zorro:

    #needs Python 3.6 and Anaconda
    #call install_keras() after installing the package
    library('keras')

    neural.train = function(model,XY) 
    {
      X <- data.matrix(XY[,-ncol(XY)])
      Y <- XY[,ncol(XY)]
      Y <- ifelse(Y > 0,1,0)
      Model <- keras_model_sequential() 
      Model %>% 
        layer_dense(units=30,activation='relu',input_shape = c(ncol(X))) %>% 
        layer_dropout(rate = 0.2) %>% 
        layer_dense(units = 1, activation = 'sigmoid')
      Model %>% compile(
        loss = 'binary_crossentropy',
        optimizer = optimizer_rmsprop(),
        metrics = c('accuracy'))
      Model %>% fit(X, Y, 
        epochs = 20, batch_size = 20, 
        validation_split = 0, shuffle = FALSE)
      Models[[model]] <<- Model
    }

    neural.predict = function(model,X) 
    {
      if(is.vector(X)) X <- t(X)
      X <- as.matrix(X)
      # sigmoid output: predict() returns the class 1 probability
      # (predict_proba is no longer available in recent Keras versions)
      Y <- Models[[model]] %>% predict(X)
      return(ifelse(Y > 0.5,1,0))
    }
  • MxNet, Amazon’s answer to Google’s Tensorflow. Also offers tensor arithmetic and neural net building blocks on CPU and GPU, as well as high level network functions similar to Keras (the next Keras version will also support MxNet). Just as with Tensorflow, CUDA is supported, but not (yet) OpenCL, so you’ll need an Nvidia graphics card to enjoy GPU support. In direct comparison (2), MxNet was reported to be less resource hungry and a bit faster than Tensorflow, but so far I could not confirm this. The standard train and predict functions:
    # how to install the CPU version:
    #cran <- getOption("repos")
    #cran["dmlc"] <- ""
    #options(repos = cran)
    #install.packages("mxnet")
    library('mxnet')

    neural.train = function(model,XY) 
    {
      X <- data.matrix(XY[,-ncol(XY)])
      Y <- XY[,ncol(XY)]
      Y <- ifelse(Y > 0,1,0)
      Models[[model]] <<- mx.mlp(X,Y,
           hidden_node = c(30), 
           out_node = 2, 
           activation = "sigmoid",
           out_activation = "softmax",
           num.round = 20,
           array.batch.size = 20,
           learning.rate = 0.05,
           momentum = 0.9,
           eval.metric = mx.metric.accuracy)
    }

    neural.predict = function(model,X) 
    {
      if(is.vector(X)) X <- t(X)
      X <- data.matrix(X)
      Y <- predict(Models[[model]],X)
      # Y holds one row per class: row 1 = class 0, row 2 = class 1
      return(ifelse(Y[1,] > Y[2,],0,1))
    }

By replacing the neural.train and neural.predict functions, and other functions for saving and loading models that are not listed here, you can run the same strategy with different deep learning packages and compare. We’re currently using Keras for most machine learning strategies, and I’ll also use it for the short-term bitcoin trading system presented in the upcoming 2nd part of this article. There is no bitcoin futures data available yet, so tick based price data from several bitcoin exchanges will have to do for the backtest.
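
To illustrate the swap, here is a minimal sketch of a complete interface with a deliberately trivial backend: a logistic regression from base R stands in for the deep net, so the script runs without any extra package. The names neural.init, neural.save and neural.load, the 0.5 decision threshold, and the generated column names X1..Xn are assumptions made for this sketch, not code from the script repository.

```r
# Stand-in backend: logistic regression (base R) instead of a deep net.
# neural.init, neural.save, neural.load are assumed names for this sketch.
neural.init = function()
{
  set.seed(365)
  Models <<- vector("list")   # global list, one entry per trained model
}

neural.train = function(model,XY)
{
  XY <- as.data.frame(XY)
  names(XY) <- c(paste0("X",1:(ncol(XY)-1)),"Y")  # last column is the target
  XY$Y <- ifelse(XY$Y > 0,1,0)                    # binary classes, as above
  Models[[model]] <<- glm(Y ~ .,data = XY,family = binomial)
}

neural.predict = function(model,X)
{
  if(is.vector(X)) X <- t(X)       # single sample: one-row matrix
  X <- as.data.frame(X)
  names(X) <- paste0("X",1:ncol(X))
  P <- predict(Models[[model]],newdata = X,type = "response")
  return(ifelse(P > 0.5,1,0))
}

neural.save = function(name)
{
  save(Models,file = name)         # store all trained models in one file
}

neural.load = function(name)
{
  load(name,envir = globalenv())   # restore the global Models list
}

# Usage with synthetic data: the target depends on the first feature only.
neural.init()
X1 <- rnorm(200); X2 <- rnorm(200)
XY <- cbind(X1,X2,X1 + 0.5*rnorm(200))
neural.train(1,XY)
print(neural.predict(1,c(3,0)))
```

Since the strategy side only sees these functions, any of the four packages above can be dropped in by changing the body of neural.train and neural.predict. Note that not every model object survives a plain save(); Keras and H2O models, for instance, usually need the save functions of their own package.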

I’ve uploaded the interface scripts for Deepnet, H2O, Tensorflow/Keras, and MxNet to the 2018 script repository, so you can run your own deep learning experiments and compare the packages. Here’s a Zorro script for downloading bitcoin prices from Quandl – EOD only, though, since the exchanges demand dear payment for their tick data.

void main()

You can also get Bitcoin M1 data from Kaggle in CSV format. Here’s a Zorro script for converting it to a Zorro T6 dataset:

void main()
	string InName = "History\\bitstampUSD_1-min_data_2012-01-01_to_2019-03-13.csv";
	string Format = "+%t,f3,f1,f2,f4,f6";
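
If you want to inspect the Kaggle file in R before converting it, a sketch like the following may help. It assumes the column layout Timestamp, Open, High, Low, Close, Volume_(BTC), Volume_(Currency), Weighted_Price, with Unix timestamps in seconds and NaN rows for minutes without trades. The three sample rows are made up, so check the header of your own download first.

```r
# Made-up sample rows in the assumed column layout;
# for the real file use read.csv("<path to the CSV>", ...) instead of text =.
Csv <- "Timestamp,Open,High,Low,Close,Volume_(BTC),Volume_(Currency),Weighted_Price
1325317920,4.39,4.39,4.39,4.39,0.45,1.98,4.39
1325317980,NaN,NaN,NaN,NaN,NaN,NaN,NaN
1325318040,4.40,4.45,4.38,4.41,1.50,6.62,4.41"

M1 <- read.csv(text = Csv,check.names = FALSE,na.strings = c("NA","NaN"))
M1 <- M1[!is.na(M1$Close),]   # drop minutes without trades
M1$Time <- as.POSIXct(M1$Timestamp,origin = "1970-01-01",tz = "UTC")
Returns <- diff(log(M1$Close))  # log returns between remaining bars

print(M1[,c("Time","Open","High","Low","Close")])
print(Returns)
```

Dropping the empty minutes means that consecutive rows are not always one minute apart, which matters when the returns are later used as features or targets.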

Further reading

(1) Nicolas Rabener, Quant Strategies in the Cryptocurrency Space

(2) Julien Simon, Tensorflow vs MxNet

(3) Zachary Lipton et al, MxNet – The Straight Dope
(Good introduction to deep learning with MxNet / Gluon examples)

(4) F.Chollet/J.J.Allaire, Deep Learning with R
(Excellent introduction to Keras)

36 thoughts on “Deep Learning Systems for Bitcoin 1”

  1. Now TensorFlow have experimental feature allow to compile your model to binary or to C++ source code:

    So, you potentially can deploy your model in R, save it to file and later make fast prediction straight from Zorro, if you able to bind TF runtime/C++ with Zorro.

    But for the other hand for trivial models, as in your article, why you not to add simple dense layers functionality to Zorro, since you already made PERCEPTRON? There a lot C++ source code of deep nets implementation available, also don’t forget about OpenBLAS and your prediction engine would be blazing fast.

  2. That’s possible, but it had no substantial speed advantage. Prediction would be about 50% faster, but the bottleneck is training. Since we normally have no large feature set in trading systems, prediction is just a few matrix multiplications, and is often anyway faster than many standard indicators with large lookback periods.

  3. Nice one Johann! Very interested to hear your ideas about trading cryptos, particularly now that we can throw the futures contract into the mix. Its trading volume wasn’t exactly spectacular leading up to the Christmas break, but no doubt there are many watching with a lot of interest.

    As a nice coincidence, I also just launched a blog series about using deep learning in trading systems. I’ll be using Keras, and of course Zorro.

    Thanks for sharing your work.

  4. Sounds promising – I’m looking forward to the rest of your blog series. And don’t work too much on holidays!

  5. From your post, it’s not clear how often you retrain your model and which time frame you trade. For FX you previously suggest 1H timeframe, and 25 day retrain period, so there no speed bottleneck for any R deep learning framework at all. What about crypto market? Which timeframe you use and how often retrain your model?

  6. The timeframe is one minute, retraining every 2 weeks. All this will be covered in the second part of the article.

  7. Your every post worth a hundred posts all others authors, you always source of trading wisdom for me, thanks for your sharing.
    Waiting for part 2 impatiently!
    But I still don’t figure out why you point to training time as bottleneck, if you retraining only every 2 weeks?

  8. Because the time consuming part is the testing, not live trading, where retraining happens in the background anyway. But in walk forward tests the system is training many times, maybe thousands of times when you also do preselection or optimization. That’s where you need multiple cores, GPU support, and any processing power that you can get.

  9. Also can’t wait for part 2 of this article 🙂 Trying to design a trading bot myself, so I find this blog very interesting.
    However, I think you should read some more about crypto. For example Bitcoin isn’t very anonymous, unlike Monero for example. Also you underestimate the true value of Bitcoin, based on its supply cap 😉
    Potential problems with bit-euro or bit-yuan would be same as fiat – if you can print/issue unlimited amounts, it’s not a very good store of value.

  10. Zorro S already supports CryptoCompare, but not the free Zorro version, so I think this tool will come handy for users of the free version.

  11. How have your results been?

    Relatively simple trend/momentum strategies with various twists perform quite well in backtests, even with commission fees. Though they perform less well since March/April, market is quite choppy now.

    I have a simple bot integrated with exchange API, but my main issue now is limit/market order execution optimization… and I see that order entry is its own heavily researched academic field with some rather advanced math:

    Any advice?

  12. Another paper on the order entry optimization problem “Optimal placement in a limit order book: an analytical
    approach” 2017:

    Are there any open source tools that try to optimize order entry? If only there were a way to know in advance which limits would be filled and when to just market order 😉 perhaps the AI approach can improve results.

  13. Thank you for the link. For optimizing the order limit, you need depth data from the order book. Live order book data is available free on several crypto exchanges, for instance Bittrex, but the order book history is not free. – We’ve meanwhile tested several network structures and got definitely better results than buy and hold, but I had not had time yet to write the second part of the article about it.

  14. Will there be another blog post on this? I am interested in what trading strategies were most successful when trading cryptocurrencies.

  15. Yes, there will be. In fact the strategy was finished long ago, but I have not yet had the time to blog about it due to a lot of other projects in the last months.

  16. Possibly in February. I had a large project this year and not much time for the blog, and there’s another article to be released before.

  17. Loved the article! Any updates on if a part 2 could be out any time soon? 🙂

  18. Hi, have a doubt about the following statement

    Y <- ifelse(Y > 0,1,0)

    in my dataset Y is equal to the close therefore always positive, should not be the percentage of variation?
    thanks for any clarification.

  19. If your Ys are always positive, simply use ifelse(Y > Threshold,1,0). Select Threshold so that 1 and 0 are equally distributed.
