Comments on: Build Better Strategies! Part 3: The Development Process

By: Jeremie

Jeremie — Mon, 11 Nov 2024 09:11:50 +0000

Hi jcl,

Thank you very much for your feedback. After thorough investigation, I’ve noticed that I’m simply doing too much data snooping: even if each strategy is limited to 2-3 parameters, the whole set of strategies tested extends to thousands.
Moreover, and that’s the big point: I’ve been using the train sets of my WFO to filter strategies. The problem is that, because of the rolling process, those train sets ultimately include the test sets, so that’s just overfitting to test data, hence the better results.

So basically, this breaks the whole process, which has effectively been wrong since step 3: choosing from WFO results must be done AFTER having a reliable step 3 with enough observations.
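The overlap described above can be made concrete with a minimal sketch. It assumes a simple rolling scheme with a fixed training length that advances by one test segment per cycle; the function names and window sizes are hypothetical and are not taken from Zorro. It counts how many bars are out-of-sample in one cycle but fall inside the training window of a later cycle.

```python
def rolling_wfo_windows(n_bars, train_len, test_len):
    """Return (train, test) index ranges for a rolling walk-forward scheme."""
    windows = []
    start = 0
    while start + train_len + test_len <= n_bars:
        train = range(start, start + train_len)
        test = range(start + train_len, start + train_len + test_len)
        windows.append((train, test))
        start += test_len  # roll forward by one test segment
    return windows


def leaked_test_bars(windows):
    """Bars that are test bars in one cycle and training bars in a later cycle."""
    leaked = set()
    for i, (_, test) in enumerate(windows):
        for later_train, _ in windows[i + 1:]:
            leaked |= set(test) & set(later_train)
    return leaked


# Hypothetical sizes: 1000 bars, 400-bar training window, 100-bar test window.
windows = rolling_wfo_windows(n_bars=1000, train_len=400, test_len=100)
all_test = set()
for _, test in windows:
    all_test |= set(test)
leaked = leaked_test_bars(windows)
print(len(windows), len(all_test), len(leaked))
```

Within a single cycle the training and test ranges never overlap, so the optimized parameters stay out-of-sample. The leak only appears when the training results of all cycles are pooled to decide which strategies to keep, since most test bars then show up in some later training window.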

By: jcl

jcl — Thu, 10 Oct 2024 14:47:25 +0000

It’s usually the other way around: test sets produce worse results than train sets. At least, I have not yet heard of an opposite result. Maybe you have discovered the one and only exception to the rule. Anyway, Zorro can eliminate any interference between WFA segments. You can find all details in the Zorro manual under ‘Walk Forward Optimization’.

By: Jeremie

Jeremie — Wed, 09 Oct 2024 17:05:41 +0000

Hi jcl,

Thanks for such a clear answer, I understand the logic better now. Actually, my issues come from the fact that while I’m developing the system, that is, data mining the “core” parameters, many situations lead to good test set results with the optimized parameters from the WFA, but completely bad results in the train sets. Is there a minimum correlation between the test and train set results that should be respected?

Also, I’m still wondering whether each WFA test segment should be evaluated separately in order to build an average result, or whether all segments should be linked and evaluated together. If all the segments are linked, open trades cause interference between segments, and if the segments are separated, it doesn’t represent real-life use.

As for the reality check, I know it’s the next step, but for now I already suspect some bias in my WFA methodology.

Thanks

By: jcl

jcl — Wed, 09 Oct 2024 13:44:04 +0000

The data period is nowhere defined in the displayed code. It does not matter, since in real development you would anyway use WFA and always test out-of-sample.

BUT this only applies to the optimized parameters. While developing the system, you will naturally select the algorithms in a way that they produce a positive result. So your system is never really out of sample, only the parameters are. That’s why you should always run a reality check when the development is finished. This is the topic of another article on this blog.
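The selection effect can be illustrated with a small simulation. This is only a sketch under stated assumptions: the "strategies" are pure Gaussian noise with zero edge, the function name and sizes are hypothetical, and it is not the reality check from the other article. It picks the strategy with the best mean return on the development data and then looks at the same strategy on fresh data that played no part in the selection.

```python
import random


def best_of_random_strategies(n_strategies, n_dev, n_fresh, seed=0):
    """Select the best of many zero-edge strategies on development data.

    Returns (mean return on development data, mean return on fresh data)
    for the selected strategy. All returns are pure noise by construction.
    """
    rng = random.Random(seed)
    best_dev, best_fresh = float("-inf"), 0.0
    for _ in range(n_strategies):
        dev = [rng.gauss(0.0, 1.0) for _ in range(n_dev)]
        fresh = [rng.gauss(0.0, 1.0) for _ in range(n_fresh)]
        dev_mean = sum(dev) / n_dev
        if dev_mean > best_dev:
            best_dev, best_fresh = dev_mean, sum(fresh) / n_fresh
    return best_dev, best_fresh


# Hypothetical sizes: 1000 candidate strategies, 100 development bars, 400 fresh bars.
dev_mean, fresh_mean = best_of_random_strategies(1000, 100, 400)
print(f"selected on development data: {dev_mean:.3f}, same strategy on fresh data: {fresh_mean:.3f}")
```

The selected strategy looks profitable on the data it was chosen on, although no strategy has any edge. That is the bias that remains in a system whose algorithms were picked by looking at results, even when the parameters themselves were tested out-of-sample.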

By: Jeremie

Jeremie — Tue, 08 Oct 2024 14:13:16 +0000

Hello,

Thanks for such a comprehensive article! However, one thing is still not clear to me: which data is used for steps 3 to 5 vs. step 6? As far as I understand, the data refers to … the same period for everything? For example, steps 3 to 5 use 2010 to 2020, and then step 6 uses 2010-2020 as well. Doesn’t this create a bias, because the data used up to step 5 is now split, but the core of the system has already been evaluated on all of it (in-sample plus out-of-sample)? Basically, the out-of-sample data has already been used in the process of validating the strategy during steps 3 to 5.

Thanks!

By: Tom

Tom — Mon, 26 Aug 2024 04:35:03 +0000

Thanks jcl.

By: jcl

jcl — Sun, 25 Aug 2024 07:28:32 +0000

Yes, the Phase variable changes all the time and rotates with the cycle frequency. And yes, we had a couple of systems that used the spectrum function for determining when a dominant cycle was building up.
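For readers unfamiliar with the idea, here is a rough sketch of finding a dominant period from peak spectral amplitude. It is not Zorro's spectrum function and makes no claim about how that function works: it is a plain discrete Fourier correlation over integer periods, with a hypothetical function name and a synthetic test signal.

```python
import cmath
import math


def dominant_period(signal, min_period, max_period):
    """Return (period, amplitude) of the strongest integer-period component."""
    n = len(signal)
    best_period, best_amp = None, -1.0
    for period in range(min_period, max_period + 1):
        # Correlate the signal with a complex sinusoid of this period.
        acc = sum(x * cmath.exp(-2j * math.pi * t / period)
                  for t, x in enumerate(signal))
        amp = abs(acc) / n
        if amp > best_amp:
            best_period, best_amp = period, amp
    return best_period, best_amp


# Synthetic example: a 20-bar cycle plus a weaker 7-bar cycle.
signal = [math.sin(2 * math.pi * t / 20) + 0.3 * math.sin(2 * math.pi * t / 7)
          for t in range(200)]
period, amp = dominant_period(signal, 5, 50)
print(period)
```

On real price data the series would normally be detrended first, and the peak would be compared against the neighboring periods before treating it as a cycle that is building up.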

By: Tom

Tom — Sat, 24 Aug 2024 19:15:44 +0000

Thanks. I got it. Typically, the phase or phase shift of a standard sinusoid function is a fixed value. Here, Zorro uses it in a slightly different sense, i.e., it changes with time. This seems to indicate that the Phase variable used by Zorro incorporates the frequency component in its calculations.

I’m curious whether you have tried to use the Spectrum() function to find the dominant period by examining peak amplitudes of the spectral components. Thanks.

By: jcl

jcl — Wed, 21 Aug 2024 08:23:17 +0000

The Phase variable, as the name says, is the current phase angle. It tells whether the dominant cycle is currently going up or down. A sin() has no ‘frequency term’. It has an angle term. https://en.wikipedia.org/wiki/Sine_and_cosine
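A small sketch may help with the terminology. In the textbook form sin(ωt + φ), the whole argument ωt + φ is the instantaneous phase angle, so a phase that advances from bar to bar already carries the frequency. The example assumes an idealized cycle with a constant period in bars; the function name is hypothetical, and this is not how Zorro computes its Phase variable.

```python
import math


def phase_series(n_bars, period):
    """Phase angle per bar for an idealized cycle of constant period (in bars)."""
    return [(2 * math.pi * t / period) % (2 * math.pi) for t in range(n_bars)]


phases = phase_series(n_bars=40, period=20)
signal = [math.sin(p) for p in phases]

# The angular frequency is the amount the phase advances per bar.
omega = phases[1] - phases[0]
print(omega, 2 * math.pi / 20)
```

Taking sin() of the current phase angle therefore reproduces the cycle without any separate frequency term, which matches the `series(sin(Phase))` line quoted in this thread.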

By: Tom

Tom — Wed, 21 Aug 2024 00:26:35 +0000

Sorry, I still don’t see where the angular frequency term for the sin() goes. Could you please explain it in more detail? My understanding is that the code:

vars Signal = series(sin(Phase));

has the correct syntax based on Zorro’s Help page. But then this implies that the ‘Phase’ argument above also contains information about the angular frequency. Thank you.