Optimization
Given a portfolio \( \vec{p} = ( p_1, p_2, \ldots )^\intercal \) and a return \( \vec{r} = ( r_1, r_2, \ldots )^\intercal \), we want to optimize growth rate (expected log return), \( g = \mathbb{E}\left[\log\left(p^\intercal r\right)\right] \).
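As a minimal sketch, the growth rate can be estimated by averaging over sampled return vectors. The snippet assumes each \( \vec{r} \) holds gross returns (growth factors, so 1.05 means +5%), which keeps \( p^\intercal r \) positive. The name `growth_rate` and its arguments are illustrative.

```python
import math


def growth_rate(weights, return_samples):
    """Monte Carlo estimate of g = E[log(p^T r)].

    weights: portfolio weights p, one per asset.
    return_samples: list of gross-return vectors r (assumption: 1.05 means +5%).
    """
    total = 0.0
    for returns in return_samples:
        wealth = sum(p * r for p, r in zip(weights, returns))
        if wealth <= 0.0:
            # A single ruinous sample makes the expected log return -infinity.
            return float("-inf")
        total += math.log(wealth)
    return total / len(return_samples)
```

Returning negative infinity on a non-positive outcome is one possible convention. It makes explicit that a portfolio that can be wiped out has no finite growth rate.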
Generating returns
Ignoring correlations and volatility clustering
We don’t know the exact expected return and volatility of each asset, so to be more robust we treat them as random variables. We model the expected return with a normal distribution. Because volatility cannot be negative, we model it with a lognormal distribution.
An important next step is to convert the geometric return of an asset to its instant return by subtracting half the variance from the expected return (cf. the expectation of a lognormal distribution).
Then we can generate our return from a normal distribution with the instant return as its mean and the volatility as its standard deviation. Empirically, however, it is more accurate to represent returns with a Laplace distribution. This makes sense because it is the number of shocks that increases with volatility: a normal distribution with exponentially distributed variance is Laplace. Note that a Laplace distribution with scale \( b \) has standard deviation \( b \sqrt{2} \), so to match a volatility \( \sigma \) we set the scale to \( \sigma / \sqrt{2} \).
In conclusion: \[ \begin{align} \mu_\text{g} &= \text{Normal}(\ldots) \\ \sigma &= \text{LogNormal}(\ldots) \\ \mu_\text{i} &= \mu_\text{g} - \sigma^2 / 2 \\ r &= \text{Laplace}\left(\mu_\text{i}, \sigma / \sqrt{2}\right) \end{align} \]
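The four steps above can be sketched with the Python standard library. The parameter names (`mu_mean`, `mu_sd`, `sigma_median`, `sigma_log_sd`) are hypothetical stand-ins for the distribution parameters left open above, and the sampled value is assumed to be a log return. The Laplace draw uses the fact that the difference of two unit exponentials is a standard Laplace variable.

```python
import math
import random


def sample_log_return(rng, mu_mean, mu_sd, sigma_median, sigma_log_sd):
    """Draw one return following the model above (parameter names are illustrative)."""
    mu_g = rng.gauss(mu_mean, mu_sd)  # uncertain expected return
    sigma = rng.lognormvariate(math.log(sigma_median), sigma_log_sd)  # uncertain volatility
    mu_i = mu_g - sigma ** 2 / 2  # instant return
    scale = sigma / math.sqrt(2)  # Laplace scale whose standard deviation is sigma
    # Exp(1) - Exp(1) is Laplace(0, 1); shift and scale it.
    return mu_i + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


rng = random.Random(0)
# Placeholder values only, not calibrated to any asset.
sample = sample_log_return(rng, mu_mean=0.07, mu_sd=0.02, sigma_median=0.2, sigma_log_sd=0.3)
```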
Optimization algorithm
Any standard optimization algorithm, such as SGD, SignSGD, or Adam, with or without momentum, works.
Note that we need a special asset that balances the portfolio so that the weights always sum to one. It is not a trainable parameter. By convention we choose cash to fill this role.
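A minimal sketch of plain stochastic gradient ascent with cash as the balancing asset follows. It assumes the samples are gross returns of the non-cash assets and that cash has a fixed gross return `cash_return`. The function name, the default hyperparameters, and the wealth floor are illustrative choices, not part of the method described above.

```python
import random


def optimize_weights(gross_returns, cash_return=1.0, learning_rate=0.05,
                     steps=500, batch_size=64, seed=0):
    """Maximize the mean log wealth by stochastic gradient ascent.

    gross_returns: list of sampled gross-return vectors for the non-cash assets.
    Returns (weights, cash_weight); the cash weight is derived, never trained.
    """
    rng = random.Random(seed)
    n_assets = len(gross_returns[0])
    weights = [0.0] * n_assets  # start fully in cash
    batch_size = min(batch_size, len(gross_returns))
    for _ in range(steps):
        batch = rng.sample(gross_returns, batch_size)
        grad = [0.0] * n_assets
        for row in batch:
            excess = [r - cash_return for r in row]
            # Cash holds 1 - sum(weights), so wealth = cash_return + weights . excess.
            wealth = cash_return + sum(w * e for w, e in zip(weights, excess))
            wealth = max(wealth, 1e-6)  # crude guard (assumption) so the log stays defined
            for i, e in enumerate(excess):
                grad[i] += e / (wealth * batch_size)  # d log(wealth) / d weight_i
        weights = [w + learning_rate * g for w, g in zip(weights, grad)]
    return weights, 1.0 - sum(weights)
```

Because the cash weight is computed as one minus the sum of the other weights, the sum-to-one condition holds by construction and needs no explicit constraint. For a toy asset that returns 1.5 or 0.6 with equal probability, the stationary point of this objective is a weight of 0.25.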
Constraints
One thing we notice when running the optimization is that the resulting portfolios can be highly leveraged. In practice, however, it may not be possible to run a portfolio with that much leverage.
TODO: implement and write more about this: https://math.stackexchange.com/questions/318740/orthogonal-projection-onto-a-half-space
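As a possible starting point, here is a sketch of the orthogonal projection onto a half-space \( \{ x : a^\intercal x \le b \} \), which could be applied to the weights after each optimizer step. It assumes the leverage cap can be written as a single linear constraint, for example a bound on the total non-cash weight. The function and argument names are hypothetical.

```python
def project_onto_half_space(point, normal, offset):
    """Project `point` onto {x : normal . x <= offset}; `normal` must be nonzero."""
    violation = sum(n * x for n, x in zip(normal, point)) - offset
    if violation <= 0.0:
        return list(point)  # already feasible, nothing to do
    norm_sq = sum(n * n for n in normal)
    # Move back along the normal just far enough to reach the boundary.
    return [x - violation * n / norm_sq for x, n in zip(point, normal)]


# Example: cap the total non-cash weight at 2 (an assumed form of the constraint).
capped = project_onto_half_space([2.0, 2.0], [1.0, 1.0], 2.0)
```

A cap on gross exposure that counts short positions by absolute value is not a single half-space, so it would need a different projection.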
Importance sampling
When we draw samples for the optimization, tail events show up only rarely. Yet they are important to model accurately, especially when portfolio weights get larger (more leveraged). To get stable results without excessive sampling, we can use importance sampling.
TODO: actually implement this and write more.
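One way this could look, as a sketch only: draw from a Laplace proposal with a wider scale so that tail events occur more often, then reweight each sample by the ratio of the target density to the proposal density. The `widen` factor and all function names are assumptions made for illustration.

```python
import math
import random


def laplace_pdf(x, loc, scale):
    """Density of the Laplace distribution."""
    return math.exp(-abs(x - loc) / scale) / (2.0 * scale)


def sample_laplace(rng, loc, scale):
    """Laplace draw as the scaled difference of two unit exponentials."""
    return loc + scale * (rng.expovariate(1.0) - rng.expovariate(1.0))


def importance_sampled_mean(f, rng, loc, scale, widen=2.0, n=10000):
    """Estimate E[f(X)] for X ~ Laplace(loc, scale) using a wider proposal."""
    proposal_scale = widen * scale
    total = 0.0
    for _ in range(n):
        x = sample_laplace(rng, loc, proposal_scale)
        # Likelihood ratio corrects for having oversampled the tails.
        weight = laplace_pdf(x, loc, scale) / laplace_pdf(x, loc, proposal_scale)
        total += weight * f(x)
    return total / n
```

In the optimization, `f` would be the log wealth of the portfolio for a given return, so that the same weights enter both the objective and its gradient.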