Accurate and Robust Numerical Methods for the Dynamic Portfolio Management Problem

This paper enhances a well-known dynamic portfolio management algorithm, the BGSS algorithm, proposed by Brandt et al. (Review of Financial Studies, 18(3):831–873, 2005). We equip this algorithm with the components from a recently developed method, the Stochastic Grid Bundling Method (SGBM), for calculating conditional expectations. When solving the first-order conditions for a portfolio optimum, we implement a Taylor series expansion based on a nonlinear decomposition to approximate the utility functions. In the numerical tests, we show that our algorithm is accurate and robust in approximating the optimal investment strategies, which are generated by a new benchmark approach based on the COS method (Fang and Oosterlee, in SIAM Journal of Scientific Computing, 31(2):826–848, 2008).


Introduction
Solving the dynamic portfolio management problem has become an interesting topic ever since empirical findings in financial research suggested that asset returns were predictable. When the distributions of the asset returns are time-invariant, Merton (1969) and Samuelson (1969) have shown that an investor using a power utility function, who re-balances his portfolio optimally, should choose the same asset allocation at every time point, regardless of the investment horizon.
However, if the distributions of the asset returns are time-dependent, for example, when the asset returns follow a vector auto-regression (VAR) model, the optimal asset allocations at intermediate time points are usually not identical. In this case, an investor, who aims to find an optimal asset allocation at initial time, has to first consider all possible asset allocations in subsequent time points, which in turn are also influenced by the initial investment decision.
is briefly described and the alternative Taylor expansion is also discussed. The benchmark algorithm based on the COS method is presented in Section 4. Following that, we display results of the numerical tests in Section 5. A brief discussion of the errors of the simulationbased methods is performed in Section 5.6. We conclude in the last section.

Problem Formulation: The Investor's Problem
This section defines the dynamic portfolio optimization problem, or, in other words, "the investor's problem". We assume that the financial market is defined on a complete filtered probability space (Ω, F, {F t } 0≤t≤T , P) with finite time horizon [0, T ]. The state space Ω is the set of all realizations of the financial market within the time horizon [0, T ], F is the sigma algebra of events at time T , i.e. F = F T . We assume that the filtration {F t } 0≤t≤T is generated by the price processes of the financial market and augmented with the null sets of F. The probability measure P is defined on F.
We consider a portfolio consisting of one risk-free asset and d risky assets, which can be traded at discrete time points, t ∈ [0, 1, . . . , T − 1], before terminal time T . At each trading time t, an investor decides his trading strategy to maximize the expected value of the utility of his terminal wealth W T . Formally, the investor's problem is given by subject to the constraints: Here x s denotes the asset allocation of the investor's wealth in risky assets. Vector transposition is denoted by the prime sign. R f is the return of the risk-free asset, which is assumed to be constant for simplicity, and R e s+1 = [R e,1 s+1 , . . . , R e,d s+1 ] are the excess returns of the risky assets at time s+1. The function U (W T ) denotes the utility of the investor's terminal wealth. V t (W t , Z t ) is termed the value function, which measures the investor's investment opportunities at time t with wealth W t and market state Z t . We assume that {Z t } T t=0 is an F t -adapted Markov process.

Numerical Approaches to the Investor's Problem
From Equation (1), we see that at time t it is impossible for the investor to determine the optimal asset allocation x t without knowing optimal asset allocations {x s } T −1 s=t+1 at future time points. A multivariate optimization problem with respect to all asset allocations {x s } T −1 s=t may be considered, but due to the complexity of the dynamics of Z t it is usually not feasible to solve this problem.
A special case, discussed in Barberis (2000), Brandt et al. (2005) and van Binsbergen and Brandt (2007b), is when the investor has constant relative risk aversion (CRRA) 1 , his 1 The CRRA utility function U (W T ) reads: optimal asset allocation x t is independent of his wealth W t . With this utility function, the optimization problem with respect to the original value function, V t (W t , Z t ), which depends on two variables W t and Z t , reduces to an optimization problem with respect to a simplified value Value function v t (Z t ) can be written as a recursive procedure: Equation (4) is based on the Bellman principle of optimality and dynamic programming (Bellman, 1957), which forms the basis for any recursive solution of the dynamic portfolio problem. The principle can be applied since the state vector is assumed to follow a Markov process and, therefore, the optimal asset allocation x t only depends upon time and the current state Z t . Using the simplified value function and the power utility function with parameter γ, we can solve the investor's problem, in a backward recursion process 2 , as follows: • At time T , we determine the value function as: v T (Z T ) = 1 1 − γ , γ = 1; • At time T − 1, the investor considers the optimization problem: We denote the optimal asset allocation byx T −1 , so that: Recursively, moving backward in time, the following steps are subsequently performed at times t, t = T − 2, T − 3, . . . , 1, 0. and where Equation (2) is termed the power utility function and Equation (3) the log utility function. This utility function is homothetic in wealth, which means that with identical market state Z t , two investors, one with wealth W t and the other with wealth 1, will have the same optimal investment strategy at subsequent time points.
2 Choosing the CRRA utility function is essential for this process to be valid.
• When the investor's optimal asset allocations, {x s } T −1 s=t+1 , are determined, we can calculate the value function v t+1 (Z t+1 ) as: where the last equality is valid by using the definition of v t+1 (Z t+1 ). Value function v t (Z t ) can also be written as: where the last equality follows from the law of iterated expectations.
Either Equation (5) or Equation (6) can be employed to evolve the information in the backward recursion. They respectively correspond to the "value function iteration" and the "portfolio weight iteration", to be discussed in the following subsection. In either case, the optimization problem with respect to x t can be solved via numerical techniques. As mentioned before, there are basically two numerical approaches available for dealing with this problem, one is by grid-searching and the other is by solving the first-order conditions. These techniques are discussed in subsequent sections.

Portfolio Weight Iteration or Value Function Iteration
In the backward recursion process, after either the optimal asset allocations {x s } T −1 s=t+1 or x t+1 and v t+1 (Z t+1 ) have been determined, we need to evolve the information from time step t + 1 to time step t to proceed the recursive computation. We can consider either Equation (5) or Equation (6) for this purpose. The former is termed "portfolio weight iteration" and the latter "value function iteration". In van Binsbergen and Brandt (2007b) the authors show that more stable results can be obtained by the portfolio weight iteration. They explain their results as follows. In the value function iteration, the value function is a conditional expectation approximated by cross-path regression and approximation errors may accumulate in the backward recursion process. In the portfolio weight iteration, since the portfolio weights are bounded by borrowing and short-sale constraints, the approximation error remains bounded throughout the whole valuation process.
However, if the value function at each intermediate time step can be approximated accurately, the value function iteration should yield similar results as the portfolio weight iteration.
In the numerical tests to follow, we will see that our enhanced numerical methods perform highly satisfactory and in most cases, using the value function iteration produces comparable results as the portfolio weight iteration.

Solving First-order Conditions
When the value function v t+1 (Z t+1 ) is known, we consider the optimization problem displayed in Equation (5).
One approach to obtain the optimal asset allocation x t in Equation (5) is to solve the first-order conditions for an optimum, i.e.

E[
Since Equation (7) is not directly solvable with respect to x t , in Brandt et al. (2005) the authors proposed an approach to first approximate the value function v t (Z t ) via a Taylor series expansion and then solve the first-order conditions corresponding to the approximated function. Second-order Taylor expansion of the value function is written as 3 : The corresponding first-order conditions read: and the optimal asset allocationx t , which is assumed to be Z t -measurable, is given by: Here the conditional expectations can be approximated via simulation and cross-path regression, as done in Brandt et al. (2005), Longstaff and Schwartz (2001) and Tsitsiklis and Van Roy (2001). It is mentioned in Brandt et al. (2005) that solving first-order conditions is quite sensitive to the order of the Taylor expansion of the value function and the results from second-order and fourth-order expansions can be different. If we consider the fourth-order Taylor expansion of the value the optimal asset allocationx t is defined as an implicit solution of the following equation: This equation can be treated as a fixed point problem, x = h(x) with h(·) denoting the righthand side in Equation (9). This can be solved by an iterative method. To start the iteration, we need an initial guess of the optimal asset allocation. Following the discussion in Brandt et al. (2005), we can take the solution from the second-order Taylor expansion of the value function as the initial guess x 0 t . The iteration can be conducted by Newton's method for h(x) − x = 0: We stop the iteration, if either the 2-norm of the distance between two consecutive approximations x l t and x l+1 t is smaller than a tolerance value TOL or the number of iterations reaches a predetermined value l max . We take the last iteration x l+1 t as the final solution of Equation (9). In the numerical tests, we choose TOL = 0.0001 and l max = 30. Always the tolerance TOL can be reached, unless stated otherwise.

Stochastic Grid Bundling Method
The Stochastic Grid Bundling Method (SGBM), introduced in Jain and Oosterlee (2015), is a powerful regression-based method for calculating conditional expectations in Equations (8) and (9).
It is shown in Jain and Oosterlee (2015) that applying SGBM is highly efficient for obtaining the early-exercise boundary when pricing American-style options and the estimated path-wise option value is so accurate that the Greeks can be generated directly. In Cong and Oosterlee (2016), SGBM is implemented for solving the dynamic mean-variance portfolio management problem in a robust and efficient way. In this paper, we implement SGBM for the dynamic utility-based portfolio management problem. Similar as Brandt et al. (2005), we take the second-order Taylor expansion in the description of the algorithm for expositional ease. However, in our numerical experiments, we always employ the fourth-order Taylor expansion. Extension to fourth-order expansion can be achieved with the formulas in Equation (9). Our algorithm can be formally described as follows: Step I: Simulation.
. . , T , and set the value function at terminal time The following steps are subsequently performed at times t, t ≤ T − 1.
Step II: Bundling. We bundle the paths at time t into B non-overlapping partitions, B t (1), . . . , B t (B). Let each bundle cover a similar number of paths.
at time t + 1, or, equivalently, the optimal asset allocations read {x s (i)} are known. For these paths, we determine bundle-wise re- , which are constructed using the information at time t + 1. For any path whose state Z t is covered by bundle , the denominator of the right-hand side part in Equation (8), can be approximated by: Similarly, , the numerator of the right-hand side part in Equation (8), can be approximated by: where the regression parameters . For any path whose state Z t is covered by bundle B t (b), the optimal asset allocation is approximated by: The regression step is repeated for all bundles at each time step, so for each path we find the corresponding optimal asset allocation.
Step IV: Transition. For the i-th path in bundle B t (b), we can either apply portfolio weight iteration or value function iteration to transfer the information of the optimal investment strategy from time t to time t − 1. When using the portfolio weight iteration, we just store the optimal asset allocations If we use the value function iteration, the process is slightly more involved. For all paths

Taylor Expansion Based on a Nonlinear Decomposition
In both, the BGSS and SGBM 6 algorithms, an essential step before solving the equations for the first-order conditions is to rewrite the value function, v t (Z t ), in a Taylor series expansion in which the asset allocation x t is separated from the conditional expectations of R e t+1 . A Kth-order Taylor expansion in SGBM can be written as 7 : where g (k) Since the excess return R e t is a nonlinear transformation of the log excess return r e t , i.e.
an alternative way to perform a Taylor expansion for (x t R e t+1 + R f ) 1−γ is given by: where function g 2 (z) is defined by: Functions g 1 (x t R e t+1 ) and g 2 (r e t ) are both identical to (x t R e t+1 + R f ) 1−γ , but different ways of choosing the underlying variable yield different Taylor expansion formulas.
In Garlappi and Skoulakis (2011), the authors term the expansion described in Equation (10) as "Taylor expansion based on a linear decomposition" and the expansion described in Equation (11) as "Taylor expansion based on a nonlinear decomposition". They show that when the centers of Taylor expansions are carefully chosen, the "Taylor expansion based on a nonlinear decomposition" is more accurate than the "Taylor expansion based on a linear decomposition" when approximating the function (x t R e t+1 +R f ) 1−γ . We will call these expansions "original Taylor" and "log Taylor" expansions, respectively, in the rest of this paper. Although the log Taylor expansion has been implemented in Garlappi and Skoulakis (2009) for dynamic portfolio management, their choice of expansion center is not compatible with the algorithm discussed here. We deal with this problem by performing a log Taylor expansion around center 0, as displayed in Equation (11) 8 . According to the numerical tests in Section 5, we find that the log Taylor expansion is indeed a superior choice even when the expansion center is chosen to be 0. The reasoning is that the log excess return r e t usually exhibits a distribution similar to 6 With a little abuse of the term, we also denote our dynamic portfolio management method by "SGBM". 7 For simplicity, we describe the dynamic portfolio management problem with one risky asset here. 8 We especially choose the Taylor expansion center to be 0, because only in this case is there no term related to x t inside the power transformation, for example: the normal distribution. Therefore, a Taylor expansion with respect to this variable, i.e. the so-called "log Taylor" expansion, can yield accurate results with a limited number of expansion terms. The distribution of the excess return R e t usually exhibits a fat tailed distribution, which requires more terms in the original Taylor expansion to approximate its moments.

Grid-Searching Methods
An alternative technique to solving first-order conditions is based on grid-searching, which is an intuitive idea for solving the optimization problem described in Equation (5). In gridsearching, we reduce the optimization problem on the continuous domain to a problem on a discrete domain. For example, if we consider the allocation, x t , of one risky asset, the original optimization problem is solved on a domain [0, 1]. By grid-searching, we construct We determine the maximum, v max and denote its corresponding asset allocation as "the optimal asset allocation".
Although it is mentioned in van Binsbergen and Brandt (2007a,b) that the grid-searching method is robust and avoids a number of numerical issues regarding convergence that occur when solving first-order conditions, it should be noted that grid-searching is an expensive numerical approach. The workload of grid-searching grows exponentially as the dimensionality of the problem increases. Moreover, according to our numerical tests in the low-dimensional cases, the vBB algorithm, which employs grid-searching together with simulation, yields more "uncertain" results (larger variance) compared to the other simulation-based algorithms.
However, if we wish to find an accurate reference solution to the dynamic portfolio management problem, grid-searching seems our only choice since solving first-order conditions essentially relies on Taylor approximations of the utility function, whereas grid-searching does not. In the next subsection we will present our benchmark approach based on the idea of grid-searching.

COS Portfolio Management Method
In this section, we present a benchmark method, based on the Fourier cosine series expansion (COS) method to calculate the conditional expectations. This method was introduced in Fang and Oosterlee (2008) for pricing one-dimensional European options and later in Fang and Oosterlee (2009) for pricing one-dimensional Bermudan and barrier options. In Ruijter and Oosterlee (2012), this method was extended to the two-dimensional case. Because the COS method is not based on simulation, it can yield benchmark solutions to the investor's problem, especially in the basic case with one risky asset and one risk-free asset. Following the previous discussions, this basic investor's problem with power utility function is given by: where the terminal condition reads: v T (Z T ) = 1 1 − γ , γ = 1.
If we denote the conditional transition density function from state Z t−1 to (R e t , Z t ) as f (R e t , Z t |Z t−1 ), the investor's problem reads: The COS algorithm for calculating conditional expectations can be described in five steps: Step I: Truncate the integration range in Equation (14).
If we assume that the integrand is integrable, we can truncate the integration range from  (2012) is is the mean of Z t and ξ Z 2 the standard deviation of Z t . L should be large enough to make the truncation error acceptably low.
Step II: Expand the integrand in Fourier cosines.

If we denote the Fourier cosine expansion of
and similarly define the utility coefficients as: The primed sum means that the first term of the summation has half weight.
Step III: Truncate the infinite series. We truncate the infinite series, as follows: Step IV: Calculate the coefficients A k 1 ,k 2 (Z t−1 ). The coefficients A k 1 ,k 2 (Z t−1 ) can be approximated by F k 1 ,k 2 (Z t−1 ), as follows: Using the following property of cosines: 2 cos(α) cos(β) = cos(α + β) + cos(α − β), we can calculate F k 1 ,k 2 (Z t−1 ) by: (·) means taking the real part of the input data. ψ(u r , u Z |Z t−1 ) is the bivariate conditional characteristic function of (R e t , Z t ) given state Z t−1 : For many asset dynamics models this bivariate characteristic function is known in closed form.
Step V: Calculate the coefficients V k 1 ,k 2 (t, x t−1 ). The coefficients V k 1 ,k 2 (t, x t−1 ) are not directly related to any closed-form expression. However, we can apply numerical integration and the discrete cosine transform (DCT) to approximate V k 1 ,k 2 (t, x t−1 ). To do this, we take Q ≥ max[N 1 , N 2 ] grid points in each spatial dimension and define: R n 1 t := a R + (n 1 + 1 2 )∆R t n 1 = 1, . . . , Q Z n 2 t := a Z + (n 2 + 1 2 )∆Z t n 2 = 1, . . . , Q The midpoint-rule integration gives us The equation above can be calculated efficiently via a two-dimensional DCT, for example, with the function dct2 of MATLAB. Moreover, we can rewrite the sum of multiplications into a multiplication of sums, that is: Then, the two-dimensional DCT can be replaced by two separate one-dimensional DCTs, which helps reducing the computational time.
For state Z t−1 and asset allocation x t−1 , we can calculate the conditional expectation shown in Equation (13) by the COS method. To solve the optimization problem with respect to x t−1 , we employ grid-searching: we evaluate discretized values of x t−1 ∈ { m M |m = 0, . . . , M } and find the largest conditional expectation. The backward recursion process can be performed from time T − 1 to the initial time.
Within the COS method, we have five parameters to adjust the truncation and discretization errors. These are N 1 , N 2 , L, Q and M . Generally, larger values of these parameters lead to more accurate approximations but also to higher computational load. We use the following default parameter setting: According to our experiments, the COS method provides highly accurate results under this setting. However, when the admissible asset allocation can be chosen from a very wide range of values, the COS approach, which is based on discrete grid search, may lose its accuracy. In that case, the SGBM method equipped with the log Taylor expansion and a large number of paths will still generate satisfactory solutions and appears favorable.
Remark 4.2. The COS method suffers from the curse of dimensionality. However, this is a problem for any method involving discretization of the state space and grid-searching. To settle this issue in high-dimensional cases, adaptive discretization, or sparse grids, and grid-searching can be applied.
Remark 4.3. The computational load of the COS method for a dynamic portfolio management problem is mainly related to the DCT computations, for which the computational complexity at each time step is O(N 2 · M · Q · log(Q)). Computations at each time step are performed sequentially, but the computations for the value function at each state point are independent, so it should be possible to accelerate the COS method by parallel processing.

Numerical Experiments
In this section, we test the performance of five methods for generating the optimal dynamic portfolio management strategy. These are: • "BGSS": the method introduced in Brandt et al. (2005); • "vBB": the method introduced in van Binsbergen and Brandt (2007b); • "SGBM": SGBM with the original Taylor expansion; • "SGBM-LT": SGBM with the log Taylor expansion; • "COS": the COS method.
We impose borrowing and short-sale constraints on the asset allocations, that are therefore restricted between 0 and 1. When we implement the simulation-based algorithms, we always generate 2 14 paths. For "SGBM" and "SGBM-LT", which require bundling, we employ 32 bundles at each time step. We approximate the utility function by Taylor expansions, up to 4th-order for both the log Taylor expansion and the original Taylor expansion. For "BGSS" and "vBB", we use polynomials of the state variable up to second-order as the basis functions for the cross-path regression. For "SGBM" and "SGBM-LT", the polynomials are also second-order but in of the state variable and the return variable.
To measure the performance of a dynamic portfolio management strategy, we consider the statistic, "annualized certainty equivalent rate", CER. It describes the annualized return rate of a risk-free asset which at terminal time Y (years) yields the same utility of wealth obtained from the dynamic portfolio management strategy. Equivalently, the CER is the risk-free rate that an investor is willing to accept rather than adopting a particular risky portfolio management strategy. Formally the CER is defined by: where the value function V 0 (W 0 , Z 0 ) is defined by Equation (1). Generally, a portfolio management strategy with high CER is close to the optimal strategy and can thus be regarded as an accurate solution to the dynamic portfolio management problem. We perform numerical tests here for a basic test case where the portfolio contains one risky asset and one risk-free asset. We consider the vector auto-regression (VAR) model to describe the dynamics of the log excess return r e t of the risky asset and its log dividend yield d t , that are chosen as the state variables. Quarterly data are generated with the following process, as in Brandt et al. (2005), van Binsbergen and Brandt (2007b) and Garlappi and Skoulakis (2009) In most of the tests, the initial state, d 0 , is chosen as the unconditional mean, i.e., d 0 = −0.155/(1 − 0.958) = −3.6905. Only in Section 5.4 we will consider three quantiles, the 25%, 50% and 75% quantiles, of the unconditional distribution of state variable respectively as the initial state. The gross return of the risk-free asset is chosen as R f = 1.06 0.25 and the excess return R e t of the risky asset is R e t = R f (exp(r e t ) − 1). Associated to the 1D-VAR model, the characteristic function, which is essential for the COS portfolio management method, can be formulated as 9 :

Quality of the COS portfolio management method
We first check the validity and quality of the COS portfolio management method. For the dynamic portfolio management problem with the 1D-VAR model, we calculate the optimal asset allocations and the corresponding annualized certainty equivalent rates and compare them with the reference values from Garlappi and Skoulakis (2009). As we can see in Table 1, in case of different investment horizons and risk aversions, the COS method always provides accurate approximations of the annualized certainty equivalent rates and also highly satisfactory approximations of the optimal initial asset allocations.
As the COS method with the parameter settings in (15) and the reference method involve some approximation errors, it is difficult to say whose optimal initial asset allocation is superior. However, since it is known that first-order deviations in the portfolio policy have only secondorder welfare effects (Cochrane, 1989) and the COS method and the reference method yield similar annualized certainty equivalent rates, we consider these as the optimal solutions when comparing with simulation-based methods.
Remark 5.1. We have also tested the performance of the COS portfolio management method with different initial states d 0 . For any initial state tested, it generates very similar results as the reference values in Garlappi and Skoulakis (2009). Table 1: Initial optimal asset allocations and the corresponding annualized certainty equivalent rates of the COS portfolio management method, based on the 1D-VAR model, with reference values from Garlappi and Skoulakis (2009

Portfolio management with the buy-and-hold strategy
In this section, instead of the dynamic portfolio management problem, in which an investor decides his optimal asset allocations at intermediate times t = 0, 1, . . . , T − 1, we consider a case where the investor decides his optimal asset allocation at time t = 0 and holds a fixed amount of assets until terminal time t = T . The corresponding value function reads v 0 (Z 0 ) = max 0→T . This type of problem can be viewed as a static portfolio management problem, for which the aforementioned four simulation-based methods ("SGBM-LT", "SGBM", "BGSS" and "vBB") can be applied. The COS method is utilized to generate benchmarks for the optimal asset allocations and the corresponding annualized certainty equivalent rates. Figure 1 shows that "vBB" provides identical results to the optimal ones, since it does not involve Taylor expansion errors. For the other three candidates, in which Taylor expansions are involved, "SGBM-LT" provides the best approximation of the initial asset allocations. When the investment horizon is long, although the asset allocations of "SGBM-LT" are not close to the optimal solutions, their corresponding certainty equivalent rates are similar to the optimal ones. For the other two methods, "SGBM" and "BGSS", the estimates of asset allocations and certainty equivalent rates are acceptable only when the investment horizon is shorter than 10 quarters.
This test indicates that the log Taylor expansion (2 14 paths, 32 bundles) outperforms the original Taylor expansion for approximating the utility functions. The advantage of using the log Taylor expansion is obvious when the distribution of the accumulated excess return, R e 0→T , exhibits a fat tail.

Dynamic portfolio management with different investment horizons and risk aversion parameters
Following the discussion in van Binsbergen and Brandt (2007b), we consider for the dynamic optimization problem the portfolio weight iteration in the transfer step and compare the four simulation-based methods.
In Table 2, we observe that "SGBM-LT", among the four methods, always provides the highest certainty equivalent rates, which implies that the portfolio management strategy generated by "SGBM-LT" is most similar to the optimal one. However, when the investment horizon is long and risk aversion is high, even the results of "SGBM-LT" are not highly satisfactory. In that case, we prefer to solve the dynamic portfolio management problem by the COS portfolio management method. Regarding the simulation-based methods, "SGBM" and "SGBM-LT" are superior to "BGSS" and "vBB", since their corresponding CERs have larger means and smaller standard errors. Different from the findings in van Binsbergen and Brandt (2007b) that value function iteration also results in low certainty equivalent rates here. Table 3 shows that when using "SGBM" or "SGBM-LT", we can also get satisfactory results by the value function iteration in most test cases. Portfolio weight iteration is significantly better than value function iteration when the risk aversion is large and the investment horizon is long.  Figure 2 shows that, for any initial state, "SGBM-LT" performs better than the other three simulation-based algorithms. The intermediate asset allocations generated by "SGBM-LT" are most similar to the optimal ones. At the initial recursion steps, "vBB" also generates similar asset allocations. However, as the backward recursion progresses, the uncertainty in the "vBB" estimates grows and hence the accuracy of "vBB" gets worse.
In any case, "SGBM" and "SGBM-LT" yield estimates with low uncertainties. Moreover, we see that "SGBM-VFI" and "SGBM-LT-VFI", in which the value function iteration is considered in the recursion step, respectively, generate very similar results to those of "SGBM" and "SGBM-LT". These are advantages of the new method to calculate conditional expectations.

Influence of varying model uncertainty
If we consider higher model uncertainty in the 1D-VAR model, the aforementioned methods perform differently. The model uncertainty can be modified by introducing a multiplier M 2 to the original covariance matrix Σ of the white noise vector [ r t+1 , d t+1 ] , so that the covariance matrix of the error term will be: In this test, with a fixed risk aversion parameter γ = 10, we change the multiplier M and the investment horizon and report the certainty equivalent rates corresponding to the different algorithms.   are obtained with sample size 2 14 . In any case, "COS" yields the reference results, which are verified by using "SGBM-LT" with a large sample size 2 18 . In that case, we find, for example, the certainty equivalent rate of "SGBM-LT" has mean value 7.71 and standard error 0.02 when T = 20 and M = 4.

Errors of the Four Simulation-based Methods
In this subsection, we would like to briefly summarize the errors encountered within the methods analyzed. If we do not consider errors in the simulation part, the errors of the four simulationbased methods, "vBB", "BGSS", "SGBM" and "SGBM-LT", can be subdivided into three categories: • approximation error, which occurs when we approximate the true value functions by the Taylor series expansion.
• projection error, which occurs when we use low-order polynomials to approximate the conditional expectations of the value functions or of the approximated value functions.
• regression bias, which occurs when we use cross-path regression to approximate the conditional expectations.
The approximation error does not occur when Taylor series expansions are not involved, for example, in "vBB". However, as we have seen in the numerical tests, "BGSS" and "SGBM" suffer from this source of error in a similar fashion, while "SGBM-LT" appears to suffer less.
The projection error is the main source of error in "vBB", where low-order polynomials are implemented to approximate the value functions, which may be high-order functions when the risk aversion is high, see Equation (12). For "BGSS", "SGBM" and "SGBM-LT", this is generally not a problem since the object functions, as in Equation (9), are at most of fourthorder.
The regression bias, which has been discussed in Cong and Oosterlee (2015), can be controlled effectively by bundling. The regression bias is high in "vBB" and "BGSS" but relatively low in "SGBM" and "SGBM-LT", that benefit from their bundling technique.
A general description of the error components of the four simulation-based methods is listed in Table 5. "SGBM-LT" exhibits a highly satisfactory performance in our tests, since it has relatively small-sized errors in all three aspects. We expect however that when the risk aversion parameter is high and the model volatility is large, even "SGBM-LT" may fail to converge in some cases. In those cases, we propose either to use a large number of paths in the simulation together with more bundles or to implement a variance reduction technique.

Conclusion
In this paper, we enhance a popular dynamic portfolio management algorithm, the BGSS algorithm, in two aspects. First, for the computation of the conditional expectations appearing, we replace the standard regression method by the techniques from the Stochastic Grid Bundling Method, so that the variances of the approximated asset allocations and the corresponding certainty equivalent rates can be reduced. Then, a log Taylor expansion, based on a nonlinear decomposition, is employed in our algorithm. This expansion gives rise to improved results compared to the original ones when approximating the utility function. The resulting SGBM based portfolio management algorithm results in a lower biased approximation of the optimal asset allocations. Based on the COS method and the grid-searching technique, we have developed the COS portfolio management method for generating reference values, which are quite comparable to the reference values and further serve as the "optimal" solutions in our numerical tests.
In our tests, combining SGBM and the log Taylor expansion yields superior results to those of other simulation-based algorithms. In all testing cases, "SGBM-LT" shows the higher certainty equivalent rates. When we merely consider introducing the SGBM components in the regression step, the benefits are obvious: the value function iteration and the portfolio weight iteration associated to both "SGBM" and "SGBM-LT" generate quite similar results, which indicate that the approximation errors at each recursion step are small.
Our simulation-and regression-based algorithm "SGBM-LT" can be generalized to higherdimensional dynamic portfolio management problems. Besides, since our algorithm is robust even in scenarios with high volatility dynamics, it is also possible to focus on models with more complicated dynamics, for example, models with jump components or other time series models. In those cases, we may need some effective bundling technique as proposed in Jain and Oosterlee (2015) and Cong and Oosterlee (2015) but in each local domain we may still use low-order polynomials as the basis functions. This helps to retain the robustness of our algorithm. In this paper, our benchmark method is only employed for the case of one risky asset. It can be extended at least to solving portfolio management problems with two or three risky assets. For all algorithms, it is promising to adopt parallel computation.
One potential future research direction is to combine the grid-searching approach with SGBM. This may be useful for utility-based optimization problems, where the utility function cannot be approximated accurately by its Taylor expansion. In that case, an adaptive gridsearch combined with SGBM may constitute a generic solution.