In multi-objective reinforcement learning (MORL), non-linear utility functions pose a significant challenge, as the two optimization criteria, the scalarized expected return (SER) and the expected scalarized return (ESR), can diverge substantially. Applying single-objective reinforcement learning methods to ESR problems often introduces bias, particularly under non-linear utilities. Moreover, existing MORL policy-based algorithms, such as EUPG and MOCAC, suffer from numerous hyperparameters, large search spaces, high variance, and low learning efficiency, which frequently results in sub-optimal policies. In this paper, we propose a new multi-objective policy search algorithm called Multi-Objective Utility Actor-Critic (MOUAC). For the first time in the field, MOUAC introduces a utility critic that estimates the expected state utility, replacing critics based on Q-values, value functions, or return distributions. To address the high variance inherent in MORL, MOUAC also adapts the traditional eligibility trace to the multi-objective setting, yielding the MnES-return. Empirically, we demonstrate that our algorithm achieves state-of-the-art (SOTA) performance in on-policy multi-objective policy search.
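For context, the two criteria contrasted above are commonly formalized as follows (standard MORL notation, assuming a vector-valued reward r_t, discount factor gamma, and utility function u; these definitions are generic and not specific to MOUAC):

\[
\text{SER:}\quad \max_{\pi}\; u\!\left( \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} \mathbf{r}_t \,\middle|\, \pi \right] \right),
\qquad
\text{ESR:}\quad \max_{\pi}\; \mathbb{E}\!\left[ u\!\left( \sum_{t=0}^{\infty} \gamma^{t} \mathbf{r}_t \right) \,\middle|\, \pi \right].
\]

Under a non-linear u the two objectives generally differ, since the utility of an expectation need not equal the expectation of the utility.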

18th European Workshop on Reinforcement Learning (EWRL 2025)
Intelligent and autonomous systems

Peng, G., Pauwels, E., & Baier, H. (2025). Multi-objective utility actor critic with utility critic for nonlinear utility function. In 18th European Workshop on Reinforcement Learning (EWRL 2025) (pp. 1–22).