In multi-objective reinforcement learning (MORL), non-linear utility functions pose a significant challenge, as the two optimization criteria, the scalarized expected return (SER) and the expected scalarized return (ESR), can diverge substantially. Applying single-objective reinforcement learning methods to ESR problems often introduces bias, particularly under non-linear utilities. Moreover, existing MORL policy-based algorithms, such as EUPG and MOCAC, suffer from numerous hyperparameters, large search spaces, high variance, and low learning efficiency, which frequently results in sub-optimal policies. In this paper, we propose a new multi-objective policy search algorithm called Multi-Objective Utility Actor-Critic (MOUAC). For the first time in the field, MOUAC introduces a utility critic that estimates the expected state utility, replacing critics based on Q-values, value functions, or return distributions. To address the high variance inherent in MORL, MOUAC also adapts the traditional eligibility trace to the multi-objective setting, yielding the MnES-return. Empirically, we demonstrate that our algorithm achieves state-of-the-art (SOTA) performance in on-policy multi-objective policy search.
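For context, the two criteria contrasted above are commonly formalized as follows (standard MORL notation, assuming a vector-valued reward r_t, discount factor gamma, and utility function u; these definitions are generic and not specific to MOUAC):

\[
\text{SER:}\quad \max_{\pi}\; u\!\left( \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} \mathbf{r}_t \,\middle|\, \pi \right] \right),
\qquad
\text{ESR:}\quad \max_{\pi}\; \mathbb{E}\!\left[ u\!\left( \sum_{t=0}^{\infty} \gamma^{t} \mathbf{r}_t \right) \,\middle|\, \pi \right].
\]

Under a non-linear u the two objectives generally differ, since the utility of an expectation need not equal the expectation of the utility.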

18th European Workshop on Reinforcement Learning (EWRL 2025)
Intelligent and autonomous systems

Peng, G., Pauwels, E., & Baier, H. (2025). Multi-objective utility actor critic with utility critic for nonlinear utility function. In 18th European Workshop on Reinforcement Learning (EWRL 2025) (pp. 1–22).