Real-world reinforcement learning must optimize multiple objectives, satisfy safety constraints, and remain robust to model uncertainty, yet existing methods typically tackle these challenges in isolation. We introduce Multi-Perspective Actor–Critic (MPAC), a unified framework that combines value decomposition with component-specific risk assessment, so that safety-critical components are treated conservatively while performance-oriented components retain appropriate optimism.
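As an illustrative sketch (the notation $Z_i$, $\rho_i$, $w_i$, $K$, and $\alpha$ below is assumed for exposition rather than taken from MPAC's definitions), such a decomposed, risk-aware critic can be written as a weighted sum of per-component risk-adjusted values, with a conservative risk measure applied to safety components and the plain expectation to performance components:
\[
Q(s,a) \;=\; \sum_{i=1}^{K} w_i\,\rho_i\big(Z_i(s,a)\big),
\qquad
\rho_i(Z) =
\begin{cases}
\operatorname{CVaR}_{\alpha}[Z] & \text{if component } i \text{ is safety-critical,}\\
\mathbb{E}[Z] & \text{if component } i \text{ is performance-oriented,}
\end{cases}
\]
where $Z_i(s,a)$ denotes the return distribution of component $i$.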
An influence-based weighting mechanism dynamically adjusts the importance of each objective according to its decision relevance and learning progress, removing the need for a fixed scalarization or up-front reward tuning. The result is policies that are simultaneously safe, robust to perturbations, and less conservative than those produced by traditional safe or robust RL approaches.
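One way to picture this weighting (again with assumed, illustrative notation; $\delta_i$ and the influence score are not MPAC's exact quantities) is to let each weight grow with how strongly component $i$ discriminates between the actions under consideration and with how actively it is still being learned:
\[
w_i \;\propto\;
\underbrace{\max_{a \in \mathcal{A}} \rho_i\big(Z_i(s,a)\big) \;-\; \min_{a \in \mathcal{A}} \rho_i\big(Z_i(s,a)\big)}_{\text{decision relevance}}
\;\cdot\;
\underbrace{\delta_i}_{\text{learning progress}},
\qquad \sum_i w_i = 1,
\]
where $\delta_i$ could be, for example, the recent temporal-difference error magnitude of component $i$'s critic.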
Evaluation on a complex energy-management domain and on continuous-control benchmarks with safety constraints and perturbed dynamics shows that MPAC consistently achieves stronger multi-objective trade-offs.