Lotfy, A., Alzayed, M., Chaoui, H., & Boulon, L. (2025). Centralized multi-agent SOC control for battery health using proximal policy optimization in EVs. IEEE Transactions on Vehicular Technology. ISSN 0018-9545, 1939-9359. DOI 10.1109/TVT.2025.3571784
Abstract
Lithium-ion batteries (LIBs) have garnered significant attention due to their expanding use in applications such as electric vehicles (EVs) and smart grids. To meet the diverse requirements of these applications, LIB cells are configured in different architectures, with multiple cells, modules, or packs arranged in series and parallel. In series configurations, a state of charge (SOC) balancing system is essential to ensure uniform SOC levels across all cells. For battery electric vehicles (BEVs), which rely solely on LIBs as their energy storage system (ESS), maximizing the usable ESS capacity is crucial for extending the driving range, and SOC balancing is a key strategy for achieving it. This paper presents a model-free cooperative multi-agent control framework designed to regulate and balance the SOC of LIB cells in EVs during real-time driving operation. The proposed method uses a series architecture of three LIB cells, each equipped with a buck-boost converter and a proportional-integral (PI) controller driven by a reinforcement learning (RL) agent. Proximal Policy Optimization (PPO) serves as the RL algorithm in this multi-agent framework, where each PPO agent independently manages the SOC of its corresponding cell based on observed data. During training, the PPO agents work collaboratively to balance the SOCs of the cells, preventing interruptions to EV performance. The effectiveness of the proposed approach is demonstrated by comparing it with single-agent methods such as PPO, Soft Actor-Critic (SAC), and Twin Delayed Deep Deterministic Policy Gradient (TD3), as well as with other multi-agent methods; the results show that the proposed method outperforms these existing approaches.
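The cooperative balancing idea above can be illustrated with a toy simulation. This is a minimal sketch, not the paper's method: the trained PPO agents and buck-boost converter dynamics are replaced by a simple proportional policy that makes cells above the pack-mean SOC shoulder extra discharge current, and the cell capacity, load current, time step, and gain are illustrative assumptions.

```python
# Toy sketch of cooperative SOC balancing across three series cells.
# Assumption: each "agent" is a proportional policy standing in for a
# trained PPO agent; capacity_ah, load_a, dt_s, and k are illustrative.

def simulate_balancing(socs, capacity_ah=2.5, load_a=1.0, dt_s=1.0,
                       steps=3600, k=0.5):
    """Discharge the pack for `steps` time steps while rebalancing SOC."""
    socs = list(socs)
    for _ in range(steps):
        mean_soc = sum(socs) / len(socs)
        for i, soc in enumerate(socs):
            # Cells above the pack mean draw extra current (via their
            # converter in a real system); cells below shed some load.
            i_cell = load_a + k * (soc - mean_soc)
            # Coulomb counting: SOC change = charge removed / capacity.
            socs[i] = max(0.0, soc - i_cell * dt_s / (capacity_ah * 3600.0))
    return socs

final = simulate_balancing([0.90, 0.80, 0.70])
spread = max(final) - min(final)  # shrinks from the initial 0.20
```

In a real deployment the proportional rule would be replaced by each agent's learned policy acting on its observed cell state, but the sketch shows the cooperative objective: the SOC spread contracts over time while the pack continues to serve the driving load.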
| Document type: | Article |
|---|---|
| Keywords: | Batteries; State of charge; Optimization; Training; Smart grids; PI control; Computer architecture; Lithium-ion batteries; Costs; Battery management systems; Active cell balancing; Proximal policy optimization; Centralized training with decentralized execution; State of health |
| Deposited on: | 24 Nov. 2025 20:13 |
| Last modified: | 24 Nov. 2025 20:53 |
| Deposited document version: | Post-print (corrected and accepted version) |
| URI: | https://depot-e.uqtr.ca/id/eprint/12428 |