Much of the success of single-agent deep reinforcement learning (DRL) in recent years can be attributed to the use of experience replay memories (ERM), which allow Deep Q-Networks (DQNs) to be trained efficiently through sampling stored state transitions. However, care is required when using ERMs for multi-agent deep reinforcement learning (MA-DRL), as stored transitions can become outdated when agents update their policies in parallel. In this work we apply leniency to MA-DRL. Lenient agents map state-action pairs to decaying temperature values that control the amount of leniency applied towards negative policy updates that are sampled from the ERM. This introduces optimism in the value-function update, and has been shown to facilitate cooperation in tabular fully-cooperative multi-agent reinforcement learning problems. We evaluate our Lenient-DQN (LDQN) empirically against the related Hysteretic-DQN (HDQN) algorithm, as well as a modified version we call scheduled-HDQN, which uses average reward learning near terminal states. Evaluations take place in extended variations of the Coordinated Multi-Agent Object Transportation Problem (CMOTP). We find that LDQN agents are more likely to converge to the optimal policy in a stochastic reward CMOTP compared to standard and scheduled-HDQN agents.
Recommended citation: Palmer, G., Tuyls, K., Bloembergen, D., & Savani, R. (2018). Lenient Multi-Agent Deep Reinforcement Learning. In AAMAS (pp. 443-451). International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC, USA / ACM.
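To illustrate the leniency mechanism the abstract describes, here is a minimal tabular sketch, not the paper's LDQN implementation: per-pair temperatures decay with visits, and negative updates are forgiven with probability given by the current leniency. The class name, hyperparameter values, and the exact leniency mapping used here are illustrative assumptions.

```python
import math
import random

# Minimal tabular sketch of lenient Q-learning. Hyperparameter values
# (K, beta, init_temp) are illustrative assumptions, not taken from
# the paper's experiments.
class LenientAgent:
    def __init__(self, n_actions, alpha=0.1, gamma=0.95,
                 K=2.0, beta=0.99, init_temp=1.0):
        self.n_actions = n_actions
        self.q = {}        # Q-values keyed by (state, action)
        self.temp = {}     # decaying temperature per (state, action)
        self.alpha, self.gamma = alpha, gamma
        self.K = K                 # maps temperature to leniency
        self.beta = beta           # temperature decay factor per visit
        self.init_temp = init_temp

    def update(self, s, a, r, s_next, done):
        q_sa = self.q.get((s, a), 0.0)
        best_next = max(self.q.get((s_next, b), 0.0)
                        for b in range(self.n_actions))
        target = r if done else r + self.gamma * best_next
        delta = target - q_sa
        t = self.temp.get((s, a), self.init_temp)
        leniency = 1.0 - math.exp(-self.K * t)  # high temperature -> lenient
        # Positive updates are always applied; negative updates are
        # forgiven (skipped) with probability equal to the leniency,
        # giving the optimistic value-function update the abstract mentions.
        if delta > 0 or random.random() > leniency:
            self.q[(s, a)] = q_sa + self.alpha * delta
        self.temp[(s, a)] = t * self.beta       # leniency decays over time
```

In the deep setting described in the abstract, the same forgiveness test would be applied to TD errors of transitions sampled from the ERM, with leniency values stored alongside each transition so that older, more outdated samples are treated more leniently.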