Posts by Collection

portfolio

publications

Lenient multi-agent deep reinforcement learning

Published in Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), 2018

Much of the success of single agent deep reinforcement learning (DRL) in recent years can be attributed to the use of experience replay memories (ERM), which allow Deep Q-Networks (DQNs) to be trained efficiently through sampling stored state transitions. However, care is required when using ERMs for multi-agent deep reinforcement learning (MA-DRL), as stored transitions can become outdated when agents update their policies in parallel. In this work we apply leniency to MA-DRL. Lenient agents map state-action pairs to decaying temperature values that control the amount of leniency applied towards negative policy updates that are sampled from the ERM. This introduces optimism in the value-function update, and has been shown to facilitate cooperation in tabular fully-cooperative multi-agent reinforcement learning problems. We evaluate our Lenient-DQN (LDQN) empirically against the related Hysteretic-DQN (HDQN) algorithm, as well as a modified version we call scheduled-HDQN, that uses average reward learning near terminal states. Evaluations take place in extended variations of the Coordinated Multi-Agent Object Transportation Problem (CMOTP). We find that LDQN agents are more likely to converge to the optimal policy in a stochastic reward CMOTP compared to standard and scheduled-HDQN agents.
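
As a rough illustration of the leniency mechanism described above, the sketch below applies the decaying-temperature rule to a tabular Q-update (the paper itself applies the idea to transitions sampled from a DQN's replay memory). The constants ALPHA, GAMMA, K, DECAY and T0, and the exponential leniency schedule, are assumptions for illustration rather than the paper's exact settings.

```python
import math
import random
from collections import defaultdict

ALPHA, GAMMA = 0.1, 0.95          # learning rate and discount factor (assumed values)
K, DECAY, T0 = 2.0, 0.995, 1.0    # leniency shape, temperature decay, initial temperature

Q = defaultdict(float)            # Q[(state, action)] -> value estimate
T = defaultdict(lambda: T0)       # decaying temperature per state-action pair

def lenient_update(s, a, r, s_next, actions_next):
    """Q-learning update that leniently ignores some negative updates."""
    target = r + GAMMA * max(Q[(s_next, a2)] for a2 in actions_next)
    delta = target - Q[(s, a)]
    # The hotter the temperature, the more likely a negative update is skipped,
    # keeping the agent optimistic while teammates are still exploring.
    leniency = 1.0 - math.exp(-K * T[(s, a)])
    if delta > 0 or random.random() > leniency:
        Q[(s, a)] += ALPHA * delta
    T[(s, a)] *= DECAY            # leniency fades as the pair is revisited

# Example: one update on a toy transition with a negative reward.
lenient_update(s=0, a=1, r=-1.0, s_next=1, actions_next=[0, 1])
print(Q[(0, 1)], T[(0, 1)])
```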

Recommended citation: Palmer, G., Tuyls, K., Bloembergen, D., & Savani, R. (2018). Lenient Multi-Agent Deep Reinforcement Learning. In Proceedings of the 17th International Conference on Autonomous Agents and MultiAgent Systems (pp. 443-451). International Foundation for Autonomous Agents and Multiagent Systems / ACM. http://ifaamas.org/Proceedings/aamas2018/pdfs/p443.pdf

Fully Convolutional One-Shot Object Segmentation for Industrial Robotics

Published in Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), 2019

The ability to identify and localize new objects robustly and effectively is vital for robotic grasping and manipulation in warehouses or smart factories. Deep convolutional neural networks (DCNNs) have achieved state-of-the-art performance on established image datasets for object detection and segmentation. However, applying DCNNs in dynamic industrial scenarios, e.g. warehouses and autonomous production, remains a challenging problem. DCNNs quickly become ineffective when tasked with detecting objects that they have not been trained on. Given that re-training using the latest data is time-consuming, DCNNs cannot meet the requirements of the Factory of the Future (FoF) regarding rapid development and production cycles. To address this problem, we propose a novel one-shot object segmentation framework, using a fully convolutional Siamese network architecture, to detect previously unknown objects based on a single prototype image. We turn to multi-task learning to reduce training time and improve classification accuracy. Furthermore, we introduce a novel approach to automatically cluster the learnt feature space representation in a weakly supervised manner. We test the proposed framework on the RoboCup@Work dataset, simulating requirements for the FoF. Results show that the trained network on average correctly identifies 73% of previously unseen objects from a single example image. Correctly identified objects are estimated to have an 87.53% successful pick-up rate. Finally, multi-task learning lowers the convergence time by up to 33% and increases accuracy by 2.99%.
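
As a rough sketch of the general one-shot idea (not the paper's architecture), the snippet below embeds a prototype image and a scene with a shared fully convolutional encoder and correlates their features into a dense similarity map. The layer sizes, the pooled-prototype correlation and the class name SiameseOneShotSegmenter are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SiameseOneShotSegmenter(nn.Module):
    def __init__(self, feat_dim=32):
        super().__init__()
        # Shared fully convolutional encoder, applied to both inputs.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, feat_dim, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, scene, prototype):
        f_scene = self.encoder(scene)                        # (B, C, H/4, W/4)
        f_proto = self.encoder(prototype)                    # (B, C, h, w)
        # Pool the prototype to a single embedding and correlate it with every
        # scene location, giving a dense similarity (segmentation) map.
        kernel = F.adaptive_avg_pool2d(f_proto, 1)           # (B, C, 1, 1)
        f_scene = F.normalize(f_scene, dim=1)
        kernel = F.normalize(kernel, dim=1)
        sim = (f_scene * kernel).sum(dim=1, keepdim=True)    # cosine similarity
        return torch.sigmoid(sim)                            # per-pixel score in (0, 1)

# Example: segment a 128x128 scene given a single 64x64 prototype image.
model = SiameseOneShotSegmenter()
mask = model(torch.rand(1, 3, 128, 128), torch.rand(1, 3, 64, 64))
print(mask.shape)  # torch.Size([1, 1, 32, 32]) -- coarse mask, upsample as needed
```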

Recommended citation: Schnieders, B., Luo, S., Palmer, G., & Tuyls, K. (2019, May). Fully Convolutional One-Shot Object Segmentation for Industrial Robotics. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems (pp. 1161-1169). http://www.ifaamas.org/Proceedings/aamas2019/pdfs/p1161.pdf

Negative Update Intervals in Deep Multi-Agent Reinforcement Learning

Published in Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems (AAMAS), 2019

In Multi-Agent Reinforcement Learning (MA-RL), independent cooperative learners must overcome a number of pathologies in order to learn optimal joint policies. However, addressing one pathology often leaves approaches vulnerable to others. For instance, hysteretic Q-learning addresses miscoordination while leaving agents vulnerable to misleading stochastic rewards. Other methods, such as leniency, have proven more robust when dealing with multiple pathologies simultaneously. However, leniency has predominantly been studied within the context of strategic-form games (bimatrix games) and fully observable Markov games consisting of a small number of probabilistic state transitions. This raises the question of whether these findings scale to more complex domains. For this purpose we implement a temporally extended version of the Climb Game, within which agents must overcome multiple pathologies simultaneously, including relative overgeneralisation, stochasticity, and the alter-exploration and moving-target problems, while learning from a large observation space. We find that existing lenient and hysteretic approaches fail to consistently learn near-optimal joint policies in this environment. To address these pathologies we introduce Negative Update Intervals-DDQN (NUI-DDQN), a deep MA-RL algorithm which discards episodes yielding cumulative rewards outside the range of expanding intervals. NUI-DDQN consistently gravitates towards optimal joint policies in deterministic and stochastic reward settings of our environment, overcoming the outlined pathologies.
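
As a simplified sketch of the episode-filtering idea, based only on the description above, the snippet below stores an episode's transitions for learning only if its cumulative reward falls inside a maintained interval. The interval bookkeeping (a fixed slack below the best return observed so far) and the class name NegativeUpdateFilter are illustrative assumptions, not the paper's exact rule.

```python
from collections import deque

class NegativeUpdateFilter:
    def __init__(self, slack=1.0, capacity=10_000):
        self.best_return = float("-inf")   # running best cumulative reward seen
        self.slack = slack                 # accepted width below the best return
        self.replay = deque(maxlen=capacity)

    def interval(self):
        """Current accepted range of episode returns."""
        return (self.best_return - self.slack, float("inf"))

    def consider(self, transitions, episode_return):
        """Store an episode's transitions only if its return is in the interval."""
        self.best_return = max(self.best_return, episode_return)
        low, high = self.interval()
        if low <= episode_return <= high:
            self.replay.extend(transitions)
            return True
        return False                       # episode discarded: no negative update

# Example: a high-return episode is kept, a low-return one is discarded.
f = NegativeUpdateFilter(slack=2.0)
f.consider([("s0", "a0", 1.0, "s1")], episode_return=10.0)          # kept
print(f.consider([("s0", "a1", -5.0, "s1")], episode_return=3.0))   # False
```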

Recommended citation: Palmer, G., Savani, R., & Tuyls, K. (2019, May). Negative Update Intervals in Deep Multi-Agent Reinforcement Learning. In Proceedings of the 18th International Conference on Autonomous Agents and MultiAgent Systems (pp. 43-51). http://www.ifaamas.org/Proceedings/aamas2019/pdfs/p43.pdf

The Automated Inspection of Opaque Liquid Vaccines

Published in 24th European Conference on Artificial Intelligence - ECAI 2020, Santiago de Compostela, Spain, 2020

In the pharmaceutical industry, the screening of opaque vaccines containing suspensions is currently a manual task carried out by trained human visual inspectors. We show that deep learning can be used to effectively automate this process. A moving contrast is required to distinguish anomalies from other particles, reflections and dust resting on a vial’s surface. We train 3D-ConvNets to predict the likelihood of 20-frame video samples containing anomalies. Our unaugmented dataset consists of hand-labelled samples, recorded using vials provided by the HAL Allergy Group, a pharmaceutical company. We trained ten randomly initialized 3D-ConvNets to provide a benchmark, observing mean AUROC scores of 0.94 and 0.93 for positive samples (containing anomalies) and negative (anomaly-free) samples, respectively. Using Frame-Completion Generative Adversarial Networks we: (i) introduce an algorithm for computing saliency maps, which we use to verify that the 3D-ConvNets are indeed identifying anomalies; (ii) propose a novel self-training approach using the saliency maps to determine if multiple networks agree on the location of anomalies. Our self-training approach allows us to augment our dataset by labelling 217,888 additional samples. 3D-ConvNets trained with our augmented dataset improve on the results obtained when training only on the unaugmented dataset.
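
As a rough illustration (not the paper's network), the sketch below shows a small 3D-ConvNet that maps a 20-frame clip to an anomaly likelihood. Channel counts, frame resolution, pooling choices and the class name AnomalyScorer3D are assumptions.

```python
import torch
import torch.nn as nn

class AnomalyScorer3D(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            # Input: (B, channels=1, frames=20, H, W); greyscale clips assumed.
            nn.Conv3d(1, 8, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool3d(2),
            nn.Conv3d(8, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),          # collapse time and space
        )
        self.head = nn.Linear(16, 1)          # single anomaly logit

    def forward(self, clip):
        z = self.features(clip).flatten(1)
        return torch.sigmoid(self.head(z))    # anomaly likelihood in (0, 1)

# Example: score one 20-frame, 64x64 greyscale clip.
model = AnomalyScorer3D()
print(model(torch.rand(1, 1, 20, 64, 64)))    # prints one score in (0, 1)
```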

Recommended citation: Palmer, G., Schnieders, B., Savani, R., Tuyls, K., Fossel, J. D., & Flore, H. (2020). The Automated Inspection of Opaque Liquid Vaccines. arXiv preprint arXiv:2002.09406. https://ecai2020.eu/papers/794_paper.pdf

A deep learning approach to identify unhealthy advertisements in street view images

Published in Pre-print (Under review at Nature Scientific Reports), 2020

While outdoor advertisements are common features within towns and cities, they may reinforce social inequalities in health. Vulnerable populations in deprived areas may have greater exposure to fast food, gambling and alcohol advertisements encouraging their consumption. Understanding who is exposed, and evaluating potential policy restrictions, requires a substantial manual data collection effort. To address this problem we develop a deep learning workflow to automatically extract and classify unhealthy advertisements from street-level images. We introduce the Liverpool 360 degree Street View (LIV360SV) dataset for evaluating our workflow. The dataset contains 26,645 360-degree street-level images collected via cycling with a GoPro Fusion camera, recorded 14th–18th January 2020. 10,106 advertisements were identified and classified as food (1,335), alcohol (217), gambling (149) and other (8,405), e.g. cars and broadband. We find evidence of social inequalities, with a larger proportion of food advertisements located within deprived areas and within areas frequented by students and by children carrying excess weight. Our project presents a novel implementation for the incidental classification of street view images to identify unhealthy advertisements, offering a means to identify areas that could benefit from tougher advertisement restriction policies for tackling social inequalities.
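
As a rough sketch of the classification stage of such a workflow (advertisement regions are assumed to have already been extracted), the snippet below adapts a standard image backbone to the four classes listed above; the ResNet-18 backbone and the helper name build_ad_classifier are assumptions, not the models used in the paper.

```python
import torch
import torch.nn as nn
from torchvision import models

CLASSES = ["food", "alcohol", "gambling", "other"]

def build_ad_classifier():
    # A standard image backbone with its final layer replaced by a 4-way head.
    # In practice one would start from ImageNet-pretrained weights and fine-tune
    # on labelled advertisement crops.
    backbone = models.resnet18()
    backbone.fc = nn.Linear(backbone.fc.in_features, len(CLASSES))
    return backbone

# Example: classify one cropped 224x224 advertisement image.
model = build_ad_classifier().eval()
with torch.no_grad():
    logits = model(torch.rand(1, 3, 224, 224))
print(CLASSES[int(logits.argmax(dim=1))])
```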

Recommended citation: Palmer, G., Green, M., Boyland, E., Rios Vasconcelos, YS., Savani, R., Singleton, A. (2020). A deep learning approach to identify unhealthy advertisements in street view images. arXiv preprint arXiv:2007.04611. https://arxiv.org/abs/2007.04611

talks

The Automated Inspection of Opaque Liquid Vaccines


In the pharmaceutical industry, the screening of opaque vaccines containing suspensions is currently a manual task carried out by trained human visual inspectors. We show that deep learning can be used to effectively automate this process. A moving contrast is required to distinguish anomalies from other particles, reflections and dust resting on a vial’s surface. We train 3D-ConvNets to predict the likelihood of 20-frame video samples containing anomalies. Our unaugmented dataset consists of hand-labelled samples, recorded using vials provided by the HAL Allergy Group, a pharmaceutical company. We trained ten randomly initialized 3D-ConvNets to provide a benchmark, observing mean AUROC scores of 0.94 and 0.93 for positive samples (containing anomalies) and negative (anomaly-free) samples, respectively. Using Frame-Completion Generative Adversarial Networks we: (i) introduce an algorithm for computing saliency maps, which we use to verify that the 3D-ConvNets are indeed identifying anomalies; (ii) propose a novel self-training approach using the saliency maps to determine if multiple networks agree on the location of anomalies. Our self-training approach allows us to augment our dataset by labelling 217,888 additional samples. 3D-ConvNets trained with our augmented dataset improve on the results obtained when training only on the unaugmented dataset.
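
As a rough sketch of the saliency-agreement check used for self-training (the exact criterion is not given above), the snippet below pseudo-labels a clip only when several networks' saliency maps highlight roughly the same region. The top-percentile thresholding, the pairwise IoU test and the helper names binarise and agree are illustrative assumptions.

```python
import numpy as np

def binarise(saliency, q=0.95):
    """Keep only the most salient pixels (top 5% by default)."""
    return saliency >= np.quantile(saliency, q)

def agree(maps, min_iou=0.3):
    """True if every pair of saliency maps overlaps by at least min_iou."""
    masks = [binarise(m) for m in maps]
    for i in range(len(masks)):
        for j in range(i + 1, len(masks)):
            inter = np.logical_and(masks[i], masks[j]).sum()
            union = np.logical_or(masks[i], masks[j]).sum()
            if union == 0 or inter / union < min_iou:
                return False
    return True

# Example: three networks highlighting the same corner of a 32x32 map agree.
maps = [np.pad(np.ones((8, 8)), ((0, 24), (0, 24))) + 0.01 * np.random.rand(32, 32)
        for _ in range(3)]
print(agree(maps))  # True
```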

teaching

Outreach

Computer Science Department, University of Liverpool, 2016

Outreach (2016 - 2018)