MARL in Swarm Robotics

A post on how multi-agent reinforcement learning is advancing swarm robotics for search and rescue, highlighting key findings on collaboration, adaptability, and the challenges posed by realistic, communication-limited, dynamic environments.
Reinforcement Learning
Deep Learning
Project
Author

Mahmut Osmanovic

Published

November 28, 2025

1.0 | Prelude

As part of a master’s class, specifically “Research Methods for Intelligent Systems”, we were instructed to conduct a systematic literature review in pairs (M. Osmanovic & I. Paulsson). Ours covered the topic of Multi-Agent Reinforcement Learning in Collaborative Swarm Robotics: A Systematic Literature Review. The full review can be accessed and downloaded here: Systematic_Literature_Review.pdf. This blog post, by contrast, contains cherry-picked and paraphrased snippets. The structure of the SLR was inspired by Barbara Kitchenham, a British computer scientist and software engineer known for her research on systematic reviews in software engineering. Note that the term systematic emphasizes the reproducibility of the literature review, both in methodology and findings.

2.0 | Abstract

A systematic literature review (SLR) was conducted that investigates how multi-agent reinforcement learning (MARL)-based swarm robotic systems, and their extensions, contribute to improving collaboration and adaptability in search and rescue (SAR) missions. The utilization of swarm robotics within hazardous environments has the potential to reduce human exposure to danger. Recent research within the field has resulted in significant progress; however, most research is conducted in simplified and static environments with small homogeneous swarms and single-stage objectives. For the review, twenty-four relevant articles from the IEEE Xplore database were systematically gathered and analyzed. Identified key research gaps include the need for greater environmental fidelity, realistic communication constraints, and models that handle dynamic multi-objective tasks.

3.0 | Introduction

Research Question

How do multi-agent reinforcement learning (MARL) swarm robotic systems, and their extensions, contribute to improving collaboration and adaptability in search and rescue missions?

4.0 | Method

The review relied on a single IEEE search query. It was limited to the IEEE Xplore database due to project time constraints and may therefore not have captured all relevant papers.

Inclusion and Exclusion Criteria

Inclusion Criteria

  1. Peer-reviewed publication.
  2. Written in English.
  3. Published between:
    • Start: September 27, 2007
    • End: September 27, 2025
  4. Focus of results: Must address improvements in at least one of the following:
    4.1. Multi-agent search strategy
    4.2. Multi-agent collaboration
    4.3. Agent adaptability

Exclusion Criteria

  1. Studies where the model does not use Reinforcement Learning (RL)
    • (e.g., Evolutionary Computation, Game Theory, etc.)
  2. Imitation learning based approaches.
  3. Centralized task allocation methods.
  4. Static centralized communication structures.
  5. Foundational or elementary models or frameworks (i.e., lacking applied or novel methodological contributions).

Cohen’s Kappa was used to assess inter-rater agreement when applying the inclusion and exclusion criteria. All gathered articles were first screened based on title and abstract only. Articles that both authors agreed to include, as well as those they disagreed about, proceeded to the second screening stage, during which the full texts were analyzed. The resulting Kappa value was high.
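For two raters making binary include/exclude decisions, Cohen’s Kappa compares observed agreement against the agreement expected by chance. A minimal sketch (the screening decisions below are hypothetical, not the actual data from the review):

```python
def cohens_kappa(r1, r2):
    """Cohen's kappa for two raters over the same set of items."""
    assert len(r1) == len(r2)
    n = len(r1)
    # Observed agreement: fraction of items where both raters agree.
    p_o = sum(a == b for a, b in zip(r1, r2)) / n
    # Chance agreement: sum over labels of the product of marginal rates.
    labels = set(r1) | set(r2)
    p_e = sum((r1.count(lab) / n) * (r2.count(lab) / n) for lab in labels)
    return (p_o - p_e) / (1 - p_e)

# Hypothetical title/abstract screening decisions (1 = include, 0 = exclude):
rater_a = [1, 1, 0, 0, 1, 0, 0, 1, 1, 0]
rater_b = [1, 1, 0, 0, 1, 0, 1, 1, 1, 0]
kappa = cohens_kappa(rater_a, rater_b)  # 0.8 for this toy data
```

Values above roughly 0.8 are conventionally read as near-perfect agreement, which is why a high Kappa justified moving agreed-upon articles forward while resolving disagreements at the full-text stage.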

Thereafter, the quality of each article was independently assessed by both authors using eight criteria:
(1) clarity of the problem formulation,
(2) environment fidelity,
(3) suitability of evaluation metrics,
(4) inclusion of baselines and/or ablations,
(5) statistical robustness (e.g., were experiments performed more than once?),
(6) generalization of results (i.e., diversity of environmental conditions),
(7) reproducibility (availability of code, data, configurations, etc.), and
(8) explicit discussion of threats to validity (internal and/or external).
As before, Cohen’s Kappa was calculated for each criterion (with 95% confidence intervals), and agreement was very high.

5.0 | Findings

Although reinforcement learning has contributed substantially to swarm robotics, most existing studies still evaluate their methods in environments that are nearly static. This is at odds with physical reality, which is intrinsically uncertain and constantly changing. Consequently, the adaptive capabilities of current MARL systems in dynamic swarm-robotic scenarios remain questionable.

Another key limitation concerns the widespread focus on single-objective optimization. Real-world swarm-robotic applications, however, often involve multi-stage tasks with multiple, sometimes competing objectives. Addressing such complexity would require MARL models that can simultaneously handle both multi-stage and multi-objective dynamics. Yet, only a few studies explore either dimension individually, and none integrate both.

Adaptability is also closely tied to communication. Yet, most studies do not impose realistic communication constraints on their agents. Given that communication in real systems is often unreliable or bandwidth-limited, MARL methods capable of handling such dynamic constraints are essential.
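As an illustration of what imposing such a constraint in simulation might look like, here is a minimal sketch of a lossy broadcast channel between agents. The drop probability and message format are illustrative assumptions, not drawn from any of the reviewed papers:

```python
import random

def exchange_messages(messages, drop_prob=0.3, rng=None):
    """Simulate an unreliable broadcast channel: each agent's message is
    independently dropped with probability drop_prob before delivery."""
    rng = rng or random.Random()
    return [m for m in messages if rng.random() >= drop_prob]

# Five agents broadcast their (hypothetical) grid positions each step.
rng = random.Random(0)
outgoing = [{"agent": i, "pos": (i, i)} for i in range(5)]
received = exchange_messages(outgoing, drop_prob=0.5, rng=rng)
```

Training agents against a channel like this forces policies to remain useful when peers fall silent, which is precisely the robustness most reviewed studies did not evaluate.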

Another underexplored aspect is uncertainty itself. Very few works employ Bayesian techniques to quantify uncertainty within agents’ internal representations. Frameworks such as Karl Friston’s Free Energy Principle offer promising avenues for uncertainty-aware decision-making but remain largely absent from current MARL-for-swarm-robotics research.
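To make the idea concrete, a minimal sketch of an uncertainty-aware internal representation: a Bayesian update of an agent’s belief that a target occupies a given grid cell, based on a noisy detector. The sensor probabilities are illustrative assumptions, not values from the reviewed papers:

```python
def update_cell_belief(prior, detected, p_detect=0.9, p_false=0.1):
    """One Bayesian update of P(target in cell) after a noisy sensor reading.

    p_detect: P(reading=True | target present)   -- true positive rate
    p_false:  P(reading=True | target absent)    -- false positive rate
    """
    like_present = p_detect if detected else (1 - p_detect)
    like_absent = p_false if detected else (1 - p_false)
    evidence = like_present * prior + like_absent * (1 - prior)
    return like_present * prior / evidence

belief = 0.05  # near-flat prior over the search grid
for reading in [True, True, False]:
    belief = update_cell_belief(belief, reading)
```

Maintaining beliefs like this, rather than point estimates, would let agents direct their search toward cells where uncertainty is highest, which is the spirit of Friston-style uncertainty-aware decision-making.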

For MARL approaches to genuinely support utility maximization in real-world swarm robotics, it is therefore crucial that future simulation environments reflect the uncertainties and dynamics characteristic of physical systems. At the same time, simulations alone cannot fully capture this complexity. To strengthen external validity, real-world experimentation must complement in silico studies.