Robust Distributed Decision-Making in Robot Swarms: Exploiting a Third Truth State

Based on work by Michael Crosscombe, Jonathan Lawry, Sabine Hauert and Martin Homer.

Reaching an optimal shared decision by applying only decentralised algorithms is a key aspect of many multi-agent and swarm robotic applications. As humans, we often have to come to some conclusions about the current state of the world so that we can make informed decisions based on these conclusions and then act in a way that will achieve some desired state of the world, given what we believe to be the current state. Should our belief about the current state be wrong, then we could be choosing actions that would likely lead to unintended consequences. Of course, expecting every person to have perfect, up-to-date knowledge about the current state of the world is unrealistic, and so we often rely on the beliefs and experiences of others to inform our own beliefs.

We see this too in nature, where honey bees must choose between a large number of potential nesting sites in order to select the best one. When a current hive grows too large the majority of bees must choose a new site to relocate to via a process called “swarming” – a problem that can be generalised to choosing the best of a given number of choices;  referred to as the “best-of-n” problem. To do this, bees rely on a combination of their own experiences and the experiences of others in the hive in order to reach an agreement about which is the best site. We can learn from solutions found in nature to develop our own models and apply these to swarms of robots so that, by having pairs of robots interact and reach agreements at an individual level, we can distributed the decision-making process across the entire swarm.

Decentralised algorithms such as these are often considered to be more robust than their centralised counterparts, but this is rarely put to the test. However, robustness is crucial where robots typically possess limited hardware capabilities in order to maintain cost-effectiveness at such large numbers, as well as in scenarios which might be critical to the protection or preservation of life such as in search and rescue operations. In this context we aim to introduce an alternative model for distributed decision-making in large robot swarms and examine its robustness to the presence of malfunctioning robots, as well as compare it to an existing model: the Weighted Voter Model (WVM) (Valentini et al., 2014).

The best-of-n problem

 

kilobot_img_annotated

Kilobots are small, low cost robots used to study swarm robotics. They each contain 2 motors for movement and an RGB LED and IR transmitter for communication.

In this work we study the best-of-n problem where n = 2 and compare our three-valued model with a simplified version of the WVM. In this version of the WVM, robots (specifically Kilobots) move around randomly in a 1.2m^2 arena and at any point in time are in one of two primary states: either signalling, or updating. Those in the signalling state are signalling their current belief that either “choice A is the best” or “choice B is the best”, and they continue to do so for a length of time proportional to the quality of the choice i.e. in our experiments, choice A has a quality of 9 and choice B has a quality of 7, and so those believing that choice A is the best choice will signal for a longer duration than those believing that choice B is the best. This naturally creates a bias in the swarm where those signalling for choice A will do so for longer than those signalling for choice B, and this will inevitably affect the updating robots. Those in the updating state will select a signalling robot at random from their local neighbours, provided that they are within their communication radius (10 cm limit for the Kilobots), and adopt that robot’s belief.

In the three-valued model, instead of immediately adopting the signalling robot’s belief, they instead form consensus according to the following rule: Provided that a belief in choice A corresponds to a truth state of 1 and choice B a truth state of 0, then we introduce a third truth state of 1/2 representing “undecided” or “unknown” as an intermediate state when transitioning between the truth states 1 and 0. If the two robots conflict in their beliefs such that one believes choice A (1) to be the best and the other choice B (0) then the updating robot adopts a new belief state of 1/2. If one robot has a stronger belief, either in choice A (1) or choice B (0), and the other is undecided (1/2), then the stronger belief is preserved. Through repeating this approach, both the WVM and our three-valued model eventually lead to convergence across the swarm to a single choice, where we say that the swarm has successfully reached consensus about which is the best choice. Furthermore, the swarm converges to the best of the two choices, that being choice A. We then go on to adapt our model so that a percentage of the population is malfunctioning such that the malfunctioning robots adopt a random belief state (either 1 or 0) instead of updating their belief with another robot, before continuing to signal for that random choice. We run experiments both in simulation and for physical robot swarms of 400 Kilobots.

Results

In the figure, we show results as a trajectory of the Kilobots signalling for either choice A or choice B.

kilobot
Experiments involve a population of 400 kilobots where, on average, 10% of the swarm is malfunctioning (signalling for a random choice). We can see that the three-valued model eventually reaches 100% of the (functional) Kilobots in the swarm signalling for choice A in the equivalent of just over 4 minutes. This model outperforms the WVM which, while quicker to converge to a steady state, achieves below 90% on average. The inclusion of this intermediate truth state in the three-valued model slows convergence when compared with the WVM, but in doing so provides a means for robots to not immediately adopt another robot’s belief when they are in disagreement about which choice is the best. The third truth value, representing undecided, also means that if an updating robot is in conflict with the selected signalling robot, then by adopting this truth state, their signalling of this belief will preserve any neighbouring robot’s belief should they select them while they are updating, thereby preventing them from switching to a potentially worse choice. In contrast, for the WVM, an updating robot adopting a worse belief has the potential to cause a knock on effect for other updating robots, as their belief is propagated to nearby updating robots upon entering the signalling state.
For robotic systems where malfunction is a possibility, it is therefore preferable to choose the three-valued model over the WVM. Despite slower convergence in the three-valued model, the presence of the third truth state makes the population more robust to malfunctioning robots that randomise their beliefs while in the updating state. In the WVM, robots are more susceptible to being affected by this kind of malfunction, as updating agents would immediately adopt a worse belief should it be signalled for within their communication radius. As updating agents then adopt a worse belief, this propagates through the swarm as more robots begin signalling in favour of this worse belief, increasing the likelihood of additional robots adopting a worse belief. This explains why the WVM fails to converge fully to the correct belief of choice A being the best.

 

In the video, we show a time-lapse of experiments performed on 400 Kilobots where blue lights represent those signalling for choice A, and those for choice B. Those in the intermediate state (1/2) may be coloured either red or blue. The green Kilobots are performing as if malfunctioning, such that they adopt a random belief and then signal for that belief.

Conclusions

In this work we have proposed a three-valued model for consensus in multi-agent systems and swarm robotics. We have compared this three-valued model to the weighted voter model, focussing on both speed of convergence and robustness to individual malfunction or error. Experiments were conducted using both a simulation environment and physical Kilobot swarms of 400 robots. The results from both sets of experiments confirm that the three-valued model is more robust to the presence of malfunctioning or noisy individuals than the WVM, but that the WVM is quicker to converge to the best option. Future research would consider ways to close the gap between the three-valued model and the WVM in terms of speed of convergence while maintaining improved performance to the presence of malfunction/error. We also intend to consider different distributed algorithms for decision-making which take account of numerous beliefs being signalled within a robot’s radius of communication. So far, both models considered in this work only consider a single robot’s belief while updating but there exist other models, such as majority rule models and models for opinion-pooling, which take account of the beliefs of many robots.

 

Decision-making in robot swarms has been explored by other researchers in distributed robotics, including [James Marshall], [Marco Dorigo], [Radhika Nagpal]. More specifically, our research builds on the works of [Gabriele Valentini], whose weighted voter model is compared to our proposed three-valued model, and [Andreagiovanni Reina] whose work includes a third truth state while the underlying model remains very different to our own.


 

Michael Crosscombe is an Engineering Mathematics PhD student at the University of Bristol. Follow him on twitter here .

0 Comments

Leave Your Reply

Your email address will not be published.