Learning Systems Group

Distributed Learning in Swarm Systems
Ling Li, Alcherio Martinoli, Yaser Abu-Mostafa

Abstract. Distributed learning is the process by which autonomous agents learn individual strategies that maximize team performance. The major challenge in designing a distributed learning algorithm is solving the credit assignment problem in a highly dynamic system where each agent has only partial and noisy information about the global task. We investigate several learning techniques in a concrete case study in collective robotics (the stick pulling experiment). Our results, based on a faithful microscopic probabilistic model, show that learning improves both the performance and the adaptability of the swarm system, and that an initially homogeneous team usually becomes specialized after learning.

Motivation and Aims. Natural systems consisting of many agents, such as ants, wasps, and termites, exhibit complex behavior which appears to transcend the abilities of the relatively simple constituent individuals. Artificial swarm systems based on swarm intelligence consist of relatively simple autonomous agents. They are truly distributed, self-organized, and inherently scalable since there is no global control or communication mechanism. The agents are designed to be simple and interchangeable, and may be dynamically added or removed without explicit reorganization, making the collective system highly flexible and fault tolerant.

When rules extracted from natural systems are applied to artificial problems, the differences between the two generally require different control parameters. Learning, as an automatic way to adjust control parameters, adapts these rules to new problems and improves performance. Learning also allows the system to adapt to a changing environment.

Research and Achievements. We investigate several learning issues in swarm systems through a case study: the stick pulling experiment. This is a strictly collaborative problem: two non-communicating robots must collaborate to complete the task. Each robot in the experiment is characterized by a gripping time parameter (GTP), the maximal length of time that a robot holding a stick waits for help from another robot. The goal of learning is to find a suitable GTP for every robot so that the team pulls the sticks out of the holes in the arena as quickly as possible. Our experiments are based on a probabilistic model that faithfully reproduces experiments with real robots. To compare learned performance with the optimum, we performed a systematic search of the parameter space and measured the optimal performance of homogeneous and heterogeneous teams of 2 to 6 robots.
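
The role of the GTP can be illustrated with a small toy simulation. This is a hypothetical simplification, not the microscopic probabilistic model used in the study: time advances in discrete steps, a searching robot finds a uniformly random stick with an assumed probability p_find per step, and a pulled stick is assumed to be replaced immediately. The names Robot, collaboration_rate, n_sticks, and p_find are all illustrative, and "collaboration rate" is measured here simply as sticks pulled per step.

```python
import random
from dataclasses import dataclass


@dataclass
class Robot:
    gtp: int          # gripping time parameter, in simulation steps
    stick: int = -1   # index of the stick being held; -1 while searching
    waited: int = 0   # steps spent holding the current stick


def collaboration_rate(gtps, n_sticks=4, p_find=0.05, steps=20_000, seed=0):
    """Toy estimate of sticks pulled per step for a team with the given GTPs.

    Hypothetical simplifications: a searching robot finds a uniformly random
    stick with probability p_find per step, and a pulled stick is replaced
    at once, so every stick index stays available.
    """
    rng = random.Random(seed)
    robots = [Robot(gtp=g) for g in gtps]
    pulled = 0
    for _ in range(steps):
        for r in robots:
            if r.stick >= 0:
                # Holding a stick: keep waiting until the GTP runs out.
                r.waited += 1
                if r.waited > r.gtp:
                    r.stick, r.waited = -1, 0  # give up and search again
                continue
            if rng.random() >= p_find:
                continue  # no stick found this step
            s = rng.randrange(n_sticks)
            holder = next((o for o in robots if o.stick == s), None)
            if holder is None:
                r.stick, r.waited = s, 0  # grip the stick and wait for help
            else:
                # A second robot arrives: together they pull the stick out.
                pulled += 1
                holder.stick, holder.waited = -1, 0
    return pulled / steps
```

Evaluating collaboration_rate([g, g]) over a range of g values is a crude analogue of a systematic GTP search for a two-robot team. With a single robot the rate is always zero, since no stick can be pulled without a partner; the absolute numbers from this toy say nothing about the real experiment.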

Giving each robot the ability to learn lets the whole team adapt to environmental changes and maintain near-optimal performance (Fig. 1). We tested several learning algorithms, including adaptive line search and Q-learning, and found that, in this case study, algorithms that directly search for optimal parameters work much better than those based on reward estimation.

Figure 1: Performance (collaboration rate) with learning. Individual reinforcement was used and heterogeneity was allowed. Robots were initially assigned a gripping time, which they adjusted through learning to achieve higher performance. Different colors represent experiments with different numbers of robots. Error bars show standard deviations of performance over 50 runs. Dashed curves show performance without learning.
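
Adaptive line search is one way for a robot to tune its own GTP directly from measured performance rather than from a learned reward estimate. The sketch below is a generic variant assumed for illustration, not the exact algorithm used in the study: the function name adaptive_line_search, the step-size rule (move and grow the step by 1.5 on improvement, otherwise reverse direction and halve it), and all default values are hypothetical. The argument perf stands in for whatever performance estimate a robot can measure, such as its recent collaboration rate.

```python
import random


def adaptive_line_search(perf, x0, step=20.0, grow=1.5, shrink=0.5,
                         lo=0.0, hi=1000.0, iters=50):
    """Tune one parameter (e.g. a GTP) by a simple adaptive line search.

    perf(x) returns a performance estimate for parameter value x (higher is
    better). On improvement the search moves and enlarges its step; otherwise
    it reverses direction and shrinks the step. Defaults are hypothetical.
    """
    x, direction = x0, 1.0
    for _ in range(iters):
        # Propose a move along the current direction, clipped to [lo, hi].
        candidate = min(hi, max(lo, x + direction * step))
        # Re-estimate both points each round so one lucky noisy sample
        # for the current value does not freeze the search.
        if perf(candidate) > perf(x):
            x = candidate
            step *= grow
        else:
            direction = -direction
            step *= shrink
    return x


if __name__ == "__main__":
    rng = random.Random(1)

    def noisy_perf(g):
        # Hypothetical performance curve peaking at g = 120, plus noise.
        return -((g - 120.0) ** 2) + rng.gauss(0.0, 50.0)

    print(adaptive_line_search(noisy_perf, x0=210.0))
```

Re-evaluating the current point every round doubles the number of measurements, but under noisy feedback it keeps the search from locking onto a single overly optimistic sample. The quadratic curve in the usage example is invented purely to exercise the code.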

Compared with the optimal performance obtained from the systematic search, the learned performance is slightly lower on average (Fig. 2). We are currently investigating several hypotheses for this gap; for instance, the type of reinforcement and the level of noise may influence team performance after learning. Our experiments show that, although learning did not reach optimal performance, it does enhance the adaptability and stability of the whole team. We conjecture, though we have not yet tested this, that any learning model can only achieve a trade-off between optimality and adaptability.

Figure 2: Comparison of performance under different types of reinforcement and different degrees of team diversity.

If homogeneity is not enforced, an initially homogeneous team may specialize through learning (Fig. 3). Our results show that policies allowing specialization generally achieve similar or better performance than policies enforcing homogeneity (Fig. 2). We developed an ad hoc method to measure specialization and found that specialization grows sub-linearly with the number of robots.

Figure 3: In one simulation run, 4 robots all started with a gripping time of 210 s and had specialized by the end of the run.
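
The ad hoc specialization measure is not described on this page. As a hypothetical stand-in, and explicitly not the measure used in the study, one simple choice is the coefficient of variation of the team's gripping times: it is zero for a homogeneous team and grows as the robots' GTPs diverge. The function name specialization_index is illustrative.

```python
from statistics import mean, pstdev


def specialization_index(gtps):
    """Coefficient of variation of a team's gripping times (hypothetical).

    Returns 0.0 for a homogeneous team; larger values mean the robots'
    GTPs have spread further apart relative to their average.
    """
    avg = mean(gtps)
    if avg == 0:
        return 0.0  # GTPs are non-negative, so a zero mean means all are zero
    return pstdev(gtps) / avg
```

For example, a homogeneous start such as specialization_index([210.0] * 4) gives 0.0, while a split team such as specialization_index([100.0, 300.0]) gives 0.5. Because the ratio is scale-free, it does not depend on whether GTPs are measured in seconds or in simulation steps.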

Our future work includes further study of the impact of noise on learned solutions, and of how specialization depends on task constraints as well as on team size.

Rationale. The study of distributed learning in swarm systems enables us to design more powerful and more robust swarm systems that can work in changing environments and on tasks that cannot be handled by simple collective mechanisms alone (e.g., allocating different numbers of units to different tasks). Furthermore, our work helps us understand the behavior of complex systems consisting of many autonomous agents.

Publications/References

A. J. Ijspeert, A. Martinoli, A. Billard, and L. M. Gambardella. Collaboration through the exploitation of local interactions in autonomous collective robotics: the stick pulling experiment. Autonomous Robots, 11(2):149–171, 2001.

K. Lerman, A. Galstyan, A. Martinoli, and A. J. Ijspeert. A Macroscopic Analytical Model of Collaboration in Distributed Robotic Systems. Artificial Life, 7(4):375–393, 2001.

L. Li. Distributed Learning in Swarm Systems: A Case Study. M.S. thesis, California Institute of Technology, Pasadena, CA, 2002.

L. Li, A. Martinoli, and Y. S. Abu-Mostafa. Emergent Specialization in Swarm Systems. In H. Yin et al., eds., Intelligent Data Engineering and Automated Learning — IDEAL 2002, vol. 2412 of Lecture Notes in Computer Science, pp. 261–266. Springer-Verlag, Berlin, 2002.


Updated: 08/16/2002