Abstract.
Distributed learning is the process by which autonomous agents
learn to find individual strategies that maximize the team performance.
The major challenge in designing a distributed learning algorithm
is to solve the credit assignment problem in a highly dynamic
system where each agent has only partial and noisy information
about the global task. We investigate several learning techniques
in a concrete case study in collective robotics (the stick pulling
experiment). Our results, based on a faithful microscopic probabilistic
model, show that learning improves both the performance and the
adaptability of the swarm system, and that an initially homogeneous
team usually becomes specialized after learning.
Motivation and Aims.
Natural systems consisting of many agents, such
as ants, wasps, and termites, exhibit complex behavior which appears
to transcend the abilities of the relatively simple constituent
individuals. Artificial swarm systems based on swarm intelligence
consist of relatively simple autonomous agents. They are truly
distributed, self-organized, and inherently scalable since there
is no global control or communication mechanism. The agents are
designed to be simple and interchangeable, and may be dynamically
added or removed without explicit reorganization, making the collective
system highly flexible and fault tolerant.
When rules extracted from natural systems are applied to artificial
problems, the differences between the two settings generally require
different control parameters. Learning, as an automatic way to adjust
control parameters, is used to adapt the rules to new problems and to
improve performance. Learning also serves as a way to adapt to a
changing environment.
Research and Achievements.
We investigate several learning issues in
swarm systems through a case study: the stick pulling experiment.
This is a strictly collaborative problem where collaboration between
two non-communicating robots is required to complete the task.
Each robot in the experiment is characterized by a gripping
time parameter (GTP), which is the maximal length of time
that a robot waits for the help of another robot while holding
a stick. The goal of learning is to find a proper GTP for every
robot so that the team can pull up sticks from the holes in the
arena as quickly as possible. We base our experiments on a probabilistic
model that faithfully simulates experiments with real robots.
In order to compare learned performances with optimal solutions,
we have performed a systematic search in the parameter space and
measured the optimal performances of homogeneous and heterogeneous
teams consisting of 2 to 6 robots.
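As a rough illustration of the setup described above, the following Python sketch simulates a toy version of the stick pulling task and scans a grid of homogeneous GTP values. It is not the microscopic model used in this work: the function `simulate`, the encounter probability `p_find`, the number of sticks, and the one-unit time step are all assumptions chosen only to show how a GTP vector maps to a collaboration rate.

```python
import random


def simulate(gtps, n_sticks=4, p_find=0.1, steps=10_000, seed=0):
    """Toy stick pulling model (illustrative assumption, not the paper's model).

    gtps[r] is the gripping time parameter of robot r, in time steps.
    Returns the collaboration rate: successful pulls per time step.
    """
    rng = random.Random(seed)
    holder = [None] * n_sticks   # holder[s]: robot gripping stick s, or None
    held = [None] * len(gtps)    # held[r]: stick gripped by robot r, or None
    timer = [0] * len(gtps)      # remaining gripping time of robot r
    collaborations = 0

    for _ in range(steps):
        for r, gtp in enumerate(gtps):
            if held[r] is not None:
                # Robot is waiting for help; give up when the timer expires.
                timer[r] -= 1
                if timer[r] <= 0:
                    holder[held[r]] = None
                    held[r] = None
                continue
            if rng.random() >= p_find:
                continue  # searching robot found no stick this step
            s = rng.randrange(n_sticks)
            if holder[s] is None:
                # Free stick: grip it and wait, unless the robot never waits.
                if gtp > 0:
                    holder[s], held[r], timer[r] = r, s, gtp
            else:
                # Stick already held by another robot: the pull succeeds,
                # and both robots go back to searching.
                partner = holder[s]
                holder[s] = None
                held[partner] = None
                collaborations += 1
    return collaborations / steps


if __name__ == "__main__":
    # Systematic search over homogeneous teams of 3 robots (hypothetical grid).
    grid = range(0, 201, 25)
    best = max(grid, key=lambda g: simulate([g] * 3))
    print("best homogeneous GTP on this grid:", best)
```

Passing a list with different entries, such as `simulate([10, 150, 150])`, evaluates a heterogeneous team in the same toy model. Even this simplified version shows the trade-off the GTP controls: a robot that waits too briefly is rarely helped, while robots that all wait too long leave nobody free to help.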
By integrating learning ability into individual robots, the whole
team can adapt to environmental changes and can maintain
a near-optimal performance (Fig. 1). We tested several learning
algorithms, including adaptive line search and Q-learning. We
found that, for this case study, learning algorithms which directly
search for optimal parameters work much better than those based
on reward estimation.
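One way to read "directly searching for optimal parameters" is a one-dimensional search in which a robot nudges its own GTP and keeps the change only if the measured performance improves. The sketch below is a generic adaptive step-size search written under that assumption; it is not the algorithm evaluated in this work, and `adaptive_line_search`, `evaluate`, and all numeric defaults are hypothetical.

```python
def adaptive_line_search(evaluate, x0, step=10.0, grow=1.5, shrink=0.5,
                         iters=50, lo=0.0, hi=600.0):
    """Maximize evaluate(x) over [lo, hi] with an adaptive step size.

    A move that improves performance is kept and the step is enlarged;
    a move that does not is rejected, and the step is reversed and shrunk.
    """
    x, fx = x0, evaluate(x0)
    for _ in range(iters):
        candidate = min(hi, max(lo, x + step))  # stay inside the bounds
        f_candidate = evaluate(candidate)
        if f_candidate > fx:
            x, fx = candidate, f_candidate
            step *= grow       # success: be bolder in the same direction
        else:
            step *= -shrink    # failure: turn around with a smaller step
    return x


if __name__ == "__main__":
    # Stand-in objective with a single peak at GTP = 100 (illustration only).
    print(adaptive_line_search(lambda g: -(g - 100.0) ** 2, x0=10.0))
```

With a noisy performance signal, `evaluate` would typically average the collaboration rate over a time window or several trials, since a single lucky measurement stored in `fx` could otherwise block later improvements.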
Figure 1: The performance (collaboration rate) with learning. Individual
reinforcement was used and heterogeneity was allowed. Robots were
initially given a gripping time. With learning, they adjusted
their gripping time and achieved a higher performance. Different
colors represent experiments with different numbers of robots.
Error bars are standard deviations of performance over 50 runs.
Dashed curves are performance without learning.
Compared with the optimal performance obtained from the systematic
search, the learned performance is slightly lower on average (Fig. 2).
We are currently investigating several hypotheses as to why this happens.
For instance, the type of reinforcement and noise may influence
the team performance after learning. Our experiments show that,
although learning does not reach optimal performance, it does
enhance adaptability and stability of the whole team. As an untested
hypothesis, we conjecture that any learning model can only achieve
a trade-off between optimality and adaptability.
Figure 2: Comparison of performance with different reinforcement
and team diversity.
If homogeneity is not enforced, an initially homogeneous team may
specialize through learning (Fig. 3). Our results show that policies
allowing specialization generally achieve similar or better performance
than policies forcing homogeneity (Fig. 2). We developed an ad
hoc method to measure specialization, and found that specialization
grows sublinearly with the number of robots.
Figure 3: In one simulation, 4 robots started with a gripping time
of 210 s and had specialized by the end of the simulation.
Our future work includes further study of the impact of noise on learned
solutions, and measuring specialization as a function of
task constraints as well as team size.
Rationale.
The study of distributed learning in swarm systems enables us
to design more powerful and more robust swarm systems that can
work under changing environments and tasks which cannot be handled
simply by collective mechanisms (e.g., allocating different numbers
of units to different tasks). Furthermore, our work helps us
understand the behavior of complex systems consisting of many
autonomous agents.
Publications/References
A. J. Ijspeert, A. Martinoli, A. Billard, and L. M. Gambardella.
Collaboration through the exploitation of local interactions
in autonomous collective robotics: the stick pulling experiment.
Autonomous Robots, 11(2):149–171, 2001.
K. Lerman, A. Galstyan, A. Martinoli, and A. J. Ijspeert. A Macroscopic
Analytical Model of Collaboration in Distributed Robotic Systems.
Artificial Life, 7(4):375–393, 2001.
L. Li. Distributed Learning in Swarm Systems: A Case Study.
M.S. thesis, California Institute of Technology, Pasadena, CA,
2002.
L. Li, A. Martinoli, and Y. S. Abu-Mostafa. Emergent Specialization
in Swarm Systems. In H. Yin et al., eds., Intelligent
Data Engineering and Automated Learning — IDEAL 2002,
vol. 2412 of Lecture Notes in Computer Science, pp. 261–266.
Springer-Verlag, Berlin, 2002.