This thesis investigates several learning issues in swarm systems through a case study: the stick-pulling experiment. This is a strictly collaborative task, in which non-communicating robots must cooperate to succeed. We base our experiments on a probabilistic model that faithfully reproduces experiments with real robots. We extend systematic search with early stopping and obtain the optimal performance of fully heterogeneous teams of two to six robots.
By integrating learning into individual robots, the whole team can adapt to environmental changes and maintain near-optimal performance. We test several learning algorithms, including adaptive line search and Q-learning. We find, for this case study, that learning algorithms that directly search for optimal parameters perform much better than those based on reward estimation.
Compared with the optimal performance obtained from the systematic search, the learned performance is slightly lower on average. We discuss several issues that may prevent learning from finding the optimal parameters, such as differing reinforcement, noise, and adaptability. Our experiments show that, although learning does not reach optimal performance, it does enhance the adaptability and stability of the whole team. We conjecture, though this remains untested, that any learning model can achieve only a trade-off between optimality and adaptability.
Although the team is initially homogeneous, specialization is observed after learning. Our results show that policies allowing specialization generally achieve similar or better performance than policies enforcing homogeneity. We develop ad hoc methods to measure specialization, and find that this measure grows sub-linearly with the number of robots.