Intelligent Software Laboratory

(Multi-agent system Laboratory)

Computer Science and Communications

Engineering

at Waseda University, Japan

Videos

■ Research Videos 1 (Coordinated battles between two groups of 400 agents)

A combat game between two groups of 400 agents.

Two groups of 400 agents play a coordinated combat game. Agents shoot at their opponents, and an agent that is hit three times disappears. The group that loses all of its agents loses the game. As they learn, the agents settle into various attack formations with coordinated group-level strategies. Since both sides are learning, each acquires increasingly advanced and complex strategies over time.
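The elimination rule above can be sketched in a few lines of code (a toy illustration with hypothetical class and field names, not the lab's actual implementation):

```python
HITS_TO_ELIMINATE = 3  # an agent disappears after being hit three times

class CombatAgent:
    """Minimal agent for the two-team combat game described above."""
    def __init__(self, team):
        self.team = team
        self.hits_taken = 0

    @property
    def alive(self):
        return self.hits_taken < HITS_TO_ELIMINATE

    def shoot(self, target):
        # a hit accumulates on the target; the third hit removes it from play
        if self.alive and target.alive and target.team != self.team:
            target.hits_taken += 1

def team_defeated(agents, team):
    """The game ends when one side has lost all of its agents."""
    return all(not a.alive for a in agents if a.team == team)
```

In the actual videos each side has 400 such agents and the shooting behavior itself is learned, but the win condition is exactly this: the first team with no surviving agents loses.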


This is collective-action learning in which the agents form a line drawing as a whole; it is an application of the group combat game. Think of each point as an agent (for example, a drone in the sky). The agents cooperate to form a given shape as a whole. They have not been trained to produce any specific shape and can form arbitrary shapes: in fact, during training we generate patterns completely different from the shapes shown here (groups of randomly generated points) and use them to train the agents.
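One simple way to turn a randomly generated set of target points into per-agent goals is a greedy nearest-point assignment. The sketch below is our own illustration of that idea, not the learned policy described above:

```python
import math

def assign_targets(agents, targets):
    """Greedy nearest-point assignment: each agent, in order, claims the
    closest still-unclaimed target point. Positions are (x, y) tuples.
    A toy stand-in for whatever assignment the trained agents realize."""
    remaining = list(targets)
    assignment = {}
    for i, (ax, ay) in enumerate(agents):
        j = min(range(len(remaining)),
                key=lambda k: math.hypot(remaining[k][0] - ax,
                                         remaining[k][1] - ay))
        assignment[i] = remaining.pop(j)
    return assignment
```

Because the target set is regenerated randomly at training time, a policy trained this way is not tied to any particular shape, which matches the behavior shown in the video.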

■ Research Videos 2 (Multi-agent pickup and delivery problem by centralized control)

In the series of videos in this section, the agents are robots tasked with delivering materials located in many places to their individual destinations. There are 100 materials to be delivered in this environment. Each path is only as wide as one robot, so the robots avoid collisions either by taking detours (alternative routes) or by synchronizing (pausing and waiting). The parameter β determines which of these two choices is given priority. In addition, when the high likelihood of collisions makes it impossible or difficult for agents to work, they temporarily return to their garages to prevent excessive congestion in the environment. Note that in these videos, the action plans are generated and maintained by a centralized component. The width of each path is limited: if a path is narrow, an agent carrying materials must pass through it sideways, so the agent must rotate beforehand in a sufficiently wide area to orient itself properly.
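The role of β can be illustrated with a toy cost model (our own reading for illustration, not the cost function from the EUMAS paper): a large β makes detours expensive and so favors synchronization (waiting), while a small β favors detours.

```python
def choose_avoidance(extra_detour_steps, wait_steps, beta):
    """Toy model of the detour-vs-synchronization choice. Here beta acts
    as a weight on the detour cost, so a large beta (as in the video
    with beta = 800) pushes agents toward waiting, and a small beta
    (as with beta = 50) pushes them toward detours. Illustrative only."""
    detour_cost = beta * extra_detour_steps
    wait_cost = 100.0 * wait_steps  # 100 chosen so that beta = 100 is neutral
    return "wait" if wait_cost < detour_cost else "detour"
```

This reproduces the qualitative behavior reported in the videos below: β = 800 yields mostly waiting, β = 50 yields mostly detours.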

Experimental setting: The number of agents is 15, N_K = 3, N_P = 3 and β = 100 in Environment 1. (sorry, no voice for anonymity). See our EUMAS paper.

Legend: red node: parking location; blue nodes: pickup and delivery locations; filled green: small node (where an agent can wait but cannot rotate); hollow green: large node (where an agent can wait and rotate); gray rectangle: narrow edge; black rectangle: wide edge. Note that an agent carrying a big material has to rotate (at a large node) before passing a narrow edge. Most of the nodes where agents can wait are located at intersections and corners.
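The traversal rules in this legend can be summarized as two small predicates (a sketch with data types and names of our own choosing):

```python
from dataclasses import dataclass

@dataclass
class Node:
    large: bool  # large node: wait and rotate; small node: wait only

@dataclass
class Edge:
    narrow: bool  # narrow edges require a properly oriented agent

def can_rotate_at(node):
    """Rotation is possible only at a large node; at a small node an
    agent can wait but not rotate."""
    return node.large

def can_enter(edge, carrying_big_material, rotated):
    """An agent carrying a big material can pass a narrow edge only
    after rotating (at a large node) beforehand."""
    if edge.narrow and carrying_big_material and not rotated:
        return False
    return True
```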

Under these conditions, all agents moved around the environment smoothly without any conflicts. In the latter half, some agents returned to their parking locations; this is because they waited outside the environment due to task bias (i.e., overlapping work in particular locations). By having agents with no work wait in the garage, the system could reduce the chance of conflicts. The comb-shaped areas on both sides are the agents' parking locations (garages).


Experimental setting: The number of agents is 15, N_K = 3, N_P = 3 and β = 100 in Environment 2. (No voice for anonymity)

The second environment (Environment 2) has the same overall layout as the first (Environment 1), but we added some nodes not only at intersections but also on edges. These on-edge nodes usually connect a narrow edge and a wide edge, and an agent carrying a large material may have to rotate at such a node to pass the narrow edge. These on-edge nodes can reduce the number of wait actions, because a rotation at an intersection is more likely to block other agents' movement than one at a node on an edge.

Under these conditions, all agents moved around Environment 2 even more smoothly than in Environment 1, without any conflicts. Since a number of nodes at which agents can rotate or wait for synchronization were added in the middle of edges, agents could move more efficiently, with fewer chances of blocking the movement of other agents. The number of agents that returned to their parking locations in the latter half was also smaller than in Environment 1. The comb-shaped areas on both sides are the agents' parking locations (garages). In the following videos, only Environment 2 is used.


Experimental setting: PAPO (Env.2, No. of agents M = 40, N_K = 3, N_P = 3, β = 100) (No voice for anonymity)

The comb-shaped areas on both sides are the agents' parking locations (garages). Under this condition, the number of agents is large and the environment becomes congested easily. As a result, the agents' movements became inefficient due to the many possible conflicts (agents trying to move to the same node) and the additional planning needed to resolve these undesirable situations. Note that in our method some agents returned to their garages; this could prevent excessive congestion by reducing the number of agents in the environment.


Experimental setting: The number of agents is 15, N_K = 4, N_P = 1 and β = 50 in Environment 2. (No voice for anonymity)

In this experimental setting, the value of β is 50, which is very small. We can see that all agents moved around the environment smoothly without any conflicts.

With this small β, agents were less inclined to avoid collisions by shifting their timing through synchronization (i.e., waiting), and instead chose detour routes with higher probability. This reduced the number of unnecessary stops for synchronization, but it also increased the planning time slightly. Note that the comb-shaped areas on both sides are the agents' parking locations (garages).


  • Video 5, Priority to synchronization (wait)

Experimental setting: Submission 168, PAPO (Env. 2, No. of agents M = 15, N_K = 4, N_P = 1, β = 800)

In this experimental setting, β = 800, so the agents are more likely to use the synchronization strategy to avoid collisions, and many of them simply wait rather than take detours. Of course, it is not easy to determine whether a detour or synchronization is appropriate, since this depends on the topological structure of the surrounding paths and the locations of other agents. In this environment, reducing the value of β, as in Video 4, slightly reduces the overall moving time but increases the planning time. Note that the comb-shaped areas on both sides are the agents' parking locations (garages).


■ Research Videos 3 (Multi-agent pickup and delivery problem with fluctuating moving speed by fully decentralized control)

In the series of videos in this section, as in the previous series, the agents are tasked with delivering 100 materials, scattered over many locations, to their respective destinations. This time, however, there are fluctuations (mostly delays) in their movements that prevent them from following their plans exactly. The delay of one agent may therefore cause conflicts with other agents and force their plans to be modified, and these effects can spread to further agents in turn. Centralized plan generation and maintenance handles this poorly: delays occur simultaneously at many points, and the cost of computing how to eliminate their effects is high. Moreover, even after the plans are modified, they will need to be modified again at the next time step because of new delays. This is because centralized control performs a global calculation from a holistic view. In our study, the agents individually generate plans (i.e., paths) to their destinations, but since there are fluctuations in their movements, they check short-sightedly for conflicts only one or a few steps ahead; if there is no problem, they advance by a single node, and then repeat this operation. Because the movements of the agent itself and of other agents may fluctuate even after it advances one node, the agent proceeds while continually checking that there are no conflicts with other agents in its current plan.
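The plan-check-move loop described above can be sketched as follows (a schematic illustration; the class and method names are ours, not those of the actual system):

```python
class Agent:
    def __init__(self, path):
        self.path = list(path)  # the agent's own planned sequence of nodes
        self.position = None
        self.waited = False

    def wait_or_detour(self):
        # placeholder: a real agent would shift its timing (wait) or
        # replan a detour depending on the detour adjustment weight
        self.waited = True

def decentralized_step(agent, others, ccw_size):
    """One iteration of the myopic loop: check only the next ccw_size
    nodes of the plan against the nodes other agents are about to use,
    then advance a single node if the window is clear; otherwise
    resolve the predicted conflict and try again next step."""
    window = agent.path[:ccw_size]
    reserved = set()
    for other in others:
        reserved.update(other.path[:ccw_size])
    if any(node in reserved for node in window):
        agent.wait_or_detour()
    else:
        agent.position = agent.path.pop(0)  # move exactly one node
```

Repeating this single-node advance is what lets each agent absorb its own and others' delays without any centralized replanning.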

  • Video 1 (Fully distributed, fluctuating moving speed)

Conditions: the number of agents is 14, the CCW size R is 8, the detour adjustment weight δ is 0.1, and the standard deviation of the Gaussian noise on moving speed is 0.1. (No voice for anonymity)

We set the size R of the collision check window (CCW, i.e., how far ahead an agent looks when deciding how to act) to 8 (R = 8), and the detour adjustment weight δ, applied to detours when conflicts (such as collisions) are predicted, to 0.1 (δ = 0.1). In this case, since an agent proceeds carefully, checking far ahead, it tends to detect the possibility of a collision early, and when it does, it is more likely to resolve it by synchronization (i.e., by easy means such as shifting its timing by waiting) than by taking a detour. As a result, there are few places where traffic congestion occurs, and when it does occur, the congestion is light and quickly relieved. However, because the agents are very cautious, the planning time is slightly longer (though it is very small compared with the time spent on actions). Note that the red nodes in the video indicate the locations that agents attempt to reserve for their stay within the CCW. The nodes connected in a comb-like pattern are the agents' parking lot.

Note that each agent's velocity is assumed to fluctuate with Gaussian noise (standard deviation 0.1, i.e., 10%). However, the noise is treated as a delay only: it affects the speed only in the direction of slowing the agent down.
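A minimal model of this delay-only fluctuation, under the assumption that the Gaussian noise is simply folded onto the positive (delay) side:

```python
import random

def noisy_travel_time(base_time, sigma=0.1):
    """Fluctuation model as described above: Gaussian noise with
    standard deviation 0.1 (10% of the base travel time), folded so
    that it acts only as a delay and never as a speed-up."""
    delay = abs(random.gauss(0.0, sigma)) * base_time
    return base_time + delay
```

Because the perturbation is always non-negative, every traversal takes at least its nominal time, which is why the agents' plans drift only toward lateness.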


  • Video 2 (Fully distributed, fluctuating moving speed)

Conditions: the number of agents is 14, the CCW size R is 1, the detour adjustment weight δ is 0.1, and the standard deviation of the Gaussian noise on moving speed is 0.1. (No voice for anonymity)

We set the size R of the collision check window (CCW, i.e., how far ahead an agent looks when deciding how to act) to 1 (R = 1), and the detour adjustment weight δ, applied to detours when conflicts (such as collisions) are predicted, to 0.1 (δ = 0.1). In this case, the agents are short-sighted and check only the node immediately ahead, so they notice the possibility of a collision at the very last moment, and even then they try to resolve it by easy synchronization (i.e., waiting and shifting their timing). As a result, traffic congestion is likely to occur, and even when detours are necessary, resolving the congestion may take a long time because of this easy "waiting." Eventually, however, an agent realizes that a detour is necessary and the traffic jam is relieved. On the other hand, the planning time is shorter. The red nodes in the video show the locations that agents attempt to reserve for their stay within the CCW. The nodes connected in a comb-like pattern are the agents' parking lot.

The speed fluctuation is the same as in the previous video.