In addition to unicast, there is also multicast and, to some extent, anycast. For multicast and anycast, the IP destination address refers to a group of IP hosts. With multicast, a packet sent to the group address should reach all hosts in the group, while with anycast it should reach only one of the hosts in the group.
We will briefly describe how multicast works and look at the different technologies involved.
We're mainly interested in multicast as an IP service, but for multicast to work there also needs to be support at the link layer, often called layer 2. This is discussed in [MCL2].
Another model is the so-called "Source-Specific Multicast" [SSM]. In contrast, the classical model is often called "Any-Source Multicast" (ASM). With SSM, a host joins not G but the pair (S,G), where S is a unicast IP address and G is a multicast group. S specifies a particular source: by joining (S,G), the host says that it wants packets from source S and not from any other source. Note that a host may join several sources for the same group.
Put another way, when a source S sends a packet to G, it should reach all hosts that have joined either G or (S,G). As we will see later, things are not quite this simple, but this is the basic idea.
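The basic delivery rule can be written as a simple predicate. This is a minimal sketch that ignores how routers actually build the necessary state; the `Membership` type and the `should_deliver` function are hypothetical names used only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Membership:
    """What one host has joined: plain groups (ASM) and (source, group) channels (SSM)."""
    groups: set[str] = field(default_factory=set)
    channels: set[tuple[str, str]] = field(default_factory=set)


def should_deliver(host: Membership, source: str, group: str) -> bool:
    # A packet from S to G should reach a host that joined G, or that joined (S,G).
    return group in host.groups or (source, group) in host.channels
```

For example, a host that joined only the channel ("192.0.2.10", "232.1.1.1") gets packets from 192.0.2.10 to that group, but not packets sent to the same group by any other source.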
The pair (S,G) is sometimes called a channel. A channel can have many listeners but only one sender. There are specific address ranges reserved for SSM use, although a host may also join specific sources for groups outside these ranges.
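To our understanding, the SSM ranges are 232.0.0.0/8 for IPv4 and FF3x::/32 for IPv6 (see [RFC3569]). A check using Python's standard `ipaddress` module could look like the sketch below; the function name `is_ssm_group` is hypothetical.

```python
import ipaddress

SSM_V4 = ipaddress.ip_network("232.0.0.0/8")


def is_ssm_group(addr: str) -> bool:
    """Return True if addr lies in the IPv4 or IPv6 SSM range."""
    ip = ipaddress.ip_address(addr)
    if ip.version == 4:
        return ip in SSM_V4
    b = ip.packed
    # FF3x::/32: first byte 0xff, flags nibble 3, any scope nibble x, next 16 bits zero.
    return b[0] == 0xFF and (b[1] >> 4) == 0x3 and b[2] == 0 and b[3] == 0
```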
SSM is quite useful when there is only a single source, for instance a television broadcast. The source address can be announced together with the group address, and SSM avoids problems with other people sending to the same group; a TV broadcaster would typically want to prevent others from sending to the channel/group it is using. In general, SSM works well with a relatively small and fixed set of sources. Each receiver must somehow be told what the set of sources is, and possibly be able to handle changes to this set at any time. This is done at the application level. With ASM, the whole source discovery problem is solved at the network level, which simplifies applications with many sources, such as video conferencing, quite a bit. For a discussion of this, and of how one may emulate ASM with SSM, see [SSMDISC] and [SSMMULTI]. For more information on SSM, see also [RFC3569].
In order to receive multicast, a host needs to join the groups it wishes to receive. In the case of SSM, the host must also specify source addresses. This is done using IGMP for IPv4 and MLD for IPv6. A multicast router periodically sends queries to the hosts on the link, asking which groups they are listening to. A host should also send a join message when it joins a new group and, if possible, a leave message when it leaves. For a host to specify both source and group, as SSM requires, IGMPv3 or MLDv2 is needed. For more information, see [RFC3376] on IGMPv3, [RFC2710] on MLD and [MLD2] on MLDv2.
Note that a multicast router does not care exactly which hosts are members of which groups, nor how many there are. All it needs to know is whether there is at least one listener for a given group or a given source-group pair.
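From the router's point of view, membership on a link therefore reduces to "is anyone listening?" per group or per (S,G) pair, kept alive by reports and expiring otherwise. The sketch below models only that idea; the `LinkMembership` class, its method names and the timeout value are hypothetical choices, not taken from [RFC3376].

```python
class LinkMembership:
    """Per-link listener state as a multicast router might keep it (simplified)."""

    def __init__(self, timeout: float = 260.0):
        self.timeout = timeout  # assumed membership lifetime in seconds
        # Key is a group G, or a (source, group) pair for SSM; value is the expiry time.
        self._expires: dict[object, float] = {}

    def report(self, key, now: float) -> None:
        # Any report refreshes the entry; which host sent it does not matter.
        self._expires[key] = now + self.timeout

    def leave(self, key) -> None:
        # Simplification: a real router would first query the link for remaining listeners.
        self._expires.pop(key, None)

    def has_listeners(self, key, now: float) -> bool:
        return self._expires.get(key, float("-inf")) > now
```

Note that two hosts reporting the same group only move the expiry time; the router never counts them.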
In order to support ASM, PIM-SM takes care of source discovery by using a rendezvous point (RP). Simply put, this is a router in the PIM domain that knows which sources exist and which groups have listeners. When a new source starts sending, a multicast router on the same link as the source will initially send "PIM register" messages to the RP, unless the source is sending to an SSM group. The registers are sent as unicast and contain the multicast packet. The RP will then usually send a "PIM register-stop" message back. If it knows of any listeners, it will first send "PIM (S,G)-join" messages towards the source and make sure it receives the multicast packets natively, before sending the register-stop. The routing table is used to determine where to send the (S,G)-joins, by looking up S. The RP sends the join to a neighbouring PIM router, which in turn sends a new join towards S. Eventually the join reaches a router on the same link as S, and the packets sent by S are forwarded along the path built by the joins. Note that PIM "register" and "register-stop" messages are sent as unicast directly to the RP or the edge router, respectively. Other PIM messages are sent as multicast on the local link only, reaching only the other PIM routers on that link.
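The first-hop router's side of the register process can be sketched as a small state machine. This is a simplification assuming a single RP for all groups; the `FirstHopRouter` class and the message tuples are hypothetical and do not reflect PIM packet formats (for instance, periodic register probes after a register-stop are left out).

```python
import ipaddress

SSM_V4 = ipaddress.ip_network("232.0.0.0/8")


class FirstHopRouter:
    """Register behaviour of a PIM-SM router on the same link as a source (simplified)."""

    def __init__(self, rp_address: str):
        self.rp_address = rp_address  # assumed: one RP for every group
        self.register_stopped: set[tuple[str, str]] = set()

    def on_data(self, source: str, group: str, payload: bytes):
        """Return the register message to unicast to the RP, or None if none is sent."""
        if ipaddress.ip_address(group) in SSM_V4:
            # No registers for SSM groups; data only follows (S,G) state built by joins.
            return None
        if (source, group) in self.register_stopped:
            # The RP has asked us to stop; data now flows natively along the (S,G) path.
            return None
        # Encapsulate the multicast packet in a unicast register message to the RP.
        return ("register", self.rp_address, source, group, payload)

    def on_register_stop(self, source: str, group: str) -> None:
        self.register_stopped.add((source, group))
```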
When a PIM router learns that it has directly connected hosts listening to a group G, or receives a (*,G)-join from another router, it will itself send a (*,G)-join towards the RP. Each router looks up the RP address in the routing table to determine where to send the join. This builds a so-called shared tree, or RP tree (RPT), from the RP to the receivers. Multicast packets received by the RP are forwarded down this tree.
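Both the (*,G)-joins towards the RP and the (S,G)-joins towards a source are propagated hop by hop using the unicast routing table. The sketch below assumes a hypothetical `next_hop` table mapping (router, target) to the neighbour towards that target, and shows how the union of the join paths forms a tree rooted at the target.

```python
def build_tree(next_hop: dict[tuple[str, str], str],
               joiners: list[str], root: str) -> set[tuple[str, str]]:
    """Return tree edges (upstream, downstream) created by joins from `joiners` towards `root`."""
    edges: set[tuple[str, str]] = set()
    for router in joiners:
        while router != root:
            upstream = next_hop[(router, root)]  # routing table lookup of the root address
            if (upstream, router) in edges:
                # This branch already exists: the join merges into the tree here.
                break
            edges.add((upstream, router))
            router = upstream
    return edges
```

With two edge routers E1 and E2 that both reach the RP via R1, the result is the three edges RP to R1, R1 to E1 and R1 to E2; the second join stops at R1.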
Note that when an edge router (one with connected listening hosts) starts receiving packets on the RPT, it may, as an optimization, build so-called shortest-path trees (SPTs) towards the sources instead of receiving data through the RP. It does this by sending (S,G)-joins towards the source, just as the RP does, and once it starts receiving on the SPT it can prune the source from the shared tree. Some routers do this when they receive the first packet from a new source, others do it only if the data rate from the source is above a certain threshold, and some never do. It is possible to use SPTs for some sources of a group and the RPT for the other sources. Also, if hosts join explicit sources, the router can build the SPTs at once, without first receiving packets from the RP. Thus if all hosts join explicit sources, there is no need for the RP.
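The switchover behaviours described above (on the first packet, above a rate threshold, or never) can be captured as a per-router policy. The `SptPolicy` names, the kbit/s unit and the `should_join_spt` function are assumptions for illustration; real routers expose this through their own configuration.

```python
from enum import Enum


class SptPolicy(Enum):
    IMMEDIATE = "immediate"  # switch on the first packet from a new source
    THRESHOLD = "threshold"  # switch once the source's rate exceeds a limit
    NEVER = "never"          # always stay on the shared tree (RPT)


def should_join_spt(policy: SptPolicy, rate_kbps: float, threshold_kbps: float = 0.0,
                    explicit_source_join: bool = False) -> bool:
    # Hosts that joined (S,G) explicitly need the SPT whatever the policy says.
    if explicit_source_join:
        return True
    if policy is SptPolicy.IMMEDIATE:
        return True
    if policy is SptPolicy.THRESHOLD:
        return rate_kbps > threshold_kbps
    return False
```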
Usually hosts join specific sources only for SSM. When they don't, the RP is needed for source discovery; if hosts always joined specific sources, it wouldn't be needed at all. PIM routers need to be configured with which RP to use for which groups. That is, for every group where a host should be able to just join the group (rather than a specific source), an RP must be defined. There can be different RPs for different groups. However, within a single PIM domain, all PIM routers must be configured with the same RP address for a given group in order to have full connectivity throughout the domain.
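A static group-to-RP configuration can be thought of as a table of group ranges, looked up by longest prefix match. The sketch below uses Python's `ipaddress` module; the table contents and addresses are made-up examples, and every router in the domain would need an identical table.

```python
import ipaddress

# Hypothetical static configuration: group range -> RP address.
# (A real configuration would also exclude the SSM range, which is not done here.)
RP_TABLE = {
    ipaddress.ip_network("224.0.0.0/4"): "10.0.0.1",     # default RP for all groups
    ipaddress.ip_network("239.192.0.0/14"): "10.0.0.2",  # a more specific range
}


def rp_for_group(group: str, table=RP_TABLE):
    g = ipaddress.ip_address(group)
    matches = [net for net in table if g in net]
    if not matches:
        return None  # no RP defined: only source-specific joins can work for this group
    best = max(matches, key=lambda net: net.prefixlen)  # longest prefix wins
    return table[best]
```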
The RP configuration can be quite simple: just a single RP for all groups, configured statically on each router. There are also dynamic protocols like BSR and Auto-RP. Both work by having candidate RPs announce themselves. For each group for which there is a candidate, one is elected. If the elected RP stops sending candidate announcements, another candidate can be elected. If there is a network failure, say a link goes down, the network may be split into separate parts with different RPs, where multicast still works within each part. Auto-RP information is itself distributed using multicast routing, so one needs either a static RP configuration for the Auto-RP groups or, perhaps, PIM-DM for those groups. BSR, on the other hand, is part of the PIM specification and has its own mechanism for flooding BSR messages throughout the network.
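Candidate-RP election with fail-over can be sketched for a single group range as below. As a simplification loosely modelled on Auto-RP, where we understand the candidate with the highest IP address to win, the sketch elects the highest-addressed candidate whose announcements have not timed out; BSR uses a different, hash-based selection. The `RpElection` class and the hold time are hypothetical.

```python
import ipaddress


class RpElection:
    """Elect an RP among candidates that are still announcing themselves (simplified)."""

    def __init__(self, holdtime: float):
        self.holdtime = holdtime
        self.last_seen: dict[str, float] = {}  # candidate RP address -> last announcement time

    def announce(self, candidate: str, now: float) -> None:
        self.last_seen[candidate] = now

    def elected(self, now: float):
        alive = [c for c, t in self.last_seen.items() if now - t <= self.holdtime]
        if not alive:
            return None
        # Compare as addresses, not strings ("10.0.0.9" < "10.0.0.10" numerically).
        return max(alive, key=ipaddress.ip_address)
```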
With MSDP, one sets up peerings between pairs of RPs; the peerings are TCP sessions. When an RP learns of a new source through PIM-SM, it announces it to its MSDP peers. A router receiving a source announcement from one peer will in turn forward it to its other peers. In this way, source announcements are flooded throughout a network of peers. When an RP receives a source announcement for a group with local interest, meaning that someone has previously sent a (*,G)-join to this RP, it sends an (S,G)-join towards the source S, building an SPT from the source in the other domain. Data received on the SPT can then be forwarded as usual.
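The flooding of a source announcement over the MSDP peer graph can be sketched as a breadth-first traversal. Real MSDP uses peer-RPF checks to decide which announcements to accept; this sketch substitutes a simpler "already seen" check, and the peer graph in the example is hypothetical.

```python
from collections import deque


def flood_sa(peers: dict[str, list[str]], origin_rp: str) -> dict[str, str]:
    """Return, for each RP reached, the peer it first heard the announcement from."""
    heard_from = {origin_rp: origin_rp}
    queue = deque([origin_rp])
    while queue:
        rp = queue.popleft()
        for peer in peers.get(rp, []):
            # Forward to all peers; an RP that has already seen the announcement drops the copy.
            if peer not in heard_from:
                heard_from[peer] = rp
                queue.append(peer)
    return heard_from
```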
MSDP is also sometimes used between routers in the same domain. Perhaps the most common example is Anycast-RP [RFC3446]. In a multicast domain there should normally be a single RP for a given group, but with the help of MSDP there can be multiple RPs configured with the same RP address. Each router will then use the "closest" RP with respect to the domain's routing, so different routers may use different RPs, but thanks to MSDP there is still full multicast connectivity. This allows load balancing and also gives faster fail-over, since the RP taking over will already know about the sources.
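With Anycast-RP, each router simply routes towards the shared RP address, and unicast routing picks the nearest instance. The sketch below models this with hypothetical per-router path costs to each physical RP; removing an RP from a router's cost table corresponds to routing converging on the remaining one after a failure.

```python
def nearest_rp_instance(costs: dict[str, float]) -> str:
    """Pick the RP instance with the lowest routing cost from this router."""
    return min(costs, key=costs.get)


# Two edge routers configured with the same RP address reach different physical RPs.
costs_from = {
    "edge-a": {"rp-1": 1, "rp-2": 5},
    "edge-b": {"rp-1": 5, "rp-2": 1},
}
choice = {router: nearest_rp_instance(c) for router, c in costs_from.items()}
```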
IPv6 has a well-defined structure of scopes for multicast. There is also a new BSR specification that supports scoping, see [BSR]. Using this, it may be possible to use PIM-SM in larger networks than usual. This is done in, e.g., the M6Bone [M6BONE]. It still requires some trust and coordination, though, so it's not an ideal solution.
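In an IPv6 multicast address, the scope is the low four bits of the second byte; FF0E::1, for instance, has global scope. The sketch below extracts it using the scope values defined in the IPv6 addressing architecture; the `multicast_scope` function is a hypothetical helper.

```python
import ipaddress

SCOPES = {
    0x1: "interface-local",
    0x2: "link-local",
    0x4: "admin-local",
    0x5: "site-local",
    0x8: "organization-local",
    0xE: "global",
}


def multicast_scope(addr: str) -> str:
    ip = ipaddress.IPv6Address(addr)
    if not ip.is_multicast:
        raise ValueError(f"{addr} is not an IPv6 multicast address")
    scop = ip.packed[1] & 0x0F  # low nibble of the second byte
    return SCOPES.get(scop, f"unassigned/reserved ({scop:#x})")
```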
One way of making things scale better could be to let each organization have a block of multicast addresses and its own RP for that block. When someone in the organization creates a multicast session, a group from the organization's address space is used, and everyone uses that organization's RP. The biggest problem here is how everyone can learn the RP address. It can't be done with dynamic protocols like BSR, since they require too much cooperation. One proposed solution is embedded-RP [EMBRP], which specifies how the RP address can be encoded in the group address. When someone sends to or joins a group, routers supporting embedded-RP can then immediately obtain the RP address from the group address.
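Decoding an RP address from a group address could look like the sketch below. It assumes the layout later published as RFC 3956 (flags nibble 7, a 4-bit RIID, an 8-bit prefix length, a 64-bit network prefix and a 32-bit group ID); the draft cited as [EMBRP] may differ in detail, and `embedded_rp` is a hypothetical helper name.

```python
import ipaddress


def embedded_rp(group: str) -> ipaddress.IPv6Address:
    """Extract the RP address encoded in an embedded-RP group address."""
    b = ipaddress.IPv6Address(group).packed
    if b[0] != 0xFF or (b[1] >> 4) != 0x7:
        raise ValueError("not an embedded-RP group (flags nibble must be 7)")
    riid = b[2] & 0x0F  # RP interface ID, low nibble of the third byte
    plen = b[3]         # length of the RP's network prefix
    if not 1 <= plen <= 64:
        raise ValueError("prefix length must be between 1 and 64")
    prefix = int.from_bytes(b[4:12], "big")     # 64-bit network prefix field
    prefix &= ((1 << plen) - 1) << (64 - plen)  # keep only the first plen bits
    return ipaddress.IPv6Address((prefix << 64) | riid)  # RIID becomes the low bits
```

For example, the group FF7E:0140:2001:DB8:BEEF:FEED::1234 carries RIID 1 and a 64-bit prefix, giving the RP address 2001:db8:beef:feed::1.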
The PIM-SM specification is also being revised, see [PIMREV]. It's only a minor update, and routers using the old and new specifications should interoperate.
Another new PIM protocol is Bidirectional PIM [BIDIR]. The most important difference from standard PIM-SM is that there is no PIM register. On each link in the multicast topology, a so-called designated forwarder (DF) is elected. The DF is the router on the link that is closest to (has the best route to) the RP address (RPA). The DF is responsible for forwarding downstream data (data going down the tree, from the RPA towards receivers) onto the link, and also for forwarding upstream data (data going from a source towards the RPA) that it sees on the link onto its own link towards the RPA. By doing this election on all links, a tree is built for each RPA. This tree is used as the usual shared tree in PIM, but also for sending data from sources towards the RPA. In other words, it's bidirectional.
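The per-link DF election can be sketched as choosing the router with the best route to the RPA. The sketch assumes a single numeric metric where lower is better and, as a tie-breaker, the highest router address; the exact comparison rules in [BIDIR] may differ, and `elect_df` and the example data are hypothetical.

```python
import ipaddress


def elect_df(offers: dict[str, float]) -> str:
    """offers maps each router address on the link to its route metric towards the RPA."""
    # Lowest metric wins; among equal metrics, the highest address wins.
    return min(offers, key=lambda r: (offers[r], -int(ipaddress.ip_address(r))))


# Running the election on every link yields the tree rooted at the RPA.
links = {
    "lan-1": {"10.1.0.1": 20, "10.1.0.2": 10},
    "lan-2": {"10.2.0.1": 10, "10.2.0.3": 10},
}
dfs = {link: elect_df(offers) for link, offers in links.items()}
```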
To actually build the forwarding state needed to get data forwarded towards receivers, (*,G)-joins must still be sent. Each router sends the joins towards the RPA, to the DF on its upstream link; i.e., the joins follow the tree built by the DFs. Forwarding from source to RPA is quite simple: provided there is a known RPA for the group, each DF for that RPA simply sees the packets on its link and forwards them upstream. This way, the PIM register process is avoided entirely. The necessary state, the DF election, is established before the source starts sending. In fact, there is no need for an RP router at all. One only needs an RP address that tells where in the network the tree should be rooted, and this address need not belong to any router (or host). Not needing an RP router is a great benefit. Since only shared trees are used, no router needs source-specific state, which is also a big improvement for groups with large numbers of sources. Also, in the parts of the shared tree where there are only sources, no group-specific state is needed either.
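Upstream forwarding in bidir needs no source-specific state: on each link, only the DF picks the packet up and sends it onto its own link towards the RPA. The sketch assumes hypothetical tables `df` (link to its DF) and `upstream_link` (router to its link towards the RPA), and traces the links a packet crosses; note that the source address is not an input at all.

```python
def path_to_rpa(source_link: str, df: dict[str, str],
                upstream_link: dict[str, str], rpa_link: str) -> list[str]:
    """Links a packet from a source on `source_link` traverses until it reaches the RPA's link."""
    links = [source_link]
    link = source_link
    while link != rpa_link:
        router = df[link]             # only the DF forwards upstream from this link
        link = upstream_link[router]  # ...onto its own link towards the RPA
        links.append(link)
    return links
```

The RPA's link may contain no router owning the RPA; the packet simply ends up on that link, from where DFs forward it down the branches that have receivers.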
In order to support SSM, and ASM with source-specific joins, one would still need to support (S,G)-joins to build SPTs, though. We think it should be possible for a bidir implementation to support that as well. Of course, that will require per-source state, and avoiding such state is one of the advantages of bidir. Passing all traffic via the RP may also cause longer delays than using an SPT.
| testnett-gruppe@uninett.no | 2004-01-08 |