Prefix Label Forwarding (PLF) - Modified Header Forwarding for IPv6

Previously titled: Ivip6 - instead of map-encap, use the 20 bit Flow Label as a Forwarding Label (in the /ivip6/ directory).

2010-01-07: I intend to write this up as an Internet Draft, but for now, this page and lfe/ (Label Forwarding in Edge Networks) are the best account of this work in progress.

To the main Ivip page

Robin Whittle rw@firstpr.com.au 2008-08-07 (Updated 2008-08-18)
A Short Explanation is here: psg.com/lists/rrg/2008/msg02076.html (2008-08-04) - including why this proposal is unrelated to MPLS. Also, this is just for IPv6 packets, not any other protocol. Below is a chart of the short explanation, followed by a long explanation.
I have had to invent new terminology, since this is a novel approach. Initially I called this proposal "Flow6", then "Flow Label Forwarding", then "Label Forwarding in the Core".

As per the notes at the main Ivip page, I am now calling this approach Prefix Label Forwarding (PLF). This is only for IPv6. A somewhat similar approach for IPv4 is ETR Address Forwarding (EAF). Both involve reusing bits in the existing IP header, and both require relatively minor upgrades to most DFZ routers before they could be deployed at all.

For IPv6, 20 bits are available, and I use 2^19 of these 2^20 combinations to forward the packet to a border router which advertises the prefix in which the ETR is located.

For IPv4 (EAF), 30 bits are available, and I use these to encode the 30 most significant bits of the ETR's address. Then (via suitably modified core and internal routers) the packet is forwarded all the way to the ETR. I have not yet written a proper description of EAF. Below is a good description of PLF, but please read the Short Explanation above first.

Note: the bit numbers I give here are ordinary binary bit numbers, where 0 is the least significant. I need to update this to also show the IETF bit numbers, where 0 is the most significant.
Contents

>> Chart of the example
>> Historical background
>> Two things which would need to be done for this proposal to be practical
>> Advantages over map-encap (LISP, APT, Ivip4 and TRRP)
>> Advantages over Translation (Six/One Router)
>> The "Flow Label" to become the "Forwarding Label"
>> Terminology and main concepts
 > Conventional networks
 > SEN - Scalable End-user Network
 > SPI - Scalable PI address space
 > Traffic Engineering - load sharing
 > Micronet
 > UAB - User Address Block
 > MAB - Mapped Address Block
 > ITR - Ingress Tunnel Router
 > ITRD - ITR with full mapping Database
 > ITRC - ITR with Cache
 > ITRH - ITR function in sending Host
 > OITRD - Open Ingress Tunnel Router in the DFZ
 > ETR - Egress Tunnel Router
 > CEP - Core Egress Prefix
 > FLPER - Forwarding Label Path Exit Router
>> Mapping system
>> QSD - Query Server with full Database
>> QSC - Query Server with Cache
>> Tutorial by way of example - Detailed Explanation
 > Enhanced RIB functionality
 > Enhanced FIB functionality
>> Transition: non-upgraded networks
>> Transition: non-upgraded core routers
>> PMTUD (Path MTU Discovery)
>> TTR Mobility

On other pages:

>> Using the Label Forwarding approach in Edge networks too, including:
 > List of new functions for core routers

Introduction
This page discusses one of the two approaches I am suggesting for a new scalable routing and addressing architecture for the Internet. See the Ivip page for the full context.

The proposal here is not a map-encap (LISP, APT, Ivip4, TRRP etc.) or translation (Six/One Router) scheme - it uses a different technique to get packets across the interdomain routing "core" of the IPv6 Internet.

It does this by using most, or all, of the 20 bit IPv6 Flow Label field for a completely different purpose from what it was intended for - as a Routing Label instead. This proposal would require some modest enhancements to the FIB and RIB functions of core BGP routers. I guess these enhancements could be achieved with software updates. There is no change to the BGP behaviour of the routers. I guess this is practical, considering it will be 2015 or later before IPv6 usage is large enough that a scalable routing solution actually needs to be implemented. IPv6 has a thousand or so advertised prefixes in 2008. It is the IPv4 Internet which has the scaling problem, with 260k+ prefixes (bgp.potaroo.net).
For now I am calling this method of transporting an otherwise unmodified packet across the interdomain core of the IPv6 Internet:

Prefix Label Forwarding (PLF) - previously Label Forwarding in the Core (LFC).

The 20 bits are used by each core router to directly look up (index into an array in the FIB) the Forwarding Equivalence Class for the packet, so it is easy for the router to forward the packet towards the desired network.

These 20 bits will be set by an ITR (Ingress Tunnel Router) before the packet reaches the core, and the rest of the packet, including its source and destination addresses, is not affected. The destination address is ignored by these modified core routers, so the destination provider network to which the packet will be forwarded is set by the ITR, which sets these bits according to the ETR mapping of the micronet which matches the packet's destination address.

This only applies to a special set of prefixes advertised by providers who perform the ETR function - provider networks used by the new kind of end-user network.
To understand the following fully, you will need to understand map-encap proposals in general, and ideally the Ivip proposal in detail. The 8 Page Conceptual Summary and Analysis of Ivip ( ../ ) is the best place to start.

This page discusses my specific proposal for using this apparently novel technique for Ivip6. Further to this, and discussed in a separate page linked to below, I also propose using the same techniques in private networks: Label Forwarding in Edge Networks (LFE). This second purpose is not directly related to the routing scaling problem, but I think it is worth exploring at the same time.

Ivip6 is the new name for the proposal initially known as FLOWv6.
I made the FLOWv6 proposal on the RRG list, on 2008-07-31:
Previously, I discussed the Ivip map-encap system primarily for IPv4, but also for IPv6, although I was less enthusiastic about it for IPv6, due in part to the heavier encapsulation overhead caused by the IPv6 IP header being 40 bytes, rather than IPv4's 20 bytes. See the start of this message for some examples of map-encap overhead for IPv6: psg.com/lists/rrg/2008/msg02034.html (2008-07-31).

I will now use the term Ivip to refer to the general architecture and common elements of the two scalable routing and addressing proposals:

- Ivip4 - Map-encap tunneling for IPv4 only.

- Ivip6 - Using the current Flow Label bits as a Forwarding Label, to carry otherwise unmodified packets from ITRs to the border router which advertises the prefix in which the ETR is located - with somewhat modified RIB and FIB functions in the core IPv6 routers.

I believe this approach has many benefits over the alternatives: map-encap and translation (Six/One Router). These are listed in sections below.
Below is a revised version of this proposal. This design is in an early stage of development, and some parts of the description discuss finer points which are not central to understanding the operation of the system. I put these parts in grey.

It will take me some time to refine the design and find a good way of presenting it, with the more detailed discussion well separated from the basic description. Then I will write it up as an Internet Draft.

Some of these finer details relate to the complexities of the various business and network models the system will need to work in. This may appear as complexity of the Ivip6 system, and to a degree this is true. However, if the other proposals were fully documented, by exploring all the relevant network and business scenarios and solving all the problems which arise, they too would be seen to involve at least this level of complexity.

Please let me know of any criticisms you have about the design, suggestions for improvement etc. I will answer any queries and will be happy to discuss it further by phone.

Later, I intend to add separate pages on:

This is not the place to fully explain map-encap in general or Ivip in particular. Please refer to the Ivip page for that. Nonetheless, the following explanation does mention some pertinent parts of Ivip.
Chart of the example

Here is a chart of how the packet is handled - for the short explanation listed above, and for the long explanation which follows.

This sequence shows the Destination Address of the packet and the 20 bits of the proposed Forwarding Label (the old Flow Label) as the packet is sent from one host in a provider network to another host in an end-user network, via an internal router, an ITR border router, two core transit routers and the border router of the recipient provider network used by the end-user network. The ITR could be in various locations, and the sending host could be in an end-user network, or in a network with no ITRs - in which case the ITR will be an OITRD. In all cases, the same events happen:

- The ITR looks up the mapping for the destination address' micronet - and writes the 20 bit result to the Forwarding Label.

- The ordinary FIB function of this and subsequent core routers uses this 20 bit value to index into an array of FEC values, which quickly tells the FIB which interface to forward the packet on.

- The packet is forwarded across the core like this and arrives at a border router of the provider network which the destination host's end-user network is using for connection to the Net.

The Destination Address is never changed, nor is the Source Address or any other part of the packet - only the 20 bit Forwarding Label.
Sending host:

    The packet is emitted, with the destination address being that of a host which is located in an end-user network which uses the new kind of Ivip-managed "micronet" address space. All such space is within 4::/3.

        Forwarding Label  = 0 0000
        Destination Addr  = 4000:0050:7000:1234::33

Router 1 - internal router:

    The packet is forwarded towards the border router, because it matches 4::/3 and because there is an internal route for this prefix, leading to the nearest border router.

        Forwarding Label  = 0 0000
        Destination Addr  = 4000:0050:7000:1234::33

Router 2 - border router which is also an ITR:

    Step 1 - the ITR function of the FIB:

    The ITR function of the FIB recognises that the Destination Address is within the 4::/3 prefix which covers all the Ivip-managed new form of address space, for scalable support of potentially millions of end-user networks with portable, multihomable address space. The FIB does a mapping lookup (to a local query server at first, but subsequent packets with the same Destination Address use the cached result). The lookup is for the Destination Address, and the mapping reply includes a caching time, the starting address and length of the micronet which the Destination is within, and 20 bits (bits 96 to 115) of the ETR address to which this micronet is mapped. In our example, these 20 bits in hex are "0 0003", because the ETR address for this micronet is:

        E000:0003:0000:0055::7
           - ----

        Forwarding Label  = 0 0003
        Destination Addr  = 4000:0050:7000:1234::33

    Step 2 - the ordinary router FIB function:

    Since this is a core router with the modified functionality, the FIB recognises that the Forwarding Label is non-zero - and therefore ignores the Destination Address, instead using the Forwarding Label to decide which of its neighbours to forward the packet to. To do this it needs to determine the Forwarding Equivalence Class (FEC) for this packet. As described fully below, the FIB already has an array FLFEC[2^20] into which the RIB has written the FEC values for all the 2^20 currently advertised /32 prefixes in E000::/12. The Forwarding Label's value causes the FIB to read the FEC value from FLFEC[0 0003]. For instance, this value directs the router to forward the packet out of its interface 7. The router forwards the packet to that neighbour:

        Forwarding Label  = 0 0003
        Destination Addr  = 4000:0050:7000:1234::33

Router 3 - 1st core transit router:

    This core router also has the modified functionality and performs the same algorithm: it ignores the Destination Address and uses the Forwarding Label value to index into its FLFEC[] array. In this example the resulting FEC value causes the packet to be sent from interface 2 to the neighbour:

        Forwarding Label  = 0 0003
        Destination Addr  = 4000:0050:7000:1234::33

Router 4 - 2nd core transit router:

    The same quick process as for Router 3. The packet is forwarded to a (perhaps the) border router of the provider network (or one of the provider networks, if this micronet is multihomed) which the destination host's end-user network is using to connect to the Net. This router advertised the prefix E000:0003::/32 (the 0 0003rd of this set of 2^20 such prefixes), which matches the ETR address to which this micronet is mapped.

        Forwarding Label  = 0 0003
        Destination Addr  = 4000:0050:7000:1234::33

Router 5 - border router of the recipient provider network:

    Since this router advertised the E000:0003::/32 prefix, it recognises that when a packet arrives from the core with its Forwarding Label set to 0 0003, this is the end of that packet's journey across the core. Its FIB zeroes the Forwarding Label:

        Forwarding Label  = 0 0000
        Destination Addr  = 4000:0050:7000:1234::33

    In this example, there is no "ETR" as such - because this recipient provider network's internal routing system has a route for the address range of the end-user network. The packet is therefore forwarded by normal means from this router, potentially through other internal routers, to some router which links to the end-user network. Within the end-user network, the packet is forwarded to the destination host.

If the recipient network's internal routing system does not carry the end-user network's prefix, then this border router needs to somehow get the packet to the right ETR - the router which does have a link to the end-user network. If there is only one ETR in the network, that is easy. However, if there is more than one, the border router needs to get the full 128 bit address of whatever this micronet is mapped to. That is the address of the desired ETR. See the full explanation below for details - but usually this will involve no delay, since this border router has already looked up and cached the mapping information for this micronet.
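To make the forwarding behaviour in the chart concrete, here is a minimal sketch (in Python, purely as pseudocode) of the decision an upgraded router's FIB might make. The names FLFEC, forward and classify_by_destination, and the use of plain interface numbers as FEC values, are my illustrative assumptions rather than anything a real router implements; the point is only that a non-zero Forwarding Label replaces the destination-address classification with a single array read.

    # Illustrative sketch only - names and data structures are assumptions,
    # not an actual router implementation.

    LABEL_BITS = 20
    NUM_LABELS = 1 << LABEL_BITS          # 2^20 possible Forwarding Label values

    # FLFEC[label] holds the FEC (simplified here to an outgoing interface number)
    # which the RIB has pre-computed for the /32 CEP prefix this label denotes.
    # Label 0 0003 -> CEP prefix E000:0003::/32, reachable via interface 7 above.
    FLFEC = [None] * NUM_LABELS
    FLFEC[0x00003] = 7

    def forward(packet, my_cep_labels):
        """Forwarding decision of an upgraded core/border router.

        packet        - dict with 'forwarding_label' (int) and 'dst' (address)
        my_cep_labels - set of labels whose CEP prefix this router itself advertises
        """
        label = packet['forwarding_label']

        if label == 0:
            # Ordinary IPv6 packet: classify on the destination address as usual
            # (longest-prefix match, e.g. Tree Bitmap) - not shown here.
            return classify_by_destination(packet['dst'])

        if label in my_cep_labels:
            # This router advertises the matching CEP prefix: the packet has
            # finished its trip across the core (the FLPER behaviour described
            # below).  Zero the label and hand the packet to normal forwarding.
            packet['forwarding_label'] = 0
            return classify_by_destination(packet['dst'])

        # Core forwarding: one array read gives the FEC / outgoing interface.
        return FLFEC[label]

    def classify_by_destination(dst):
        # Placeholder for the conventional destination-address lookup.
        return 0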
Historical background

Brian Carpenter, writing in the RRG list on 2008-08-03: psg.com/lists/rrg/2008/msg02067.html provided some history which is relevant to the technique I describe below. It is from Louis Pouzin, who made great contributions to the early design of packet switched computer networks.

    OK, explained like that, it seems coherent with Pouzin's
    proposal in 1974 that the catenet address format should be

        <Format> <PSN name> <Local name>

    where you propose to put the <PSN name> in the current flow label
    field.

Pouzin's full explanation read:

    "There is no need to interpret the destination address
    any more than required to find an appropriate gateway
    in the correct direction. Putting gateway names in
    addresses is unacceptable, as it would tie up addressing
    and network topology. Thus, only PSN [packet switched
    network] names should be used as catenet [internet]
    addresses. Delivering a message to a final destination
    is carried out only by the final PSN."

    [Pouzin74] Pouzin, L., Interconnection of packet switching networks,
    7th Hawaii International Conference on System Sciences, Supplement,
    pp. 108-109, 1974.

On 2008-05-29, he also wrote, in part (psg.com/lists/rrg/2008/msg01384.html):

    Now, I fear he was right, but that's not what got implemented.
    We got a model based on fixed length addresses without a format
    prefix. I didn't see in the IPng discussion and don't see now how
    we can jettison that.
I understand that Pouzin's original proposal was that the packet's destination address would have a variable length local address.

What I am proposing below is related to Pouzin's proposal in that the router looks only at a particular part of the address section of the packet in order to determine how to forward it towards its destination.

However, my proposal differs in these respects:

- This is not a generalised approach to how all Internet packets are handled, just for solving a particular problem of getting packets from an ITR (Ingress Tunnel Router) to a destination end-user network which connects to the Net via one or more provider networks.

- The packet's destination address points to the address in the final end-user network where the packet will be forwarded to. This is not directly related to the actual prefix of the recipient provider network which the core routers will be forwarding the packet to.

- The ITR sets these 20 or so bits, based on a mapping lookup, which tells it which provider network to forward (or tunnel) the packet to so it can be delivered to the correct end-user network. Ordinarily, in Pouzin's proposal, the bits used by the router to determine forwarding are part of the whole address. In this Ivip6 proposal, these 20 or so bits are totally separate from the destination address of the packet, and cause the packet to be forwarded to a recipient provider network which advertises the prefix of the ETR, which is independent from the prefix (Mapped Address Block) in which the destination address of the packet is located.
Two things which would need to be done for this proposal to be practical

The two primary things which would need to happen for Ivip6 to be feasible are:

1 - Rename the 20 bit "Flow Label" in the IPv6 header to "Forwarding Label" and develop new semantics for it - to support Ivip6 in the core and potentially other uses in edge networks. This would involve withdrawing RFC 3697 and replacing it with a new RFC. Recent messages (2008-08-01) from Brian Carpenter and Tony Li indicate the bits are probably not currently used for any substantial purpose:

2 - Ensure a sufficiently high proportion of IPv6 BGP routers have modest upgrades to their FIB and RIB functionality, by the time Ivip6 is deployed. There is no change to the BGP protocol or implementation. These upgrades are mentioned in the Short Explanation above and are described in detail below. I guess these could be implemented in many modern routers via a firmware upgrade.
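To illustrate the second requirement, here is a rough sketch of the RIB-side hook implied by the chart above: whenever the (unchanged) BGP decision process selects a new best path for one of the /32 CEP prefixes in E000::/12, the RIB also writes the corresponding FEC into the FLFEC[] array which the FIB indexes with the Forwarding Label. The function and variable names are assumptions for illustration only.

    from ipaddress import IPv6Network

    NUM_LABELS = 1 << 20
    FLFEC = [None] * NUM_LABELS            # array shared with the FIB

    CEP_BLOCK = IPv6Network("E000::/12")   # example block holding the /32 CEP prefixes

    def cep_index(prefix: str):
        """Return the 20 bit label for a /32 CEP prefix within E000::/12, else None."""
        net = IPv6Network(prefix)
        if net.prefixlen != 32 or not net.subnet_of(CEP_BLOCK):
            return None
        return (int(net.network_address) >> 96) & 0xFFFFF

    def on_best_route_change(prefix: str, fec):
        """Hook called when BGP (unchanged) selects a new best path for a prefix.

        If the prefix is one of the CEP /32s, the RIB also writes the new FEC into
        FLFEC[], so the FIB can forward labelled packets with a single array read.
        """
        label = cep_index(prefix)
        if label is not None:
            FLFEC[label] = fec

    on_best_route_change("E000:0003::/32", fec=7)   # example: reachable via interface 7
    assert FLFEC[0x00003] == 7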
Advantages over map-encap (LISP, APT, Ivip4 and TRRP)

The benefits of the Ivip6 approach over the map-encap techniques - including in most cases Ivip4 - seem to include:

- No header overhead. Packets remain the same length.

- No PMTUD problems whatsoever - a Packet Too Big message will be sent to the sending host, with the original packet details, from any router in the full path, including the LFC portion of the path. (I assume that the sending host won't care if the "flow label" bits in the returned packet fragment are different - maybe this will require a modification to host stacks.)

- Significant reduction in computational effort for each such packet passing through a core router, compared to it fighting its way through up to 48 bits of destination address to determine the packet's Forwarding Equivalence Class (FEC).

- Traceroute will work fine through the full path, including the LFC portion of the path.

- Ease of continuing to filter out packets with spoofed source addresses at border routers. (Ivip4 already does this, since the outer header's source address is the same as the sending host's address.) If the provider network in which the ETR is located normally has its border routers set to reject packets arriving from the core with source addresses matching any of the provider network's own prefixes, then this works fine and normally with Ivip6, but does not work for LISP, APT or TRRP, where the outer source address is that of the ITR. To implement this with LISP, APT or TRRP, either the border routers would need to look into each encapsulated packet and filter on the inner header's source address, or all ETRs would need to do similar filtering on the source address of the decapsulated packets. This filtering - looking for packets which match any one of potentially tens of thousands of prefixes - is extremely expensive and best done with TCAM - so it can't easily be done for a large number of prefixes at the device which performs the ETR function.

These are all major benefits which I think justify upgrading the core routers and using the Flow Label bits for this purpose. The first three are major factors in the complexity, communications overhead and computational demands of handling packets which are sent through the core to hosts in the new scalable form of end-user networks.
Point 3 overcomes one of the major objections I had to IPv6: the heavy computational effort routers have to do in order to classify each incoming packet to determine its FEC, and so to determine which of the router's interfaces to forward it on. This depends on the length of the prefix which the destination address is found to be in, but now that most (all?) RIRs are handing out IPv6 PI space as /48 prefixes, it could involve routers processing the most significant 48 bits of the destination address of incoming packets.

See my notes on Tree Bitmap and how state-of-the-art routers actually classify packets here: ../../sram-ip-forwarding/router-fib/ - very expensive hardware with dozens of CPUs and lots of DRAM, running hot. Cisco has 188 250MHz 32 bit CPUs on a single chip!
Point 1 is partly about not causing packet-too-big problems, since the packet is not made any longer.

Point 1 also concerns efficiency. Encapsulation for IPv6 is an even more undesirable operation than with IPv4, since it adds at least 40 bytes to each packet for basic IP-in-IP encapsulation, whereas the IPv4 header is 20 bytes. (Ivip4 uses simple IP-in-IP encapsulation, and I had planned to use this for IPv6 as well.) For IPv6 map-encap with LISP - IP, UDP and LISP headers - the overhead is 56 bytes. See details at the start of: psg.com/lists/rrg/2008/msg02034.html

For instance, a VoIP packet stream, with 20 bytes of payload per packet and 50 packets a second (8:1 compression of the originally 64kbps audio) is 32,000 bps with ordinary IPv6, or 39,200 bps including Ethernet headers. With IP-in-IP encapsulation for IPv6 these rates become 48,000 bps at the IP packet level and 55,200 bps including Ethernet headers. With LISP (IP, UDP and LISP headers) the rates become 54,400 and 61,600 bps.

With traffic volumes multiplying rapidly, and potentially hundreds of millions of people sending 50 VoIP packets a second - supposedly for free - each with typically 20 bytes of payload, it is highly desirable to avoid encapsulation in the scalable routing solution for IPv6.
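As a quick check of these figures, the arithmetic can be reproduced as follows, assuming 40 byte IPv6, 8 byte UDP and 12 byte RTP headers, 18 bytes of Ethernet framing, and the 56 byte LISP encapsulation overhead mentioned above (the per-layer sizes are my assumptions for reproducing the numbers, not something defined by this proposal):

    # Reproduce the VoIP overhead figures (bits per second), under assumed
    # header sizes: IPv6 40 B, UDP 8 B, RTP 12 B, Ethernet framing 18 B.
    PPS     = 50            # packets per second
    PAYLOAD = 20            # codec payload bytes per packet (8 kbps audio)
    IPV6, UDP, RTP, ETH = 40, 8, 12, 18

    def bps(ip_packet_bytes, ethernet=False):
        return (ip_packet_bytes + (ETH if ethernet else 0)) * 8 * PPS

    plain    = IPV6 + UDP + RTP + PAYLOAD         # ordinary IPv6 packet:  80 bytes
    ip_in_ip = plain + 40                         # extra IPv6 header:    120 bytes
    lisp     = plain + 56                         # IP + UDP + LISP:      136 bytes

    print(bps(plain),    bps(plain, True))        # 32000 39200
    print(bps(ip_in_ip), bps(ip_in_ip, True))     # 48000 55200
    print(bps(lisp),     bps(lisp, True))         # 54400 61600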
Advantages over Translation (Six/One Router)

For the July 2008 revision of Six/One Router and my tentative critical review, please see: psg.com/lists/rrg/2008/msg02034.html .

The benefits over the only currently discussed Translation scheme (Six/One Router) seem to include:

- No address rewriting, so:
   - Less complexity and computational effort in the ITR and ETR functions ("translation routers" in Six/One Router).
   - No problems with the header changing in ways which upset IPsec or other cryptographic protocols.
   - No contortions of bits 64 to 79 to produce the same checksum, due to changing bits 80 to 127.

- No need for using a prefix of provider space to match each prefix of end-user network space.

- Ivip6 includes OITRDs (like LISP PTRs) to collect packets from non-upgraded networks, so there is full support, including for multihoming, for hosts in networks without ITRs. This is vital for making the system attractive even when few other networks have adopted it.
The "Flow Label" to become the "Forwarding Label"

This proposal involves a completely different use for the 20 bit "Flow Label". Below, I will refer to it as the "Forwarding Label".

The RFC which defines it is tools.ietf.org/html/rfc2460 (1998-12).
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Version| Traffic Class |* * * * * Flow Label * * * * * * * * *|
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Payload Length | Next Header | Hop Limit |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+ +
| |
+ Source Address +
| |
+ +
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| |
+ +
| |
+ Destination Address +
| |
+ +
| |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
These 20 "Flow Label" bits are sick of always being
set
0000 0000 0000 0000 0000.
They all sang together that they
want
to do
Useful Work:
20-bits-singing.html The
non-normative appendix:
tools.ietf.org/html/rfc2460#appendix-A
has been superseded by:
For this Ivip6 proposal to proceed, RFC3697 would
need to be withdrawn and replaced with a new definition of the
semantics of these 20 bits.
These 20 bits are not included in
any checksums or in IPsec cryptographic integrity checks.
RFC 3697 states that the Flow Label is to be set by the sending host and not altered for the packet's entire trip to the destination host. It is supposed to be set in a coordinated fashion by applications, so that if there are two separate logical flows of packets between this host and one destination host (for instance one concerning an HTTP session and the other being VoIP packets), then the packets of the two flows will have different Flow Label values.

The aim is to enable routers in the middle of the network to make good decisions about sending one stream (one "flow") along one path and the other by another path. It is important not to send packets from one flow along two different paths, because of the likelihood that the packets would arrive out of order. Some discussion of how routers in the core might do this, to improve the performance of the whole network, is in a paper:

The Flow Label is only useful for distinguishing two or more flows from one host to one other host.

Although the intention was that the Flow Label be set by the sending host, there is nothing to stop any intermediate node, such as a router, changing its value. This would not be detected by IPsec, and there is no checksum packet integrity checking by routers which would be upset by a router changing the value. That the Flow Label bits physically can be changed by routers, or by any other node the packet passes through, is evident from these sections of RFC 3697 which discuss the limited security problems which would result from such changes:
P5: . . . forging a non-zero Flow Label on packets that
originated with a zero label, or modifying or clearing
a label, could only occur if an intermediate system
such as a router was compromised, or through some other
form of man-in-the-middle attack.
P6: Hence modification of the Flow Label by a network node
has no effect on IPsec end-to-end security, because it
cannot cause any IPsec integrity check to fail. As a
consequence, IPsec does not provide any defense against
an adversary's modification of the Flow Label (i.e., a
man-in-the-middle attack).
So it is clear that it is practical for the ITR (Ingress Tunnel Router) to change the value of the Flow Label, which below we will refer to as the Forwarding Label.
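To show how physically simple such a rewrite is, here is a small sketch which sets the 20 label bits in a raw IPv6 header. The field layout (Version 4 bits, Traffic Class 8 bits, Flow Label 20 bits in the first 32 bit word) is from RFC 2460; the functions themselves are only an illustration, not part of any proposed ITR implementation.

    def set_forwarding_label(header: bytearray, label: int) -> None:
        """Overwrite the 20 label bits in a raw IPv6 header (RFC 2460 layout).

        The first 32-bit word is: Version (4 bits) | Traffic Class (8 bits) |
        Flow / Forwarding Label (20 bits).  Version and Traffic Class are kept.
        """
        if not 0 <= label < (1 << 20):
            raise ValueError("label must fit in 20 bits")
        word = int.from_bytes(header[0:4], "big")
        word = (word & ~0xFFFFF) | label          # replace the low 20 bits only
        header[0:4] = word.to_bytes(4, "big")

    def get_forwarding_label(header: bytes) -> int:
        return int.from_bytes(header[0:4], "big") & 0xFFFFF

    # Example: an otherwise zeroed header with Version = 6, label set to 0 0003.
    hdr = bytearray(40)
    hdr[0] = 0x60                                 # Version 6, Traffic Class 0
    set_forwarding_label(hdr, 0x00003)
    assert get_forwarding_label(hdr) == 0x00003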
Terminology and main concepts

Here are some terms and concepts which are necessary to an understanding of the whole Ivip6 system.

It is best to develop a good understanding of the Ivip4 system (currently described simply as "Ivip") from the material in the Ivip page: ../ .

Some of the concepts described here are required in a map-encap system and may or may not be required in Ivip6. For instance, the ETR has definite work to do in any map-encap system, but in one mode of operation of Ivip6 (the provider network's internal routing system automatically handles all end-user network prefixes), there is no need for such a device. In other modes, there does need to be something like an ETR. I will use the term "Core to End User Network function" to refer to whatever has to be done to the packet, after it arrives at the recipient provider network, for it to be forwarded to the destination end-user network.
Conventional networks

Provider and end-user networks whose address prefixes are managed exactly as they are today - by advertising each one in the global BGP system.

Conventional networks which have no ITRs (Ingress Tunnel Routers) are known as "non-upgraded" networks.

There are three classes of global unicast address space in use:

Conventional provider and end-user networks:
    These will continue to use whatever prefixes they use today, but not within the prefixes noted below.

SPI (Scalable PI) address space for the new kind of end-user network:
    All within 4::/3 (in this example). Ivip's mapping system divides this into micronets. All micronets are within one MAB (Mapped Address Block) and all MABs (in this example) are within 4::/3. MABs are advertised in BGP, but not the smaller divisions within them, so that packets sent to these addresses from networks without ITRs will be forwarded to Open ITRs in the DFZ (OITRDs).

CEP (Core Egress Prefix) address space, for ETR addresses:
    2^20 /32 CEP prefixes, all within E000::/12 (in this example). All ETR addresses (the address to which a micronet is mapped) are within one of these CEP prefixes.
SEN - Scalable End-user Network

A network for an end-user using the new SPI (defined below) form of address space. In order to solve the routing scaling problem, we need to have most or all new end-user networks, and many, most or all existing (conventional) end-user networks, adopt SPI space.

Therefore we need to make this new form of address space and type of end-user network highly attractive to the great majority of end-users, of all sizes - including for instance corporations, universities, schools and large hosting companies. It is not good enough to say that big hosting companies shouldn't want this kind of address space, just small ones - because most small hosting companies want to be big, and would try to avoid the new kind of address space if this were the case.

All SENs either have their own ITRs (defined below, including perhaps ITR functions in the sending host) or are connected to the Net via one or more conventional networks which provide ITRs to handle their outgoing packets. So the term "non-upgraded network" only applies to a conventional network without ITRs - never to a SEN.
SPI - Scalable PI address space

A new form of address space, intended solely for end-user networks (all networks other than those of Internet Service Providers), which is Provider Independent, but in a manner which supports scalable routing.

Conventional PI prefixes are each globally advertised in the BGP system. The large number of these prefixes, and their rate of change, is the cause of the routing scaling problem.

SPI address space remains stable for each end-user network, no matter which one or more ISPs they use to connect to the Net. SPI space is therefore entirely portable and can be used for multihoming. SPI space is typically rented from some organisation which may have nothing to do with the current one or more ISPs each end-user network uses to connect to the Net.

For IPv4, translation schemes are not suitable and there is no possible Routing Label in the IPv4 header - so Ivip provides SPI IPv4 space via map-encap tunnels. In the context of what follows, "SPI space" refers to the new kind of IPv6 address space provided by Ivip6, with packets being sent across the core with Label Forwarding in the Core (LFC) "tunnels".
Traffic Engineering - load sharing
Ideally
SPI space can also be used for inbound Traffic Engineering (TE) too. In
Ivip - both Ivip4 and Ivip6 - inbound TE is achieved indirectly
within certain limits, rather than with the explicit load balancing
arrangements of the other map-encap schemes. Ivip provides very
fine-grained control of mapping with
real-time user control - and this may enable inbound TE which
is
superior to that possible with the other, non-real-time control,
map-encap schemes.
In the non-Ivip map-encap schemes (LISP, APT
and TRRP), and in Six/One Router, ITRs (or their equivalent in Six/One
Router) perform explicit load balancing TE for all EIDs (these
proposals' name for what in Ivip is a "micronet") by being told by the
mapping information to spread the outgoing load over two or more ETRs.
This is assuming the destination network is multihomed. The
mapping information provides the ITR with "weights" for each of the two
or more ETRs. The ITR needs to choose how to send packets
statistically according to these instructions, while ensuring that all
packets of any one presumed flow are all sent the same way. (The
ITR can typically only assume different flows if the destination and/or
source address of the packets is different - we assume the ITR is not
equipped to look at source or destination port numbers.)
In the
non-Ivip schemes, end-user control of the mapping information used by
the world's ITRs is "slow". It is infeasible to change the
mapping frequently and have ITRs respond accordingly. So the TE
load sharing weights cannot be adjusted in real-time.
With Ivip,
there is no explicit TE load sharing. The mapping information
tells the ITR to tunnel packets which are addressed to a particular
micronet to a
particular ETR address. There is no concept of multiple ETR
addresses, either for load sharing or for the ITR to detect a failure
of one ETR so it can send packets to another instead. Ivip
separates out the multihoming fault detection and recovery functions
from the scalable routing system and requires end-user networks to do
this themselves - or hire someone else to do it for them. In
practice, multihoming monitoring and failure
recovery decisions are likely to be handled by a specialized company
which is contracted by the end-user network. That company's
system will automatically change the mapping to another ETR in the
event
that the current ETR is not operating correctly.
TE load
sharing can still be achieved with Ivip, but it is not possible to load
split traffic being sent to a single IP address (IPv4) or to a single
/64 prefix (IPv6). To achieve inbound load balancing TE over two
or more provider links, the end-user must split the recipient hosts
over multiple IP addresses so that they can be covered by separate
micronets. Then, each micronet can be mapped to any one of the
multiple ETRs at the multiple ISPs - thereby giving the end-user
real-time control over the incoming traffic levels of each of the links
from the two or more ISPs. (Each mapping update costs a small
fee, such as a few cents, to pay for the burden on the fast-push
system. Mapping
updates are therefore not a burden
on most parts of the Ivip system. Some end-users with busy
networks
would still find it attractive to change their mapping dynamically to
optimise their usage of the two or more providers.)
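Purely as an illustration of this style of TE - and not as part of the Ivip design - here is a sketch of how an end-user (or the company monitoring their links) might periodically re-map whole micronets to different ETRs to balance inbound load. The greedy balancing, the names and the example addresses are all assumptions:

    # Illustrative only: rebalance inbound traffic by re-mapping whole micronets
    # to different ETRs.  Submission of the mapping updates to the fast-push
    # system is not modelled here.

    def rebalance(micronet_loads, etrs):
        """Assign each micronet to one ETR so inbound load is roughly balanced.

        micronet_loads - dict: micronet name -> measured inbound traffic (bps)
        etrs           - list of ETR addresses (one per provider link)
        Returns a dict of mapping updates: micronet -> ETR address.
        """
        totals = {etr: 0 for etr in etrs}
        mapping = {}
        # Greedy: place the busiest micronets first on the least-loaded link.
        for net, load in sorted(micronet_loads.items(), key=lambda kv: -kv[1]):
            etr = min(totals, key=totals.get)
            mapping[net] = etr
            totals[etr] += load
        return mapping

    updates = rebalance({"micronet-A": 40e6, "micronet-B": 25e6, "micronet-C": 20e6},
                        ["E000:0003:0000:0055::7", "E000:0A21:0000:0002::9"])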
Micronet

A contiguous sequence of IP addresses - of the new SPI type of address space - which are mapped to a single "locator" address. In most map-encap schemes, the micronet concept is implemented as an EID (Endpoint IDentifier) prefix.

In Ivip4 and Ivip6, a micronet is not necessarily a binary-boundary prefix. In Ivip4, a micronet can start on any IP address and have a length of any integer (up to 2^24) number of IPv4 addresses. That is to say: the granularity of Ivip4's mapping system is 1 IP address.

Ivip6's mapping granularity is a /64 prefix. The starting point of an Ivip6 micronet is on any /64 boundary, inside whatever overall prefix the new kind of end-user SPI space is made available in. In the example which follows, all SPI space is in the 4::/3 prefix, but this is just an example, and some larger or different prefix would be defined in the final system. (This 1/8 of the IPv6 address space still provides 2^61 /64 prefixes, or 2^45 /48s. That is about 35 trillion /48s - or, for a world population of 10 billion people, about 3,500 /48s, each of 64k /64s (each with up to 2^64 hosts . . . ), for every person.)

In the mapping system, a micronet is specified by a starting address (64 bits) and a length, in /64 steps. In principle, the length may be up to 64 bits; in practice it may be limited to 32 bits.
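As a concrete illustration of this granularity, a micronet could be represented roughly as follows, assuming it is specified by a starting /64, a length in /64 steps, and the 128 bit address it is currently mapped to. The class and field names, and the example micronet spanning four /64s, are illustrative assumptions only:

    from dataclasses import dataclass
    from ipaddress import IPv6Address

    @dataclass
    class Micronet:
        """One micronet: a run of /64s starting on any /64 boundary (sketch only)."""
        start: IPv6Address       # must be on a /64 boundary
        length: int              # number of consecutive /64s
        etr: IPv6Address         # 128 bit address this micronet is currently mapped to

        def covers(self, addr: IPv6Address) -> bool:
            """True if addr falls inside this micronet (compared at /64 granularity)."""
            first = int(self.start) >> 64
            return first <= (int(addr) >> 64) < first + self.length

    # The micronet from the chart example, assumed to span four /64s and to be
    # mapped to the ETR address E000:0003:0000:0055::7.
    m = Micronet(IPv6Address("4000:0050:7000:1234::"), 4,
                 IPv6Address("E000:0003:0000:0055::7"))
    assert m.covers(IPv6Address("4000:0050:7000:1234::33"))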
UAB - User Address Block

This is a contiguous range of addresses which are controlled by one end-user. An end-user may be as large as a corporate or university network, or simply an individual who has a mobile device, such as a cellphone. Typically they would rent this space from a company which runs the MAB (Mapped Address Block, described below) within which the UAB is located - either directly or through some intermediary company.

UABs are integer numbers of /64s, just like micronets. They are specified by a 64 bit starting address and a 64 bit length. ITRs, ETRs (described below) and the mapping distribution system do not use UABs. A UAB is an administrative construct.

End users can divide their UAB into as many micronets as they like, and each micronet can be mapped to any 128 bit IP address - the address of an ETR (or an ETR-like function) which forwards the packet to the destination end-user network. In the example used in this explanation, all MABs, UABs and micronets must be within the 4::/3 prefix.

A single UAB could be used to create multiple micronets, and each micronet could be mapped to a different ETR, in any ISP in any country.
MAB - Mapped Address Block
A Mapped Address
Block is a BGP advertised prefix in which the enclosed address space is
managed by the scalable routing system. This space contains many
(typically thousands and perhaps many millions) of micronets.
The
individual micronets are
not
advertised as prefixes in BGP. However, the entire MAB of which
they are a part
is advertised
as a single BGP prefix.
While technically a single MAB could
provide space for just one SEN, this would help little - or not at all
- with the routing scaling problem.
Generally, each MAB should
be relatively large compared to the size of micronets. (That said, some
SENs may need only a single micronet of /64, and others may require
many more, much larger, micronets - so there isn't a typical size of
micronet.)
Generally, each MAB should include a large number of
micronets, such as hundreds or millions of them. This will enable the
micronets serving the needs of very large numbers of SEN end-user
networks to be handled from a single MAB subset of the address
space which requires
a single BGP advertisement.
Each MAB is advertised in BGP to facilitate support of non-upgraded networks. As described below, OITRDs are ITRs in the DFZ which advertise the MABs to collect packets sent to SPI address space by hosts in non-upgraded networks.
It
is desirable to limit the total number of MABs, since each one
contributes a prefix to the "global DFZ routing table" - which is what
we are trying to limit the size of. For instance the current IPv4
DFZ routing table size of 260k or so is regarded as undesirable -
bgp.potaroo.net - but the real
concern is that without a new scalable routing and addressing
architecture, this number will grow to half a million, a million
etc. We probably don't want more than two or three hundred
thousand MABs. There are reasons for limiting the number of
micronets in each MAB as well. There should be no problem in the
final system creating MABs which are big enough not to be excessively
numerous.
ITR - Ingress Tunnel Router
The term "Tunnel" often
refers to a two-way
arrangement between two hosts which have already exchanged a number of
packets and which have set up the tunnel after a series of two-way
communications - which usually takes a second or two.
In a
map-encap scheme, the term refers to the ITR delivering packets to an
ETR, where the final destination address of those packets is not the
ETR address, but the address of a host in some SPI address space
of some end-user network
which the ETR connects to. This is a much more tenuous tunnel
arrangement than is typical with VPNs etc.
Firstly, it is a one-way tunnel. (Although it is possible that the ETR is also an ITR, tunneling packets in the opposite direction to the ETR function of the original ITR, these would be two unrelated tunnels.)
Secondly,
the ITR needs to deliver packets reliably to the ETR without any
preliminary communications, and without having any prior knowledge of
the ETR at all. For instance, when the ITR (in this example, a
caching ITR, rather than one with a full database of mapping
information) receives a packet addressed to some micronet X, for which
it has no cached mapping information, it needs to issue a map request
message to a nearby Query Server. The response (typically within
a few tens of milliseconds) tells the ITR the ETR address to tunnel
this packet to, and any other packets whose destination address matches
the micronet which this packet's destination address is part of. (The
mapping reply returns the start and length of this micronet, as well as
the ETR address.)
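Here is a rough sketch of that lookup-and-cache behaviour in a caching ITR. The query server interface (query_mapping) and the shape of its reply (micronet start, length, caching time and the 20 mapping bits) follow the description above, but every name and detail here is an illustrative assumption, not a specification:

    import time

    class CachingITR:
        """Very rough sketch of the ITRC behaviour described above (not a real design)."""

        def __init__(self, query_server):
            self.query_server = query_server
            self.cache = []        # list of (expiry, start_64, length_64, label)

        def handle(self, packet):
            # packet['dst'] is an ipaddress.IPv6Address; micronets are /64-granular.
            dst_64 = int(packet['dst']) >> 64
            entry = self._lookup_cache(dst_64)
            if entry is None:
                # Cache miss: ask the nearby Query Server for the mapping of this
                # destination.  Reply: micronet start, length, caching time and
                # the 20 bits to write into the Forwarding Label.
                start_64, length_64, cache_secs, label = \
                    self.query_server.query_mapping(packet['dst'])
                entry = (time.time() + cache_secs, start_64, length_64, label)
                self.cache.append(entry)
            _, _, _, label = entry
            packet['forwarding_label'] = label    # "tunnel" the packet across the core
            return packet

        def _lookup_cache(self, dst_64):
            now = time.time()
            for entry in self.cache:
                expiry, start_64, length_64, label = entry
                if expiry > now and start_64 <= dst_64 < start_64 + length_64:
                    return entry
            return None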
The "tunnel" function for map-encap involves
encapsulating the original packet - by placing a header in front of it.
The outer header has the ETR's address as its destination
address. For Ivip4, this is a simple IP-in-IP IPv4 header.
The resulting packet is then forwarded by conventional BGP
routers to the network in which the ETR is located, and then that
network's internal routing system forwards the encapsulated packet to
the ETR.
In Ivip4 - Ivip as described elsewhere - I state that the internal routing systems of provider networks should not handle the address space of the end-user networks. My reasons for stating this include the difficulty of keeping that routing system responding as quickly and reliably as ITRs respond to changes in the mapping. I will contemplate this more for Ivip4, and for Ivip6 I am considering both situations - where the internal routing system of provider networks does and does not have a route for the end-user network.
In a map-encap scheme the ETR strips off the outer IP header and by one means or another forwards the raw packet (unaltered from its state when it left the sending host, other than its TTL having been decremented according to the total number of routers it passed through) to the destination SEN end-user network.
For Ivip6, this transportation
of the packet, typically across the core of the Net - to the provider
network by which the destination network connects to the Net - does not
use any extra headers, but simply involves the ITR setting the
Forwarding
Label (the old Flow Label bits) to a value which will cause all the
(Ivip6 upgraded) BGP routers in the core to efficiently forward it to
the correct recipient provider network. This is a novel
technique, and it
is not entirely clear that "tunneling" is the best term for it.
However, because "tunneling" is the best term for the
encapsulation approach of the map-encap schemes, "tunneling" will be
retained as the description of a process based on the Forwarding Label
which achieves the same purpose.
The most obvious location for the ITR function to be implemented is at the border routers of networks, to collect and encapsulate all packets which need to be tunneled to an ETR.
For instance, BGP routers
at the borders of all conventional networks which SENs use to
connect to the Net.
It will also be possible to locate
routers which perform the ITR
function inside the source end-user network, or inside
the provider network the end-user network uses to connect to the
Net. We
will not discuss every aspect of where ITRs could be located. For
simplicity in this explanation, we will assume that ITRs are border
routers at the edge of
a provider network, where the packets leave that network and are
forwarded to routers of other Autonomous Systems (such as transit
providers) so that they will
ultimately be forwarded to the provider network with the ETR which
connects to the SEN end-user network which has the micronet which
covers the packet's destination address.
The ITR function
processes all packets whose destination address falls within a
micronet which is part of any MAB it advertises, to set the Flow Label
bits to a value which uniquely
identifies the BGP advertised prefix towards which this packet should
be forwarded by all BGP routers in the inter-domain core. (Full
explanation below.)
ITRs can be dedicated routers or servers
running sufficient routing software that they have packets in need of
encapsulation sent to them.
ITRs can also be located inside SEN networks, and are likely to be found at the border of an SEN network and the one or more conventional provider networks which the SEN network uses to connect to the Net.
The third
place an ITR function can be found is, in effect, in the DFZ - where it
is known as an OITRD (Open ITR in the DFZ).
Like ITRCs, ITRDs can be on conventional addresses or SPI addresses - but not behind NAT. This is because they need to be reachable when a Query Server has a Mapping Cache Update message to send to them.
ITRD - ITR with full mapping Database

An ITRD is really a caching ITRC with an integrated QSD, or an ITRC using a QSD in the same rack, connected directly by Ethernet.
ITRC - ITR with Cache

ITRs are typically caching ITRs: ITRCs. They cache the mapping information they currently require, and do not attempt to store a copy of the entire mapping database.
ITRH - ITR function in sending Host

A caching ITR function can also be built into a sending host. This could be a zero or low cost method of reducing or eliminating the need for separate ITRs.

Like dedicated ITRs, the ITRH needs an address which can be reached from anywhere, so it can receive Mapping Cache Updates from query servers. This means ITRHs can be on conventional or SPI addresses. They cannot be behind NAT.
OITRD - Open Ingress Tunnel Router in the DFZ
Ivip's OITRDs do much the same job as
LISP's PTRs (Proxy Tunnel Routers).
OITRDs are distributed
around the Net, conceptually "in the DFZ" to attract and process
packets sent to micronet addresses by hosts in "non-upgraded"
networks:
those conventional networks which have no ITRs of their own.
In
fact, OITRDs
are within or at the border of some conventional AS network, like many
other ITRs.
A typical ITR at the border router of an AS
does not accept packets addressed to micronets from routers outside the
AS. It only accepts these packets from internal routers.
(It does this by advertising one, many or all MAB prefixes to the
internal routing system, but not to its BGP neighbours in other ASes.)
Likely
locations of OITRDs are Internet exchanges, peering points etc.
They
are ideally close to non-upgraded networks, so the total path traveled
by the packet from its source, through the OITRD and to the ETR is not
much longer than, or is the same distance as, the most direct path from
the
sending host to the ETR which serves the SEN end-user network in which
the destination host is located.
Ideally, in the future, all
conventional IPv6 networks will have their own ITRs and OITRDs will not
be needed.
An OITRD advertises one or more MABs to its BGP
neighbours in other ASes and so attracts packets sent from nearby
non-upgraded networks.
It then does what all ITRs do: use
mapping information to set the Forwarding Label of the packet so that
(upgraded) core routers will forward the packet
towards the ETR to
which this micronet is currently mapped.
The business case for Ivip6 OITRDs is identical to that for Ivip4 OITRDs, as discussed in this message:

The above message was written just before FLOWv6 was developed, so it assumes that Ivip for both IPv4 and IPv6 will use map-encap. The fact that Ivip6 ITRs, including OITRDs, use Label Forwarding in the Core rather than map-encap does not alter the business case for their deployment.
ETR - Egress Tunnel Router
The ETR is a
required physical device in Ivip4 and the other map-encap schemes.
It may be required as a physical device in Ivip6, or there may be
no need for an actual device. Although it may be somewhat
confusing, this description of Ivip6 generally assumes that there is a
physical ETR. Below we discuss a scenario in which there is no
need for an actual ETR - when the end-user network's micronet is
handled by the internal routing system of the provider by which the
end-user network connects to the Net.
Ivip6 uses the Forwarding Label so the core BGP routers - and perhaps internal routers in the network(s) of the sending host, between the ITR and that network's BGP border router - will forward the packet towards this ETR address. This use of the Forwarding Label for tunneling across the core is somewhat more elaborate and flexible than in the map-encap system, and is described more fully below; however, in many ways it is a simpler and more elegant approach.
There
is no encapsulating header to remove - as there is for map-encap ETRs.
There is no need to rewrite addresses, as there is in the
receiving Translation Router in Six/One Router (the equivalent of
the ETR in a map-encap scheme).
In scenarios where the ETR
exists as an actual device, it will probably zero the
Routing Label bits when it receives a packet forwarded from the
ITR.
The
most important function the ETR performs is that it recognises from the
destination address which SEN network the packet should be forwarded
to, and forwards the packet to that network. The ETR could connect to
one or to many separate end-user networks.
As the packet was
forwarded across the core of the Net, the destination address has been
ignored by these BGP core routers, due to them
using the Forwarding
Label to decide which port to forward the packet on.
CEP - Core Egress Prefix
As described
more thoroughly below, the
IPv6 address space is administered to create a regular series of
prefixes, each of which can be advertised in BGP.
In this
explanation, there are 2^20 such prefixes: 1,048,576. Each has the same
length, say /32. /48 would probably be fine too.
In practice,
fewer than these would be required, so the final system may have half
this number or less.
A conventional provider network which has one or more ETRs (or which connects one or more SEN end-user networks to the Net) and which has a single "site" - such as a network in a city, or a data centre - needs one CEP. If it has multiple such sites and does not want to ferry traffic which is addressed to ETRs between them, then it needs a separate CEP for each such site.
The mapping of a micronet to a particular ETR address is constrained so that the address is always within a particular block of address space: it must always be within one of the CEPs.

ETRs are located on addresses within one of these CEPs. Below we discuss administrative arrangements for this limited resource of about a million CEPs. Each advertised CEP places a burden on the control plane of the global BGP system - adding another route to the "global BGP routing table" (sometimes this is stated simply as "adding another route to the DFZ"). To reduce the routing scaling problem, it is desirable to have as few advertised prefixes as possible, so there needs to be good reason for a provider to obtain a CEP prefix.
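The relationship between a CEP prefix and the 20 bit Forwarding Label value can be sketched as follows, assuming the example numbering used on this page (2^20 /32 CEP prefixes inside E000::/12, so the label is simply bits 96 to 115 of the ETR address). The helper names are mine:

    from ipaddress import IPv6Address, IPv6Network

    CEP_BASE = IPv6Network("E000::/12")   # example block holding the 2^20 /32 CEPs

    def label_for_etr(etr: IPv6Address) -> int:
        """The 20 bit Forwarding Label value = index of the /32 CEP prefix within
        E000::/12, i.e. bits 96 to 115 of the ETR address (bit 0 = least significant)."""
        assert etr in CEP_BASE
        return (int(etr) >> 96) & 0xFFFFF

    def cep_for_label(label: int) -> IPv6Network:
        """The /32 CEP prefix which a given Forwarding Label value denotes."""
        return IPv6Network((int(CEP_BASE.network_address) | (label << 96), 32))

    etr = IPv6Address("E000:0003:0000:0055::7")
    assert label_for_etr(etr) == 0x00003
    assert cep_for_label(0x00003) == IPv6Network("E000:0003::/32")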
CEP prefixes can also be used for purposes other than the one discussed here: as
the prefix which the Routing Label causes the tunneled packet to be
forwarded to. In principle it is possible that a provider could
also use the CEP prefix for the space it uses internally and for its PA
customers. In that case, each provider site might advertise just
the single CEP prefix, at one or more border routers. This would
involve abandoning current address assignments, and would not be
necessary to achieve a scalable routing system. However, in
principle, it is an attractive outcome for those who seek a clear
separation between core and edge.
The inter-domain core could
consist purely of BGP routers handling traffic between providers, all
of which use a single CEP at each of their sites. All end-user
networks would use the new kind of address space. All traffic
traveling across the core to any end-user network would be forwarded
according to its non-zero routing label, which is easier for
the core routers than conventional Tree-Bitmap analysis of the
destination address until a matching prefix is found.
It may never be necessary to use the full possible number of CEPs - 2^20 or 2^19. Perhaps a few tens of thousands will be all that is required for the foreseeable future, assuming IPv6 is widely adopted. For instance, if there are 10 billion people, split into cities of 1 million each, there are 10k cities. If an average ISP serves a million customers and has branches in 5 cities, there are 10k ISPs, each with 5 sites - so 50k CEPs are required.

Side-note on assignment of CEP prefixes to support TTR mobility

In the TTR (Translating Tunnel Router) extensions to map-encap for mobility, there are likely to be multiple competing TTR companies, each with TTRs in a wide variety of locations all around the world. The TTR behaves exactly like an ETR to the map-encap system. The fact that it has a two-way tunnel to the Mobile Node (MN) does not affect the operation of the map-encap system. Nor does the fact that the TTR may also be performing ITR functions on packets being sent out from the MN.
The TTR approach to
mobility is equally applicable to Ivip6. The TTR's private
two-way arrangements with its MN are similarly not a concern for the
main Ivip6 system, including its mapping system.
There is,
however, a challenge for both Ivip4 and Ivip6 in providing the prefixes
by which these TTRs connect to the core. If there were 100 TTR
companies, each with a TTR site (one or more TTRs) at 1000
geographically and topologically dispersed sites, then this could
require 100k additional advertised prefixes (CEP prefixes in the case
of Ivip6) just for these TTRs. While there may never be this
number of TTR companies, or a real need for this number of separate TTR
sites, we need to consider the impact of this system on the number of
advertised prefixes.
One approach is to
consider mobility, and
competition between TTR companies, as a worthwhile reason for adding
100k prefixes to the global routing table. For Ivip6, this also
means taking up 100k or so of the 2^20 or 2^19 possible CEPs.
This too may be judged as reasonable and desirable compared to
the alternatives. However this brute-force approach is not the
only method by which this number of TTRs could be located as desired.
One
approach is to have a single CEP prefix for each physical site, and for
the various TTRs of the various companies to share it. (In
practice, it seems likely there wouldn't be 10 different TTRs at a
site, but one or a few, with the company which owns those TTRs
contracting out its services to the various customer-facing TTR
companies.)
An objection to this is that all the TTRs at that
site would depend on a single router which advertises the CEP prefix.
The workaround for that is two such routers at the same site.
Another approach is based on the notion that some
or
many TTR sites will need to be close to mobile access networks, so the
most logical location for them is inside the provider network of that
mobile access network. There, they could all use a single CEP,
including perhaps the CEP the mobile provider has for that site.
FLPER - Forwarding Label Path Exit Router

The FLPER is the BGP router at which the traffic packet, having had its Forwarding Label set by an ITR, completes its journey across the core. This is fully described below in the Tutorial by way of Example section. The FLPER is a BGP router at the border of the recipient provider network - a router which advertises the CEP prefix which encloses the 128 bit address to which the micronet containing the packet's destination address has been mapped.
There
may be one or more border routers which advertise this CEP prefix.
The FLPER may perform the ETR
function, or if there is no
need for an ETR function - as is the case where the provider network's
internal routing system directly handles the micronet address space -
the FLPER has almost nothing to do. In this scenario,
the packet's path results naturally from Ivip6's requirement that core
routers use a non-zero Forwarding Label to forward the packet. A
border router which advertises the CEP prefix corresponding to the
value of the Forwarding Label will recognise this, and therefore
recognise that the packet has completed its trip across the core. The
Forwarding Label now contains no further useful information, so the FLPER
zeroes it and looks at the packet's destination address - which is in a
micronet of a particular end-user network. Since, in this example, the
provider's internal routing system advertises a route which covers this
micronet exactly, or covers a wider address range which encloses it,
the FLPER forwards the packet according to the internal routing
system's rules. The only action taken by the FLPER is to zero the
Forwarding Label bits - and even this may not, strictly speaking, be
necessary if internal routers ignore those bits. (A minimal sketch of
this FLPER behaviour appears after the To Do notes below.)
To Do: explain more about how packets from sending hosts find their way to
an ITR en route to the border router, or to a border router which also
performs the ITR function, or how this works when the ITR function is in
the sending host.
To Do: Note and explain more fully that internal
routing
systems always operate on binary boundary prefixes, but that a micronet
can begin and end on any /64 boundary.
To Do: Link to
discussion of why I think it is best for internal routing systems of
provider networks not to have routes for the prefixes of end-user
networks, but to rely on every packet (including those sent from within
the provider network) going via an ITR and an ETR, rather than relying
on the internal routing system to respond rapidly to mapping changes.
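Here is a minimal sketch, in Python, of the FLPER decision described
above. All names (Packet, flper_handle, my_cep_index) are my own
inventions for illustration; they are not part of any specified router
API, and the real function would live in the router's FIB rather than
in host software.

    # A minimal, illustrative sketch - names and structures are hypothetical.
    from dataclasses import dataclass

    @dataclass
    class Packet:
        dst: int               # 128 bit destination address, as an integer
        forwarding_label: int  # the 20 bit Forwarding Label (0 = not set)

    def flper_handle(packet: Packet, my_cep_index: int) -> str:
        """Decide what an FLPER does with an arriving packet.  my_cep_index
        is the 20 bit value matching the CEP prefix this border router
        advertises (bits 96 to 115 of the /32, LSB-0 numbering)."""
        if packet.forwarding_label == 0:
            return "forward by ordinary longest-matching-prefix lookup"
        if packet.forwarding_label == my_cep_index:
            packet.forwarding_label = 0   # the trip across the core is complete
            return "forward via the provider's internal routing system (or ETR)"
        # Some other CEP's label: the packet is still crossing the core.
        return "forward according to FLFEC[%d]" % packet.forwarding_label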
Mapping
system
This is a system by which
end-users can issue commands which change their micronets' starting
points and lengths, and by which they can change the 128 bit
address within a CEP to which each micronet is mapped.
In Ivip4,
each
micronet is mapped to a specific 32 bit address: the address of the
ETR, which will remove the outer header and forward the packet to the
end-user network. The ITR needs this full 32 bit address, because
it writes that address to the destination address of the encapsulating
header.
In
Ivip4, the mapping distribution system fast-pushes all changes to the
mapping database to tens of thousands of full database Query Servers
all over the world. Each mapping record consists of the 32 bit
starting address of the micronet, its length (24 bits) and the full 32
bit address of the ETR.
[#20-bits-only]
In
Ivip6, for each micronet, the mapping system holds a
full 128 bit address of a physical or notional ETR, to which all
packets
addressed to that micronet will be tunneled. There are
some differences from the Ivip4 situation, which generally
simplify the
requirements on the Ivip6 mapping system:
- The ITR does not
need the full 128 bit address of the real or notional ETR. It
only needs the 20 bits (or probably 19 or perhaps less in a practical
system) which it will write into the Forwarding Header. This
is a
significant reduction in the amount of data the mapping system must
send to full database Query Servers all over the world.
- If any of the following three conditions is true, then there is no need
for the FLPER to find out the full 128 bit ETR address to which a micronet
is mapped. (This matches map-encap, in that the ETR does not need
to look up any mapping for the packet - the outer destination address
is that mapping value, the ETR already has this address, and the address
is not needed any further.)
  1. There is no physical ETR - the recipient provider network's internal
     routing system handles the micronet address space.
  2. The one FLPER, or any one of the FLPERs (they all advertise the CEP
     prefix), is capable of forwarding the packet to the end-user network,
     without relying on any ETR or on the provider's internal routing system.
  3. There is only one ETR, and the FLPER(s) will forward all packets
     received from ITRs to that ETR.
- Alternatively to condition 2, if the FLPER needs to decide which ETR to
send the packet to, then it needs to do a mapping lookup of the full
128 bit "ETR" address. This is never required for a map-encap ETR.
I have further work to do on
this last situation. If there is a way of satisfying this need by
some FLPER for the full 128 bit ETR address, without fast pushing the
full 128 bits around the world to tens of thousands of Query Servers
(since otherwise, only 20 bits are required) then this would
significantly lighten the load on the fast-push mapping distribution
system. It would also lighten the storage load for full database
query servers.
This is a specialized topic, not directly
relevant to the explanation:
One approach would be to send only the 20 bit
value via the fast push system, but have each FLPER router covered by
one of two scenarios:
- The FLPER gets a full feed of mapping updates, just like a QSD. Then
it can see when any mapping change mentions the CEP prefix it advertises.
- The FLPER has an arrangement with one or more nearby QSDs to send it
all those mapping changes which mention the CEP prefix it advertises.
This would be an additional function to add to the QSD, which would be
regrettable. Unfortunately, it would also be complex, because the QSD
would need to catch not just the mapping changes which involve a micronet
being mapped to an ETR address in this CEP, but also mapping changes
which involve a micronet formerly mapped to an address in this CEP
being mapped to some other address, either within this CEP or in some
other CEP prefix.
If this could be assured, it would be a good solution: the FLPER could
query some query server to retrieve the full 128 bit ETR address for
each micronet which is mapped to an ETR address in this FLPER's CEP.
The volume of these queries and responses would be pretty low, so a
handful of specialized servers around the world could handle the load.
I will think more about this, and about any delay or unreliability
problems which might prevent the FLPER from doing its job properly the
instant the mapping is changed. Below I will assume the matters
mentioned above can be resolved, so the Ivip6 fast-push mapping
system only needs to push a 20 (19?) bit "ETR address" (actually the
relevant bits of the CEP prefix which encloses the ETR address) in each
mapping update.
In Ivip6, the mapping system pushes to
all full database Query Servers the 64 bits which define the starting
address of the micronet (on a /64 boundary), its length, which we can
probably safely limit to 32 bits (maximum micronet size is one /32) and
the 20 bit value (19 bits in practice?) which defines the CEP
prefix of the provider network the packet should be forwarded to.
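To make these field sizes concrete, here is a small Python sketch of how
one Ivip6 mapping update record might be packed into 116 bits. The field
layout and function name are my own assumptions for illustration only -
they are not a defined wire format. The example values are those used in
the tutorial later on this page.

    # Illustrative only: the layout below is an assumption, not a defined format.
    def pack_ivip6_mapping(start_slash64: int, length_in_slash64s: int,
                           cep_index: int) -> int:
        """Pack one mapping record into a 116 bit integer:
             64 bits - micronet start, as a /64 index (top 64 bits of the address)
             32 bits - micronet length, in /64 units
             20 bits - the CEP bits ("ETR address" bits 96-115, LSB-0 numbering)."""
        assert 0 <= start_slash64 < 2**64
        assert 0 < length_in_slash64s < 2**32
        assert 0 < cep_index < 2**20
        return (start_slash64 << 52) | (length_in_slash64s << 20) | cep_index

    # Example: micronet 4000:0050:7000::/48 (65,536 /64s) mapped to CEP-0003.
    record = pack_ivip6_mapping(0x4000005070000000, 1 << 16, 0x00003)
    print(record.bit_length())   # 115 - fits within 116 bits, before any framing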
This would be another advantage of the Ivip Label Forwarding in the Core
approach compared to map-encap. If map-encap were used
for
IPv6, the full 128 bit address would need to be sent to all the full
database Query Servers, and to every ITR which needed this micronet's
mapping.
So, for each micronet, the amount of mapping
information pushed globally would be:
                   Micronet  Micronet  "ETR"     Total  Total
                   start     length    address   bits   bytes

IPv4: Ivip4          32        24        32        88    11
IPv6: Ivip6          64        32        20       116    14.5
IPv6: map-encap-A    64        32       128       224    28
IPv6: map-encap-B   128         7       128 * N   263+   33+
Ivip6 requires only 14.5 bytes to be sent by the fast-push mapping
distribution system for each micronet. Of course there would be
considerable overhead, but also perhaps some compression, in the full
practical system. In the example below, some compression could be
obtained, including some due to all micronet space being within the
4000::/3 prefix, which reduces the number of bits required to specify
the micronet's start point to 61 bits.
By comparison, figures are shown for
"map-encap-A" which is the Ivip4 approach to map-encap as it would be
applied to IPv6, with the full 128 bit address of the ETR.
Figures
are also shown for a notional "map-encap-B" which represents, in part,
the mapping information to be sent by LISP, APT or TRRP. Note
that only LISP-NERD and APT involve pushing all the mapping data to
thousands of ITRs or Default Mappers (APT full database Query Servers).
For a single-homed EID prefix
(all these systems use binary
boundary prefixes, rather than Ivip's more flexible micronets) -
without any TE load sharing arrangements or multihoming failover
selection for multiple ETRs - the mapping data presumably includes a
full 128 bit base address for the EID prefix. This is in
accordance with the RRG rough consensus (To Do: link to the message in
June) that the mapping granularity should be a single IP address.
LISP, APT and TRRP use a 7 bit binary number to
specify the length in
bits of the EID prefix, which is more compact than Ivip6's use of a 32
bit integer to specify the number of /64s spanned by the micronet.
For a single-homed EID prefix, a single 128 bit ETR address is
required. For multihomed EID prefixes, multiple 128 bit ETR
addresses
need to be specified ("* N" in the table above), with further bits for
TE weighting and for priorities when choosing which alternative ETRs to
tunnel packets to when the ITR decides that an ETR is unreachable.
QSD
- Query Server with full Database
QSDs
get the full continual feed of mapping updates from the fast push
mapping system.
They handle queries from nearby ITRs - ITRCs and
ITRHs. An ITRD (full database ITR) is really a caching
ITRC with an integrated QSD, or an ITRC
using a QSD in the same rack connected directly by Ethernet.
QSC
- Query Server with Cache
These
can optionally be deployed, so there may be one or more layers of
QSCs between ITRCs/ITRHs and the nearest one or several QSDs. The
purpose is to reduce the number of QSDs needed.
When
a QSC has no cached information which answers a query, it passes the
query upwards to (or towards, via one or more QSCs) the nearest
local QSD. When the QSC receives the reply, it caches it and sends
the response downwards to (or towards, via one or more QSCs) the
ITRC/ITRH which made the request.
Likewise, when a QSC gets a
Mapping Cache Update message from a QSD above it (perhaps via one or more
QSCs), it passes it downwards to whichever ITRCs, ITRHs or QSCs below
it have, in the last 10 minutes (for instance), queried the mapping
for this micronet.
Mapping Replies and any subsequent Mapping
Cache Updates are secured by a nonce which the querier places in the
query which gave rise to them.
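A minimal Python sketch of this QSC behaviour follows. The class, method
and field names are hypothetical, chosen only for illustration; nonce
checking of replies and Mapping Cache Updates is omitted for brevity.

    # Illustrative sketch; names are assumptions, not a specification.
    import time

    class QSC:
        def __init__(self, upstream):
            self.upstream = upstream   # the next QSC up, or ultimately a QSD
            self.cache = {}            # micronet -> (mapping, expiry time)
            self.recent = {}           # micronet -> {downstream node: last query time}

        def handle_query(self, micronet, querier):
            now = time.time()
            self.recent.setdefault(micronet, {})[querier] = now
            entry = self.cache.get(micronet)
            if entry and entry[1] > now:
                querier.deliver(micronet, entry[0])          # answer from the cache
            else:
                self.upstream.handle_query(micronet, self)   # pass the query upwards

        def deliver(self, micronet, mapping):
            # A reply coming back down: cache it, then send it on downwards
            # (for simplicity, to every recent querier of this micronet).
            self.cache[micronet] = (mapping, time.time() + 600)
            for node in list(self.recent.get(micronet, {})):
                node.deliver(micronet, mapping)

        def handle_cache_update(self, micronet, mapping):
            # Pass a Mapping Cache Update downwards, but only to nodes which
            # queried this micronet in the last 10 minutes.
            cutoff = time.time() - 600
            self.cache[micronet] = (mapping, time.time() + 600)
            for node, when in self.recent.get(micronet, {}).items():
                if when >= cutoff:
                    node.handle_cache_update(micronet, mapping)

    # (ITRCs/ITRHs at the bottom of the tree would implement deliver() and
    #  handle_cache_update() themselves, acting on the mapping directly.)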
Tutorial
by way of
example - Detailed Explanation
This
is a detailed explanation of the system, using a particular example.
Please see the chart above
#chart - which
illustrates this explanation.
For
simplicity, we assume that all core IPv6 routers have been upgraded
for Ivip6. In a section below we discuss transition arrangements
while not all routers have Ivip6 upgrades.
We will also ignore
OITRDs in this explanation - the ITRs which collect the packets sent
from non-upgraded networks which are addressed to micronet addresses,
and forward them towards the appropriate ETRs.
In this example, CEPs (Core Egress Prefixes)
are /32s, and the prefix
E000::/12 has been reserved for them. Consequently, the first few CEPs
are:
CEP-0          E000:0000::/32
CEP-1          E000:0001::/32
CEP-2          E000:0002::/32
and the highest is:
CEP-1048575    E00F:FFFF::/32
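To illustrate the relationship between a CEP number and its /32 prefix
under this E000::/12 reservation, here is a small Python sketch (my own,
purely illustrative):

    # Illustrative sketch of the CEP number <-> /32 prefix relationship.
    import ipaddress

    CEP_BASE = int(ipaddress.IPv6Address("E000::"))   # the fixed top 12 bits

    def cep_prefix(cep_number: int) -> ipaddress.IPv6Network:
        """Return the /32 CEP prefix for a 20 bit CEP number (0..1048575)."""
        assert 0 <= cep_number < 2**20
        return ipaddress.IPv6Network((CEP_BASE | (cep_number << 96), 32))

    def cep_number(address: str) -> int:
        """Return the CEP number for any address inside a CEP prefix -
        bits 96 to 115 of the address (LSB-0 numbering)."""
        return (int(ipaddress.IPv6Address(address)) >> 96) & 0xFFFFF

    print(cep_prefix(3))                        # e000:3::/32
    print(cep_number("E000:0003:0:55::7"))      # 3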
In our example, so far, 8191 CEPs
have been allocated - only to operators of provider networks. CEPs are
only
needed by provider networks which SEN end-user networks use for their
connection to the Net. Assuming Ivip6 is widely adopted, this
would include all, or almost all, provider networks.
CEP-0 is
reserved. (The final design may reserve more low numbered or
high-numbered CEPs for other purposes.) In our example, the allocated
CEPs include:
CEP-0001   E000:0001::/32      ISP-A (has only one "site")
CEP-0002   E000:0002::/32   }  ISP-B (has 30 "sites")
CEP-0003   E000:0003::/32   }
  ...                       }
CEP-0031   E000:001F::/32   }
CEP-0032   E000:0020::/32   }  ISP-C (has two "sites")
CEP-0033   E000:0021::/32   }
It
is not desirable to have a million CEPs, since each is advertised in
BGP and so places a burden on the entire core routing system.
CEPs
are only allocated to organisations which need them, and pay for
them. (To Do: develop plans for administering these CEPs and for the
commercial and regulatory aspects of Ivip6.)
The
ISPs generally have other "conventional" prefixes, outside this special
CEP set - as they do today. The ISPs use these "conventional"
prefixes for their own internal purposes, and for some of their
customers. Those customers use the space in today's "PA" manner.
Whether they get a single IP address or a prefix, and whether they
get it for a short dial-up or mobile session, or for some period of
years, the space they get is only available as long as they use this
ISP. It is "PA" - Provider Assigned - space and therefore not
portable to other ISPs.
These conventional prefixes and their PA
usage have nothing to do with the SPI space provided by Ivip6.
We
will consider two end-user networks with SPI space: Net-X and Net-Y.
For simplicity of explanation, their micronets are from the same MAB
(Mapped Address Block).
In
our example, the prefix 4000::/3 has been reserved for MAB prefixes. It
is not absolutely necessary for all MABs to be in any reserved prefix
such as this, but it would simplify the functionality of ITRs and
ETRs.
In IPv4, for a map-encap system, there is no chance of
making all the MABs appear in some clearly defined subset of the
whole address space - since, over the next five to ten years, there
needs to be progressive conversion of a great deal of the whole
address space into MABs.
In IPv6, by administrative fiat, it
would be easy for the IANA to carve out two special prefixes which
would make the Ivip6 system simpler to implement. In addition to
the above-mentioned E000::/12 reservation for 2^20 CEP prefixes, in
our example, the IANA reserves 1/8 of the entire IPv6 address space
for MABs: 4000::/3. (See
www.iana.org/assignments/ipv6-address-space
for current
assignments.)
Some company D - probably, but not necessarily an
ISP or an RIR - has been assigned the MAB:
4000:0050::/24
There
could be 2 million MABs of this size in the 4000::/3 reservation.
MABs don't all need to be the same size, nor do they need to be
contiguous with each other.
We don't want tens of millions of MABs.
Ideally, we probably want a few dozen or at most a few hundred. Each
MAB will have its own stream of mapping updates. Each OITRD will
advertise one or more - or potentially all - active MABs.
D
rents out some of this MAB's space to Net-X and Net-Y. This rental is
effectively
permanent. Unless D goes broke (in which case the MAB would be
taken over by another company such as D and probably administered to
preserve the previous assignments), X and Y can have their space for
as long as they like.
Sidebar on fees for mapping changes
and for OITRD traffic
Both
Net-X and Net-Y pay D for their
space, such as a certain fee per year for each /64. They also pay D
for the mapping changes they make. This would probably be a charge
per update, or some flat fee for a certain number of updates per
month.
In this fast-push mapping distribution system, it is
important that end-users pay for the updates they send on the
system. The fee may be as low as a few cents per update. These fees
help pay for most of the fast-push system, especially the Launch
servers and Replicators. This occurs through company D and others
like it, who directly or indirectly pay for the operation of the
fast-push system.
The fee per update also discourages
"excessive" use - such as changing the mapping ever few seconds for
months on end - to implement fancy TE, or just to create annoyance.
Each mapping change involves a small amount of computation, storage
and communications bandwidth in the entire fast-push system and in
all recipient QSDs.
The cost will be very low, and it should
still be low enough that end-users with busy networks will find it
attractive to use frequent mapping changes to fine-tune the inbound
TE of their multiple links. The space of a network would be split
into separate micronets, each with some recipient hosts. By
dynamically changing the ETR each micronet is mapped to, the
incoming traffic volume can be managed in real-time and directed as
desired to each of the two or more ETRs and so via each of the two
or more links from the two or more ISPs.
Net-X and Net-Y
also pay D for D's operation of a global network of OITRDs which
handle packets addressed to the above-mentioned MAB, sent by hosts
in non-upgraded networks. This means that Net-X and Net-Y will
probably pay according to traffic flowing through the OITRDs which
was addressed to each end-user's micronets.
This is because one
SPI end-user network might have only a small amount of space,
perhaps just a single micronet of /64, but could run a very popular
web site on it, and so generate far more OITRD traffic than another
end-user network, which has much more space.
D would have a
sampling system to estimate OITRD traffic; it would not make sense
to count every byte.
In the following examples, ordinary IPv6
prefix notation will be used to show the base address and length of
each micronet, but in practice the micronets can start and end at
any /64 boundary.
Net-X has the micronet:
4000:0050:7000::/48
This
is 65,536 contiguous /64s:
4000:0050:7000::
to
4000:0050:7000:FFFF:FFFF:FFFF:FFFF:FFFF.
This sounds like quite
a large micronet, but it is technically valid and perhaps there will
be call for such micronets.
Net-Y's micronet has just two /64s:
4000:0050:9999:6666::/63
Micronets
and UABs can range from a single /64, in principle, to as many /64s
as fit in the MAB. In this case, the /24 MAB covers about 1.1 trillion
(2^40) /64s.
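As an aside, a micronet which starts and ends on arbitrary /64 boundaries
can be described simply by a start index and a length in /64 units. The
small Python sketch below is my own illustration of this idea, not a
defined data format:

    # Illustrative only: a micronet as (start /64 index, length in /64 units).
    import ipaddress

    def micronet(start: str, number_of_64s: int):
        start_index = int(ipaddress.IPv6Address(start)) >> 64
        return (start_index, number_of_64s)

    def covers(m, address: str) -> bool:
        """True if the 128 bit address falls inside the micronet."""
        idx = int(ipaddress.IPv6Address(address)) >> 64
        return m[0] <= idx < m[0] + m[1]

    net_x = micronet("4000:50:7000::", 2**16)        # 65,536 contiguous /64s
    net_y = micronet("4000:50:9999:6666::", 2)       # just two /64s

    print(covers(net_x, "4000:50:7000:1234::33"))    # True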
Before depicting the passage of a packet through the Ivip6
system, we will describe
the function
of the CEP prefixes.
While
an ISP could use space within a CEP prefix for any purpose, here we
assume that all ISPs use these prefixes solely for ETRs.
Our
example involves two CEP prefixes:
CEP-0001
E000:0001::/32 ISP-A
CEP-0003
E000:0003::/32 ISP-B's "Site-2".
ISP-A advertises its
CEP-0001 from a single border router.
ISP-B advertises its
CEP-0003 from two border routers at its second site.
The BGP
system treats these CEP prefixes exactly the same as any other BGP
prefixes. (Note, ISPs must advertise these CEP prefixes intact - no
more specific prefixes within them.) All BGP routers therefore
develop and maintain best paths
for both these prefixes, and likewise for all the other CEP prefixes.
[#enhanced-rib]
Enhanced core router RIB functionality
Please also see separate page:
>
List of new functions for core routers
The new
functionality in the RIB specifically recognises this set of
8191 (or however many) CEP prefixes, because they are within the
IANA-defined prefix E000::/12.
The new RIB function is
programmed to detect each such /32 CEP prefix, and to copy its FEC
value (the internal value by which the router's FIB knows which
interface to forward the packet from) to a special array in the FIB.
This is the FLFEC[] array.
FLFEC[] is indexed 0 to 1048575.
In a practical system, the lower half of this array might be reserved
for Label Forwarding in the Core (LFC), and the top half for Label
Forwarding in Edge networks (LFE).
Each
element in FLFEC[] stores a FEC value, copied straight from the FEC
of the corresponding CEP in the RIB.
So in a given core router,
if the BGP RIB has decided that the best path towards ISP-A's
CEP-0001 is "Interface 7", then the FEC value
which
specifies "Interface 7" is copied to the location 1 in FLFEC[].
With Ivip6, it is required that all packets being handled in the BGP
core have their Forwarding Label set according to the following rules:
- Set to 0 if the packet has not had its Forwarding Label set to a
particular value by any ITR.
- Any non-zero value is assumed by all core routers (we assume in this
example they are all upgraded to Ivip6 functionality) to represent the
fact that this packet's destination address is for a micronet which is
currently mapped to some ETR address within a particular CEP - where
this CEP's distinguishing bits 96 to 115 are directly specified by the
value of the Forwarding Label.
Having set the
stage, we now
provide an example packet flow, a packet sent by a host Host-A to
another host Host-B.
Host-A is on a conventional address in some
ISP's
BGP advertised prefix, or in a conventional PI space end-user
network.
Host-B is in Net-X's /48 micronet mentioned above:
4000:0050:7000::/48
Host-B's
address is:
4000:0050:7000:1234::33.
Net-X
is currently using
ISP-B's second site for Internet access, and the address of the ETR
to which incoming packets should be forwarded (via Ivip6's Forwarding
Label in the Core system) is:
E000:0003:0000:0055::7
The
packet is sent by Host-A and forwarded by its network's internal
routing
system towards a border router, which also has ITR functions. The
ITR function recognises it as being addressed to somewhere in the
SPI (Scalable PI) address space, since all such space is defined to
be within a micronet - and since all micronets are within MABs and
all MABs within the prefix 4000::/3, as is this packet's destination
address.
In our example the ITR has no cached mapping
information
for this address. A subsequent packet from Host-A to Host-B will have a
less
complex process, due to the presence of cached mapping data in the
ITR's FIB.
When the packet is analyzed by the FIB, the result is
of the form:
This packet
is addressed to a section of the
address space which is known to be covered by the Ivip6 scheme, but
the FIB currently has no mapping information for this particular
address.
Therefore,
hold the packet and query the routing processor (RIB) to ask for the
mapping information for this address. Soon (10 or 20 msec max), when
the reply arrives, the packet will have
its Forwarding Label set and then will be forwarded to a BGP router in
the
core.
Subsequent packets matching the micronet which was
specified
in the mapping reply will be handled by a faster, FIB-only, process
(next box) which sets the Forwarding Label to the same value, and again
forwards
the
packet to the core. |
This is one of
four initial responses the
FIB could produce. The other 3 are listed at
psg.com/lists/rrg/2008/msg02029.html
Briefly,
they are:
Use cached
mapping
information for this packet's destination prefix to set the Forwarding
Label, as above, before forwarding the packet to the core. |
Send the
packet conventionally, based on the normal FIB analysis of
its destination address to determine the shortest BGP advertised prefix
it is within. |
Drop
the packet or process it via some slower and more arduous
mechanism - which is not needed for Ivip6. |
Once
in the core
the packet is handled by one or more upgraded BGP routers.
In
our example, the ITR requests mapping information for the packet's
destination address:
4000:0050:7000:1234::33
Actually,
since the mapping system's granularity is /64, the map request is
for the 64 bit value, in hex:
4000 0050 7000 1234
Within
a few tens of milliseconds, the response from the local QSD (full
database query server) comes back to the effect:
The
queried address is within the micronet:
4000:0050:7000::/48
which
is currently mapped to the ETR at:
E000:0003:0000:0055::7
Cache
this response for 600 seconds. |
The
Ivip6 section of the ITR's RIB caches this information, and processes
it into a form to be sent to the FIB:
Any incoming packet
matching:
4000:0050:7000::/48
should
have its Forwarding Label set to (hex):
0 0003
and
should then be
handled by the usual forwarding mechanism. |
The
RIB sends
this to the FIB, and by one means or another the FIB matches the
stored packet to this new rule. (600 seconds later, the RIB will
tell the FIB to delete the above rule.)
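A compact Python sketch of the ITR behaviour just described - query the
local QSD, cache the resulting rule for 600 seconds, and set the
Forwarding Label accordingly. The class and function names, and the
shape of the QSD reply, are assumptions for illustration only:

    # Illustrative ITR sketch; names and the reply format are hypothetical.
    import ipaddress, time

    class ITR:
        def __init__(self, query_local_qsd):
            self.query_local_qsd = query_local_qsd   # returns (micronet, cep_number, ttl)
            self.rules = []                          # cached (micronet, cep_number, expiry)

        def forwarding_label_for(self, dst: str) -> int:
            """Return the 20 bit Forwarding Label to write into the packet."""
            addr = ipaddress.IPv6Address(dst)
            now = time.time()
            for net, cep_number, expiry in self.rules:
                if addr in net and expiry > now:
                    return cep_number                # fast, FIB-only path
            # No cached rule: hold the packet and ask the local QSD.
            net, cep_number, ttl = self.query_local_qsd(addr)
            self.rules.append((net, cep_number, now + ttl))
            return cep_number

    # The mapping reply from the example: 4000:0050:7000::/48 -> CEP-0003, 600 s.
    def example_qsd(addr):
        return (ipaddress.IPv6Network("4000:50:7000::/48"), 0x00003, 600)

    itr = ITR(example_qsd)
    print(hex(itr.forwarding_label_for("4000:50:7000:1234::33")))   # 0x3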
[#enhanced-fib]
Enhanced core FIB functionality Please
also see separate page:
>
List of new functions for core routers
(This explanation is for a FIB function in the ITR, but the same
function needs to be added to the FIBs of all core routers.)
Now
the packet has its
Forwarding Label set to (hex) 0 0003 and the FIB's forwarding mechanism
(enhanced to do this Ivip6 additional function) looks at its Forwarding
Label, discovers it is 3, and uses this to index into the array
FLFEC[].
This produces the correct FEC value for this packet -
the number which will cause it to be sent out the interface which
leads to the BGP router which is the best path towards the prefix in
which the ETR is located.
Once it reaches that core router, the same process happens:

  Forwarding Label == 0?
    Yes: Use the ordinary FIB process to analyze the destination
         address until the longest matching prefix is found. Then
         use that information to look up the FEC for this prefix.
    No:  Use the Forwarding Label to index into FLFEC[] to retrieve
         the FEC.
  Forward according to this FEC value.
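The same per-packet decision, as a minimal Python sketch;
longest_prefix_match and FLFEC are placeholder names I have assumed for
illustration:

    # Illustrative sketch of the per-packet decision in an upgraded core FIB.
    def core_fib_forward(packet, FLFEC, longest_prefix_match):
        if packet.forwarding_label == 0:
            # Ordinary IPv6 forwarding: full longest-matching-prefix analysis.
            fec = longest_prefix_match(packet.dst)
        else:
            # Ivip6 forwarding: a single array index, no prefix analysis at all.
            fec = FLFEC[packet.forwarding_label]
        return fec   # the packet is then forwarded according to this FEC value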
This process is repeated at each core router the packet is forwarded
through, until it reaches a BGP router at
the border of the provider network in which the ETR is located.
This is the FLPER (Forwarding Label Path Exit Router).
This
will be faster and simpler than the usual process of
each core router analysing
up to 48 bits in the destination address with the Tree-Bitmap
algorithm.
In this way, as long as the packet is handled by an
upgraded core router, it will be forwarded towards one of the border
routers of ISP-B's Site-2.
Note that the packet does *not*
contain any address which refers to the CEP prefix advertised by
ISP-B's
Site-2:
CEP-0003 E000:0003::/32
The
Forwarding Label was
set just once by the ITR in the source site. Once set, the packet is
easily handled by (upgraded) core routers and is forwarded towards
whichever one or more core routers advertise this prefix.
When
the packet
reaches the border router for ISP-B's Site-2, this FLPER router
performs a
somewhat different operation. It recognises that the value in
the Forwarding Label (hex 0 0003) matches the above CEP
prefix, which this router advertises. So it zeroes the Forwarding
Label and presents the packet to its normal FIB process. In this
example, ISP-B's internal routing system has a route which covers the
destination address of the packet.
The FLPER router sets the
Forwarding Label to 0.
The standard FIB function of this
border
router and of any other internal routers now forwards the packet to the
end-user network.
So in this example, there was no actual "ETR"
function, other perhaps than the FLPER recognizing the Forwarding Label
should now be set to 0.
If ISP-B's internal routing system did
not handle the end-user network's prefix, then something else needs to
happen. This is discussed above in the FLPER section.
Transition:
non-upgraded networks
The
task of this transition arrangement is to ensure that packets sent
by hosts in networks without ITRs are all forwarded to an OITRD,
where they can have their mapping looked up and their Forwarding Label
set
appropriately. The same principles which apply to Ivip OITRDs apply
also to Ivip6 OITRDs:
OITRDs should be distributed widely around
the Net.
They should be able to handle peak packet rates without
unreasonable
losses.
Their locations should try to minimize the extent to which packets
take a longer overall path than they would without Ivip6.
They
will be paid for by the organisations who rent micronet space to
end-users. See:
#business-oitrd .
Transition:
non-upgraded core routers
Ivip6 is
only going to be useful once a substantial number, probably a
majority, of BGP core routers have the Ivip6 upgrades. This is a
significant hurdle for deployment, although perhaps tunneling could
be used between upgraded routers initially when only a few DFZ routers
are upgraded. (This would raise packet length and PMTUD problems.)
It
will probably be many years before IPv6 usage is so high that a
scalable routing and addressing solution needs to be deployed - plenty
of time for new routers to have the extra functions and for many older
ones to be upgraded with firmware.
Ideally there
needs to be a way the system can work reliably even when some
percentage
of routers are not upgraded - such as 10% or less.
Non-upgraded
routers in the core and in edge networks are fine provided there are no
ITRs or ETRs located behind them.
The most
important thing to ensure is that each upgraded BGP router, including
the border routers, never forwards to any non-upgraded router a
packet which has its Forwarding Label set. The non-upgraded router
would ignore the Forwarding Label, and do a standard BGP FIB operation
on the destination address.
This undesirable situation would
result in the packet being forwarded towards the nearest OITRD which
is advertising the MAB which encloses the destination address.
There, according to the above algorithms, the packet will have its
Forwarding Label set again to the same value it already has, and that
Forwarding
Label will be used to forward it to a router which should take it
towards the network which has the ETR.
The packet could easily get into a loop and so be dropped when its
Hop Limit reaches zero.
There are probably better ways of ensuring packets with a
non-zero Forwarding Label are only sent to upgraded core routers,
but some techniques to protect against this problem might include:
- Manually configure every upgraded BGP router not to accept routes
matching E000::/12 from neighbours which are not upgraded.
and/or:
- Manually configure every non-upgraded BGP router not to accept (and
therefore not to offer) any routes to any neighbours if they match
E000::/12.
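As a generic illustration of the first option, the filtering rule could
look something like the Python sketch below. This is not any vendor's
configuration syntax, and the neighbour_is_upgraded flag is an assumption
made only for illustration:

    # Generic illustration of the first filtering option; not vendor syntax.
    import ipaddress

    CEP_CONTAINER = ipaddress.IPv6Network("E000::/12")

    def accept_route(prefix: str, neighbour_is_upgraded: bool) -> bool:
        """On an upgraded BGP router: reject any route inside the CEP
        reservation if it is offered by a non-upgraded neighbour."""
        route = ipaddress.IPv6Network(prefix)
        if route.subnet_of(CEP_CONTAINER) and not neighbour_is_upgraded:
            return False
        return True

    print(accept_route("E000:3::/32", neighbour_is_upgraded=False))   # False
    print(accept_route("E000:3::/32", neighbour_is_upgraded=True))    # True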
There
may still be problems with not enough upgraded routers in a particular
part of the core to handle the Forwarding Label forwarding of packets.
PMTUD
This
proposal is at a
very early stage of development, but it seems that there are
no PMTUD problems with this approach.
The fact that packets do
not get any longer is a major benefit compared with map-encap
systems. Solving those problems for map-encap, including making the
best use of jumboframe paths in the DFZ, is quite challenging.
Assuming the Hop Limit is still decremented every time the packet is
handled by a router, Traceroute should still work fine through the
entire path, including the section where forwarding is controlled by
the Forwarding Label.
At any router in this part of the path, if the
packet is too long for the next-hop MTU, the router should be able
to send a Packet Too Big (PTB) message to the sending host.
There
is a potential gotcha here:
Would the sending host recognise the PTB message, if it comes back with
a copy of the start of the too-long packet, with a value in what the
host regards as the "Flow Label" which is different from the all-zeroes
value the packet was sent with?
There is nothing
in
RFC1981 specifying
how fussy the sending host should be before accepting a PTB message as
valid. The PTB message itself (RFC1885) sends back a
large slab of the packet which gave rise to it: "As much of invoking
packet as will fit without the ICMPv6 packet exceeding 576 octets."
Since the IPv6 header is 40 bytes and the ICMPv6 PTB header is 8
bytes, up to 528 bytes will be returned.
To Do: try to figure
out how fussy current IPv6 implementations are about accepting PTBs.
In the longish timescale we have for deploying an IPv6 scalable
routing solution, there is plenty of scope for altering host OS code to
cope with a packet fragment containing a different "Flow Label" to that
in the original packet.
A similar problem applies to the other
ICMPv6 error messages: Destination Unreachable, Time Exceeded and
Parameter Problem. The packet fragment which the host needs to
check carefully against its record of packets recently sent may have a
different "Flow Label" value to what was sent. Hosts need to be
fussy about accepting these messages, to prevent against spoofed
packets (with values guessed by the attacker, who is assumed not to be
in the path of the outgoing packets) causing a DoS problem.
A
messy
workaround, rather than changing fussy host software, would be for the
complaining router to send the PTB with a copy of the original packet,
but with the Forwarding Label bits (the "Flow Label" bits as far as old
host software is concerned) set to 0.
Then what if the sending host included an ITRH function and set the
Forwarding Label bits to some non-zero value? Hopefully the rest of
that host's software would be upgraded properly, and so would know
about the new use of these bits and ignore them when checking the
ICMPv6 Packet Too Big message's copy of the initial part of the packet.
This is
a major advantage over map-encap schemes, where the source address
of the too-big packet may be that of the ITR (not with Ivip, which uses
the sending host's
address) and where the too-big packet is longer than and different
from the packet sent by the sending host - resulting in any PTB
message not being recognised by the sending host.
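The kind of check an upgraded host might perform is sketched below in
Python: compare the packet quoted in the PTB against a record of
recently sent packets, treating the 20 Forwarding Label / Flow Label
bits as "don't care". The helper names are my own; the offsets simply
follow the IPv6 header layout.

    # Illustrative sketch: match a PTB's quoted packet against a sent packet,
    # ignoring the 20 Flow/Forwarding Label bits.  Names are assumptions.
    FLOW_LABEL_MASK = 0x000FFFFF     # low 20 bits of the first 32 bit word

    def first_word(ipv6_packet: bytes) -> int:
        # Version (4 bits) | Traffic Class (8 bits) | Flow/Forwarding Label (20 bits)
        return int.from_bytes(ipv6_packet[:4], "big")

    def ptb_matches_sent(quoted: bytes, sent: bytes) -> bool:
        """True if the packet quoted in the PTB matches a packet this host
        sent, even if the label bits were rewritten in transit."""
        if len(quoted) > len(sent):
            return False
        if (first_word(quoted) | FLOW_LABEL_MASK) != (first_word(sent) | FLOW_LABEL_MASK):
            return False
        return quoted[4:] == sent[4:len(quoted)]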
Any
translation scheme (Six/One Router is the only one so far) would have
serious difficulties with PMTUD in the translated part of the path,
since the packet has different addresses to those it had when it
left the sending host. So even if the PTB was somehow sent back to
that host, a properly implemented PMTUD system on that host would fail
to recognise the PTB as relating to any packet this host sent.
TTR
Mobility
Any map-encap scheme,
and Ivip in
particular, can support a global mobility scheme with
highly attractive characteristics. A paper on this will appear soon.
For now, the descriptive material at:
describes
the Translating Tunnel Router approach to extending a map-encap
scheme for mobility.
It is not necessary to change the mapping
every time the mobile node gets a new care-of address. Typically a
mapping change, to select a new TTR, is only required when the
care-of-address moves more than about 1000km or so from wherever the
current TTR is.
The TTR principles should apply in general to a
system such as Ivip6. Instead of tunneling packets across the DFZ
to the ETR-like TTR, they would be forwarded according to the
Forwarding Label.
However, the Forwarding Label approach won't work for taking packets
between the mobile node and the TTR. So tunneling should be used for
this, as described in the above-mentioned material.
This
raises some PMTUD problems. Fortunately, the TTR <--> MN tunnel
technology is not related at all to the map-encap scheme or to the
Ivip6 system, and can be negotiated at set-up time between the TTR
and MN. This means that there does not need to be a single fixed
technology for this tunneling, enabling a variety of techniques,
innovation, and more localized potential solutions to PMTUD.
Typically, the tunnel between the TTR and the MN (actually, the MN can
make tunnels to multiple TTRs) will be two-way and use the same
techniques as encrypted VPNs. These two-way tunnels are a lot easier to
handle PMTUD over than the so-called tunnels from an ITR to an ETR in a
map-encap system, where an ITR has to get packets to an ETR with which
it has had no prior contact and with which it cannot reasonably engage
in extensive communications.
The
type of tunnel used will naturally cause PTB messages to be sent
to the TTR or the MN. The MN can modify its packet size
accordingly.
The TTR has two choices. It can either fragment the long packets it is
getting from one or more hosts via one or more ITRs, or it can send
back a PTB message for any packet which, once encapsulated, exceeds the
tunnel MTU. Any PTB message the TTR generates in response to packets
arriving via ITRs using Label Forwarding in the Core will naturally go
straight back to the sending host.