Responding to a critique of part of my Quick Introduction . . .

Robin Whittle  rw@firstpr.com.au  Last update: 2007-07-28

Please be sure to read the errata/ page, particularly concerning ICMP messages, "inner SA = outer SA" and the problem of MTU limits with tunneled packets and Path MTU Discovery.

This paragraph was criticised as containing factual errors, particularly the notion of the database being "massive" and the ITRs' communications being "complex".
Note regarding critique:

"Now, your gross generalization of LISP and NERD/CONS and eFIT-APT's approach is actually factually incorrect.  "this massive database" is a subjective term that has not factual support. CONS and APT differ greatly in how they handle restoral. In NERD 500k mappings at 50bytes each is 256meg.  Similarly 'complex communications and decisions regarding which ETR' is also subjective.  What you categorize as 'complex' in the ITR is actually merely selecting a value based on a priority and weight metric inside a short list."

Please see my longer response below this box, which largely concerns how complex it is for ITRs in LISP-NERD, LISP-CONS and eFIT-APT to reliably handle ICMP Destination Unreachable messages concerning reachability and Path MTU Discovery.

"The basic issue is that "adding Multi-homing service restoration" to the mapping database makes the mapping database have the same rate/change characteristics as BGP today.  The core idea of an external mapping database is that site subscription rate/change is different than site reachability rate/change.  The data-plane is actually the best place to trigger service restoral, and the mapping control plane is simple and relatively static."
I understand from this that the underlying purpose is to keep the change in mapping information slow, while allowing fast service restoration.  I stand by my statement about a more complex and larger database being required to support the ITR making these real-time decisions about multihoming service restoration.

I agree that the LISP/eFIT-APT approach of delegating restoration responsibility to ITRs enables the mapping data itself to be rather static, but I don't regard the resulting "control plane" as being simple.

I think the complexity or simplicity of the "control plane" of the new mapping architecture referred to in the last sentence of the quoted critique has several aspects, including:
  1. The complexity and size of the mapping data, which is greater than with Ivip, since it must encode multiple RLOCs, Priorities and Weights.
  2. The complexity of the mapping distribution system.  This is definitely complex and ambitious for LISP-CONS or Ivip.  LISP-NERD is conceptually simple, but to get the same throughput for all those ITRs, even when they copy update data from each other, would be pretty demanding, depending of course on how many EIDs there were, and how often the mapping changed. 

    LISP's and eFIT-APT's mapping is more complex and intended not to need changing as fast as Ivip's.

    eFIT-APT simply sends the mapping data along BGP.  This is conceptually simple, but I am not sure it is practical or desirable, since we are trying to take the load off BGP's control plane.  
  3. The complexity of ITR functionality and of the communications between ITRs and ETRs.  In this respect, Ivip is vastly simpler than the other proposals.



The following is a slightly edited version of a message to the RAM list on 2007-07-28 Australian time, which is about the time the RRG meeting starts in Chicago on 2007-07-27.
This message mainly concerns the difficulty which I think a
LISP-NERD, LISP-CONS or eFIT-APT ITR (and eFIT-APT Default Mapper)
would have handling ICMP DU messages regarding reachability and
Path MTU Discovery.

The only way I can imagine these being handled securely seems to
involve an impractically large amount of state and processing.

None of these difficulties apply to Ivip ITRs because they are not
concerned with reachability and because they don't get ICMP DU
code 4 messages if a tunneled packet with DF set is too long for a
router.


I respond to the "massive" critique at the end of this message,
since I want to focus on what LISP or eFIT-APT ITRs must do in
order to securely determine reachability and in order to support
Path MTU Discovery.

Ivip ITRs don't do any of this. They don't test for reachability,
they don't make decisions to tunnel to one ETR or another, and
they are not involved in the ICMP Destination Unreachable (DU)
messages generated by routers handling tunneled packets, because
Ivip's tunneled packets have the sending host's address as the
source address. With LISP and with eFIT-APT, the source address
is that of the ITR, so it gets the ICMP DU messages.

eFIT-APT definitely does have complex communications between the
ITR and its Default Mapper, and with the destination ETR and its
Default Mapper potentially sending messages to both the ITR and
its Default Mapper. This is in addition to both ITR and Default
Mappers (which are themselves ITRs) performing the complex
operations I describe below which are necessary, as they are for
LISP, to reliably accept ICMP messages regarding reachability and
fragmentation.


LISP/eFIT-APT ITRs (and Default Mappers for eFIT-APT) will
legitimately get ICMP DU messages under at least the following
conditions. There may be more situations and more codes.

For simplicity I will ignore the codes in RFC 1122 and just work
with those in:

http://tools.ietf.org/html/rfc792

ICMP Fields:
Type 3

Code 0 = net unreachable;
     1 = host unreachable;
     2 = protocol unreachable;
     3 = port unreachable;
     4 = fragmentation needed and DF set;
     5 = source route failed.

The ITR will legitimately get codes 0, 1, 2, 3 or 5 when one of
the tunneled packets it sent fails to reach its destination ETR
for some reason.

(This is not a reliable way of testing the reachability of the
ETR, since the failure to reach it may not generate an ICMP DU
message, or that message may be filtered or lost. Ivip
would rely on the user having their own multihoming monitoring
system - or paying for someone else's system to do the job - to
detect reachability problems and then change the mapping of their
prefix so the packets are tunneled to another ETR. That system
could involve cryptographically secured challenge and response
exchanges which don't rely on ICMP at all.)
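The split between the RFC 792 codes listed above can be sketched in a few lines of Python. This is only an illustration: the names `DestUnreachableCode` and `classify_du` are hypothetical, and the grouping (code 4 for Path MTU Discovery, the other codes for reachability) is my reading of the text, not something taken from the LISP or eFIT-APT drafts.

```python
from enum import IntEnum

ICMP_TYPE_DEST_UNREACHABLE = 3  # ICMP Type 3, per RFC 792


class DestUnreachableCode(IntEnum):
    """Type 3 codes from RFC 792 (RFC 1122 additions are ignored)."""
    NET_UNREACHABLE = 0
    HOST_UNREACHABLE = 1
    PROTOCOL_UNREACHABLE = 2
    PORT_UNREACHABLE = 3
    FRAGMENTATION_NEEDED = 4
    SOURCE_ROUTE_FAILED = 5


def classify_du(icmp_type: int, code: int) -> str:
    """Say which ITR function a message would concern (hypothetical helper)."""
    if icmp_type != ICMP_TYPE_DEST_UNREACHABLE:
        return "not-du"
    try:
        du = DestUnreachableCode(code)
    except ValueError:
        return "unknown-code"
    if du is DestUnreachableCode.FRAGMENTATION_NEEDED:
        return "pmtud"          # code 4: Path MTU Discovery
    return "reachability"       # codes 0, 1, 2, 3 and 5
```

For example, `classify_du(3, 4)` gives `"pmtud"` and `classify_du(3, 1)` gives `"reachability"`.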

I understand that in LISP and eFIT-APT, ETRs are intended to
determine the reachability of their one or more destination
hosts/routers (the devices to which they send packets after
decapsulating them). An ETR might have multiple such destination
hosts, and when it receives from an ITR an encapsulated packet
which is addressed to a destination host which is unreachable, I
think the ETR is supposed to send some kind of ICMP DU message
back to the ITR. This would probably be code 0, 1 or 5 - and not 4.

Since reachability is so crucial to the functioning of LISP and
eFIT-APT, it is vital that the ITRs (and also the Default Mappers
for eFIT-APT) reliably reject the attempts of attackers to spoof
ICMP DU messages. Otherwise, a single spoofed ICMP DU message
could clobber a stream of packets to an ETR, making it a handy DoS
technique.

In order for the ITR to be able to distinguish between genuine
ICMP packets and those spoofed by an attacker, it needs to retain
the following items of state, for each packet it tunnels.

I will call this a "Tunneled Packet Record".

1 - UDP Source Port: 16 bits. (Assuming this changes - maybe it
doesn't or is not regarded as necessary for testing ICMP
messages, but I will assume it is necessary.)

2 - UDP Length: 16 bits.

3 - UDP Checksum: 16 bits.

4 - Tunnel Endpoint: The ETR's address = the destination address
of the outer packet: 32 bits for IPv4 or 128 bits for IPv6.

So the state which needs to be stored for each tunneled packet is:

IPv4: 18 bytes
IPv6: 54 bytes

I will assume it needs to keep these for a second, since a genuine
ICMP packet would probably arrive within a second.

If this ITR is handling 1Gbps of encapsulated packets, of 100
bytes each, this is a million items of state, as just described,
being stored and garbage collected every second. I think this is
impractical.
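The arithmetic behind this estimate can be checked with a short Python sketch. It assumes the 100 bytes is the whole packet on the wire and takes the per-record sizes quoted above (18 and 54 bytes) as given; the function names are hypothetical.

```python
def packets_per_second(link_bps: float, packet_bytes: int) -> float:
    """Packets per second on a fully loaded link with fixed-size packets."""
    return link_bps / (packet_bytes * 8)


def retained_state_bytes(link_bps: float, packet_bytes: int,
                         record_bytes: int, retention_s: float = 1.0) -> float:
    """Bytes of Tunneled Packet Records held at any one time."""
    return packets_per_second(link_bps, packet_bytes) * retention_s * record_bytes


rate = packets_per_second(1e9, 100)              # 1.25 million packets/s
ipv4_state = retained_state_bytes(1e9, 100, 18)  # 22.5 million bytes
ipv6_state = retained_state_bytes(1e9, 100, 54)  # 67.5 million bytes
```

Under these assumptions the rate is 1.25 million records created and discarded per second, which is the "million items of state" order of magnitude described above.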

When an ICMP DU message arrives, the router's CPU needs to
determine whether it is genuine or not. It can't do this with
absolute certainty - since an attacker can easily guess a Packet
Length, and perhaps a Source Port, and then can try random numbers
for the Checksum. This is my understanding of what the ITR must do.

The ICMP message contains the IP header (however long that is for
IPv4 or IPv6) and then the 8 bytes which follow copied from the
packet which gave a router cause to generate the message.

Now that LISP has adopted UDP encapsulation - and so has eFIT-APT,
since it simply cites the LISP I-D as a reference for
encapsulation - these 8 bytes in the ICMP message consist purely
of the UDP header which follows the IP header of the tunneled
packet. I will call this the ICMP-8 words:

    UDP Source Port     UDP Dest Port
    UDP Length          UDP Checksum

To be accepted as genuine, the ICMP message must pass all of the
following tests:

a - Have a Source Address (SA) matching one of the Tunneled Packet
    Records' "Tunnel Endpoint".

b - Have the ICMP-8 UDP Dest Port equal 4342.

c - Have the ICMP-8 UDP Source Port equal the "UDP Source Port"
    of one or more of the Tunneled Packet Records which passed
    test (a).

d - Have the ICMP-8 UDP Length equal the "UDP Length" of one
    or more of the Tunneled Packet Records which passed tests
    (a) and (c).

e - Have the ICMP-8 UDP Checksum equal the "UDP Checksum" of one
    or more of the Tunneled Packet Records which passed tests
    (a), (c) and (d).

This would be a pain if there were a few dozen Tunneled Packet
Records, but there could be a million of them at any one time. I
think this is not at all practical for any high-speed ITR.

Anything less than this will render the ITR an easy target for a
single spoofed ICMP packet. A few such spoofed packets, with
source addresses matching all the ETRs the ITR has in the mapping
data for a particular EID, will mean that the ITR will not tunnel
packets addressed to this EID, because it considers all the RLOC
addresses in the mapping data unreachable.
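Tests (a) to (e) can be sketched in Python as follows. This is a minimal illustration under my own assumptions, not an implementation from any LISP or eFIT-APT draft: the class and function names are hypothetical, records are indexed by tunnel endpoint so that test (a) narrows the search first, and the caller supplies the time so that the one-second retention is explicit.

```python
import struct
from collections import defaultdict
from dataclasses import dataclass

LISP_UDP_PORT = 4342  # destination port named in test (b)


@dataclass(frozen=True)
class TunneledPacketRecord:
    udp_src_port: int
    udp_length: int
    udp_checksum: int
    tunnel_endpoint: str  # ETR address = outer destination address


def parse_icmp8(icmp8: bytes):
    """Split the 8 quoted bytes into src port, dst port, length, checksum."""
    return struct.unpack("!HHHH", icmp8)  # raises struct.error unless 8 bytes


class RecordStore:
    """Hypothetical per-ITR store of records, indexed by tunnel endpoint."""

    def __init__(self, retention_s: float = 1.0):
        self.retention_s = retention_s
        self._by_endpoint = defaultdict(list)  # endpoint -> [(expiry, record)]

    def remember(self, rec: TunneledPacketRecord, now: float) -> None:
        self._by_endpoint[rec.tunnel_endpoint].append((now + self.retention_s, rec))

    def expire(self, now: float) -> None:
        """Garbage collect records older than the retention time."""
        for endpoint in list(self._by_endpoint):
            live = [(t, r) for t, r in self._by_endpoint[endpoint] if t > now]
            if live:
                self._by_endpoint[endpoint] = live
            else:
                del self._by_endpoint[endpoint]

    def accepts(self, icmp_source: str, icmp8: bytes, now: float) -> bool:
        src_port, dst_port, length, checksum = parse_icmp8(icmp8)
        if dst_port != LISP_UDP_PORT:                          # test (b)
            return False
        candidates = self._by_endpoint.get(icmp_source, ())    # test (a)
        return any(
            expiry > now
            and rec.udp_src_port == src_port                   # test (c)
            and rec.udp_length == length                       # test (d)
            and rec.udp_checksum == checksum                   # test (e)
            for expiry, rec in candidates
        )
```

Even with this indexing, every tunneled packet costs one `remember` call and every record must later be expired, which is the per-packet state and processing burden described above.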

(BTW, I assume that the ITR will periodically send test packets to
the ETR in order to see if it is reachable again. If so, then the
procedures for testing the validity of ICMP DU messages need to
first check to see if it was a response to one of these probes.

Such probing is another instance of the complexity of ITR
functionality and communications in LISP/eFIT-APT.)


Another thing which (I think) LISP/eFIT-APT ITRs must support is
Path MTU Discovery.

When the ITR receives an ICMP DU message with
code 4, it must perform the above series of tests on it and then
some further operations, which requires two more items of state to
be kept for each tunneled packet:

5 - IP Header of the original packet. 20 bytes for IPv4 or
40 bytes for IPv6.

6 - 8 bytes following that header from the original packet.

This brings the total size for each "Tunneled Packet Record" to:

IPv4: 46 bytes
IPv6: 102 bytes

It would be possible to reduce the saving of state by not
bothering to save items 5 and 6 above except for those packets
with DF (Don't Fragment) set to 1, since when this DF bit is
copied to the outer encapsulating header, it is only these
tunneled packets which should generate an ICMP DU code 4 message.

This requires the ITR to store state at about 5 to 20% of the
volume of bits it is tunneling in packets - and to garbage collect
it about a second later. This seems really impractical.


Once an ICMP DU code 4 packet passes the first set of tests, the
ITR takes the Tunneled Packet Record it matches, reads the Sending
Host's Address from the stored IP Header and creates a new ICMP DU
code 4 message. It sets the new message's Destination Address to
the Source Address of the stored IP Header and its Source Address
to the Destination Address of that IP Header, includes the entire
IP Header and the "8 bytes following" in the message, and
calculates its checksum.
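The construction of the new message can be sketched in Python for the IPv4 case. This is a rough illustration only: `build_du_code4` is a hypothetical name, the stored header is assumed to be a 20-byte IPv4 header without options, and the `next_hop_mtu` field is the RFC 1191 extension (an assumption on my part, since the text above works from RFC 792, where those bits are unused and zero).

```python
import socket
import struct


def internet_checksum(data: bytes) -> int:
    """Standard 16-bit ones' complement Internet checksum."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_du_code4(orig_ip_header: bytes, orig_next8: bytes, next_hop_mtu: int = 0):
    """Return (new source address, new destination address, ICMP bytes).

    orig_ip_header and orig_next8 are items 5 and 6 of the Tunneled
    Packet Record: the original packet's IP header and the 8 bytes after it.
    """
    sending_host = orig_ip_header[12:16]   # original Source Address
    original_dest = orig_ip_header[16:20]  # original Destination Address
    body = orig_ip_header + orig_next8
    # Type 3, code 4, checksum 0 for now, 16 unused bits, next-hop MTU.
    header = struct.pack("!BBHHH", 3, 4, 0, 0, next_hop_mtu)
    checksum = internet_checksum(header + body)
    icmp = struct.pack("!BBHHH", 3, 4, checksum, 0, next_hop_mtu) + body
    return socket.inet_ntoa(original_dest), socket.inet_ntoa(sending_host), icmp
```

The returned addresses follow the description above: the new message is addressed to the sending host and appears to come from the original destination. Building the outer IP header and actually sending the message are left out.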

I regard this as "complex" ITR functionality!


Regarding the database being "massive", I wrote this in the
expectation that any new architecture will eventually handle
hundreds of millions or maybe more than a billion different "EIDs"
(to use LISP terminology).

The critique was that a mapping database covering 500,000 EIDs and
requiring 50 bytes for each is not "massive". I agree this is not
"massive", but I foresee a much wider application of the new
architecture than simply multihoming for a few more times the
number of AS-end-users as we have today.

If a LISP or an eFIT-APT system was built, instead of something
with higher speed and broader ambitions such as Ivip, then I
believe it wouldn't be long before people started making proposals
to speed it up - to make it be more like Ivip so it could support
mobility and users' own direct real-time control of multihoming
service restoration, rather than entrusting those decisions, and
the difficult business of reachability testing, to all the ITRs in
the Net, each acting alone.

So I stand by my characterisation of the mapping database of any
successful new architecture as being "massive".

For a given number of end-user mappings, Ivip's would be less
massive, since each entry in the database consists of a starting
address and a range (more flexible than a prefix length) to
specify the Ivip-mapped addresses with a particular mapping -
followed by the mapping data, which consists of a single IP address.

With LISP or eFIT-APT, the mapping data consists of multiple RLOC
addresses, each with a Weight and a Priority - and perhaps some
other stuff.
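The difference between the two kinds of mapping entry can be sketched as Python data structures. All names here are hypothetical, and the selection rule (lowest Priority value among reachable RLOCs wins, ties split in proportion to Weight) is my assumption about how the "priority and weight metric" mentioned in the critique would be applied.

```python
from dataclasses import dataclass
from ipaddress import IPv4Address
from typing import Optional, Set, Tuple


@dataclass(frozen=True)
class IvipMapping:
    start: IPv4Address   # first Ivip-mapped address
    length: int          # number of consecutive addresses covered
    etr: IPv4Address     # the single address packets are tunneled to


@dataclass(frozen=True)
class Rloc:
    address: IPv4Address
    priority: int
    weight: int


@dataclass(frozen=True)
class LispMapping:
    eid_prefix: str
    rlocs: Tuple[Rloc, ...]


def ivip_select(mapping: IvipMapping) -> IPv4Address:
    """An Ivip ITR makes no choice: it uses the one mapped address."""
    return mapping.etr


def lisp_select(mapping: LispMapping, reachable: Set[IPv4Address],
                flow_hash: int) -> Optional[IPv4Address]:
    """Pick an RLOC the ITR currently believes is reachable."""
    live = [r for r in mapping.rlocs if r.address in reachable]
    if not live:
        return None
    best = min(r.priority for r in live)          # assumed: lower is preferred
    group = [r for r in live if r.priority == best]
    total = sum(r.weight for r in group)
    if total == 0:
        return group[0].address
    point = flow_hash % total                     # spread flows by Weight
    for r in group:
        if point < r.weight:
            return r.address
        point -= r.weight
    return group[-1].address
```

The selection itself is short, as the critique says, but note that `lisp_select` needs the `reachable` set as an input, and maintaining that set is where the ICMP handling and probing discussed earlier come in.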

So I stand by what I wrote about the contrast between Ivip and the
other proposals in that Ivip has a simple database which needs a
rapid response time from the user altering the mapping to the ITRs
following the new mapping, while the others assume this can't be
done, and so make the database more complex and voluminous, in
order to give the ITRs the information they need to make their own
multihoming service restoration decisions.

- Robin