Please be sure to read the errata/ page, particularly concerning ICMP messages, "inner SA = outer SA" and the problem of MTU limits with tunneled packets and Path MTU Discovery.

 

Network Working Group                                        R. Whittle
Internet-Draft                                         First Principles
Intended status: Experimental                             July 15, 2007
Expires: January 16, 2008


Ivip (Internet Vastly Improved Plumbing) Architecture
draft-whittle-ivip-arch-00.txt

Status of this Memo

By submitting this Internet-Draft, each author represents that any applicable patent or other IPR claims of which he or she is aware have been or will be disclosed, and any of which he or she becomes aware will be disclosed, in accordance with Section 6 of BCP 79.

Internet-Drafts are working documents of the Internet Engineering Task Force (IETF), its areas, and its working groups. Note that other groups may also distribute working documents as Internet-Drafts.

Internet-Drafts are draft documents valid for a maximum of six months and may be updated, replaced, or obsoleted by other documents at any time. It is inappropriate to use Internet-Drafts as reference material or to cite them other than as “work in progress.”

The list of current Internet-Drafts can be accessed at http://www.ietf.org/ietf/1id-abstracts.txt.

The list of Internet-Draft Shadow Directories can be accessed at http://www.ietf.org/shadow.html.

This Internet-Draft will expire on January 16, 2008.

Copyright Notice

Copyright © The IETF Trust (2007).

Abstract

Ivip (Internet Vastly Improved Plumbing) is a proposed global system of routers and a database, or collection of databases, which controls the tunneling performed by some of these routers. Database changes affect all Ingress Tunnel Routers (ITRs) within a few seconds, controlling which Egress Tunnel Router (ETR) they tunnel each packet to, depending on the packet's destination address. The ETR used by a host with an Ivip-mapped address is typically located in the same network as this destination host. The ETR decapsulates packets and forwards them to the destination host. A second type of ETR known as a Translating Tunnel Router (TTR) is used for mobile-IP, with the mobile node creating two-way tunnels to one or more nearby TTRs. Ivip enables a subset of IPv4 and IPv6 address space to be portable (used via any ISP which has an ETR) and to be suitable for multihoming (connection to the Net via two or more ISPs) - without involving BGP and without requiring any changes to host operating systems or applications. This is a form of "locator-ID separation" and is based on some principles derived from LISP (Locator/ID Separation Protocol). IP addresses in the subset of address space which is subject to being tunneled by ITRs are known as Destination Identifiers (DIDs). ITRs and ETRs are located on ordinary BGP Reachable IP (BRIP) addresses. The databases and ITRs map DID addresses to an ETR's BRIP address with a granularity of a single IPv4 address or a /64 prefix for IPv6. These two granularities are 256 and 64k times finer than is typically possible with BGP. This proposal is intended to resolve many of the problems discussed in the October 2006 Amsterdam IAB Routing and Addressing Workshop (RAWS).
Ivip's primary goals include the more efficient utilisation of IPv4 space and enabling millions of end-users to achieve portability and multihoming without involving BGP, without fuelling the growth of the global BGP routing table, and without requiring these end users to have ASNs or to acquire conventional prefixes of PI (Provider Independent) BGP reachable address space.



Table of Contents

1.  Introduction
    1.1.  Brainstorming phase
    1.2.  Postal redirection analogy
    1.3.  LISP and ID/LOC separation
    1.4.  One way tunnels to a single ETR
    1.5.  Anycast ITRs
    1.6.  Types of ETR
    1.7.  Types of ITR
        1.7.1.  ITRD - full database (push)
        1.7.2.  ITRC - query, cache (pull) and notify
        1.7.3.  ITFH - Ingress Tunnel Function in Host
    1.8.  Initial deployment
        1.8.1.  Paths taken by packets
        1.8.2.  Multihoming when both links are working
        1.8.3.  External multihoming monitoring system
        1.8.4.  Multihoming after a link fails
        1.8.5.  Potential problems with internal routing systems
    1.9.  Ivip's intended benefits
    1.10.  Long term deployment
2.  Definition of Terms, Concepts and Functions
    2.1.  IMIP - Ivip-Mapped IP address
    2.2.  NIMIP - Non-Ivip-mapped IP address
    2.3.  BRIP - BGP Reachable IP address
    2.4.  UAIP - Un-Advertised IP address
    2.5.  DID - Destination Identifier
    2.6.  TELOC - Tunnel Endpoint Locator
    2.7.  IMAB - Ivip-Mapped Address Block
    2.8.  IMAB-DB - IMAB DataBase
    2.9.  IMAB-DBD - IMAB DataBase Dump
    2.10.  UMUC - User Mapping Update Command
    2.11.  SUMUC - Signed User Mapping Update Command
    2.12.  SH/SN - Sending Host/Node
    2.13.  RH/RN - Receiving Host/Node
    2.14.  IRH/IRN - Ivip-mapped Receiving Host/Node
    2.15.  MH/MN - Mobile Host/Node
    2.16.  UAS - Update Authorisation System
    2.17.  RUAS - Root Update Authorisation System
    2.18.  US-IMAB - Update Stream specific to one IMAB
    2.19.  US-Complete - Update Stream for the Complete Ivip system
    2.20.  Replicator
    2.21.  QSD - Query Server with full Database
    2.22.  QSC - Query Server with Cache
    2.23.  ITR - Ingress Tunnel Router
    2.24.  ITRD - Ingress Tunnel Router with Database
    2.25.  ITRC - Ingress Tunnel Router with Cache
    2.26.  ITFH - Ingress Tunneling Function in Host
    2.27.  ETR - Egress Tunnel Router
    2.28.  ETFH - Egress Tunnel Function in Host
    2.29.  TTR - Translating Tunnel Router for Mobile-IP
3.  The Crisis in Routing and Addressing
    3.1.  Interrelated needs and problems
    3.2.  Constraints on possible solutions
4.  Potential Solutions
5.  Comparison with LISP
    5.1.  LISP principles and mechanisms used by Ivip
    5.2.  LISP principles and mechanisms not used by Ivip
    5.3.  Additional principles and mechanisms in Ivip
6.  Ivip's goals, non-goals and challenges
7.  User Interface and Update Authorities
8.  Replicators
9.  Query Servers - QSD and QSC
10.  Ingress Tunnel (ITR) strategies
11.  Egress Tunnel (ETR) strategies
12.  Mobile-IP with TTRs
13.  IPv6 and longer term strategies
14.  Loose ends
    14.1.  ETRs checking src & dest addresses
        14.1.1.  Short version
        14.1.2.  ITR tunneled packet with source address of sending host
    14.2.  Scaling the Replicator network
    14.3.  Is fast, secure, Replication possible on the Internet?
    14.4.  TTRs and Mobility
15.  Security Considerations
16.  IANA Considerations
17.  Informative References
Appendix A.  Acknowledgements
Appendix B.  The Ivip acronym
§  Author's Address
§  Intellectual Property and Copyright Statements





1.  Introduction




1.1.  Brainstorming phase

The purpose of this Internet Draft is to contribute to the development of one or more proposals to resolve the problems of what might be called the Crisis in Routing and Addressing. Ivip is one proposal among potentially many. Ivip is at an early stage of development and this I-D is part of what I regard as a brainstorming effort on the RAM mailing list. Consequently this I-D contains more exploratory and speculative material than the architectural RFC it may one day become. Most of the discussion below focuses on IPv4 except where noted. The goal is to develop a practical, elegant, incrementally deployable model of Ivip for IPv4. Once a promising model for IPv4 has been developed, full consideration should be given to IPv6 and to what degree these two separate Ivip networks might be integrated. Consideration should also be given to how a globally deployed Ivip system might support IPv6 and the scalable tunneling of IPv6 traffic over IPv4 and vice-versa.

The remainder of this long Introduction is intended primarily for readers - such as members of the RAM list - who are already familiar with RAWS, LISP and other proposals, and especially with the limitations of BGP routers, which are the primary reason why the Internet needs a new routing and addressing architecture. Following this Introduction is a section describing in detail Ivip's major Terms, Concepts and Functions.

The three sections following this provide a fuller grounding for readers who are new to this field, introducing the RAWS report on the Crisis, other solutions, and a comparison with LISP. Once these sections have been read, the Introduction should make more sense to readers who were not yet familiar with these.

Following these are sections which contain further discussion and diagrams regarding various deployment scenarios and about how the ITR, ETR, TTR, Replicators and Query Server functions of Ivip can be implemented in conventional routers, in servers, and in some cases within hosts.

Finally, in the "Loose ends" section, is some material which I don't have time to refine and integrate smoothly into this version of the draft. This includes a section on ensuring ETRs are not a backdoor around security arrangements which prevent attackers sending packets with spoofed source addresses. There is also a section which questions whether the Internet itself is a suitable basis for building the fast, secure, high-volume system of RUAS servers and Replicators. Even if it was secured with cryptographic techniques, it would still be vulnerable to DoS attacks from botnets. The last "Loose ends" section describes Translating Tunnel Routers and how they may be used with Ivip's ITR system to provide much more efficient and flexible Mobile IP connectivity than is possible with current techniques.

While I believe the simple ITR and ETR behavior of Ivip is both satisfactory and superior to the more complex approach of LISP (although perhaps LISP 3 will involve simpler arrangements than described for 1 or 1.5 in the current LISP-01 I-D) - I don't feel I have a robust enough approach to pushing the mapping data out to ITRDs and QSDs all over the Net. If that problem can be solved, I think Ivip has a reasonable chance of satisfying the criteria set forth in the RRG's Design Goals for Scalable Internet Routing [I‑D.irtf‑rrg‑design‑goals‑01] (Li, T., “Design Goals for Scalable Internet Routing,” July 2007.).

Grave problems will arise if no suitable new architectural solution is found to the Internet's problems in routing and addressing. Ivip is intended to facilitate a much finer splitting of IPv4 address space than BGP allows - and therefore a much greater utilisation of this space than is currently possible. Ivip is also intended to provide a better approach to IP address portability and multihoming so that fewer end-users will want to gain conventional PI address space and further burden DFZ (Default Free Zone) routers with additions to the global BGP routing table.

The iplane.cs.washington.edu project indicates there are approximately 63,000 BGP routers. (Lists of alias clusters in [iPlane] (“iPlane Datasets,” July 2007.).) Most of these will be transit and multihomed border routers. The remainder are singlehomed border routers. Transit and multihomed border routers are in the DFZ, and so need to develop a separate routing rule for each of the 220,000 or so prefixes which are advertised in the global BGP system. Every DFZ router needs to communicate with each of its peers about each of these prefixes, with messages about each prefix typically propagating across the entire BGP system. Iljitsch van Beijnum estimates that each prefix for each peer consumes between 60 and 240 bytes of router memory - and some routers have dozens of peers. [van‑Beijnum‑BGP] (van Beijnum, I., “Encoding routing information in bitmaps,” August 2001.) Problems with the load this places on routers, and difficulties with the stability of the whole BGP system, are the most serious and growing problem at present - and threaten to make many of these (probably) 50,000+ routers obsolete as the number of BGP routes grows.
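
As a rough illustration of the figures quoted above, the following sketch multiplies out the memory estimate. Only the 220,000 prefixes and the 60 to 240 bytes per prefix per peer come from the text; the peer counts and the function name `rib_memory_bytes` are assumptions chosen for illustration.

```python
# Back-of-envelope estimate of DFZ router memory used for BGP prefix state.
PREFIXES = 220_000                      # approximate global BGP table size (from the text)
BYTES_PER_PREFIX_PER_PEER = (60, 240)   # van Beijnum's low and high estimates


def rib_memory_bytes(peers: int) -> tuple[int, int]:
    """Return (low, high) memory estimates in bytes for a router with `peers` peers."""
    low, high = BYTES_PER_PREFIX_PER_PEER
    return PREFIXES * low * peers, PREFIXES * high * peers


if __name__ == "__main__":
    for peers in (2, 12, 36):           # hypothetical peer counts, not from the text
        low, high = rib_memory_bytes(peers)
        print(f"{peers:3d} peers: {low / 1e6:8.1f} MB to {high / 1e6:8.1f} MB")
```

The estimate scales linearly with both the number of prefixes and the number of peers, which is why every additional PI prefix in the global table is felt by every DFZ router.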

The size of these problems means that considerable resources can justifiably be devoted to introducing a new system. So while the problems to be overcome are daunting, the author of any such proposal can invoke the expenditure of millions of dollars of resources with ease, since other competing proposals will involve similar expenditures and since inaction would result in far higher expenditures still. However, a successful proposal must be not only the most promising of the alternatives, but must also be incrementally deployable. As Noel Chiappa wrote on the RRG list on 2007 July 13:

"That is *the* problem in Internet engineering these days. Any old fool (well, sort of :-) can design a better network, or a jet airplane; but it takes a real genius to figure out how to turn a fabric biplane into a jet while it's flying! :-)"

Ivip requires no changes to host operating systems or applications. Nor does it require changes to the BGP routing system. Ivip requires new functionality within, or closely connected to, some existing BGP and internal routers. The intention is that this can be implemented with firmware and/or configuration changes. In principle, the entire Ivip system could be introduced by adding specially programmed servers, with only configuration changes to the existing routers. However the most likely deployment scenarios involve additional router functionality, as well as the creation of some globally coordinated networks of servers.

Ivip ITR and ETR behavior is relatively simple. The real challenges are in allowing end-users to securely control their part of the mapping database, getting the database information to the ITRs quickly and securely, implementing the ITR functions efficiently (including in servers and sending hosts rather than routers), and ensuring ETRs can't be used to circumvent security measures - all while ensuring that some networks will want to implement Ivip even when few people use it or know what it is.

Please use and adapt these ideas for your own proposals and suggest any improvements which could be made to this I-D, which was prepared in a hurry. I intend to create a better version 01 in mid to late August. In the meantime, please discuss this I-D on the RAM list - http://www1.ietf.org/mailman/listinfo/ram - or via private email. It is possible that discussions will be redirected to the RRG (IRTF Routing Research Group) list: http://www.irtf.org/charter?gtype=rg&group=rrg . I will attempt to list bug-fixes and planned improvements to this I-D at http://www.firstpr.com.au/ip/ivip/ .




1.2.  Postal redirection analogy

A simple and reasonably instructive analogy to Ivip is the Post Office's mail redirection system. Letters addressed to an original home address are redirected from the original destination's post office with a sticker (or within a new envelope) to a new address, which typically involves them being delivered via a second post office.

This often involves sub-optimal path lengths, for instance a letter sent from Boston to an original address in San Francisco being redirected to Manhattan. Optimal paths could be achieved - at a very high cost - if every sorting office recognised letters with redirected destination addresses, so the letter was redirected at its first point of contact with the sorting and forwarding system. Ivip does not involve every router being able to do this, but uses a subset of routers with additional ITR (Ingress Tunnel Router) functionality. ITRs recognise packets which need to be redirected. They encapsulate and tunnel packets to another router (an Egress Tunnel Router - ETR), using an address gained from a global database for this particular block of Ivip-mapped address space.

Ivip doesn't encapsulate and tunnel packets which, in the postal analogy, were addressed to ordinary addresses in streets which physically exist. A postal system which is closely analogous to Ivip would redirect every letter with a destination address in one of multiple new artificial streets or towns, which have no physical existence. The Post Office would create multiple "streets" such as Twenty-seventh Virtual St in Virtualville. It then assigns, for years or indefinitely, numbers in such streets to individuals, families and organisations. A subset of sorting houses, through which every letter must pass before reaching a delivery office, would recognise every letter addressed to a street in Virtualville. For each such letter, using a central database, these specially upgraded sorting offices would place the letter in an envelope addressed to one of the Post Office's delivery offices. The database query consists of the full Virtualville address. The response consists simply of the postal address of whichever delivery office can best deliver the letter to its proper recipient. Whenever the proper recipient moves to a new locality, they use a username and password system via the Web, or via a post office, to update the central database so their letters will be redirected to the delivery office in the new locality. When the encapsulated letter reaches the delivery office, the sticker or outer envelope is removed and the office has local knowledge of how to deliver the letter to its intended recipient.

With Mobile IP and existing postal redirection systems, the destination typically has a physical "care-of" address (although the Post Office's "poste restante" service does not require an address, just identification when picking up mail from a post office). With Ivip, the destination need not have any other IP address than its own Ivip-mapped IP address. In the postal analogy, the delivery office delivers letters to the correct recipient, which is not necessarily a house with an ordinary street address. In neither Ivip nor the new Virtualville postal routing and addressing architecture does the system specify exactly how the final router or delivery office should forward the packet or letter to its proper recipient. In all cases, however, the destination does have the Ivip-mapped address, or has the full Virtualville address emblazoned on it in some manner.

In the postal analogy, due to anti-terrorist security measures, every initial sorting office which processes letters posted in a locality, will not forward any letters which come from an unrecognised address. The local system needs to recognise that letters with particular, previously locally registered, Virtualville sender addresses should be delivered normally, and that letters with sender addresses from other Virtualville streets and numbers, or from any address outside the local area, should be quarantined in a safe location where they will await scrutiny by the Office of Homeland Security.

Packets sent from hosts with Ivip-mapped addresses only need to pass muster in respect of their source address being locally recognised. They don't require any special delivery system, unless of course the destination address is Ivip-mapped too, in which case the packet will be forwarded to an ITR, which tunnels it to an ETR which forwards it to the destination host.




1.3.  LISP and ID/LOC separation

Many proposals have been made regarding additional protocol layers to take the place of IP addresses, which are widely regarded as performing two functions: identifying the end-point of a communication and specifying, as part of the address, information about where the end-point is located. The primary goal of all these proposals is that upper layer protocols would work with the identifier, and continue a communication session with the end-point, even when it becomes accessible via a different locator - for instance due to end-point mobility or switching from one provider network to another in a multihoming setting.

This goes beyond the functionality of the current two level DNS and IP address system. The current system is fine for a human user or a piece of software always commencing a communication session with a FQDN such as www.example.org. What the current system cannot cope with is continuing a session, such as an HTTP session over TCP, with the remote server when that server becomes only reachable via a different IP address. ID/LOC separation proposals generally intend that higher layer protocols such as TCP can continue to operate on identifiers, with a lower, new, layer of protocol software translating these to whichever physical locators are needed to reach the server at each moment in time.

This would allow session continuity when a multihomed host becomes reachable only via a new IP address. It would also enable locators to be allocated to physical sites in accordance with the dictates of route aggregation, which makes life easier for routers, while the allocation of identifiers need not be constrained by route aggregation or any other constraint regarding the physical topology of the network. However, since physical routing is done on the lower level locators, which are still subject to topological constraints, ID/LOC separation doesn't necessarily allow complete portability of networks from one provider to another, since a network's internal routing configuration is set in part by numeric locator IP addresses, and these can't be advertised at arbitrary providers without compromising route aggregation and/or adding a further route to the global BGP routing table.

While Ivip is based on LISP (Locator/ID Separation Protocol) [I‑D.farinacci‑lisp] (Farinacci, D., “Locator/ID Separation Protocol (LISP),” June 2007.), which is an ID/LOC separation protocol, I am not sure that Ivip meets all the formal requirements which proponents of ID/LOC might have for such a protocol. In a later section (Comparison with LISP) I attempt to list what Ivip takes from LISP, what it leaves out and what it adds.

Some ID/LOC proposals require non-backwards compatible changes to operating system and/or application software. Some use conventional IP addresses for both "identifier" and "locator". For instance SHIM6 [I‑D.ietf‑shim6‑proto] (Bagnulo, M. and E. Nordmark, “Shim6: Level 3 Multihoming Shim Protocol for IPv6,” April 2007.) (which is still being developed) works between IPv6 hosts with upgraded TCP/IP stacks and achieves multihoming, but not portability, on a purely host-to-host basis without any changes to routers or the addressing system.

LISP also uses some ordinary IP addresses for identifiers and others for locators. LISP requires no changes to hosts, BGP routers or the BGP routing system. It achieves its goals of portability, multihoming and Traffic Engineering (TE) with special ITR and ETR (Ingress and Egress) Tunnel Routers inside provider and end-user edge networks. In the LISP variants which are most suitable for adoption, a centralised or distributed database controls the ITRs.

Ivip is based on some of LISP's principles, including ITRs, ETRs and using a subset of the existing address space as identifiers with the remainder being usable as locators. Ivip does not attempt LISP's communication between ITRs and ETRs. Nor does it involve LISP's explicit TE functions. Ivip has a very different method of distributing ID-LOC mapping information (instructions to ITRs on where to tunnel packets based on their original destination address) than is proposed in current LISP I-Ds.

Some of Ivip's ITRs are "anycast ITRs in the core" (meaning outside provider and AS-end-user edge networks) with the mapped addresses (identifiers, or EIDs in LISP terminology) being part of BGP advertised prefixes. In this way, packets sent by hosts in networks without ITRs will still be tunneled by an ITR and find their way to hosts with Ivip-mapped addresses. This "anycast ITRs in the core" system is an unusual form of anycast, and supports TCP and all other protocols, because all packets are tunneled to the one destination host. This system is believed to make Ivip much more incrementally deployable than LISP, because without these "anycast ITRs in the core", hosts with LISP/Ivip-mapped addresses would not be reachable from hosts in networks which have not installed an ITR.

Ivip may have more ambitious goals than LISP regarding the fine division of address space to serve the needs of millions of end-users and regarding how quickly the database(s) and ITR system can respond to user commands to change mapping of their addresses. Ivip has no explicit TE functions, but it is intended that some TE be achievable. For instance to achieve load balancing over two or more links to a multihomed site which has traffic arriving on multiple Ivip-mapped addresses, the end-user would choose, for each such Ivip-mapped address, which ISP's ETR the packets are tunneled to and therefore which link these packets travel on.
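
The load-balancing idea in the last sentence can be sketched as follows. This is only an illustration: the round-robin policy, the function name `spread_over_etrs` and the documentation-range addresses are all assumptions, since the text leaves the choice of ETR for each address entirely to the end-user.

```python
from itertools import cycle


def spread_over_etrs(imips: list[str], etrs: list[str]) -> dict[str, str]:
    """Map each Ivip-mapped address to an ETR, alternating between the ETRs."""
    return dict(zip(imips, cycle(etrs)))


# Documentation-range addresses, used purely as placeholders.
site_addresses = ["192.0.2.1", "192.0.2.2", "192.0.2.3", "192.0.2.4"]
isp_etrs = ["198.51.100.1", "203.0.113.1"]   # one ETR per upstream ISP

for imip, etr in spread_over_etrs(site_addresses, isp_etrs).items():
    print(f"{imip} -> tunnel to ETR {etr}")
```

Because the mapping is per address, the balance achieved depends on how traffic is spread across the site's Ivip-mapped addresses, not on any per-packet decision by the ITRs.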




1.4.  One way tunnels to a single ETR

If host HA has a normal BGP-reachable IP address and host HB has an Ivip-mapped address, Ivip is only involved in tunneling packets sent by HA to HB. The typical arrangement is for the packet to be forwarded to an ITR which uses the packet's Destination Address (DA) as a key to its local copy of the mapping database, with the result being an IP address to which the packet will be tunneled. IP-in-IP tunneling is used, with a single outer IP header added, using the original source address. The destination address is that of an ETR, and is provided by a copy of the database which the ITR either contains or can query. The end-user (who runs host HB) has previously set the database so all ITRs in the world will tunnel packets which are addressed to host HB's Ivip-mapped address, to whichever ETR the end-user chooses.

When the encapsulated packet arrives at the ETR, the outer IP header is removed, and the original packet, as HA sent it and as the ITR received it, is forwarded to host HB. (The ITR typically copies the hop-count value from the original packet to the outer IP header and the ETR copies it from the outer IP header to the decapsulated packet.)
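
The encapsulation and decapsulation steps described above can be sketched as follows. This is a minimal model under stated assumptions, not a wire-format implementation: the `Packet` class and the `encapsulate` and `decapsulate` functions are hypothetical names, and the hop-count is modelled as a single `ttl` field.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Packet:
    """Simplified IPv4 packet: header fields plus an optional inner packet."""
    src: str
    dst: str
    ttl: int
    payload: bytes = b""
    inner: Optional["Packet"] = None   # set only on IP-in-IP encapsulated packets


def encapsulate(raw: Packet, etr_address: str) -> Packet:
    """ITR side: add one outer header addressed to the ETR.

    The outer header reuses the original source address and copies the
    hop-count of the raw packet, as the text describes.
    """
    return Packet(src=raw.src, dst=etr_address, ttl=raw.ttl, inner=raw)


def decapsulate(tunneled: Packet) -> Packet:
    """ETR side: strip the outer header and copy its hop-count to the inner packet."""
    if tunneled.inner is None:
        raise ValueError("not an encapsulated packet")
    return replace(tunneled.inner, ttl=tunneled.ttl)
```

Routers between the ITR and the ETR decrement only the outer hop-count, so copying it back on decapsulation makes the tunnel's hops count against the original packet.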

 ................                  ................
.       N1       .                .       N2       .
.                .                .                .
. HA-----ITR~~~~~BR~~~~~~TR~~~~~~BR~~~~~ETR-----HB .
.                .                .                .
 ................                  ................

Figure 1: Basic left to right packet flow - ITR in N1.

Figure 1 depicts left to right flow of a packet from host HA (7.7.7.7) to host HB (22.22.22.22). The "raw" packet, with DA = 22.22.22.22 is forwarded to the ITR in network N1. 22.22.22.22 is part of the 22.22.0.0/16 prefix, which is one of the Ivip-Mapped Address Blocks (IMABs) which all ITRs advertise. This means that every ITR which is a BGP router advertises itself as the destination for this prefix, and that every ITR which is an internal router will inject this route into the local routing system. The one /16 prefix, in this example, burdens the BGP system with one extra route, but can be used to support the portable and multihoming address needs of hundreds or thousands of end-users.

Without Ivip or a similarly effective system, some or all of these hundreds or thousands of end-users would get their own PI space, totalling far more than the 65,536 addresses of 22.22.0.0/16, and adding hundreds or thousands of routes to the global BGP routing table.
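
To make the comparison concrete, the sketch below assumes a hypothetical 1,000 end-users who would each otherwise obtain a /24 of PI space. Both numbers, and the function name `pi_cost`, are illustrative assumptions rather than figures from this draft.

```python
import ipaddress

IMAB = ipaddress.ip_network("22.22.0.0/16")   # the example IMAB used in the text
END_USERS = 1_000                              # hypothetical number of end-users
PI_PREFIX_LENGTH = 24                          # assumed PI prefix length per end-user


def pi_cost(end_users: int, prefix_length: int) -> tuple[int, int]:
    """Return (addresses consumed, BGP routes added) if every end-user takes PI space."""
    addresses_each = 2 ** (32 - prefix_length)
    return end_users * addresses_each, end_users


pi_addresses, pi_routes = pi_cost(END_USERS, PI_PREFIX_LENGTH)
print(f"PI space:  {pi_addresses} addresses, {pi_routes} BGP routes")
print(f"Ivip IMAB: {IMAB.num_addresses} addresses, 1 BGP route")
```

Under these assumptions the PI approach consumes 256,000 addresses and adds 1,000 routes, against 65,536 addresses and a single route for the IMAB.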

Ivip's capacity to reduce the growth of the BGP routing table and to enable the efficient use of IPv4 space by giving end-users precisely the number of addresses they need - not 256, 512, 1024 etc. addresses - rests on the RIRs developing an address management policy for Ivip-mapped address space which generally ensures that large blocks of addresses are assigned to the Ivip system, with each block being used to serve the needs of many end-users. It should not be difficult to develop and implement such policies.

Ivip is intended to serve the needs of end-users who need portability, multihoming and perhaps TE. Some of these end-users have already gained - or in the absence of a new routing and addressing architecture, would soon gain - an ASN and PI space to add to the BGP routing system. Ivip is also intended to serve the needs of end-users who do not have the resources to become an AS, gain a PI prefix etc., but who nonetheless need portability, multihoming and perhaps TE over multiple links to providers.

Portability of a single IP address between providers is not ordinarily considered a high priority goal, since a single host or NAT router and its DNS entry can easily be manually configured to a new IP address whenever a new provider is used. However there may be instances where an organisation has hundreds of branch offices, each with a single or a few IP addresses, which it wishes to remain fixed despite changing each office's singlehomed connection to one local provider or another, so its country-wide routing system does not need to be reconfigured frequently.




1.5.  Anycast ITRs

Multiple routers, usually each with an associated server, advertising the same prefix is known as "anycasting" [RFC1546] (Partridge, C., Mendez, T., and W. Milliken, “Host Anycasting Service,” November 1993.) [ISC‑Anycast] (Abley, J., “Hierarchical Anycast for Global Service Distribution,” March 2003.). Ivip's use of multiple anycast routers may be novel: tunneling packets to a single tunnel endpoint, which forwards the packets to a single host.

Each ITR either has a copy of the Ivip database (ITRD) or queries (ITRC) a QSD server (perhaps indirectly through one or more caching QSC servers) which does have a copy. The database's array for the 22.22.0.0 IMAB has 65,536 elements - one for each IP address. Each element contains a 32 bit IP address. The element for 22.22.22.22 has been set by the end-user to contain the address 54.32.1.0, which is the address of the ETR in Network N2.
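
The per-IMAB array described above can be sketched as a flat table of 32 bit slots indexed by the offset of the address within the block. The function names and the convention that an all-zero slot means "no mapping" are assumptions made for this example; the draft does not specify a storage format.

```python
import ipaddress
import struct
from typing import Optional

IMAB = ipaddress.ip_network("22.22.0.0/16")

# One 4-byte slot per address in the IMAB: 65,536 slots, 256 KiB in total.
# A slot of all zeros is used here to mean "no mapping" (an assumption).
table = bytearray(4 * IMAB.num_addresses)


def _slot(imip: str) -> int:
    """Byte offset of the slot for an Ivip-mapped address within the IMAB."""
    address = ipaddress.ip_address(imip)
    if address not in IMAB:
        raise ValueError(f"{imip} is not in {IMAB}")
    return 4 * (int(address) - int(IMAB.network_address))


def set_mapping(imip: str, etr: str) -> None:
    """Store the ETR address for one Ivip-mapped address."""
    struct.pack_into("!I", table, _slot(imip), int(ipaddress.ip_address(etr)))


def lookup(imip: str) -> Optional[str]:
    """Return the ETR address for an Ivip-mapped address, or None if unmapped."""
    (value,) = struct.unpack_from("!I", table, _slot(imip))
    return str(ipaddress.ip_address(value)) if value else None


set_mapping("22.22.22.22", "54.32.1.0")
print(lookup("22.22.22.22"))   # prints 54.32.1.0
```

A direct-indexed table like this makes each lookup a single memory read, at the cost of storing a slot for every address in the block whether or not it is in use.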

The ~~~~ path in Figure 1 depicts the encapsulated packet being forwarded from the ITR, to N1's border router, to a transit router, to N2's border router and then to its destination, the ETR in N2. This transport of the encapsulated packet is handled entirely by the standard BGP system and N2's internal routing system.

In this example, the BGP system sees only a packet with the Destination Address (DA) of 54.32.1.0. If there had been no ITR in N1, but the TR transit router in Figure 1 was an ITR - as shown in Figure 2 - then the BGP system would handle two different packets. The first is a "raw" packet with DA = 22.22.22.22, which was forwarded to the ITR function of this transit router. The second is the encapsulated packet leaving this ITR transit router for the border router of N2, with its outer IP header having DA = 54.32.1.0.

 ................                  ................
.       N1       .                .       N2       .
.                .                .                .
. HA-----IR------BR-----ITR~~~~~~BR~~~~~ETR-----HB .
.                .                .                .
 ................                  ................

Figure 2: Basic left to right packet flow - anycast ITR in core.

In both Figures 1 and 2, the ETR removes the outer IP header, revealing the original packet. After updating its hop-count, the ETR forwards the decapsulated packet to the destination host. This requires either a direct connection to the destination host or support from N2's internal routing system. The latter involves the routing system recognising packets with DA = this particular IP address - 22.22.22.22 - as needing to be forwarded to this host, while (assuming N2 has no other hosts using Ivip-mapped addresses from this IMAB) other addresses within 22.22.0.0/16 are forwarded as usual. "As usual" means towards any ITR inside N2 or failing that, to N2's border router, because this prefix is one which is advertised in BGP by multiple anycast ITRs in the core.
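
The internal routing behaviour described above amounts to ordinary longest-prefix matching: a host route for 22.22.22.22 overrides the covering 22.22.0.0/16 route. The sketch below illustrates this with a hypothetical forwarding table for a router inside N2; the next-hop labels are descriptive placeholders, not real interfaces.

```python
import ipaddress

# Hypothetical forwarding table for a router inside N2.
ROUTES = {
    ipaddress.ip_network("22.22.22.22/32"): "host HB",
    ipaddress.ip_network("22.22.0.0/16"): "nearest ITR (or border router)",
    ipaddress.ip_network("0.0.0.0/0"): "border router",
}


def next_hop(destination: str) -> str:
    """Pick the most specific (longest-prefix) route that contains the destination."""
    address = ipaddress.ip_address(destination)
    matches = [net for net in ROUTES if address in net]
    return ROUTES[max(matches, key=lambda net: net.prefixlen)]
```

With this table, `next_hop("22.22.22.22")` selects the host route, while any other address in 22.22.0.0/16 follows the covering route towards an ITR.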

A border router of a provider or AS-end-user network which is an ITR and advertises 22.22.0.0/16 to its BGP peers in other ASes also functions as an "anycast ITR in the core" because "raw" packets emerging from networks with no ITR will be forwarded to this ITR border router and be encapsulated and tunneled from there. Exactly why a network would provide this service for packets not associated with its network is a separate question.

A border router in N1 may be a convenient location to install ITR functionality. A more likely arrangement is that it would not advertise 22.22.0.0/16 or any of the other IMABs in the Ivip system to its BGP peers outside N1 (so as not to attract packets originating from non-ITR networks). The border ITR would internally advertise 22.22.0.0/16 and the other IMABs so that all packets addressed to an Ivip-mapped IP address (IMIP) would be forwarded internally to this border router ITR.

To the picture given by Figures 1 and 2 three other concepts need to be added.

Firstly, at any one time, all the ITRs tunnel packets sent from hosts all over the Net to 22.22.22.22 to the same single ETR.

Secondly, the address of the tunnel endpoints for all the ITRs can be changed within a short time, globally - ideally within a few seconds - by the end-user who controls the Ivip-mapping of 22.22.22.22.

Thirdly, if the network in which the sending host is located does not have an ITR, the raw packet will be forwarded internally to the border router and then forwarded through the BGP system to the "nearest" (in BGP terms) ITR, which tunnels it to the end-user's chosen ETR.

Packets flowing from HB to HA do not require any involvement of Ivip. Each ITR and ETR shown in the previous and the following diagrams also performs whatever functions an ordinary router in its position performs. A packet sent from HB to HA is forwarded internally to N2's border router and then through BGP routers to the border router of N1, after which it is forwarded internally to HA. The packet may well pass through N2's ETR, but since its DA is not one of N2's ETR's IP addresses, that ETR forwards it normally. The packet may well pass through N1's ITR (Figure 1) or the core-ITR in Figure 2 - but since its DA is not within one of the Ivip system's IMABs, either ITR behaves like an ordinary internal router (Figure 1) or transit router (Figure 2) and forwards the packet normally towards HA.
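
The decisions described above depend only on the packet's DA. The following sketch, with hypothetical function names and return strings, shows one way to express them. It assumes the single example IMAB 22.22.0.0/16 and the example ETR address 54.32.1.0. The address 192.0.2.1 stands in for an ordinary BGP-reachable host.

```python
import ipaddress

# The example IMAB from the text. A real ITR would hold every IMAB
# in the Ivip system.
IMABS = [ipaddress.ip_network("22.22.0.0/16")]


def itr_action(da: str) -> str:
    """What an ITR does with a packet, judged only by its DA."""
    addr = ipaddress.ip_address(da)
    if any(addr in imab for imab in IMABS):
        return "encapsulate"
    return "forward normally"


def etr_action(da: str, etr_addresses: set) -> str:
    """What an ETR does with a packet, judged only by its DA."""
    if da in etr_addresses:
        return "decapsulate"
    return "forward normally"


# HA -> HB: the DA is Ivip-mapped, so the ITR tunnels the packet.
ha_to_hb = itr_action("22.22.22.22")
# HB -> HA: the DA is an ordinary address, so ITR and ETR act as plain routers.
hb_to_ha_at_itr = itr_action("192.0.2.1")
hb_to_ha_at_etr = etr_action("192.0.2.1", {"54.32.1.0"})
```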

If HA had an Ivip-mapped address too, then packets sent to it from HB would also need to go via an ITR and an ETR. These are not shown in the previous two diagrams.




1.6.  Types of ETR

ETRs (with the exception of some TTRs - Translating Tunnel Routers, for mobile destination hosts) are always located in provider or AS-end-user networks. It is also possible for the destination host to perform its own ETR function, which requires it to have suitable software and a BGP-reachable care-of address.

TTRs are discussed in a section below concerning mobility.




1.7.  Types of ITR

ITRs are typically located in provider or AS-end-user networks. ITRs outside those networks - "anycast ITRs in the core" - handle packets sent from networks which have no ITR. It is also possible to perform the ITR function in the sending host, provided that host is not behind NAT. The NAT router itself, assuming it is not behind NAT, is a good place to perform the ITR function.

In this introduction, I assume each ITR handles the full range of IMABs in the Ivip system. However, to spread load over multiple ITRs in a single location, several could be configured so they each cover a fraction of the total Ivip-mapped address space.




1.7.1.  ITRD - full database (push)

An ITRD is an ITR which has a real-time updated copy of the full Ivip-mapping database (or multiple databases, one for each IMAB). Its FIB is always up-to-date, instantly tunneling all packets received whose DA is within any one of the IMABs. An ITRD requires a very extensive FIB and a large amount of CPU RAM. An ITRD could be implemented in a server - but the highest performance ITRDs would always be those with a full ASIC-based router FIB hardware system.




1.7.2.  ITRC - query, cache (pull) and notify

An ITRC does not keep a full copy of the database, but queries a nearby (ideally) Query Server which does have a full copy. Query Servers are not described in detail in this introduction. The ITRC's FIB only tunnels packets for which the ITRC has recently received mapping information.

ITRCs are informed by Query Servers if the mapping changes for any IMIP (Ivip-mapped IP address) for which they recently received mapping information. This cache invalidation message is known as Notification, and is initiated by the Query Server which has a real-time updated copy of the full database for each of the IMABs in the Ivip system. ITRCs could be implemented with a server, but the highest performance ITRCs will generally be routers with additional capabilities, using their existing FIB hardware to encapsulate packets.
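
The query, cache and Notification behaviour described above can be sketched as follows. The class and method names are hypothetical, cache expiry is omitted for brevity, and 198.51.100.7 is an invented address for a second ETR. This shows the general pattern only and is not a definition of the Ivip query protocol.

```python
class QueryServer:
    """Hypothetical Query Server holding the full mapping (IMIP -> ETR)."""

    def __init__(self, mapping):
        self.mapping = dict(mapping)
        self.subscribers = {}          # IMIP -> set of ITRCs which asked

    def query(self, itrc, imip):
        # Remember who asked, so that they can be notified of changes.
        self.subscribers.setdefault(imip, set()).add(itrc)
        return self.mapping.get(imip)

    def update(self, imip, new_etr):
        """Apply a mapping change and notify every ITRC which cached it."""
        self.mapping[imip] = new_etr
        for itrc in self.subscribers.get(imip, ()):
            itrc.notify(imip, new_etr)


class ITRC:
    """Hypothetical caching ITR: pulls mappings on demand."""

    def __init__(self, server):
        self.server = server
        self.cache = {}

    def etr_for(self, imip):
        if imip not in self.cache:     # cache miss: ask the Query Server
            self.cache[imip] = self.server.query(self, imip)
        return self.cache[imip]

    def notify(self, imip, new_etr):
        self.cache[imip] = new_etr     # Notification replaces the stale entry


qs = QueryServer({"22.22.22.22": "54.32.1.0"})
itrc = ITRC(qs)
first = itrc.etr_for("22.22.22.22")        # pulled from the Query Server
qs.update("22.22.22.22", "198.51.100.7")   # mapping changes, ITRC is notified
second = itrc.etr_for("22.22.22.22")       # served from the updated cache
```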




1.7.3.  ITFH - Ingress Tunnel Function in Host

An ITFH (Ingress Tunnel Function in Host) is an operating system implementation of an ITRC. As such, this is an additional layer of TCP/IP software in the upper part of the IP Layer 3 code, at the same level chosen for SHIM6 [I‑D.ietf‑shim6‑proto] (Bagnulo, M. and E. Nordmark, “Shim6: Level 3 Multihoming Shim Protocol for IPv6,” April 2007). This is suitable for hosts which are not behind NAT or for a NAT router itself, provided it is not behind NAT. There is absolutely no requirement for ITFH in Ivip, but in the longer term, if Ivip or something similar becomes widely deployed, the most cost-effective location to perform most or all encapsulation may be in the sending host or the NAT router.

Both ITFHs and ITRCs may not be able to gain mapping information fast enough to correctly tunnel all packets whose destinations are Ivip-mapped. Also, they may not be able to store all this information in their RAM, or implement all the mapping in their limited FIB functions. These "unmatched" packets (including those which are not novel, but which for one reason or another should be encapsulated but have not been) may be simply forwarded normally, in which case they will find their way to an ITRD. Alternatively, the ITRC or ITFH may be able to identify these packets and explicitly forward or tunnel them to a nearby ITRD.




1.8.  Initial deployment

The simplest initial deployment of Ivip involves a single database, multiple anycast ITRs in the core, and one or more ETRs in each of multiple provider networks. Better performance would be achieved with ITRs in provider and AS-end-user edge networks. The diagram below assumes a single or distributed database system which controls all ITRs. In later sections I describe how multiple databases, one for each IMAB, are distributed over multiple systems and their updates combined and distributed by a global Replicator system.

 .........                     ..........
.   N1    .                   .   N3     .
.         .                   .          .
.         .                   .  /-IH5   .
. H1----\ .                   . /        .
.        BR1------ITR1-------BR3--ETR1   .        Multihomed
. H2----/ . \      / \      / .\    \    .        end-user
.         .  \    /   \    /  . H6   \- PE1-\    ...........
 .........    \  /     \  /    ..........    \  .    N5     .
               \/       \/                    \ .           .
               /\       /\                     CE1---IH9    .
 .........    /  \     /  \    ..........     / . \         .
.         .  /    \   /    \  .          .   /  .  \-IH10   .
. H3-ITR2 . /      \ /      \ . /------ETR2-/   .           .
.        BR2-------TR1-------BR4---H7    .       ...........
. H4----/ .                   . \-IH8    .
.         .                   .          .
.   N2    .   BR4 = ITR & ETR .    N4    .
 .........                     ..........

Figure 3: Simple multihoming scenario.

The following discussion relates to Figure 3. This represents a small section of the Internet, but we can assume it is the entire Internet for these examples.

Networks N1 to N4 are provider (ISP) networks. N5 is the network of an end-user. Current multihoming practice requires the end-user to have their own PI (Provider Independent) address space, which typically requires them to be an Autonomous System. This means they could run BGP routers, but this is not actually required. All that is required is that both of N5's providers have links to N5's CE1 (Customer Edge) router and that one or the other advertises N5's PI prefix at its border routers and forwards those packets to CE1.

I assume the reader is fully familiar with this approach to multihoming. I also assume it is understood that the central challenge in devising a new routing and addressing architecture for the Internet is to achieve multihoming without N5 having an unnecessarily large number of IP addresses assigned to it, and without burdening the BGP system - both with an extra advertised prefix and with changes to this advertisement when, for instance, the link to N3 fails and N4's BR4 advertises the prefix instead. The sections following this Introduction provide more background information on these matters.

N1 is an unaltered provider network - it has no ITRs or ETRs. Therefore it is not possible (except via a TTR inside or outside N1) to have any hosts there using Ivip-mapped addresses.

N2 has an ITR but no ETR. Without an ETR (and ignoring TTRs for the rest of this discussion) N2 cannot have any hosts with Ivip-mapped addresses.

N3 has an ETR. The diagram shows one host IH5 with an Ivip mapped address. In this discussion I will assume that each host has a single Ivip-mapped address, but it is perfectly possible for a host to have multiple such addresses, prefixes of such addresses etc. as well as having ordinary BGP-reachable non-Ivip-mapped addresses. N3 has a PE1 (Provider Edge) internal router which has a link to the end-user's site.

N4's border router BR4 is both an ITR and an ETR. N4's Provider Edge router (ETR2) with a link to the end user's site is also an ETR. N4 has a host H7 with an ordinary address and IH8 with an Ivip-mapped address.

N5 has an Ivip-mapped prefix: 22.22.2.0/28 - 16 IP addresses. These are effectively PI addresses, because they have been obtained either from the Ivip system itself, or from whichever company (perhaps an ISP) is participating with the Ivip system and which has assigned the IMAB 22.22.0.0/16 to the Ivip system. N5 probably pays a small annual fee for these addresses, and may need to justify its use of them, as pressure mounts to use IPv4 space efficiently.

N5's Ivip-mapped prefix consists of 16 contiguous IP addresses which happen to fit on binary boundaries. Initially we will consider multihoming for robustness, with all these 16 addresses treated as a prefix, and all tunneled to one ETR or another. In practice, the Ivip-system and the ITRs will tunnel packets to whatever address the end-user chooses, subject to some areas of the address space being off-limits for tunneling and also subject to packets never being tunneled to any address which is Ivip-mapped. In this example, I assume that the end-user ensures that packets addressed to their addresses are always tunneled to an ETR of a provider they have a commercial relationship with.
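
The two constraints just mentioned on the tunnel endpoint an end-user may choose can be sketched with Python's standard ipaddress module. The function name is hypothetical, and the OFF_LIMITS list is only an assumed example, since the text above does not say which areas of the address space are off-limits.

```python
import ipaddress

# The example IMAB from the text.
IMABS = [ipaddress.ip_network("22.22.0.0/16")]

# Assumed examples of "off-limits" space. The real list is not given here.
OFF_LIMITS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("127.0.0.0/8"),
]


def valid_etr_address(etr: str) -> bool:
    """True if packets may be tunneled to this address."""
    addr = ipaddress.ip_address(etr)
    if any(addr in imab for imab in IMABS):
        return False      # never tunnel to an Ivip-mapped address
    if any(addr in net for net in OFF_LIMITS):
        return False      # area of address space excluded from tunneling
    return True


# N5's prefix: iterating a network yields every address in it.
prefix = ipaddress.ip_network("22.22.2.0/28")
addresses = list(prefix)
```

Here addresses holds the 16 addresses 22.22.2.0 to 22.22.2.15, each of which could in principle be mapped to a different ETR.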




1.8.1.  Paths taken by packets

Here I will give examples of packet flows in Figure 3.

Packets to and from hosts with ordinary BGP Reachable IP (BRIP) addresses follow predictable paths, for instance: H1, BR1, ITR1 (acting as an ordinary transit router), BR3, H6. Packets sent by H6 to H1 follow the same path in reverse.

Packets sent by H1 to IH5 (with an Ivip-mapped address) follow this path: H1, BR1, ITR1 (which encapsulates the packet with IP-in-IP, DA = ETR1's IP address), BR3, ETR1 (decapsulates the packet), BR3 (assuming N3's internal routing system has an appropriate route to handle packets with DA = IH5's Ivip-mapped address), IH5.

Packets sent from IH5 to H1 follow a simpler path, because the destination address is an ordinary BRIP - so the packet is handled by the usual internal and BGP systems, without involving Ivip mechanisms: IH5, BR3, ITR1 (acting as a conventional transit router), BR1, H1.

A packet from H3 to IH5 does not use the core-ITR ITR1, because its network N2 has its own ITR. The path is: H3, ITR2 (which encapsulates the packet with DA = ETR1's IP address), BR2, TR1 (or perhaps ITR1, depending on BR2's choice of best path for the prefix which ETR1's BRIP matches), BR3, ETR1 (which decapsulates it, to restore the original packet with DA = IH5's Ivip-mapped address), BR3, IH5. Packets from IH5 to H3 involve no Ivip handling and follow a path such as IH5, BR3, TR1, BR2, ITR2 (acting as an ordinary internal router, since the packet's DA is not part of an Ivip-mapped address block - IMAB), H3.

A packet from H4 to IH5 would follow a similar path to that just described, but initially it would travel to BR2, and then to ITR2. ITR2 advertises (injects?) the routes for all the IMABs into N2's internal routing system. BR2 forwards the packet to ITR2 for this reason. If there was no ITR in N2 (like the situation in N1) then BR2 would have forwarded the packet to one of its BGP peers, probably ITR1, which also advertises the same set of IMABs. I assume that the internal routing system route for packets addressed to any one of these IMABs takes precedence for BR2. Once the packet reaches ITR2, it is encapsulated and forwarded as previously described to ETR1, where it is decapsulated and forwarded to IH5.

A packet sent from H6 to IH5 would presumably be handled by N3's internal routing system, which presumably has a route specific for IH5's Ivip-mapped address. If not, then the packet will be forwarded out of N3, because N3 has no ITR, and will reach the nearest ITR, which is the core-ITR ITR1. There it will be encapsulated and forwarded to ETR1, to be decapsulated and forwarded through BR3 to IH5.

Similarly, a packet from H6 to IH9 will either be handled by N3's internal routing system - forwarded directly as a raw packet through BR3, ETR1 (as a normal internal router), PE1, CE1 and IH9 - or be forwarded out to ITR1, where it is encapsulated, forwarded to BR3 and ETR1, decapsulated and forwarded to PE1, CE1 and IH9.
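
The two outcomes described for packets sent from H6 amount to ordinary longest-prefix matching inside N3. In the sketch below the function, the route table and the next-hop labels are hypothetical, and 22.22.5.5 is an invented address for IH5, which the text does not give. The point is only that a specific internal route, where present, wins over the IMAB route which leads out to the nearest ITR.

```python
import ipaddress


def longest_prefix_match(routes, da):
    """Return the next hop of the most specific route matching da, or None."""
    addr = ipaddress.ip_address(da)
    candidates = [(net, hop) for net, hop in routes if addr in net]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c[0].prefixlen)[1]


# Hypothetical view of N3's routing for Ivip-mapped destinations.
n3_routes = [
    # Whole IMAB: learned via BGP, leads out of N3 to the core-ITR.
    (ipaddress.ip_network("22.22.0.0/16"), "BR3 -> ITR1 (encapsulated there)"),
    # Specific internal route for IH5's (invented) Ivip-mapped address.
    (ipaddress.ip_network("22.22.5.5/32"), "IH5 (raw packet, no tunnel)"),
]

to_ih5 = longest_prefix_match(n3_routes, "22.22.5.5")
to_other = longest_prefix_match(n3_routes, "22.22.9.9")
```

Removing the /32 entry from n3_routes gives the second case in the text, in which the packet leaves N3 and returns encapsulated.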

N2 has its own ITR, so its hosts do not rely on external ITRs such as ITR1 when sending packets to hosts with Ivip-mapped addresses. N2 has no ETR, so it can't have any hosts with Ivip-mapped addresses. N4 has one ITR and two ETRs, so it can have hosts with Ivip-mapped addresses. N4's hosts don't rely on external ITRs either when sending packets to Ivip-mapped addresses.

The hosts in N5 are all on Ivip-mapped addresses. When they send packets to hosts with Ivip-mapped addresses which are outside N5, these packets will need to be handled by an ITR - unless the destination host is within whichever provider network N3 or N4 CE1 sends the outgoing packets to and if that provider network's internal routing system has routes for that destination host. If IH9 sends a packet to IH8, while CE1 is sending outgoing packets along the link to N3, then the raw packet will be forwarded out of N3, since N3 has no ITR. The raw packet will be forwarded to ITR1, which will encapsulate it and tunnel the packet to BR4, which decapsulates it and forwards it to IH8.

If there was no core-ITR such as ITR1 nearby, these packets would have to travel to the nearest core-ITR. This is assuming that N4's BR4, which is an ITR, is not advertising the Ivip IMABs to its BGP peers. If there was no nearby core-ITR and N4's BR4 was advertising the Ivip IMABs, then in the previous example, the raw packet would be forwarded out of BR3 and find its way to BR4, which is acting like a core-ITR. BR4 could respond in two ways. Firstly, BR4 could look into its database (if it was an ITRD - or use a Query Server if it was an ITRC) and find that the Ivip mapping for this address (IH8's) is to tunnel it to one of BR4's own addresses. It could encapsulate it, forward it to itself and decapsulate it. Secondly, before testing the packet against the Ivip database, BR4's FIB could first apply local routing rules to the packet, in which case the packet would be forwarded directly to IH8. This would be a rare, but perfectly valid, case where a packet sent to a host with an Ivip-mapped address completes the journey, in this case via three networks, without actually being tunneled.

It would be a public-spirited act for N4 to make its BR4 ITR functions available to packets arriving from its BGP peers. There could be a number of reasons why N4 does this, including simply wanting to encourage Ivip adoption, in the hope of saving a bunch of money by not having to upgrade its DFZ routers as quickly as would be required without something like Ivip. Perhaps there could be some central collection of funds and subsidisation of core-ITRs - which BR4 would effectively become - if it advertised the Ivip IMABs to its BGP peers.

However, any ITR which does this MUST encapsulate and forward all such packets without restriction. For instance if ITR1 was an ordinary transit router and there was no other core-ITR anywhere close, then BR4, acting as a core-ITR, could be handling packets which have nothing directly to do with N4 or its customers. For instance, a packet from H2 to IH5 would follow this path: H2, BR1, TR1, BR4 (acting as an ITR, encapsulates it), TR2 (a new name for the transit router where ITR1 was), BR3, ETR1 (which decapsulates it), BR3 and IH5. It would not be acceptable for N4 to make BR4 an anycast ITR for its BGP peers and then only encapsulate and forward those packets received from these peers whose final destination was within N4.

N5 could have its own ITR, which would get the raw packet and encapsulate it - but perhaps N5 doesn't want to run an ITR. The reasons could include the capital cost, the high traffic volume of database updates for an ITRD or, for an ITRC, the slow response times and extra traffic which result from the slow link to the Query Server(s) it would depend on in whichever provider network CE1 is currently sending outgoing packets to.

This discussion has involved a lot of low-level detail but I hope it has helped the reader understand various ways packets can flow with Ivip.




1.8.2.  Multihoming when both links are working

The end-user arranges with N3 and N4 to configure their ETRs, internal routing systems etc. ready to accept encapsulated packets for its 22.22.2.0/28 prefix. This also involves N3 and N4 allowing packets with Source Addresses (SAs) from this prefix to be forwarded normally, including out of their border routers to the BGP system. In the case of N4, BR4 must accept outgoing packets with SAs within 22.22.2.0/28 to be forwarded to its BGP peers and to be accepted into its ITR function (if their DA matches one of the Ivip IMABs).

N5's CE1 router accepts incoming packets with DA matching 22.22.2.0/28 on either link, and forwards them to the local network, which is shown with two hosts IH9 and IH10 which both have Ivip-mapped addresses.

The administrators of N3 and N4 tell the end-user (the administrator of N5) the IP addresses of their two ETRs: ETR1 and ETR2. It would also have been possible for N4 to have the packets decapsulated by its BR4 which is N4's second ETR, as well as an ITR and border router. However, this is a busy router and it makes more sense to have ETR2 do the decapsulating work. In this case - ETR2 doing the decapsulation - N4 doesn't have to alter its internal routing system to forward packets for N5's prefix, because the link to CE1 is connected directly to an interface of ETR2. N3 does need to configure its internal routing system to handle CE1's prefix, unless some special tunneling is used to get the decapsulated packets from ETR1 to PE1.




1.8.3.  External multihoming monitoring system

Not shown in this diagram is some kind of commercial monitoring system, which the end-user hires to keep a constant watch on the status of their multihoming arrangement.

Monitoring of link failure etc. is not part of Ivip. There may be an argument for one or more IETF standardised protocols etc. for such a monitoring system. Here we assume there is a monitoring system which can rapidly and reliably detect any failure which affects N5's multihoming arrangement, including for instance the failure of the link to either ISP, the failure of either ISP's PE router, its ETR, or the ISP's entire connection to the Net.

The monitoring system probably needs to be located entirely outside N3, N4 or N5. In principle, it might be possible to locate it in N5, but the whole purpose of a monitoring system is to change the Ivip database once a fault occurs, so that the ITRs tunnel packets to an alternative ETR which has a working link to CE1. Any such commands need to be cryptographically secured, and a unidirectional system for such commands to whatever accepts commands to alter the mapping database might be vulnerable to a replay attack. I assume that the monitoring system needs a reliable two way link to whatever Update Authorisation Server (UAS) the end-user uses to alter the mapping of their Ivip-mapped addresses. In that case, it is best that the monitoring system not be in the end-user network, because at the time of N3 link failure, two-way communication can't occur using the current ITR tunnels, which are still to N3's ETR.

It is conceivable that the UAS could be preconfigured to communicate with a monitoring system at the end-user site via its own tunneling of packets to one or more ETRs which are not currently tunneled to by the Ivip ITRs. That would best be achieved by an IETF standardised protocol. It is also conceivable that an external monitoring system might accept prompt, cryptographically secured, messages from some router or server in the N3 network the moment the link to the CE1 went down. This too could be the subject of an IETF standardised protocol, but it would not directly involve ETRs or Ivip.

Ignoring for the moment packets sent by hosts in N3, N4 or N5, initially, all packets sent by hosts all over the world to any of N5's prefix of Ivip mapped addresses are sent via ETR1, because that is where the end-user and/or the monitoring system has configured the database to map these 22.22.2.0/28 addresses to. In this example, the end-user has given the monitoring system the private key, username and password etc. which are necessary for the monitoring system to automatically change the mapping, via the UAS which handles the end-user's Ivip-mapped addresses.




1.8.4.  Multihoming after a link fails

The monitoring system sends frequent probe packets to CE1, by tunnelling packets to both ETR1 and ETR2. The monitoring system might also monitor the current state of the mapping of the end-user's 16 Ivip-mapped addresses. It could do this either by gaining a real-time feed of database changes, or by querying a Query Server (which would use Notify to instantly inform the monitoring system of any change). At some point in time, the inability of the monitoring system to receive responses to probe packets sent via ETR1 causes it to decide this link has failed. It uses the credentials supplied by the end-user and initiates a session with the UAS by which this end-user controls the mapping of its addresses.
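
Since monitoring is not part of Ivip, the following is only one possible sketch of the failure decision described above. The LinkMonitor class, its method names and the threshold of three consecutive unanswered probes are all assumptions. The text does not specify how many missed probes should trigger a change.

```python
class LinkMonitor:
    """Hypothetical probe bookkeeping for a multihomed site."""

    def __init__(self, etrs, threshold=3):
        self.threshold = threshold             # assumed value, not from Ivip
        self.missed = {etr: 0 for etr in etrs}

    def record_probe(self, etr, answered):
        """Count consecutive unanswered probes sent via this ETR."""
        self.missed[etr] = 0 if answered else self.missed[etr] + 1

    def failed(self, etr):
        return self.missed[etr] >= self.threshold

    def choose_etr(self, current):
        """Keep the current ETR unless it has failed and another works."""
        if not self.failed(current):
            return current
        for etr in self.missed:
            if not self.failed(etr):
                return etr
        return current                         # nothing better is known


monitor = LinkMonitor(["ETR1", "ETR2"])
before = monitor.choose_etr("ETR1")
for _ in range(3):
    monitor.record_probe("ETR1", answered=False)   # link via N3 is down
    monitor.record_probe("ETR2", answered=True)    # link via N4 still works
after = monitor.choose_etr("ETR1")
```

When the chosen ETR changes, the monitoring system would then use the end-user's credentials to request the new mapping from the UAS.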

Once logged in, the monitoring system could issue separate commands to change the mapping for each of the 16 IP addresses, or a single command for all 16 together. It changes their mapping from the IP address of ETR1 to the IP address of ETR2. There is no particular reason other than N5's internal networking convenience why its addresses should be a conventional prefix on binary boundaries, as they are in this example. The Ivip system can handle individual addresses and arbitrary ranges of addresses with equal ease.
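
The point that ranges need not fall on binary boundaries can be illustrated with a mapping keyed by individual address. The remap function and the use of the strings "ETR1" and "ETR2" in place of real ETR addresses are simplifications for this sketch. The actual update commands are described in later sections.

```python
import ipaddress


def remap(mapping, first, last, new_etr):
    """Point every address in [first, last] at new_etr; return the count."""
    lo = int(ipaddress.ip_address(first))
    hi = int(ipaddress.ip_address(last))
    for n in range(lo, hi + 1):
        mapping[ipaddress.ip_address(n)] = new_etr
    return hi - lo + 1


# Initial state: all 16 addresses of N5's prefix are mapped to ETR1.
mapping = {a: "ETR1" for a in ipaddress.ip_network("22.22.2.0/28")}

# An arbitrary range which is not a prefix: 22.22.2.3 to 22.22.2.7.
partial = remap(mapping, "22.22.2.3", "22.22.2.7", "ETR2")

# A single command covering all 16 addresses.
whole = remap(mapping, "22.22.2.0", "22.22.2.15", "ETR2")
```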

The precise details of the UASes, databases, update streams, database dump files, Replicators, ITRs and QSD/QSC Query Servers are detailed in later sections of this I-D. The discussion here gives a rough idea of what is achieved by these systems.

The command from the monitoring system - or from any other system or a web-browser human interface session with anyone or anything with credentials accepted by the UAS - will cause the UAS to hand down a User Mapping Update Command (UMUC), with its signature (therefore a SUMUC), to another UAS which delegated it the responsibility for whatever ranges of Ivip-mapped addresses it is authoritative for. This command causes a change in the database for the particular IMAB which the end-user's addresses are part of. This results in the change being incorporated within a second or so into multiple identical UDP packets which are sent to 30 or so "Level 1 Replicators". (There may be other ways of achieving the same results, but this is the plan I am pursuing at present.) Three or four levels of replicators reliably propagate the changes to the global network of ITRs and QSDs (Query Servers with a full copy of the Database).
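
To give a feel for the scale such a Replicator tree could reach, the following calculation assumes that every Replicator sends the update stream to the same number of downstream devices. That fan-out figure (20 here) is purely hypothetical. Only the figure of about 30 Level 1 Replicators and the three or four levels come from the text above.

```python
def devices_fed(level1: int, fanout: int, levels: int) -> int:
    """ITRs and QSDs fed by the last level of a uniform replication tree.

    level1 Replicators sit at the first level. Each Replicator at every
    level sends the stream to `fanout` downstream devices, so the last of
    `levels` levels feeds level1 * fanout**levels ITRs and QSDs.
    """
    return level1 * fanout ** levels


three_levels = devices_fed(level1=30, fanout=20, levels=3)
four_levels = devices_fed(level1=30, fanout=20, levels=4)
```

Under these assumed figures, three levels would feed 240,000 devices and four levels 4.8 million.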

This causes a nearly instant (say a few seconds delay, but ideally a fraction of a second) change in the FIBs of the ITRDs all over the Net - so that all packets arriving with DA matching 22.22.2.0/28 will now be tunneled to ETR2, instead of to ETR1. Every ITRC which recently (within some standard caching time, perhaps 600 seconds) queried a QSD (perhaps via one or more QSCs - Query Servers with Cache) and received a response concerning any one of the 16 addresses which have just had their mapping changed will quickly (within a fraction of a second?) receive a Notification from the QSD (via the same chain of zero or more QSCs) which provided the response. The notification causes all these ITRCs to change their tunneling to ETR2 as well. ITFH functions in hosts behave and are notified in exactly the same way as just described for ITRCs.

Connectivity is restored, as long as N4, its ETR2, the link to CE1 etc. are still working. CE1 also needs to have changed its outgoing packet path to be via ETR2. Perhaps the monitoring system could inform it of the change, if CE1 had not already determined that there was a problem with the link to N3.




1.8.5.  Potential problems with internal routing systems

There are some potential problems during this failure and changeover time which I will briefly mention. I would appreciate any assistance understanding the likely behavior of provider internal routing systems in this situation. I understand that typically, the internal routing system will rapidly respond to the broken link, but would like to know more about all this.

When the link to CE1 fails (which could be due to any failure in CE1, the link, PE1, the internal routing system etc.) can the internal routing system of N3 be relied upon to quickly cancel the special route it has for forwarding packets whose DA matches 22.22.2.0/28 to PE1?

If not, then there is a potentially serious problem with hosts within N3 not being able to send packets to N5. If N3 can't guarantee that its internal routing system will quickly remove any such routes, and so allow packets addressed to 22.22.2.0/28 to find their way out of N3 like the packets addressed to the rest of the 22.22.0.0/16 IMAB (where they will find their way to an ITR such as ITR1, or any ITR within N3), then perhaps it would be better if N3 never made such a route in its internal routing system. In this scenario, all packets from hosts inside N3 to 22.22.2.0/28 would need to go via an ITR, and ETR1 would use an explicit tunnel to get decapsulated packets to PE1.




1.9.  Ivip's intended benefits

From the above examples, it can be seen that a global Ivip system, or something similar, is capable of having large amounts of address space assigned to it, where it can slice and dice it with very fine resolution (single IPv4 addresses) with very rapid response times (probably a few seconds, but perhaps less with ideal arrangements) so that the addresses can be portable between any ISP with an ETR. This portability directly supports multihoming which can be controlled at a "site level" (range of IP addresses all at once) or down to an individual host (single IP address) level. For IPv6, I envisage Ivip mapping each /64 to a particular ETR.

This portability and multihoming - and whatever TE is possible with Ivip - requires no changes to host operating systems or applications. The ITFH function is a strictly optional concept, which would be attractive for some hosts and NAT routers in the longer term but which is not required at any time, including initial introduction.

The use of "anycast ITRs in the core" means that hosts in unaltered provider and AS-end-user networks are all capable of sending and receiving packets to and from hosts with Ivip-mapped addresses.

There are cost and administrative challenges in deploying the entire Ivip system, including especially the anycast core-ITRs. However, these costs and difficulties are arguably far less challenging than what may be the two remaining alternatives: firstly to pay for and ensure the installation of ITRs in every provider and AS-end-user network as LISP is widely believed to require, or secondly, to do nothing and allow all the routers in the DFZ to become swamped by continued growth in the global BGP routing table, and so need replacement with new, more expensive, models.

Ivip or something like it seems to offer the only chance we have for efficiently using limited IPv4 address space. Ivip is unconstrained by binary boundaries, "route aggregation" etc.

Only when addresses can be assigned according to direct need, rather than in large chunks as they have been to date, can the address space be used efficiently. The goal would be, for instance, to have the majority of the 3.7 billion available IP addresses ((0 to 223 inclusive, except 10 and 128) * 256 * 256 * 256 = 3.724 billion) actively used either for an individual host or for a NAT device which supports multiple hosts on a private network. There are no reliable estimates of actual IPv4 utilisation, but in early 2007, a random ping survey indicated there were about 108 million ping-responsive hosts, with much higher densities in some advertised prefixes. [RW ping survey] (Whittle, R., “Probing the density of ping-responsive-hosts in each /8 IPv4 prefix and in different sizes of BGP advertised prefix,” March 2007.)
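
The arithmetic behind these figures can be checked with a few lines of Python. The excluded first octets are taken exactly as given in the paragraph above, and the variable names are mine.

```python
# First octets 0 to 223 inclusive, minus the two excluded in the text.
usable_first_octets = [o for o in range(0, 224) if o not in (10, 128)]

# Each first octet covers 256 * 256 * 256 addresses.
total_addresses = len(usable_first_octets) * 256 ** 3

# Share of that space which answered pings in the early 2007 survey.
ping_responsive = 108_000_000
responsive_percent = round(ping_responsive / total_addresses * 100, 1)
```

This gives 222 usable first octets and 3,724,541,952 addresses, of which the 108 million ping-responsive hosts are about 2.9%.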

Ivip can also be used to achieve some TE benefits, by steering traffic of individual Ivip-mapped addresses to one ETR or another.

Ivip's ability to support highly efficient mobile-IP is discussed in a later section. So too is the possibility that it could be used to greatly facilitate highly scalable IPv6 tunneling over the existing IPv4 system.

None of this places any further burden on the BGP system. Ivip's benefits should greatly reduce the impetus for end-users, and perhaps providers, to gain and advertise PI addresses in the global BGP system.

This I-D proposes changes which are pervasive and unprecedented. There are many questions to be explored, security problems to be resolved etc. The scope of this project goes beyond the IETF developing protocols and recommended procedures, since it requires cooperation amongst providers, end-users and RIRs, who must approve of address space being used for this novel purpose.

There is nothing technically preventing one or more Ivip systems being created today, perhaps as profitable enterprises hiring out their IP addresses to customers - as long as RIRs approve. Although it may be impossible and/or undesirable to prevent the creation of multiple independent Ivip systems which behave as described here, the rest of this I-D concentrates on the establishment of a single global Ivip system. (Multiple Ivip systems need not know about each other - it is not disastrous if an ETR tunnel end-point of one Ivip system's mapping is actually an address which is Ivip-mapped in another system.)

This introduction has provided a general overview of Ivip, for those with some familiarity with the crisis in routing and addressing. Sections below contain a more comprehensive statement of the problem space, goals and potential solutions. Following that I explore in greater detail the various aspects of the Ivip system. This is a very early stage of development and I hope many people will point out faults, suggest improvements, and be inspired to create their own proposals for solving these challenging problems. One luxury this field enjoys is that we can invoke large resources and make uncommonly bold plans - because there is a dearth of easy alternatives and the costs of doing nothing are expected to be so high.




1.10.  Long term deployment

The above discussion primarily relates to Ivip's capacity to provide important benefits to those who adopt it, while maintaining reachability from hosts in networks which have made no changes, such as installing ITRs or ETRs. The most likely deployment actions will involve the networks of Update Authorisation Servers, Replicators, ITRDs, ITRCs and Query Servers. Although all these functions should be capable of being implemented in software on ordinary servers (albeit with many gigabytes of RAM for the QSDs and ITRDs) it is likely that most network operators will require the ITRD and ITRC functions to be performed on existing or future router systems.

In the longer term, assuming Ivip or something similar is widely adopted, it can be expected that there will be widely available, auto-discovered, QSC and QSD services which can support queries from ITRCs and the ITFH functions in hosts.

An ITFH function in a host operating system is the most cost-effective way of performing the Ingress Tunneling function of Ivip. The cost will be essentially zero for the software, and there is generally plenty of CPU power and RAM available to do the work.

Assuming the Replicator network will be largely built by and shared by providers and AS-end-users and assuming this system propagates updates throughout the world in a few seconds, then it is possible that the Notification arrangement will make the cheaper ITRC routers an attractive alternative to the full database feed, large RAM, very large FIB ITRD routers (or their server-based alternatives). If an ITRC can get an up-to-date response to a query about any IP address from a local QSC - in a fraction of a second - then it may be acceptable for it to do this for every novel packet it receives. In that case, the ITRC handles all packets without delay, providing the performance of an ITRD without the need for a full database feed and without the same large FIB and RAM requirements (assuming of course that the ITRC is not attempting to handle packets addressed to millions of Ivip-mapped addresses at once).

If ITRCs can be so successful, then so can ITFHs which have sufficient RAM and CPU power. An ITFH costs nothing and always achieves optimal paths, since there is no deviation from the shortest path towards a separate ITR. An ITFH function would probably become mandatory in any web server at a hosting company. The alternative would be a large investment in ITRCs and/or ITRDs.

Similarly, ITFH functions in the NAT functions of DSL and HFC cable modems would also be an effectively zero cost alternative to the provider network deploying large numbers of ITRDs and ITRCs. The provider would still need to maintain a responsive QSD and QSC network. (I tend to think of this being an "in-host" function because these modems, although technically routers, have no hardware FIB and the ITRC function is performed entirely in software.)

The proliferation of peer-to-peer filesharing and other applications presents something of a challenge for ITRCs and ITFHs. An ITRD has no difficulty with this traffic, since its large FIB is ready to encapsulate packets with any Ivip-mapped destination address. However, a smallish ITFH function in the NAT router section of an ADSL modem will have some limitations on memory for its cached mapping information. A large number of hosts behind the NAT, each firing off packets to thousands of separate Ivip-mapped host addresses, would place a significant burden on the ITFH, including a frequent need to contact the nearest Query Server. However, hopefully most users behind a NAT firewall, including especially the hundreds of millions of DSL, HFC cable and fibre home and SOHO end users, will have no need to have their NAT on an Ivip-mapped address.

This is a highly speculative and optimistic vision for a proposal which is less than a month old. If such widespread deployment eventuated, the long-term stable outcome might resemble what the proponents of ID-LOC separation have long preferred: a new layer (ITFH) of software in the TCP/IP stacks of many hosts. However, such changes to hosts would be purely to increase efficiency and reduce costs, not to ensure reachability - which is already provided by a sufficiently widely distributed system of core-ITRs.

ETR functions can also be performed in hosts, or at least in NAT devices for hosts behind NAT. The NAT device could be an ETR for specifically identified hosts, each with a care-of address in the private network. In this case, the NAT ETR somewhat resembles a TTR, since the destination host sends its outward-going packets through the same device.

These visions of ubiquitous Ivip adoption are probably unnecessary and unrealistic. Only a subset of hosts or end-user networks will benefit from real portability and multihoming.

Future versions of this I-D will more fully explore the highly promising use of the ITR system to beam packets to TTRs for mobile IP.

Future versions of this I-D will more fully explore the potential for using the IPv4 Ivip system for tunneling IPv6 packets in a highly scalable fashion, for using Ivip with IPv6, and for using IPv6 Ivip to tunnel IPv4 packets.




2.  Definition of Terms, Concepts and Functions

In the context of the extensive Introduction, this is a comprehensive set of definitions not just of new terms, but of the main concepts and functions which make up the current Ivip proposal. I explore in greater detail in sections below how the various forms of ITR etc. are used, but have included considerable detail here. There is some repetition of material from the Introduction.

Some of the terms defined here are identical or similar to those used in LISP and in general discussion. Others are different from roughly equivalent terms used in LISP. There has been a long discussion on the RAM list about the precise meaning of the terms "Identifier" and "Locator". I am trying to avoid these terms as much as possible with Ivip, because of the evident confusion they cause. Whether an item of information such as an IP address should be considered or referred to as an "Identifier" or a "Locator" depends very much on the context in which it is used - so these terms tend to describe usage, rather than any intrinsic quality of the item.

The long Introduction above has used some of these terms, but not all. Eventually the Introduction may be rewritten to use all these terms consistently, and this section moved in front of that introductory material. For now, I want the Introduction to be accessible to readers without learning much new terminology. However, for the more detailed description of Ivip principles and mechanisms below, we need to use the new terms extensively.

This is quite a detailed definition of terms, which gives some insight into the operation of the whole Ivip system.

[To do: references for LISP, APT etc. in definitions below.]




2.1.  IMIP - Ivip-Mapped IP address

Within the global unicast address space of IPv4 or IPv6, a subset of addresses is covered by the one or more IMABs (Ivip Mapped Address Blocks, as described below). Every such address is an IMIP.

The fact that the relevant part of the Ivip database system (the particular IMAB-DB as defined below) may contain a null entry (zero) for this particular address (meaning to drop the packet, rather than tunnel it somewhere) does not alter the fact that this address is an IMIP. Similarly, if the current mapping is to an unreachable address, or to the wrong ETR, or to no ETR etc., the address is still an IMIP simply because it is within the range of one of the Ivip system's IMABs.




2.2.  NIMIP - Non-Ivip-mapped IP address

Within the global unicast address space of IPv4 or IPv6, every address which is not an IMIP (is not within one of the IMABs) is a NIMIP.




2.3.  BRIP - BGP Reachable IP address

A BRIP is an ordinary IP address which is within one of the currently advertised BGP prefixes, excluding those prefixes which are for IMABs, meaning they are used to advertise Ivip mapped addresses (IMIPs).

Whether or not there is actually a host or router at this address is not important. The criterion is that the global BGP system has an advertisement for it, and that therefore ordinary BGP routers will forward packets with this DA to whichever router advertises the relevant prefix. BRIP addresses include those which are anycast by all systems other than Ivip. For instance, I understand that some root nameservers are implemented with multiple servers using anycast. Those addresses are BRIPs too. (This discussion assumes a single global Ivip system. How to define this term when there are multiple Ivip systems, including those which are not known publicly, would be trickier.)




2.4.  UAIP - Un-Advertised IP address

Any global unicast IP address which is not part of a currently advertised BGP prefix is a UAIP. UAIPs include addresses which have not been allocated by the IANA to any RIR, and those which have not been assigned by an RIR (or other address assignment authority) to any end-user. The remainder of the UAIPs are in regions of the address space which have been assigned to a provider or AS-end-user but which they are not, at the moment, advertising. (This assumes that no router ever advertises a prefix its operators are not entitled to advertise, by virtue of that prefix not having been allocated or assigned.) [To do: link to Geoff Huston's site and my ping survey page's table.]
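
As a rough illustration of how the IMIP, NIMIP, BRIP and UAIP definitions relate to each other, the following Python sketch classifies a single address. It assumes a single global Ivip system. The function name and the example prefixes are hypothetical, and a real ITR would use its FIB rather than a linear search.

```python
import ipaddress

def classify(addr, imabs, advertised):
    """Return the set of Ivip terms which apply to one global unicast address.

    imabs      - prefixes covered by IMABs (hypothetical example data)
    advertised - all other prefixes currently advertised in BGP
    """
    ip = ipaddress.ip_address(addr)
    # Any address within an IMAB is an IMIP, whatever its current mapping.
    if any(ip in ipaddress.ip_network(p) for p in imabs):
        return {"IMIP"}
    # Advertised, but not via an IMAB prefix: an ordinary BRIP.
    if any(ip in ipaddress.ip_network(p) for p in advertised):
        return {"NIMIP", "BRIP"}
    # Not covered by any advertised prefix.
    return {"NIMIP", "UAIP"}
```

For example, with `imabs=["29.0.0.0/20"]` and `advertised=["192.0.2.0/24"]`, the address 29.0.3.7 is classified as an IMIP and 192.0.2.9 as a NIMIP which is also a BRIP.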




2.5.  DID - Destination Identifier

This is roughly synonymous with LISP's "EID" (Endpoint ID). A DID is an IP address which is an IMIP. "IMIP" is a subset of all the possible IP addresses. We can know that a packet's DA is within this IMIP set, so we know this specific address refers to a DID, of some particular IRH/IRN (Ivip-mapped Receiving Host/Node). A host or any non-ITR router doesn't recognise this. It is one of the tasks any kind of ITR must perform to recognise that the packet's address is in the IMIP set, and therefore is a DID which must be used to look up mapping - in an internal set of copies of the IMAB-DBs or via some external Query Server.




2.6.  TELOC - Tunnel Endpoint Locator

A TELOC is a BRIP address which we, or an ITR, reasonably believe is the address of an ETR - because this address is found in the database as the mapping for one or more IMIPs.

The ITR will encapsulate the packet, using the appropriate TELOC as the DA of the outer IP header.

To all routers, the packet is just an ordinary packet addressed to some BRIP address. When it arrives at its destination, the idea is that this will be an ETR which decapsulates the original packet and forwards it to the host with the original DID address. However, the ITR doesn't know for sure this will happen. It simply tunnels the packet to the TELOC.

"TELOC" is related to LISP's "RLOC" (Routing Locator), except I think that some LISP material uses "RLOC" to refer to any IP address which is not an EID. I think this is rather too loose a use of a single term, so for Ivip, "BRIP" means any advertised address which is not an IMIP. "DID" refers to the specific address of a packet, which is an IMIP, and "TELOC" refers to a specific address to which a packet is tunneled.




2.7.  IMAB - Ivip-Mapped Address Block

(This is what I previously referred to as a "master-subnet".) An IMAB is a contiguous range of address space for which a single RUAS (Root Update Authorisation System) is authorised to control the mapping, and for which it does so via a single stream of update packets (US-IMAB) and a single IMAB-DBD (IMAB DataBase Dump) file.

While the database structure, update messages etc. work fine for arbitrary starting and ending points for an IMAB, it is important that the IMAB can be advertised as a single BGP prefix. A straightforward prefix on binary boundaries can be an IMAB, such as 29.0.0.0/20. Assuming IPv4 for the rest of this definition, and assuming a /24 limit on the longest prefix which is admitted to the BGP system, all IMABs need to be on /24 boundaries. They should not involve a prefix any shorter than /8.

An IMAB may straddle simple binary boundaries, as long as it is still acceptable to be advertised within BGP. For instance 29.0.1.0/20 is also a valid IMAB, covering 29.0.1.0 to 29.0.16.255. 29.0.1.128/20 would not do, because it straddles a /24 boundary.

It is not permissible to use a range such as 29.0.1.0 to 29.0.15.255 as an IMAB, since this does not match a full /19, /20 or /21 range.

The reason for these restrictions is that when an ITRD (full "push" database ITR) downloads an IMAB-DB, decodes it and applies all real-time updates to it, it is then able to handle packets for the address range of the IMAB. At that point in time, it advertises the IMAB's prefix to its BGP peers. In order to reduce the number of advertised BGP routes and to reduce churn in the way they are advertised, it is desirable for every area of address space covered by a single database dump and by a single stream of update packets to match a single prefix which can be advertised in BGP.

Where a single large range of contiguous addresses is for some scaling reason handled with separate database dumps and update streams, it should be divided into separate IMABs. This increases the number of BGP advertised prefixes, but may be justifiable, for instance within a large (eg. /8) prefix of IMIP space, so that ITRs can load share by each handling a subset of the entire /8.




2.8.  IMAB-DB - IMAB DataBase

This refers to the body of data which specifies the Ivip mapping of the individual IPv4 addresses (or /64s for IPv6) for a single IMAB. Within a RUAS (Root Update Authorisation System) there exists one or more copies of the Master IMAB-DB for each IMAB this RUAS is authoritative for. This is updated in real-time by Update Commands directly from end-users or from branch and leaf UASes (Update Authorisation Systems).

ITRDs (full database ITRs) and QSDs (Query Servers with the full Database) maintain as best they can a real-time updated copy of each IMAB-DB for each IMAB in the Ivip system. This is a Slave copy of the IMAB-DB. The state of the slave copy is that it lags behind the master, ideally by only fractions of a second, but in practice probably by a few seconds - or more if there is congestion or lost packets in the Replicator system.

The slave copy of the IMAB-DB directly controls the FIB of the ITRD, and how the QSD responds to queries. (In a server-based ITRD, the array which contains the raw mapping data is the FIB, because the packet handling code simply indexes into the appropriate location in the array for the appropriate IMAB, and reads the 32 bit result there.) Changes to the IMAB-DB may cause the QSD to send Notifications to child QSCs, ITRCs or ITFHs which previously received query responses concerning one or more IMIPs for which the mapping has changed.

Whereas LISP and APT carry a potentially large amount of information for each IP address or prefix within their database system (eg. multiple ETR addresses, TE parameters for choosing dynamically between multiple ETRs and in the case of APT, the end-user's public key), the Ivip database structure is extremely simple. Each element of the database contains a single IP address: 32 bits for IPv4 or 128 bits for IPv6. Typically, this is the address of an ETR, but in fact it could be any address, subject to certain off-limits ranges, including the prohibition of any address which is an IMIP. In practice, the value of the IP address would always point to a BRIP address, not to an unadvertised UAIP address.

Consequently, the dump and the update messages for this database can be highly compressed and easily interpreted. (Any protocol handling these dumps or update messages should be extendable, in a backwards-compatible way, to incorporate further elements, but I can't think of a use for them at present.)

The easiest way to think of this database is an array, where location 0 refers to the first IMIP in the IMAB. It is also possible to structure the database as a series of prefix rules, so for instance 16 contiguous addresses on binary boundaries with the same mapping could be specified by a rule to this effect, rather than with 16 separate IP addresses. For IPv4, I will assume the database is simply an array. For IPv6, it would probably be best to structure the database as prefix rules, since so many more address bits may vary over the range of the IMAB. (I guess IPv6 was designed by people who wrote programs in high level languages, rather than electronic hardware engineers!)
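
A minimal Python sketch of the IPv4 array view of an IMAB-DB follows. It is only an illustration under stated assumptions: the class and method names are hypothetical, the `"I"` array typecode is assumed to be 32 bits wide (as it is on common platforms), and no attempt is made to model the update stream or the prefix-rule alternative.

```python
import ipaddress
from array import array

class ImabDb:
    """Slave copy of one IMAB-DB held as a flat array (IPv4 only)."""

    def __init__(self, first_imip, size):
        self.base = int(ipaddress.IPv4Address(first_imip))
        # One unsigned entry per IMIP; 0 is the null mapping (drop).
        self.table = array("I", [0] * size)

    def _offset(self, addr):
        offset = int(ipaddress.IPv4Address(addr)) - self.base
        if not 0 <= offset < len(self.table):
            raise ValueError("address is not within this IMAB")
        return offset

    def set_mapping(self, start, count, teloc):
        """Map `count` contiguous IMIPs, beginning at `start`, to `teloc`."""
        first = self._offset(start)
        self._offset(ipaddress.IPv4Address(start) + (count - 1))  # bounds check
        value = int(ipaddress.IPv4Address(teloc))
        for i in range(first, first + count):
            self.table[i] = value

    def lookup(self, da):
        """Return the mapped address as a string, or None to drop the packet."""
        value = self.table[self._offset(da)]
        return str(ipaddress.IPv4Address(value)) if value else None
```

Location 0 of the array corresponds to the first IMIP of the IMAB, so a lookup is a subtraction followed by a single indexed read.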




2.9.  IMAB-DBD - IMAB DataBase Dump

This is a file, typically compressed, which carries the full contents of the master IMAB-DB at some point in time. It is made available quickly at multiple servers so ITRDs and QSDs can download a copy when they boot up, or periodically afterwards.

The dump file format needs to be carefully standardised. It should have an extendable format, and be compact for all typical data patterns. Probably a series of binary elements followed by a long array would be fine, all gzipped. However, maybe a specialised compression algorithm would be more efficient, be easier to implement at the ITRD or QSD, or provide some other benefits.

The dump file needs to specify: the format of the file, such as by the RFC version it adheres to; the time and date it was created; a number identifying the RUAS which generated it; a sequence number matching such a number in an update stream packet which signifies a dump was made at that instant; the AFI (Address Family Identifier) of the address space covered; the starting address and range of the address space covered; the BGP prefix which will be advertised once the ITRD has this data loaded and fully updated (perhaps this is redundant); finally, the array of addresses in some compressed form.

There probably needs to be a CRC as well, with the ITRD or QSD able to ensure by some cryptographic means that the data is valid and really originates from the RUAS.
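
To make the list of dump file contents more concrete, here is one possible encoding sketched in Python. Nothing here is a proposed standard: the header layout, field widths, function names and the choice of CRC32 are all assumptions made for illustration, and the cryptographic check of origin is not shown.

```python
import gzip
import struct
import zlib

# Hypothetical fixed header: format version, creation time (Unix seconds),
# RUAS number, update stream sequence number, AFI, starting IPv4 address,
# number of addresses, and length of the BGP prefix to be advertised.
HEADER = struct.Struct("!HQIQHIIB")
AFI_IPV4 = 1  # IANA Address Family Number for IPv4

def write_dump(version, created, ruas, seq, start, prefix_len, mappings):
    """Build a gzipped dump from a list of 32 bit mapping values."""
    header = HEADER.pack(version, created, ruas, seq, AFI_IPV4, start,
                         len(mappings), prefix_len)
    body = struct.pack("!%dI" % len(mappings), *mappings)
    payload = header + body
    # CRC32 only detects accidental corruption; it does not prove origin.
    return gzip.compress(payload + struct.pack("!I", zlib.crc32(payload)))

def read_dump(blob):
    """Return (header fields, list of mappings), raising on a bad CRC."""
    data = gzip.decompress(blob)
    payload, (crc,) = data[:-4], struct.unpack("!I", data[-4:])
    if zlib.crc32(payload) != crc:
        raise ValueError("CRC mismatch")
    fields = HEADER.unpack_from(payload)
    count = fields[6]
    mappings = struct.unpack_from("!%dI" % count, payload, HEADER.size)
    return fields, list(mappings)
```

Long runs of identical mapping values, which I expect to be common, compress well with gzip even in this naive layout.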




2.10.  UMUC - User Mapping Update Command

A UMUC is whatever action the end-user performs on one or more different user-interfaces of whatever UAS (Update Authorisation System) they use to change the mapping of their one or more IMIPs. The system would be able to tell the user the current mapping and also confirm that a requested change to the mapping was to an acceptable address.

For now, I will assume that all UMUCs are for valid mapping addresses - so a UMUC is a successfully accepted update command from the end-user, or from some person or system with the end-user's credentials. There probably needs to be a protocol by which a request to change to an invalid address, for example a UAIP, is rejected with an error message.

The command takes the form of a starting IMIP, a range, and a single IP address to which the mapping of these one or more IMIPs will be changed. The UMUC exists only after the UAS has verified the credentials, the addresses and the new mapping address as being valid. The UMUC is then ready to be handed down either to alter the IMAB-DB itself, or to another UAS which achieves the same outcome.
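
The following Python sketch shows the kind of checks a UAS might apply before a request becomes a UMUC. It is a sketch only: the names are hypothetical, credential checking is omitted, and the test for a UAIP mapping address is left out because it would need a view of the BGP table.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class Umuc:
    start: ipaddress.IPv4Address    # first IMIP whose mapping changes
    count: int                      # number of contiguous IMIPs
    mapping: ipaddress.IPv4Address  # new mapping address for the whole range

def accept_update(start, count, mapping, user_range, imabs):
    """Return a Umuc if the request is valid, else raise ValueError."""
    if count < 1:
        raise ValueError("range must cover at least one IMIP")
    start = ipaddress.IPv4Address(start)
    mapping = ipaddress.IPv4Address(mapping)
    last = start + (count - 1)
    lo, hi = (ipaddress.IPv4Address(a) for a in user_range)
    if start < lo or last > hi:
        raise ValueError("range is outside the IMIPs this user controls")
    # An IMIP can never be used as a mapping address.
    if any(mapping in ipaddress.ip_network(p) for p in imabs):
        raise ValueError("mapping address must not be an IMIP")
    return Umuc(start, count, mapping)
```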




2.11.  SUMUC - Signed User Mapping Update Command

This is the information contained in a UMUC, signed by the UAS which accepted it from the user (or by some lower UAS in the tree), being handed down the tree to another UAS, perhaps the RUAS of the tree, so that the recipient UAS can verify the signature and regard the UMUC as authoritative.
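
The signature scheme for SUMUCs is not yet defined. Purely to illustrate the hand-down and verify step, this Python sketch uses an HMAC with a key shared between two UASes. That is a stand-in, not a true signature: a real design would more likely use public-key signatures. The function names and the JSON encoding are assumptions.

```python
import hashlib
import hmac
import json

def sign_umuc(umuc, key):
    """Wrap a UMUC (a plain dict here) with an HMAC-SHA256 tag."""
    body = json.dumps(umuc, sort_keys=True).encode()
    tag = hmac.new(key, body, hashlib.sha256).hexdigest()
    return {"umuc": umuc, "sig": tag}

def verify_sumuc(sumuc, key):
    """Return True if the tag matches the UMUC contents under this key."""
    body = json.dumps(sumuc["umuc"], sort_keys=True).encode()
    expected = hmac.new(key, body, hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, sumuc["sig"])
```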




2.12.  SH/SN - Sending Host/Node

The host computer, or a router, which sends the packet in question. Other than the local network's checking the SA (Source Address) of the packet to decide whether it is from an authorised address, there is no difference in Ivip whether the sending host or node has an IMIP or a BRIP address.




2.13.  RH/RN - Receiving Host/Node

The host computer, or a router, with an ordinary BRIP address (or prefix) which is intended to be the final recipient of the packet in question.




2.14.  IRH/IRN - Ivip-mapped Receiving Host/Node

The host computer, or a router, with an IMIP address (or prefix of IMIP addresses) which is intended to be the final recipient of the packet in question. An IRH or IRN does not need any address or prefix other than the one it has via the Ivip system. However it may have one or more addresses in the local network and it may have more than one IMIP address or prefix, each perhaps using a different ETR.




2.15.  MH/MN - Mobile Host/Node

A host computer, or a router, with an IMIP address (or prefix of IMIP addresses) which it is using via one or more two-way tunnels it establishes with one or more TTRs (Translating Tunnel Routers). A Mobile Host or Node typically [To do - what is the proxy MIPv6 mode where this is not true??] has a "care-of" address in the one or more networks it is currently connected to. It also needs special software which operates from the care-of address, running the tunnel to and from the TTR, and connecting that tunnel with the main TCP/IP stack in the host or node. Please see the "Loose ends - TTRs and Mobility" section for a fuller description of how Ivip can help with Mobile IP.




2.16.  UAS - Update Authorisation System

This is a general term for a system which is operated by an organisation and plays some role between the user making a UMUC and the actual IMAB-DB being changed.

Some UASes accept UMUCs as their inputs. Those which do not must accept SUMUCs from other UASes. A UAS may have end-user interfaces and links to branch or leaf UASes higher in the tree.

Leaf UASes are at the ends of branches of a tree composed of UASes, with a single Root UAS at the base. Each UAS SHOULD be implemented as two or more linked but redundant servers, similar to the master and one or more slave arrangement of nameservers, with all of them being authoritative in terms of their interactions with other UASes and with end-users.




2.17.  RUAS - Root Update Authorisation System

A RUAS is the authoritative UAS for one or more IMABs. Therefore, it periodically generates - say every 10 minutes - an IMAB-DBD file. It also continually produces a stream of updates. The RUAS MUST be implemented as two (three?) or more redundant servers in geographically and topologically well-separated locations.

The interactions between the RUAS and its branch and leaf UASes SHOULD be governed by some new IETF standards to ensure it is easy and robust to run these systems and have them interoperate securely.

The set of other UASes each RUAS may interact with may be different for the authorisation tree for each of the potentially multiple IMABs it handles. The branch and leaf UASes in each such tree may also be members of other trees of this RUAS (for other IMABs) and of trees rooted in other RUASes. An RUAS may be a leaf or a branch in some other RUAS's tree, but in that role the system and its servers only behave as an ordinary UAS.




2.18.  US-IMAB - Update Stream specific to one IMAB

This is a stream of data, at present assumed to be UDP packets (but perhaps implemented in another way, such as a multicast system) by which the real-time updates to the mapping data for any one IMAB are conveyed.

One or more identical US-IMAB streams are generated for each IMAB for which the RUAS is authoritative. So each RUAS could be generating these streams for multiple IMABs. As described in a section below, these streams are replicated and delivered, with high reliability, to ITRDs and QSDs all over the Net - ideally within a second or so.




2.19.  US-Complete - Update Stream for the Complete Ivip system

This is the combined set of all US-IMAB streams which each ITRD or QSD needs. To what extent it is simply the sum of all US-IMAB packets simply replicated, or to what degree the first level of replicators compacts the data to reduce the number of packets, is yet to be determined. There are also problems to be solved when this US-Complete is missing packets.

Theoretically, all Replicators get two copies of all US-IMAB streams, for redundancy. Ideally, each ITRD and QSD will get two separate US-Complete streams from two separate Replicators in widely topologically distinct locations on the Net, to enhance robustness. This is a crude doubling of bandwidth, but it might be better than something more complex with lower bandwidth.




2.20.  Replicator

A system of Replicators forms a redundant, reliable, high-speed distribution system for update streams. The Replicator system is only roughly described in this I-D. Its job is to get packets which together make up at least one US-Complete stream to every ITRD and QSD which needs it.

Replicators could be implemented in routers, but are probably best implemented in ordinary software on a Linux/BSD etc. server. They don't need hard drive storage and do no caching of data.

Replicators could be located within, or as stubs to, transit routers or border routers. Within large provider or AS-end-user networks, they would be servers or perhaps implemented in internal routers.

An ITRD or QSD could also operate as a Replicator.




2.21.  QSD - Query Server with full Database

Like ITRDs, QSDs get a full feed of updates (at least one copy of US-Complete) from one or more Replicators. Like ITRDs, when they boot, they download individual IMAB-DBD files for each IMAB in the Ivip system. I write more about this in a section below on ITRs. Once their slave copies of the complete set of IMAB-DBs are up-to-date and being continually updated, they are ready to respond to queries.

The query protocol needs to be defined, and is the same for queries from ITRCs, ITFHs and QSCs - Query Servers which Cache.

The QSD needs to keep a record of responses sent out, and cache times (which ideally might be a single fixed time, to make it easy to implement). It keeps a watch on incoming changes to the many IMAB-DBs, and if any change affects IMIPs which were covered by a response it sent out which could be cached by another device, it sends out a Notification to that device, with the new information.
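
The record-keeping described above could be sketched in Python as follows. This is an illustration under assumptions: addresses are plain integers, the single fixed cache time of 600 seconds is only an example value, and the class and method names are hypothetical.

```python
import time

class NotificationLog:
    """Records which queriers hold cached responses from this QSD."""

    def __init__(self, cache_time=600):
        self.cache_time = cache_time
        self.records = []  # (querier, first_imip, last_imip, expiry)

    def record_response(self, querier, first, last, now=None):
        """Note that `querier` may cache the mapping for first..last."""
        now = time.time() if now is None else now
        self.records.append((querier, first, last, now + self.cache_time))

    def queriers_to_notify(self, changed_first, changed_last, now=None):
        """Return queriers whose unexpired responses overlap the change."""
        now = time.time() if now is None else now
        # Forget responses whose caching time has passed.
        self.records = [r for r in self.records if r[3] > now]
        return sorted({q for q, first, last, _ in self.records
                       if first <= changed_last and changed_first <= last})
```

A QSC could use much the same structure for the responses it passes on to its own queriers.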

A QSD could be integrated with a Replicator function, and perhaps an ITRD function - or for that matter an ETR function too.

QSDs have no routing functions, so it would be overkill to implement this in a router. They need a lot of memory, so the best way to implement a QSD is probably on an ordinary server with one or more gigabit Ethernet interfaces. No hard drive is required, except perhaps for logging purposes.




2.22.  QSC - Query Server with Cache

A QSC could be implemented in a router. It does not route packets, but its memory and computational requirements are likely to be modest compared to those of a QSD. There is no need for a full feed of US-Complete data. However, there must be one or more upstream QSDs - or perhaps QSCs with upstream QSDs.

The easiest way to implement this would be software on a modest server, which would only need a hard drive for logging purposes.

In addition to handling queries from cache or by passing the query to one or more upstream QSDs or QSCs, the QSC needs to keep a record of responses sent out to its queriers - which are ITRCs, ITFHs or other QSCs. When it receives a Notification from its upstream QSD/QSC, it needs to look at those records and decide which of its queriers to send the Notification to.

Small sites could use one or more QSCs for local ITRCs and ITFHs, relying on one or more external QSDs to answer all queries. This saves bringing a full US-Complete feed into the site and it saves on the RAM needed for a full QSD.




2.23.  ITR - Ingress Tunnel Router

A general term for a router or server which accepts packets with DA = an IMIP and which encapsulates the packet, with the outer IP header having a DA of some BRIP address the end-user chose as the mapping for this IMIP. That address will presumably cause the packet to arrive at an ETR, which decapsulates it and forwards the packet to the Destination Node.

The ITR has a locally configured set of limits which prevent it from tunneling packets to certain ranges of addresses, including those defined for protecting critical infrastructure against Ivip malfunction, and including all IMAB addresses. This set of limits is downloaded regularly and securely, so that over time, these limits can be altered.
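
A minimal Python sketch of this check follows, assuming the limits arrive as a list of prefixes. The function name and the example prefixes are hypothetical, and the secure download of the limits is not shown. The null (zero) mapping is treated as "drop", as described for IMIPs above.

```python
import ipaddress

def may_tunnel_to(teloc, banned_prefixes, imab_prefixes):
    """Return True if an ITR's local limits allow tunneling to `teloc`."""
    ip = ipaddress.ip_address(teloc)
    if int(ip) == 0:
        return False  # null mapping: the packet is dropped
    # IMAB address space is always off limits as a tunnel end-point.
    off_limits = list(banned_prefixes) + list(imab_prefixes)
    return not any(ip in ipaddress.ip_network(p) for p in off_limits)
```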




2.24.  ITRD - Ingress Tunnel Router with Database

An ITR with a full copy of all IMAB-DBs, updated in real time by the US-Complete it gets from one or ideally two Replicators. The updates alter the local copy of each IMAB-DB and cause a corresponding change in the FIB of the router, which finds and tunnels every incoming packet with an IMIP DA. (Unless the address in the database for that IMIP is zero or within a banned region, in which case the packet is dropped.)

ITRDs can be implemented in a suitable router with lots of RAM, CPU power and a very high capacity FIB, in terms of the ability to tunnel packets and in terms of how many rules can be applied, down to potentially millions of /32 (IPv4) or /64 (IPv6) prefixes.

I explore in a section below how an approximately 1 gigabit ITRD could be built using commonly available server hardware. For a well developed Ivip system, this will require quite a few gigabytes of RAM - since the best way to implement the database and FIB is as a series of arrays with 32 bits (128 bits for IPv6 - urrgh!) for each mapped address (or /64 for IPv6).
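
The RAM estimate can be checked with simple arithmetic. The Python sketch below assumes the flat array layout, with 4 bytes per IPv4 address and 16 bytes per IPv6 /64, and ignores all other overheads. The function name is hypothetical.

```python
def imab_array_bytes(mapped_units, ipv6=False):
    """RAM for a flat array: 4 bytes per IPv4 address, 16 per IPv6 /64."""
    return mapped_units * (16 if ipv6 else 4)

# One /8 of IPv4 IMIP space: 2**24 addresses at 4 bytes each.
print(imab_array_bytes(2**24) / 2**20, "MiB")  # prints 64.0 MiB
```

On this basis each /8 of IPv4 IMIP space needs 64 MiB, so the arrays reach several gigabytes only once a large fraction of the IPv4 address space (2**30 addresses need 4 GiB) is Ivip-mapped.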

An ITRD might also implement the Replicator, QSD and/or ETR functions.




2.25.  ITRC - Ingress Tunnel Router with Cache

An ITR without a full copy of all the IMAB-DBs - and so not requiring a US-Complete stream from one or more Replicators.

The ITRC gains mapping information from a nearby QSD, perhaps via one or more intermediate QSCs. It may hold every packet it receives with an IMIP DA until it requests and receives mapping information. In this case, it handles every packet with a DA within an IMAB - generally as quickly as a full ITRD.

Whenever an ITRC chooses to request mapping information from the one or more QSD/QSC systems it relies upon (two separate systems might be more robust, especially if the query and response is sent via UDP), its request specifies a single IP address, the DID of this packet, which it already knows is an IMIP address.

The response it receives will concern that DID address, and potentially one or more IMIP addresses above and below this address - all of which have the same mapping. So the response will consist of a starting address, a range, and a TELOC IP address which will become the DA for the encapsulated packet for any incoming packet with a DA within this range. There may also be an explicit caching time for this response, or perhaps a default, system-wide, constant caching time such as 600 seconds.

The ITRC uses this mapping information, updating its FIB accordingly, for the caching time. At the end of that time, it may choose to make another query - which it would ordinarily only do if it is still receiving packets within that range.

At any time during the caching period, if the QSD which answered the query (or provided an answer to a QSC which actually answered this ITRC's query) recognises a change in the relevant IMAB-DB which affects the range of addresses in the response this ITRC received to its query, then the QSD will send a Notification. The Notification may pass through multiple QSCs, but will reach this ITRC and any other ITRCs which received similar responses.
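
The caching behaviour described in the last few paragraphs might look like this in Python. It is a sketch under several assumptions: addresses are plain integers, a Notification is assumed to carry the same fields as a query response, and the names and the 600 second default are illustrative only.

```python
import time

class ItrcCache:
    """Cached mapping ranges for an ITRC (addresses as integers)."""

    def __init__(self, default_ttl=600):
        self.default_ttl = default_ttl
        self.entries = []  # dicts with keys: first, last, teloc, expiry

    def store(self, first, count, teloc, ttl=None, now=None):
        """Apply a query response or a Notification for one range."""
        now = time.time() if now is None else now
        last = first + count - 1
        # Drop any cached range the new information overlaps. Parts of an
        # old range outside the new one are simply re-queried when needed.
        self.entries = [e for e in self.entries
                        if e["last"] < first or e["first"] > last]
        self.entries.append({"first": first, "last": last, "teloc": teloc,
                             "expiry": now + (ttl or self.default_ttl)})

    def lookup(self, da, now=None):
        """Return the TELOC for `da`, or None if a query is needed."""
        now = time.time() if now is None else now
        for e in self.entries:
            if e["first"] <= da <= e["last"] and e["expiry"] > now:
                return e["teloc"]
        return None
```

A linear scan is used for clarity. A router would hold these ranges as FIB rules instead.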

ITRCs do not need a massive FIB, but if they are a router, their FIB needs to be able to encapsulate packets and handle a substantial number of rules, depending on the volume and nature of the traffic. CPU involvement would be modest to substantial.

An ITRC could be implemented in a server with modest memory requirements. It requires only modest bandwidth (compared to a full US-Complete feed) for the queries, responses and Notifications with its one or more parent QSDs or QSCs.

An ITRC faces some choices regarding which packets to try to gain mapping information for. Firstly, it needs some way of identifying incoming packets as having a DA which matches one of the IMIPs or ranges of IMIPs which it already has mapping information for. Those packets should be encapsulated immediately according to that mapping information.

Secondly, the FIB needs a way of detecting which packets arrive with IMIP DAs, but which are not currently matched by one of the existing encapsulation rules. I guess the most advanced routers, such as the CRS-1, M120 and MX960, have such flexible ASIC and RAM FIBs that, with suitable firmware, they could do this. I would be surprised if lesser routers could be programmed to do it efficiently.

Also the router needs to reliably monitor which of its currently cached rules are still being used by packets. Furthermore, the router may need an efficient way of only requesting mapping information for packets whose DA appears more than once.
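One simple way of only requesting mapping information for a DA which appears more than once is a per-address counter with a short time window. The sketch below is an assumption about how this might be done in software, not a description of any router's FIB. The name QueryGate, the threshold of two packets and the one second window are illustrative choices.

```python
class QueryGate:
    """Decides when an unmatched IMIP DA has been seen often enough to query."""

    def __init__(self, threshold=2, window_seconds=1.0):
        self.threshold = threshold      # packets needed before a query is sent
        self.window = window_seconds    # counts older than this are forgotten
        self._seen = {}                 # da -> (count, first_seen_time)

    def should_query(self, da, now):
        """Record one packet for `da`; return True exactly when the count
        reaches the threshold, so that only one query is triggered."""
        count, first = self._seen.get(da, (0, now))
        if now - first > self.window:
            # The earlier sightings are too old: start counting again.
            count, first = 0, now
        count += 1
        self._seen[da] = (count, first)
        return count == self.threshold
```

Packets for which should_query returns False still need to be handled somehow, for instance by one of the fallback options discussed below.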

If the ITRC doesn't quickly (within a fraction of a second) gain the mapping information for every IMIP packet it receives, and/or if its RAM or FIB can't hold all these rules and mappings, then it has to decide what to do with packets which it cannot at present tunnel to the correct address.

One option is to drop the packets - but this is unlikely to be acceptable. Another is to let the packet be forwarded towards a peer router which also advertises the complete set of IMAB prefixes. If that peer is an ITRD, or this path leads to some ITRD in the core, then it is probably acceptable to let a small proportion of packets pass like this.

Alternatively, these untunneled packets, assuming the router can identify every one, could be forwarded or tunneled to a nearby ITRD. A bunch of ITRCs could therefore take most of the load, with the ITRD instantly tunneling a fraction of the network's total DA=IMIP packets.

An ITRC might also implement the QSC and/or an ETR function.




2.26.  ITFH - Ingress Tunneling Function in Host

A host which is not behind a NAT could have additional software in its TCP/IP stack to perform the ITRC functions described above. It needs a good link to a nearby QSD/QSC system - so this would not be suitable over a dialup modem or radio link.

Host software, CPU power and RAM are effectively free, provided there is enough of them. Performing the tunneling in hosts would greatly reduce the load on any ITRCs and perhaps ITRDs in the rest of the network. An ITFH function would be highly desirable in every web server in a hosting company.

As with ITRCs, ITFHs need to have some kind of backup ITRD to handle packets they can't tunnel. As with ITRCs, ideally the location of two or more nearby QSDs or QSCs should be auto-discovered. Likewise, the location of two or more ITRDs should be auto-discovered if there is a way of explicitly tunneling packets to them when the ITFH doesn't have the mapping or FIB capacity to tunnel them itself.

The ITFH device doesn't need to be on a BRIP address (neither does an ITRD or ITRC, but I usually assume "routers" are on BRIP addresses), but it cannot be behind a NAT.

A host performing NAT functions for some hosts on a private network is a good place to implement ITFH, as long as this host is not behind NAT itself. The most common NAT situation is a DSL or cable modem (or an optical home/SOHO adaptor). I have referred to performing Ingress Tunneling functions in such a modem as ITFH, but I guess these devices are formally routers, not hosts, so maybe this would be a purely software-based ITRC function delivered as a firmware upgrade.

ITRCs and ITFHs could easily be overwhelmed by a large number of different DA addresses inside the caching period, so they need to be able to drop old cached mapping data when their RAM or FIB can't handle it. They need to be in a network position where an upstream ITRD will always find their packets. In principle, with Ivip, this is always the case, depending on how congested the nearest "anycast core-ITR" is.
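Dropping old cached mapping data when capacity runs out could be done with a least-recently-used policy, which also gives a cheap record of which rules are still being matched by packets. The sketch below assumes this policy purely for illustration; the text does not specify an eviction strategy, and the name BoundedMappingTable and its methods are hypothetical.

```python
from collections import OrderedDict


class BoundedMappingTable:
    """Fixed-capacity table of encapsulation rules with LRU eviction."""

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._rules = OrderedDict()  # rule key -> TELOC, oldest use first

    def install(self, key, teloc):
        """Add or refresh a rule; return the keys evicted to make room."""
        self._rules[key] = teloc
        self._rules.move_to_end(key)
        evicted = []
        while len(self._rules) > self.capacity:
            old_key, _ = self._rules.popitem(last=False)
            evicted.append(old_key)
        return evicted

    def match(self, key):
        """Return the TELOC for a rule and mark the rule as recently used."""
        teloc = self._rules.get(key)
        if teloc is not None:
            self._rules.move_to_end(key)
        return teloc
```

Packets whose rule has been evicted are simply not encapsulated here, and rely on an upstream ITRD to tunnel them.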




2.27.  ETR - Egress Tunnel Router

An ETR is a router or a server which receives encapsulated packets on one of its one or more BRIP addresses, strips off the outer IP header, copying its hop-count to the internal packet, and then by some means ensures the resulting packet is delivered to the IRH/IRN (the receiving host/node with an Ivip-mapped address).

There needs to be some local network management system which can tell the IRH/IRN - or at least the end-user, by some means - where the one or more usable ETRs are. This management system may also need to ensure the local routing system can deliver decapsulated packets with DA=DID to the IRH/IRN. The ETR is not necessarily the device to be responsible for this, because ETRs can die, and there should be another available for the end-user to select by changing the Ivip-mapping of their IMIP.

Ivip ETRs don't need any fancy functions, management or protocols - they just accept any IP-in-IP packet they get on one or more of their BRIP addresses, decapsulate it, and - if the DA matches an address the ETR and the local routing system are ready to handle - forward the packet to its destination host or link to the end-user's site.
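The decapsulation step for the IPv4 case might look roughly like the sketch below. It assumes plain IPv4-in-IPv4 encapsulation (IP protocol number 4) and treats the "hop-count" as the IPv4 TTL field; both are assumptions for illustration, as are the function names. Because the inner TTL is changed, the inner header checksum has to be recomputed. The sketch does not verify the outer header checksum or check whether the inner DA is one the ETR is ready to handle.

```python
import struct

IPPROTO_IPIP = 4  # IANA protocol number for IPv4 encapsulated in IPv4


def ipv4_checksum(header: bytes) -> int:
    """Standard Internet checksum over an IPv4 header (even length)."""
    total = sum(struct.unpack("!%dH" % (len(header) // 2), header))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def decapsulate(packet: bytes) -> bytes:
    """Strip the outer IPv4 header and copy its TTL into the inner header."""
    if len(packet) < 20 or packet[0] >> 4 != 4:
        raise ValueError("not an IPv4 packet")
    outer_ihl = (packet[0] & 0x0F) * 4
    if outer_ihl < 20 or len(packet) < outer_ihl + 20:
        raise ValueError("truncated packet")
    if packet[9] != IPPROTO_IPIP:
        raise ValueError("not an IP-in-IP packet")
    outer_ttl = packet[8]

    inner = bytearray(packet[outer_ihl:])
    if inner[0] >> 4 != 4:
        raise ValueError("inner packet is not IPv4")
    inner_ihl = (inner[0] & 0x0F) * 4
    if inner_ihl < 20 or len(inner) < inner_ihl:
        raise ValueError("truncated inner header")

    inner[8] = outer_ttl            # copy the outer hop-count inward
    inner[10:12] = b"\x00\x00"      # zero the checksum field before recomputing
    struct.pack_into("!H", inner, 10, ipv4_checksum(bytes(inner[:inner_ihl])))
    return bytes(inner)
```

The returned bytes are the inner packet, ready to be handed to whatever local mechanism delivers it to the IRH/IRN.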




2.28.  ETFH - Egress Tunnel Function in Host

I haven't given much thought to this. Maybe it would be useful for a host with a local care-of address to do its own ETR functions, rather t