Some actual packets - DF=0 and DF=1
Robin Whittle rw@firstpr.com.au 2008-08-13
Contents
>> Most hosts send packets with a non-zero Identifier, including those with DF=0.
>> Solved: This was illusory - the TCP/IP stack was sending larger than MTU packets to the Ethernet driver, because the Ethernet chip (Broadcom BCM5751) can break the oversize TCP packet into smaller separate TCP packets, using "Large Send Offload" AKA TSO. See my message to the RRG list: http://psg.com/lists/rrg/2008/msg02175.html. My server regularly sends out TCP HTTP packets to web clients way longer than 1500 bytes - up to 8.8k bytes or so, but I can't figure out how it can do this, since the MSS values at the start of the TCP connection were both less than 1500 bytes.
>> Google servers have an MSS of 1430 and (assuming the client's MSS is this or higher) send only DF=0 packets, launching straight into maximum sized packets of 1470 bytes, with no attempt to do DF=1 RFC 1191 style PMTUD.
Most hosts send packets with a non-zero Identifier, including those with DF=0.

Let's have a look at how the Identifier is typically set:

    tcpdump -x -n -i ppp0 | grep 0x0000: > tcpdump.txt

results in a hex dump of the first 16 bytes of packets coming into my home server's DSL line.
45 = Version and header length - every packet is the same.
||
||** DiffServ & ECN
||||
|||| LLLL = Total Length
|||| |  |
|||| |  | IIII = Identification
|||| |  | |  |
4500 007e 1c07 0000 6e11 c13d 4533 f116
               |\\\
               | \\\__ rest of Fragment Offset
               |
               Evil bit, DF, MF and MSB of Fragment Offset

                    TT = TTL
                    ||
                    ||PP = Next Protocol
                    ||||
                    |||| CCCC = Checksum
                    |||| |  |
4500 007e 1c07 0000 6e11 c13d 4533 f116
                              |||| ||||
                              Source Address
The first hex nybble of the Flags and Fragment Offset field (Evil bit, DF, MF and MSB of Fragment Offset) is either:

    0 means Don't Fragment = 0, More Fragments = 0

or

    4 means Don't Fragment = 1, More Fragments = 0
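
The field layout described above can be checked mechanically. The following is a minimal Python sketch, not part of the original capture procedure: the function name decode_ipv4_prefix is made up for illustration, and it assumes the input is one row of eight hex groups exactly as tcpdump -x prints them.

```python
import struct

def decode_ipv4_prefix(hex_line):
    """Decode the first 16 bytes of an IPv4 header given as tcpdump -x hex groups."""
    raw = bytes.fromhex(hex_line.replace(" ", ""))
    if len(raw) < 16:
        raise ValueError("need at least 16 bytes of IPv4 header")
    # Version/IHL, DiffServ/ECN, Total Length, Identification,
    # Flags + Fragment Offset, TTL, Protocol, Header Checksum.
    ver_ihl, tos, total_len, ident, flags_frag, ttl, proto, checksum = struct.unpack(
        "!BBHHHBBH", raw[:12]
    )
    return {
        "version": ver_ihl >> 4,
        "header_len": (ver_ihl & 0x0F) * 4,       # IHL counts 32-bit words
        "tos": tos,
        "total_length": total_len,
        "identification": ident,
        "df": bool(flags_frag & 0x4000),          # Don't Fragment
        "mf": bool(flags_frag & 0x2000),          # More Fragments
        "fragment_offset": flags_frag & 0x1FFF,   # in units of 8 bytes
        "ttl": ttl,
        "protocol": proto,
        "checksum": checksum,
        "source": ".".join(str(b) for b in raw[12:16]),
    }

fields = decode_ipv4_prefix("4500 007e 1c07 0000 6e11 c13d 4533 f116")
print(fields["total_length"], fields["identification"], fields["df"], fields["mf"])
```

For the sample row this reports a 126 byte UDP packet (protocol 17) with a non-zero Identification and both DF and MF clear.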
In several thousand packets I captured, none were fragments. There were plenty of long packets with DF=0, which surprised me. More than half the packets were like this:

    4500 05ac 38c0 0000 3606 5838 d071 e501

which I think were mainly from Google. These are 1452 bytes long (hex 05ac) and have DF=0. This shows more about the packets:

    tcpdump -x -i ppp0

Here is an edited example:
    22:42:04.727180 IP cf-in-f99.google.com.http > my-host.4879:
    0x0000: 4500 05ac 4e74 0000 3906 9717 4a7d 1363
    0x0010: 9665 a27b 0050 130f f509 2f31 2b14 f681
    0x0020: 5010 46e0 d37f 0000 f070 7460 0d8e f60f
    0x0030: 0770 3c18 8fd4 f110 b85e be9e 9834 3557
    0x0040: cea3 b048 7db4 9d11 b5fd cef2 82e4 46e0
This is quite interesting in itself. It seems Google figures it is OK to send out packets of 1452 bytes, expecting the network to fragment them if there are any MTU problems.
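
As a cross-check on the edited example above, the 20 byte IP header can be verified against its own checksum, and the addresses and ports can be read straight out of the hex. This is a small Python sketch under the assumption that the header has no IP options (the first byte is 45, so the header is 20 bytes); the helper names are illustrative only.

```python
def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum, as used by the IPv4 header checksum."""
    total = sum(int.from_bytes(data[i:i + 2], "big") for i in range(0, len(data), 2))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold the carries back in
    return total

def dotted(addr: bytes) -> str:
    return ".".join(str(b) for b in addr)

# First 24 bytes of the packet shown above: the IP header plus the TCP ports.
dump = ("4500 05ac 4e74 0000 3906 9717 4a7d 1363 "
        "9665 a27b 0050 130f")
raw = bytes.fromhex(dump.replace(" ", ""))
ip_header, tcp_start = raw[:20], raw[20:]

# A header whose checksum field is correct sums to 0xFFFF.
checksum_ok = ones_complement_sum(ip_header) == 0xFFFF
source, destination = dotted(ip_header[12:16]), dotted(ip_header[16:20])
source_port = int.from_bytes(tcp_start[0:2], "big")
destination_port = int.from_bytes(tcp_start[2:4], "big")

print(checksum_ok, source, source_port, destination, destination_port)
# → True 74.125.19.99 80 150.101.162.123 4879
```

The ports agree with the tcpdump summary line (http on the Google side, 4879 on my-host), which suggests the edited hex was transcribed without damage.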
The 1452 bytes probably arises from my ifcfg-ppp0 file setting CLAMPMSS=1412. The 1452 byte packet length is the MSS 1412 plus 20 bytes TCP header plus 20 bytes IP header.
See below where a Google server sends out 1470 byte packets with DF=0. Presumably they wouldn't do this if there were, in general, any MTU problems. I guess it saves their servers having to keep a record of recently sent bytes so they can be resent if there is a PTB message.
Solved: This was illusory - the TCP/IP stack was sending larger than MTU packets to the Ethernet driver, because the Ethernet chip (Broadcom BCM5751) can break the oversize TCP packet into smaller separate TCP packets, using "Large Send Offload" AKA TSO. See my message to the RRG list: http://psg.com/lists/rrg/2008/msg02175.html

My server regularly sends out TCP HTTP packets to web clients way longer than 1500 bytes - up to 8.8k bytes or so, but I can't figure out how it can do this, since the MSS values at the start of the TCP connection were both less than 1500 bytes.
See the false-alarm.html page in this directory for the stuff I wrote about this.
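
To make the offload behaviour concrete, here is a rough Python sketch of the splitting step only: one oversize TCP payload handed down by the stack is cut into MSS-sized pieces before it reaches the wire. It is an illustration, not a model of the BCM5751 or of any driver; the function name and the 8800 and 1448 byte figures are hypothetical.

```python
def segment_for_wire(payload_len, mss):
    """Split one oversize TCP payload into (offset, length) pieces of at most mss bytes."""
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    offset = 0
    while offset < payload_len:
        length = min(mss, payload_len - offset)
        segments.append((offset, length))
        offset += length
    return segments

# Hypothetical numbers: an 8800 byte payload and a 1448 byte MSS.
pieces = segment_for_wire(8800, 1448)
print(len(pieces), pieces[-1])
```

A capture taken above the driver sees the single large packet, while the network only ever sees the smaller ones, which is why the capture looked so alarming.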
Google servers have an MSS of 1430 and (assuming the client's MSS is this or higher) send only DF=0 packets, launching straight into maximum sized packets of 1470 bytes, with no attempt to do DF=1 RFC 1191 style PMTUD.
What does Google do with a "browser" which can handle jumboframes? I found it hard to get Google to send anything to my Texas server, from an image search URL - maybe because wget doesn't look like a browser to it. However, I was able to get large images from Google's press center which was a simple HTTP image download. I used a www.google.com IP address so I could access the same machine from Australia without any funny DNS stuff pointing me to an Australian server:

    wget http://72.14.205.147/press/images/gallery/solarpanels1_lg.jpg
Here the maximum size packets were DF=0 with a size of 1470 bytes (05be):

    4500 05be 0d1c 0000 3506 894e 480e cd93
I would be surprised if there was a 1500 byte MTU 100Mbps Ethernet link between my server and Google, so my guess is that Google simply sends out DF=0 packets with a length of 1470, or less according to the TCP MSS reported by the destination host.
Here are the lines from the first packets sent by Google's server when it served the file:

    4500 0034 0be1 0000 3506 9013 480e cd93
    4500 05be 0be2 0000 3506 8a88 480e cd93
    4500 05be 0be3 0000 3506 8a87 480e cd93
    4500 05be 0be4 0000 3506 8a86 480e cd93
So Google's server goes straight to 1470 bytes, all DF=0 - no trying DF=1 to test the Path MTU as a well-behaved server should.

What TCP MSS did my server provide when it opened the connection? 1460. That would allow for a 1500 byte packet size. The Google server responded with an MSS of 1430, which was used for the session - hence the 1470 byte packets.

If there is a router between the Google server and the client with an MTU of, for instance, 1400 bytes then the client should be configured to supply an MSS of 1360 bytes, which sets a maximum packet size of 1400 bytes for TCP.
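
The arithmetic in the last few paragraphs can be written down as a short Python sketch. It assumes plain 20 byte IP and TCP headers with no options, and it models the session as using the smaller of the two advertised MSS values, which is how the exchange is described above; the function names are illustrative.

```python
IP_HEADER = 20   # bytes, no IP options assumed
TCP_HEADER = 20  # bytes, no TCP options assumed

def max_tcp_packet_size(mss_a, mss_b):
    """Largest IP packet a session carries when the smaller advertised MSS wins."""
    return min(mss_a, mss_b) + TCP_HEADER + IP_HEADER

def mss_for_mtu(mtu):
    """MSS a host should advertise so its TCP packets fit within the given MTU."""
    return mtu - IP_HEADER - TCP_HEADER

print(max_tcp_packet_size(1460, 1430))  # my server's MSS vs. Google's
print(mss_for_mtu(1400))                # the 1400 byte MTU router example
print(mss_for_mtu(1452))                # the CLAMPMSS case on my DSL line
```

With TCP options such as timestamps in use, the header is longer than 20 bytes, so the payload per packet is correspondingly smaller than these figures suggest.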
In the future, when more and more clients can be reached with jumboframes (such as packets of 8k bytes) it will become increasingly tempting for Google etc. not to limit their servers to 1470 byte packets. Then, they will presumably need to do proper PMTUD, since there will be situations where a host gives an 8k or so MSS, but there is a possibility that some router or data link en route would have a 1500 MTU.
But whose problem would this be? Google's, or the person whose host this is? It should be Google's problem, for not using RFC 1191 PMTUD. However if Google and enough other companies do this - just send packets as big as the client's MSS allows - AND if there are problems for some clients, with the bottlenecks being closer to the clients, then it will be the clients' ISPs who probably cop the flak, since Google etc. in their server farms can say it works fine for most people, apart from those at certain ISPs . . .
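
For contrast, the RFC 1191 behaviour referred to above can be sketched as a toy simulation in Python. Nothing here touches a real network: the path is just a list of hypothetical per-hop MTUs, every packet is assumed to carry DF=1, and each router that cannot forward a packet is assumed to return a PTB message reporting its next-hop MTU.

```python
def discover_path_mtu(link_mtus, initial_size):
    """Simulate RFC 1191 style PMTUD over a list of per-hop MTUs.

    Returns (path_mtu, probes), where probes lists each packet size tried
    and the MTU reported by the hop that rejected it (None if delivered).
    """
    size = initial_size
    probes = []
    while True:
        # The first hop too small for the packet drops it and sends a PTB.
        blocking = next((mtu for mtu in link_mtus if mtu < size), None)
        probes.append((size, blocking))
        if blocking is None:
            return size, probes
        size = blocking  # sender lowers its packet size to the reported MTU

# Hypothetical path: a jumboframe link, then 1500 and 1400 byte MTU hops.
path_mtu, probes = discover_path_mtu([9000, 1500, 1400], 8000)
print(path_mtu, probes)
```

The cost to the sender is that it must keep recently sent data around so it can be resent after each PTB, which is the burden the DF=0 approach avoids.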
Here are some lines of packets on my home server. The second column is the size, the fourth column the flags.
45 = Version and header length - every packet is the same.
||
||** DiffServ & ECN
||||
|||| LLLL = Total Length
|||| |  |
|||| |  | IIII = Identification
|||| |  | |  |
|||| |  | |  | 4 => DF=1
|||| |  | |  | |
4500 007e 1c07 0000 6e11 c13d 4533 f116
45c0 009a f4dd 0000 4001 159b 9665 a27b
4500 0083 870c 0000 6c11 af75 c9d9 152e
45c0 009f 4607 0000 4001 1baf 9665 a27b
4500 003c 6abf 4000 3811 d450 482e 8292
4500 00a0 0000 4000 4011 36ac 9665 a27b
4500 004a 0000 4000 4011 3b92 9665 a27b
4500 0084 0000 4000 3611 4558 86b2 3f7e
4500 003c 60cd 4000 3f06 a15e 9665 a27b
4500 0028 60cd 0000 3606 ea72 cb3f 3570
4500 0028 60ce 4000 3f06 a171 9665 a27b
4500 0028 3ad3 0000 3706 0f6d cb3f 3570
4500 0240 60cf 4000 3f06 9f58 9665 a27b
4500 0096 60d0 4000 3f06 a101 9665 a27b
4500 0028 db8a 0000 3606 6fb5 cb3f 3570
4500 0514 db8b 0000 3606 6ac8 cb3f 3570
4500 0028 60d1 4000 3f06 a16e 9665 a27b
4500 0514 db8c 0000 3606 6ac7 cb3f 3570
4500 0028 60d2 4000 3f06 a16d 9665 a27b
4500 0514 db8d 0000 3606 6ac6 cb3f 3570
4500 0028 60d3 4000 3f06 a16c 9665 a27b
4500 0514 db8e 0000 3606 6ac5 cb3f 3570
4500 0028 60d4 4000 3f06 a16b 9665 a27b
4500 0322 db8f 0000 3606 6cb6 cb3f 3570
4500 0028 60d5 4000 3f06 a16a 9665 a27b
4500 0514 db90 0000 3606 6ac3 cb3f 3570
4500 0028 60d6 4000 3f06 a169 9665 a27b
4500 0514 db91 0000 3606 6ac2 cb3f 3570
4500 0028 60d7 4000 3f06 a168 9665 a27b
4500 0514 db92 0000 3606 6ac1 cb3f 3570
4500 0028 60d8 4000 3f06 a167 9665 a27b
4500 033a db93 0000 3606 6c9a cb3f 3570
4500 0028 60d9 4000 3f06 a166 9665 a27b
4500 0514 db94 0000 3606 6abf cb3f 3570
4500 0028 60da 4000 3f06 a165 9665 a27b
4500 0514 db95 0000 3606 6abe cb3f 3570
4500 0028 60db 4000 3f06 a164 9665 a27b
4500 0514 db96 0000 3606 6abd cb3f 3570
4500 0028 60dc 4000 3f06 a163 9665 a27b
4500 0514 db97 0000 3606 6abc cb3f 3570
4500 0028 60dd 4000 3f06 a162 9665 a27b
4500 0514 db98 0000 3606 6abb cb3f 3570
4500 0028 60de 4000 3f06 a161 9665 a27b
4500 0514 db99 0000 3606 6aba cb3f 3570
4500 0028 60df 4000 3f06 a160 9665 a27b
4500 0514 db9a 0000 3606 6ab9 cb3f 3570
4500 0028 60e0 4000 3f06 a15f 9665 a27b
4500 0514 db9b 0000 3606 6ab8 cb3f 3570
4500 0028 60e1 4000 3f06 a15e 9665 a27b
4500 0514 db9c 0000 3606 6ab7 cb3f 3570
4500 0028 60e2 4000 3f06 a15d 9665 a27b
4500 0514 db9d 0000 3606 6ab6 cb3f 3570
4500 0028 60e3 4000 3f06 a15c 9665 a27b
4500 0514 db9e 0000 3606 6ab5 cb3f 3570
4500 0028 60e4 4000 3f06 a15b 9665 a27b
4500 0410 db9f 0000 3606 6bb8 cb3f 3570
4500 0028 60e5 4000 3f06 a15a 9665 a27b
4500 0514 dba0 0000 3606 6ab3 cb3f 3570
4500 0028 60e6 4000 3f06 a159 9665 a27b
4500 0514 dba1 0000 3606 6ab2 cb3f 3570
4500 0028 60e7 4000 3f06 a158 9665 a27b
4500 0514 dba2 0000 3606 6ab1 cb3f 3570
4500 0514 dba3 0000 3606 6ab0 cb3f 3570
4500 0028 60e8 4000 3f06 a157 9665 a27b
4500 0514 dba4 0000 3606 6aaf cb3f 3570
4500 0514 dba5 0000 3606 6aae cb3f 3570
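
A listing like this is easier to read once the size, DF flag and source address are pulled out of each row. The following Python sketch does that for rows in the format shown above; the function name summarize is made up for illustration, and it assumes each row holds the first 16 bytes of an IPv4 header, optionally still prefixed with the 0x0000: label that grep leaves in place.

```python
from collections import Counter

def summarize(lines):
    """Return (rows, df_counts, fragments) for first-row tcpdump -x hex lines.

    rows holds (source_address, total_length, df) for each packet.
    """
    rows = []
    df_counts = Counter()
    fragments = 0
    for line in lines:
        groups = line.split()
        # Drop an offset label such as "0x0000:" if it is still present.
        if groups and groups[0].endswith(":"):
            groups = groups[1:]
        if len(groups) < 8 or not groups[0].startswith("4"):
            continue  # not the start of an IPv4 header
        length = int(groups[1], 16)       # second column: Total Length
        flags_frag = int(groups[3], 16)   # fourth column: Flags + Fragment Offset
        df = 1 if flags_frag & 0x4000 else 0
        if flags_frag & 0x3FFF:           # MF set or non-zero Fragment Offset
            fragments += 1
        source = ".".join(str(b) for b in bytes.fromhex(groups[6] + groups[7]))
        rows.append((source, length, df))
        df_counts[df] += 1
    return rows, df_counts, fragments

sample = [
    "4500 0514 db8b 0000 3606 6ac8 cb3f 3570",
    "4500 0028 60d1 4000 3f06 a16e 9665 a27b",
]
rows, df_counts, fragments = summarize(sample)
for source, length, df in rows:
    print(f"{source:>15}  {length:5d} bytes  DF={df}")
```

Fed the two sample rows, it shows a 1300 byte DF=0 packet from the remote host and a 40 byte DF=1 packet from my own address, which is the pattern that repeats through most of the listing.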