Uzbekistan, Bukhara, Bukhara Institute of High Technologies, 2013
Network debugging
traceroute
traceroute sends UDP packets to the destination, but it modifies the time-to-live field in the IP header (see page 280) so that, initially at any rate, they don't get there. As we saw there, the time-to-live field specifies the number of hops that a packet can go before it is discarded. When a router discards a packet for this reason, it should send back an ICMP time exceeded message. traceroute uses this feature and sends out packets with time-to-live set first to one, then to two, and so on. For each hop, it prints the IP address of the system that sent the ICMP message and the round-trip time, thus giving something like a two-dimensional ping. When a probe finally reaches the destination, nothing there is listening on the UDP port that traceroute uses, so the destination replies with an ICMP port unreachable message instead, and traceroute stops. Here's an example to hub.FreeBSD.org:
$ traceroute hub.freebsd.org
traceroute to hub.freebsd.org (204.216.27.18), 30 hops max, 40 byte packets
 1  gw (223.147.37.5)  1.138 ms  0.811 ms  0.800 ms
 2  free-gw.example.net (139.130.136.129)  131.913 ms  122.231 ms  134.694 ms
 3  Ethernet1-0.way1.Adelaide.example.net (139.130.237.65)  118.229 ms  120.040 ms  118.723 ms
 4  Fddi0-0.way-core1.Adelaide.example.net (139.130.237.226)  171.590 ms  117.911 ms  123.513 ms
 5  Serial5-0.lon-core1.Melbourne.example.net (139.130.239.21)  129.267 ms  226.927 ms  125.547 ms
 6  Fddi0-0.lon5.Melbourne.example.net (139.130.239.231)  144.372 ms  133.998 ms  136.699 ms
 7  borderx2-hssi3-0.Bloomington.mci.net (204.70.208.121)  962.258 ms  482.393 ms  754.989 ms
 8  core2-fddi-1.Bloomington.mci.net (204.70.208.65)  821.636 ms  *  701.920 ms
 9  bordercore3-loopback.SanFrancisco.mci.net (166.48.16.1)  424.254 ms  884.033 ms  645.302 ms
10  pb-nap.crl.net (198.32.128.20)  435.907 ms  438.933 ms  451.173 ms
11  E0-CRL-SFO-02-E0X0.US.CEL.NET (165.113.55.2)  440.425 ms  430.049 ms  447.340 ms
12  T1-CDROM-00-EX.US.CRL.NET (165.113.118.2)  553.624 ms  460.116 ms  *
13  hub.FreeBSD.ORG (204.216.27.18)  642.032 ms  463.661 ms  432.976 ms
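The probing loop that produces output like this can be sketched without touching the network. This is a minimal simulation, not how traceroute itself is implemented: the path is a hypothetical three-hop list built from addresses in the output above, and the hypothetical function send_probe stands in for sending a real UDP packet and waiting for the ICMP reply. Routers answer with ICMP time exceeded; the destination answers with ICMP port unreachable, which ends the loop.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Reply:
    hop: int      # position of the system that answered
    sender: str   # address of the system that answered
    kind: str     # "time-exceeded" or "port-unreachable"


def send_probe(path: List[str], ttl: int) -> Reply:
    """Simulate one UDP probe; path[-1] is the destination, the rest are routers."""
    for index, address in enumerate(path):
        if index == len(path) - 1:
            # The probe arrived: nothing listens on traceroute's UDP port,
            # so the destination answers with ICMP port unreachable.
            return Reply(index + 1, address, "port-unreachable")
        ttl -= 1  # each router decrements the time-to-live
        if ttl == 0:
            # This router discards the probe and reports ICMP time exceeded.
            return Reply(index + 1, address, "time-exceeded")
    raise ValueError("path must contain at least the destination")


def traceroute(path: List[str], max_hops: int = 30) -> List[Reply]:
    """Probe with TTL 1, 2, 3, ... until the destination itself replies."""
    replies = []
    for ttl in range(1, max_hops + 1):
        reply = send_probe(path, ttl)
        replies.append(reply)
        if reply.kind == "port-unreachable":
            break
    return replies


# Hypothetical three-hop path using addresses from the example output.
path = ["223.147.37.5", "139.130.136.129", "204.216.27.18"]
for reply in traceroute(path):
    print(reply.hop, reply.sender, reply.kind)
```

This prints one line per hop: the first two addresses report time-exceeded and the destination reports port-unreachable. The max_hops limit plays the same role as the "30 hops max" in the real output: if the destination never answers, the loop still ends.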
By default, traceroute tries each hop three times and prints out the times as they happen, so if the response time is more than about 300 ms, you'll notice it as it happens. If there is no reply within the timeout period (default 5 seconds), traceroute prints an asterisk (*) instead of a time. You'll also occasionally notice a significant delay at the beginning of a line, although the response times themselves seem reasonable. In this case, the delay is probably caused by a DNS reverse lookup for the name of the system. If this becomes a problem (maybe because the global DNS servers aren't reachable), you can turn off DNS reverse lookups with the -n flag.
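Each hop line has a fixed shape, so a few lines of code can turn it back into numbers, with None standing in for an asterisk. This is a sketch that assumes the default output format shown above (host name followed by the address in parentheses) and a single responding system per hop; it does not handle -n output or hops where different probes are answered by different routers. The name parse_hop is hypothetical.

```python
import re
from typing import List, Optional, Tuple

HopInfo = Tuple[int, Optional[str], Optional[str], List[Optional[float]]]

# hop number, host name, "(address)", then the probe results
_NAMED_HOP = re.compile(r"^\s*(\d+)\s+(\S+?)\s*\(([\d.]+)\)\s*(.*)$")
# a hop where every probe timed out, such as "2 * * *"
_SILENT_HOP = re.compile(r"^\s*(\d+)((?:\s+\*)+)\s*$")


def parse_hop(line: str) -> HopInfo:
    """Return (hop, name, address, times); a timed-out probe becomes None."""
    silent = _SILENT_HOP.match(line)
    if silent:
        count = silent.group(2).count("*")
        return int(silent.group(1)), None, None, [None] * count
    named = _NAMED_HOP.match(line)
    if not named:
        raise ValueError(f"unrecognised hop line: {line!r}")
    hop, name, address, rest = named.groups()
    times: List[Optional[float]] = []
    for token in rest.split():
        if token == "*":
            times.append(None)          # no reply before the timeout
        elif token != "ms":
            times.append(float(token))  # round-trip time in milliseconds
    return int(hop), name, address, times


print(parse_hop("8 core2-fddi-1.Bloomington.mci.net (204.70.208.65) 821.636 ms * 701.920 ms"))
print(parse_hop(" 2  * * *"))
```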
If you look more carefully at the times in the example above, you'll see three groups of times:
- The times to gw are around 1 ms. This is typical of an Ethernet network.
- The times for hops 2 to 6 are on the order of 100 to 150 ms. This indicates that the link between gw.example.org and free-gw.example.net is running PPP over a telephone line. The delay between free-gw.example.net and Fddi0-0.lon5.Melbourne.example.net is negligible compared to the delay across the PPP link, so you don't see much difference.
- The times from borderx2-hssi3-0.Bloomington.mci.net to hub.FreeBSD.ORG are significantly higher, between 400 and 1000 ms, and a couple of probes went unanswered (the asterisks at hops 8 and 12). This indicates that the line between Fddi0-0.lon5.Melbourne.example.net and borderx2-hssi3-0.Bloomington.mci.net is overloaded. The length of the link (about 13,000 km) also plays a role: a round trip covers 26,000 km, which takes at least 85 ms even at the speed of light. If this were a satellite connection, things would be much slower: the distance from a ground station up to a geostationary satellite and back down to the ground is about 72,000 km, which takes about 240 ms to propagate, so a round trip over the satellite link takes about 480 ms.
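The propagation delays quoted for the long link and the satellite hop are just distance divided by the speed of the signal. This sketch reproduces the arithmetic. The second speed, about two thirds of the speed of light, is an assumed rough figure for light in optical fibre; it shows why a real cable is slower than the vacuum figure suggests.

```python
SPEED_OF_LIGHT_KM_S = 299_792.458  # in a vacuum
FIBRE_SPEED_KM_S = 200_000.0       # assumed rough figure for optical fibre


def propagation_ms(distance_km: float, speed_km_s: float = SPEED_OF_LIGHT_KM_S) -> float:
    """Time in milliseconds for a signal to cover distance_km."""
    return distance_km / speed_km_s * 1000


link_km = 13_000       # one-way length of the long link
satellite_km = 72_000  # ground -> geostationary satellite -> ground

print(f"{propagation_ms(2 * link_km):.1f} ms")                    # 86.7 ms, round trip in a vacuum
print(f"{propagation_ms(2 * link_km, FIBRE_SPEED_KM_S):.1f} ms")  # 130.0 ms, round trip in fibre
print(f"{propagation_ms(satellite_km):.1f} ms")                   # 240.2 ms, one way via satellite
print(f"{propagation_ms(2 * satellite_km):.1f} ms")               # 480.3 ms, round trip via satellite
```

These are lower bounds: queueing in overloaded routers, like the ones at hops 7 to 12, comes on top of them.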
Back to our problem. If we see something like the output in the previous example, we know that there's no reason to call up the people at example.net: it's not their problem. The delays might just be caused by congestion on the global Internet. On the other hand, what about this?
$ traceroute hub.freebsd.org
traceroute to hub.freebsd.org (204.216.27.18), 30 hops max, 40 byte packets
 1  gw (223.147.37.5)  1.138 ms  0.811 ms  0.800 ms
 2  * * *
 3  * * *
^C
You've fixed your routing problems, but you still can't get data off the system. There are a number of possibilities here:
- The link to the next system may be down. The solution's obvious: bring it up and try again.
- gw may not be configured as a gateway. You can check this with:
$ sysctl net.inet.ip.forwarding
net.inet.ip.forwarding: 1
For a router, this value should be 1. If it's 0, change it with:
# sysctl -w net.inet.ip.forwarding=1
net.inet.ip.forwarding: 0 -> 1
See page 313 for further details, including how to ensure that this sysctl is set correctly when the system starts.
- You may be trying to use a private, non-routable IP address, such as those in the range 192.168.x.x. You can't do that on the Internet: such addresses aren't routed globally. If you don't have enough globally visible IP addresses, you'll need to run some kind of aliasing package, such as NAT. See "Firewalls, IP aliasing and proxies", page 393, for further details.
- Maybe there is something wrong with routing to your network. This is a difficult one to check, but in the case of the reference network, one possibility is to repeat the traceroute from the machine gw. gw's external address on the tun0 interface is 139.130.136.133, which is on the ISP's network, so packets sent from gw are not affected by a routing problem for network 223.147.37.x. If the traceroute from gw succeeds while the one from inside your network fails, this is probably the problem; contact your ISP to solve it.
- Maybe there is something wrong with the other end; if everything else fails, you may have to call the admins at example.net even if you have no hard evidence that it's their problem.
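The private address ranges, of which 192.168.x.x is one, are defined in RFC 1918. Python's standard ipaddress module can check an address against them. This is a sketch; the sample addresses are the reference network's gw and its external tun0 address, plus two made-up private addresses for illustration, and the function name is_rfc1918 is hypothetical.

```python
import ipaddress

# The private blocks reserved by RFC 1918; they are not routed on the Internet.
RFC1918_NETWORKS = [
    ipaddress.ip_network("10.0.0.0/8"),
    ipaddress.ip_network("172.16.0.0/12"),
    ipaddress.ip_network("192.168.0.0/16"),
]


def is_rfc1918(address: str) -> bool:
    """Return True if address lies in one of the RFC 1918 private ranges."""
    ip = ipaddress.ip_address(address)
    return any(ip in network for network in RFC1918_NETWORKS)


for address in ["192.168.1.10", "172.20.0.1", "223.147.37.5", "139.130.136.133"]:
    print(address, "private (RFC 1918)" if is_rfc1918(address) else "not RFC 1918")
```

The module also offers the is_private attribute, but it covers more than RFC 1918 (loopback and link-local addresses, for example), so the explicit list keeps the check limited to the ranges this section is about.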
But maybe the data gets one hop further:
$ traceroute hub.freebsd.org
traceroute to hub.freebsd.org (204.216.27.18), 30 hops max, 40 byte packets
 1  gw (223.147.37.5)  1.138 ms  0.811 ms  0.800 ms
 2  free-gw.example.net (139.130.136.129)  131.913 ms  122.231 ms  134.694 ms
 3  * * *
 4  * * *
In this case, there is almost certainly a problem at example.net. This would be the correct time to use the telephone.
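The reasoning in this section comes down to finding the last hop that answered. This is a minimal sketch, assuming each hop is given as its list of round-trip times with None for an asterisk. The function names, the suggestion strings (a summary of the advice in this section) and the local_hops parameter (the number of hops inside your own network, 1 for the reference network) are all hypothetical. Treat the result as a hint only: some routers are configured not to send ICMP time exceeded messages at all, so a silent hop isn't necessarily a broken one.

```python
from typing import Optional, Sequence

Hop = Sequence[Optional[float]]  # round-trip times in ms; None means "*"


def last_responding_hop(hops: Sequence[Hop]) -> int:
    """Return the number of the last hop that answered any probe, or 0 if none did."""
    last = 0
    for number, times in enumerate(hops, start=1):
        if any(t is not None for t in times):
            last = number
    return last


def suggest(hops: Sequence[Hop], local_hops: int = 1) -> str:
    """Summarise where the trace stops, following the checks described above."""
    last = last_responding_hop(hops)
    if last == 0:
        return "nothing answered: check the local link first"
    if last == len(hops):
        return "the last hop answered: the trace did not stop early"
    if last <= local_hops:
        return "stops at your gateway: check the link, forwarding, addressing and routing"
    return f"stops after hop {last}: talk to whoever runs that system or the link beyond it"


silent = [None, None, None]
stuck_at_gw = [[1.138, 0.811, 0.800], silent, silent]
one_further = [[1.138, 0.811, 0.800], [131.913, 122.231, 134.694], silent, silent]
print(suggest(stuck_at_gw))
print(suggest(one_further))
```

The two sample lists mirror the two failing traces in this section: the first stops after gw, the second after free-gw.example.net.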