Network debugging
The chances are quite good that you'll have some problems somewhere when you set up your network. FreeBSD gives you a large number of tools with which to find and solve the problem.
In this chapter, we'll consider a methodology of debugging network problems. In the process, we'll look at the programs that help debugging. It will help to have your finger in "Networks and the Internet" while reading this section.
How to approach network problems
Recall from "Networks and the Internet" that network software and hardware operate on at least four layers. If one layer doesn't work, the ones above won't either. When solving problems, it obviously makes sense to start at the bottom and work up.
Most people understand this up to a point. Nobody expects a PPP connection to the Internet to work if the modem can't dial the ISP. On the other hand, a large number of messages to the FreeBSD-questions mailing list show that many people seem to think that once this connection has been established, everything else will work automatically. If it doesn't, they're puzzled.
Unfortunately, the Net isn't that simple. In fact, it's too complicated to give a hard-and-fast methodology at all. Much network debugging can look more like magic than anything rational. Nevertheless, a surprising number of network problems can be solved by using the steps below. Even if they don't solve your problem, read through them. They might give you some ideas about where to look.
Link layer problems
To test your link layer, start with ping. ping is a relatively simple program that sends an ICMP echo request packet to a specific IP address and checks for a reply. ICMP, the Internet Control Message Protocol, is used for error reporting and testing. See TCP/IP Illustrated, by W. Richard Stevens, for more information.
A typical ping output might look like:
$ ping bumble
PING bumble.example.org (223.147.37.156): 56 data bytes
64 bytes from 223.147.37.156: icmp_seq=0 ttl=255 time=1.137 ms
64 bytes from 223.147.37.156: icmp_seq=1 ttl=255 time=0.640 ms
64 bytes from 223.147.37.156: icmp_seq=2 ttl=255 time=0.671 ms
64 bytes from 223.147.37.156: icmp_seq=3 ttl=255 time=0.612 ms
^C
--- bumble.example.org ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max/stddev = 0.612/0.765/1.137/0.216 ms
In this case, we are sending the messages to the system bumble.example.org. By default, ping sends 56 bytes of data in each message. With the 8-byte ICMP header, this makes packets of 64 bytes, the size shown in each reply line; the 20-byte IP header is not included in this count. By default, ping continues until you stop it: notice the ^C, which shows that this invocation was stopped by pressing Ctrl-C.
The information that ping gives you isn't much, but it's useful:
- It tells you how long it takes for each packet to get to its destination and back.
- It tells you how many packets didn't make it.
- It also prints a summary of packet statistics.
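If you want to check these figures yourself, or process ping output in a script, here is a minimal sketch in Python. The names parse_replies and rtt_summary are hypothetical helpers, not part of any FreeBSD tool, and the parsing assumes the reply format shown in the example above. The sketch also assumes that ping reports the population standard deviation (dividing by n); that assumption reproduces the 0.216 ms in the sample output.

```python
import math
import re

# Matches reply lines such as
# "64 bytes from 223.147.37.156: icmp_seq=0 ttl=255 time=1.137 ms"
REPLY_RE = re.compile(
    r"(\d+) bytes from ([\d.]+): icmp_seq=(\d+) ttl=(\d+) time=([\d.]+) ms"
)


def parse_replies(output):
    """Return (icmp_seq, source address, round-trip ms) for each reply line."""
    replies = []
    for line in output.splitlines():
        m = REPLY_RE.search(line)
        if m:
            replies.append((int(m.group(3)), m.group(2), float(m.group(5))))
    return replies


def rtt_summary(rtts):
    """Return (min, avg, max, stddev) in the order ping prints them.

    Assumption: stddev is the population standard deviation (divide by n).
    """
    n = len(rtts)
    avg = sum(rtts) / n
    variance = sum((t - avg) ** 2 for t in rtts) / n
    return min(rtts), avg, max(rtts), math.sqrt(variance)


SAMPLE = """\
PING bumble.example.org (223.147.37.156): 56 data bytes
64 bytes from 223.147.37.156: icmp_seq=0 ttl=255 time=1.137 ms
64 bytes from 223.147.37.156: icmp_seq=1 ttl=255 time=0.640 ms
64 bytes from 223.147.37.156: icmp_seq=2 ttl=255 time=0.671 ms
64 bytes from 223.147.37.156: icmp_seq=3 ttl=255 time=0.612 ms
"""

if __name__ == "__main__":
    rtts = [rtt for _, _, rtt in parse_replies(SAMPLE)]
    # prints: min/avg/max/stddev = 0.612/0.765/1.137/0.216 ms
    print("min/avg/max/stddev = %.3f/%.3f/%.3f/%.3f ms" % rtt_summary(rtts))
```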
But what if this doesn't work? You enter your ping command, and all you get is:
$ ping wait
PING wait.example.org (223.147.37.4): 56 data bytes
^C
--- wait.example.org ping statistics ---
5 packets transmitted, 0 packets received, 100% packet loss
Obviously, something's wrong here. We'll look at it in more detail below. This is very different, however, from this situation:
$ ping presto
^C
In the second case, even after some time, nothing happened at all. ping didn't print the PING message, and when we hit Ctrl-C there was no further output. This is indicative of a name resolution problem: ping can't print the first line until it has found the IP address of the system, in other words, until it has performed a DNS lookup. If we wait long enough, it will time out, and we get the message ping: cannot resolve presto: Unknown host. If this happens, use the IP address, not the name. DNS is an application, so we won't try to debug it until we've debugged the link and network layers.
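The three cases above can be told apart mechanically. The following Python sketch is a rough heuristic, not a diagnostic tool: classify_ping and resolves are hypothetical names, and the classification assumes FreeBSD-style ping output like the transcripts shown here. resolves uses the standard socket.gethostbyname call, which, like ping, may block for a long time before giving up on a name.

```python
import socket


def classify_ping(output):
    """Roughly classify a ping transcript into one of the cases above."""
    lines = output.splitlines()
    saw_header = any(line.startswith("PING ") for line in lines)
    replies = sum(1 for line in lines if " bytes from " in line)
    if "cannot resolve" in output or not saw_header:
        # ping never printed its PING line: it never found an address.
        return "name resolution"
    if replies == 0:
        # The address is known, but nothing came back: start at the link layer.
        return "no replies"
    return "reachable"


def resolves(name):
    """Return True if name can be turned into an IPv4 address."""
    try:
        socket.gethostbyname(name)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False


if __name__ == "__main__":
    print(classify_ping("$ ping presto\n^C\n"))  # prints: name resolution
```

If classify_ping reports a name resolution problem, retry ping with the IP address, as described above, before looking at DNS.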
If ping gets no replies even when you use the IP address, there are two possibilities:
- If both systems are on the same network, it's a link layer problem. We'll look at that first.
- If the systems are on two different networks, it might be a network layer problem. That's more complicated: we don't know which network to look at. It could be either of the networks on which the systems are located, or it could also be a problem with one of the networks on the way. How do you find out where your packets get lost? First you check the link layer. If it checks out OK, and the problem still exists, continue with the network layer on page 406.
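To decide which of these cases applies, compare the destination address with the address and netmask of your own interface. Here's a minimal Python sketch using the standard ipaddress module; same_network is a hypothetical helper that accepts the netmask either in dotted form or in the hexadecimal form that ifconfig prints. The example assumes a local interface configured as 223.147.37.81 with netmask 0xffffff00.

```python
import ipaddress


def same_network(local_addr, netmask, dest):
    """True if dest is on the same IPv4 network as local_addr/netmask.

    netmask may be dotted ("255.255.255.0") or hex as ifconfig prints it
    ("0xffffff00").
    """
    if netmask.lower().startswith("0x"):
        netmask = str(ipaddress.IPv4Address(int(netmask, 16)))
    network = ipaddress.IPv4Network(f"{local_addr}/{netmask}", strict=False)
    return ipaddress.IPv4Address(dest) in network


if __name__ == "__main__":
    # Same network: a failure here points at the link layer.
    print(same_network("223.147.37.81", "0xffffff00", "223.147.37.4"))  # True
    # Different network: check the link layer, then the network layer.
    print(same_network("223.147.37.81", "0xffffff00", "192.168.27.1"))  # False
```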
So what can cause link layer problems? There are a number of possibilities:
- One of the interfaces (source or destination) could be misconfigured. Both should be on the same IP network: the network part of their addresses, as determined by the netmask, must match. For example, the following two interface configurations cannot talk to each other directly, even if they're on the same physical network:
machine 1
dc0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        inet 223.147.37.81 netmask 0xffffff00 broadcast 223.147.37.255
machine 2
xl0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        options=3<RXCSUM,TXCSUM>
        inet 192.168.27.1 netmask 0xffffff00 broadcast 192.168.27.255
- If you see something like this on an Ethernet interface, it's pretty clear that it has a cabling problem:
xl0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        options=3<RXCSUM,TXCSUM>
        inet 192.168.27.1 netmask 0xffffff00 broadcast 192.168.27.255
        media: Ethernet autoselect (none)
        status: no carrier
In this case, check the physical connections. If you're using UTP, check that you have the right kind of cable: to connect a computer to a hub or switch, you normally need a "straight-through" cable. If you accidentally use a crossover cable where you need a straight-through cable, or vice versa, you will not get any connection. Also, many hubs and switches have a "crossover" switch, which has the same effect as using a crossover cable, so check its setting too.
- If you're on an RG-58 thin Ethernet, the most likely problem is a break in the cabling. You can check the static resistance between the central pin and the external part of the connector with a multimeter. It should be approximately 25 Ω, the resistance of the two 50 Ω terminators in parallel. If it's 50 Ω, there is a break in the cable, or one of the terminators has been disconnected.
- If your interface is configured correctly, and you're using a 10 Mb/s card, check whether you are using the correct connection to the network. Some older Ethernet boards support multiple physical connections (for example, both BNC and UTP). If your network runs on RG-58 thin Ethernet and your interface is set to AUI, for instance, you may still be able to send data on the RG-58, but you won't be able to receive any.
The method of setting the connection depends on the board you are using. PCI boards are not normally a problem, because the driver can set the parameters directly, but ISA boards can drive you crazy. In the case of very old boards, such as the Western Digital 8003, you may need to set jumpers. In others, you may need to run the setup utility under DOS, and with others you can set it with the link flags to ifconfig. For example, on a 3Com 3c509 "combo" board, you can set the connection like this:
# ifconfig ep0 -link0                   # set BNC
# ifconfig ep0 link0 -link1             # set AUI
# ifconfig ep0 link0 link1              # set UTP
This example is correct for the ep driver, but not necessarily for other Ethernet boards: each driver has its own flags. Read the man page for your board's driver for the correct flags.
- If your interface looks OK, the next thing to do is to see whether you can send data to other machines on the network. If so, of course, you should continue your search on the machine that isn't responding. If none are working, you probably have a cabling problem.
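Checking ifconfig output by eye is usually enough, but the first two checks in this list are easy to automate. The following Python sketch is an illustration only: parse_ifconfig and interface_problems are hypothetical names, and the parsing assumes the ifconfig output format shown above (an inet line with a hexadecimal netmask, and an optional status line). It reports an interface with no carrier, and two interfaces whose addresses fall on different networks.

```python
import ipaddress
import re


def parse_ifconfig(text):
    """Extract name, IPv4 network and link status from one interface's output."""
    info = {"name": text.split(":", 1)[0].strip(), "network": None, "status": None}
    m = re.search(r"inet ([\d.]+) netmask (0x[0-9a-fA-F]+)", text)
    if m:
        mask = str(ipaddress.IPv4Address(int(m.group(2), 16)))
        info["network"] = ipaddress.IPv4Network(f"{m.group(1)}/{mask}", strict=False)
    s = re.search(r"status: ([^\n]+)", text)
    if s:
        info["status"] = s.group(1).strip()
    return info


def interface_problems(a, b):
    """Compare two parsed interfaces that should be able to talk directly."""
    problems = []
    for iface in (a, b):
        if iface["status"] == "no carrier":
            problems.append(f"{iface['name']}: no carrier, check the cabling")
    if a["network"] != b["network"]:
        problems.append(
            f"{a['name']} is on {a['network']}, {b['name']} is on {b['network']}"
        )
    return problems


DC0 = """dc0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        inet 223.147.37.81 netmask 0xffffff00 broadcast 223.147.37.255
"""
XL0 = """xl0: flags=8943<UP,BROADCAST,RUNNING,PROMISC,SIMPLEX,MULTICAST> mtu 1500
        options=3<RXCSUM,TXCSUM>
        inet 192.168.27.1 netmask 0xffffff00 broadcast 192.168.27.255
        media: Ethernet autoselect (none)
        status: no carrier
"""

if __name__ == "__main__":
    for problem in interface_problems(parse_ifconfig(DC0), parse_ifconfig(XL0)):
        print(problem)
```

For these two sample interfaces, both problems are reported: xl0 has no carrier, and the two addresses are on 223.147.37.0/24 and 192.168.27.0/24 respectively.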
On a wireless network, you need to check for a number of additional problems. ifconfig should show something like this:
wi0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        inet6 fe80::202:2dff:fe04:93a%wi0 prefixlen 64 scopeid 0x3
        inet 192.168.27.17 netmask 0xffffff00 broadcast 192.168.27.255
        ether 00:02:2d:21:54:4c
        media: IEEE 802.11 Wireless Ethernet autoselect (DS/11Mbps)
        status: associated
        ssid "FreeBSD IBSS" 1:""
        stationname "FreeBSD WaveLAN/IEEE node"
        channel 3 authmode OPEN powersavemode OFF powersavesleep 100
        wepmode OFF weptxkey 1
        wepkey 2:64-bit 0x123456789a 3:128-bit 0x123456789abcdef123456789ab
There are many things to check here:
- Are both stations using the same operating mode? This example shows a card operating in BSS or IBSS mode. By contrast, you might see this:
media: IEEE 802.11 Wireless Ethernet autoselect (DS/11Mbps <adhoc, flag0>)
In this case, the interface is operating in so-called "Lucent demo ad-hoc" mode, which is not the same thing as "ad-hoc" mode (which in turn is better called IBSS mode). IBSS mode ("ad-hoc") and BSS mode are compatible. IBSS mode and "Lucent demo ad-hoc" mode are not. See "Configuring the local network", page 306, for further details.
- Is the status associated? The alternative is no carrier. Some cards, including this one, show no carrier when communicating with a station operating in IBSS mode, but they never show associated unless they are really associated.
- If the card is not associated, check the channel (frequency) and the network name (SSID).
- Check the WEP (encryption) parameters to ensure that they match. Note that ifconfig does not display the WEP key unless you are root.
Your card may show associated even if the WEP key doesn't match. In such a case, it knows about the network, but it can't communicate with it.
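When two stations won't talk to each other, it helps to compare their settings side by side. The following Python sketch extracts the fields from this checklist out of wi-style ifconfig output and lists the ones that differ. The names wireless_settings and wireless_mismatches are hypothetical, the patterns assume the output format shown above, and the sketch deliberately ignores the WEP keys: ifconfig hides them from non-root users, and the status line can't tell you whether they match anyway.

```python
import re

# Settings that both stations should agree on (assumes wi-style output).
PATTERNS = {
    "ssid": r'ssid "([^"]*)"',
    "channel": r"channel (\d+)",
    "wepmode": r"wepmode (\S+)",
    "status": r"status: ([^\n]+)",
}


def wireless_settings(text):
    """Pull the checklist fields out of one station's ifconfig output."""
    # "<adhoc" in the media line indicates "Lucent demo ad-hoc" mode.
    settings = {"demo_adhoc": "<adhoc" in text}
    for key, pattern in PATTERNS.items():
        m = re.search(pattern, text)
        settings[key] = m.group(1).strip() if m else None
    return settings


def wireless_mismatches(a, b):
    """Names of settings that differ between two stations (WEP keys ignored)."""
    keys = ("demo_adhoc", "ssid", "channel", "wepmode")
    return [k for k in keys if a[k] != b[k]]


STATION = """wi0: flags=8843<UP,BROADCAST,RUNNING,SIMPLEX,MULTICAST> mtu 1500
        inet 192.168.27.17 netmask 0xffffff00 broadcast 192.168.27.255
        media: IEEE 802.11 Wireless Ethernet autoselect (DS/11Mbps)
        status: associated
        ssid "FreeBSD IBSS" 1:""
        channel 3 authmode OPEN powersavemode OFF powersavesleep 100
        wepmode OFF weptxkey 1
"""

if __name__ == "__main__":
    other = STATION.replace("(DS/11Mbps)", "(DS/11Mbps <adhoc, flag0>)")
    # prints: ['demo_adhoc']
    print(wireless_mismatches(wireless_settings(STATION), wireless_settings(other)))
```

An empty list means only that the visible settings agree; as noted above, the stations may still be using different WEP keys.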
After checking all these things, you should have a connection. But you may not be home yet:
- If you have a connection, check whether all packets got there. Lost packets could mean line quality problems. That's not very likely on an Ethernet, but it's very possible on a PPP or DSL link. Be careful when counting dropped packets: you might hit Ctrl-C after the last packet went out, but before its reply came back, and on a very slow line several packets can be in transit at once, so ping counts all of them as lost. To check, compare the sequence number of the last packet that returns with the total number returned. Since sequence numbers start at 0, if the last sequence number is one less than the number returned, all the packets made it except possibly the ones at the end.
- Check that each packet comes back only once. If not, there's definitely something wrong, or you have been pinging a broadcast address. That looks like this:
$ ping 223.147.37.255
PING 223.147.37.255 (223.147.37.255): 56 data bytes
64 bytes from 223.147.37.1: icmp_seq=0 ttl=255 time=0.428 ms
64 bytes from 223.147.37.88: icmp_seq=0 ttl=255 time=0.785 ms (DUP!)
64 bytes from 223.147.37.65: icmp_seq=0 ttl=64 time=1.818 ms (DUP!)
64 bytes from 223.147.37.1: icmp_seq=1 ttl=255 time=0.426 ms
64 bytes from 223.147.37.88: icmp_seq=1 ttl=255 time=0.442 ms (DUP!)
64 bytes from 223.147.37.65: icmp_seq=1 ttl=64 time=1.099 ms (DUP!)
64 bytes from 223.147.37.126: icmp_seq=1 ttl=255 time=45.781 ms (DUP!)
FreeBSD systems do not respond to broadcast pings, but most other systems do, so this effectively counts the number of non-BSD machines on a network.
- Check the times. A ping across an Ethernet should take between about 0.2 and 2 ms, a ping across a wireless connection between 2 and 12 ms, a ping across an ISDN connection about 30 ms, a ping across a 56 kb/s analogue connection about 100 ms, and a ping across a satellite connection about 250 ms in each direction. All of these times are for idle lines: on a slow serial line that is transferring large blocks of data (for example, while you ftp a file), the time can go up to over 5 seconds. In the broadcast example above, some line traffic delayed the responses to individual pings, such as the 45.781 ms reply from 223.147.37.126.
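The first two checks in this list can be done in a script as well. This Python sketch applies the sequence-number rule described above: gaps below the highest sequence number that came back are real losses, and anything after it may simply have been in transit when you pressed Ctrl-C. It also counts duplicate replies and the number of distinct hosts that answered, which is what you'd look at after a broadcast ping. check_replies is a hypothetical name, and the parsing assumes the FreeBSD ping output format shown in this chapter.

```python
import re
from collections import defaultdict

REPLY_RE = re.compile(r"bytes from ([\d.]+): icmp_seq=(\d+)")
SENT_RE = re.compile(r"(\d+) packets transmitted")


def check_replies(output):
    """Summarise losses and duplicates in a ping transcript."""
    by_seq = defaultdict(list)  # icmp_seq -> addresses that replied
    for line in output.splitlines():
        m = REPLY_RE.search(line)
        if m:
            by_seq[int(m.group(2))].append(m.group(1))
    last = max(by_seq) if by_seq else -1
    # Sequence numbers start at 0, so any gap up to `last` is a real loss.
    lost = [seq for seq in range(last + 1) if seq not in by_seq]
    sent = SENT_RE.search(output)
    # Packets after the last reply may just have been in flight at Ctrl-C.
    in_flight = int(sent.group(1)) - (last + 1) if sent else None
    return {
        "lost": lost,
        "maybe_in_flight": in_flight,
        "duplicates": sum(len(hosts) - 1 for hosts in by_seq.values()),
        "responders": len({h for hosts in by_seq.values() for h in hosts}),
    }


BROADCAST = """\
64 bytes from 223.147.37.1: icmp_seq=0 ttl=255 time=0.428 ms
64 bytes from 223.147.37.88: icmp_seq=0 ttl=255 time=0.785 ms (DUP!)
64 bytes from 223.147.37.65: icmp_seq=0 ttl=64 time=1.818 ms (DUP!)
64 bytes from 223.147.37.1: icmp_seq=1 ttl=255 time=0.426 ms
64 bytes from 223.147.37.88: icmp_seq=1 ttl=255 time=0.442 ms (DUP!)
64 bytes from 223.147.37.65: icmp_seq=1 ttl=64 time=1.099 ms (DUP!)
64 bytes from 223.147.37.126: icmp_seq=1 ttl=255 time=45.781 ms (DUP!)
"""

if __name__ == "__main__":
    # prints: {'lost': [], 'maybe_in_flight': None, 'duplicates': 5, 'responders': 4}
    print(check_replies(BROADCAST))
```

Counting distinct sequence numbers rather than reply lines keeps duplicates from disguising a loss. If nothing came back at all, every packet ends up in maybe_in_flight, so look at the received count first.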