Netconsole is a powerful Linux kernel debugging tool. The
dmesg output from a machine under test is transferred over an ethernet link (via
UDP packets) to another machine. That means that you can see the debugging
messages from the test machine on the screen of another machine. Netconsole
isn't good for debugging early kernel panics, but it is very useful if your new
kernel driver hangs your system.
I used it to debug an oops in the xHCI driver that was caused by a NULL pointer access in a kernel linked list -- I should have used list_empty(). It took four hours to get netconsole working, even with three people who were clueful about Linux. (A big thank you goes out to Jamey Sharp and Josh Triplett for their help with this.)
At the time, there was no good tutorial that talked about all the basics and gotchas, so I decided to create one. This tutorial walks you through configuring both machines to be on the same network subnet, configuring the target machine to listen for UDP packets from the source, and configuring the source to send the kernel debugging messages over UDP.
UPDATE: My latest scripts for setting up Netconsole are here.
Prework
First, you need to have some tools installed. You'll need netcat, ping, and (optionally) wireshark. You'll also need to have netconsole compiled as a module on the source box. Netconsole has to be a module so you can load it after you get the system set up.
Make sure the ethernet driver for both machines supports netpoll. I had an out-of-tree ethernet driver for my eeepc 1000, and it took us a good hour to figure out why we couldn't see the UDP packets from the test box. Also make sure that NetworkManager isn't running on either system. NetworkManager detects the ethernet link between the two computers and then tries to do DHCP. This is not what you want, so make sure to kill NetworkManager on both boxes.
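One quick way to confirm the "netconsole compiled as a module" prerequisite is to look at the kernel config. This is only a sketch: the function name netconsole_config_state is made up for this example, and it assumes your distribution ships a plain-text kernel config file.

```shell
# Report how netconsole is configured in a kernel config file.
# Prints "module", "builtin", or "missing".
# (netconsole_config_state is a hypothetical helper, not a standard tool.)
netconsole_config_state() {
    config_file=$1
    if grep -q '^CONFIG_NETCONSOLE=m$' "$config_file"; then
        echo module
    elif grep -q '^CONFIG_NETCONSOLE=y$' "$config_file"; then
        echo builtin
    else
        # Covers both "# CONFIG_NETCONSOLE is not set" and no mention at all.
        echo missing
    fi
}
```

On the source box you might run netconsole_config_state "/boot/config-$(uname -r)" (that path is an assumption; it varies by distribution). A result of "module" is what this tutorial expects.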
Configuring Netconsole
In this section, I'll refer to the computer under test that is generating the dmesg output as the "source" machine. The computer that receives the debugging messages is called the "target" machine.
Configure the source machine to answer to the IP address of 10.0.0.2:
# ip addr add 10.0.0.2/8 dev eth0
The ethernet device that follows the dev label may be different on your system. Use /sbin/ifconfig to figure out what ethernet devices are available on your system.
The /8 is a bit of magic to me. The ip manpage says "The ADDRESS may be followed by a slash and a decimal number which encodes the network prefix length." Basically, I think that creates a rule for how many computers can be on this subnet (10.x.y.z for a /8). If you don't include the /8, the second machine won't be able to get on the network.
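To make the prefix length less magical, here is a small sketch that converts a prefix length into the equivalent dotted netmask. The function name prefix_to_netmask is made up for illustration, and the code assumes a shell with at least 64-bit arithmetic (true of bash and dash on common Linux systems).

```shell
# Convert an IPv4 prefix length (0-32) to a dotted-quad netmask.
# (prefix_to_netmask is a hypothetical helper name.)
prefix_to_netmask() {
    prefix=$1
    if [ "$prefix" -eq 0 ]; then
        mask=0
    else
        # Set the top $prefix bits of a 32-bit value and clear the rest.
        mask=$(( (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF ))
    fi
    # Print the four octets, most significant first.
    echo "$(( (mask >> 24) & 255 )).$(( (mask >> 16) & 255 )).$(( (mask >> 8) & 255 )).$(( mask & 255 ))"
}

prefix_to_netmask 8     # prints 255.0.0.0
prefix_to_netmask 24    # prints 255.255.255.0
```

With /8, only the first octet names the network, so 10.0.0.1 and 10.0.0.2 are both hosts on network 10. As far as I know, ip treats an IPv4 address given without a prefix as a /32, which is a network of exactly one address.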
Configure the target machine to answer to the IP address of 10.0.0.1:
# ip addr add 10.0.0.1/8 dev eth0
Verify that the two computers can talk to each other with ping. On the source, type:
$ ping 10.0.0.1
You should see no dropped packets. Double check that the target works too:
$ ping 10.0.0.2
If you have issues with either step, something is wrong with the network configuration. Wireshark is a helpful tool to debug this. Wireshark can show you all the packets flowing across the network (since it puts the NIC into promiscuous mode).
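If ping fails, one thing worth ruling out before reaching for Wireshark is that the two addresses do not share a network for the prefix you used. The sketch below checks that arithmetically. Both function names (ip_to_int and same_network) are invented for this example, and it assumes plain dotted-quad addresses without leading zeros.

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# (ip_to_int and same_network are hypothetical helper names.)
ip_to_int() {
    old_ifs=$IFS
    IFS=.
    set -- $1          # split the address into its four octets
    IFS=$old_ifs
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
}

# Exit status 0 if both addresses are in the same network for the
# given prefix length, non-zero otherwise.
same_network() {
    addr_a=$(ip_to_int "$1")
    addr_b=$(ip_to_int "$2")
    host_bits=$(( 32 - $3 ))
    # Drop the host bits and compare what is left (the network number).
    [ $(( addr_a >> host_bits )) -eq $(( addr_b >> host_bits )) ]
}

if same_network 10.0.0.1 10.0.0.2 8; then
    echo "same network"
else
    echo "different networks"
fi
```

For the addresses used in this tutorial the check passes with /8. With /32 it fails, because each address is then its own one-host network.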
Use netcat to tell the target machine to listen on port 6666:
$ nc -u -l -p 6666
6666 is the default port that netconsole will send UDP packets to. You might want to redirect this output into a file, and run tail -f <file> in another window. If you redirect the output, you won't lose data when your screen history buffer fills.
Start netconsole on the source machine:
# modprobe netconsole netconsole=@/eth0,@10.0.0.1/
The module name comes first, followed by the netconsole= parameter. The netconsole module takes an argument of the form
[source-port]@[source-ip]/[dev],[target-port]@<target-ip>/[target-mac address]
Here we're telling netconsole to send messages out the eth0 device, to the IP address 10.0.0.1.
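The parameter string is easy to get wrong by hand, so here is a sketch that assembles it from its six parts. The helper name netconsole_param is made up for this example. Empty arguments leave the corresponding field blank, in which case netconsole falls back to its defaults.

```shell
# Build the value for the netconsole= module parameter.
# Arguments, in order: src_port src_ip dev tgt_port tgt_ip tgt_mac
# Pass "" for any optional field you want left at its default.
# (netconsole_param is a hypothetical helper name.)
netconsole_param() {
    printf '%s@%s/%s,%s@%s/%s\n' "$1" "$2" "$3" "$4" "$5" "$6"
}

# The configuration used in this tutorial: only the device and target IP are set.
netconsole_param "" "" eth0 "" 10.0.0.1 ""
# prints @/eth0,@10.0.0.1/
```

As root you could then load the module with modprobe netconsole netconsole="$(netconsole_param "" "" eth0 "" 10.0.0.1 "")".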
At this point, you should be able to see output from netcat. If you don't, use wireshark to debug the system.
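Once messages are arriving, it can help to know when each line showed up on the target. This sketch adds a wall-clock timestamp to every line it reads. The function name stamp_lines and the log file name in the usage note are assumptions for illustration. Note that the timestamps are the target's receive time, not the source kernel's time.

```shell
# Prefix each line read from stdin with the current date and time.
# (stamp_lines is a hypothetical helper name.)
stamp_lines() {
    # The "|| [ -n ... ]" part keeps a final line that lacks a trailing newline.
    while IFS= read -r line || [ -n "$line" ]; do
        printf '%s %s\n' "$(date '+%Y-%m-%d %H:%M:%S')" "$line"
    done
}
```

On the target you might run nc -u -l -p 6666 | stamp_lines | tee netconsole.log to watch the messages and keep a copy at the same time. Netcat's flags differ between variants, so check your nc manpage if -l -p is rejected.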
Happy hacking!
9 comment(s)
Posted by Viral at Tue Jun 9 02:27:18 2009
I didn't fully understand how this is different from booting your Linux in runlevel 3 and getting all the printks/logs on the screen of the "source" machine itself.
How did netconsole help you here? I can see lots of advantages to debugging over serial or ethernet ONLY when you are using kgdb, which gives you control over the "debug" ("source" in your terminology) machine even after the machine has hung.
Posted by whocares at Tue Jun 9 20:14:46 2009
This article is really great, it has helped me solve some driver problems I couldn't figure out.
You said "The /8 is a bit of magic to me."...
It's the subnet mask.
/8 means 255.0.0.0
/16 255.255.0.0
etc.
If you omit this, it will be interpreted as a single IP address instead of a range.
You could also have written '10.0.0.2/255.0.0.0', as it's the same.
That is why it won't work without /8 (or /16, /24).
Posted by Macka at Thu Jun 11 17:17:42 2009
@Viral -- If there are a lot of msgs and you're just watching on the screen then when it scrolls off the top you've lost it. If you run nc in a script(1) session then you can catch everything in the typescript logfile and pick over it at your leisure.
Posted by amne at Fri Jun 12 02:33:25 2009
@Viral
Not sure, but this could be useful in an initrd script. If you load this before the kernel starts booting, or something like that, then you can pretty much see all the printks.
Posted by remil at Fri Jun 12 05:31:44 2009
In fact the inline modprobe'ing of netconsole should have the following form:
sudo modprobe netconsole netconsole=@/eth0,@10.0.0.1/
modinfo gives some information about the parameter format:
modinfo netconsole
<...>
parm: netconsole: netconsole=[src-port]@[src-ip]/[dev],[tgt-port]@<tgt-ip>/[tgt-macaddr]
Posted by dacian at Fri Jun 12 08:29:26 2009
Hey this is the way to go!
great... all my respects from a developer!
Posted by Sarah Sharp at Mon Jul 27 19:00:10 2009
Stephen P. Schaefer provides an explanation of what the magic numbers in the ip addr command are:
About 10.0.0.1/8: the 8 is the number of bits from the (conceptual) left for the network number, with the remainder of the 32 bits available for hosts (although 0 and all ones are historically for broadcast - you probably know to avoid that for hosts). So 10.0.0.1/8 means your host numbers will be 10.x.y.z. 10.0.0.1/24 means your host numbers will be 10.0.0.x. You can theoretically use anything from 0 to 32, but /n with n < 8 or n == 32 gets weird, so I'd avoid those.
Posted by foo at Sat Aug 1 12:31:29 2009
"You might want to redirect this output into a file, and run tail -f <file> in another window."
just as:
nc -u -l -p 6666 | tee my.log
Posted by Ryan at Fri Oct 23 05:30:11 2009
Theoretically, netconsole allows console messages to be sent to a remote endpoint earlier in the boot process, as well as after rsyslog / syslog have shut down.

