<?xml version="1.0" encoding="utf-8"?>
<!-- name="generator" content="pyblosxom/1.4.3 01/10/2008" -->
<!DOCTYPE rss PUBLIC "-//Netscape Communications//DTD RSS 0.91//EN" "http://my.netscape.com/publish/formats/rss-0.91.dtd">

<rss version="0.91">
<channel>
<title>The Geekess</title>
<link>http://sarah.thesharps.us</link>
<description>Linux, bicycling, open source, gardening, amateur rockets, and other seemingly unrelated hobbies.</description>
<language>en</language>
<item>
  <title>Debugging with printks over Netconsole</title>
  <link>http://sarah.thesharps.us/2009-02-22-09-00.html</link>
  <description><![CDATA[
<p><a href="http://flickr.com/photos/saschaaa/152502539/"><img
src="http://lh6.ggpht.com/_KnE2M8e3X8Q/SaGKChDWvUI/AAAAAAAABws/fqeImMeuZvg/s200/flickr-saschaa-ethernet-cable-152502539_c4cb9121eb_m.jpg"
alt="Image Copyright saschaaa -- http://flickr.com/photos/saschaaa/152502539/"
align="left" /></a> Netconsole is a powerful Linux kernel debugging tool.  It
sends the dmesg output from a machine under test over an ethernet link (as UDP
packets) to another machine, so you can read the test machine's debugging
messages on the screen of the other machine.  Netconsole isn't good for
debugging early kernel panics, but it is very useful if your new kernel driver
hangs your system.</p>

<p>I used it to debug an oops in the xHCI driver that was caused by a NULL pointer
access in a kernel linked list -- I should have used list_empty().  It took four
hours to get netconsole working, even with three people who were clueful about
Linux.  (A big thank you goes out to Jamey Sharp and Josh Triplett for their
help with this.)</p>

<p>At the time, there was no good tutorial that talked about all the basics and
gotchas, so I decided to create one.  This tutorial walks you through
configuring both machines to be on the same network subnet, configuring the
target machine to listen for UDP packets from the source, and configuring the
source to send the kernel debugging messages over UDP.</p>

<p>UPDATE: My latest scripts for setting up Netconsole are <a
href="http://sarah.thesharps.us/2010-03-26-09-41">here.</a></p>


<h1>Prework</h1>

<p>First, you need to have some tools installed.  You'll need netcat, ping, and
(optionally) wireshark.  You'll also need to have netconsole compiled as a
module on the source box.  Netconsole has to be a module so you can load it
after the network interface on the source box is configured.</p>

<p>Make sure the ethernet driver for both machines supports netpoll.  I had an
out-of-tree ethernet driver for my Eee PC 1000, and it took us a good hour to
figure out why we couldn't see the UDP packets from the test box.  Also make
sure that NetworkManager isn't running on either system.  NetworkManager detects
the ethernet link between the two computers and then tries to do DHCP.  This is
not what you want, so make sure to stop NetworkManager on both boxes.</p>
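<p>The prework checks can be scripted.  The snippet below is only a sketch:
<code>missing_tools</code> is a hypothetical helper name, and the commented
commands assume that <code>modinfo</code> and <code>pgrep</code> are installed
and that the daemon's process is called NetworkManager on your distribution.</p>

```shell
# Print the name of every listed command that is not found on PATH.
# (missing_tools is a hypothetical helper, not part of any package.)
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null || echo "$tool"
    done
}

# Example checks to run by hand:
#   missing_tools nc ping wireshark   # prints nothing if all are installed
#   modinfo netconsole                # on the source: fails if the module is not available
#   pgrep -x NetworkManager           # prints a PID if NetworkManager is still running
```

<p>If <code>missing_tools</code> prints nothing and <code>pgrep</code> finds no
process, the prework described above is probably done.</p>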

<h1>Configuring Netconsole</h1>

<p>In this section, I'll refer to the computer under test that is generating the
dmesg output as the "source" machine.  The computer that receives the debugging
messages is called the "target" machine.</p>

<ol>
<li><p>Configure the source machine to answer to the IP address of 10.0.0.2:</p>

<p><code># ip addr add 10.0.0.2/8 dev eth0</code></p>

<p>The ethernet device that follows the <code>dev</code> label may be different on your
system.  Use <code>/sbin/ifconfig</code> to figure out what ethernet devices are available
on your system.</p>

<p>The /8 is the network prefix length.  The ip manpage says "The ADDRESS may be
followed by a slash and a decimal number which encodes the network prefix
length."  A prefix of 8 means the first 8 bits of the address (the leading 10)
identify the network, so every 10.x.x.x address is treated as being on this
link.  If you don't include the /8, ip should fall back to /32, a "network"
that contains only this one address, and the second machine won't be reachable.</p></li>
<li><p>Configure the target machine to answer to the IP address of 10.0.0.1:</p>

<p><code># ip addr add 10.0.0.1/8 dev eth0</code></p></li>
<li><p>Verify that the two computers can talk to each other with ping.  On the
source, type:</p>

<p><code>$ ping 10.0.0.1</code></p>

<p>You should see no dropped packets.  Then check the other direction.  On the target, type:</p>

<p><code>$ ping 10.0.0.2</code></p>

<p>If you have issues with either step, something is wrong with the network
configuration.  Wireshark is a helpful tool to debug this.  Wireshark can
show you all the packets flowing across the network (since it puts the NIC into
promiscuous mode).</p></li>
<li><p>Use netcat to tell the target machine to listen on port 6666:</p>

<p><code>$ nc -u -l -p 6666</code></p>

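<p>6666 is the default port that netconsole will send UDP packets to.  (Some
netcat variants don't accept <code>-p</code> together with <code>-l</code>; if
yours complains, try <code>nc -u -l 6666</code>.)  You might want to redirect
this output into a file, and run <code>tail -f &lt;file&gt;</code> in another
window.  If you redirect the output, you won't lose data when your terminal's
scrollback buffer fills.</p></li>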
<li><p>Start netconsole on the source machine:</p>

<p><code># modprobe netconsole netconsole=@/eth0,@10.0.0.1/</code></p>

<p>The <code>netconsole=</code> parameter takes a value of the form</p>

<p><code>[source-port]@[source-ip]/[dev],[target-port]@&lt;target-ip&gt;/[target-mac-address]</code></p>

<p>Fields in square brackets are optional.  Here we're telling netconsole to
send messages out the eth0 device, to the IP address 10.0.0.1, and leaving
everything else at its default.</p></li>
</ol>
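<p>The parameter string from the last step is easy to mistype, so here is a
small sketch that assembles it from its parts.  The function name
<code>netconsole_param</code> and its argument order are made up for this
example; only the output format comes from the description above.</p>

```shell
# Assemble the netconsole= module parameter from its parts.
# Arguments: device, target IP, then (all optional) target port,
# source port, source IP, and target MAC address.
# Optional fields left empty are simply omitted from the string.
# (netconsole_param is a hypothetical helper name.)
netconsole_param() {
    dev=$1
    target_ip=$2
    target_port=${3:-}
    source_port=${4:-}
    source_ip=${5:-}
    target_mac=${6:-}
    printf 'netconsole=%s@%s/%s,%s@%s/%s\n' \
        "$source_port" "$source_ip" "$dev" \
        "$target_port" "$target_ip" "$target_mac"
}
```

<p>With the addresses used in this tutorial, <code>netconsole_param eth0
10.0.0.1</code> prints <code>netconsole=@/eth0,@10.0.0.1/</code>, which you
could pass along as root with <code>modprobe netconsole "$(netconsole_param eth0
10.0.0.1)"</code>.</p>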

<p>At this point, you should be able to see output from netcat.  If you don't, use
wireshark to debug the system.</p>
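<p>One thing worth ruling out while debugging is a prefix-length mismatch
between the two addresses.  The sketch below works through the arithmetic
behind the /8 from the configuration steps.  The function names
<code>ip_to_int</code> and <code>same_subnet</code> are made up for this
example, and it assumes a shell with 64-bit arithmetic and addresses written
without leading zeros.</p>

```shell
# Convert a dotted-quad IPv4 address to a 32-bit integer.
# The body runs in a subshell so the IFS change does not leak out.
# (ip_to_int and same_subnet are hypothetical helper names.)
ip_to_int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) | ($2 << 16) | ($3 << 8) | $4 ))
)

# Succeed if two addresses fall in the same network for a given prefix length.
# Arguments: first IP, second IP, prefix length (0 to 32).
same_subnet() {
    prefix=$3
    # Build the netmask: the top $prefix bits set, the rest clear.
    mask=$(( (0xFFFFFFFF << (32 - $prefix)) & 0xFFFFFFFF ))
    [ $(( $(ip_to_int "$1") & $mask )) -eq $(( $(ip_to_int "$2") & $mask )) ]
}
```

<p>For example, <code>same_subnet 10.0.0.1 10.0.0.2 8</code> succeeds, while
<code>same_subnet 10.0.0.1 10.0.0.2 32</code> fails, which is the situation you
end up in if the prefix length is left off.</p>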

<p>Happy hacking!</p>

]]></description>
</item>

</channel>
</rss>

