Some ideas for modifying kudzu in CentOS 5.1 so that it does not rewrite the configuration of Ethernet NICs when the system boots and finds that eth0, eth1 etc. each has either a different type of NIC or a physically different NIC with a different MAC address

This is an extra page to go with another one in the parent directory, regarding basic set-up of CentOS 5.1 with software RAID 1.

I have not had time to do these modifications, so this page is a work in progress.  Please let me know if you implement any changes such as these.

Robin Whittle rw@firstpr.com.au 2008-05-17

../ Back to the parent directory concerning web-mail, how I set up software RAID-1 on CentOS 5.1, how I used this server for Postfix, Courier IMAPD, Courier Maildrop etc.


As far as I know, older systems, such as Red Hat 7.2, did not have the fussy behaviour of later versions of kudzu.  With RH 7.2, I could have a disk image which would boot on any machine, and find the first two Ethernet cards, using them as previously configured for eth0 and eth1.

With CentOS 5.1 (and probably many earlier RHEL and CentOS systems), kudzu reacts if it finds a different type of NIC in a "slot" (eth0 is what I call a "slot", eth1 is another slot, etc.) or if the MAC address is different (which it will be with a physically different NIC).  It copies the original /etc/sysconfig/network-scripts/ifcfg-eth0 (or whatever) file to a .bak file, overwriting any previous .bak file of the same name, and then writes some default settings into a new ifcfg-eth0 file, including an instruction to use DHCP.  If your system works with the NIC doing DHCP, that is not a problem.  However, if this is not what you want, such as when the NIC is supposed to be on a LAN with a fixed address (perhaps as a DHCP server), then this function of kudzu will clobber your attempt to do any of the following:
  1. Run the same computer with a different NIC.
  2. Get this hard drive image (the complete bit-for-bit body of data, or the files or whatever) on the same disc or disks - RAID 1 - to boot on a different computer which does not have the same NICs as the original one. (Motherboard NICs will obviously be different.)
  3. As for 2, but by using a copy of the original drive's data, on another drive or drives, in another computer, for the purpose of recreating a server from such a backup.
You will find the NIC won't do what you want, and you will need to mess around rewriting the relevant ifcfg-ethx file manually.  If the affected NIC is the one you rely upon to access the machine remotely, then you will have to do it via some other means, such as plugging in a keyboard and screen and doing it manually, perhaps in Linux Rescue mode.

Introduction - to modifications planned but not yet done

This is a quick and dirty fix.  I don't have time to trawl through the largely uncommented code of kudzu, comment it, and understand it fully.

The overall plan is to get the source code, modify it, and install the resulting binary at /sbin/kudzu so that it is used at subsequent boots.  I rename the old version and keep it there for Justin - Just In Case.



How I got the source and modified it

The two kudzu packages installed on my system are:

kudzu.i386                               1.2.57.1.15-1.el5.cent installed
kudzu-devel.i386                         1.2.57.1.15-1.el5.cent installed

I want to recompile the kudzu executable with some modifications, so I get the source package, from a mirror such as http://mirror.pacific.net.au/linux/CentOS/5.1/os/SRPMS/ :

kudzu-1.2.57.1.15-1.el5.centos.src.rpm  26-Nov-2007 08:30  221
To install this:

rpm -i kudzu-1.2.57.1.15-1.el5.centos.src.rpm

should work, but there were warnings about there being no user "mockbuild", and I found the package wasn't in fact installed, at least according to rpm when I tried to list where the files were:

rpm -ql kudzu-1.2.57.1.15-1.el5.centos.src.rpm

it complained: "package kudzu-1.2.57.1.15-1.el5.centos.src.rpm is not installed".  Yet when I tried to install it with yum:

yum install kudzu-1.2.57.1.15-1.el5.centos.src

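this came back with: "Parsing package install arguments Nothing to do".  So is it installed?  "yum list" indicates not, and maybe yum isn't normally used for installing source RPMs.  As far as I understand it, a source RPM is never recorded in the RPM database anyway: "rpm -i" simply unpacks the source tarball, patches and spec file, by default under /usr/src/redhat/SOURCES and /usr/src/redhat/SPECS on CentOS 5.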
   
Compilation is painless:

make

A new kudzu binary of 300,620 bytes appears in the main directory.  I figure this has debugging info, so I strip it:

strip kudzu

and now it has 154,204 bytes.  The currently used binary is at /sbin/kudzu (144,968 bytes).   I don't investigate why these executables are different. They don't report a version number when run with the "-?" option.  For now I assume I can use this executable instead of the original one for booting.  

You might want to try this version produced by the unmodified source code to prove it boots your machine OK.

In /sbin, I rename the original kudzu binary to be kudzu-orig.  The modified binary is copied there as kudzu-mod.  Then a symlink from /sbin/kudzu points to one or the other.

How does the program work?


Now I want to find the code which rewrites the Ethernet config files when a new NIC is found.

It is in hwconf.c, function configure(), around line 553.  Below, I have edited out some IBM s390 stuff, to reduce clutter on this page.

    {
            FILE *f;
            char path1[256];
           
            snprintf(path, 256, "/etc/sysconfig/network-scripts/ifcfg-%s",dev->device);
            if (!access(path, F_OK)) {
                snprintf(path1, 256, "/etc/sysconfig/network-scripts/ifcfg-%s.bak",
                     dev->device);
                rename(path, path1);
            }
            f = fopen(path,"w");
            if (!f) break;
            fprintf(f,"# %s\nDEVICE=%s\nONBOOT=yes\nBOOTPROTO=dhcp\n",dev->desc,dev->device);
            if (dev->classprivate)
                fprintf(f,"HWADDR=%s\n",(char *)dev->classprivate); 
            fclose(f);

            /* nasty hack - make sure device is right */
            rename_device_if_needed(dev);
        }
 
The format strings in the two fprintf() calls above are what is written into the new ifcfg-ethx file.

The above code is run as part of configure(), only for devices of type NETWORK, and only if the device has a driver (otherwise, there is no driver to configure).  I think the call to rename_device_if_needed() is to cope with NICs being moved around in a machine, so they are in a different ethx position to what they were at the last boot time.  Then, I think, kudzu copies their old config files so they don't need to be configured.

How does this code get called?

In hwconf.c, function configMenu(), line 784.  This occurs only if this device remains on the newDevs list after the previous code, which removes new devices for various reasons.

Around line 731, the code:

                /* If the device only changed in the driver used, ignore it */
                tmpdev = newDevs;
                for ( ; tmpdev ; tmpdev = tmpdev->next) {
                    if (tmpdev->compareDevice(tmpdev,dev) == 2) {
                        oldDevs = listRemove(oldDevs,dev);
                        newDevs = listRemove(newDevs,tmpdev);
                        continue;
                    }
                }

uses a function compareDevice() and if the return value is 2, it removes this device from the new devices list.  

Code below this does some other work:

In the case of a NETWORK device, it will be removed from the newDevs list if it has no MAC address (here known as "hwaddr" or "dev->classprivate") or if the device is already configured.  (This is my rough guess.  I don't have time to reverse engineer this largely uncommented code with full precision.)

Then, the for loop:

    dev = newDevs;
    for ( ; dev ; dev = dev->next) {
        configure(dev);

calls configure() for all devices which remain on the newDevs list.


Let's look at compareDevice(), in kudzu.c, line 243.  It is called with parameters the new device (dev1) and the old device (dev2), I think, in line 734 of hwconf.c.

int compareDevice(struct device *dev1, struct device *dev2) {
    if (!dev1 || !dev2) return 1;
    if (dev1->type != dev2->type) return 1;
    if (dev1->bus != dev2->bus) return 1;
    
    if (dev1->device && dev2->device && strcmp(dev1->device,dev2->device)) {
      /* If a network device has the same hwaddr, don't worry about the
       * dev changing */
      if (dev1->type == CLASS_NETWORK && dev2->type == CLASS_NETWORK &&
          dev1->classprivate && dev2->classprivate &&
          !strcmp((char *)dev1->classprivate,(char *)dev2->classprivate))
          return 0;
      /* We now get actual ethernet device names. Don't flag that as a change. */
      if (strcmp(dev1->device,"eth") && strcmp(dev1->device,"tr") && strcmp(dev1->device,"fddi") &&
          strcmp(dev2->device,"eth") && strcmp(dev2->device,"tr") && strcmp(dev2->device,"fddi"))
          return 1;
    }
    /* Look - a special case!
     * If it's just the driver that changed, we might
     * want to act differently on upgrades.
     */
    if ((dev1->driver && dev2->driver) && strcmp(dev1->driver,dev2->driver)) return 2;
    /* If a network device changes hwaddr, flag it! */
    if (dev1->type == CLASS_NETWORK && dev2->type == CLASS_NETWORK &&
        dev1->classprivate && dev2->classprivate &&
        strcasecmp((char *)dev1->classprivate,(char *)dev2->classprivate))
        return 1;
    return 0;
}

This returns 1 if:
  1. Old or new does not exist.
  2. Old and new have different types (such as ethernet vs token ring).
  3. Old and new are on different busses.
Of the pairs of devices which don't match the above tests . . .

(Note, my modified logic will be inserted here!)

it returns 0 if:
  1. The new device has the same MAC address as the old one, but a different device name, such as eth0 vs. eth1.  This means a particular NIC has been moved, or at least the NICs in the machine have changed their position in the order in which they are discovered, so that for instance, one NIC which was previously eth1 is now eth0 or eth2 or whatever, due to some other NICs being added or moved, and/or this NIC being plugged into a different PCI slot.
Of the pairs of devices which don't match the above tests . . .

it returns 1 if:

      /* We now get actual ethernet device names. Don't flag that as a change. */
      if (strcmp(dev1->device,"eth") && strcmp(dev1->device,"tr") && strcmp(dev1->device,"fddi") &&
          strcmp(dev2->device,"eth") && strcmp(dev2->device,"tr") && strcmp(dev2->device,"fddi"))
          return 1;
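  1. The above looked nutty to me at first.  However, strcmp() returns non-zero when two strings differ, so the test is true whenever neither device name is exactly the bare string "eth", "tr" or "fddi".  So for two NICs with real names such as eth0 and eth1, which got this far because their names differ and their MAC addresses did not match, it returns 1.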
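To check the six-way test in isolation, here is a minimal sketch.  The function name is hypothetical and not part of kudzu.  It assumes, as the comment in the source suggests, that older hardware lists could hold a bare class name such as "eth" where newer ones hold a real interface name such as "eth0".

```c
#include <string.h>

/* Hypothetical helper, not part of kudzu.
 * Evaluates the same six-way && expression as compareDevice().
 * strcmp() returns non-zero when the two strings differ, so the
 * result is 1 only when neither name is a bare "eth", "tr" or "fddi". */
static int neither_name_is_generic(const char *name1, const char *name2)
{
    return strcmp(name1, "eth") && strcmp(name1, "tr") && strcmp(name1, "fddi") &&
           strcmp(name2, "eth") && strcmp(name2, "tr") && strcmp(name2, "fddi");
}
```

With "eth0" and "eth1" this gives 1, so compareDevice() would return 1 at this point.  With a bare "eth" for either name it gives 0, and compareDevice() falls through to the driver and MAC address tests.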
Of the pairs of devices which don't match the above tests . . .

it returns 2 if:

    /* Look - a special case!
     * If it's just the driver that changed, we might
     * want to act differently on upgrades.
     */
    if ((dev1->driver && dev2->driver) && strcmp(dev1->driver,dev2->driver)) return 2;
  1. Only the driver has changed.  But with NICs a different type of card has a different driver, and also a different MAC address.

Of the pairs of devices which don't match the above tests . . .

it returns 1 if:
  1. The old and new device have the same name, eg. both are "eth1", but have different MAC addresses.
It returns 0 for all the rest, which for NICs means those old and new devices which:

Have the same name (such as eth1),
AND
have the same driver (meaning they are roughly the same type of NIC)
AND
have the same MAC address (compared without regard to case).


So, in summary:

0
The NIC was in the machine previously, either in the same ethx number or in a different one.

1
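Old or new doesn't exist, have different busses or are not both ethernet, both token ring, both fddi etc.
Also when the names differ, the MAC addresses do not match and neither name is a bare "eth", "tr" or "fddi" (the 6 way && statement), or when the name and driver are the same but the MAC address is different.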

2
A different driver for this device name, but not if this NIC was one which was previously in the machine (as detected by its MAC address).
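The return values can be tried outside kudzu with a small harness.  This is a sketch under stated assumptions: the struct is a cut-down stand-in holding only the fields compareDevice() reads, the two constants have placeholder values, compare_nics() is a hypothetical helper, and compareDevice() is the quoted logic with const added.

```c
#include <string.h>
#include <strings.h>   /* strcasecmp() */

#define CLASS_NETWORK 1   /* placeholder value, not kudzu's real constant */
#define BUS_PCI       1   /* placeholder value, not kudzu's real constant */

/* Cut-down stand-in for kudzu's struct device. */
struct device {
    int type;
    int bus;
    const char *device;        /* interface name, e.g. "eth0" */
    const char *driver;        /* kernel module, e.g. "e100" */
    const void *classprivate;  /* for NICs: the MAC address string */
};

/* Same logic as the function quoted above. */
static int compareDevice(const struct device *dev1, const struct device *dev2)
{
    if (!dev1 || !dev2) return 1;
    if (dev1->type != dev2->type) return 1;
    if (dev1->bus != dev2->bus) return 1;

    if (dev1->device && dev2->device && strcmp(dev1->device, dev2->device)) {
        /* Same MAC address under a different name: the NIC has moved. */
        if (dev1->type == CLASS_NETWORK && dev2->type == CLASS_NETWORK &&
            dev1->classprivate && dev2->classprivate &&
            !strcmp((const char *)dev1->classprivate,
                    (const char *)dev2->classprivate))
            return 0;
        if (strcmp(dev1->device, "eth") && strcmp(dev1->device, "tr") &&
            strcmp(dev1->device, "fddi") && strcmp(dev2->device, "eth") &&
            strcmp(dev2->device, "tr") && strcmp(dev2->device, "fddi"))
            return 1;
    }
    if ((dev1->driver && dev2->driver) && strcmp(dev1->driver, dev2->driver))
        return 2;
    if (dev1->type == CLASS_NETWORK && dev2->type == CLASS_NETWORK &&
        dev1->classprivate && dev2->classprivate &&
        strcasecmp((const char *)dev1->classprivate,
                   (const char *)dev2->classprivate))
        return 1;
    return 0;
}

/* Hypothetical helper: compare two PCI NICs given name, driver and MAC. */
static int compare_nics(const char *name1, const char *drv1, const char *mac1,
                        const char *name2, const char *drv2, const char *mac2)
{
    struct device a = { CLASS_NETWORK, BUS_PCI, name1, drv1, mac1 };
    struct device b = { CLASS_NETWORK, BUS_PCI, name2, drv2, mac2 };
    return compareDevice(&a, &b);
}
```

Called with an unchanged NIC (same name, driver and MAC address) this returns 0.  The same MAC address under eth0 and eth1 also returns 0.  The same name and driver with a different MAC address returns 1.  The same name with a different driver returns 2, whether or not the MAC address differs, because the driver test comes before the final MAC address test.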



The value 2 is looked for in:

 hwconf.c configMenu() line 734:

Remove the devices from the old and new lists, which I think means the device will not be configured.  In the case of a NIC, it means it will not have its ifcfg-ethx file overwritten with a new default DHCP thing.


Value 0 is looked for in:

kudzu.c listRemove() line 276:

Compares the device (2nd arg) with a bunch of devices in the device list and removes the device if the return value is 0.
 

kudzu.c listCompare() line 1213:

This function is something to do with converting an array into a list.
 


The behavior I want is that whatever device is detected as eth0 will continue to use the ifcfg-eth0 stuff, even if it has a different driver and a different MAC address.  I will manually reconfigure things if I want to, rather than have kudzu mess with things.  Maybe the modified version of kudzu will be no good for initial installation, but that is fine.

In order to do this, I would need to disable the logic which tries to detect a NIC previously known as ethx now being found in another slot as ethy.  This would include getting rid of the call to rename_device_if_needed(dev).

Ideally I would make this new mode of operation selectable via a configuration item in /etc/sysconfig/kudzu .
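A minimal sketch of how such a switch might be read.  The option name KEEP_NIC_CONFIG is purely hypothetical (kudzu has no such option as far as I know), and the parser assumes plain KEY=value lines with no quoting or surrounding white space.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of kudzu.
 * Returns 1 if the file at path contains the line "<key>=yes",
 * otherwise 0 (including when the file cannot be opened).
 * If the key appears more than once, the last occurrence wins. */
static int option_enabled(const char *path, const char *key)
{
    char line[256];
    size_t keylen = strlen(key);
    int enabled = 0;
    FILE *f = fopen(path, "r");

    if (!f)
        return 0;
    while (fgets(line, sizeof line, f)) {
        line[strcspn(line, "\r\n")] = '\0';   /* drop the line ending */
        if (!strncmp(line, key, keylen) && line[keylen] == '=')
            enabled = !strcmp(line + keylen + 1, "yes");
    }
    fclose(f);
    return enabled;
}
```

The modified configure() could then test option_enabled("/etc/sysconfig/kudzu", "KEEP_NIC_CONFIG") once and fall back to the stock behaviour when it returns 0.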

Then I need to make sure it finds any NICs which have changed in any way from the previous arrangement for which a NIC existed in that slot, for which the following conditions apply:
  1. The NIC in ethx may or may not have been in any other eth slot last time.  
  2. The NIC may or may not have a different driver type than the one previously in this slot.
  3. The NIC has a different MAC address than the one previously in this slot.
If there is a NIC in a slot which previously had no NIC, then it should receive the current treatment - a new config file so the NIC is enabled at boot as a DHCP client.
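The rule above can be sketched as a single decision.  This assumes that the existence of an ifcfg-ethx file is an adequate sign that the slot held a NIC before, which is my simplification; kudzu itself works from its lists of old and new devices.  The enum and function names are hypothetical.

```c
#include <stdio.h>
#include <unistd.h>   /* access() */

/* Hypothetical names, not part of kudzu. */
enum nic_action {
    KEEP_EXISTING_CONFIG,   /* slot was configured before: carry it over */
    WRITE_DHCP_DEFAULT      /* new slot: stock behaviour, DHCP at boot */
};

/* Decide what to do for the NIC now found as ifname (e.g. "eth0").
 * scripts_dir would be /etc/sysconfig/network-scripts on a real system.
 * Assumption: an existing ifcfg file means the slot held a NIC before. */
static enum nic_action choose_nic_action(const char *scripts_dir,
                                         const char *ifname)
{
    char path[256];

    snprintf(path, sizeof path, "%s/ifcfg-%s", scripts_dir, ifname);
    return access(path, F_OK) == 0 ? KEEP_EXISTING_CONFIG
                                   : WRITE_DHCP_DEFAULT;
}
```

Note that the driver and the MAC address play no part in the decision, which is the point: whatever is found in the slot inherits the slot's settings.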

So some new logic will be required, but the main work is to create a new config file which carries over the old instructions, such as the static address settings in this example (ignoring IPv6 for now):

# Intel Corporation 82557/8/9 Ethernet Pro 100
DEVICE=eth0
BOOTPROTO=static
BROADCAST=10.0.0.255
HWADDR=00:A0:C9:B2:18:4A
IPADDR=10.0.0.1
IPV6ADDR=
IPV6PREFIX=
IPV6_AUTOCONF=yes
NETMASK=255.255.255.0
NETWORK=10.0.0.0
ONBOOT=yes

This can probably be achieved by modifying the code which currently writes a new ifcfg-ethx file: in hwconf.c, function configure():

            fprintf(f,"# %s\nDEVICE=%s\nONBOOT=yes\nBOOTPROTO=dhcp\n",dev->desc,dev->device);

so that it does not write this stuff, but instead creates the new file in the following manner:

Writes the first line to be a comment line with the driver type, followed by the device name:

fprintf(f,"# %s\nDEVICE=%s\n",dev->desc,dev->device);

eg.

# Intel Corporation 82557/8/9 Ethernet Pro 100
DEVICE=eth0

Then copies the entire old config file for this device, with two kinds of lines modified to have a "# " at the start: the lines for DEVICE and for HWADDR.  This way, the old driver name will appear as a comment, and so will the old ethx slot number (which should be the same as the DEVICE=ethx just written).  Also, the old MAC address will appear as a comment.
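A minimal sketch of that copying step, not tested inside kudzu.  The function name is hypothetical.  It assumes the old file has already been renamed to the .bak name, as configure() does now, so the old and new paths differ.  Writing a HWADDR line for the new NIC is my assumption, carried over from the existing code; pass NULL to leave it out.  Lines longer than the buffer are not handled specially.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of kudzu.
 * Writes new_path with a fresh header for the NIC now present, then
 * copies every line of old_path, putting "# " in front of the old
 * DEVICE= and HWADDR= lines.  Returns 0 on success, -1 on error. */
static int carry_over_config(const char *old_path, const char *new_path,
                             const char *desc, const char *devname,
                             const char *hwaddr)
{
    char line[512];
    FILE *in = fopen(old_path, "r");
    FILE *out;

    if (!in)
        return -1;
    out = fopen(new_path, "w");
    if (!out) {
        fclose(in);
        return -1;
    }

    /* Header: description of the new NIC and its device name. */
    fprintf(out, "# %s\nDEVICE=%s\n", desc, devname);
    if (hwaddr)
        fprintf(out, "HWADDR=%s\n", hwaddr);

    /* Body: the old settings, with the two stale items commented out. */
    while (fgets(line, sizeof line, in)) {
        if (!strncmp(line, "DEVICE=", 7) || !strncmp(line, "HWADDR=", 7))
            fputs("# ", out);
        fputs(line, out);
    }
    fclose(in);
    return fclose(out) == 0 ? 0 : -1;
}
```

In configure() this would replace the fprintf() of the DHCP defaults, called with path1 (the .bak file) as old_path and path as new_path.  The old description line needs no treatment since it is already a comment.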

So there needs to be a bit of code written, but I don't have time to do a proper job of this now, with it being enabled by a /etc/sysconfig/kudzu option etc.
