Fun with Dell

I recently ran into a strange issue involving CSI drivers that illustrated just how many bugs lurk in mission-critical software, even when it comes from huge enterprises with literal armies of software developers.

For some background, we were connecting a new Dell Unity device to our Kubernetes cluster for use as a storage backend. Thankfully (or so we thought), Dell has a CSM Operator that makes this a pretty straightforward task. However, after deploying the operator we found that only one of our nodes was able to register with the Unity device. Even stranger, the IP being registered was some nonsense internal Kubernetes IP, not our actual node IP.

After some deep digging, the problem appeared to be in this function:

// GetHostIP - Utility method to extract Host IP
func GetHostIP() ([]string, error) {
	cmd := exec.Command("hostname", "-I")
	cmdOutput := &bytes.Buffer{}
	cmd.Stdout = cmdOutput
	err := cmd.Run()
	if err != nil {
		cmd = exec.Command("hostname", "-i")
		cmdOutput = &bytes.Buffer{}
		cmd.Stdout = cmdOutput
		err = cmd.Run()
		if err != nil {
			return nil, err
		}
	}
	output := string(cmdOutput.Bytes())
	ips := strings.Split(strings.TrimSpace(output), " ")

	hostname, err := os.Hostname()
	if err != nil {
		return nil, err
	}

	var lookupIps []string
	for _, ip := range ips {
		lookupResp, err := net.LookupAddr(ip)
		if err == nil && strings.Contains(lookupResp[0], hostname) {
			lookupIps = append(lookupIps, ip)
		}
	}
	if len(lookupIps) == 0 {
		lookupIps = append(lookupIps, ips[0])
	}
	return lookupIps, nil
}

Basically, this function runs

hostname -I

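Then it tries to filter out the junk to find the right IP for the node: it does a reverse DNS lookup on each address and keeps only those whose first returned name contains the node's hostname. The critical error is on this line: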

if err == nil && strings.Contains(lookupResp[0], hostname)

The problem is that, in our case, hostname holds the FQDN for the node, e.g. node1.a.subdomain.mycompany, while lookupResp[0] does not include the domain, e.g. node1. strings.Contains(s, substr) checks whether substr occurs inside s, and the short name can never contain the longer FQDN, so the check always fails 🙄.

Unfortunately, if the function can't find a match, it just picks the first result from the command, leaving us with our garbage IP.
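The mismatch is easy to reproduce in isolation. Below is a minimal sketch using the example names from above. The sameHost and shortName helpers are hypothetical (they are not part of Dell's driver) and show just one possible way to make the comparison more tolerant, by comparing only the first DNS label of each name.

```go
package main

import (
	"fmt"
	"strings"
)

// shortName returns the first DNS label of name, lowercased and
// without a trailing dot ("Node1.a.b." -> "node1").
func shortName(name string) string {
	name = strings.ToLower(strings.TrimSuffix(name, "."))
	if i := strings.IndexByte(name, '.'); i >= 0 {
		return name[:i]
	}
	return name
}

// sameHost is a hypothetical replacement for the strings.Contains check:
// it treats two names as the same host when their first labels match.
func sameHost(ptrName, hostname string) bool {
	a, b := shortName(ptrName), shortName(hostname)
	return a != "" && a == b
}

func main() {
	ptrName := "node1"                        // what the reverse lookup gave us
	hostname := "node1.a.subdomain.mycompany" // what os.Hostname() gave us

	// The original check: does the short name contain the FQDN? Never.
	fmt.Println(strings.Contains(ptrName, hostname)) // false

	// The tolerant check: both names reduce to "node1".
	fmt.Println(sameHost(ptrName, hostname)) // true
}
```

Comparing only the first label is a deliberate simplification. It would wrongly match two hosts with the same short name in different domains, so treat it as an illustration of the bug rather than a drop-in patch.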

To fix this, I created a new image for the CSI driver that replaced the hostname binary with a script that fails when called with the -I flag. With that call failing, the code falls through to hostname -i, which produced the correct IP.