Building AWS AMIs from scratch is a practice that many security-focused organizations strictly follow.

I have been building custom AWS AMIs from scratch for my personal AWS environment for many years. Looking at my personal AMI collection, I have custom AMIs dating back to CentOS 5.

A few days ago I decided to do another refresh of my AWS environment and started with a new and improved build of CentOS 7.9. I rehashed my Packer build, merging in several improvements. OVA creation went flawlessly and AWS vmimport finished successfully. When I booted my new AMI to upgrade the stock CentOS kernel to the latest kernel-lt 5.4, I noticed that DNS resolution was failing and that /etc/resolv.conf had not been updated correctly:

$ cat /etc/resolv.conf
# Generated by NetworkManager

I had to troubleshoot and fix DHCP on my new image.

First, I looked in /etc for scripts that could have written my current /etc/resolv.conf:
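
A quick way to make this symptom checkable is to test whether the file has any nameserver entry at all. The sketch below is only an illustration; the function name has_nameserver is my own and is not part of any package.

```shell
# Hypothetical helper (name is an assumption): succeeds only if the given
# resolv.conf-style file contains at least one "nameserver <address>" line.
has_nameserver() {
    grep -Eq '^[[:space:]]*nameserver[[:space:]]+[^[:space:]]+' "$1"
}

# Possible usage on a live system:
# has_nameserver /etc/resolv.conf || echo "no nameserver configured" >&2
```

With the file shown above, which holds only the NetworkManager comment, the check fails.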

$ grep -R "Generated by NetworkManager" /etc
/etc/resolv.conf:# Generated by NetworkManager

The search came up empty: the only match was /etc/resolv.conf itself.

Next, I looked at what in /etc attempts to modify resolv.conf:

$ grep -R "resolv.conf" /etc
Binary file /etc/selinux/targeted/policy/policy.31 matches
Binary file /etc/selinux/targeted/active/policy.kern matches
Binary file /etc/selinux/targeted/active/policy.linked matches
/etc/sysconfig/network-scripts/ifdown-post:  if [ -f /etc/resolv.conf.save ]; then
/etc/sysconfig/network-scripts/ifdown-post:  change_resolv_conf /etc/resolv.conf.save
/etc/sysconfig/network-scripts/ifdown-post:  rm -f /etc/resolv.conf.save
/etc/sysconfig/network-scripts/ifup-post:  if [ -n "${DOMAIN}" ] && ! grep -q "^search.*${DOMAIN}.*$" /etc/resolv.conf
/etc/sysconfig/network-scripts/ifup-post:  ! tr --delete '\n' < /etc/resolv.conf | grep -E -q "${grep_regexp}"; then
/etc/sysconfig/network-scripts/ifup-post:  # the 'search' field in /etc/resolv.conf:
/etc/sysconfig/network-scripts/ifup-post:  # Keep the rest of the /etc/resolv.conf as it was:
/etc/sysconfig/network-scripts/ifup-post:  done < /etc/resolv.conf
/etc/sysconfig/network-scripts/ifup-post:  # backup resolv.conf
/etc/sysconfig/network-scripts/ifup-post:  cp -af /etc/resolv.conf /etc/resolv.conf.save
/etc/sysconfig/network-scripts/ifup-post:  # Update the resolv.conf:
/etc/sysconfig/network-scripts/ifup-post:  change_resolv_conf "${tmp_file}"
/etc/sysconfig/network-scripts/ifup-post:  net_log $"/etc/resolv.conf was not updated: failed to create temporary file" 'err' 'ifup-post'
/etc/sysconfig/network-scripts/ifup-ppp:  cp -f /etc/resolv.conf /etc/resolv.conf.save
/etc/sysconfig/network-scripts/network-functions:  if ! grep search /etc/resolv.conf >/dev/null 2>&1; then
/etc/sysconfig/network-scripts/network-functions:  cat /etc/resolv.conf >& $rsctmp
/etc/sysconfig/network-scripts/network-functions:  change_resolv_conf $rsctmp
/etc/sysconfig/network-scripts/network-functions:  # Invoke this when /etc/resolv.conf has changed:
/etc/sysconfig/network-scripts/network-functions:  change_resolv_conf ()
/etc/sysconfig/network-scripts/network-functions:  s=$(/bin/grep '^[\ \       ]*option' /etc/resolv.conf 2>/dev/null)
/etc/sysconfig/network-scripts/network-functions:  [ -x /sbin/restorecon ] && /sbin/restorecon /etc/resolv.conf >/dev/null 2>&1 # reset the correct context
/etc/sysconfig/network-scripts/network-functions:  /usr/bin/logger -p local7.notice -t "NET" -i "$0 : updated /etc/resolv.conf"
/etc/rwtab:  files        /etc/resolv.conf
/etc/dnsmasq.conf:  # somewhere other that /etc/resolv.conf
/etc/dnsmasq.conf:  # /etc/resolv.conf
/etc/dnsmasq.conf:  # If you don't want dnsmasq to read /etc/resolv.conf or any other
/etc/dnsmasq.conf:  # If you don't want dnsmasq to poll /etc/resolv.conf or other resolv
/etc/aide.conf:  /etc/resolv.conf$ DATAONLY
/etc/cloud/templates/resolv.conf.tmpl:  # Your system has been configured with 'manage-resolv-conf' set to true.

The function change_resolv_conf in /etc/sysconfig/network-scripts/network-functions looks interesting. It calls logger with a "NET" tag. Let's see if the system log has any "NET"-tagged entries that mention resolv.conf.

$ grep resolv.conf /var/log/messages
Feb 11 22:20:38 ip-172-30-43-251 NET[856]: /usr/sbin/dhclient-script : updated /etc/resolv.conf
Feb 11 22:20:43 ip-172-30-43-251 NET[1713]: /usr/sbin/dhclient-script : updated /etc/resolv.conf
Feb 13 19:14:25 ip-10-0-0-38 NET[827]: /usr/sbin/dhclient-script : updated /etc/resolv.conf

change_resolv_conf does indeed run after a restart. The question is what is calling it. The log entries name /usr/sbin/dhclient-script as the caller, which points at dhclient.

Let's see if dhclient is running:
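
The caller can also be pulled out of those log lines mechanically. Here is a rough sketch that assumes the "NET[pid]: <script> : updated /etc/resolv.conf" layout shown above; the function name resolv_conf_updaters is made up for this example.

```shell
# Hypothetical helper (name is an assumption): list the distinct scripts
# that logged "<script> : updated /etc/resolv.conf" under the NET tag.
# $1 is a syslog-style file such as /var/log/messages.
resolv_conf_updaters() {
    sed -n 's/^.* NET\[[0-9]*]: \(.*\) : updated \/etc\/resolv\.conf$/\1/p' "$1" | sort -u
}

# Possible usage:
# resolv_conf_updaters /var/log/messages
```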

$ ps -ef | grep dhclient
root       834     1  0 19:14 ?        00:00:00 /sbin/dhclient -1 -q -lf /var/lib/dhclient/dhclient--eth0.lease -pf /var/run/dhclient-eth0.pid -H ip-10-0-0-38 eth0

dhclient is indeed running. Let's see what files the dhclient package ships.

$ rpm -ql dhclient
/etc/NetworkManager
/etc/NetworkManager/dispatcher.d
/etc/NetworkManager/dispatcher.d/11-dhclient
/etc/dhcp/dhclient-exit-hooks.d
/etc/dhcp/dhclient-exit-hooks.d/azure-cloud.sh
/etc/dhcp/dhclient.d
/usr/lib64/pm-utils/sleep.d/56dhclient
/usr/sbin/dhclient
/usr/sbin/dhclient-script
/usr/share/doc/dhclient-4.2.5
/usr/share/doc/dhclient-4.2.5/README.dhclient.d
/usr/share/doc/dhclient-4.2.5/dhclient.conf.example
/usr/share/doc/dhclient-4.2.5/dhclient6.conf.example
/usr/share/man/man5/dhclient.conf.5.gz
/usr/share/man/man5/dhclient.leases.5.gz
/usr/share/man/man8/dhclient-script.8.gz
/usr/share/man/man8/dhclient.8.gz
/var/lib/dhclient       

/usr/sbin/dhclient-script looks interesting. Let's see if /usr/sbin/dhclient-script calls change_resolv_conf.

$ grep change_resolv_conf /usr/sbin/dhclient-script
        change_resolv_conf ${rscf}
        change_resolv_conf ${rscf}

/usr/sbin/dhclient-script does indeed call change_resolv_conf.

It looks like dhclient does pull DHCP information from AWS and does write /etc/resolv.conf. Unfortunately, we somehow still end up with an incorrect /etc/resolv.conf.

Next, let's find out if dhclient is getting correct information from AWS.

$ cat /var/lib/dhclient/dhclient--eth0.lease
lease {
  interface "eth0";
  fixed-address 10.0.0.38;
  option subnet-mask 255.255.255.0;
  option routers 10.0.0.1;
  option dhcp-lease-time 3600;
  option dhcp-message-type 5;
  option domain-name-servers 10.0.0.2;
  option dhcp-server-identifier 10.0.0.1;
  option interface-mtu 9001;
  option broadcast-address 10.0.0.255;
  option host-name "ip-10-0-0-38";
  option domain-name "us-west-2.compute.internal";
  renew 0 2021/02/14 06:45:04;
  rebind 0 2021/02/14 07:14:24;
  expire 0 2021/02/14 07:21:54;
}

Judging by option domain-name-servers 10.0.0.2 and option domain-name "us-west-2.compute.internal", dhclient does receive correct information.

Let's look at /usr/sbin/dhclient-script. The beginning of make_resolv_conf() looks like the following:
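
Reading the lease by eye works for one file, but the same check can be scripted. The sketch below assumes the "option <name> <value>;" layout of the lease file shown above; lease_option is a name I made up for this example.

```shell
# Hypothetical helper (name is an assumption): print the value of a DHCP
# option from its last occurrence in a dhclient lease file, quotes stripped.
#   $1 = lease file, $2 = option name (for example domain-name-servers)
lease_option() {
    sed -n "s/^[[:space:]]*option $2 \(.*\);[[:space:]]*\$/\1/p" "$1" | tail -n 1 | tr -d '"'
}

# Possible usage:
# lease_option /var/lib/dhclient/dhclient--eth0.lease domain-name-servers
# lease_option /var/lib/dhclient/dhclient--eth0.lease domain-name
```

Because the pattern requires a space right after the option name, asking for domain-name does not accidentally match domain-name-servers.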

make_resolv_conf() {
    [ "${PEERDNS}" = "no" ] && return

    if [ "${reason}" = "RENEW" ] &&
       [ "${new_domain_name}" = "${old_domain_name}" ] &&
       [ "${new_domain_name_servers}" = "${old_domain_name_servers}" ]; then
        return
    fi

    if [ -n "${new_domain_name}" ] ||
       [ -n "${new_domain_name_servers}" ] ||
       [ -n "${new_domain_search}" ]; then
        rscf="$(mktemp ${TMPDIR:-/tmp}/XXXXXX)"
        [[ -z "${rscf}" ]] && return
        echo "; generated by /usr/sbin/dhclient-script" > ${rscf}

        if [ -n "${SEARCH}" ]; then
            search="${SEARCH}"
        else

An /etc/resolv.conf written by /usr/sbin/dhclient-script starts with "; generated by /usr/sbin/dhclient-script". Our /etc/resolv.conf has "# Generated by NetworkManager" instead.
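
Since the two writers leave different header comments, the first line of the file is enough to tell them apart. A small sketch of that check follows; resolv_conf_writer is a hypothetical name, and it only knows the two headers seen in this post.

```shell
# Hypothetical helper (name is an assumption): guess which tool last wrote
# a resolv.conf-style file by looking at the comment on its first line.
resolv_conf_writer() {
    case "$(head -n 1 "$1")" in
        '# Generated by NetworkManager'*)            echo NetworkManager ;;
        '; generated by /usr/sbin/dhclient-script'*) echo dhclient-script ;;
        *)                                           echo unknown ;;
    esac
}

# Possible usage:
# resolv_conf_writer /etc/resolv.conf
```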

So far we have the following:

  1. dhclient is running on the system and is getting correct information from AWS
  2. dhclient-script is updating /etc/resolv.conf
  3. /etc/resolv.conf ends up being incorrect

Looking again at /usr/sbin/dhclient-script, we see that it writes the content for /etc/resolv.conf to a temporary file and removes that file after calling change_resolv_conf:

        change_resolv_conf ${rscf}
        rm -f ${rscf}

        if [ -n "${search}" ]; then
            eventually_add_hostnames_domain_to_search "${search}"
        fi

Let's temporarily change /usr/sbin/dhclient-script and comment out the "rm -f ${rscf}" lines. We can then restart dhclient and see what content /usr/sbin/dhclient-script is trying to write to /etc/resolv.conf.

$ cat /tmp/vP2Gfy
; generated by /usr/sbin/dhclient-script
search us-west-2.compute.internal
nameserver 10.0.0.2

We see that /usr/sbin/dhclient-script is generating a correct /etc/resolv.conf.

Is /etc/resolv.conf getting overwritten afterwards?

The system log has a line from /usr/sbin/dhclient-script with the timestamp "Feb 13 23:00:36":
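
I made the temporary edit by hand, but it can also be scripted. The sketch below writes a modified copy instead of editing the script in place, which is a slightly different approach from what I did; keep_rscf_copy and its arguments are hypothetical names.

```shell
# Hypothetical helper (name is an assumption): copy script $1 to $2 with
# every 'rm -f ${rscf}' line commented out, so the temporary file that
# dhclient-script generates is left behind for inspection.
keep_rscf_copy() {
    sed 's/^\([[:space:]]*\)rm -f \${rscf}/\1# rm -f ${rscf}/' "$1" > "$2"
}

# Possible usage (review the copy before swapping it in, and restore the
# original afterwards):
# keep_rscf_copy /usr/sbin/dhclient-script /tmp/dhclient-script.debug
```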

Feb 13 23:00:36 ip-10-0-0-38 NET[826]: /usr/sbin/dhclient-script : updated /etc/resolv.conf

The last-modified timestamp on /etc/resolv.conf is "2021-02-13 23:00:37.745725664", a second after the /usr/sbin/dhclient-script update:

$ ls -la --time-style=full-iso /etc/resolv.conf
-rw-r--r-- 1 root root 64 2021-02-13 23:00:37.745725664 -0800 /etc/resolv.conf
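
The same comparison can be done without reading timestamps by eye. This is a minimal sketch that assumes GNU stat, as shipped with CentOS; modified_after is a name invented for the example.

```shell
# Hypothetical helper (name is an assumption): succeeds if file $1 was last
# modified strictly after the Unix timestamp $2.
# Relies on GNU stat, where "-c %Y" prints the mtime in whole seconds.
modified_after() {
    [ "$(stat -c %Y "$1")" -gt "$2" ]
}

# Possible usage with GNU date, taking the time of the log entry:
# modified_after /etc/resolv.conf "$(date -d '2021-02-13 23:00:36' +%s)" \
#     && echo "resolv.conf changed after dhclient-script ran"
```

Because the comparison uses whole seconds, a second writer that runs within the same second as dhclient-script would not be detected this way.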

Given the line "# Generated by NetworkManager" in /etc/resolv.conf, perhaps NetworkManager is modifying /etc/resolv.conf after /usr/sbin/dhclient-script?

$ ps -ef | grep NetworkManager
root       618     1  0 23:00 ?        00:00:00 /usr/sbin/NetworkManager --no-daemon

$ nmcli device show
GENERAL.DEVICE:                         eth0
GENERAL.TYPE:                           ethernet
GENERAL.HWADDR:                         02:9D:07:29:33:AD
GENERAL.MTU:                            9001
GENERAL.STATE:                          10 (unmanaged)
GENERAL.CONNECTION:                     --
GENERAL.CON-PATH:                       --
WIRED-PROPERTIES.CARRIER:               on
IP4.ADDRESS[1]:                         10.0.0.38/24
IP4.GATEWAY:                            10.0.0.1
IP4.ROUTE[1]:                           dst = 10.0.0.0/24, nh = 0.0.0.0, mt = 0
IP4.ROUTE[2]:                           dst = 0.0.0.0/0, nh = 10.0.0.1, mt = 0
IP4.ROUTE[3]:                           dst = 169.254.0.0/16, nh = 0.0.0.0, mt = 1002
IP6.ADDRESS[1]:                         fe80::9d:7ff:fe29:33ad/64
IP6.GATEWAY:                            --
IP6.ROUTE[1]:                           dst = ff00::/8, nh = ::, mt = 256, table=255
IP6.ROUTE[2]:                           dst = fe80::/64, nh = ::, mt = 256

GENERAL.DEVICE:                         lo
GENERAL.TYPE:                           loopback
GENERAL.HWADDR:                         00:00:00:00:00:00
GENERAL.MTU:                            65536
GENERAL.STATE:                          10 (unmanaged)
GENERAL.CONNECTION:                     --
GENERAL.CON-PATH:                       --
IP4.ADDRESS[1]:                         127.0.0.1/8
IP4.GATEWAY:                            --
IP6.ADDRESS[1]:                         ::1/128
IP6.GATEWAY:                            --

NetworkManager is running on the system, but is not managing any devices. Do we need to run NetworkManager at all?
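
With more interfaces, scanning this output by eye gets tedious. Here is a rough sketch that filters the nmcli device show output for unmanaged devices; unmanaged_devices is a hypothetical name, and the parsing assumes the field layout shown above.

```shell
# Hypothetical helper (name is an assumption): read `nmcli device show`
# output on stdin and print the names of devices whose GENERAL.STATE
# is reported as "(unmanaged)".
unmanaged_devices() {
    awk '
        $1 == "GENERAL.DEVICE:" { dev = $2 }
        $1 == "GENERAL.STATE:" && /\(unmanaged\)/ { print dev }
    '
}

# Possible usage:
# nmcli device show | unmanaged_devices
```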

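Let's disable NetworkManager and see if we get a correct /etc/resolv.conf. Note that systemctl disable only keeps the service from starting at boot, so the change takes effect after the next restart.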

$ systemctl disable NetworkManager
Removed symlink /etc/systemd/system/multi-user.target.wants/NetworkManager.service.
Removed symlink /etc/systemd/system/dbus-org.freedesktop.nm-dispatcher.service.
Removed symlink /etc/systemd/system/network-online.target.wants/NetworkManager-wait-online.service.

Success!

$ cat /etc/resolv.conf
; generated by /usr/sbin/dhclient-script
search us-west-2.compute.internal
nameserver 10.0.0.2


Content Copyright © 2010 - 2021 Artem Veremey, All Rights Reserved