Building AWS AMIs from scratch is a practice that many security-focused organizations strictly follow.
I have been building custom AWS AMIs from scratch for my personal AWS environment for many years. Looking at my personal AMI collection, I have custom AMIs dating back to CentOS 5.
A few days ago I decided to refresh my AWS environment again and started with a new and improved build of CentOS 7.9. I reworked my Packer build, merging in several improvements. OVA creation went flawlessly and the AWS vmimport finished successfully. When I booted my new AMI to upgrade the stock CentOS kernel to the latest kernel-lt 5.4, I noticed that DNS resolution was failing and that /etc/resolv.conf had not been updated correctly.
I had to troubleshoot and fix DHCP on my new image.
First, I looked in /etc for scripts that could have written my current /etc/resolv.conf:
$ grep -R "Generated by NetworkManager" /etc
/etc/resolv.conf:# Generated by NetworkManager
The only match was /etc/resolv.conf itself, so the search for a script came up empty.
Next, I looked at what in /etc attempts to modify resolv.conf:
$ grep -R "resolv.conf" /etc
Binary file /etc/selinux/targeted/policy/policy.31 matches
Binary file /etc/selinux/targeted/active/policy.kern matches
Binary file /etc/selinux/targeted/active/policy.linked matches
/etc/sysconfig/network-scripts/ifdown-post: if [ -f /etc/resolv.conf.save ]; then
/etc/sysconfig/network-scripts/ifdown-post: change_resolv_conf /etc/resolv.conf.save
/etc/sysconfig/network-scripts/ifdown-post: rm -f /etc/resolv.conf.save
/etc/sysconfig/network-scripts/ifup-post: if [ -n "${DOMAIN}" ] && ! grep -q "^search.*${DOMAIN}.*$" /etc/resolv.conf
/etc/sysconfig/network-scripts/ifup-post: ! tr --delete '\n' < /etc/resolv.conf | grep -E -q "${grep_regexp}"; then
/etc/sysconfig/network-scripts/ifup-post: # the 'search' field in /etc/resolv.conf:
/etc/sysconfig/network-scripts/ifup-post: # Keep the rest of the /etc/resolv.conf as it was:
/etc/sysconfig/network-scripts/ifup-post: done < /etc/resolv.conf
/etc/sysconfig/network-scripts/ifup-post: # backup resolv.conf
/etc/sysconfig/network-scripts/ifup-post: cp -af /etc/resolv.conf /etc/resolv.conf.save
/etc/sysconfig/network-scripts/ifup-post: # Update the resolv.conf:
/etc/sysconfig/network-scripts/ifup-post:change_resolv_conf "${tmp_file}"
/etc/sysconfig/network-scripts/ifup-post: net_log $"/etc/resolv.conf was not updated: failed to create temporary file" 'err' 'ifup-post'
/etc/sysconfig/network-scripts/ifup-ppp: cp -f /etc/resolv.conf /etc/resolv.conf.save
/etc/sysconfig/network-scripts/network-functions: if ! grep search /etc/resolv.conf >/dev/null 2>&1; then
/etc/sysconfig/network-scripts/network-functions: cat /etc/resolv.conf >& $rsctmp
/etc/sysconfig/network-scripts/network-functions:change_resolv_conf $rsctmp
/etc/sysconfig/network-scripts/network-functions: # Invoke this when /etc/resolv.conf has changed:
/etc/sysconfig/network-scripts/network-functions:change_resolv_conf ()
/etc/sysconfig/network-scripts/network-functions: s=$(/bin/grep '^[\ \ ]*option' /etc/resolv.conf 2>/dev/null)
/etc/sysconfig/network-scripts/network-functions: [ -x /sbin/restorecon ] && /sbin/restorecon /etc/resolv.conf >/dev/null 2>&1 # reset the correct context
/etc/sysconfig/network-scripts/network-functions: /usr/bin/logger -p local7.notice -t "NET" -i "$0 : updated /etc/resolv.conf"
/etc/rwtab: files /etc/resolv.conf
/etc/dnsmasq.conf: # somewhere other that /etc/resolv.conf
/etc/dnsmasq.conf: # /etc/resolv.conf
/etc/dnsmasq.conf: # If you don't want dnsmasq to read /etc/resolv.conf or any other
/etc/dnsmasq.conf: # If you don't want dnsmasq to poll /etc/resolv.conf or other resolv
/etc/aide.conf: /etc/resolv.conf$ DATAONLY
/etc/cloud/templates/resolv.conf.tmpl: # Your system has been configured with 'manage-resolv-conf' set to true.
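The recursive grep above is noisy because it prints every matching line and also reports binary files. A minimal sketch of a quieter variant follows, assuming a grep that supports -I (skip binary files) and -l (print file names only), as GNU grep does. The helper name files_mentioning is hypothetical, and the demo runs against a throwaway directory rather than the real /etc.

```shell
# Hypothetical helper: list text files under a directory that mention a string.
# -r recurse, -I skip binary files, -l print file names only, -F fixed string.
files_mentioning() {
    pattern=$1
    dir=$2
    grep -rIlF -- "$pattern" "$dir" | sort
}

# Demo on a scratch directory instead of the real /etc.
demo=$(mktemp -d)
printf 'change_resolv_conf /etc/resolv.conf.save\n' > "$demo/ifdown-post"
printf 'nothing relevant here\n' > "$demo/other"
files_mentioning "resolv.conf" "$demo"
rm -rf "$demo"
```

On the real system the equivalent call would be files_mentioning "resolv.conf" /etc, which should leave out the SELinux policy binaries and show one line per candidate script.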
The function change_resolv_conf in /etc/sysconfig/network-scripts/network-functions looks interesting. It calls logger with a "NET" tag. Let's see if we have any log statements with the "NET" tag.
$ grep resolv.conf /var/log/messages
Feb 11 22:20:38 ip-172-30-43-251 NET[856]: /usr/sbin/dhclient-script : updated /etc/resolv.conf
Feb 11 22:20:43 ip-172-30-43-251 NET[1713]: /usr/sbin/dhclient-script : updated /etc/resolv.conf
Feb 13 19:14:25 ip-10-0-0-38 NET[827]: /usr/sbin/dhclient-script : updated /etc/resolv.conf
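Since the logger call in change_resolv_conf puts "$0" in front of the message, each NET log line names the script that triggered the update. A small sketch that pulls out the distinct callers is shown below. The helper name resolv_conf_updaters is hypothetical, and the field positions assume the traditional syslog layout seen in the lines above (month, day, time, host, tag, message).

```shell
# Hypothetical helper: read syslog lines on stdin and print each distinct
# script that logged "updated /etc/resolv.conf" under the NET tag.
# Assumes the classic layout: month day time host tag[pid]: message
resolv_conf_updaters() {
    awk '$5 ~ /^NET\[[0-9]+\]:$/ && /updated \/etc\/resolv\.conf/ { print $6 }' | sort -u
}

# Demo using the log lines from this post.
resolv_conf_updaters <<'EOF'
Feb 11 22:20:38 ip-172-30-43-251 NET[856]: /usr/sbin/dhclient-script : updated /etc/resolv.conf
Feb 11 22:20:43 ip-172-30-43-251 NET[1713]: /usr/sbin/dhclient-script : updated /etc/resolv.conf
Feb 13 19:14:25 ip-10-0-0-38 NET[827]: /usr/sbin/dhclient-script : updated /etc/resolv.conf
EOF
# prints: /usr/sbin/dhclient-script
```

On a live system the input would be /var/log/messages, for example resolv_conf_updaters < /var/log/messages.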
change_resolv_conf does indeed run after a restart. The question is: what is calling it? The log lines point at /usr/sbin/dhclient-script.
Let's see if we have dhclient
$ ps -ef | grep dhclient
root 834 1 0 19:14 ? 00:00:00 /sbin/dhclient -1 -q -lf /var/lib/dhclient/dhclient--eth0.lease -pf /var/run/dhclient-eth0.pid -H ip-10-0-0-38 eth0
dhclient is indeed running. Let's see which files the dhclient package ships.
$ rpm -ql dhclient
/etc/NetworkManager
/etc/NetworkManager/dispatcher.d
/etc/NetworkManager/dispatcher.d/11-dhclient
/etc/dhcp/dhclient-exit-hooks.d
/etc/dhcp/dhclient-exit-hooks.d/azure-cloud.sh
/etc/dhcp/dhclient.d
/usr/lib64/pm-utils/sleep.d/56dhclient
/usr/sbin/dhclient
/usr/sbin/dhclient-script
/usr/share/doc/dhclient-4.2.5
/usr/share/doc/dhclient-4.2.5/README.dhclient.d
/usr/share/doc/dhclient-4.2.5/dhclient.conf.example
/usr/share/doc/dhclient-4.2.5/dhclient6.conf.example
/usr/share/man/man5/dhclient.conf.5.gz
/usr/share/man/man5/dhclient.leases.5.gz
/usr/share/man/man8/dhclient-script.8.gz
/usr/share/man/man8/dhclient.8.gz
/var/lib/dhclient
/usr/sbin/dhclient-script looks interesting. Let's see if /usr/sbin/dhclient-script calls change_resolv_conf.
$ grep change_resolv_conf /usr/sbin/dhclient-script
change_resolv_conf ${rscf}
change_resolv_conf ${rscf}
/usr/sbin/dhclient-script does indeed call change_resolv_conf.
It looks like dhclient pulls DHCP information from AWS and writes /etc/resolv.conf. Unfortunately, we somehow still end up with an incorrect /etc/resolv.conf.
Next, let's find out if dhclient is getting correct information from AWS.
$ cat /var/lib/dhclient/dhclient--eth0.lease
lease {
  interface "eth0";
  fixed-address 10.0.0.38;
  option subnet-mask 255.255.255.0;
  option routers 10.0.0.1;
  option dhcp-lease-time 3600;
  option dhcp-message-type 5;
  option domain-name-servers 10.0.0.2;
  option dhcp-server-identifier 10.0.0.1;
  option interface-mtu 9001;
  option broadcast-address 10.0.0.255;
  option host-name "ip-10-0-0-38";
  option domain-name "us-west-2.compute.internal";
  renew 0 2021/02/14 06:45:04;
  rebind 0 2021/02/14 07:14:24;
  expire 0 2021/02/14 07:21:54;
}
Looking at "option domain-name-servers 10.0.0.2" and "option domain-name "us-west-2.compute.internal"", dhclient does receive correct information.
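For repeated checks it can help to extract just the DNS-related options from the lease file instead of reading the whole thing. Here is a minimal sketch, assuming the "option NAME VALUE;" layout that the lease in this post uses and assuming the most recent lease is the last one in the file. The helper name lease_option is hypothetical.

```shell
# Hypothetical helper: print the value of one "option NAME VALUE;" entry
# from a dhclient lease read on stdin. If the option appears in several
# lease blocks, the last occurrence is printed.
lease_option() {
    awk -v name="$1" '
        $1 == "option" && $2 == name {
            value = $3
            for (i = 4; i <= NF; i++) value = value " " $i
            sub(/;$/, "", value)     # drop the trailing semicolon
            gsub(/"/, "", value)     # drop surrounding quotes
            found = value
        }
        END { if (found != "") print found }
    '
}

# Demo with the relevant part of the lease from this post.
lease='lease {
  interface "eth0";
  option domain-name-servers 10.0.0.2;
  option domain-name "us-west-2.compute.internal";
}'
printf '%s\n' "$lease" | lease_option domain-name-servers   # prints: 10.0.0.2
printf '%s\n' "$lease" | lease_option domain-name           # prints: us-west-2.compute.internal
```

Against the real file this would be lease_option domain-name-servers < /var/lib/dhclient/dhclient--eth0.lease.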
Let's look at /usr/sbin/dhclient-script. Its make_resolv_conf() function begins as follows:
make_resolv_conf() {
    [ "${PEERDNS}" = "no" ] && return

    if [ "${reason}" = "RENEW" ] &&
       [ "${new_domain_name}" = "${old_domain_name}" ] &&
       [ "${new_domain_name_servers}" = "${old_domain_name_servers}" ]; then
        return
    fi

    if [ -n "${new_domain_name}" ] ||
       [ -n "${new_domain_name_servers}" ] ||
       [ -n "${new_domain_search}" ]; then
        rscf="$(mktemp ${TMPDIR:-/tmp}/XXXXXX)"
        [[ -z "${rscf}" ]] && return
        echo "; generated by /usr/sbin/dhclient-script" > ${rscf}

        if [ -n "${SEARCH}" ]; then
            search="${SEARCH}"
        else
/etc/resolv.conf written by /usr/sbin/dhclient-script will contain "; generated by /usr/sbin/dhclient-script". Our /etc/resolv.conf has "# Generated by NetworkManager".
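The two marker comments give a quick way to tell which tool wrote the file last. The sketch below checks only for the two markers quoted above; the helper name resolv_conf_writer is hypothetical, and any other header is reported as unknown.

```shell
# Hypothetical helper: guess which tool wrote a resolv.conf-style file
# by looking for the marker comments quoted in this post.
resolv_conf_writer() {
    if grep -q '^; generated by /usr/sbin/dhclient-script' "$1"; then
        echo dhclient-script
    elif grep -q '^# Generated by NetworkManager' "$1"; then
        echo NetworkManager
    else
        echo unknown
    fi
}

# Demo on a scratch file that mimics the broken state.
sample=$(mktemp)
printf '# Generated by NetworkManager\n' > "$sample"
resolv_conf_writer "$sample"   # prints: NetworkManager
rm -f "$sample"
```

On the affected instance, resolv_conf_writer /etc/resolv.conf would be the call to make.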
So far we have the following:
- dhclient is running on the system and is getting correct information from AWS
- dhclient-script is updating /etc/resolv.conf
- /etc/resolv.conf ends up being incorrect
Looking again at /usr/sbin/dhclient-script, we see that it generates a temporary file with the content for /etc/resolv.conf. After calling change_resolv_conf, /usr/sbin/dhclient-script removes the temporary file.
        change_resolv_conf ${rscf}
        rm -f ${rscf}

        if [ -n "${search}" ]; then
            eventually_add_hostnames_domain_to_search "${search}"
        fi
Let's change /usr/sbin/dhclient-script temporarily and comment out the "rm -f ${rscf}" lines. We can then restart dhclient and see what /usr/sbin/dhclient-script is trying to write to /etc/resolv.conf.
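If you would rather not edit the script by hand, the same temporary change can be sketched with sed. This is only an illustration under the assumption that the cleanup lines read exactly "rm -f ${rscf}" with leading whitespace, as in the excerpt above. The helper name keep_rscf is hypothetical, and it works as a filter so the original file is never touched directly.

```shell
# Hypothetical helper: comment out every 'rm -f ${rscf}' line.
# Reads a script on stdin and writes the patched text to stdout.
keep_rscf() {
    sed 's/^\([[:space:]]*\)\(rm -f \${rscf}\)[[:space:]]*$/\1# \2/'
}

# Demo on the excerpt shown above.
keep_rscf <<'EOF'
        change_resolv_conf ${rscf}
        rm -f ${rscf}
EOF
```

On a real system you would keep a backup of /usr/sbin/dhclient-script, write the filtered output to a new file, review it, and restore the original once the investigation is done.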
$ cat /tmp/vP2Gfy
; generated by /usr/sbin/dhclient-script
search us-west-2.compute.internal
nameserver 10.0.0.2
We see that /usr/sbin/dhclient-script is writing a correct /etc/resolv.conf.
Is /etc/resolv.conf getting overwritten afterwards?
The system log has a line from /usr/sbin/dhclient-script with the timestamp "Feb 13 23:00:36":
Feb 13 23:00:36 ip-10-0-0-38 NET[826]: /usr/sbin/dhclient-script : updated /etc/resolv.conf
The last modified timestamp on /etc/resolv.conf is "2021-02-13 23:00:37.745725664" - a second after /usr/sbin/dhclient-script update.
$ ls -la --time-style=full-iso /etc/resolv.conf
-rw-r--r-- 1 root root 64 2021-02-13 23:00:37.745725664 -0800 /etc/resolv.conf
Given the line "# Generated by NetworkManager" in /etc/resolv.conf, perhaps NetworkManager is modifying /etc/resolv.conf after /usr/sbin/dhclient-script?
$ ps -ef | grep NetworkManager
root 618 1 0 23:00 ? 00:00:00 /usr/sbin/NetworkManager --no-daemon
$ nmcli device show
GENERAL.DEVICE: eth0
GENERAL.TYPE: ethernet
GENERAL.HWADDR: 02:9D:07:29:33:AD
GENERAL.MTU: 9001
GENERAL.STATE: 10 (unmanaged)
GENERAL.CONNECTION: --
GENERAL.CON-PATH: --
WIRED-PROPERTIES.CARRIER: on
IP4.ADDRESS[1]: 10.0.0.38/24
IP4.GATEWAY: 10.0.0.1
IP4.ROUTE[1]: dst = 10.0.0.0/24, nh = 0.0.0.0, mt = 0
IP4.ROUTE[2]: dst = 0.0.0.0/0, nh = 10.0.0.1, mt = 0
IP4.ROUTE[3]: dst = 169.254.0.0/16, nh = 0.0.0.0, mt = 1002
IP6.ADDRESS[1]: fe80::9d:7ff:fe29:33ad/64
IP6.GATEWAY: --
IP6.ROUTE[1]: dst = ff00::/8, nh = ::, mt = 256, table=255
IP6.ROUTE[2]: dst = fe80::/64, nh = ::, mt = 256

GENERAL.DEVICE: lo
GENERAL.TYPE: loopback
GENERAL.HWADDR: 00:00:00:00:00:00
GENERAL.MTU: 65536
GENERAL.STATE: 10 (unmanaged)
GENERAL.CONNECTION: --
GENERAL.CON-PATH: --
IP4.ADDRESS[1]: 127.0.0.1/8
IP4.GATEWAY: --
IP6.ADDRESS[1]: ::1/128
IP6.GATEWAY: --
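The only part of the nmcli output that matters here is the state of each device. A short sketch that reduces "nmcli device show" output to one "device state" pair per line is shown below. The helper name nm_device_states is hypothetical, and it assumes the "KEY: value" layout shown above.

```shell
# Hypothetical helper: read `nmcli device show` output on stdin and print
# "device state" pairs, one per line.
nm_device_states() {
    awk -F': *' '
        $1 == "GENERAL.DEVICE" { dev = $2 }
        $1 == "GENERAL.STATE"  { print dev, $2 }
    '
}

# Demo with a trimmed copy of the output from this post.
nm_device_states <<'EOF'
GENERAL.DEVICE: eth0
GENERAL.TYPE: ethernet
GENERAL.STATE: 10 (unmanaged)

GENERAL.DEVICE: lo
GENERAL.TYPE: loopback
GENERAL.STATE: 10 (unmanaged)
EOF
```

On a live system the call would be nmcli device show | nm_device_states. If every device reports "unmanaged", NetworkManager is running without managing any interface.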
NetworkManager is running on the system, but is not managing any devices. Do we need to run NetworkManager at all?
Let's disable NetworkManager and see if we get a correct /etc/resolv.conf.
$ systemctl disable NetworkManager
Removed symlink /etc/systemd/system/multi-user.target.wants/NetworkManager.service.
Removed symlink /etc/systemd/system/dbus-org.freedesktop.nm-dispatcher.service.
Removed symlink /etc/systemd/system/network-online.target.wants/NetworkManager-wait-online.service.
Success!
$ cat /etc/resolv.conf
; generated by /usr/sbin/dhclient-script
search us-west-2.compute.internal
nameserver 10.0.0.2
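A quick sanity check for future image builds is to confirm that the nameserver in /etc/resolv.conf matches the one handed out by DHCP (10.0.0.2 in this post). The sketch below uses a hypothetical helper, resolv_conf_nameservers, and joins multiple nameservers with commas on the assumption that the lease file lists them that way.

```shell
# Hypothetical helper: print the nameserver addresses from a
# resolv.conf-style file as a comma-separated list.
resolv_conf_nameservers() {
    awk '
        $1 == "nameserver" { printf "%s%s", sep, $2; sep = "," }
        END { if (sep != "") print "" }
    ' "$1"
}

# Demo on a scratch file that mimics the fixed /etc/resolv.conf.
sample=$(mktemp)
cat > "$sample" <<'EOF'
; generated by /usr/sbin/dhclient-script
search us-west-2.compute.internal
nameserver 10.0.0.2
EOF
expected=10.0.0.2   # "option domain-name-servers" value from the DHCP lease
if [ "$(resolv_conf_nameservers "$sample")" = "$expected" ]; then
    echo "resolv.conf matches the DHCP lease"
else
    echo "resolv.conf does NOT match the DHCP lease"
fi
rm -f "$sample"
```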