Building AWS AMIs from scratch is the practice many security-focused organizations strictly follow.
I have been building custom AWS AMIs from scratch for my personal AWS environment for many years. Looking at my personal AMI collection, I have custom AMIs dating back to CentOS 5.
A few days ago I decided to do another refresh of my AWS environment and started with a new and improved build of CentOS 7.9. I rehashed my packer build merging in several improvements. OVA creation went flawless and AWS vmimport successfully finished. When I booted my new AMI to upgrade CentOS stock kernel to the latest kernel-lt 5.4, I noticed that DNS resolution is failing and that /etc/resolv.conf did not get updated correctly:
I had to troubleshooting and fix DHCP on my new image.
Nagios is loved and hated in the industry. Nagios has a great brand recognition, but suffers from less than intuitive setup and difficult to automate configuration. With passive checks and right Chef cookbooks, Nagios/Icinga setup becomes very effective. Another Nagios weakness is poor implementation. Fortunately, Nagios poorly implements right ideas, so not everything is lost. This war story is about fixing Nagios NSCA implementation. Nagios NSCA is a Nagios passive-check daemon written in C that runs on a Nagios server and processes all passive check sent by Nagios clients.
Starting to use Chef is easy. Realizing the full power of Chef is not quite as easy. Many Chef cookbooks that I looked at can be significantly improved. Cookbooks created by developers can be improved by using systems engineering patterns. Cookbooks created by system administrators can be improved by using software engineering patterns. Opscode does a very good job describing building blocks of Chef cookbooks. Explaining how to write good Chef cooks is something I have not seen done yet. In this post I will go over my cookbook writing process using a syslog-ng cookbook as an example.
One of the application server pools running a LAMP application (Linux / Apache/ MySQL / Memcached / PHP) was showing the following issue. Every time new code was deployed to the pool, Apache processes on 10-15 random servers out of 300+ would deadlock and stop serving requests. The problem was happening for some time but was not perceived as a big issue. Random Apache deadlocks did not impact the end users, and restarts of deadlocked Apache instances returned services to the operational state. I was not very comfortable with this behavior, so I decided to dig in and see what was going on.
Problem Requests to a LAMP-based Facebook application are load balanced between server nodes by an F5 BIG-IP Local Traffic Manager (LTM). The F5 BIG-IP LTM has two components: an ASIC-based fast-switching component and an AMD Opteron-based software-switching component. Layer 4 load balancing between the application server nodes can be handled either exclusively by the F5 ASIC (Performance L4 mode), or exclusively by F5 software components (Standard mode), or by a combination of F5 ASIC and F5 software components. With the Standard mode turned on, F5 BIG-IP LTM capacity utilization is at 60%. With the Performance L4 mode turned on, F5 BIG-IP LTM capacity utilization is at 20%. However, with the Performance L4 mode turned on, developers report seeing an occasional blank Facebook canvas page returned in response to legitimate application requests. Serving blank Facebook canvas responses at any significant rate is unacceptable. The issue with the blank Facebook canvas responses in Performance L4 mode has to be addressed.
If nothing else, DevOps is a buzzword that means different things to different people.
Ted Dziuba thinks that DevOps is a trend of nonsense "where system administrators start writing unit tests and other things to help the developers warm up to them." DevOps is certainly not about system administrators writing unit tests. DevOps is not about Ted Dziuba learning "Operations" (aka UNIX) either. DevOps is about system administrators, Ted Dziuba, and developers working closely together sharing knowledge about unit tests, UNIX, etc. to produce and deliver products.