In some of our recent work we found that creating a component register for AWS-deployed resources (with a unique ID for every resource) was extremely helpful in setting up alarms. The register underpins a consistent naming convention, so that meaningful context about a resource can be extracted directly from its name.
The Component Naming Convention allows us to consistently identify deployed resources with respect to:
- What Product they belong to
- What Environment they are for (e.g. dev/sit/uat/prod)
- What sort of component they are, even where that would otherwise be obvious (e.g. Lambda, Database, WAF)
- What Application/Business Service they are part of, and optionally which sub component of that service
- A unique numeric identifier that is managed in a centralised location (e.g. a configuration management database, or documentation in Confluence).
Naming Pattern for Deployed Resources
The pattern for the above is:
${Product}_${Environment}_${ComponentType}_${ServiceName}_${ServiceComponent}_${ComponentReference}
A Lambda function would then have a Name like the following:
productA_uat_Lambda_Integration_Requeue_C149
And the IAM Role it executes with would be:
productA_uat_IAMRole_Integration_C145
This naming standard also helps when viewing these resources in the AWS console: because the console orders resources by name, resources for the same Product, Environment, and Service are listed together.
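The pattern above can be assembled programmatically so that names stay consistent across deployments. The following is a minimal sketch, not our actual tooling: the function name `component_name` and its parameter names are hypothetical, and it assumes that no individual part contains an underscore, since the underscore is the separator.

```python
from typing import Optional


def component_name(
    product: str,
    environment: str,
    component_type: str,
    service_name: str,
    component_reference: str,
    service_component: Optional[str] = None,
) -> str:
    """Join the naming parts with underscores, skipping the optional sub component."""
    parts = [product, environment, component_type, service_name]
    if service_component:
        parts.append(service_component)
    parts.append(component_reference)

    # Assumption: parts must be non-empty and must not contain the separator,
    # otherwise the name could not be split back into its parts later.
    for part in parts:
        if not part or "_" in part:
            raise ValueError(f"invalid name part: {part!r}")
    return "_".join(parts)


# The two example names from the text above.
lambda_name = component_name(
    "productA", "uat", "Lambda", "Integration", "C149", service_component="Requeue"
)
role_name = component_name("productA", "uat", "IAMRole", "Integration", "C145")
print(lambda_name)
print(role_name)
```

Rejecting underscores inside a part is a design choice made for this sketch; it keeps every generated name unambiguous to split later.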
Naming Convention for CloudWatch Alarms
We also use a similar standard for naming CloudWatch Alarms.
This has two main purposes for us:
- Easily identify exactly what the Alarm means and what its implications are
- Allow information to be decoded from the Alarm name and sent to a user in a different format (email, Slack, etc.).
The pattern for Alarms is similar to the component naming:
${Product}_${Environment}_${ComponentType}_${ServiceName}_${ServiceComponent}_${ComponentReference}_${AlarmSeverity}_${AlarmType}
For the same Lambda function (component C149), here deployed to the sit environment, we would configure a Throttling Alarm with the following name:
productA_sit_Lambda_Integration_C149_Critical_T002-Lambda-Throttling-Alarm
For each type of AWS resource, we define an AlarmType that uniquely identifies the nature of this alarm.
These AlarmTypes are documented (e.g. on Confluence), detailing which metrics trigger each Alarm.
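Decoding an Alarm name back into its parts might look like the sketch below. The class `AlarmName` and the functions `parse_alarm_name` and `describe` are hypothetical names introduced for illustration. The sketch assumes that parts never contain underscores, and that a name has either seven parts (no ServiceComponent, as in the example above) or eight.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class AlarmName:
    product: str
    environment: str
    component_type: str
    service_name: str
    service_component: Optional[str]
    component_reference: str
    severity: str
    alarm_type: str


def parse_alarm_name(name: str) -> AlarmName:
    """Split an Alarm name on underscores and map the parts onto named fields."""
    parts = name.split("_")
    if len(parts) == 7:
        # ServiceComponent is optional and absent here.
        product, env, ctype, service, ref, severity, alarm_type = parts
        sub = None
    elif len(parts) == 8:
        product, env, ctype, service, sub, ref, severity, alarm_type = parts
    else:
        raise ValueError(f"unexpected number of parts in alarm name: {name!r}")
    return AlarmName(product, env, ctype, service, sub, ref, severity, alarm_type)


def describe(alarm: AlarmName) -> str:
    """Build a short human-readable line, e.g. for an email or Slack message."""
    # The AlarmType is '<code>-<label>', for example 'T002-Lambda-Throttling-Alarm'.
    _code, _, label = alarm.alarm_type.partition("-")
    return (
        f"[{alarm.severity}] {label.replace('-', ' ')} on "
        f"{alarm.component_type} {alarm.component_reference} "
        f"({alarm.product}/{alarm.environment}, service {alarm.service_name})"
    )


alarm = parse_alarm_name(
    "productA_sit_Lambda_Integration_C149_Critical_T002-Lambda-Throttling-Alarm"
)
print(describe(alarm))
```

Because the AlarmType uses hyphens internally rather than underscores, it survives the split as a single part.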
The table below shows the naming parts used for both the Component Name and Alarm Name:
Some examples we’ve used for the AlarmType specifically for Lambda functions:
T001-Lambda-Error-Alarm
T002-Lambda-Throttling-Alarm
T003-Lambda-High-Invocations
T004-Lambda-High-Concurrent-Executions
T005-Lambda-High-Duration
T006-Lambda-High-Unreserved-Concurrent-Executions
T007-Lambda-Error-Percentage-Alarm
We’ve used the following when creating CloudWatch alarms for Aurora MySQL databases:
T040-Aurora-High-CPU
T042-Aurora-Free-Local-Storage
T043-Aurora-Database-Connections
T044-Aurora-Commit-Latency
T045-Aurora-Insert-Latency
T046-Aurora-Update-Latency
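One way to keep AlarmTypes and their metrics in step is a small registry in code. The sketch below is an illustration only. The mapping from each AlarmType to a CloudWatch namespace and metric name is our assumption about what such alarms would typically watch, not a record of the documented definitions. The helper `build_alarm_request` is hypothetical, and the statistic, period, evaluation periods and threshold are placeholder values. T007 is left out because an error percentage would need metric math rather than a single metric.

```python
from typing import Dict, List, Tuple

# Assumed mapping: AlarmType -> (CloudWatch namespace, metric name).
ALARM_TYPES: Dict[str, Tuple[str, str]] = {
    "T001-Lambda-Error-Alarm": ("AWS/Lambda", "Errors"),
    "T002-Lambda-Throttling-Alarm": ("AWS/Lambda", "Throttles"),
    "T003-Lambda-High-Invocations": ("AWS/Lambda", "Invocations"),
    "T004-Lambda-High-Concurrent-Executions": ("AWS/Lambda", "ConcurrentExecutions"),
    "T005-Lambda-High-Duration": ("AWS/Lambda", "Duration"),
    "T006-Lambda-High-Unreserved-Concurrent-Executions": (
        "AWS/Lambda",
        "UnreservedConcurrentExecutions",
    ),
    "T040-Aurora-High-CPU": ("AWS/RDS", "CPUUtilization"),
    "T042-Aurora-Free-Local-Storage": ("AWS/RDS", "FreeLocalStorage"),
    "T043-Aurora-Database-Connections": ("AWS/RDS", "DatabaseConnections"),
    "T044-Aurora-Commit-Latency": ("AWS/RDS", "CommitLatency"),
    "T045-Aurora-Insert-Latency": ("AWS/RDS", "InsertLatency"),
    "T046-Aurora-Update-Latency": ("AWS/RDS", "UpdateLatency"),
}


def build_alarm_request(
    component: str,
    severity: str,
    alarm_type: str,
    dimensions: List[Dict[str, str]],
    threshold: float,
    statistic: str = "Sum",
    comparison: str = "GreaterThanThreshold",
) -> dict:
    """Return keyword arguments shaped for the CloudWatch put_metric_alarm call."""
    namespace, metric = ALARM_TYPES[alarm_type]
    return {
        "AlarmName": f"{component}_{severity}_{alarm_type}",
        "Namespace": namespace,
        "MetricName": metric,
        "Dimensions": dimensions,
        "Statistic": statistic,
        "Period": 60,            # placeholder: seconds per datapoint
        "EvaluationPeriods": 1,  # placeholder
        "Threshold": threshold,
        "ComparisonOperator": comparison,
    }


function_name = "productA_sit_Lambda_Integration_C149"
request = build_alarm_request(
    function_name,
    "Critical",
    "T002-Lambda-Throttling-Alarm",
    dimensions=[{"Name": "FunctionName", "Value": function_name}],
    threshold=0.0,
)
print(request["AlarmName"])
```

The resulting dictionary could then be passed to boto3 as `boto3.client("cloudwatch").put_metric_alarm(**request)`. That call is deliberately not made here, so the sketch runs without AWS credentials.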