The Design for Automation Principle in Cloud-Native Applications

Automation has always been good practice for software systems, but the cloud makes it easier than ever to automate the infrastructure and the components that run on it. Although the initial investment is often higher, prefer an automated solution: it will almost always pay off in the medium term, not only in terms of effort but also in terms of the resilience and performance of your system. Automated processes can repair, scale, and deploy your system much faster than people can. As we'll see later, cloud architecture is not a one-time effort, and automation is no exception: as you find new ways your system should act, you find new things to automate.

Some common areas for automating cloud-native applications:

Automate the creation of infrastructure, as well as updates to it, using tools like Terraform or Google Cloud Deployment Manager.
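
To make the idea of automated infrastructure creation and updates concrete, here is a minimal sketch of the declarative approach such tools are generally built around: you describe the desired state, and the tool works out what to create, update, or delete. The names `Plan` and `plan_changes` and the resource dictionaries are hypothetical, chosen for illustration; this is not how Terraform or Deployment Manager is actually implemented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Plan:
    """Changes needed to move the actual state to the desired state."""
    create: dict
    update: dict
    delete: list


def plan_changes(desired: dict, actual: dict) -> Plan:
    """Compare two {resource_name: spec} mappings and return the difference.

    Hypothetical helper: real tools also handle dependencies between
    resources, which this sketch ignores.
    """
    # Resources that are declared but do not exist yet.
    create = {name: spec for name, spec in desired.items() if name not in actual}
    # Resources that exist but whose spec has drifted from the declaration.
    update = {
        name: spec
        for name, spec in desired.items()
        if name in actual and actual[name] != spec
    }
    # Resources that exist but are no longer declared.
    delete = sorted(name for name in actual if name not in desired)
    return Plan(create=create, update=update, delete=delete)


if __name__ == "__main__":
    desired = {"vm-a": {"size": "small"}, "vm-b": {"size": "large"}}
    actual = {"vm-b": {"size": "small"}, "vm-c": {"size": "small"}}
    print(plan_changes(desired, actual))
```

Because the plan is computed from a description rather than typed by hand, the same description can be applied repeatedly and reviewed like any other code change.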


Automate the build, testing, and deployment of the packages that make up the system using tools like Google Cloud Build, Jenkins, and Spinnaker. Beyond automating deployment itself, you should also strive to automate processes like canary testing and rollback.
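
One way to picture automated canary testing and rollback is as a decision function that a pipeline calls after sending a small share of traffic to the new version. The sketch below is an assumption-laden illustration: the function name `canary_decision`, the thresholds, and the use of error rate as the only signal are all hypothetical and are not taken from Cloud Build, Jenkins, or Spinnaker.

```python
def canary_decision(
    baseline_errors: int,
    baseline_requests: int,
    canary_errors: int,
    canary_requests: int,
    max_ratio: float = 1.5,
    min_requests: int = 100,
    min_allowed_rate: float = 0.001,
) -> str:
    """Return "wait", "promote", or "rollback" for a canary release.

    Hypothetical policy: the canary may have at most `max_ratio` times the
    baseline error rate, with a small floor so that a perfect baseline does
    not make every single canary error fatal.
    """
    # Not enough canary traffic yet to judge the new version.
    if canary_requests < min_requests:
        return "wait"

    baseline_rate = baseline_errors / baseline_requests if baseline_requests else 0.0
    canary_rate = canary_errors / canary_requests
    allowed_rate = max(baseline_rate * max_ratio, min_allowed_rate)

    return "promote" if canary_rate <= allowed_rate else "rollback"


if __name__ == "__main__":
    # Baseline error rate 1%, canary error rate 5%: the pipeline rolls back.
    print(canary_decision(10, 1000, 10, 200))
```

In a real pipeline the "rollback" result would trigger redeployment of the previous version without waiting for a person to notice the regression.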

Scale Up/Down

Unless the load on your system almost never changes, you should automate scaling up in response to increasing load and scaling down in response to prolonged drops in load. By scaling up you ensure that your service remains available, and by scaling down you reduce costs. This makes sense for large-scale applications, such as public websites, but also for smaller, unevenly loaded applications, such as internal applications that are very busy at times but barely used at others. For applications that sometimes receive almost no traffic, and for which you can tolerate some initial latency, you should also consider scaling to zero (removing all running instances and starting the application again when it is next needed).
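
The scaling behaviour described above can be sketched as a small calculation: derive the instance count from the current load, scale up immediately, and scale down only after the load has stayed low for several consecutive observations. The names `desired_instances` and `Autoscaler`, the per-instance target, and the window size are hypothetical; managed autoscalers use their own, more elaborate policies.

```python
import math
from collections import deque


def desired_instances(
    total_load: float,
    target_per_instance: float,
    min_instances: int = 0,
    max_instances: int = 10,
) -> int:
    """Instances needed so that each one handles at most the target load."""
    if target_per_instance <= 0:
        raise ValueError("target_per_instance must be positive")
    needed = math.ceil(total_load / target_per_instance)
    # With min_instances=0 an idle service scales to zero.
    return max(min_instances, min(max_instances, needed))


class Autoscaler:
    """Scales up at once, scales down only after a prolonged drop in load."""

    def __init__(self, target_per_instance: float, min_instances: int = 0,
                 max_instances: int = 10, scale_down_window: int = 3):
        self.target_per_instance = target_per_instance
        self.min_instances = min_instances
        self.max_instances = max_instances
        # Keep only the most recent recommendations.
        self._recent = deque(maxlen=scale_down_window)

    def observe(self, total_load: float) -> int:
        """Record one load sample and return the instance count to run."""
        self._recent.append(desired_instances(
            total_load, self.target_per_instance,
            self.min_instances, self.max_instances))
        # The highest recent recommendation wins, so a brief dip in load
        # does not remove capacity that may be needed again a moment later.
        return max(self._recent)


if __name__ == "__main__":
    scaler = Autoscaler(target_per_instance=50)
    for load in (200, 0, 0, 0):
        print(load, scaler.observe(load))
```

Setting `min_instances` to 1 instead of 0 trades a small standing cost for the absence of cold-start latency, which is the tolerance question raised above.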

Monitoring & automated recovery

Build monitoring and logging into your cloud-native systems from the start. Logging and monitoring data streams can of course be used to monitor system health, but they have many uses beyond that. First, they can provide valuable information about system usage and user behavior (how many people are using the system, which parts they are using, what their average latency is, etc.). Second, they can be used in aggregate to provide a measure of overall system health (e.g., a disk is nearly full again, but is it filling up faster than usual? How does disk usage relate to service usage? etc.). Finally, they are a great place to plug in automation. Now, when the disk is full, instead of just logging an error, you can also automatically resize the disk to keep the system running.
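
The disk example can be sketched as two small functions: one that turns monitoring samples into a fill rate, and one that reacts to a nearly full disk by resizing it instead of only logging an error. Everything here is a hypothetical illustration: `fill_rate_gb_per_hour`, `remediate_disk`, the 90% threshold, and the `resize` callback (which stands in for whatever call your cloud provider offers) are assumptions, not a real API.

```python
import math
from typing import Callable, Sequence, Tuple


def fill_rate_gb_per_hour(samples: Sequence[Tuple[float, float]]) -> float:
    """Average growth between the first and last (hour, used_gb) sample."""
    (t0, used0), (t1, used1) = samples[0], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must span a positive time interval")
    return (used1 - used0) / (t1 - t0)


def remediate_disk(
    used_gb: float,
    size_gb: int,
    resize: Callable[[int], None],
    threshold: float = 0.9,
    growth_factor: float = 1.5,
) -> int:
    """Grow the disk when usage crosses the threshold; return the new size.

    `resize` is a hypothetical callback that performs the actual resize.
    """
    if used_gb / size_gb < threshold:
        return size_gb  # Healthy: nothing to do.
    new_size = math.ceil(size_gb * growth_factor)
    resize(new_size)
    return new_size


if __name__ == "__main__":
    requested = []
    # 95 GB used of 100 GB: the disk is resized rather than left to fill up.
    print(remediate_disk(95, 100, requested.append), requested)
    # Comparing this rate with the usual one shows whether the disk is
    # filling faster than normal.
    print(fill_rate_gb_per_hour([(0, 10.0), (2, 14.0)]))
```

Passing the resize action in as a callback keeps the decision logic testable without touching real infrastructure.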