Scaling Amazon ECS Services for Efficient SQS Queue Handling

Ruween Iddagoda
4 min read · Feb 4, 2024

This article is a continuation of the original post by Ahmed Azzam and Sebastian Lee, “Amazon Elastic Container Service (ECS) Auto Scaling using custom metrics.” It includes practical code snippets and configuration to help anyone looking to implement a similar solution.

Introduction

This post details our approach to scaling an Amazon Elastic Container Service (ECS) service dynamically based on the load changes in an Amazon Simple Queue Service (SQS) queue. Recent challenges in efficiently managing a growing queue with our event-handler microservices prompted us to devise a custom scaling solution. This is particularly crucial, as sustaining a high number of tasks consistently can lead to substantial costs.

Solution

While the SQS CloudWatch metric “ApproximateNumberOfMessagesVisible” provides a basic scaling signal, it does not account for the ECS service’s current capacity: the number of messages in the queue alone is not enough to determine how many ECS tasks are required. Our solution therefore incorporates two additional factors, the time it takes to process a message and the acceptable latency.

Backlog Per Task Custom Metric

We introduce a custom metric named “Backlog per Task” that factors in both the message processing time and the acceptable latency to determine the optimal workload per task. The target value is simply the acceptable latency divided by the average processing time. For instance, with a queue of 1,500 messages, a service running 10 tasks, an average processing time of 0.1 seconds per message, and an acceptable latency of 10 seconds, the backlog-per-task target would be 100 messages. Since the current backlog per task is 150 (1,500 messages across 10 tasks), configuring the target tracking policy to 100 causes the ECS service to scale out by an additional 5 tasks to balance the workload.
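
As a quick sanity check, the numbers above can be reproduced with a few lines of arithmetic (a minimal sketch, not part of the actual Lambda code):

import math

average_processing_time = 0.1   # seconds per message
acceptable_latency = 10         # seconds a message may wait
messages_in_queue = 1500
running_tasks = 10

# Target: how many messages one task can work off within the acceptable latency
target_backlog_per_task = acceptable_latency / average_processing_time   # 100.0

# Current backlog per task, as the Lambda would report it
current_backlog_per_task = messages_in_queue / running_tasks              # 150.0

# Tasks needed to bring the backlog per task down to the target
desired_tasks = math.ceil(messages_in_queue / target_backlog_per_task)    # 15
additional_tasks = desired_tasks - running_tasks                          # 5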

A more pragmatic approach is to treat the acceptable latency as half of the message retention period, a duration configured in the SQS queue attributes. For instance, if the message retention period is set to 1 day, the acceptable latency would be 12 hours (43,200 seconds). Assuming a message takes 6.5 seconds to process on average, the acceptable backlog per task comes to roughly 6,600 messages. Setting the acceptable latency to half the retention period helps ensure there are always enough tasks to handle fluctuating workloads, even if the queue experiences a surge in messages. The average processing time is the parameter we tune to reach the desired messages-per-task count, which gives flexibility in managing varying workloads.
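
The TargetValue of 6600 used in the policy file later in this post follows directly from this calculation; reproducing it (a minimal sketch, with the retention period taken from the example above):

message_retention_period = 24 * 60 * 60            # 1 day, in seconds (SQS queue attribute)
acceptable_latency = message_retention_period / 2  # 43200 seconds
average_processing_time = 6.5                      # seconds per message (the tuning knob)

acceptable_backlog_per_task = acceptable_latency / average_processing_time
print(acceptable_backlog_per_task)                 # ~6646, rounded down to 6600 as the target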

When adjusting the target tracking policy, simply determine how many messages a single task should be responsible for and set that as the target value. This approach provides a dynamic and cost-effective scaling solution for ECS services, addressing the challenges posed by changing SQS queue loads.

Diagram

Components

Pre-requisites

To implement this solution, ensure the following:

  • An SQS queue and a corresponding consumer microservice are in place.
  • Infrastructure autoscaling is configured for the ECS cluster to automatically adjust the number of instances based on task count changes.

A Lambda function polls the SQS queue’s ApproximateNumberOfMessages attribute every 5 minutes, calculates the current backlog_per_task, and publishes it as a custom CloudWatch metric.
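
The post does not show how the 5-minute schedule itself is created; one common option is an Amazon EventBridge rule that invokes the Lambda on a rate expression. The sketch below assumes the function is already deployed and uses hypothetical names and ARNs (backlog-per-task-schedule, the function ARN) purely for illustration:

import boto3

events = boto3.client('events')
lambda_client = boto3.client('lambda')

# Hypothetical names/ARNs -- replace with the real function and account details
rule_name = 'backlog-per-task-schedule'
function_arn = 'arn:aws:lambda:us-east-1:123456789012:function:backlog-per-task'

# Trigger the metric-publishing Lambda every 5 minutes
rule = events.put_rule(Name=rule_name, ScheduleExpression='rate(5 minutes)')

# Allow EventBridge to invoke the function, then register it as the rule's target
lambda_client.add_permission(
    FunctionName=function_arn,
    StatementId='allow-eventbridge-schedule',
    Action='lambda:InvokeFunction',
    Principal='events.amazonaws.com',
    SourceArn=rule['RuleArn'],
)
events.put_targets(Rule=rule_name, Targets=[{'Id': 'backlog-lambda', 'Arn': function_arn}])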

Lambda Function

The custom Lambda function reads the key inputs (the number of messages in the SQS queue and the number of running tasks in the ECS service), calculates the current backlog per task, and publishes it to CloudWatch. Running it on the 5-minute schedule keeps the metric recent enough for scaling decisions.

import os

cluster_name = os.environ.get('ECS_CLUSTER_NAME')
service_name = os.environ.get('ECS_SERVICE_NAME')

# get_queue_attributes, get_number_of_active_task_in_service and put_metric_data
# are thin wrappers around boto3 calls -- a sketch follows this snippet.
approximate_number_of_messages = float(get_queue_attributes(os.environ.get('QUEUE_URL'))['ApproximateNumberOfMessages'])
number_of_active_task_in_service = get_number_of_active_task_in_service(cluster_name, service_name)
backlog_per_task = approximate_number_of_messages / number_of_active_task_in_service
metric_data = put_metric_data(backlog_per_task, cluster_name, service_name)
print({
    'number_of_active_task_in_service': number_of_active_task_in_service,
    'approximate_number_of_messages': approximate_number_of_messages,
    'backlog_per_task': backlog_per_task
})
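
The helper functions themselves are not shown in the original snippet; below is one plausible way to implement them with boto3, so treat the exact signatures as assumptions rather than the author’s code:

import boto3

sqs = boto3.client('sqs')
ecs = boto3.client('ecs')
cloudwatch = boto3.client('cloudwatch')

def get_queue_attributes(queue_url):
    # Returns the queue attributes, including ApproximateNumberOfMessages
    response = sqs.get_queue_attributes(
        QueueUrl=queue_url,
        AttributeNames=['ApproximateNumberOfMessages']
    )
    return response['Attributes']

def get_number_of_active_task_in_service(cluster_name, service_name):
    # Running task count for the ECS service
    response = ecs.describe_services(cluster=cluster_name, services=[service_name])
    return response['services'][0]['runningCount']

def put_metric_data(backlog_per_task, cluster_name, service_name):
    # Publish the custom metric with the same namespace/dimensions used by the policy
    return cloudwatch.put_metric_data(
        Namespace='CustomDemo/ECS/SQS',
        MetricData=[{
            'MetricName': 'backlogPerTask',
            'Dimensions': [
                {'Name': 'ECSClusterName', 'Value': cluster_name},
                {'Name': 'ECSServiceName', 'Value': service_name},
            ],
            'Value': backlog_per_task,
        }]
    )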

Target Tracking Policy

In the microservice’s target tracking policy, we set the acceptable_backlog_per_task value as the target, determined by dividing the longest acceptable latency by the average processing time of a message. Application Auto Scaling creates CloudWatch alarms around this target, triggering ECS to scale in or out to keep backlog_per_task close to the desired value.

To implement this, the AWS CLI is used, since the ECS console does not support creating target tracking policies on custom metrics. The following steps are involved:

  1. Create a target_tracking_policy.json file for the custom backlog_per_task metric.
{
  "TargetValue": 6600,
  "ScaleOutCooldown": 15,
  "ScaleInCooldown": 15,
  "CustomizedMetricSpecification": {
    "MetricName": "backlogPerTask",
    "Namespace": "CustomDemo/ECS/SQS",
    "Dimensions": [
      {
        "Name": "ECSClusterName",
        "Value": "event-demo-dev-ecs-cluster"
      },
      {
        "Name": "ECSServiceName",
        "Value": "dev-event-handler-dummy-listener-ecs-service"
      }
    ],
    "Statistic": "Average"
  }
}

2. Run the following AWS CLI command to register the microservice as a scalable target.

CLUSTER_NAME=event-demo-dev-ecs-cluster
SERVICE_NAME=dev-event-handler-dummy-listener-ecs-service
aws application-autoscaling register-scalable-target \
--service-namespace ecs \
--scalable-dimension ecs:service:DesiredCount \
--resource-id service/$CLUSTER_NAME/$SERVICE_NAME \
--min-capacity 1 \
--max-capacity 10

3. Run the following AWS CLI command to attach the custom target tracking policy.

CLUSTER_NAME=event-demo-dev-ecs-cluster
SERVICE_NAME=dev-event-handler-dummy-listener-ecs-service
POLICY_NAME=sqs-queue-tracking-policy
aws application-autoscaling put-scaling-policy \
--policy-name $POLICY_NAME \
--service-namespace ecs \
--resource-id service/$CLUSTER_NAME/$SERVICE_NAME \
--scalable-dimension ecs:service:DesiredCount \
--policy-type TargetTrackingScaling \
--target-tracking-scaling-policy-configuration file://target_tracking_policy.json

Alternatively, the Lambda function can scale the task count on its own by calculating the acceptable_backlog_per_task value itself. This approach removes the need for a target tracking policy and its alarms, with the Lambda setting the service’s desired task count directly.

import math
import os

async def scale_ecs_task(approximate_number_of_messages):
    average_processing_time = float(os.environ.get('AVERAGE_PROCESSING_TIME'))
    acceptable_latency = float(os.environ.get('ACCEPTABLE_LATENCY'))

    # Same formula as the target tracking policy: latency budget / processing time
    acceptable_backlog_per_task = acceptable_latency / average_processing_time

    # The desired count must be a whole number of tasks; keep at least one running
    no_of_task_desired = max(1, math.ceil(approximate_number_of_messages / acceptable_backlog_per_task))

    return await scale_no_of_tasks_in_service(
        os.environ.get('ECS_CLUSTER_NAME'),
        os.environ.get('ECS_SERVICE_NAME'),
        no_of_task_desired
    )
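
The scale_no_of_tasks_in_service helper is not shown above; a minimal sketch using boto3 could look like the following (the async signature only exists to match the caller, since boto3 itself is synchronous):

import boto3

ecs = boto3.client('ecs')

async def scale_no_of_tasks_in_service(cluster_name, service_name, desired_count):
    # Set the service's desired task count directly
    response = ecs.update_service(
        cluster=cluster_name,
        service=service_name,
        desiredCount=desired_count
    )
    return response['service']['desiredCount']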
