FIFO-SQS lambda triggering failure handling

Our system uses FIFO SQS queues to drive lambdas. Here's the relevant part of our SAM template:

  EventParserTriggeringQueue:
    Type: AWS::SQS::Queue
    Properties:
      MessageRetentionPeriod: 1209600  # 14 Days (max)
      FifoQueue: true
      ContentBasedDeduplication: true
      VisibilityTimeout: 240  # Must be > EventParser Timeout
      Tags:
        - Key: "datadog"
          Value: "true"
      RedrivePolicy:
        deadLetterTargetArn: !GetAtt EventParserDeadLetters.Arn
        maxReceiveCount: 1
  EventParser:
    Type: AWS::Serverless::Function
    Properties:
      CodeUri: lambdas/event_parser_lambda/
      Handler: event_parser.lambda_handler
      Timeout: 120
      Events:
        EventParserTriggeringQueueEvent:
          Type: SQS
          Properties:
            Queue: !GetAtt EventParserTriggeringQueue.Arn
            BatchSize: 1
            ScalingConfig:
              MaximumConcurrency: 2
      Policies:
        Statement:
          - Action:
              - ssm:GetParametersByPath
              - ssm:GetParameters
              - ssm:GetParameter
            Effect: Allow
            Resource:
              - Fn::Sub: "arn:aws:ssm:${AWS::Region}:${AWS::AccountId}:parameter/datadog/api_key"
              - Fn::Sub: "arn:aws:ssm:${AWS::Region}:${AWS::AccountId}:parameter/sentry/dsn"
              - Fn::Sub: "arn:aws:ssm:${AWS::Region}:${AWS::AccountId}:parameter/${AWS::StackName}/*"
          - Action:
              - sqs:DeleteMessage
              - sqs:GetQueueAttributes
              - sqs:ReceiveMessage
            Effect: Allow
            Resource: !GetAtt EventParserTriggeringQueue.Arn
  EventParserDeadLetters:
    Type: AWS::SQS::Queue
    Properties:
      MessageRetentionPeriod: 1209600  # 14 Days (max)
      FifoQueue: true
      ContentBasedDeduplication: true
      Tags:
        - Key: "datadog"
          Value: "true"
        - Key: "deadletter"
          Value: "true"

What I'm looking for is retry behavior that looks like:

If a lambda fails, its message is retried immediately.
If a message fails more than the maximum allowed number of times, it goes on a dead-letter queue immediately and the next message can be tried immediately.

Instead, the behavior we're seeing is:

If a lambda fails, it is retried only after the visibility timeout period. This period is necessarily longer than the lambda's typical runtime, so a lot of delay is imposed here.
If a lambda fails more than the maximum allowed failure count, the message only goes on a dead-letter queue after the visibility timeout period.

First, let me check my understanding of how the system works, because it's not really documented in any one place:

For an SQS-driven lambda, the lambda runtime calls ReceiveMessage on the SQS queue periodically. From our system, it looks like the default is once every 10 seconds.
If there's a message available, the queue returns it.
When the queue returns a message, it starts the clock on the visibility timeout.
- Until the visibility timeout has elapsed, ReceiveMessage calls to the queue (for the same message group ID) come back empty. (This is a FIFO SQS feature. For non-FIFO queues, only the received messages are hidden.)
- When the visibility timeout has elapsed, if the head message has been received at least maxReceiveCount times, the queue gives up on the message, optionally placing it on a dead-letter queue.
The lambda runtime passes the message along to the lambda function.
If the function succeeds, the runtime calls DeleteMessage on the queue. This removes the head message, and also makes the next message available (i.e. it clears the visibility timeout).
If the function fails, the runtime carries on as though nothing has happened:
- It polls the queue periodically, meaning it gets empty responses to ReceiveMessage until the visibility timeout has elapsed.
- Once the visibility timeout has passed, the queue returns the same message again. Or, if the message has already been received its "max receive count" number of times, the queue returns the next message.
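
As a rough check on that model, the time a permanently failing message blocks its message group can be estimated from the queue settings. This is only a sketch of the understanding described above, not of documented SQS behavior; the function name is made up for illustration, and polling gaps and the function's own runtime are ignored.

```python
def group_blocked_seconds(visibility_timeout_s: int, max_receive_count: int) -> int:
    """Approximate time a permanently failing message blocks its message group.

    Assumes the model described above: every receive hides the group for one
    full visibility timeout, and the message is only given up on once it has
    been received max_receive_count times.
    """
    if visibility_timeout_s < 0 or max_receive_count < 1:
        raise ValueError("expected a non-negative timeout and a count of at least 1")
    return visibility_timeout_s * max_receive_count


# Values from the template: VisibilityTimeout 240, maxReceiveCount 1.
print(group_blocked_seconds(240, 1))  # 240
# A hypothetical five-attempt policy on the same queue.
print(group_blocked_seconds(240, 5))  # 1200
```

If the model is right, even a single allowed attempt costs four minutes before the next message in the group can be tried, which matches the delay we are seeing.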

One solution I have considered:

Basically, put the lambda in charge:

Put the retry logic in a loop inside the lambda.
If the lambda gets through its loop without a success, have it explicitly enqueue the message to an SQS queue that we'll use for dead letters. This queue wouldn't be configured as a DLQ; we'd just use it as one.
The lambda always returns successfully, so the lambda runtime always deletes the message from the FIFO input queue.
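
A minimal sketch of that idea, to make the question concrete. The helper names (`handle_record`, `send_with_boto3`, `parse_event`) and the `DEAD_LETTER_QUEUE_URL` environment variable are hypothetical, and the sender assumes the hand-managed dead-letter queue is FIFO with content-based deduplication, like the queues in the template.

```python
import os
from typing import Any, Callable, Dict

Record = Dict[str, Any]


def handle_record(
    record: Record,
    process: Callable[[Record], None],
    send_to_dead_letters: Callable[[Record], None],
    max_attempts: int = 5,
) -> bool:
    """Try `process` up to max_attempts times; park the record on final failure.

    Returns True if processing succeeded, False if the record was parked.
    """
    for _attempt in range(max_attempts):
        try:
            process(record)
            return True
        except Exception:
            # Swallow the error and retry immediately. Real code would log it.
            continue
    send_to_dead_letters(record)
    return False


def send_with_boto3(record: Record) -> None:
    """Forward the raw body to the hand-managed dead-letter queue."""
    import boto3  # imported lazily so the retry logic can be tested without AWS

    boto3.client("sqs").send_message(
        QueueUrl=os.environ["DEAD_LETTER_QUEUE_URL"],  # hypothetical variable
        MessageBody=record["body"],
        # Required when the target queue is FIFO; assumes content-based dedup.
        MessageGroupId=record["attributes"]["MessageGroupId"],
    )


def parse_event(record: Record) -> None:
    """Placeholder for the real parsing work."""
    raise NotImplementedError


def lambda_handler(event: Dict[str, Any], context: Any) -> None:
    for record in event["Records"]:  # BatchSize is 1 in the template
        handle_record(record, parse_event, send_with_boto3)
    # Returning normally signals success, so the input message gets deleted.
```

With this shape, the function's policy would also need `sqs:SendMessage` on the dead-letter queue. If the send itself raises, the invocation fails and the message falls back to the queue's normal redrive handling.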

Is this the best I can do?

One serious issue with this approach: lambda functions can't run longer than 15 minutes, and I worry that retrying 5 times could push us past that limit.
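
One way to bound that risk might be to check the remaining invocation time before each attempt. The sketch below is an assumption about how that could look, not something we run today: `retry_within_budget` and `reserve_ms` are made-up names, and `remaining_ms` is meant to be the Lambda context's `get_remaining_time_in_millis` method.

```python
import time
from typing import Callable


def retry_within_budget(
    attempt: Callable[[], None],
    remaining_ms: Callable[[], int],
    max_attempts: int = 5,
    reserve_ms: int = 10_000,
) -> bool:
    """Retry `attempt` until it succeeds, attempts run out, or time runs short.

    A new attempt only starts if the time left covers the slowest failed
    attempt seen so far plus reserve_ms, which is kept back for dead-lettering.
    Returns True on success and False otherwise.
    """
    slowest_ms = 0.0
    for _ in range(max_attempts):
        if remaining_ms() < reserve_ms + slowest_ms:
            return False
        started = time.monotonic()
        try:
            attempt()
            return True
        except Exception:
            elapsed_ms = (time.monotonic() - started) * 1000
            slowest_ms = max(slowest_ms, elapsed_ms)
    return False
```

Inside the handler this would be called as `retry_within_budget(lambda: parse_event(record), context.get_remaining_time_in_millis)`, where `parse_event` stands in for the real work. Note that the template sets `Timeout: 120`, so as configured the budget is two minutes rather than fifteen.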

