AWS Elasticsearch/SQS 403 Forbidden

Fixing 403 Forbidden error when writing to AWS Elasticsearch and SQS

  • 403 Forbidden
  • AWS
  • Elasticsearch
  • RequestTimeTooSkewed
  • SQS
  • Simple Queue Service
  • fluentd

Peter Marriott

Back in 2011 we were developing an application for low-powered edge devices that sent messages using AWS SQS. During development this all worked fine but, as with most things, once we got to real-world testing we started to hit problems.

When sending a message to SQS we would get 403 Forbidden. A quick search for SQS and 403 wasn’t very helpful: there was no mention of it in the Common Errors section of the Amazon SQS API reference, for example.

When we widened our search we did find 403 Forbidden in the Amazon S3 API error codes.

There are several possible causes of 403 errors, the most likely being RequestTimeTooSkewed. That is, the difference between the timestamp in the message (which is verified by the message signature) and the server’s time is too large.

The full S3 Error detail:

Message             Value
Status Code         403
AWS Service         Amazon S3
AWS Request ID      XXXXXX
AWS Error Code      RequestTimeTooSkewed
AWS Error Message   The difference between the request time and the current time is too large
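The skew check behind this error can be approximated from the client side. HTTP responses normally carry a Date header, so comparing it with the local clock gives a rough idea of how far the two have drifted apart. The following is a minimal sketch in Python, not anything AWS provides: the function names are hypothetical, and the 10-minute limit is simply the figure we observed, so the tolerance a given service really enforces may differ.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

# Assumed tolerance, based on the 10-minute figure observed in this post.
# The limit a given AWS service actually enforces may be different.
MAX_SKEW_SECONDS = 10 * 60


def clock_skew_seconds(date_header, local_now=None):
    """Return local time minus server time, in seconds.

    date_header is the value of an HTTP Date response header,
    for example "Tue, 15 Nov 2016 12:45:26 GMT".
    """
    server_time = parsedate_to_datetime(date_header)
    if local_now is None:
        local_now = datetime.now(timezone.utc)
    return (local_now - server_time).total_seconds()


def is_too_skewed(date_header, local_now=None, limit=MAX_SKEW_SECONDS):
    """True when the clocks differ by more than limit, in either direction."""
    return abs(clock_skew_seconds(date_header, local_now)) > limit
```

Feeding it the Date header from any response of the failing service, including the 403 itself, is a quick way to confirm or rule out clock drift before digging into credentials and policies.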

These low-powered edge devices had inaccurate clocks that drifted. Once a clock had drifted by more than 10 minutes, the device would start getting the 403 Forbidden error. The simple fix was to correct the time. This was quite painful as these devices did not have anything like NTP, but we managed to get them to correct their time periodically by coding an SNTP client.
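For anyone who needs to do the same, the core of an SNTP client is small. The sketch below is in Python and is not the code we ran on the devices. It sends a 48-byte request over UDP port 123 and reads the server's transmit timestamp from the reply. The function names and the default server pool.ntp.org are assumptions for illustration, and actually setting the system clock, which is platform-specific and needs privileges, is left out.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_UNIX_DELTA = 2208988800


def build_sntp_request():
    """Build a 48-byte client request: LI=0, version=3, mode=3 (client)."""
    return bytes([0x1B]) + bytes(47)


def parse_transmit_time(packet):
    """Extract the server's transmit timestamp as Unix seconds (float)."""
    if len(packet) < 48:
        raise ValueError("SNTP reply too short")
    # The transmit timestamp is bytes 40-47: 32-bit seconds, 32-bit fraction.
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_UNIX_DELTA + fraction / 2**32


def query_offset(server="pool.ntp.org", timeout=5.0):
    """Estimate server time minus local time, in seconds.

    The default server name is an assumption; use whichever server
    the device is able to reach.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sent_at = time.time()
        sock.sendto(build_sntp_request(), (server, 123))
        packet, _ = sock.recvfrom(512)
        received_at = time.time()
    # Assume the reply was stamped halfway through the round trip.
    return parse_transmit_time(packet) - (sent_at + received_at) / 2
```

A device would call query_offset() periodically and adjust its clock by the result. Using only the transmit timestamp keeps the sketch short; it is accurate to within the network round trip, which is far tighter than a tolerance measured in minutes requires.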

403 Forbidden on Elasticsearch

Fast forward to 2016 and I was in a team that was using AWS-hosted Elasticsearch. The logging server, which used fluentd, just stopped logging and started throwing 403 Forbidden errors. The guys looking at this found nothing when searching for ‘403 Forbidden and Elasticsearch’. Overhearing their conversation, I heard them mention ‘AWS’ and ‘403’ in the same sentence. Remembering our experience of five years before, I asked them to check the time - and yes, the time on the logging server had drifted.

The investigation showed that the logging server did have NTP on it. However, a misconfiguration meant that its clock had still drifted by more than 10 minutes. When the time was reset, logging resumed.

I hope that this helps someone puzzling over the same problem.