AWS Elasticsearch and SQS Return 403 Forbidden Error
Fixing 403 Forbidden errors when writing to AWS Elasticsearch and SQS
Peter Marriott
Back in 2011 we were developing an application for low-powered edge devices that sent messages using AWS SQS. During development this all worked fine but, as with most things, once we got to real-world testing we started to hit problems.
When sending a message to SQS we would get 403 Forbidden. A quick search for SQS and 403 wasn't very helpful. There was no mention of it in the Amazon SQS API reference Common errors, for example.
When we widened our search we did find 403 Forbidden in the Amazon S3 API error codes.
There are several possible causes of 403 errors, the most likely being RequestTimeTooSkewed. That is, the difference between the timestamp in the message (which is verified by the message signature) and the server's time is too large.
The full S3 Error detail:
Message | Value |
---|---|
Status Code | 403 |
AWS Service | Amazon S3 |
AWS Request ID | XXXXXX |
AWS Error Code | RequestTimeTooSkewed |
AWS Error Message | The difference between the request time and the current time is too large |
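
A quick way to test for this kind of skew is to compare the local clock with the Date header that HTTP servers include in their responses, error responses included. The following is a minimal Python sketch of that comparison, not code from either project described here. The function names are hypothetical, and the 10-minute limit is an assumption taken from the drift we observed; the tolerance AWS actually applies may differ between services.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime
from typing import Optional

# Assumed tolerance, based on the drift reported in this post.
# The real limit is set by the AWS service and may differ.
MAX_SKEW = timedelta(minutes=10)


def clock_skew(date_header: str, local_now: Optional[datetime] = None) -> timedelta:
    """Return local time minus the server time from an HTTP Date header."""
    server_time = parsedate_to_datetime(date_header)  # timezone-aware for "GMT" dates
    if local_now is None:
        local_now = datetime.now(timezone.utc)
    return local_now - server_time


def is_too_skewed(skew: timedelta, limit: timedelta = MAX_SKEW) -> bool:
    """True if the clock is off by more than the limit, in either direction."""
    return abs(skew) > limit


if __name__ == "__main__":
    # Example with a fixed header value and a local clock running 11 minutes fast.
    header = "Tue, 15 Nov 2016 12:00:00 GMT"
    local = datetime(2016, 11, 15, 12, 11, 0, tzinfo=timezone.utc)
    skew = clock_skew(header, local)
    print(f"skew: {skew}, too skewed: {is_too_skewed(skew)}")
```

In practice the header value would come from a real response, for example the headers attached to the 403 itself, so the check can run on the same machine that is seeing the errors.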
These low-powered edge devices had inaccurate clocks that drifted. Once a clock had drifted by more than 10 minutes, the device would start getting the 403 Forbidden error. The simple fix was to correct the time. This was quite painful as these devices did not have anything like NTP, but we managed to get them to correct their time periodically by coding an SNTP client.
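For anyone needing to do something similar, an SNTP exchange is small: send a 48-byte UDP request to port 123 and read the server's transmit timestamp from the reply. The sketch below is in Python for illustration only and is not the client we wrote for the devices. The function names and the pool.ntp.org default are assumptions, and it ignores details a production client would handle, such as validating the reply's mode and stratum fields.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_OFFSET = 2208988800


def build_request() -> bytes:
    """Build a 48-byte client request: LI=0, version=3, mode=3 (client)."""
    return b"\x1b" + 47 * b"\x00"


def parse_transmit_time(packet: bytes) -> float:
    """Extract the server's transmit timestamp (bytes 40-47) as Unix time."""
    if len(packet) < 48:
        raise ValueError("SNTP reply is shorter than 48 bytes")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_OFFSET + fraction / 2**32


def query_offset(server: str = "pool.ntp.org", timeout: float = 5.0) -> float:
    """Return an estimate of (server time - local time) in seconds.

    The server name is an assumption; substitute one you are allowed to use.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sent = time.time()
        sock.sendto(build_request(), (server, 123))
        packet, _ = sock.recvfrom(512)
        received = time.time()
    # Assume the network delay is symmetric and compare the server's time
    # with the midpoint of the round trip.
    return parse_transmit_time(packet) - (sent + received) / 2


if __name__ == "__main__":
    print(f"clock offset: {query_offset():+.3f} s")
```

A device would call something like query_offset() periodically and adjust its clock by the result. How the clock is actually set depends on the platform, so that part is left out.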
403 Forbidden on Elasticsearch
Fast forward to 2016 and I was in a team that was using the AWS-hosted Elasticsearch. The logging server, which used fluentd, just stopped logging and started throwing 403 Forbidden errors. The guys looking at this found nothing when searching for '403 Forbidden and Elasticsearch'. Overhearing their conversation, I heard them mention 'AWS' and '403' in the same sentence. Remembering our experience of five years before, I asked them to check the time, and yes, the time on the logging server had drifted.
The investigation showed that the logging server did have NTP on it. However, a misconfiguration meant that its clock had drifted by more than 10 minutes. When the time was reset, logging resumed.
I hope that this helps someone puzzling over the same problem.