Fixing Amazon S3

    • Posted on 16. May 2011 at 07:04 UTC by Felix Geisendörfer

    Amazon S3 is great, except when it isn't. When you are using it a lot (we have stored ~1 million customer objects so far), you start to see some odd things that look more like "eventual madness" than "eventual consistency".

    One of the first things we learned is that you can't trust S3's error codes. Sometimes you will see a 403 Forbidden, but sending the exact same request again works. So in general you have to retry on any error you get, and only start taking an error seriously once it has occurred several times. We currently retry up to 6 times with increasing timeouts.
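    The retry policy described above can be sketched roughly as follows. This is a minimal illustration, not our production code: the name retry_with_backoff, the doubling of the delay, the 1 second starting delay, and reading "up to 6 times" as 6 attempts in total are all assumptions made for the example.

```python
import time


def retry_with_backoff(operation, max_attempts=6, base_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds or max_attempts calls have failed.

    Hypothetical helper: every exception is treated as retryable, and the
    delay doubles after each failure (1s, 2s, 4s, ...). Both choices are
    assumptions; the post only says "increasing timeouts".
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error = None
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as error:
            last_error = error
            # Only wait if another attempt is still going to happen.
            if attempt < max_attempts - 1:
                sleep(base_delay * 2 ** attempt)
    # The same kind of failure kept occurring, so now it is worth listening to.
    raise last_error
```

    Here operation would be whatever function issues the S3 request. The sleep parameter exists only so the waiting can be swapped out in tests.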

    The latest thing we ran into is that S3's confirmations for PUT requests are not to be trusted. The symptom we saw was that some of the S3 URLs we returned to our customers were simply not working. Reproducing this has been a huge challenge. At one point we were able to see the problem after ~25 PUTs; another time nothing happened after 2500 PUTs (10GB total). So there is still a small chance that the bug is somewhere in our own system, but at this point we are pretty convinced that this is an Amazon issue that occurs on and off.

    Enter crazy land. When dealing with an eventually consistent system, where "eventually" may sometimes mean "never", how do you verify the success of a write operation? Of course you can check whether the object exists in the bucket after each write, but what if it doesn't? When does this become an error condition? And even worse, when the object does exist, what does that mean? All that is safe to assume is that at least one client is able to see the object, not that it has finished replicating or is available to all clients.

    While there is no perfect answer, we have decided to verify the presence of a stored file with up to 11 checks over a period of 2 minutes, and if the file does not exist by then, to retry the job up to 6 times. While this is not ideal, it will probably make handing out an invalid S3 URL about as likely as a decent lottery win.
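    As a rough sketch of that verification loop: the function names, the evenly spaced checks (11 checks leave 10 gaps of 12 seconds), and counting "up to 6 times" as 6 runs of the job in total are assumptions for illustration. The put and exists arguments are hypothetical callables standing in for the actual S3 upload and existence check.

```python
import time


def wait_until_visible(exists, checks=11, period=120.0, sleep=time.sleep):
    """Poll exists() up to `checks` times, spread evenly over `period` seconds."""
    # Assumption: evenly spaced checks; the post does not say how they are spaced.
    interval = period / max(checks - 1, 1)
    for check in range(checks):
        if exists():
            return True
        # No point in waiting after the final check.
        if check < checks - 1:
            sleep(interval)
    return False


def store_and_verify(put, exists, max_jobs=6, sleep=time.sleep):
    """Run put(), then poll for the object; rerun the whole job if it never appears."""
    for _ in range(max_jobs):
        put()
        if wait_until_visible(exists, sleep=sleep):
            return True
    # Still missing after every job: treat this as a real error.
    return False
```

    A True result only means that the checking client saw the object at least once. As discussed above, it does not prove that every other client can see it too.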

    Another thing that struck us as interesting is that S3's consistency model is not the same in all regions. While "US Standard" employs the traditional "eventual consistency" model, all other regions provide "read-after-write" consistency. While incredibly confusing from a customer standpoint, the reason for this seems to be that "US Standard" spans from the east to the west coast, so the speed of light is getting in the way of providing the same guarantees.

    --fg

    PS: We are in the process of collecting better data on our S3 experience which we will share in the future. S3 is definitely working great for us almost every time, and if it turns out that we were using it wrong, we will follow up on that as well.
