World Line

Archive for December 2010

AWS Key Server

with 15 comments

It’s the last day of 2010, and as an AWS fan there’s a lot to reflect on from the last year. There’s been several new SDKs, a couple of new services, more than twice as many consoles as there were this time last year, and so many major improvements to existing services that if I linked them all, every word in this post would be blue. It’s been really exciting, and nobody could have predicted half of this stuff in 2009.

With that in mind, I’m still going to go ahead and make a bold prediction: I believe, when all is said and done, that the most important release of 2011 will end up being the AWS Identity and Access Management (IAM) service, which is currently still in “Preview Beta”. I don’t know what else 2011 will bring, but I’d be surprised to see anything top this.

This needs justification, of course, since the beta has been proceeding with extremely little fanfare and very few of the AWS community have been talking about it. I think this is because it hasn’t been interpreted right yet; people (including Amazon’s own documentation[2]) still only refer to its ability to split up developers’ access to the account, so that testers can only access test instances, database maintainers can only access RDS operations, etc. This is all well and good, but by itself is not the reason I’m excited about IAM.

To explain, I have to bring up the pair of Mobile SDKs (Android and iOS) released earlier this month. They have an inherent critical flaw: credential management. If you want to write a mobile application that uses AWS services (but not on behalf of the user), you need to, at some point, have your credentials in memory on the device. You can try to obfuscate them, or minimize the amount of time it happens using various tricks like downloading them to SQLCipher-encrypted SQLite blocks, but at the end of the day, the request has to be signed on the device. This presents a problem because a skilled and determined attacker can run your APK in a simulator and read out your credentials no matter how well you’ve obfuscated them. You can try proxying all your requests through a server that does the signing, but that defeats the purpose of using AWS services in the first place — now you have another hop which can be a point of failure, latency, maintenance, which needs to be scaled, etc. I know this is a problem people are actually concerned about, because I’ve spent time answering questions about it in several places.

IAM solves this problem beautifully. It allows you to programatically create low-privileged user accounts on demand for every user of your mobile application, thereby resolving the problem of storing credentials by making the credentials worthless! The advantages are many:

  • Prevents abusing the account in any way other than what could have been done from the application anyway.
  • Allows you to control access through registration, paywalls, etc.
  • Allows you to individually disable abusive accounts.
  • Immediately provides fine-grained permissions control (like only allowing DeleteObject requests on S3 objects created by the same sub-account — the same functionality on a single account would require complicated bookkeeping somewhere else other than S3)
  • Provides a crude form of analytics. You’ll at least have an upper bound on the number of unique users, and roughly how heavily each one is using it.

To demonstrate how neatly IAM works, I’ve created a proof-of-concept awskeyserver project on GitHub. It’s a Google App Engine service that creates on-demand IAM accounts through a RESTful interface. By default it gives them out to everyone, so hitting http://awskeyserver.appspot.com/create_user?group=Users will return something like AKIAIFY7K3N4Y6OE2DLQ:dGV6MHs3pMjRkT4RZmwnOZndOJG75FyOjpeQYVFA, which are the access credentials for an account in the Users group. In this case, the Users group has precisely no permissions, so it’s no big deal that I just posted those on my public blog (a few months ago that would’ve been a code-red disaster!). A mobile app can cache that and use it for the lifetime of the installation.

But that’s not all! If you’re noticing way too many requests for accounts, you can just edit permissions.py and add a CaptchaValidator() policy handler to the Users group. Once you do that, the previous request will instead return something like 03AHJ_VuueXyjZt-oM2Bf1K3c8rfsb_NHnjRQPLKDL0Vc1GhYs4LFmcuEYdBTfpGfYJbtRVwNL-OXeuAyApuHqrZ_J90i3qLJDHKepDFGIGTGQR8f1sRQpIchSDowHQHZczdbeLEdJPmR5Paiq_XjwzcJMMrZ4d1UHXQ, which is a reCAPTCHA challenge id. If you look that URL up in reCAPTCHA it’ll provide you with a captcha image, whose response has to be filled in (along with the challenge id) to the recaptcha_challenge_field and recaptcha_response_field query parameters to awskeyserver. Only then will it cough up the credentials, giving you a way to stem the flow. Captcha’s are one simple way of doing that, but I’ll be adding more context-aware validators to awskeyserver as time goes on.

To show how this works in practice, I’ve taken the sample AwsDemo application that comes with the AWS SDK for Android and modified it[1] to work without any local credentials at all. The modification only lives in one file, AwsDemo.java, which I’ve Gisted. It can handle both plain and captcha policies, and the returned credentials are limited by whatever permissions you set on the server side. In the example below, the group has the following policy file:

{
  "Statement":[{
    "Effect":"Allow",
    "Action":["sdb:ListDomains"],
    "Resource":"*"
  }]
}

and awskeyserver has the policy:

policy = { "Users" : CaptchaValidator() }

So as you’ll see in the screenshots, listing domains works without a problem, but trying to access them results in an error:

Captcha-enabled key server challenge

Once the credentials are there, they're for listing SDB domains only

Any other operation results in an exception.


This sort of behavior was literally impossible to achieve safely before IAM, and now it’s an afternoon’s worth of work. That’s the reason I think IAM will be so huge — up until now, consumers of AWS have always had to hide behind a server, performing requests on the behalf of clients. Amazon always warned you never to give out your secret key (under any circumstances!) but people still did to services like s3fm or Ylastic, because there was simply no other way for them to work. IAM for the first time opens up the possibility for letting clients make their own requests, directly, without having to go against their own AWS accounts to do so. I imagine Dropbox‘s backend architecture would look quite different if it had been designed in a post-IAM world. They could have saved so much by pushing some of the logic their servers perform to the client side and letting it upload to S3 directly.

So yeah. I think 2011 will be the year of client-side AWS applications, and it will be IAM that allows this to happen. If I’m wrong, that means something even cooler comes out of there next year, and I can’t wait to see what it is.


[1]: This is not how I think production code should look like, by the way. It’s very proof-of-concept.
[2]: Well, with one exception. The bottom 10% of this.

Advertisements

Written by Adrian Petrescu

December 31, 2010 at 5:52 am

Posted in Computer Science, Development

Tagged with ,

Knuth’s “Why Pi” Talk at Stanford: Part 1

with 6 comments

A few weeks ago I had another opportunity to meet a giant in my field: Donald Knuth was giving his annual Christmas Tree lecture at Stanford, where I happened to be near at the time. The most exciting part was probably right at the very beginning, when Knuth announced that, for the first (“and probably only”) time, he had finished two books on that same day: The Art of Computer Programming: Volume 4A, and Selected Papers on Fun and Games, with the pre-order links on Amazon going up the same day.

The topic of this year’s lecture was, according to the site:

Mathematicians have known for almost 300 years that the value of n! (the product of the first n positive integers) is approximately equal to\left(\frac{n}{e}\right)^n\sqrt{2\pi n}, where\piis the ratio of the circumference of a circle to its diameter. That’s “Stirling’s approximation.” But it hasn’t been clear why there is any connection whatever between circles and factorials; the appearance of\pihas seemed to be purely coincidental, one of the amazing accidents of nature. Recently however Johan Wรคstlund has found a nice explanation of this phenomenon, and the reason turns out to be connected with the theory of trees.

I found the lecture originally hard to follow; although Knuth is now 74 years old and physically growing quite frail, his mind was still evidently much sharper than the audience’s; he was completely prepared for every question regarding extensions and corner cases of the material he was presenting. I considered the notes I took to be somewhat useless since I had spent so much time just trying to follow his reasoning. However, when the video became available, I had no excuse left, so I went through it again armed with a pen and the pause button, and I now have a set of notes somewhat worthy of posting here for anyone who missed it. Any errors or inconsistencies are almost definitely my fault and not his.

The notes themselves are after the jump.
Read the rest of this entry »

Written by Adrian Petrescu

December 28, 2010 at 4:41 am

Posted in Computer Science, Mathematics

Tagged with ,

Some cool Zsh magic

with 2 comments

If you’re a programmer, chances are you spend a lot of time deep inside some nested folder hierarchy. Oftentimes you want to jump back to some fixed point within that hierarchy, but you’re not exactly sure how many levels deep you are, so you just cd ../../..., using tab completion to guide you until you land where you want.

I find this annoying.

Ideally I would want to have some syntax like cd .../foo which would go back as many levels as needed until it finds foo (or /). If you’re using Bash, your best bet is to write a wrapper for cd which does exactly that. If you’re using Zsh (and you should be!) you can be much more clever. Add the following to your .zshrc file:

setopt extendedglob
function cdd() { builtin cd (../)##$1([1]) }

This is basically a high-powered alias that will use extended glob matching until it finds the shortest path of the form ../(../)*$1, where $1 is the first argument. The double-hashes ensure that at least one occurrence of (../) has to be there (otherwise it could match something in the current working directory). With that all set up, it’s a simple matter of:

$ pwd
/home/user/a/b/c/d/e/f
$ cdd c
$ pwd
/home/user/a/b/c

Good! But not quite good enough. This works for navigating with cd, but any other tool which uses paths would need their own command to pull off a similar trick. If you’re using Bash, this is really the end of the line for you. Zsh users, however, can man zshall, grep for zsh_directory_name, and get a big smile on their face. zsh_directory_name is a shell function that basically allows you to process any path entered in the form ~[path] in the substitution phase. With this facility, we can add the following to .zshrc:

zsh_directory_name () { [[ $1 != n ]] && return 1; reply=( (../)##$2([1]) ) }

The ‘n‘ string checks whether zsh_directory_name is being called to do directory expansion (n) or directory naming (d). If so, the reply array is basically doing the same trick cdd does on $2.

Once we have that, we can operate on any token:

$ pwd
/home/user/a/b/c/d/e/f
$ ls ~[a]
b
$ mkdir ~[a]/g
$ ls ~[a]
b g

Awesome!

Written by Adrian Petrescu

December 3, 2010 at 1:13 am

Posted in Development

Tagged with

%d bloggers like this: