World Line

AWS Key Server

with 15 comments

It’s the last day of 2010, and as an AWS fan there’s a lot to reflect on from the last year. There’s been several new SDKs, a couple of new services, more than twice as many consoles as there were this time last year, and so many major improvements to existing services that if I linked them all, every word in this post would be blue. It’s been really exciting, and nobody could have predicted half of this stuff in 2009.

With that in mind, I’m still going to go ahead and make a bold prediction: I believe, when all is said and done, that the most important release of 2011 will end up being the AWS Identity and Access Management (IAM) service, which is currently still in “Preview Beta”. I don’t know what else 2011 will bring, but I’d be surprised to see anything top this.

This needs justification, of course, since the beta has been proceeding with extremely little fanfare and very few of the AWS community have been talking about it. I think this is because it hasn’t been interpreted right yet; people (including Amazon’s own documentation[2]) still only refer to its ability to split up developers’ access to the account, so that testers can only access test instances, database maintainers can only access RDS operations, etc. This is all well and good, but by itself is not the reason I’m excited about IAM.

To explain, I have to bring up the pair of Mobile SDKs (Android and iOS) released earlier this month. They have an inherent critical flaw: credential management. If you want to write a mobile application that uses AWS services (but not on behalf of the user), you need to, at some point, have your credentials in memory on the device. You can try to obfuscate them, or minimize the amount of time it happens using various tricks like downloading them to SQLCipher-encrypted SQLite blocks, but at the end of the day, the request has to be signed on the device. This presents a problem because a skilled and determined attacker can run your APK in a simulator and read out your credentials no matter how well you’ve obfuscated them. You can try proxying all your requests through a server that does the signing, but that defeats the purpose of using AWS services in the first place — now you have another hop which can be a point of failure, latency, maintenance, which needs to be scaled, etc. I know this is a problem people are actually concerned about, because I’ve spent time answering questions about it in several places.

IAM solves this problem beautifully. It allows you to programatically create low-privileged user accounts on demand for every user of your mobile application, thereby resolving the problem of storing credentials by making the credentials worthless! The advantages are many:

  • Prevents abusing the account in any way other than what could have been done from the application anyway.
  • Allows you to control access through registration, paywalls, etc.
  • Allows you to individually disable abusive accounts.
  • Immediately provides fine-grained permissions control (like only allowing DeleteObject requests on S3 objects created by the same sub-account — the same functionality on a single account would require complicated bookkeeping somewhere else other than S3)
  • Provides a crude form of analytics. You’ll at least have an upper bound on the number of unique users, and roughly how heavily each one is using it.

To demonstrate how neatly IAM works, I’ve created a proof-of-concept awskeyserver project on GitHub. It’s a Google App Engine service that creates on-demand IAM accounts through a RESTful interface. By default it gives them out to everyone, so hitting http://awskeyserver.appspot.com/create_user?group=Users will return something like AKIAIFY7K3N4Y6OE2DLQ:dGV6MHs3pMjRkT4RZmwnOZndOJG75FyOjpeQYVFA, which are the access credentials for an account in the Users group. In this case, the Users group has precisely no permissions, so it's no big deal that I just posted those on my public blog (a few months ago that would've been a code-red disaster!). A mobile app can cache that and use it for the lifetime of the installation.

But that's not all! If you're noticing way too many requests for accounts, you can just edit permissions.py and add a CaptchaValidator() policy handler to the Users group. Once you do that, the previous request will instead return something like 03AHJ_VuueXyjZt-oM2Bf1K3c8rfsb_NHnjRQPLKDL0Vc1GhYs4LFmcuEYdBTfpGfYJbtRVwNL-OXeuAyApuHqrZ_J90i3qLJDHKepDFGIGTGQR8f1sRQpIchSDowHQHZczdbeLEdJPmR5Paiq_XjwzcJMMrZ4d1UHXQ, which is a reCAPTCHA challenge id. If you look that URL up in reCAPTCHA it'll provide you with a captcha image, whose response has to be filled in (along with the challenge id) to the recaptcha_challenge_field and recaptcha_response_field query parameters to awskeyserver. Only then will it cough up the credentials, giving you a way to stem the flow. Captcha's are one simple way of doing that, but I'll be adding more context-aware validators to awskeyserver as time goes on.

To show how this works in practice, I've taken the sample AwsDemo application that comes with the AWS SDK for Android and modified it[1] to work without any local credentials at all. The modification only lives in one file, AwsDemo.java, which I've Gisted. It can handle both plain and captcha policies, and the returned credentials are limited by whatever permissions you set on the server side. In the example below, the group has the following policy file:

{
  "Statement":[{
    "Effect":"Allow",
    "Action":["sdb:ListDomains"],
    "Resource":"*"
  }]
}

and awskeyserver has the policy:

policy = { "Users" : CaptchaValidator() }

So as you'll see in the screenshots, listing domains works without a problem, but trying to access them results in an error:

Captcha-enabled key server challenge

Once the credentials are there, they're for listing SDB domains only

Any other operation results in an exception.


This sort of behavior was literally impossible to achieve safely before IAM, and now it's an afternoon's worth of work. That's the reason I think IAM will be so huge -- up until now, consumers of AWS have always had to hide behind a server, performing requests on the behalf of clients. Amazon always warned you never to give out your secret key (under any circumstances!) but people still did to services like s3fm or Ylastic, because there was simply no other way for them to work. IAM for the first time opens up the possibility for letting clients make their own requests, directly, without having to go against their own AWS accounts to do so. I imagine Dropbox's backend architecture would look quite different if it had been designed in a post-IAM world. They could have saved so much by pushing some of the logic their servers perform to the client side and letting it upload to S3 directly.

So yeah. I think 2011 will be the year of client-side AWS applications, and it will be IAM that allows this to happen. If I'm wrong, that means something even cooler comes out of there next year, and I can't wait to see what it is.


[1]: This is not how I think production code should look like, by the way. It's very proof-of-concept.
[2]: Well, with one exception. The bottom 10% of this.

Written by Adrian Petrescu

December 31, 2010 at 5:52 am

Posted in Computer Science, Development

Tagged with ,

Knuth’s “Why Pi” Talk at Stanford: Part 1

with 5 comments

A few weeks ago I had another opportunity to meet a giant in my field: Donald Knuth was giving his annual Christmas Tree lecture at Stanford, where I happened to be near at the time. The most exciting part was probably right at the very beginning, when Knuth announced that, for the first (“and probably only”) time, he had finished two books on that same day: The Art of Computer Programming: Volume 4A, and Selected Papers on Fun and Games, with the pre-order links on Amazon going up the same day.

The topic of this year’s lecture was, according to the site:

Mathematicians have known for almost 300 years that the value of n! (the product of the first n positive integers) is approximately equal to\left(\frac{n}{e}\right)^n\sqrt{2\pi n}, where\piis the ratio of the circumference of a circle to its diameter. That’s “Stirling’s approximation.” But it hasn’t been clear why there is any connection whatever between circles and factorials; the appearance of\pihas seemed to be purely coincidental, one of the amazing accidents of nature. Recently however Johan Wästlund has found a nice explanation of this phenomenon, and the reason turns out to be connected with the theory of trees.

I found the lecture originally hard to follow; although Knuth is now 74 years old and physically growing quite frail, his mind was still evidently much sharper than the audience’s; he was completely prepared for every question regarding extensions and corner cases of the material he was presenting. I considered the notes I took to be somewhat useless since I had spent so much time just trying to follow his reasoning. However, when the video became available, I had no excuse left, so I went through it again armed with a pen and the pause button, and I now have a set of notes somewhat worthy of posting here for anyone who missed it. Any errors or inconsistencies are almost definitely my fault and not his.

The notes themselves are after the jump.
Read the rest of this entry »

Written by Adrian Petrescu

December 28, 2010 at 4:41 am

Posted in Computer Science, Mathematics

Tagged with ,

Some cool Zsh magic

with 2 comments

If you’re a programmer, chances are you spend a lot of time deep inside some nested folder hierarchy. Oftentimes you want to jump back to some fixed point within that hierarchy, but you’re not exactly sure how many levels deep you are, so you just cd ../../..., using tab completion to guide you until you land where you want.

I find this annoying.

Ideally I would want to have some syntax like cd .../foo which would go back as many levels as needed until it finds foo (or /). If you're using Bash, your best bet is to write a wrapper for cd which does exactly that. If you're using Zsh (and you should be!) you can be much more clever. Add the following to your .zshrc file:

setopt extendedglob
function cdd() { builtin cd (../)##$1([1]) }

This is basically a high-powered alias that will use extended glob matching until it finds the shortest path of the form ../(../)*$1, where $1 is the first argument. The double-hashes ensure that at least one occurrence of (../) has to be there (otherwise it could match something in the current working directory). With that all set up, it's a simple matter of:

$ pwd
/home/user/a/b/c/d/e/f
$ cdd c
$ pwd
/home/user/a/b/c

Good! But not quite good enough. This works for navigating with cd, but any other tool which uses paths would need their own command to pull off a similar trick. If you're using Bash, this is really the end of the line for you. Zsh users, however, can man zshall, grep for zsh_directory_name, and get a big smile on their face. zsh_directory_name is a shell function that basically allows you to process any path entered in the form ~[path] in the substitution phase. With this facility, we can add the following to .zshrc:

zsh_directory_name () { [[ $1 != n ]] && return 1; reply=( (../)##$2([1]) ) }

The 'n' string checks whether zsh_directory_name is being called to do directory expansion (n) or directory naming (d). If so, the reply array is basically doing the same trick cdd does on $2.

Once we have that, we can operate on any token:

$ pwd
/home/user/a/b/c/d/e/f
$ ls ~[a]
b
$ mkdir ~[a]/g
$ ls ~[a]
b g

Awesome!

Written by Adrian Petrescu

December 3, 2010 at 1:13 am

Posted in Development

Tagged with

Next Steps in GAE-AWS SDK

leave a comment »

I’ve been getting lots of great feedback on the GAE port of the AWS SDK for Java that I released a few months back as part of a school project I was working on. As I’d hoped, it’s grown way beyond that, and there’s lots of people that have let me know they’re making use of it with GAE. For my part, I’ve been keeping it up-to-date and working within a few days of each new AWS SDK for Java release — all 14 of them!

Unfortunately (as those people are well aware) it’s far from complete, and is actually quite buggy. This is not the AWS SDK’s fault, but rather due to the hacks I needed to put in place to get around GAE’s restrictions and bugs. Regardless of where the blame lies, however, the point is still that it has been holding back the improvements people have been asking for (chief among of them being support for S3).

Today I’m making a “new release” available. Unfortunately it doesn’t include S3 support, which still requires a large amount of rethinking, but it does include a major piece of functionality: suites of integration tests. Setting up test cases for the SDK on App Engine as well as the local Jetty server is a painful and time-consuming process (and don’t assume that those two things behave the same — they don’t at all, with respect to GAE-AWS). It’s been the main thing holding back rapid development. But now you can visit http://gae-aws-sdk-test.appspot.com, enter your AWS credentials, choose some service test suites, and see the current development version of the SDK running on App Engine!

AWS Tests Running on GAE

As you can see, they don't all pass yet...

The exact same thing works locally (though you may be surprised to see very different results!) If you don’t feel comfortable sending your AWS credentials to my GAE app, feel free to download the code and run your own instance.

Google has really instilled some testing discipline in me, so I feel much more comfortable now ripping out the innards of the S3 client in order to make it GAE-ready and satisfy some of the requests I’ve been getting. Until then, grab the GAE-AWS SDK for the other 15 or so services and play around with it. Now that there’s a free tier for AWS to go along with free Google App Engine quota, the barrier to entry for cloud computing is at an historic low.

Enjoy!

Written by Adrian Petrescu

November 3, 2010 at 3:13 am

Posted in Development

Tagged with ,

S3FS

leave a comment »

S3FSA short while ago, I took a renewed interest in Randy Rizun’s S3FS project (which had mostly been stagnating) and cloned it to GitHub. It’s basically a FUSE module backed by an S3 bucket. It’s often used on EC2 instances to provide a shared persistent store (which means no EBS); nothing too fancy, but it nicely combines several of my technical interests. I began cleaning it up to prepare it for packaging to Debian and Gentoo, which involved moving it to Autotools, when I was contacted by another developer who had simultaneously decided to rehabilitate the project as well. He had taken the extra effort of contacting Randy about getting us commit access to the original project, which was swiftly granted, so now those changes have been rolled in. Dan also fixed up a lot of outstanding issues, and I’ve started refactoring the old code. The renewed activity must have caught users’ attention as well, because we’ve started getting patches, such as this one to cache read-only files. So it looks like there’s bright things in store for S3FS! Some of the things on my road map (and I don’t speak for Dan here) are:

  • .debs and .ebuilds for inclusion into (at least) the distributions I’m familiar with, and which are commonly used with EC2.
  • Port to MacFUSE.
  • GVFS/KIO wrappers for proper integration into Gnome/KDE. I haven’t done much research into this one yet, so I’m not sure how much of an undertaking this will turn out to be.
  • Intelligent caching.
  • Some rudimentary mapping of ACLs to Unix permissions.
  • General code clean-up and refactoring.
So, this post is for people searching on information about S3FS to stumble on and know what’s in store for the project.

Written by Adrian Petrescu

October 25, 2010 at 6:36 am

Posted in Computer Science, Development

Tagged with

The Knight Metric

with 3 comments

Here’s a problem I’ve been thinking about. I’m curious if anybody has any ideas about it; I have reasons to believe it might be very hard (in the theoretical sense) but I haven’t given it much time yet.

Consider a (possibly infinite) two-dimensional integer grid G\subseteq \mathbb Z^2, with some finite subset of its cells colored black. We consider two cellsx, yto be connected if ||x - y|| = \sqrt{5}; that is, if the cells are a knight’s move apart. Write a function that computes the least number of additional cells that would need to be colored black in order to ensure that there exists a connected path between any two black cells.

The trickiness comes from the fact that adding a black square can either increase or decrease the total number (it’s not hard to come up with examples of either possibility), so the problem resists simplification. I have no idea if this can be computed in polynomial time or space. I thought of it in the context of geometric constructions, for which computability results are notoriously difficult, but it smells like this can be solved in purely graph-theoretic terms. There is actually a closed formula for the “knight’s distance” between two points, though it is a bit complicated:

d_K(x,y) = \max{\left(\lceil\frac{M}{2}\rceil, \lceil\frac{M+m}{3}\rceil\right)} + (M+m) - \left(\max{\left(\lceil\frac{M}{2}\rceil, \lceil\frac{M+m}{3}\rceil\right)}\pmod{2}\right)

whereM=\max{\left(|x|,|y|\right)} andm=\min{\left(|x|, |y|\right)}. If we take each pair of given black squares, calculate their shortest distance, we can use that as an admissible heuristic to an A* search to find the shortest paths in polynomial time. Easy.

Except that it doesn’t work. It’s minimizing the wrong thing: distance, instead of the minimum number of squares. Consider the following configuration:

Figure A

Figure A

It’s easy to see that there’s (two) ways to solve this problem by adding just a single square (either x or y). Therefore, the answer to our problem is 1. However, the above approach would instead notice that it could make all the distances be less than or equal to 2 by adding x and y and z, while adding just x would maked_K(A,B)=4and just y would maked_K(A,D)=4.

Thus the problem has some non-local properties that I don’t really know how to translate into graph-theoretic terms, unless I just consider all paths within a certain ball whose size is linear in the diameter of the graph, but that would be exponential.

If anyone has any clever ideas, I’d be interested in hearing about it.

Written by Adrian Petrescu

October 25, 2010 at 5:51 am

Java puzzler: Generic type inference

with 3 comments

Here’s an issue I ran into while hacking a type system to work with some very inflexible generated code; I’ve distilled the strangeness into the simplest form I could.

Suppose you have a simple type hierarchy:

public class A { }
public class B extends A { }
public class C extends A { }

Then, it turns out that the following code works just fine:

List<Class<? extends A>> list = ImmutableList.of(B.class, C.class);

Surprisingly, however, the following code fails to compile(!):

List<Class<? extends A>> list = ImmutableList.of(B.class);

with the error

incompatible types
found   : com.google.common.collect.ImmutableList<java.lang.class<B>>
required: java.util.List<java.lang.class<? extends A>>
    List<Class<? extends A>> list = ImmutableList.of(
    ...

 

That's right; adding the parameter makes this typesafe! (Note that my example uses ImmutableLists from the Google Common Collections, but this is actually happening because of Java's type inference mechanism, not anything related to that particular class). Of course, it's easy to work around it, but the interesting thing is what can be learned about Java by understanding the problem here.

I'll post a "solution" in a few days, in case anyone wants to give it a shot.

Written by Adrian Petrescu

October 13, 2010 at 12:29 am

Posted in Computer Science, Development

Tagged with

Follow

Get every new post delivered to your Inbox.

%d bloggers like this: