World Line

Posts Tagged ‘AWS’

AWS Key Server


It’s the last day of 2010, and as an AWS fan there’s a lot to reflect on from the past year. There have been several new SDKs, a couple of new services, more than twice as many consoles as there were this time last year, and so many major improvements to existing services that if I linked them all, every word in this post would be blue. It’s been really exciting, and nobody could have predicted half of this stuff in 2009.

With that in mind, I’m still going to go ahead and make a bold prediction: I believe, when all is said and done, that the most important release of 2011 will end up being the AWS Identity and Access Management (IAM) service, which is currently still in “Preview Beta”. I don’t know what else 2011 will bring, but I’d be surprised to see anything top this.

This needs justification, of course, since the beta has been proceeding with very little fanfare and few in the AWS community have been talking about it. I think this is because its significance hasn’t been fully appreciated yet; people (including Amazon’s own documentation[2]) still refer only to its ability to split up developers’ access to the account, so that testers can only access test instances, database maintainers can only access RDS operations, and so on. This is all well and good, but by itself it is not the reason I’m excited about IAM.

To explain, I have to bring up the pair of Mobile SDKs (Android and iOS) released earlier this month. They have an inherent, critical flaw: credential management. If you want to write a mobile application that uses AWS services (but not on behalf of the user), then at some point you need to have your credentials in memory on the device. You can try to obfuscate them, or minimize the time they spend there with tricks like downloading them into SQLCipher-encrypted SQLite blocks, but at the end of the day, the request has to be signed on the device. This is a problem because a skilled and determined attacker can run your APK in an emulator and read out your credentials no matter how well you’ve obfuscated them. You can try proxying all your requests through a server that does the signing, but that defeats the purpose of using AWS services in the first place: now you have another hop that adds latency, another point of failure, and another component to maintain and scale. I know this is a problem people are actually concerned about, because I’ve spent time answering questions about it in several places.
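
To see why the secret can’t be kept out of memory, recall what “signing” means here: every AWS request carries an HMAC computed with the secret key, so whatever code builds the request must hold the key. A minimal sketch of the signing step (Signature Version 2 style, greatly simplified from what the SDKs actually do):

import base64
import hashlib
import hmac

def sign(secret_key, string_to_sign):
    # The signature is an HMAC over the canonicalized request, keyed by the
    # secret itself; there is no way to produce it without the secret present.
    return base64.b64encode(
        hmac.new(secret_key, string_to_sign, hashlib.sha256).digest())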

IAM solves this problem beautifully. It allows you to programmatically create low-privileged user accounts on demand for every user of your mobile application, thereby resolving the problem of storing credentials by making the credentials worthless! (A sketch of the server-side mechanics follows the list.) The advantages are many:

  • Prevents the account from being abused in any way beyond what could have been done from the application anyway.
  • Allows you to control access through registration, paywalls, etc.
  • Allows you to individually disable abusive accounts.
  • Immediately provides fine-grained permissions control (like only allowing DeleteObject requests on S3 objects created by the same sub-account; the same functionality on a single account would require complicated bookkeeping somewhere outside S3).
  • Provides a crude form of analytics. You’ll at least have an upper bound on the number of unique users, and roughly how heavily each one is using it.
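
To make the mechanism concrete, here’s roughly what minting one of these throwaway users looks like on the server side with boto (a sketch, assuming boto’s IAM support; the group name and the response parsing follow boto 2.x conventions):

import boto

iam = boto.connect_iam()  # master credentials; this code lives server-side only

def mint_user(user_name):
    # A brand-new user starts with no permissions at all...
    iam.create_user(user_name)
    # ...gains exactly what the 'Users' group grants (possibly nothing)...
    iam.add_user_to_group('Users', user_name)
    # ...and gets a fresh access key pair for the client to cache.
    result = iam.create_access_key(user_name)
    key = result['create_access_key_response']['create_access_key_result']['access_key']
    return key['access_key_id'], key['secret_access_key']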

To demonstrate how neatly IAM works, I’ve created a proof-of-concept awskeyserver project on GitHub. It’s a Google App Engine service that creates on-demand IAM accounts through a RESTful interface. By default it gives them out to everyone, so hitting http://awskeyserver.appspot.com/create_user?group=Users will return something like AKIAIFY7K3N4Y6OE2DLQ:dGV6MHs3pMjRkT4RZmwnOZndOJG75FyOjpeQYVFA, which are the access credentials for an account in the Users group. In this case, the Users group has precisely no permissions, so it’s no big deal that I just posted those on my public blog (a few months ago that would’ve been a code-red disaster!). A mobile app can cache that and use it for the lifetime of the installation.
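
On the client side, consuming the service is nothing more than an HTTP GET and a string split; a sketch against the response format shown above:

import urllib2

def fetch_credentials():
    # awskeyserver returns "ACCESS_KEY_ID:SECRET_KEY" for the requested group.
    body = urllib2.urlopen(
        'http://awskeyserver.appspot.com/create_user?group=Users').read()
    access_key_id, secret_key = body.split(':', 1)
    return access_key_id, secret_key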

But that’s not all! If you’re noticing way too many requests for accounts, you can just edit permissions.py and add a CaptchaValidator() policy handler to the Users group. Once you do that, the previous request will instead return something like 03AHJ_VuueXyjZt-oM2Bf1K3c8rfsb_NHnjRQPLKDL0Vc1GhYs4LFmcuEYdBTfpGfYJbtRVwNL-OXeuAyApuHqrZ_J90i3qLJDHKepDFGIGTGQR8f1sRQpIchSDowHQHZczdbeLEdJPmR5Paiq_XjwzcJMMrZ4d1UHXQ, which is a reCAPTCHA challenge id. Looking that id up with reCAPTCHA will get you a captcha image, whose solution has to be passed back to awskeyserver in the recaptcha_response_field query parameter, along with the challenge id in recaptcha_challenge_field. Only then will it cough up the credentials, giving you a way to stem the flow. Captchas are one simple way of doing that, but I’ll be adding more context-aware validators to awskeyserver as time goes on.
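
The client-side flow gains one round trip: take the challenge id that comes back, show the user the corresponding captcha, and resubmit with the answer. A sketch using the parameter names described above (the solve callback stands in for whatever UI collects the user’s answer):

import urllib
import urllib2

CREATE_USER = 'http://awskeyserver.appspot.com/create_user'

def fetch_credentials_with_captcha(solve):
    # With CaptchaValidator in place, the first request returns a
    # reCAPTCHA challenge id instead of credentials.
    challenge = urllib2.urlopen(CREATE_USER + '?group=Users').read()
    params = urllib.urlencode({
        'group': 'Users',
        'recaptcha_challenge_field': challenge,
        'recaptcha_response_field': solve(challenge),
    })
    # Resubmit with the solved captcha; this time we get "ACCESS_KEY:SECRET".
    body = urllib2.urlopen(CREATE_USER + '?' + params).read()
    return body.split(':', 1)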

To show how this works in practice, I’ve taken the sample AwsDemo application that comes with the AWS SDK for Android and modified it[1] to work without any local credentials at all. The modification only lives in one file, AwsDemo.java, which I’ve Gisted. It can handle both plain and captcha policies, and the returned credentials are limited by whatever permissions you set on the server side. In the example below, the group has the following policy file:

{
  "Statement":[{
    "Effect":"Allow",
    "Action":["sdb:ListDomains"],
    "Resource":"*"
  }]
}

and awskeyserver has the policy:

policy = { "Users" : CaptchaValidator() }
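
I won’t reproduce permissions.py in full here, but schematically a policy handler is just an object that gets a chance to challenge or reject a request before any credentials are issued. A simplified sketch of the shape (illustrative only, not awskeyserver’s actual code; new_recaptcha_challenge and recaptcha_verify are made-up stand-ins for calls to the reCAPTCHA API):

class CaptchaValidator(object):
    """Withholds credentials until a reCAPTCHA challenge is answered."""

    def validate(self, request):
        # 'request' holds the query parameters of the create_user request.
        answer = request.get('recaptcha_response_field')
        if not answer:
            # No answer yet: return a challenge id instead of credentials.
            return new_recaptcha_challenge()  # hypothetical helper
        # Otherwise, allow credentials only if the captcha checks out.
        return recaptcha_verify(request.get('recaptcha_challenge_field'),
                                answer)  # hypothetical helper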

So as you’ll see in the screenshots, listing domains works without a problem, but trying to access them results in an error:

[Screenshot: the captcha challenge presented by the key server]

[Screenshot: once the credentials are issued, they’re good for listing SDB domains only]

Any other operation results in an exception.


This sort of behavior was simply impossible to achieve safely before IAM, and now it’s an afternoon’s worth of work. That’s the reason I think IAM will be so huge: up until now, consumers of AWS have always had to hide behind a server, performing requests on behalf of clients. Amazon always warned you never to give out your secret key (under any circumstances!), but people still did, to services like s3fm or Ylastic, because there was simply no other way for them to work. IAM opens up, for the first time, the possibility of clients making their own requests directly, without needing AWS accounts of their own. I imagine Dropbox‘s backend architecture would look quite different if it had been designed in a post-IAM world; they could have saved so much by pushing some of the logic their servers perform to the client side and letting clients upload to S3 directly.

So yeah. I think 2011 will be the year of client-side AWS applications, and it will be IAM that allows this to happen. If I’m wrong, that means something even cooler comes out of there next year, and I can’t wait to see what it is.


[1]: This is not how I think production code should look, by the way. It’s very proof-of-concept.
[2]: Well, with one exception. The bottom 10% of this.

Written by Adrian Petrescu

December 31, 2010 at 5:52 am

Posted in Computer Science, Development


Next Steps in GAE-AWS SDK


I’ve been getting lots of great feedback on the GAE port of the AWS SDK for Java that I released a few months back as part of a school project. As I’d hoped, it’s grown way beyond that, and lots of people have let me know they’re making use of it with GAE. For my part, I’ve been keeping it up-to-date and working within a few days of each new AWS SDK for Java release — all 14 of them!

Unfortunately (as those people are well aware) it’s far from complete, and is actually quite buggy. This is not the AWS SDK’s fault, but rather due to the hacks I needed to put in place to get around GAE’s restrictions and bugs. Regardless of where the blame lies, however, the point stands: it has been holding back the improvements people have been asking for, chief among them being support for S3.

Today I’m making a “new release” available. Unfortunately it doesn’t include S3 support, which still requires a large amount of rethinking, but it does include a major piece of functionality: suites of integration tests. Setting up test cases for the SDK on App Engine as well as the local Jetty server is a painful and time-consuming process (and don’t assume that those two things behave the same — they don’t at all, with respect to GAE-AWS). It’s been the main thing holding back rapid development. But now you can visit http://gae-aws-sdk-test.appspot.com, enter your AWS credentials, choose some service test suites, and see the current development version of the SDK running on App Engine!

[Screenshot: the AWS test suites running on GAE. As you can see, they don’t all pass yet...]

The exact same thing works locally (though you may be surprised to see very different results!). If you don’t feel comfortable sending your AWS credentials to my GAE app, feel free to download the code and run your own instance.

Google has really instilled some testing discipline in me, so I feel much more comfortable now ripping out the innards of the S3 client in order to make it GAE-ready and satisfy some of the requests I’ve been getting. Until then, grab the GAE-AWS SDK for the other 15 or so services and play around with it. Now that there’s a free tier for AWS to go along with free Google App Engine quota, the barrier to entry for cloud computing is at an historic low.

Enjoy!

Written by Adrian Petrescu

November 3, 2010 at 3:13 am

Posted in Development


S3FS


A short while ago, I took a renewed interest in Randy Rizun’s S3FS project (which had mostly been stagnating) and cloned it to GitHub. It’s basically a FUSE module backed by an S3 bucket. It’s often used on EC2 instances to provide a shared persistent store (something EBS can’t offer, since a volume attaches to only one instance); nothing too fancy, but it nicely combines several of my technical interests. I began cleaning it up to prepare it for packaging on Debian and Gentoo, which involved moving it to Autotools, when I was contacted by another developer, Dan, who had simultaneously decided to rehabilitate the project as well. He had taken the extra step of contacting Randy about getting us commit access to the original project, which was swiftly granted, so those changes have now been rolled in. Dan also fixed up a lot of outstanding issues, and I’ve started refactoring the old code. The renewed activity must have caught users’ attention as well, because we’ve started getting patches, such as this one to cache read-only files. So it looks like there are bright things in store for S3FS! Some of the things on my road map (and I don’t speak for Dan here) are:

  • .debs and .ebuilds for inclusion into (at least) the distributions I’m familiar with, and which are commonly used with EC2.
  • Port to MacFUSE.
  • GVFS/KIO wrappers for proper integration into Gnome/KDE. I haven’t done much research into this one yet, so I’m not sure how much of an undertaking this will turn out to be.
  • Intelligent caching.
  • Some rudimentary mapping of ACLs to Unix permissions.
  • General code clean-up and refactoring.

So this post is here for people searching for information about S3FS to stumble on and see what’s in store for the project.

Written by Adrian Petrescu

October 25, 2010 at 6:36 am

Posted in Computer Science, Development


AWS Automator Actions


Automator has always been one of those OS X features that I feel never gets enough attention, except in the “comparison” articles that always seem to mention it because it’s so unique and novel; once the initial wave of First Look and Sneak Peek and oh-my-God-we’re-the-only-magazine-covering-this articles dies out, you never hear about it again, even though there is, in fact, a fairly active community of users and developers around it.

In fact, Snow Leopard added, with little fanfare, many useful features to Automator 2.1; the most interesting among these, I think, is the ability to run “shell scripts” in interpreters other than a shell, including Python (and Ruby, and Perl). Now, of course, there had been hand-made solutions for this before (Automator’s extensibility being another not-often-touted feature), but having it built in just makes distribution and adoption easier. Apple’s version also seems a bit simpler, since it doesn’t use the PyObjC bridge to pass around the full AppleScript event descriptor, instead just passing the text in through argv or stdin. Of course this means it’s strictly less powerful than Johnathan’s version, but let’s face it — if you really need that bridge, it’s probably overkill for Automator anyway. Besides, it’s almost always possible to save whatever kind of data you need (an image, most commonly) to a temporary file in /tmp and just pass the path to Python anyway.

Anyway, although I’m not a heavy Automator user by any stretch (most people who can write proper scripts aren’t, after all), I’m a big proponent of it as an idea, particularly in the way it easily allows non-technical users to automate away a lot of the tedium that comprises the majority of many people’s computing experience. And even if you can script something in the traditional way, Automator is often the easiest way to Cocoa-ize a simple script so that it can grab selections from Finder, insert itself into the Services menu, etc. Yesterday I wanted to do something to this effect — an easy way to upload directory structures to S3, sort of as an impulse click rather than going through the trouble of opening up a client, etc., since it’s an operation Rachel and I perform manually all the time. It was delightfully simple to use Automator to pass in Finder selections to a simple Python script that did the heavy lifting. Without Automator, the boilerplate would have been many times longer than the remarkably simple script itself:

import os
import sys

# Automator hands the action's inputs in as command-line arguments:
# the first three are configuration, the rest are the Finder selections.
AWS_ACCESS_KEY = sys.argv[1]
AWS_SECRET_KEY = sys.argv[2]
bucket_name = sys.argv[3].lower()
return_url = ""

files = sys.argv[4:]

# The sandboxed shell Automator spawns has no PYTHONPATH, so third-party
# modules have to be put on the path by hand (see the note below).
sys.path.insert(0, '/sw/lib/python2.6/site-packages')
import boto
import boto.s3 as s3
from boto.s3.key import Key

def recursively_upload(f, prefix=''):
    if os.path.isdir(f):
        # Descend into directories, extending the S3 key prefix as we go.
        for ff in os.listdir(f):
            recursively_upload('/'.join([f, ff]), ''.join([prefix, f[f.rindex('/')+1:], '/']))
    else:
        global return_url
        k = Key(bucket)
        k.key = ''.join([prefix, f[f.rindex('/')+1:]])
        k.set_contents_from_filename(f, policy='public-read')
        return_url = "".join(["http://", bucket_name, ".s3.amazonaws.com/", k.key])

conn = boto.connect_s3(AWS_ACCESS_KEY, AWS_SECRET_KEY)
bucket = conn.create_bucket(bucket_name, location=s3.connection.Location.DEFAULT)

for f in files:
    recursively_upload(f)

# Whatever the script prints becomes the action's output in Automator;
# here, the URL of the last file uploaded.
print return_url

The only gotcha is the sys.path.insert(0, '/sw/lib/python2.6/site-packages') line, which is there because the sandboxed shell that Automator spawns has no access to PYTHONPATH or anything like it; you have to programmatically load third-party modules if you want them. The module in this case is boto, an AWS library for Python; you’ll have to change that line to the appropriate path for your installation if you want to run the script (or, if you installed boto-py26 through Fink, don’t do anything). When you combine the above script with the following Automator workflow:

[Screenshot: the Automator workflow. You know what that is? That’s dozens of lines of boring Python that didn’t get written!]

[Screenshot: the resulting service, equally happy with flat files and directories]

Simply saving that in Automator automatically gives you what you see in the second screenshot. Also, I took both of these screenshots in just a few seconds with another Automator script, itself just a few minutes’ work, that takes a screenshot, uploads it to S3, and puts the link in your clipboard. Both of these scripts and workflows are available from my Codebook GitHub repository.

Of course, I think it would be infinitely better to have native Automator actions for common AWS operations, but that would require a solid Objective-C library first, of which there aren’t any yet :(

Written by Adrian Petrescu

July 3, 2010 at 2:41 pm

Posted in Development


Introducing the GAE-AWS SDK for Java


I’m making what I hope will be a useful release today: a version of the Amazon Web Services SDK for Java that runs from inside Google App Engine. Simply including the JAR that AWS provides in your GAE WAR doesn’t work, because GAE’s security model doesn’t allow the Apache Commons HTTP Client to create the sockets and low-level networking primitives it needs to establish an HTTP connection; instead, Google requires all connections to be made through its URLFetch utility. This version of the SDK does exactly that, swapping out the Apache HTTP Connection Manager for one that returns URLFetchHttpConnections.

With it, you can make use of the multitude of powerful AWS services (excepting S3, which requires more work before it will integrate properly) from within Google App Engine, making both of these tools significantly more useful. In fact, AWS seems to nicely complement several of Google App Engine’s deficiencies; for example, my original motivation for creating this was to be able to make SimpleDB requests from GAE for a school project I’m working on.

Using it is very simple: just download the full package from its GitHub page, add all of the JAR files in the package (including those in the third-party/ directory) to Google App Engine’s war/WEB-INF/lib/ directory, and add them to your build path in Eclipse. Then just use the SDK in the usual fashion! For example, the following lines of code:

AmazonSimpleDB sdb = new AmazonSimpleDBClient(new PropertiesCredentials("AWSCredentials.properties"));
sdb.setEndpoint("http://sdb.amazonaws.com");

String serverInfo = getServletContext().getServerInfo();
String userAgent = getThreadLocalRequest().getHeader("User-Agent");
return "Hello, " + input + "!<br><br>I am running " + serverInfo
        + ".<br><br>It looks like you are using:<br>" + userAgent
        + ".<br><br>You have " + sdb.listDomains().getDomainNames().size() + " SDB domains.";

turns the usual GWT welcome page into this:

[Screenshot: the usual GWT welcome page, now also reporting the number of SDB domains. Notice the last line there!]

I intend to merge in all upstream changes from the real AWS SDK for Java, of which this is simply a GitHub fork. Please post any issues or questions you have on the GitHub page!

Written by Adrian Petrescu

June 22, 2010 at 9:38 am

Posted in Development

