Step Functions were introduced a month ago at the 2016 re:Invent conference. They offer a way to string together Lambda functions via a number of standard programming constructs. While they certainly offer benefits over gluing things together on your own, this article will discuss the gaps that you'll need to overcome to use Step Functions with Serverless apps.

Step Function Overview

First, let's start with the basics.

As mentioned, Step Functions are a way to glue together AWS Lambda functions. They provide several programming constructs such as serial execution, parallel execution, and exception handling. The neat thing about Step Functions is that they provide the building blocks often used when composing code: infrastructure code that most programmers end up writing themselves, plus benefits such as error and state reporting along the way. These are nice things to have without doing any additional work. As a result, I've started using them to replace traditional coding patterns.

Creating a Service

Let us use a simple example: downloading an asset (such as the current Bitcoin price from CoinDesk's price index), parsing the JSON result, and then persisting the parsed information into a data store (such as DynamoDB).

The CoinDesk price index feed looks like this:

{
    "bpi": {
        "EUR": {
            "code": "EUR",
            "description": "Euro",
            "rate": "910.2181",
            "rate_float": 910.2181,
            "symbol": "€"
        },
        "GBP": {
            "code": "GBP",
            "description": "British Pound Sterling",
            "rate": "777.0171",
            "rate_float": 777.0171,
            "symbol": "£"
        },
        "USD": {
            "code": "USD",
            "description": "United States Dollar",
            "rate": "964.7425",
            "rate_float": 964.7425,
            "symbol": "$"
        }
    },
    "disclaimer": "This data was produced from the CoinDesk Bitcoin Price Index (USD). Non-USD currency data converted using hourly conversion rate from openexchangerates.org",
    "time": {
        "updated": "Jan 5, 2017 21:40:00 UTC",
        "updatedISO": "2017-01-05T21:40:00+00:00",
        "updateduk": "Jan 5, 2017 at 21:40 GMT"
    }
}

Our DynamoDB table is generated with the following metadata:

{
    "AttributeDefinitions": [
        {
            "AttributeName": "date",
            "AttributeType": "N"
        }
    ],
    "TableName": "PriceIndex",
    "KeySchema": [
        {
            "AttributeName": "date",
            "KeyType": "HASH"
        }
    ],    
    "ProvisionedThroughput": {
        "ReadCapacityUnits": 1,
        "WriteCapacityUnits": 1
    }
}

Single Function

The most naive approach to solving this problem would be to create a single Lambda function that performs all three steps. It might look something like this:

'use strict';

let AWS = require('aws-sdk');
let http = require('http');
let docClient = new AWS.DynamoDB.DocumentClient();

module.exports.single = function(event, context, callback) {

  // create http request options
  let feedOptions = {
    host: 'api.coindesk.com',
    path: '/v1/bpi/currentprice.json'
  };

  // initiate the http request
  let req = http.get(feedOptions, (res) => {

    // handle the response
    let buffers = [];
    res.on('data', (buffer) => buffers.push(buffer));
    res.on('end', () => {

       // convert all the buffers into the output
       let rawData = Buffer.concat(buffers).toString();

       // do some parsing
       let json  = JSON.parse(rawData);
       let date  = new Date(json.time.updatedISO).getTime();
       let price = json.bpi.USD.rate;
       let item = {
         date,
         price
       };

       // then do a database insertion
       docClient.put({
         TableName: 'PriceIndex',
         Item: item
       }, (err) => {

         // execute callback with error or item
         if(err) callback(err);
         else    callback(null, item);

       });
    });
  });
  req.on('error', callback);
  req.end();
};

So... that's pretty ugly. That function violates the Single Responsibility Principle. This can certainly be improved by pulling out functionality into individual methods, but at the end of the day it still executes under a single Lambda context, which violates the SRP for Lambda functions.

Let's evolve this into a better design by breaking each piece of functionality into its own Lambda.

Individual Lambdas

Breaking apart the functionality into individual Lambdas makes a lot of sense. For starters, we get a single Lambda function for each logical piece of code. This has several benefits:

  • function reuse across various pieces of your application
  • logging and duration tracking for each functional piece of code
  • testing of each part in isolation is now possible
  • smaller footprint means you get faster loading and execution time... which translates into reduced cost
  • enable VPC access for only parts of code that require it

So what does it look like to split our functions? Well, let's start with the first piece, making the HTTP request to a feed.

'use strict';
let http = require('http');

module.exports.handler = function(event, context, callback) {

  // create http request options
  let feedOptions = {
    host: 'api.coindesk.com',
    path: '/v1/bpi/currentprice.json'
  };

  // initiate the http request
  let req = http.get(feedOptions, (res) => {
    let buffers = [];
    res.on('data', (buffer) => buffers.push(buffer));
    res.on('end', () => {
      let rawData = Buffer.concat(buffers).toString();
      callback(null, rawData);
    });
  });
  req.on('error', callback);
  req.end();
};

You'll see that this function now returns the result of the feed as a string.

The next piece is actually parsing that string into meaningful values. We'll do this with another function.

'use strict';

module.exports.handler = function(event, context, callback) {
  let json  = JSON.parse(event);
  let date  = new Date(json.time.updatedISO).getTime();
  let price = json.bpi.USD.rate;
  let item = {
    date,
    price
  };
  callback(null, item);
};

This function now requires input: the raw string returned by the fetch function. It parses the string as JSON, extracts the date and price, and returns the newly constructed object.
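Because the parser is just a function of its input, it can be exercised locally without deploying anything. The snippet below is a minimal sketch of that idea: the handler body is copied inline so the example is self-contained, and the sample payload is a trimmed copy of the feed shown earlier. The empty context object and the `result` variable are assumptions made purely for illustration.

```javascript
'use strict';

// Copy of the parsing handler, inlined so this sketch runs on its own
function handler(event, context, callback) {
  let json  = JSON.parse(event);
  let date  = new Date(json.time.updatedISO).getTime();
  let price = json.bpi.USD.rate;
  callback(null, { date, price });
}

// Trimmed copy of the feed, passed as a raw string just like the fetch function returns it
const sample = JSON.stringify({
  bpi: { USD: { rate: '964.7425' } },
  time: { updatedISO: '2017-01-05T21:40:00+00:00' }
});

// Invoke the handler directly with an empty stand-in for the Lambda context
let result;
handler(sample, {}, (err, item) => {
  if (err) throw err;
  result = item;
  console.log(item);
});
```

Note that the price comes through as the string from the feed's rate field, exactly as the handler above extracts it.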

The last function we'll create is used to insert the data into the database.

'use strict';

let AWS = require('aws-sdk');
let docClient = new AWS.DynamoDB.DocumentClient();

module.exports.handler = function(event, context, callback) {
  docClient.put({
    TableName: 'PriceIndex',
    Item: event
  }, (err) => {
    if(err) callback(err);
    else    callback(null, event);
  });
};

This method will accept the input object that was generated by the parsing function and insert it into the database. Pretty cool.

Now, you may be asking yourself how we glue all of this together. That is exactly the right question.

The downside to breaking apart these functions is that we need to consider orchestration. At the 2015 re:Invent there were numerous talks on patterns for building complex applications. To list a few:

  • direct Lambda invocation (Lambda A blocks on call to Lambda B)
  • using SNS/SQS to pass events and messages
  • using Kinesis to pass data
  • using S3 to pass data

I won't focus on these because frankly, we have a solution in this article for handling the orchestration... Step Functions. They prevent us from having to write orchestration code. So let's get started with a Step Function that calls our three Lambda Functions.
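For contrast, here is roughly the kind of glue code you end up writing when you orchestrate by hand. This is only a sketch under some loud assumptions: the three handlers are local stubs (no HTTP request, no DynamoDB), and `runPipeline` is a hypothetical helper, not part of any AWS library.

```javascript
'use strict';

// Stub for the fetch handler: returns a trimmed copy of the feed as a string
const fetchFeed = (event, context, callback) =>
  callback(null, JSON.stringify({
    bpi: { USD: { rate: '964.7425' } },
    time: { updatedISO: '2017-01-05T21:40:00+00:00' }
  }));

// Same logic as the parsing handler shown earlier
const parseFeed = (event, context, callback) => {
  const json = JSON.parse(event);
  callback(null, {
    date: new Date(json.time.updatedISO).getTime(),
    price: json.bpi.USD.rate
  });
};

// Stub for the persistence handler: an array stands in for the DynamoDB table
const stored = [];
const writeEntry = (event, context, callback) => {
  stored.push(event);
  callback(null, event);
};

// Hypothetical glue: run each step in order, feeding each output into the next step
function runPipeline(steps, input, done) {
  const next = (index, value) => {
    if (index === steps.length) return done(null, value);
    steps[index](value, {}, (err, output) => {
      if (err) return done(err); // stop at the first failing step
      next(index + 1, output);
    });
  };
  next(0, input);
}

let finalResult;
runPipeline([fetchFeed, parseFeed, writeEntry], {}, (err, result) => {
  if (err) throw err;
  finalResult = result;
  console.log('stored', result);
});
```

Even this toy version has to deal with sequencing and error propagation, and it has no retries, no state reporting, and no visibility into which step failed.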

We now have the pieces to create a Step Function.

Creating a Step Function

Before you can create one though, you will need to create a new IAM role that has the ability to invoke Lambda functions as shown below:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "lambda:InvokeFunction"
            ],
            "Resource": "*"
        }
    ]
}

The actual state machine definition uses the Amazon States Language, a JSON-based language, to define the various states and the transitions between those states.

In our example, we simply go from one state to the next. The state machine definition looks like this:

{
  "Comment": "Step Function that calls the Lambda",
  "StartAt": "Fetch",
  "States": {
    "Fetch": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:serverless-stepfunctions-dev-fetchFeed",
      "Next": "Parse"
    },
    "Parse": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:serverless-stepfunctions-dev-parseFeed",
      "Next": "Persist"
    },
    "Persist": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:serverless-stepfunctions-dev-writeEntry",
      "End": true
    }
  }
}

The state machine definition specifies the ARN for each of the Lambda functions that we previously created. You may have noticed that REGION and ACCOUNT_ID are placeholders for the actual values. You will need to modify the above template to match the actual ARNs of your functions.
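If you would rather not edit the placeholders by hand, one option is to generate the definition. The sketch below assumes the function names from the template above; `buildDefinition` is a hypothetical helper, and the region and account ID passed to it are dummy values you would replace with your own.

```javascript
'use strict';

// Hypothetical helper: builds the state machine definition for a region and account
function buildDefinition(region, accountId) {
  // Expands a short function name into a full Lambda ARN
  const arn = (name) =>
    `arn:aws:lambda:${region}:${accountId}:function:serverless-stepfunctions-dev-${name}`;

  return {
    Comment: 'Step Function that calls the Lambda',
    StartAt: 'Fetch',
    States: {
      Fetch:   { Type: 'Task', Resource: arn('fetchFeed'),  Next: 'Parse' },
      Parse:   { Type: 'Task', Resource: arn('parseFeed'),  Next: 'Persist' },
      Persist: { Type: 'Task', Resource: arn('writeEntry'), End: true }
    }
  };
}

// Dummy values: substitute your own region and account ID
const definition = buildDefinition('us-east-1', '123456789012');
console.log(JSON.stringify(definition, null, 2));
```

The printed JSON has the same shape as the hand-written definition, so it can be saved to a file or pasted into the console.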

At this point you can create the Step Function through the AWS console and execute it manually. This is a good way to test and verify that things are working.

If you're like me, you may feel dirty using the AWS UI to perform code-related tasks. Unfortunately, there are a few shortcomings.

Step Function Interactions

Step Functions aren't yet fully baked. There are some issues that make working with them challenging.

Command Line Access

Fortunately, the AWS command line tools do support step functions! You can access them via

aws stepfunctions

You'll be able to list, create, invoke, and delete Step Functions from the command line.

For creating a step function you can use the following:

aws stepfunctions create-state-machine \
--name update-feed \
--role-arn arn:aws:iam::<ACCOUNT_ID>:role/service-role/StatesExecutionRole-us-east-1 \
--definition "$(cat stepfunction.json)"

This allows me to define the Step Function in its JSON format in the stepfunction.json file. You will need to substitute the name of the state machine and the role as appropriate.

CloudFormation

Not yet supported :(

This means we can't create Step Functions in the resources block of serverless apps.

AWS SDK

Step Functions are available in the latest version of the SDK.

http://docs.aws.amazon.com/AWSJavaScriptSDK/latest/AWS/StepFunctions.html

So this means that we can access Step Functions from inside code!
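As a rough sketch, creating a state machine from code could look like the following. The `createStateMachine` call with `name`, `roleArn`, and `definition` parameters is part of the SDK's StepFunctions client, but everything else here is an assumption for illustration: the wrapper function is hypothetical, and a stub client stands in for `new AWS.StepFunctions()` so the example runs without credentials.

```javascript
'use strict';

// Hypothetical wrapper: `client` is expected to behave like `new AWS.StepFunctions()`
function createStateMachine(client, name, roleArn, definition, callback) {
  const params = {
    name,
    roleArn,
    definition: JSON.stringify(definition) // the API takes the definition as a JSON string
  };
  client.createStateMachine(params, callback);
}

// Stub client for illustration only; swap in a real SDK client to call AWS
const stubClient = {
  lastParams: null,
  createStateMachine(params, callback) {
    this.lastParams = params;
    callback(null, {
      stateMachineArn: `arn:aws:states:us-east-1:123456789012:stateMachine:${params.name}`
    });
  }
};

// Dummy role and a one-state definition, just to exercise the wrapper
const roleArn = 'arn:aws:iam::123456789012:role/service-role/StatesExecutionRole-us-east-1';
const definition = {
  StartAt: 'Fetch',
  States: {
    Fetch: {
      Type: 'Task',
      Resource: 'arn:aws:lambda:us-east-1:123456789012:function:serverless-stepfunctions-dev-fetchFeed',
      End: true
    }
  }
};

let created;
createStateMachine(stubClient, 'update-feed', roleArn, definition, (err, data) => {
  if (err) throw err;
  created = data;
  console.log(data.stateMachineArn);
});
```

Passing the client in as an argument keeps the wrapper easy to test; in a real deployment script you would construct the client from the bundled SDK instead of the stub.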

Invoke Step Functions from Serverless

The last piece of the puzzle is actually integrating Step Function invocation from inside a Lambda function controlled by serverless. There are a few hoops we need to jump through here first though.

  • As previously mentioned, Step Functions are not supported by CloudFormation yet. This means that you'll need to create your Step Function from the command line or from code.

  • Once the state machine is created, you will need to configure an IAM statement in your serverless application to grant state execution rights to your serverless Lambda functions.

iamRoleStatements:
  - Effect: Allow
    Action:
      - states:StartExecution
    Resource: "*"

  • You will need to include the latest version of the aws-sdk as part of your function so that you have access to the SDK invocation functions. Unfortunately, the Node Lambda runtime includes SDK version 2.6.9, which does not have Step Functions support. Bundling the newer SDK adds almost 3.5MB to your deployment.

With all that you can finally code your service to invoke your Step Function:

'use strict';

let AWS = require('aws-sdk');
let stepfunctions = new AWS.StepFunctions();

module.exports.invoke = function(event, context, callback) {
  let accountId = context.invokedFunctionArn.split(':')[4];
  let params = {
    stateMachineArn: `arn:aws:states:us-east-1:${accountId}:stateMachine:update-feed`,
    input: JSON.stringify({ }),
  };
  stepfunctions.startExecution(params, callback);
};

The above function simply invokes the named state machine. You can pass input into the state machine. In this example, it simply passes in an empty object.
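The region in that ARN is hardcoded. A small variation, sketched below, derives both the region and the account ID from the invoking function's own ARN. The helper name `stateMachineArn` is hypothetical, and it assumes the state machine lives in the same region and account as the Lambda that starts it.

```javascript
'use strict';

// Hypothetical helper: derives a state machine ARN from the Lambda context.
// Lambda ARNs look like arn:aws:lambda:REGION:ACCOUNT_ID:function:NAME
function stateMachineArn(context, name) {
  const parts = context.invokedFunctionArn.split(':');
  const region = parts[3];
  const accountId = parts[4];
  return `arn:aws:states:${region}:${accountId}:stateMachine:${name}`;
}

// Fake context with a dummy account ID, standing in for what Lambda passes at runtime
const fakeContext = {
  invokedFunctionArn: 'arn:aws:lambda:eu-west-1:123456789012:function:serverless-stepfunctions-dev-invoke'
};

const derivedArn = stateMachineArn(fakeContext, 'update-feed');
console.log(derivedArn);
```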

If you invoke this Lambda function, it will trigger the State Machine. The Lambda will immediately complete once the State Machine is started.

Conclusion

Step Functions hold promise for simplifying Lambda interaction. There are a few hurdles to clear, but considering this technology is less than a month old, it's safe to assume there will be some improvements! Hopefully this article gets you started on the path of using them.

For a full working example check out:
https://github.com/bmancini55/serverless-stepfunctions

The official support for Step Functions in Serverless can be tracked via GitHub Issue 3024:
https://github.com/serverless/serverless/issues/3024