Distributed QuantLib using AWS Lambda

Here I present a proof of concept for running QuantLib functions in AWS Lambda.

AWS Lambda offers an exciting way to leverage distributed computing without worrying about infrastructure or server provisioning. All you need to do is upload your Lambda function and invoke it through one of the supported triggers. It scales automatically with your workload, and you pay only for the time your code actually runs in Lambda, billed in 100 ms increments.

Your AWS Lambda function runs in Amazon’s custom Linux environment, which is available as a machine image called the Amazon Linux AMI. You must first compile QuantLib in that environment. You can either spin up an EC2 instance to do that or download and host the VirtualBox image of Amazon Linux. I went the second route. Out of the box, Amazon Linux is pretty bare-bones: you need to install the “Development Tools” package group and upgrade Python to 3.7. From then on, you compile QuantLib as usual and create the Python bindings.

When you upload your Lambda function to AWS, you have two choices when it comes to your dependencies: you can package them along with your function, or you can use Lambda Layers. The advantage of Layers is that you can move your dependencies into a separate package and reuse that layer across functions. It also keeps the function package itself small, which helps with the deployment size limit (250 MB unzipped, a quota that counts the function and its layers together). Keep in mind that layers are zip files that the AWS environment unzips before running your function. Since QuantLib compiles into .so files, we need to make sure they are unzipped into a path on the Lambda instance where the Lambda runtime can find them.
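To make the path requirement concrete: Lambda extracts layers under /opt, the Python runtime searches /opt/python, and the dynamic loader searches /opt/lib. The following is a minimal sketch of packaging a layer zip with that layout using only the standard library. The function name `build_layer_zip` and its arguments are hypothetical, and this is not necessarily how the layers in this post were built.

```python
import zipfile
from pathlib import Path


def build_layer_zip(out_path, python_files, shared_libs):
    """Write a layer zip whose entries unpack into /opt/python and /opt/lib.

    python_files and shared_libs are iterables of local file paths
    (hypothetical inputs for this sketch).
    """
    with zipfile.ZipFile(out_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for f in python_files:
            # Unpacked to /opt/python/<name>, which is on sys.path in Lambda.
            zf.write(f, arcname=f"python/{Path(f).name}")
        for f in shared_libs:
            # Unpacked to /opt/lib/<name>, which is on LD_LIBRARY_PATH.
            zf.write(f, arcname=f"lib/{Path(f).name}")
    return out_path
```

With this layout, Python modules and extension modules go under python/, while the shared libraries they link against go under lib/.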

I have already gone through this exercise and have made the resulting Layers public, as shown in the table below. You will need to reference their ARNs (Amazon Resource Names) if you want to add these layers to your Lambdas. I have also used a publicly available NumPy layer from Klayers in my Lambda functions to provide NumPy functionality.

Layer                            ARN
QuantLib116-java-java-layer      arn:aws:lambda:us-east-1:734853675260:layer:QuantLib116-java-java-layer:1
QuantLib116-native-java-layer    arn:aws:lambda:us-east-1:734853675260:layer:QuantLib116-native-java-layer:1
QuantLib116-native-python-layer  arn:aws:lambda:us-east-1:734853675260:layer:QuantLib116-native-python-layer:1
QuantLib116-python-python-layer  arn:aws:lambda:us-east-1:734853675260:layer:QuantLib116-python-python-layer:1
Klayers-python37-numpy           arn:aws:lambda:us-east-1:113088814899:layer:Klayers-python37-numpy:1

If you are developing a Python 3.7 Lambda function, you will need to include both of the Python layers from the table above.
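One way to attach the layers is through boto3's `update_function_configuration` call. The sketch below assumes that "both of the Python layers" means the two QuantLib116 layers with "python" as the last part of their names, plus the NumPy layer. The helper name `attach_layers` is hypothetical, and the client is passed in so the function can be exercised without AWS credentials.

```python
# ARNs copied from the table above.
PYTHON_LAYER_ARNS = [
    "arn:aws:lambda:us-east-1:734853675260:layer:QuantLib116-native-python-layer:1",
    "arn:aws:lambda:us-east-1:734853675260:layer:QuantLib116-python-python-layer:1",
    "arn:aws:lambda:us-east-1:113088814899:layer:Klayers-python37-numpy:1",
]


def attach_layers(lambda_client, function_name, layer_arns=PYTHON_LAYER_ARNS):
    """Set the function's layer list (this replaces any layers already attached)."""
    return lambda_client.update_function_configuration(
        FunctionName=function_name,
        Layers=list(layer_arns),
    )
```

In practice you would call it as `attach_layers(boto3.client("lambda", region_name="us-east-1"), "my-worker")`, where "my-worker" is a placeholder function name.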

Now comes the fun part of writing the Lambda functions. For this demo, I am going to borrow from Mikael Katajamäki’s excellent blog post “Exposure Simulation”, which shows swap exposure simulations for computing expected positive exposure using the Hull-White one-factor model. On my laptop, the script takes about seven and a half minutes to compute swap NPVs for 500 paths and 262 dates for one swap.

I have re-engineered the original code so that there is one “worker” function that takes in a path and computes swap NPVs over the date grid; the time step for the simulations is 1 week. There is one “controller” Lambda that generates the 500 random paths using QuantLib’s HullWhiteProcess and then calls the worker Lambdas asynchronously. A local script running on my laptop uses the boto3 library to call the controller Lambda and kick off the entire workflow. Each worker Lambda writes the NPVs it has calculated to a file in an S3 bucket that is specified as an environment variable of the Lambda.

For the purposes of this demo, the local script keeps polling that S3 bucket to see how many files are available. Once there are 500 files (one per path), the script assumes all the work is done and aggregates the results. I am sure there are better ways to signal when all the worker Lambdas have finished, probably using DynamoDB atomic counters or SQS/SNS queues.
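The polling and aggregation step can be sketched roughly as follows. This is a minimal illustration rather than the script from the repository. The function names, the timeout, and the polling interval are all hypothetical. The exposure profile is taken here as the average of max(NPV, 0) across paths at each date. The S3 client is passed in so the logic can be tried without AWS access.

```python
import time


def count_objects(s3_client, bucket, prefix=""):
    """Count keys under a prefix, following pagination (up to 1000 keys per page)."""
    paginator = s3_client.get_paginator("list_objects_v2")
    total = 0
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        total += page.get("KeyCount", 0)
    return total


def wait_for_results(s3_client, bucket, expected=500, prefix="",
                     poll_seconds=5.0, timeout_seconds=900.0, sleep=time.sleep):
    """Block until at least `expected` result files exist, or raise TimeoutError."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        done = count_objects(s3_client, bucket, prefix)
        if done >= expected:
            return done
        if time.monotonic() >= deadline:
            raise TimeoutError(f"only {done} of {expected} result files found")
        sleep(poll_seconds)


def expected_positive_exposure(npv_paths):
    """npv_paths[i][j] is the swap NPV on path i at date j.

    Returns one value per date: the mean over paths of max(NPV, 0).
    """
    n_paths = len(npv_paths)
    return [sum(max(npv, 0.0) for npv in column) / n_paths
            for column in zip(*npv_paths)]
```

The timeout matters because a worker that fails never writes its file, and without it the loop would wait forever for the 500th result.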

To pass the required parameters from the controller to the worker Lambdas, I save the intermediate data to files on S3. For example, the dates and discount factors needed to build the market term structure are loaded from an S3 bucket in both Lambdas. The simulated fixings generated in the controller are saved to S3 and then read by the worker Lambdas. The bucket locations and file names in S3 are specified as environment variables.
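As a rough illustration of this data flow, a worker might look something like the sketch below. Everything specific here is an assumption made for the example: the environment variable names, the JSON file format, the "npv/" key layout, and the `price_path` callback that stands in for the QuantLib swap pricing. The actual code is in the repository linked below.

```python
import json
import os


def load_json(s3_client, bucket, key):
    """Fetch an S3 object and parse it as JSON."""
    body = s3_client.get_object(Bucket=bucket, Key=key)["Body"].read()
    return json.loads(body)


def save_npvs(s3_client, bucket, path_id, npvs):
    """Store one path's NPVs as a JSON file and return its key."""
    key = f"npv/path-{path_id:04d}.json"  # hypothetical key layout
    s3_client.put_object(Bucket=bucket, Key=key,
                         Body=json.dumps(npvs).encode("utf-8"))
    return key


def run_worker(event, s3_client, price_path, env=os.environ):
    """Core of a worker: read shared inputs, price one path, store the NPVs.

    price_path(curve, fixings) is a placeholder for the QuantLib pricing code.
    """
    curve = load_json(s3_client, env["INPUT_BUCKET"], env["CURVE_KEY"])
    fixings = load_json(s3_client, env["INPUT_BUCKET"], env["FIXINGS_KEY"])
    path_id = event["path_id"]
    npvs = price_path(curve, fixings[path_id])
    return save_npvs(s3_client, env["OUTPUT_BUCKET"], path_id, npvs)
```

A real handler would create the client once at module level with `boto3.client("s3")` and call `run_worker(event, s3, price_path)` from `lambda_handler(event, context)`.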

All the code is available at https://github.com/suhasghorp/QuantLib-Lambda

Here are some relevant portions of the controller Lambda function:

Here is how Controller Lambda calls Worker Lambdas:
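The author's own code is in the repository linked above. As a stand-in, here is a minimal sketch of an asynchronous fan-out with boto3, assuming each worker only needs to be told which path to price. The function name `fan_out` and the payload shape are hypothetical. `InvocationType="Event"` is what makes the call asynchronous, and Lambda answers with status code 202 once the event is queued.

```python
import json


def fan_out(lambda_client, worker_name, n_paths):
    """Fire one asynchronous worker invocation per simulated path.

    Returns how many invocations Lambda accepted.
    """
    accepted = 0
    for path_id in range(n_paths):
        response = lambda_client.invoke(
            FunctionName=worker_name,
            InvocationType="Event",  # asynchronous: do not wait for the result
            Payload=json.dumps({"path_id": path_id}).encode("utf-8"),
        )
        if response["StatusCode"] == 202:  # 202 = accepted for async execution
            accepted += 1
    return accepted
```

Inside the controller this would be called as `fan_out(boto3.client("lambda"), "worker-function-name", 500)`, with the worker name being a placeholder.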

Starting from cold Lambdas, the time taken to compute swap NPVs for 500 paths and 262 dates is about 2 minutes, compared to seven and a half minutes on my laptop. After kicking off the controller Lambda, I wait for a minute for the output S3 bucket to have some files; this could be optimized by using another signalling mechanism such as queues. The Lambdas themselves could also be warmed up beforehand so that dependencies are loaded and they are ready to go, which could save additional time. In the end, I think the AWS ecosystem presents an interesting alternative to multi-threading or GPU-based computing solutions.