AWS Lambda SnapStart - Reduce Cold Starts by up to 90%

2024.12.23

What is Lambda SnapStart

AWS Lambda SnapStart is a feature that significantly reduces startup time by saving an initialized snapshot of the function and restoring it during cold starts.

Supported Languages

Runtime          | Support Status
Java 11, 17, 21  | Fully Supported
Python 3.12+     | Supported
.NET 8           | Supported
Node.js          | Coming Soon

How It Works

Normal Cold Start:
1. Container startup
2. Runtime initialization
3. Function code loading
4. Initialization code execution
5. Handler execution
→ Total: 2-10 seconds (for Java)

SnapStart:
1. Restore cached snapshot
2. Handler execution
→ Total: 200-500ms
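
The difference between the two paths can be illustrated with a toy model. This is only an analogy, not how Lambda is implemented: `initialize`, `cold_start`, and `snapstart` are hypothetical names, the sleep stands in for slow startup work, and a deep copy stands in for restoring the cached snapshot.

```python
import copy
import time


def initialize():
    """Stand-in for steps 1-4 of a normal cold start."""
    time.sleep(0.05)  # pretend that startup and init code are slow
    return {"config": {"table": "example"}, "ready": True}


def handler(state, event):
    """Stand-in for handler execution, the last step on both paths."""
    return {"ok": state["ready"], "echo": event}


def cold_start(event):
    state = initialize()  # paid again on every cold start
    return handler(state, event)


# Taken once, ahead of time, instead of on the request path.
SNAPSHOT = initialize()


def snapstart(event):
    state = copy.deepcopy(SNAPSHOT)  # restore instead of re-initializing
    return handler(state, event)
```

Both paths return the same result. The SnapStart path simply skips the initialization work because that work was already captured in the snapshot.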

Snapshot Lifecycle

flowchart TB
    subgraph FirstDeploy["Version Publish"]
        F1["1. Initialize the function"]
        F2["2. Snapshot the memory state after initialization"]
        F3["3. Save snapshot to cache"]
        F1 --> F2 --> F3
    end

    subgraph LaterStart["Subsequent Startups"]
        L1["1. Restore snapshot from cache"]
        L2["2. Execute afterRestore hooks"]
        L3["3. Execute handler"]
        L1 --> L2 --> L3
    end

    FirstDeploy --> LaterStart
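
The order of events in the lifecycle can be sketched as a small simulation. `ToyRuntime` and its methods are hypothetical names invented for illustration and are not an AWS API. The sketch also includes the pre-snapshot hook described under Runtime Hooks, and it assumes a single hook of each kind.

```python
class ToyRuntime:
    """Hypothetical stand-in for the Lambda service; not an AWS API."""

    def __init__(self):
        self.before_snapshot_hooks = []
        self.after_restore_hooks = []
        self.log = []

    def publish_version(self, init_fn):
        # Initialize, run pre-snapshot hooks, then save the snapshot.
        init_fn(self)
        self.log.append("initialized")
        for hook in self.before_snapshot_hooks:
            hook()
        self.log.append("snapshot saved")

    def cold_start(self, handler, event):
        # Restore the snapshot, run post-restore hooks, then run the handler.
        self.log.append("snapshot restored")
        for hook in self.after_restore_hooks:
            hook()
        return handler(event)


runtime = ToyRuntime()


def init(rt):
    # Initialization code registers its hooks, as a real function would.
    rt.before_snapshot_hooks.append(lambda: rt.log.append("beforeCheckpoint"))
    rt.after_restore_hooks.append(lambda: rt.log.append("afterRestore"))


def handler(event):
    runtime.log.append("handler")
    return event


runtime.publish_version(init)
result = runtime.cold_start(handler, {"id": 1})
print(runtime.log)
```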

Configuration Methods

AWS Console

Lambda > Functions > Configuration > General configuration > SnapStart
→ Select "PublishedVersions"

AWS SAM

# template.yaml
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: java21
      Handler: com.example.Handler::handleRequest
      SnapStart:
        ApplyOn: PublishedVersions
      AutoPublishAlias: live

Terraform

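resource "aws_lambda_function" "example" {
  function_name = "my-function"
  runtime       = "java21"
  handler       = "com.example.Handler::handleRequest"
  # role, filename, and other required arguments omitted for brevity

  # SnapStart only applies to published versions; without this,
  # version resolves to $LATEST and no snapshot is created
  publish = true

  snap_start {
    apply_on = "PublishedVersions"
  }
}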

resource "aws_lambda_alias" "live" {
  name             = "live"
  function_name    = aws_lambda_function.example.function_name
  function_version = aws_lambda_function.example.version
}
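
The same setting can also be applied from a script. The following is a minimal sketch using the boto3 Lambda client; `enable_snapstart` is a hypothetical helper name, and the client is passed in so the function can be exercised without AWS credentials.

```python
def enable_snapstart(lambda_client, function_name):
    """Turn on SnapStart, wait for the update, then publish a version."""
    lambda_client.update_function_configuration(
        FunctionName=function_name,
        SnapStart={"ApplyOn": "PublishedVersions"},
    )
    # The configuration update is asynchronous, so wait before publishing.
    waiter = lambda_client.get_waiter("function_updated_v2")
    waiter.wait(FunctionName=function_name)
    # The snapshot is created for the version published here.
    response = lambda_client.publish_version(FunctionName=function_name)
    return response["Version"]


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   version = enable_snapstart(boto3.client("lambda"), "my-function")
```

An alias such as `live` still has to point at the returned version, as in the SAM and Terraform examples.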

Runtime Hooks

Java (CRaC)

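import java.sql.Connection;

import com.amazonaws.services.lambda.runtime.Context;
import com.amazonaws.services.lambda.runtime.RequestHandler;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyRequestEvent;
import com.amazonaws.services.lambda.runtime.events.APIGatewayProxyResponseEvent;
import org.crac.Core;
import org.crac.Resource;

public class Handler implements RequestHandler<APIGatewayProxyRequestEvent, APIGatewayProxyResponseEvent>, Resource {

    private Connection dbConnection;

    public Handler() {
        // Register during initialization
        Core.getGlobalContext().register(this);
        // Establish DB connection
        // (createConnection() is an application-specific helper, not shown)
        this.dbConnection = createConnection();
    }

    // org.crac.Context is fully qualified because the Lambda runtime
    // also defines a Context type, used by handleRequest below
    @Override
    public void beforeCheckpoint(org.crac.Context<? extends Resource> context) throws Exception {
        // Close DB connection before snapshot
        dbConnection.close();
    }

    @Override
    public void afterRestore(org.crac.Context<? extends Resource> context) throws Exception {
        // Re-establish DB connection after restore
        this.dbConnection = createConnection();
    }

    @Override
    public APIGatewayProxyResponseEvent handleRequest(
            APIGatewayProxyRequestEvent event,
            Context context) {
        // Handler logic goes here; a placeholder response keeps the class compilable
        return new APIGatewayProxyResponseEvent().withStatusCode(200);
    }
}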

Python

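import boto3
from aws_lambda_powertools import Logger
# Runtime hook library included in the Lambda Python managed runtime
from snapshot_restore_py import register_after_restore

logger = Logger()

# Global variable (included in snapshot)
db_client = None

def init_db():
    global db_client
    db_client = boto3.client('dynamodb')

# Execute during initialization phase
init_db()

# If re-initialization is needed after restore, register the function
# as a hook; without registration it is never called
@register_after_restore
def on_restore():
    global db_client
    # Refresh connection
    db_client = boto3.client('dynamodb')
    logger.info("Recreated DynamoDB client after restore")

def handler(event, context):
    # Use db_client restored from snapshot
    return db_client.get_item(...)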

Important Notes

Ensuring Uniqueness

// Bad example: Value from snapshot time is reused
private final String uniqueId = UUID.randomUUID().toString();

// Good example: Generate per request
public APIGatewayProxyResponseEvent handleRequest(...) {
    String uniqueId = UUID.randomUUID().toString();
    // ...
}
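
The same pitfall applies to Python functions. A minimal sketch, with `INIT_TIME_ID` as a hypothetical module-level value: anything computed at import time is frozen into the snapshot, while values computed inside the handler are fresh on each invocation.

```python
import uuid

# Evaluated once during initialization, so it is captured in the snapshot
# and shared by every execution environment restored from it.
INIT_TIME_ID = str(uuid.uuid4())


def handler(event, context):
    # Generated inside the handler, so it differs on every invocation.
    request_id = str(uuid.uuid4())
    return {"init_time_id": INIT_TIME_ID, "request_id": request_id}


first = handler({}, None)
second = handler({}, None)
print(first["init_time_id"] == second["init_time_id"])  # True
print(first["request_id"] == second["request_id"])      # False
```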

Network Connections

// Close connections before snapshot
// (HttpClient.close() requires Java 21 or later)
@Override
public void beforeCheckpoint(org.crac.Context<? extends Resource> context) throws Exception {
    httpClient.close();
    dbConnection.close();
}

// Reconnect after restore
@Override
public void afterRestore(org.crac.Context<? extends Resource> context) throws Exception {
    httpClient = HttpClient.newHttpClient();
    dbConnection = dataSource.getConnection();
}
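
The close-then-reconnect pattern can be tried locally without AWS. In this sketch an in-memory SQLite connection stands in for a real database connection, and `ConnectionHolder` with its two hook methods is a hypothetical class; in a real function the methods would be registered as runtime hooks.

```python
import sqlite3


class ConnectionHolder:
    """Owns a connection that must not be carried across a snapshot."""

    def __init__(self):
        self.conn = sqlite3.connect(":memory:")

    def before_snapshot(self):
        # Close the connection so a dead socket is not frozen into the snapshot.
        self.conn.close()

    def after_restore(self):
        # Open a fresh connection in the restored environment.
        self.conn = sqlite3.connect(":memory:")

    def query(self):
        return self.conn.execute("SELECT 1").fetchone()[0]


holder = ConnectionHolder()
holder.before_snapshot()
try:
    holder.query()
except sqlite3.ProgrammingError:
    print("connection is closed until after_restore runs")
holder.after_restore()
print(holder.query())
```

Using the connection between the two hooks fails, which is why the restore hook has to run before the handler touches it.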

Performance Comparison

Spring Boot (Java 17):
- Normal: 6,000ms
- SnapStart: 400ms (93% reduction)

Quarkus (Java 17):
- Normal: 1,500ms
- SnapStart: 200ms (87% reduction)

Python:
- Normal: 800ms
- SnapStart: 150ms (81% reduction)
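
The reduction percentages follow directly from the two timings. A short check of the arithmetic, using only the figures listed above (`reduction_percent` is a hypothetical helper name):

```python
def reduction_percent(normal_ms, snapstart_ms):
    """Share of the normal cold start time that SnapStart removes."""
    return round((normal_ms - snapstart_ms) / normal_ms * 100)


figures = {
    "Spring Boot (Java 17)": (6000, 400),
    "Quarkus (Java 17)": (1500, 200),
    "Python": (800, 150),
}

for name, (normal_ms, snapstart_ms) in figures.items():
    print(f"{name}: {reduction_percent(normal_ms, snapstart_ms)}% reduction")
```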

Pricing

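No Additional Cost for Java
- SnapStart itself is free on Java managed runtimes
- Normal Lambda charges only
- Snapshot storage is also free

Python and .NET
- AWS charges for caching the snapshot and for each restore, on top of normal Lambda charges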

Best Practices

✓ Move heavy processing into the initialization phase so that it is captured in the snapshot
✓ Properly implement beforeCheckpoint/afterRestore hooks
✓ Generate unique values per request
✓ Refresh connection pools after restore
✓ Use versions/aliases

Summary

Lambda SnapStart dramatically reduces cold start latency. It is particularly effective for runtimes with long startup times such as Java, where it can be used at no additional cost. With runtime hooks implemented correctly, it can be used safely in production environments.
