AWS AppSync for Long-Running Amazon Bedrock Invocations
Amazon Bedrock is a managed generative AI service for building and scaling generative AI applications with foundation models (FMs). GraphQL with AWS AppSync is a good fit for integrating applications with a large language model (LLM). AWS AppSync supports WebSockets out of the box through subscriptions, which makes streaming responses from LLMs a much simpler task.
I implemented a use case in which an AppSync mutation takes more than 30 seconds to complete. Because a synchronously invoked Lambda function must return its response before the request times out, one common design pattern has AppSync invoke a Lambda function synchronously, and that function only places the contents of the query on an SQS queue. The message is then picked up and processed by a second Lambda function, which feeds the prompt into Amazon Bedrock. Invoking the Lambda function asynchronously removes the need for this workaround. This post discusses how to implement that design, which keeps a Bedrock application responsive and scalable by running the model invocation in the background.
The above diagram shows a reference architecture for invoking the API in the background. Because the Lambda function is invoked asynchronously from AWS AppSync, AppSync receives an acknowledgement immediately while the function continues working. The function sends the prompt to Amazon Bedrock and then invokes a mutation on AWS AppSync with the response tokens from Bedrock, and AppSync delivers those tokens to the user over WebSockets.
Flow of the architecture:
- The user starts a subscription, which sets up a WebSocket connection, and makes a request to AWS AppSync to trigger the invocation.
- AWS AppSync calls your AWS Lambda function in Event mode and immediately returns a response to the client.
- Your Lambda function invokes the model on Amazon Bedrock. The Lambda function can use a synchronous API, such as InvokeModel, or a streaming API, such as InvokeModelWithResponseStream, to get progressive updates.
- As updates are received, or when the invocation completes, the Lambda function sends updates via mutations to your AWS AppSync API, which triggers subscriptions.
- The subscription events are sent in real-time and received by your client over the WebSocket.
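The Lambda-side part of this flow (invoking the model, then forwarding updates as mutations) can be sketched roughly as follows. This is a minimal illustration, not the implementation from this post: `streamModelToAppSync`, `invokeModelStream`, and `sendChunk` are hypothetical names, and the Bedrock call and the AppSync mutation are passed in as functions so the loop can be tried locally without AWS credentials.

```javascript
// Sketch of the Lambda-side work: read chunks from the model and forward each
// one to AppSync. Both helpers are hypothetical and injected by the caller:
//   invokeModelStream(prompt) -> async iterable of text chunks (for example,
//                                a wrapper around InvokeModelWithResponseStream)
//   sendChunk(text)           -> Promise that resolves once the mutation is sent
async function streamModelToAppSync(prompt, invokeModelStream, sendChunk) {
  let sent = 0;
  for await (const text of invokeModelStream(prompt)) {
    if (!text) continue; // skip empty chunks
    await sendChunk(text); // awaiting each send keeps chunks in order
    sent += 1;
  }
  return sent; // number of chunks forwarded
}

// Stand-in for a Bedrock streaming call, for local experimentation only.
async function* fakeModelStream(prompt) {
  for (const word of prompt.split(" ")) {
    yield word;
  }
}
```

Calling `streamModelToAppSync("hello from bedrock", fakeModelStream, async (t) => console.log(t))` forwards one chunk per word. In a real function, the two helpers would be replaced by a Bedrock client call and a signed request to the AppSync API.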
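The AppSync resolver that triggers the asynchronous invocation looks like this:
export function request(ctx) {
  return {
    operation: "Invoke",
    // "Event" tells AppSync to invoke the Lambda function asynchronously
    invocationType: "Event",
    payload: {
      variables: ctx.arguments,
    },
  };
}

export function response(ctx) {
  // Static acknowledgement; the model output arrives later via the subscription
  return "OK";
}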
To invoke an AWS Lambda function asynchronously from AppSync, we only need to change the request that the resolver sends to the function. Setting invocationType to Event makes the resolver call the Lambda function asynchronously. The response function of the resolver can return a static response to indicate to the caller that AppSync has received the request for processing.
The Lambda data source allows you to define two invocation types: RequestResponse and Event. The invocation types are synonymous with the invocation types defined in the Lambda API. The RequestResponse invocation type lets AWS AppSync call your Lambda function synchronously and wait for a response. The Event invocation type allows you to invoke your Lambda function asynchronously. For more information on how Lambda handles Event invocation type requests, see Asynchronous invocation. The invocationType field is optional. If this field is not included in the request, AWS AppSync defaults to the RequestResponse invocation type.
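To make the difference between the two invocation types concrete, the sketch below builds both request shapes side by side. `buildInvokeRequest` is a hypothetical helper written as plain JavaScript for local experimentation. In an actual resolver, the returned object would be what the exported `request(ctx)` handler returns.

```javascript
// Hypothetical helper: builds the object a Lambda resolver's request handler
// returns. When `asynchronous` is false, invocationType is left out, so
// AppSync falls back to its default of "RequestResponse".
function buildInvokeRequest(args, asynchronous) {
  const request = {
    operation: "Invoke",
    payload: { variables: args },
  };
  if (asynchronous) {
    request.invocationType = "Event";
  }
  return request;
}

// Synchronous: AppSync waits for the Lambda function's result.
const syncRequest = buildInvokeRequest({ prompt: "Hello" }, false);

// Asynchronous: AppSync hands the event to Lambda and does not wait.
const asyncRequest = buildInvokeRequest({ prompt: "Hello" }, true);
```

The only difference between the two objects is the `invocationType` field, which is why switching an existing resolver from synchronous to asynchronous invocation is a small change.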
The sample GraphQL schema can be defined as below:
schema {
  mutation: Mutation
  subscription: Subscription
}

type Mutation {
  ModelSet: Boolean!
  sendChunk(chunk: String!): String
}

type Subscription {
  chat: String @aws_subscribe(mutations: ["sendChunk"])
}
The sample above is a minimal setup that focuses on the mutations and subscriptions needed for the implementation; a complete GraphQL schema also requires a Query root type, which is not shown here. The ModelSet mutation is used to trigger our model, and the sendChunk mutation is used to handle streaming responses. The chat subscription listens for updates from the sendChunk mutation, allowing the client to emulate a streaming response.
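As a rough sketch of how the Lambda function might call the sendChunk mutation, the code below posts a GraphQL request to the AppSync endpoint. It rests on several assumptions that are not part of this post: the API uses API key authorization (the `x-api-key` header), the caller supplies `endpoint` and `apiKey`, and the sendChunk resolver returns the chunk it receives. The names `buildSendChunkBody` and `sendChunk` are hypothetical, and an API secured with IAM would need a signed request instead.

```javascript
// GraphQL document for the sendChunk mutation defined in the schema.
const SEND_CHUNK_MUTATION = `
  mutation SendChunk($chunk: String!) {
    sendChunk(chunk: $chunk)
  }
`;

// Builds the JSON body of a GraphQL-over-HTTP POST for one chunk.
function buildSendChunkBody(chunk) {
  return JSON.stringify({
    query: SEND_CHUNK_MUTATION,
    variables: { chunk },
  });
}

// Sends one chunk to AppSync. `fetchImpl` defaults to the global fetch
// available in Node.js 18+ and can be swapped for a fake in local tests.
async function sendChunk(endpoint, apiKey, chunk, fetchImpl = fetch) {
  const res = await fetchImpl(endpoint, {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "x-api-key": apiKey, // assumption: API key authorization
    },
    body: buildSendChunkBody(chunk),
  });
  const json = await res.json();
  if (json.errors) {
    throw new Error(`sendChunk failed: ${JSON.stringify(json.errors)}`);
  }
  return json.data.sendChunk;
}
```

Each successful call triggers the chat subscription, so every connected client receives the chunk over its WebSocket.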
Conclusion
AWS AppSync support for asynchronous Lambda invocations is a significant step forward for building modern, responsive, and scalable applications. It provides architectural flexibility, allowing you to design applications that are robust and efficient. Keep in mind that the final response payload for subscriptions cannot exceed 240 KB, so long model outputs need careful design, for example by sending the response in smaller chunks.