Introducing: Adaptive Scaling and Function Chaining Auto-Scaling
Today we announce two new features: Adaptive Scaling lets developers define a scaling graph for a specific function, and Function Chaining Auto-Scaling scales the functions that are part of a request flow, providing a completely warm function chain.
December 04, 2019
Cold start is one of the biggest pain-points FaaS users experience. To overcome this, FaaS platforms keep functions “warm” after the first invocation to try and minimize the penalty of cold starts.
Over time, developers started using this behavior to their advantage and pre-warmed functions.
Keeping functions warm, however, is a limited solution: it does not scale, and it does not solve problems with spiky traffic.
Nuweba overcame this obstacle by offering platform performance that is more than 10x faster than other leading platforms. In addition, Nuweba offers warm functions without re-using instances, which eliminates operational issues (memory leaks, unpredictable bugs) and security issues (function poisoning via RCE).
FaaS platforms like AWS Lambda, Azure Functions, and GCF offer predictable scaling: either one instance that can handle one request/invocation (for example, AWS Lambda), or one instance that can handle multiple requests (for example, Azure Functions). Platforms that deploy one instance per request are limited to 1:1 scaling, and every new request that is not met by an available instance forces the system to scale by one, with the dreaded cold start.
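To see why 1:1 scaling hurts, here is a minimal sketch (not any platform's actual implementation) of how cold starts accumulate when each instance serves exactly one request:

```python
# Hypothetical model of strict 1:1 scaling: every concurrent request
# that finds no warm instance triggers exactly one cold start.

def cold_starts_one_to_one(concurrent_requests, warm_instances=0):
    """Count cold starts when each instance serves exactly one request."""
    return max(0, concurrent_requests - warm_instances)

# A burst of 100 concurrent requests against 10 pre-warmed instances
# still incurs 90 cold starts under a strict 1:1 scaling model.
print(cold_starts_one_to_one(100, warm_instances=10))  # 90
```

The point of the sketch: pre-warming only shifts the threshold; any burst larger than the warm pool still pays the cold-start penalty one request at a time.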
Today we announce the next advancement in our cutting edge FaaS performance technology - Adaptive Scaling and Function Chaining Auto-Scaling.
Enabled by our ultra-fast invocation overhead (and specifically by the platform’s ability to provide invocation overhead that is agnostic to runtime, code size, and package size), there is no need to pre-warm functions. Instead, we can simply customize the scaling behavior to minimize tail latency.
Adaptive Scaling allows developers to define a scaling graph (envelope) for a specific function that will instruct our auto-scaler on how to scale, throttle and cool-down the function.
Adaptive Scaling for a function is not reserved: each function can scale up to the account’s maximum concurrency, but the account-wide limit on concurrent requests still applies.
Function throttling is not “reserved concurrency” either: each function can be throttled without taking concurrency slots from other functions.
Developers can now design the scale graph with the new intuitive UI.
The first example is a simple 1:3 scaling: for every new invocation, 3 instances will be created.
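A 1:3 scale unit can be sketched as follows. This is an illustrative simulation under the stated assumption that a scale-up event fires whenever in-flight requests exceed current capacity; it is not Nuweba's actual scheduler:

```python
# Illustrative simulation of a 1:N scale unit: whenever the number of
# in-flight requests exceeds capacity, one scale-up event adds N instances.

def simulate(requests, scale_unit=3):
    """Return (total_instances, scale_events) after `requests` arrive."""
    capacity = 0
    events = 0
    for in_flight in range(1, requests + 1):
        if in_flight > capacity:
            capacity += scale_unit   # one event provisions N instances
            events += 1
    return capacity, events

# 10 concurrent requests with a 1:3 scale unit: only 4 scale events
# are needed, and 12 instances end up provisioned.
print(simulate(10))  # (12, 4)
```

Compared with 1:1 scaling, the same burst triggers far fewer scale events, so fewer requests ever wait on a cold instance.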
The Adaptive Scale-Up parameters are powerful and give developers the flexibility they need:
Value: scale unit
Possible values: 0-100
This parameter sets the highest point the line graph can reach (the highest scale unit the developer needs).
For example, setting it to 50 with a linear shape instructs the auto-scaler to start at 1 and scale up to 50 (divided across the amount of concurrency).
Value: scale unit
Possible values: 0-100
The flat rate of scaling; for example, a Sustain Concurrency of 3 instructs the auto-scaler to scale up 1:3 (3 new instances for 1 new invocation).
If the value is zero, then the function is throttled.
Sustain Concurrency can be applied for the entire scale cycle or only for the last X invocations.
Value: concurrency cutoff
Possible values: 0-account concurrency limit
This parameter determines where Sustain Concurrency starts: a value of zero disables the slope scaling, and a value of the maximum limit disables the sustain.
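Putting the three parameters together, a scaling envelope can be sketched as a function from current concurrency to a scale unit. The field names and the linear-ramp formula below are assumptions for illustration, not Nuweba's actual API:

```python
# Hypothetical scaling envelope: ramp the scale unit linearly from 1 up
# to `max_unit` until concurrency reaches `cutoff`, then hold a flat
# `sustain` rate. A sustain of 0 would mean the function is throttled.

def scale_unit(concurrency, max_unit=50, sustain=100, cutoff=700):
    """Scale unit (instances added per new invocation) at a given concurrency."""
    if concurrency >= cutoff:
        return sustain                     # flat sustain region
    # linear slope region: 1 at concurrency 0, max_unit just below cutoff
    return 1 + round((max_unit - 1) * concurrency / cutoff)

print(scale_unit(0))     # 1   — start of the linear ramp
print(scale_unit(700))   # 100 — sustain region (1:100 flat rate)
```

With `cutoff=0` the ramp is skipped entirely (sustain-only), and with `cutoff` at the account limit the sustain region is never reached (slope-only), matching the cutoff semantics described above.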
In this example, the sustained concurrency follows the end of the linear scale: after reaching 700 concurrent instances, the function will scale to the limit at a flat rate of 1:100.
And a slightly different example: after scaling to 550, the scale unit returns to the default of 1:1.
This is an example of a function that scales exponentially in steps from 1 to 50 until it reaches 550 concurrent instances, and then stops scaling and is throttled.
The cool-down graph can be customized to gradually scale down function instances over time.
Time can range from 0.00001 to 5 hours, and the shape of the line can be linear, exponential, cosine, or sine.
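The curve shapes can be sketched as functions mapping elapsed cool-down time to the fraction of instances kept warm. The exact curves (and the exponential decay constant) below are assumptions for illustration:

```python
import math

# Illustrative cool-down shapes: each maps elapsed time t within a
# cool-down window of `total` hours to the fraction of instances kept
# warm. The exponential decay constant (5) is an assumption.

def cooldown_fraction(t, total, shape="linear"):
    """Fraction of instances still warm at time t in [0, total]."""
    x = min(max(t / total, 0.0), 1.0)      # normalized progress
    if shape == "linear":
        return 1.0 - x
    if shape == "exponential":
        return math.exp(-5 * x)
    if shape == "cos":
        return (1 + math.cos(math.pi * x)) / 2   # slow start, slow end
    if shape == "sin":
        return 1 - math.sin(math.pi * x / 2)     # fast initial drop-off
    raise ValueError(f"unknown shape: {shape}")

print(cooldown_fraction(0, 5, "cos"))   # fully warm at the start
print(cooldown_fraction(5, 5, "cos"))   # fully cooled at the end
```

A cosine shape holds instances warm longer at the start of the window, while a sine shape releases them faster up front; the linear shape releases them at a constant rate.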
Adaptive Scale-Up: at no additional cost (!).
Cool-down: $0.008 per GB-hour of idle time until a request arrives, and then regular pricing for the function’s invocation execution.
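The idle-time portion of that price is straightforward to work out. A quick arithmetic sketch using the $0.008 per GB-hour rate from above (the memory sizes and durations are made-up examples):

```python
# Worked example of the cool-down idle pricing: $0.008 per GB-hour of
# idle time before a request arrives. Memory sizes and durations below
# are hypothetical.

IDLE_RATE_USD_PER_GB_HOUR = 0.008

def idle_cost(memory_gb, idle_hours, rate=IDLE_RATE_USD_PER_GB_HOUR):
    """Cost in USD of keeping a function idle-warm during cool-down."""
    return memory_gb * idle_hours * rate

print(idle_cost(1, 0.5))   # a 1 GB function idle for 30 min: ~$0.004
print(idle_cost(2, 5))     # a 2 GB function idle for 5 hours: ~$0.08
```

Once a request lands on a warm instance, billing switches to the regular invocation pricing, so the idle charge only covers the waiting window.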
Adaptive scaling preview will be rolling out soon.
Function Chaining Auto-Scaling
Nuweba’s automatically generated distributed tracing map is powered by our network traffic inspection layer and is the enabler of our new function-flow based scaling.
Function Chaining Auto-Scaling will scale functions, that are part of the request flow, and will provide a completely warm function-chain.
Auto-scaling functions based on their observed flow can improve the overall performance of complex APIs by up to 50%; in other words, the total duration of the flow can be cut in half, and the total cost will be reduced, sometimes dramatically.
Fan-out-based APIs and long flows will benefit from this feature the most, as they are the ones that generally contain more functions in a flow.
In order to use this feature, developers just need to enable it for a specific flow.
Function Chaining also supports Adaptive Scaling: the scaling outcome of the first function in the flow determines the scaling of the rest of the flow.
In this example, we see a flow that will scale simultaneously and automatically by the Function Chaining Auto-Scaling.
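The idea of scaling a whole chain at once can be sketched as follows. This is an illustrative model under the assumption that the first function's scale decision is simply propagated to every downstream function in the traced flow; the flow and function names are hypothetical:

```python
# Illustrative sketch of chain-wide scaling: the scale decision made for
# the first function in a traced flow is applied to every function in
# the chain, so the whole chain is warm before downstream calls arrive.

def scale_chain(chain, first_scale):
    """Return a per-function instance plan for a traced flow."""
    return {fn: first_scale for fn in chain}

flow = ["auth", "resize", "store", "notify"]   # hypothetical traced flow
plan = scale_chain(flow, 3)
print(plan)  # every function in the chain gets 3 instances up front
```

Without chain-aware scaling, each downstream function would only start scaling when its own invocations arrive, paying the scale-up latency once per hop; scaling the chain simultaneously removes that per-hop wait.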
This feature is available at no additional cost to all Nuweba customers.
Function Chaining Auto-Scaling preview will be rolling out soon.