Retry Mechanism Implementation
Integrating with the Founda Platform requires handling scenarios where requests may not reach their destination due to transient issues such as network problems or (downstream) service downtime. This occurs most commonly when a request cannot be delivered to a downstream data source because of connectivity issues between Founda and the provider organization.

The Founda Platform does not automatically retry failed requests for you; the received error code is returned as the response. For this reason, it is strongly advised to implement a retry mechanism. Depending on your requirements, an implementation using exponential backoff might make the most sense.

Retrying failed requests is not always the right choice. This is especially important to consider in (synchronous) patient- or doctor-facing applications: requests that require direct user feedback need to be designed to deal with potential unavailability of downstream systems.

HTTP status codes

When retrying, it is wise to consider the status code. The following guidance applies.

Implement retries for transient error responses, such as:

- 503 Service Unavailable
- 504 Gateway Timeout
- 502 Bad Gateway
- 408 Request Timeout

Do not retry on errors that are unlikely to resolve themselves or that indicate client-side issues, such as:

- 400 Bad Request
- 401 Unauthorized
- 403 Forbidden
- 404 Not Found
- 409 Conflict

Example

The general recommendation is to use exponential backoff, an approach to manage retries effectively. It involves incrementally increasing the delay between retry attempts to balance network use and increase the likelihood of successful delivery. For example:

1. Initial attempt: make your request.
2. Failure detection: on failure, check whether the status code suggests a transient error.
3. First retry: if the error is retryable, wait for a moderate initial delay (depending on your application, between 5 and 30 seconds).
4. Subsequent retries: increase the delay for each retry exponentially.
5. Retry limit: cap the retries at a reasonable number (e.g., 4 attempts).
6. Ceiling on delay: implement a maximum delay (e.g., 15 minutes) to avoid long waits.
7. Jitter: add random jitter to prevent synchronized retry patterns in large-scale outages.

Best practices

- Monitor retry attempts: keep track of retries and adjust your strategy based on outcomes.
- Error handling: have a plan for handling requests that fail after all retries.
- Logging: make sure you log your retries. You can also check the Audit Service.
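The steps above can be sketched in Python. This is a minimal illustration, not an official client: `send` is a hypothetical callable standing in for your HTTP client that returns a `(status_code, body)` tuple, and the default delays follow the guidance above (initial delay in the 5–30 second range, 15-minute ceiling, 4 attempts, full jitter).

```python
import random
import time

# Transient error responses worth retrying (see the status code guidance above).
RETRYABLE_STATUSES = {408, 502, 503, 504}

def backoff_delay(attempt, initial=5.0, max_delay=900.0, rng=random.random):
    """Exponentially growing delay with full jitter for a 1-based retry attempt.

    The backoff window doubles each attempt, is capped at `max_delay`
    (15 minutes by default), and the actual sleep is a random fraction
    of that window to avoid synchronized retry patterns.
    """
    window = min(initial * (2 ** (attempt - 1)), max_delay)
    return rng() * window

def request_with_retries(send, max_attempts=4, sleep=time.sleep, **backoff_kwargs):
    """Call `send()` until it succeeds, fails non-retryably, or retries run out.

    `send` is a hypothetical zero-argument callable wrapping your HTTP request;
    it must return a (status_code, body) tuple.
    """
    for attempt in range(1, max_attempts + 1):
        status, body = send()
        if status < 400:
            return status, body          # success
        if status not in RETRYABLE_STATUSES or attempt == max_attempts:
            return status, body          # client-side error, or retries exhausted
        sleep(backoff_delay(attempt, **backoff_kwargs))

# Example: a flaky endpoint that returns 503 twice, then succeeds.
responses = iter([(503, ""), (503, ""), (200, "ok")])
status, body = request_with_retries(lambda: next(responses), sleep=lambda s: None)
```

Injecting `sleep` and `rng` keeps the sketch testable; in production you would use the defaults and surface the final failing status to your error-handling and logging layers.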