Error Code Handling in Public Cloud Services
Effective error code handling is crucial for building robust and reliable applications in the public cloud. Error codes, whether generated by public cloud services or HTTP protocols, provide valuable feedback on the status and outcome of operations. This topic explores best practices for handling error codes when using public cloud services, focusing on both cloud service-specific errors and HTTP protocol responses.
IONOS Cloud provides multiple interfaces to its products—Data Center Designer (DCD), APIs, SDKs, and Configuration Management tools.
The DCD includes functions that validate configurations of virtual data centers and will return errors or warnings before provisioning.
Errors appear in red and block the provisioning of a resource, so we recommend resolving them before provisioning. For example, when you create a new block storage volume with a public image, you must either define a root password for the image or apply an SSH key. If you configure neither, the DCD returns an error and asks you to resolve it.
Warnings are marked in yellow. They indicate useful improvements but do not block provisioning. A warning may mean that the configuration is incomplete and that you need to complete it after the current provisioning run. For example, you can create an instance without a network interface, which means it cannot communicate with other instances or the public internet. As this is an uncommon use case, the DCD returns a notification that the configuration needs to be improved. The validation dialog is displayed before provisioning. You can fix the errors and click Provision Now to continue.
Apart from client-side validations in the DCD, there are other types of errors, each of which requires its own handling.
HTTP error handling
When dealing with HTTP-based cloud services, rely on standard HTTP status codes to process the result of a request. IONOS APIs use well-known status codes, such as 200 for successful responses and 4xx and 5xx codes for different error scenarios. Here are a few examples. Given the number of HTTP status codes, a complete list of the use cases that can result in each code cannot be provided.
| HTTP Status Code | Description |
| --- | --- |
| 200 | Returned when the respective API call is accepted. The backend application may still return an error (see the IONOS application errors section), but the call itself was valid and consistent. |
| 401 | Returned when the credentials used for an API call are incorrect. This happens when there is a typo in the username or password, or when the user does not exist within the IONOS user management. IONOS does not indicate whether the username or the password was incorrect, so as not to reveal whether a user exists in its database. |
| 404 | Returned when a resource was supposed to be retrieved but does not exist (resource not found). This can happen when an incomplete resource Universally Unique Identifier (UUID) was used, or when the requested resource was deleted earlier and therefore no longer exists. |
| 500 | Returned when the API backend encounters an unexpected error. The user cannot resolve this error. If the issue persists after retries, contact IONOS Cloud support. |
An official list can be retrieved from the Hypertext Transfer Protocol (HTTP) Status Code Registry, maintained by the Internet Assigned Numbers Authority (IANA).
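As a rough illustration of how a client might branch on the status codes described above, the following Python sketch maps a code to a coarse follow-up action. The `Action` names and the `classify_status` function are hypothetical and introduced here for illustration only; they are not part of any IONOS SDK.

```python
from enum import Enum


class Action(Enum):
    """Hypothetical follow-up actions; not part of any IONOS SDK."""
    PROCEED = "proceed"
    FIX_CREDENTIALS = "fix credentials"
    FIX_REQUEST = "fix request"
    RETRY_THEN_ESCALATE = "retry, then contact support"


def classify_status(status: int) -> Action:
    """Map an HTTP status code to a coarse follow-up action."""
    if 200 <= status < 300:
        return Action.PROCEED
    if status == 401:
        return Action.FIX_CREDENTIALS
    if 400 <= status < 500:
        # Includes 404: the request itself must change (for example the UUID).
        return Action.FIX_REQUEST
    if 500 <= status < 600:
        return Action.RETRY_THEN_ESCALATE
    raise ValueError(f"unhandled HTTP status code: {status}")


for code in (200, 401, 404, 500):
    print(code, classify_status(code).value)
```

A real client would likely refine these branches. This sketch only covers the codes discussed in this section.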
IONOS application errors
IONOS applications have two types of errors—fail-fast validation errors and provisioning errors.
Whenever you send a request to create, update, or delete a resource, the IONOS backend first applies a set of checks. Here are a few examples:
Is the user authorized to execute this change?
Does the contract contain enough resources to request a specific resource?
Does the API call request a resource with a configuration out of product range or a resource configuration not supported in the respective location?
The API will respond synchronously to such errors in the request call. The response contains an explicit and readable description of the error so it can be correlated with the initial API call.
You need to change your API call accordingly to meet the application criteria. For example, change the configuration to fit the product specifications and retry.
Note: The IONOS API only returns the first fail-fast validation error it encounters. It does not validate the entire request and return all errors; it fails immediately on the first one. Consequently, a retry may reveal the next fail-fast error that requires your attention.
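The fail-fast behavior described in the note can be modeled locally with a short Python sketch. It is not the IONOS backend logic: the two checks, their limits, and the `first_error` helper are hypothetical stand-ins chosen only to show why fixing one error can surface the next.

```python
from typing import Callable, Optional, Sequence

# Each check returns an error message, or None if the request passes.
Check = Callable[[dict], Optional[str]]


def check_cores(request: dict) -> Optional[str]:
    # Hypothetical limit, standing in for a contract resource check.
    return None if request["cores"] <= 4 else "cores exceed the contract limit"


def check_ram(request: dict) -> Optional[str]:
    # Hypothetical rule, standing in for a product-range check.
    if request["ram_mb"] % 1024 == 0:
        return None
    return "ram_mb must be a multiple of 1024"


def first_error(request: dict, checks: Sequence[Check]) -> Optional[str]:
    """Run checks in order and stop at the first failure (fail-fast)."""
    for check in checks:
        message = check(request)
        if message is not None:
            return message
    return None


checks = [check_cores, check_ram]
request = {"cores": 8, "ram_mb": 1500}
print(first_error(request, checks))  # cores exceed the contract limit
request["cores"] = 4
print(first_error(request, checks))  # ram_mb must be a multiple of 1024
request["ram_mb"] = 2048
print(first_error(request, checks))  # None
```

An automated client should therefore expect several correct-and-retry rounds before a request is accepted.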
Once the request passes all fail-fast validations, it will be queued for processing within the customer contract API queue.
The API call will return an HTTP Status 200 to signal the successful submission of the API call.
In the next step, IONOS processes the API call and may detect other issues, resulting in an incomplete order. As this happens asynchronously, the error is reported within the request's status.
In the response of the initial API call, you will find the header "location", which contains the URI for the respective request status call, including its identifier. You can use this resource to retrieve information about the order's status. Poll it regularly, because the status is updated while the request is being processed.
If the request fails to complete successfully, the request status returns an error, including an error code in the format "VDC-x-xxx" (where 'x' represents a numeric code of 1 to 4 digits).
In some cases, the response directly states the reason for the error. In other cases, the error message asks you to contact IONOS Cloud Support for further assistance. IONOS does not currently publish a list of potential error codes. Publication is planned for a later date; once a list of error codes, including measures to handle them, is available, it will be linked here.
Note: The details in this section apply to all IONOS infrastructure services: servers, block storage, and networks.
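The asynchronous flow described in this section (submit, read the "location" header, poll the request status, inspect the error code on failure) could be sketched in Python as follows. This is a minimal sketch under stated assumptions: the `state` and `message` keys and the `DONE` and `FAILED` values are hypothetical placeholders for whatever the request status actually returns, and the regular expression is one possible reading of the "VDC-x-xxx" format. The `fetch_status` callable is injected so that the example runs without network access.

```python
import re
import time
from typing import Callable, Optional

# Assumed reading of "VDC-x-xxx": two numeric groups of 1 to 4 digits each.
ERROR_CODE = re.compile(r"VDC-\d{1,4}-\d{1,4}")


def extract_error_code(message: str) -> Optional[str]:
    """Return the first VDC-style error code in message, if any."""
    match = ERROR_CODE.search(message)
    return match.group(0) if match else None


def wait_for_request(
    fetch_status: Callable[[], dict],
    interval: float = 2.0,
    timeout: float = 300.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll fetch_status until the request finishes, fails, or times out."""
    deadline = clock() + timeout
    while True:
        status = fetch_status()
        state = status["state"]  # hypothetical key name
        if state == "DONE":  # hypothetical value
            return status
        if state == "FAILED":  # hypothetical value
            message = status.get("message", "")
            code = extract_error_code(message) or "no error code"
            raise RuntimeError(f"request failed ({code}): {message}")
        if clock() >= deadline:
            raise TimeoutError("request did not finish before the timeout")
        sleep(interval)


# Simulated status responses instead of real HTTP calls.
responses = iter([{"state": "QUEUED"}, {"state": "RUNNING"}, {"state": "DONE"}])
result = wait_for_request(lambda: next(responses), sleep=lambda seconds: None)
print(result["state"])  # DONE
```

In a real client, `fetch_status` would issue a GET request to the URI taken from the "location" header and return the parsed response body.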
IONOS Object Storage error handling
IONOS Object Storage is available via the web interface and the API. The web interface is based on the same API, so this documentation covers the API only. IONOS Object Storage APIs are based on HTTP and follow the same standards described in the IONOS application errors section; we recommend reading that section first.
IONOS Object Storage returns explicit application error descriptions that contain specific recommendations for action. Here are a few examples:
The format of the bucket name is wrong.
The bucket name already exists.
IONOS operates its own installation of a third-party Object Storage product, which follows the standard S3 Object Storage REST error responses. Usually, the error response contains the content type and an HTTP status code in the 3xx, 4xx, or 5xx range. Furthermore, the error response describes the error in a message tag, which helps to identify and resolve the issue. Depending on the error message, you should either retry the S3 Object Storage request or change it to resolve the reported issue.
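Standard S3 REST error responses carry an XML body with `Code` and `Message` elements. The following Python sketch shows one way to extract them with the standard library. The sample body is illustrative and was not captured from an IONOS endpoint; `S3Error`, `parse_s3_error`, and `needs_request_change` are hypothetical helper names.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class S3Error:
    code: str
    message: str


def parse_s3_error(body: bytes) -> S3Error:
    """Extract the Code and Message elements from an S3 error body."""
    root = ET.fromstring(body)
    return S3Error(
        code=root.findtext("Code", default=""),
        message=root.findtext("Message", default=""),
    )


# Standard S3 error codes for the two bucket-name examples above.
# For these, retrying the unchanged request cannot succeed.
REQUEST_MUST_CHANGE = {"InvalidBucketName", "BucketAlreadyExists"}


def needs_request_change(error: S3Error) -> bool:
    return error.code in REQUEST_MUST_CHANGE


# Illustrative response body, not captured from a real endpoint.
SAMPLE = b"""<Error>
  <Code>BucketAlreadyExists</Code>
  <Message>The requested bucket name is not available.</Message>
</Error>"""

error = parse_s3_error(SAMPLE)
print(error.code, "-", error.message)
print("change request:", needs_request_change(error))
```

Separating parsing from the retry decision keeps the decision table easy to extend as further error codes are encountered.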
IONOS does not currently publish a list of potential Object Storage error codes. Publication is planned for a later date; once a list of error codes, including measures to handle them, is available, it will be linked here.
Implementation best practices
When implementing automatic routines via API, SDKs, or Configuration Management tools, it is best to consider a few elements to help deal with potential errors returned from the cloud service provider application.
Comply with the HTTP status code standard: If your implementation interprets status codes according to the standard, it behaves predictably with the interfaces it consumes.
Implement retry mechanism: Consider implementing retry mechanisms for transient errors. Public cloud services occasionally experience temporary issues, and retrying the operation after a short delay can often lead to successful outcomes. Implementing exponential backoff algorithms can help prevent overwhelming cloud services with retries during high error periods.
Implement circuit breakers: Not every error is resolved by retrying immediately or after a short wait. In addition to the retry mechanism, you may need circuit breaker patterns to manage service failures gracefully. Circuit breakers act as safety mechanisms that detect recurring errors and temporarily halt requests to the failing service. This prevents cascading failures and helps the system recover from transient errors more effectively.
Log errors and exceptions: Log errors and exceptions at appropriate levels in the logging system of your implementation. Detailed error logs aid in post-mortem analysis, troubleshooting, and monitoring the health of your implementation. They can also be a helpful source for analysis and investigation on the cloud service provider side, because they provide details of your use case and the data needed to reproduce the error, which is necessary to arrive at a proper solution or a workaround. The logs may also reveal a previously unknown defect that requires a fix by the cloud service provider.
Plan for unknown errors: Prepare your application to handle unknown or unexpected errors gracefully, in addition to logging errors and exceptions. Implement fallback mechanisms and default behavior for scenarios with undocumented or unrecognized error codes.
Integrate error handling with the cloud provider's Service Level Agreements (SLA) and Service Catalog: Understand the SLA and the Service Catalog specifications of the cloud services you are using. Integrate error-handling practices with the defined SLA to manage response times and escalations during prolonged service disruptions. The Service Catalog provides details about the service range boundaries, which helps you understand the limits of the product.
Continuously review and improve error handling: Regularly review your error handling mechanisms and identify areas for improvement. Embrace a culture of continuous improvement to ensure your application evolves along with changing cloud service conditions and requirements.
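The retry and circuit breaker recommendations above can be sketched in Python as follows. This is a minimal, single-threaded illustration, not a production implementation: `TransientError`, `CircuitOpenError`, `call_with_retry`, and `CircuitBreaker` are hypothetical names, the thresholds and delays are arbitrary defaults, and the random jitter added to each wait is a common refinement that the text above does not prescribe.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical marker for errors worth retrying, such as an HTTP 5xx."""


class CircuitOpenError(Exception):
    """Raised when the breaker is open and the call is skipped."""


def backoff_delays(retries, base=0.5, cap=30.0):
    """Upper bound for each wait: base, 2*base, 4*base, ... capped at cap."""
    return [min(cap, base * 2 ** attempt) for attempt in range(retries)]


def call_with_retry(operation, retries=4, base=0.5, cap=30.0, sleep=time.sleep):
    """Call operation, retrying on TransientError with jittered backoff."""
    delays = backoff_delays(retries, base, cap)
    for attempt in range(retries + 1):
        try:
            return operation()
        except TransientError:
            if attempt == retries:
                raise  # retries exhausted; let the caller decide
            # Wait a random time up to the exponential bound (jitter).
            sleep(random.uniform(0, delays[attempt]))


class CircuitBreaker:
    """Stops calling a failing service until a cooldown has passed."""

    def __init__(self, threshold=3, cooldown=60.0, clock=time.monotonic):
        self.threshold = threshold
        self.cooldown = cooldown
        self.clock = clock
        self.failures = 0
        self.opened_at = None

    def call(self, operation):
        if self.opened_at is not None:
            if self.clock() - self.opened_at < self.cooldown:
                raise CircuitOpenError("circuit open; call skipped")
            self.opened_at = None  # cooldown over: allow one trial call
        try:
            result = operation()
        except TransientError:
            self.failures += 1
            if self.failures >= self.threshold:
                self.opened_at = self.clock()  # open (or re-open) the circuit
            raise
        self.failures = 0  # a success closes the circuit again
        return result
```

The two pieces compose: for example, `breaker.call(lambda: call_with_retry(operation))` retries short outages and stops calling the service once retries keep failing. Errors that indicate an invalid request should not be raised as `TransientError`, because repeating an unchanged request cannot fix them.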
IONOS's responsibility
IONOS logs errors and uses them for failure analysis. For this purpose, errors are grouped into two categories. The first category covers errors caused by invalid use of the interfaces, which might result from misleading documentation, product descriptions, or customer expectations. The second category covers errors caused by the IONOS application itself. In both cases, IONOS analyzes and reviews the errors in order to apply improvements that eliminate them. The data collected for this purpose is handled in line with data privacy regulations and is used only to reduce errors and improve the product and its associated services, such as documentation. It is not used for other purposes, in particular not for commercial purposes.