I have a use case where, to mark a create-entity operation as successful in my service, I have to:

1. Write some metadata to database 1 (my service has direct access to this DB).
2. Internally call another API of my service, which persists some relations in database 2 (a graph DB that my service also accesses directly).
3. Mark the entity's state in database 3 as ACTIVE, which marks the create-entity operation as successful.

All three databases are NoSQL databases. I am looking for strategies to handle partial failures and to roll back. My clients can call the create-entity API either synchronously or asynchronously.

My current thinking for handling partial failures: once all retries are exhausted, move the original create-entity request to a DLQ and put alarms on the number of messages in this DLQ, so that on-call engineers are notified and can inspect the request for debugging. If the root cause points to a transient issue with an underlying system, the on-call engineer can re-drive the original request to the source queue so it is retried, recovering from the partial failure. Otherwise, further debugging, system corrections, and backfills would be the way to recover from partial failures...
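A minimal sketch of the retry-then-DLQ flow described above, under stated assumptions: `CreateEntityRequest`, `TransientError`, `InMemoryDLQ`, `process_create_entity`, and the three step functions are hypothetical names, and plain dicts stand in for the three databases and a list stands in for the DLQ. One assumption the redrive idea depends on: because re-driving replays the whole original request, every step has to be idempotent (here, an upsert keyed by `entity_id`), so steps that already succeeded before the failure can safely run again.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class CreateEntityRequest:
    entity_id: str  # hypothetical; also used as the idempotency key
    payload: dict


class TransientError(Exception):
    """Hypothetical marker for retryable failures (timeouts, throttling)."""


@dataclass
class InMemoryDLQ:
    """Stand-in for a real dead-letter queue; alarms would watch len(messages)."""
    messages: List[CreateEntityRequest] = field(default_factory=list)

    def send(self, request: CreateEntityRequest) -> None:
        self.messages.append(request)


Step = Callable[[CreateEntityRequest], None]


def with_retries(step: Step, request: CreateEntityRequest,
                 max_attempts: int, base_delay_s: float) -> None:
    """Run one step, retrying only TransientError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            step(request)
            return
        except TransientError:
            if attempt == max_attempts:
                raise  # retries exhausted: let the caller dead-letter the request
            time.sleep(base_delay_s * 2 ** (attempt - 1))


def process_create_entity(request: CreateEntityRequest, steps: List[Step],
                          dlq: InMemoryDLQ, max_attempts: int = 3,
                          base_delay_s: float = 0.0) -> bool:
    """Run the steps in order; on any unrecovered failure, dead-letter the
    ORIGINAL request so it can be re-driven later. A redrive replays every
    step, so each step must be idempotent (an upsert keyed by entity_id)."""
    try:
        for step in steps:
            with_retries(step, request, max_attempts, base_delay_s)
        return True
    except Exception:
        # Non-transient errors skip retries and go straight to the DLQ.
        dlq.send(request)
        return False


def run_demo() -> Dict[str, object]:
    # In-memory dicts stand in for DB 1 (metadata), DB 2 (graph), DB 3 (state).
    db1: Dict[str, dict] = {}
    db2: Dict[str, List[str]] = {}
    db3: Dict[str, str] = {}
    graph_failures_left = [5]  # simulate a graph DB outage lasting 5 calls

    def write_metadata(req: CreateEntityRequest) -> None:
        db1[req.entity_id] = req.payload  # upsert: safe to repeat on redrive

    def persist_relations(req: CreateEntityRequest) -> None:
        if graph_failures_left[0] > 0:
            graph_failures_left[0] -= 1
            raise TransientError("graph DB timeout")
        db2[req.entity_id] = ["example-relation"]

    def mark_active(req: CreateEntityRequest) -> None:
        db3[req.entity_id] = "ACTIVE"

    steps = [write_metadata, persist_relations, mark_active]
    dlq = InMemoryDLQ()
    req = CreateEntityRequest("entity-123", {"name": "demo"})

    first_ok = process_create_entity(req, steps, dlq)  # outage outlasts 3 attempts
    partial = req.entity_id in db1 and req.entity_id not in db3
    dlq_depth = len(dlq.messages)

    # On-call decides the cause was transient and re-drives the original request.
    redriven = dlq.messages.pop(0)
    second_ok = process_create_entity(redriven, steps, dlq)
    return {"first_ok": first_ok, "partial_after_first": partial,
            "dlq_depth_after_first": dlq_depth, "second_ok": second_ok,
            "final_state": db3.get(req.entity_id),
            "dlq_depth_final": len(dlq.messages)}


if __name__ == "__main__":
    print(run_demo())
```

In the demo, the first run leaves the entity half-created (metadata written, not ACTIVE) and parks the request in the DLQ; the redrive then completes all three steps because the metadata write is an upsert. A real system would use a nonzero `base_delay_s` and its queue's own redrive mechanism rather than popping from a list.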
