Getting the happy path of a service flow right is the first priority in any service delivery methodology, but the real pain starts when faults need to be dealt with. The Fault Management Framework in SOA Suite 11g lets us manage the faults of a composite application centrally. Service deliveries that span multiple services can share one centralized fault management setup. This makes fault handling more configuration driven, less painful to maintain, and declarative in nature: “if this happens, do this!”
In this post, I will concentrate on the Fault Policy Framework for SOA Suite 11g composite apps. There are two aspects of Fault Policy in 11g composites:
1. Handling BPEL faults
2. Handling Mediator Faults
Two files make up the Fault Management Framework for SOA Suite 11g: the Fault Policy and the Fault Binding. Both must comply with their respective schemas; otherwise the Fault Management Framework [FMF] will not be able to parse them.
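As a rough sketch of how the binding file ties a policy to a composite, a fault-bindings.xml could look like the following. The policy ids (CompositeFaultPolicy, OrderServiceFaultPolicy) and the component name (OrderProcessBPEL) are made-up placeholders, and each id is assumed to match a faultPolicy id defined in the fault policy file.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- fault-bindings.xml: sketch only; policy ids and component name are hypothetical -->
<faultPolicyBindings version="2.0.1"
                     xmlns="http://schemas.oracle.com/bpel/faultpolicy"
                     xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
  <!-- Policy bound at composite level -->
  <composite faultPolicy="CompositeFaultPolicy"/>
  <!-- Policy bound to one named component of the composite -->
  <component faultPolicy="OrderServiceFaultPolicy">
    <name>OrderProcessBPEL</name>
  </component>
</faultPolicyBindings>
```

The binding file only says which policy applies where; the conditions and actions themselves live in the fault policy file.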
Handling BPEL faults
A detailed description of handling BPEL faults can be found in the Oracle documentation.
The key points of Fault Handlers of SOA Suite 11g (184.108.40.206.5 ) are:
- The fault policy acts only on faults that surface at an invoke activity; it does not act on a custom fault thrown inside the process itself. Take two BPEL processes, BPEL-A and BPEL-B, where BPEL-A calls BPEL-B. If a business fault is thrown within BPEL-A, the fault policy does not become active for it. If BPEL-B throws a fault that is captured at the invocation level of BPEL-A, the fault framework becomes active.
- By default, the fault framework lets us take the following actions when a fault occurs:
  - Human Intervention [humanIntervention]: Reports the fault to the error recovery queue. The support team can log into the Enterprise Manager Console, find the recoverable instance and resubmit it.
  - Rethrow [rethrowFault]: Bubbles the fault up to the caller service, which allows the client to handle the fault.
  - Termination [abort]: Terminates the process instance, which is then dehydrated. No further recovery action can be taken on that instance.
  - Replay Scope [replayScope]: Replays the scope in which the fault occurred.
  - Custom Java Action [javaAction]: Invokes a custom Java class when we want to handle the fault in a way the built-in actions do not cover.
  - Retry [retry]: Retries the failed invocation. This action takes further child elements, such as retryCount, retryInterval* and exponentialBackoff**.
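The actions listed above could be declared in the Actions section of the fault policy file roughly as follows. This is a sketch rather than a complete policy: the Action ids are arbitrary labels, and the Java class name com.example.fault.CustomFaultHandler and the return value RETHROW are hypothetical.

```xml
<!-- Fragment of fault-policies.xml: sketch only; Action ids and the Java class are hypothetical -->
<Actions xmlns="http://schemas.oracle.com/bpel/faultpolicy">
  <!-- Park the instance for manual recovery in Enterprise Manager -->
  <Action id="ora-human-intervention">
    <humanIntervention/>
  </Action>
  <!-- Hand the fault back to the caller -->
  <Action id="ora-rethrow-fault">
    <rethrowFault/>
  </Action>
  <!-- Terminate the instance -->
  <Action id="ora-terminate">
    <abort/>
  </Action>
  <!-- Replay the scope in which the fault occurred -->
  <Action id="ora-replay-scope">
    <replayScope/>
  </Action>
  <!-- Delegate to a custom Java class; the string it returns selects the next action -->
  <Action id="ora-java">
    <javaAction className="com.example.fault.CustomFaultHandler"
                defaultAction="ora-terminate">
      <returnValue value="RETHROW" ref="ora-rethrow-fault"/>
    </javaAction>
  </Action>
  <!-- Retry the invocation 3 times, 2 seconds apart, then fall back to manual recovery -->
  <Action id="ora-retry">
    <retry>
      <retryCount>3</retryCount>
      <retryInterval>2</retryInterval>
      <retryFailureAction ref="ora-human-intervention"/>
    </retry>
  </Action>
</Actions>
```

Each Action id is what a condition in the same policy refers to through its action ref attribute.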
*If you set the retry interval in the fault policy to less than 30 seconds, the retry may not happen at the specified interval. This is because the default value of the org.quartz.scheduler.idleWaitTime property is 30 seconds, so an otherwise idle scheduler waits 30 seconds before checking again for available triggers. With a retry interval shorter than 30 seconds, some latency is therefore expected.
If you want the system to honour a retry interval shorter than 30 seconds, add the following property under the <property name="quartzProperties"> section in the fabric-config-core.xml file: org.quartz.scheduler.idleWaitTime=<value>
**Exponential backoff indicates that the next retry attempt is scheduled at 2 x the delay, where delay is the current retry interval. For example, if the current retry interval is 2 seconds, the next retry attempt is scheduled at 4, the next at 8, and the next at 16 seconds until the retryCount value is reached.
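To make the retry settings concrete, here is a minimal sketch of a fault policy file that retries remote faults with exponential backoff. The policy id CompositeFaultPolicy, the Action ids and the chosen values are assumptions for illustration, not values taken from a real project.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- fault-policies.xml: sketch only; policy id, Action ids and values are hypothetical -->
<faultPolicies xmlns="http://schemas.oracle.com/bpel/faultpolicy">
  <faultPolicy version="2.0.1" id="CompositeFaultPolicy">
    <Conditions>
      <!-- Match remote faults raised when invoking a partner service -->
      <faultName xmlns:bpelx="http://schemas.oracle.com/bpel/extension"
                 name="bpelx:remoteFault">
        <condition>
          <action ref="ora-retry"/>
        </condition>
      </faultName>
    </Conditions>
    <Actions>
      <Action id="ora-retry">
        <retry>
          <!-- Give up after this many attempts -->
          <retryCount>4</retryCount>
          <!-- Starting interval in seconds; values under 30 are subject to the scheduler idle wait time -->
          <retryInterval>2</retryInterval>
          <!-- Double the delay on each attempt (2, 4, 8, 16 ...) until retryCount is reached -->
          <exponentialBackoff/>
          <!-- What to do once all retries have failed -->
          <retryFailureAction ref="ora-human-intervention"/>
        </retry>
      </Action>
      <Action id="ora-human-intervention">
        <humanIntervention/>
      </Action>
    </Actions>
  </faultPolicy>
</faultPolicies>
```

Chaining the retry to human intervention through retryFailureAction means a fault that survives all retries still ends up somewhere the support team can recover it.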
Handling Mediator Faults
Handling Mediator faults is much the same as handling BPEL faults. The one thing to note is that the fault policy only applies when the Mediator routing is a parallel invocation; faults raised in a sequential invocation are never picked up by the fault handler. In the case of a sequential call, it is up to the client to handle the fault.
Mediator faults are always thrown with the following namespace and fault name: {http://schemas.oracle.com/mediator/faults}mediatorFault
There are predefined Mediator Error codes as listed below.
Mediator Pre-defined Error Codes
The following list describes various error groups contained in the TYPE_ALL error group:
* TYPE_DATA: Contains errors related to data handling.
  * TYPE_DATA_ASSIGN: Contains errors related to data assignment.
  * TYPE_DATA_FILTERING: Contains errors related to data filtering.
  * TYPE_DATA_TRANSFORMATION: Contains errors that occur during transformation.
  * TYPE_DATA_VALIDATION: Contains errors that occur during payload validation.
* TYPE_METADATA: Contains errors related to Mediator metadata.
  * TYPE_METADATA_FILTERING: Contains errors that occur while processing the filtering conditions.
  * TYPE_METADATA_TRANSFORMATION: Contains errors that occur during getting the metadata for transformation.
  * TYPE_METADATA_VALIDATION: Contains errors that occur during validation of metadata for Mediator (.mplan file).
  * TYPE_METADATA_COMMON: Contains other errors that occur during the handling of metadata.
* TYPE_FATAL: Contains fatal errors that are not easily recoverable.
  * TYPE_FATAL_DB: Contains database related fatal errors, such as Datasource not found error.
  * TYPE_FATAL_CACHE: Contains Mediator cache-related fatal errors.
  * TYPE_FATAL_ERRORHANDLING: Contains fatal errors that occur during error handling such as Resubmission queues not available.
  * TYPE_FATAL_MESH: Contains fatal errors from the Service Infrastructure such as Invoke service not available.
  * TYPE_FATAL_MESSAGING: Contains fatal messaging errors arising from the Service Infrastructure.
  * TYPE_FATAL_TRANSACTION: Contains fatal errors related to transactions such as Commit can’t be called on a transaction which is marked for rollback.
  * TYPE_FATAL_TRANSFORMATION: Contains fatal transformation errors such as error occurring because of the XPath functions used in a transformation.
* TYPE_TRANSIENT: Contains transient errors that can be recovered on retrying.
  * TYPE_TRANSIENT_MESH: Contains errors related to the Service Infrastructure.
  * TYPE_TRANSIENT_MESSAGING: Contains errors related to JMS such as enqueue, dequeue.
* TYPE_INTERNAL: Contains internal errors.
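These error groups can be used to pick an action per kind of Mediator fault. The fragment below is a sketch of a Conditions section that retries transient errors and sends everything else to manual recovery. It assumes that actions with the ids ora-retry and ora-human-intervention are defined in the same policy, and that the error code is exposed to the test expression as $fault.mediatorErrorCode.

```xml
<!-- Fragment of fault-policies.xml: sketch only; the referenced Action ids are hypothetical -->
<Conditions xmlns="http://schemas.oracle.com/bpel/faultpolicy">
  <faultName xmlns:medns="http://schemas.oracle.com/mediator/faults"
             name="medns:mediatorFault">
    <!-- Transient errors (TYPE_TRANSIENT and its subgroups) are worth retrying -->
    <condition>
      <test>contains($fault.mediatorErrorCode, "TYPE_TRANSIENT")</test>
      <action ref="ora-retry"/>
    </condition>
    <!-- Any other Mediator fault goes to manual recovery -->
    <condition>
      <action ref="ora-human-intervention"/>
    </condition>
  </faultName>
</Conditions>
```

Because the test uses contains, matching on a group prefix such as TYPE_TRANSIENT also covers its more specific codes, for example TYPE_TRANSIENT_MESH.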