EHC ❯ net.sf.ehcache.writer.writebehind.WriteBehindQueue does not appear to clean out failed batch writes from quarantined list after all retryAttempts fail.
Type: Bug
Status: Closed
Priority: Critical
Resolution: Duplicate
Project: ehcache
Assignee: alexsnaps
Reporter: jacodw
Created: September 08, 2010
Votes: 0
Watchers: 2
Updated: July 27, 2012
Resolved: August 16, 2011
Description
I have witnessed the following behaviour in version 2.0.0, and I believe I have confirmed my suspicion from the source file for @version $Id: WriteBehindQueue.java 2379 2010-05-03 15:16:18Z gbevin $.
It appears that the processBatchedOperations method in WriteBehindQueue throws if all retryAttempts have failed, and before throwing it does not attempt to remove the failed batch from the quarantined list.
I would expect that if the batch fails on all retryAttempts, it would be removed from the list and not processed on subsequent cycles.
Instead, it appears to stay in the list, and the list then grows as reassemble is called in the try/catch handler of processItems. A simplified sketch of this flow follows.
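To make the described control flow concrete, here is a heavily simplified, self-contained sketch. This is a paraphrase of the reported behaviour, not the actual WriteBehindQueue source; the class name, retry count, and string operations are all illustrative.
{code:java}
import java.util.ArrayList;
import java.util.List;

// Simplified paraphrase of the reported control flow -- NOT the actual
// WriteBehindQueue source. It only illustrates why a batch that exhausts
// its retries keeps being retried forever.
public class WriteBehindSketch {
    private final List<String> waiting = new ArrayList<String>();
    private final List<String> quarantined = new ArrayList<String>();
    private final int retryAttempts = 3; // illustrative, not a real default

    public void add(String operation) {
        waiting.add(operation);
    }

    public void processItems() {
        // New work is merged into the quarantined list, which may still
        // contain the batch that already exhausted its retries.
        quarantined.addAll(waiting);
        waiting.clear();
        try {
            processBatchedOperations(quarantined);
            quarantined.clear(); // only reached when the batch succeeds
        } catch (RuntimeException e) {
            // Mirrors reassemble(): the failed batch stays quarantined and
            // is retried, together with any new work, on the next cycle.
            throw e;
        }
    }

    private void processBatchedOperations(List<String> batch) {
        for (int attempt = 1; attempt <= retryAttempts; attempt++) {
            try {
                write(batch);
                return;
            } catch (RuntimeException e) {
                if (attempt == retryAttempts) {
                    // The reported bug: the batch is not removed from the
                    // quarantined list before this throw.
                    throw e;
                }
                // A real implementation would sleep for
                // retryAttemptDelaySeconds here.
            }
        }
    }

    private void write(List<String> batch) {
        for (String operation : batch) {
            if (operation.contains("bad")) { // stands in for a NOT NULL violation
                throw new IllegalStateException("constraint violation: " + operation);
            }
        }
    }
}
{code}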
*Example:* I write a batch of 10 records to the cache (using a DB CacheWriter).
My cache is configured to perform batch writing and to retry 10 times at 1-second intervals.
One of the records I write violates a NOT NULL constraint, so the batch fails.
This failure is persistent (unlike a transient error such as a connection timeout), therefore all 10 retry attempts fail.
During the retry attempts, another batch of 10 records gets written to the cache and queued for processing.
At the end of the 10 retries, the WriteBehindQueue does not clear my failed batch from the quarantined list before adding the waiting 10 records to it and re-processing.
As a result I am now in a situation where my database writes fail for the lifetime of my application.
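For concreteness, a minimal writer that reproduces this scenario might look like the following. The class name and the null check are hypothetical; the cache itself would be configured in ehcache.xml with write-behind mode, writeBatching="true", retryAttempts="10", and retryAttemptDelaySeconds="1".
{code:java}
import java.util.Collection;

import net.sf.ehcache.Element;
import net.sf.ehcache.writer.AbstractCacheWriter;

// Hypothetical DB-backed writer reproducing the report: a NOT NULL
// violation surfaces as a RuntimeException, so the write-behind queue
// retries the whole batch until retryAttempts is exhausted.
public class RecordCacheWriter extends AbstractCacheWriter {

    @Override
    public void writeAll(Collection<Element> elements) {
        for (Element element : elements) {
            if (element.getObjectValue() == null) {
                // Stands in for the SQLException a real JDBC batch insert
                // would raise; rethrowing makes the queue retry the batch.
                throw new RuntimeException("NOT NULL constraint violated for key "
                        + element.getObjectKey());
            }
            // A real implementation would INSERT the element here.
        }
    }
}
{code}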
Comments
Chris Dennis 2010-09-29
I haven't had a chance to look at this yet, but I wouldn't be at all surprised if he's correct about the behavior.
Chris Dennis 2010-09-30
Talking with some of the other developers, we think there are two issues at play here.
Assuming that your example scenario is approximately the same as the real problem you are facing, the correct solution for you is to handle the exception caused by the database constraint violation in the CacheWriter implementation itself. As you rightly state, a constraint violation is highly unlikely to be a transient condition, so it makes no sense to rethrow it and cause a retry. Whatever cleanup is necessary to handle the constraint violation can then be performed in the writer itself, after which you can return successfully from the method. A sketch of this approach follows.
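This is a sketch of that suggestion, assuming a JDBC-backed writer; insertRow, reportBadRecord, and the SQLSTATE check are hypothetical details.
{code:java}
import java.sql.SQLException;
import java.util.Collection;

import net.sf.ehcache.Element;
import net.sf.ehcache.writer.AbstractCacheWriter;

// Sketch of the suggested application-side fix: treat a constraint
// violation as permanent, handle it inside writeAll, and return normally
// so the write-behind queue never retries the batch.
public class ConstraintAwareCacheWriter extends AbstractCacheWriter {

    @Override
    public void writeAll(Collection<Element> elements) {
        for (Element element : elements) {
            try {
                insertRow(element);
            } catch (SQLException e) {
                if (isConstraintViolation(e)) {
                    // Permanent failure: report it and swallow the exception
                    // so the queue does not retry.
                    reportBadRecord(element, e);
                } else {
                    // Transient failure (e.g. connection timeout): rethrow so
                    // the queue retries as configured.
                    throw new RuntimeException(e);
                }
            }
        }
    }

    private boolean isConstraintViolation(SQLException e) {
        // SQLSTATE class 23 covers integrity constraint violations.
        return e.getSQLState() != null && e.getSQLState().startsWith("23");
    }

    private void insertRow(Element element) throws SQLException {
        // Real JDBC INSERT omitted for brevity.
    }

    private void reportBadRecord(Element element, SQLException e) {
        System.err.println("Dropping unwritable element " + element.getObjectKey()
                + ": " + e.getMessage());
    }
}
{code}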
On the other hand, however, we do agree with you that the current behavior of the WriteBehindQueue is not very useful. Our general idea of what should happen here is that on the last retry attempt (or after it, but before removal) the queue should call either a distinct method on the CacheWriter, marking this as the last attempt to write, or a method on a dedicated FailedExecutionHandler instance. Following this, we can then safely remove the item from the queue (just the failing item, however, not the whole batch).
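As a rough illustration of the second idea, such a handler might take a shape like the following. This is purely hypothetical; no such interface existed at the time of this comment.
{code:java}
import net.sf.ehcache.Element;

// Purely hypothetical shape of the proposed callback; it only illustrates
// the idea of handing over a permanently failing item before dropping it.
public interface FailedExecutionHandler {

    /**
     * Called once the last retry attempt for the element has failed.
     * Afterwards the queue would remove this element (and only this
     * element, not the whole batch) from the quarantined list.
     */
    void onFailedExecution(Element element, RuntimeException lastFailure);
}
{code}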
Fiona OShea 2010-10-12
Per Mike: This should not hold Magnum
Alexander Snaps 2011-08-16
EHC-851 addresses this in clustered mode as well. Items that couldn't be processed will be "sent" to the net.sf.ehcache.writer.CacheWriter.throwAway(Element, SingleOperationType, RuntimeException) method (as of Ehcache 2.5).
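For illustration, an application could override that hook roughly as follows. The class name and the dead-letter handling are hypothetical, and the import path for SingleOperationType is assumed from the 2.5 write-behind package layout.
{code:java}
import net.sf.ehcache.Element;
import net.sf.ehcache.writer.AbstractCacheWriter;
import net.sf.ehcache.writer.writebehind.operations.SingleOperationType;

// Sketch of using the 2.5 hook mentioned above: once all retries are
// exhausted, the element is handed to throwAway instead of clogging the
// queue forever. The dead-letter handling is a hypothetical placeholder.
public class ThrowAwayCacheWriter extends AbstractCacheWriter {

    @Override
    public void throwAway(Element element, SingleOperationType operationType,
                          RuntimeException e) {
        // Persist or log the unwritable element for later inspection.
        System.err.println("Giving up on " + operationType + " of key "
                + element.getObjectKey() + ": " + e.getMessage());
    }

    // write/writeAll implementations omitted; AbstractCacheWriter provides
    // defaults for the remaining CacheWriter methods.
}
{code}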