By Matthew Adams, Co-Founder
Retrying tasks with TPL, async and synchronous code

Update: This library is now Corvus.Retry.

In this world, we have to face up to the horrible possibility of failure. Sometimes, that failure is irrecoverable. There's no going back. Sometimes, we can have another bash at it and see if it comes out OK the second time around. Or maybe the third. Or the fourth.


In the software business, historically, we've kind of got away with assuming that everything is going to work fine. Resources are 'just there' when we ask for them. They don't mysteriously disappear only to reappear a moment later. Tasks complete if you set them going…

Of course, that was never really true, and the unreality of that attitude is all the more obvious when dealing with cloud services. Here resources really can be temporarily missing, refused or moved. Tasks may begin and never complete because our thread has suspended or died in some virtual infrastructure, only to be resurrected again later.

When we start an operation that may potentially fail, there are a few failure scenarios that could play out:

  1. A catastrophic error has occurred, so we want to propagate that exception straight out to the caller
  2. A potentially transient error has occurred, so we want to retry the exact same operation again
  3. We want to abandon the operation

Abandoning the operation is straightforward. We can use the standard .NET CancellationToken mechanism.
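As a minimal sketch of abandonment using only standard .NET types, a caller can signal cancellation through a CancellationTokenSource while the operation observes the token. The CancellationDemo class and its method names are hypothetical and are not part of the retry framework.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

public static class CancellationDemo
{
    // Hypothetical long-running operation that observes the token cooperatively.
    public static async Task<int> DoWorkAsync(CancellationToken token)
    {
        for (int i = 0; i < 100; i++)
        {
            // Throws OperationCanceledException once cancellation has been requested.
            token.ThrowIfCancellationRequested();
            await Task.Delay(50, token);
        }

        return 42;
    }

    // Starts the work, then abandons it once the timeout elapses.
    public static async Task<bool> WasAbandonedAsync(TimeSpan timeout)
    {
        using var cts = new CancellationTokenSource();
        cts.CancelAfter(timeout);

        try
        {
            await DoWorkAsync(cts.Token);
            return false;
        }
        catch (OperationCanceledException)
        {
            // Task.Delay throws TaskCanceledException, which derives from
            // OperationCanceledException, so both paths land here.
            return true;
        }
    }
}
```

Calling `await CancellationDemo.WasAbandonedAsync(TimeSpan.FromMilliseconds(100))` should report that the work was abandoned, because the loop would otherwise run for about five seconds.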


Scenarios 1 and 2 require a retry mechanism of some kind.

For those, we've provided the Endjin Retry Framework. (That link takes you to the NuGet package download.) Source code is also available, along with a sample.

Policy

Policies are used to determine whether we should consider retrying at all, given that a particular exception has occurred. The default policy is AnyException: you can always retry, regardless of the particular exception or its content.

We also provide an AggregatePolicy which allows you to retry if and only if all of a set of policies allow you to retry.

It is up to you to write a custom policy if you want particular exceptions to be "non-retryable". For example, you might set up a policy that does not allow you to retry if you get a 404 (Not Found) from an HTTP operation, but does retry if you get a 500 (Internal Server Error). To do this, you implement the (very simple) IRetryPolicy interface.

public class AnyException : IRetryPolicy
{
    public bool CanRetry(Exception exception)
    {
        return true;
    }
}
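A custom policy along the lines of the HTTP example might look like the sketch below. It is illustrative only: IRetryPolicy is redeclared here as a stand-in that mirrors the CanRetry(Exception) shape shown above, HttpStatusPolicy and AllOfPolicy are hypothetical names, and the framework's own AggregatePolicy may well be shaped differently. It also assumes .NET 5 or later, where HttpRequestException exposes a StatusCode property.

```csharp
using System;
using System.Linq;
using System.Net;
using System.Net.Http;

// Stand-in for the framework's interface, mirroring the shape shown above.
public interface IRetryPolicy
{
    bool CanRetry(Exception exception);
}

// Hypothetical policy: retry server-side (5xx) failures, never a 404.
public class HttpStatusPolicy : IRetryPolicy
{
    public bool CanRetry(Exception exception)
    {
        if (exception is HttpRequestException http && http.StatusCode.HasValue)
        {
            // 404 and every other 4xx fall below 500, so they are not retried.
            return (int)http.StatusCode.Value >= 500;
        }

        // Assumption: failures that carry no HTTP status are treated as retryable.
        return true;
    }
}

// Hypothetical combinator: retry if and only if every inner policy agrees.
public class AllOfPolicy : IRetryPolicy
{
    private readonly IRetryPolicy[] policies;

    public AllOfPolicy(params IRetryPolicy[] policies)
    {
        this.policies = policies;
    }

    public bool CanRetry(Exception exception)
    {
        return this.policies.All(p => p.CanRetry(exception));
    }
}
```

Treating unknown exceptions as retryable keeps the sketch close to the AnyException default; a stricter policy could return false there instead.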

Strategy

The strategy determines how a task is retried. There are two phases to this. In the first phase, we prepare to retry given a particular exception, and calculate an optional delay before we retry the operation. In the second phase, the framework checks whether we are allowed to attempt a retry (we haven't, for example, exceeded some maximum number of retries).

The strategy also aggregates the exceptions that caused us to require a retry attempt, and raises a Retrying event just before a retry attempt occurs.
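To make the interplay between policy and strategy concrete, here is a rough sketch of how a retry loop could combine them. It is not the framework's implementation: RetrySketch is a hypothetical name, and the policy and the two strategy phases are passed in as plain delegates so the example stays self-contained.

```csharp
using System;
using System.Collections.Generic;
using System.Threading;

public static class RetrySketch
{
    public static T Run<T>(
        Func<T> operation,
        Func<Exception, bool> policyAllowsRetry,
        Func<Exception, TimeSpan> prepareToRetry,
        Func<bool> strategyCanRetry)
    {
        // Failures collected so far, reported together if we give up.
        var failures = new List<Exception>();

        while (true)
        {
            try
            {
                return operation();
            }
            catch (Exception ex)
            {
                // Policy: is this kind of failure worth retrying at all?
                if (!policyAllowsRetry(ex))
                {
                    throw;
                }

                failures.Add(ex);

                // Phase one: let the strategy record the failure and pick a delay.
                TimeSpan delay = prepareToRetry(ex);

                // Phase two: check whether another attempt is still allowed.
                if (!strategyCanRetry())
                {
                    throw new AggregateException(failures);
                }

                if (delay > TimeSpan.Zero)
                {
                    Thread.Sleep(delay);
                }
            }
        }
    }
}
```

With a counter behind the two strategy delegates, this behaves like an immediate, count-limited retry: a non-retryable exception propagates straight out, while exhausting the allowance surfaces every recorded failure in a single AggregateException.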

We provide three strategies in the box:

Count: retries immediately, up to a maximum number of times.

Incremental: retries up to a maximum number of times, with an (optionally increasing) delay between retries.

Backoff: similar to Incremental, but with an exponentially increasing delay between retries, plus a random element.

Note: you don't want to use the Incremental or Backoff strategies in Windows Azure. It is better to hammer the fabric and let it adapt to your preferred usage pattern.
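The difference between the three strategies comes down to how the delay before each retry is calculated. The sketch below shows one plausible way to compute those delays. The DelaySketch class, its parameter names, the doubling factor and the plus or minus 20% jitter are all assumptions made for illustration, not the values the framework uses.

```csharp
using System;

public static class DelaySketch
{
    // Count: retry immediately, with no delay at all.
    public static TimeSpan CountDelay()
    {
        return TimeSpan.Zero;
    }

    // Incremental: the delay grows by a fixed step on each attempt (attempt is 1-based).
    // A zero step gives a constant delay.
    public static TimeSpan IncrementalDelay(int attempt, TimeSpan initial, TimeSpan step)
    {
        return initial + TimeSpan.FromTicks(step.Ticks * (attempt - 1));
    }

    // Backoff: the delay doubles on each attempt, is scaled by a random factor,
    // and is capped at maxDelay.
    public static TimeSpan BackoffDelay(int attempt, TimeSpan baseDelay, TimeSpan maxDelay, Random random)
    {
        double exponential = baseDelay.TotalMilliseconds * Math.Pow(2, attempt - 1);

        // Assumed jitter: a factor between 0.8 and 1.2.
        double jitter = 0.8 + (random.NextDouble() * 0.4);

        double milliseconds = Math.Min(exponential * jitter, maxDelay.TotalMilliseconds);
        return TimeSpan.FromMilliseconds(milliseconds);
    }
}
```

The random element means that many clients failing at the same moment do not all retry at the same moment.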

If you want to implement your own strategy, you inherit from RetryStrategy and override the PrepareToRetry() method and the CanRetry property. Here's an example.

public class Count : RetryStrategy
{
    private readonly int maxTries;
    private int tryCount;

    public Count()
        : this(5)
    {
    }

    public Count(int maxTries)
    {
        this.maxTries = maxTries;
    }

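    // Phase two: a retry is allowed only while we are under the limit.
    public override bool CanRetry
    {
        get
        {
            return this.tryCount < this.maxTries;
        }
    }

    // Phase one: record the failure, count the attempt, and ask for no delay.
    public override TimeSpan PrepareToRetry(Exception lastException)
    {
        this.AddException(lastException);

        this.tryCount += 1;

        return TimeSpan.Zero;
    }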
}

Scenario One: TPL

If you want to start a new Task using the TPL, but automatically retry that task using our retry framework, we provide source-compatible replacements for Task<T>.Factory.StartNew() and Task.Factory.StartNew(), called RetryTask<T>.Factory.StartNew() and RetryTask.Factory.StartNew().

Each of the overloads also takes additional optional parameters for the Strategy and Policy, defaulting to Count (with a maximum of 5 retries) and AnyException respectively.

Here's an example of that.

ISomeService someTasks = new MyService();
var result = await RetryTask<string>.Factory.StartNew(() => someTasks.SecondTask(someTasks.FirstTask()), new Count(10));

Scenario Two: Async

Sometimes, you are using the async/await pattern and need to call an async method. In that case, you can use our Retriable.RetryAsync() method.

private static async Task RunAsync()
{
    ISomeService someTasks = new MyService();
    var result = await Retriable.RetryAsync(() => SomeFuncAsync(someTasks));
}

private static async Task<string> SomeFuncAsync(ISomeService someTasks)
{
    var response = await someTasks.FirstTaskAsync();
    return await someTasks.SecondTaskAsync(response);
}

This also works for inline async delegates:

ISomeService someTasks = new MyService();
var result = await Retriable.RetryAsync(
    async delegate
    {
        var response = await someTasks.FirstTaskAsync();
        return await someTasks.SecondTaskAsync(response);                
    });

Scenario Three: Synchronous

Finally, it works just as well for a synchronous method.

ISomeService someTasks = new MyService();
var result = Retriable.Retry(() => someTasks.SecondTask(someTasks.FirstTask()));

Matthew Adams

Matthew was CTO of a venture-backed technology start-up in the UK & US for 10 years, and is now the co-founder of endjin, which provides technology strategy, experience and development services to its customers who are seeking to take advantage of Microsoft Azure and the Cloud.