New Proactor implementation for POSIX
Alexander Libman
libman@terabit.com.au,
alibman@optusnet.com.au
1. The objective of this development
2. New features of POSIX Proactor
3. Motivation for the leader-follower pattern
4. Modification of the leader-follower pattern for Proactor
4.1. A follower can do useful work
4.2. A leader only waits for AIO events
5. Asynchronous results and collections of results
5.1. Asynchronous result - stateful object
7.3. Team-work of two AIO_Processors
8.1. Information pure virtual methods
8.2. Executive pure virtual methods
8.3. Factory pure virtual methods of AIO_Wait_Strategy interface
9. AIO_Wait_Strategy interface
9.1. Executive pure virtual methods
9.2. Notification pure virtual methods
9.3. Factory pure virtual methods of AIO_Interrupt_Strategy interface
10. AIO_Interrupt_Strategy interface
10.1. Executive pure virtual methods
10.2. Notification pure virtual methods
10.4. Interrupt_Signal_Strategy
11. Implementations of AIO_Provider and Wait_Strategies
11.3.3. LinuxRT_Strategy (Linux 2.4)
11.3.4. Event_Poll_Strategy (Linux 2.6)
11.3.5. Dev_Poll_Strategy (SunOS 5.8+)
11.3.6. Kqueue_Strategy (FreeBSD)
11.4. LINUX_Provider (Linux 2.6)
For the development of high-performance networked applications on WIN32 and POSIX, it is highly desirable to have a single full-fledged pattern working on both platforms. It is obvious that asynchronous I/O gives the best performance on WIN32 platforms, so there the Proactor pattern is the only choice. Yet the state of the POSIX AIO subsystems on many platforms does not allow choosing the Proactor pattern for production applications. Moreover, the Thread Pool Reactor can give better performance, because many POSIX AIO implementations spawn "hidden" threads instead of providing a kernel AIO solution. In addition, the POSIX AIO standard does not support asynchronous accept, connect and datagram operations.
As a result, the goals of this work are:

- To develop a POSIX_Proactor that is able to emulate AIO facilities.
- To support the full set of asynchronous operations.
- To make the Proactor compatible with existing user applications.
- To achieve performance at least equal to that of TP_Reactor.
- To develop a Proactor that supports both the POSIX AIO API and the emulation API.
- User applications should not have to be recompiled if the Proactor selects another internal engine, i.e. switches from POSIX AIO to the AIO emulation and vice versa.
- The Proactor should be configurable at run-time, to free the programmer from selecting the internal engine; selecting the right engine is the system administrator's task.
- The Proactor should have open interfaces for adding new engine(s).
The new POSIX_Proactor works on any POSIX platform, regardless of POSIX AIO support. If the platform includes a reliable and fast AIO implementation, POSIX_Proactor can rely on it and use the native implementation. If the platform has no AIO support, or its AIO implementation is broken, POSIX_Proactor will emulate AIO and will give performance at least as good as the Thread Pool Reactor's.
POSIX_Proactor provides the full set of asynchronous operations, including asynchronous datagram operations.
The new POSIX_Proactor is fully compatible with existing applications.
Therefore, we can use the Proactor as a reliable pattern for high-performance networked applications on both WIN32 and POSIX platforms.
The new implementation consists of the following components:

- AIO_Dispatcher – responsible for selection of the active AIO_Processor and for dispatching completed results.

- AIO_Processor – responsible for managing the collection of active operations, for asynchronous activation/start and cancellation of operations, for waiting for completions, and for transferring completed results to the AIO_Dispatcher. The AIO_Processor creates at run-time the implementation of AIO_Provider to execute the low-level primitives start_aio and cancel_aio. The AIO_Processor obtains the AIO_Wait_Strategy interface from the AIO_Provider to perform the "wait for completions" function.

- AIO_Provider – an abstract interface to the "pluggable AIO device driver". It defines the low-level primitives start_aio and cancel_aio. In addition, it provides access to the AIO_Wait_Strategy interface. The following concrete "pluggable" AIO_Provider implementations are currently supported:
  - POSIX_STD_Provider, based on the standard POSIX AIO API
  - SUN_Provider, based on the SUN-specific API
  - SELECT_Provider – emulation of POSIX AIO via the select() call or another demultiplexing API
  - LINUX_Provider, based on the Linux kernel 2.6 specific API

- AIO_Wait_Strategy – an abstract interface used by the AIO_Processor to perform the "wait for completions" action. The implementation should be provided or created by the AIO_Provider. Usually an AIO_Provider has its own proprietary AIO_Wait_Strategy, but some providers can have more than one AIO_Wait_Strategy. For example, in POSIX we can use the strategies:
  - POSIX_STD_Strategy, based on aio_suspend() (former AIOCB_Proactor)
  - POSIX_SIG_Strategy – waiting for RT signals (former SIG_Proactor)
  - POSIX_SGI_Strategy – based on the aio callback function (former CB_Proactor)

- AIO_Interrupt_Strategy – an abstract interface used by the AIO_Processor and AIO_Dispatcher when it is required to interrupt the "wait for completions" action. The implementation should be provided or created by the AIO_Wait_Strategy. There are two AIO_Interrupt_Strategies that are ready for use by any AIO_Provider and that are independent of the AIO_Wait_Strategy: Interrupt_Pipe_Strategy and Interrupt_Signal_Strategy. However, POSIX_SGI_Strategy can also use its own proprietary interrupt strategy.

- AIO_Config – a small class, which allows you to tailor your own Proactor implementation at run-time.
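To fix the terminology, here is a minimal C++ sketch of the three abstract interfaces and their factory chain. It follows the descriptions above; the exact signatures in the real sources may differ, and the timeout parameter is an assumption.

#include <time.h>   // timespec

class AIO_Interrupt_Strategy
{
public:
  virtual ~AIO_Interrupt_Strategy () {}
  virtual int interrupt () = 0;      // wake up the blocked leader thread
};

class AIO_Wait_Strategy
{
public:
  virtual ~AIO_Wait_Strategy () {}
  // The leader thread blocks here until AIO completions arrive.
  virtual int wait (const timespec *timeout) = 0;
  virtual AIO_Interrupt_Strategy *create_interrupt_strategy () = 0;
};

class AIO_Provider
{
public:
  virtual ~AIO_Provider () {}
  // The start_xxx()/cancel_aio() primitives are omitted here;
  // see the AIO_Provider interface sections below.
  virtual AIO_Wait_Strategy *create_wait_strategy () = 0;
};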
The AIO emulation (SELECT_Provider):

- Supports the full set of asynchronous operations: Read_Stream, Write_Stream, Read_File, Write_File, Read_Dgram, Write_Dgram, Accept, Connect.
- Is available on any platform, regardless of POSIX AIO support.
- Can be used instead of the POSIX or SUN providers.
- Can be used in combination with the POSIX or SUN providers.
- Gives performance at least equal to the Thread Pool Reactor (TP_Reactor); test results are enclosed.
- Supports a set of AIO_Wait_Strategy implementations, which allows a run-time choice of the best demultiplexing mechanism for a given OS (see the SELECT_Provider chapter below).
Each provider supports its proper subset of operations. For example, one provider can be the POSIX_STD_Provider to perform read/write operations on streams and files, and another provider can be the SELECT_Provider to deal with accept, connect and datagram operations.
If the current POSIX AIO is broken, we can delegate stream (socket/pipe) operations to the SELECT_Provider and use the POSIX_STD_Provider only for read/write on files (hard disk devices).
One of the differences between POSIX AIO and the WIN32 asynchronous API is that in POSIX we have to support and maintain our own collection of started AIO operations. The only POSIX call that returns a completed operation directly is SUN's aiowait(), but it cannot be relied upon, because:

- it is SUN-specific only;
- there are bugs in some Sun versions (including 5.8), and aiowait() has MT-safety problems.
As long as the Proactor has to support its own collection of started AIO operations, access to the collection – the insert, delete, find, and iterate operations – should be synchronized. After an AIO completion is detected, a "find" or "iterate" operation is used to determine the concrete completed aiocb's and to process/dispatch the completions. There is no advantage in using aio_suspend() in several threads simultaneously if, after aio_suspend(), the Proactor locks the same mutex in all threads. Moreover, the problem of synchronizing the aio_suspend() parameters across all threads does not exist in the leader-follower pattern.
Actually, we already have some kind of hidden leader-follower pattern, as only one thread can scan the collection for completions.
As a matter of fact, there is another "secret goal": applying the leader-follower pattern opens a way to emulate AIO via the select() call.
The classical model can be described as follows: while the leader thread is waiting for specific events, the followers are waiting for a leadership token. When a leader releases leadership, it signals a semaphore or condition variable to promote one of the followers, releases the token and starts to process the event. As soon as event processing is finished, the "former leader" enqueues itself in the followers queue.
In the Proactor pattern, we deal with two kinds of completions: real completed AIO results and fake "post-completed" results. The latter are timer, wakeup and other user-defined events.
Let us delegate the "waiting for real AIO completions" function to the leader thread.
While the leader thread waits for AIO completions, we can signal the same semaphore/condition variable to wake up one of the followers to process "post-completed" results without interrupting the leader. This approach assumes that post-completed results are saved in a special completion queue. Before becoming a leader, a follower should check this special queue. If the queue is not empty, the follower dispatches results from the completion queue; otherwise, it tries to obtain leadership. There is only one case in which we still have to interrupt the leader to process post-completions: the case of one leader and no followers. A minimal sketch of this logic follows.
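The sketch below shows one iteration of the modified logic for a thread inside handle_events(). All names are illustrative, not the actual ACE classes; the function is called with the common Proactor mutex held.

#include <queue>
#include <pthread.h>

// Illustrative state only; the real Proactor keeps these members in
// the AIO_Dispatcher and AIO_Processor.
struct Demo_State
{
  pthread_mutex_t lock;            // common Proactor mutex
  pthread_cond_t  followers;       // followers sleep here
  std::queue<void *> completed;    // completion queue
  bool has_leader;
};

// One iteration of the modified leader-follower logic;
// called with 'lock' already held.
inline void handle_events_step (Demo_State &s)
{
  if (!s.completed.empty ())
    {
      void *result = s.completed.front ();
      s.completed.pop ();
      pthread_mutex_unlock (&s.lock);   // anti-lock: dispatch outside the mutex
      /* ... invoke the user callback on 'result' ... */
      (void) result;
      pthread_mutex_lock (&s.lock);
    }
  else if (!s.has_leader)
    {
      s.has_leader = true;              // become the leader
      /* ... wait for real AIO completions (aio_suspend/select/...) ... */
      s.has_leader = false;
      pthread_cond_signal (&s.followers);   // promote one follower
    }
  else
    pthread_cond_wait (&s.followers, &s.lock);  // sleep as a follower
}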
Now let us consider the leader's function.
Waiting for completions can be done with aio_suspend(), aiowait(), sigwaitinfo(), select(), etc.
When the leader detects one or more AIO completions, it has two choices:
1) Remove the first completed result from the collection of started AIOs, promote a follower and dispatch the extracted result, i.e. call the user's callback method. After that, it enqueues itself in the followers queue.
2) Remove some or all completed results from the collection of started AIOs and pass all these results to the collection of completed results. This can be the same collection where "post-completed" results are stored, i.e. a common queue of completed results! After detecting all completions, the leader releases leadership, picks up the first result from the completion queue for dispatching and wakes up a follower. The follower also checks the completion queue. If it is not empty, the follower will help the "former" leader to process completed results; otherwise, it becomes a leader.
Choice (2) seems better than choice (1), because it is more flexible and includes choice (1) as a particular case. The disadvantage of choice (2) is that we spend extra time passing results to the completion queue (see "Collections of results" below for how to avoid/minimize the cost of operations on collections). The advantage of choice (2) is that we can dispatch all completed results from only one place – the AIO_Dispatcher component.
The AIO_Dispatcher is not only a dispatcher of completed results; it also plays the role of the manager in the leader-follower pattern. Based on the size of the completed queue and the number of follower threads, it decides for each thread what to do: sleep as a follower, process completed results, or perform the leader's waiting for AIO completions. So, let us delegate the "pure" leader functions, such as waiting for AIO completions, to the AIO_Processor class, and delegate the processing of completions to the AIO_Dispatcher.
Note. The Dispatcher can support more than one AIO_Processor (see Team-work of two AIO_Processors).
All the components of the Proactor share the common Proactor mutex. This mutex is also used with a condition variable. All main entries of the Proactor – start_aio(), cancel_aio(), handle_events() and post_completion() – immediately lock the mutex for the whole duration of the call. The mutex is unlocked in only a few places, via the anti-lock pattern (see the Unlock Points in the handle_events algorithm below).
- The AIO start is performed by the POSIX_Proactor and not by an ACE_Asynch_Operation subclass (this is one of the differences between POSIX_Proactor and WIN32_Proactor). The start can also be deferred. Therefore, the asynchronous result should contain all the input information required for the start. This information includes the operation code and all other necessary parameters depending on the code. The code can be one of: READ_STREAM, WRITE_STREAM, READ_FILE, WRITE_FILE, READ_DGRAM, WRITE_DGRAM, ACCEPT, CONNECT.
- Dispatching can be deferred. Therefore, after AIO completion all output information should be saved in the asynchronous result.
- The asynchronous result is a stateful object and can be in one of the following states (a sketch follows the list):

  - INITIALIZED – the result is created and ready for start.
  - STARTED – the operation is in progress; the result is in the started queue (belongs to the AIO_Processor).
  - DEFERRED – the start is deferred; the result is in the deferred queue (belongs to the AIO_Processor).
  - RESULT_READY – the AIO has finished; all output information (errno, bytes transferred) is saved; the result is not dispatched yet and goes to the completed queue (belongs to the AIO_Dispatcher).
  - START_ERROR – the start is unavailable due to non-recoverable errors; there is no point in repeating it later. If the start was immediate, the result is returned to the caller. If the start was deferred, the result goes to the completed queue as if it were in the RESULT_READY state.
  - POSTED – the operation was posted via post_completion(); the result is in the completed queue (belongs to the AIO_Dispatcher).
  - COMPLETED – the operation is dispatched, i.e. it is removed from the completed queue and the user's callback method is in progress. After the callback returns control, the result is destroyed by the AIO_Dispatcher.
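As a sketch, the life cycle could be captured by an enumeration like the following (the actual representation inside ACE_POSIX_Asynch_Result may differ):

// States of an asynchronous result, as described above.
enum Result_State
{
  RS_INITIALIZED,   // created, ready for start
  RS_STARTED,       // in progress, in the started queue (AIO_Processor)
  RS_DEFERRED,      // start deferred, in the deferred queue (AIO_Processor)
  RS_RESULT_READY,  // AIO finished, in the completed queue (AIO_Dispatcher)
  RS_START_ERROR,   // start failed with a non-recoverable error
  RS_POSTED,        // posted via post_completion(), in the completed queue
  RS_COMPLETED      // removed from the completed queue, user callback running
};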
There are three collections of AIO results: the started, deferred and completed queues.
An efficient form of container for AIO results has to be provided in order to minimize the time of insert/delete operations and to avoid memory allocations/de-allocations while transferring a result from one collection to another. Memory allocations can slow down performance because of contention for the heap mutex, and an ENOMEM error inside the Proactor creates a problem: it is unclear what to do with the result (it cannot simply be deleted, as the user would lose the notification). Another very important criterion is the speed of the "remove" operation for any element in the collection, not only for the first/last element.
The ideal candidate for the container is an intrusive doubly-linked list:
ACE_Double_Linked_List<ACE_POSIX_Asynch_Result>
As the result itself contains the link fields (next_, prev_), we avoid creating and deleting link elements, so we never get ENOMEM. The cost of insert/remove operations on the intrusive list is minimal, and results are never lost.
The choice of ACE_Double_Linked_List<> as the container for the started, deferred and completed queues gives us a flexible way to transfer results from one collection to another without losing performance. Naturally, the results must contain the extra link fields next_ and prev_.
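The principle, in a minimal self-contained form (ACE_Double_Linked_List provides the real implementation; the sketch below only illustrates why insert/remove never allocates and why removing an arbitrary element is O(1)):

// The links live inside the result itself, so moving a result between
// the started, deferred and completed queues is pure pointer surgery.
struct Demo_Result
{
  Demo_Result *next_;
  Demo_Result *prev_;
  // ... operation code, handle, errno, bytes transferred, state ...
};

// Circular doubly-linked list with a dummy head node.
struct Demo_Queue
{
  Demo_Result head_;
  Demo_Queue () { head_.next_ = head_.prev_ = &head_; }

  void insert_tail (Demo_Result *r)    // O(1), no allocation
  {
    r->prev_ = head_.prev_;
    r->next_ = &head_;
    head_.prev_->next_ = r;
    head_.prev_ = r;
  }

  static void remove (Demo_Result *r)  // O(1) for ANY element
  {
    r->prev_->next_ = r->next_;
    r->next_->prev_ = r->prev_;
    r->next_ = r->prev_ = 0;
  }
};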
The asynchronous results are relatively small objects with a short lifetime. Usually they are created and destroyed inside the Proactor framework. To avoid the expense of the new/delete operators (contention for the heap mutex), POSIX_Proactor now has a component POSIX_AIO_Allocator, based on ACE_Cached_Allocator. It contains a free list of equal-size memory chunks. The chunk size is the maximum size of all results defined in ACE_POSIX_Asynch_IO.h.
To use POSIX_AIO_Allocator, an extra private field – a pointer to the ACE_Allocator – was added to ACE_POSIX_Asynch_Result.
When a result is created without an allocator, this field is set to zero. Therefore the rules for all non-standard results created by the user for post_completions are preserved, and compatibility with existing applications is also preserved.
After a result has been dispatched, the AIO_Dispatcher performs the following actions (sketched in code below):

- If the pointer to the ACE_Allocator is zero, it deletes the result.
- If the pointer to the ACE_Allocator is non-zero, it calls the destructor and returns the memory to the ACE_Allocator.
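A sketch of this destruction logic, under the assumption of a result class carrying an optional allocator_ pointer as described above (all names are illustrative):

#include <cstddef>

class Demo_Allocator          // stands in for ACE_Allocator
{
public:
  virtual ~Demo_Allocator () {}
  virtual void *malloc (std::size_t nbytes) = 0;
  virtual void  free (void *ptr) = 0;   // return a chunk to the free list
};

struct Demo_Async_Result
{
  Demo_Allocator *allocator_;  // zero if created with global operator new
  Demo_Async_Result () : allocator_ (0) {}
  virtual ~Demo_Async_Result () {}
};

// Called by the AIO_Dispatcher after the user callback has returned.
inline void destroy_result (Demo_Async_Result *r)
{
  if (r->allocator_ == 0)
    delete r;                       // user-created result: plain delete
  else
    {
      Demo_Allocator *a = r->allocator_;
      r->~Demo_Async_Result ();     // run the destructor explicitly
      a->free (r);                  // hand the chunk back to the free list
    }
}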
Let us consider how we can minimize the time spent creating asynchronous results.
The bunch of classes derived from ACE_POSIX_Asynch_Operation create results using the family of Proactor functions create_XXXX_YYYY_result(), where XXXX = Read/Write/Accept/Connect and YYYY = Stream/File/Dgram.
After creation, these classes just call Proactor::start_aio().
In this scheme the POSIX_Proactor framework locks two mutexes – the heap mutex during the new operation, and the Proactor common mutex during the start_aio() method.
Let us make this sequence atomic, i.e. add to the POSIX_Proactor a new family of functions: create_and_start_XXXX_YYYY_result().
The POSIX_Proactor now creates results using the built-in allocator and then starts them. Instead of two locks, we use only the one Proactor common mutex. Another advantage is that memory allocation from a free list is faster than any OS malloc.
The old family of functions create_XXXX_YYYY_result() is never used in the new POSIX_Proactor implementation. Nevertheless, they have not been removed, as they are public and can be used in some existing applications.
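A sketch of one such combined function. The free-list allocator and start_aio() are reduced to stand-ins here; everything marked as a stand-in is an assumption:

#include <new>
#include <cstddef>
#include <pthread.h>

struct Demo_Read_Stream_Result { /* aiocb, handle, buffer, state, ... */ };

class Demo_Proactor
{
  pthread_mutex_t mutex_;    // common Proactor mutex

  // Stand-in for POSIX_AIO_Allocator: the real one pops a chunk from
  // a free list without touching the heap mutex.
  void *alloc_chunk ()
  { return ::operator new (sizeof (Demo_Read_Stream_Result), std::nothrow); }
  int start_aio (Demo_Read_Stream_Result *) { return 0; }  // stand-in

public:
  // Atomic "create + start": one lock instead of heap mutex + Proactor mutex.
  int create_and_start_read_stream_result (/* user parameters omitted */)
  {
    pthread_mutex_lock (&mutex_);
    void *chunk = alloc_chunk ();
    int rc = -1;                    // stays -1 if the free list is exhausted
    if (chunk != 0)
      rc = start_aio (new (chunk) Demo_Read_Stream_Result);
    pthread_mutex_unlock (&mutex_);
    return rc;
  }
};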
There is only one instance of AIO_Dispatcher per POSIX_Proactor.
The AIO_Dispatcher is created and destroyed by the POSIX_Proactor.
The AIO_Dispatcher has the following responsibilities:
a) Dispatching completed results from the completion queue.
b) Selection of a new thread-leader and distribution of jobs between followers.
The AIO_Dispatcher is the owner of the completed results queue. The other important notion is the notification state:

- DO_NOTHING (no leaders and no followers)
- WAKEUP_FOLLOWER (at least one follower exists)
- INTERRUPT_LEADER (no followers, at least one leader exists)
Each thread that has entered handle_events() can be in one of three states: follower, leader, or handler (dispatching completions). Thus:

Number_Active_Threads = Number_Followers + Number_Leaders + Number_Handlers
The POSIX_Proactor delegates the post_completion() and handle_events() functions to the AIO_Dispatcher.
The "post_completion" algorithm consists of the following steps:
1. Place the result in the completed queue.
2. If there are no followers and no leaders, then everything is done; return (DO_NOTHING notification state).
3. If there is at least one follower, then wake up a follower and return (WAKEUP_FOLLOWER notification state).
4. Otherwise there are only leaders; interrupt one leader to process the "post-completed" queue (INTERRUPT_LEADER notification state).
Actually, the implementation of this algorithm has one extra parameter specifying the level of notification. It can be one of the following (a sketch in code follows the list):

- NOTIFY_NONE – performs only step 1.
- NOTIFY_FOLLOWER – performs only steps 1-3.
- NOTIFY_STRONG – performs steps 1-4.
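A sketch of the algorithm in code (illustrative names; called with the common Proactor mutex held):

#include <queue>
#include <pthread.h>

struct Demo_Result;

struct Demo_Interrupter         // stands in for AIO_Interrupt_Strategy
{
  void interrupt () { /* pipe write or pthread_kill; see below */ }
};

struct Demo_Dispatcher_State
{
  std::queue<Demo_Result *> completed;  // completion queue
  pthread_cond_t followers_cv;
  int num_followers;
  int num_leaders;
  Demo_Interrupter interrupter;
};

enum Notify_Level { NOTIFY_NONE, NOTIFY_FOLLOWER, NOTIFY_STRONG };

// Steps 1-4 above; called under the common Proactor mutex.
inline void post_completion (Demo_Dispatcher_State &s,
                             Demo_Result *r,
                             Notify_Level level)
{
  s.completed.push (r);                          // step 1
  if (level == NOTIFY_NONE)
    return;
  if (s.num_followers == 0 && s.num_leaders == 0)
    return;                                      // step 2: DO_NOTHING
  if (s.num_followers > 0)
    {
      pthread_cond_signal (&s.followers_cv);     // step 3: WAKEUP_FOLLOWER
      return;
    }
  if (level == NOTIFY_STRONG)
    s.interrupter.interrupt ();                  // step 4: INTERRUPT_LEADER
}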
The "handle_events" algorithm looks like this:
1. Check and process the completion queue.
a) If the size of the queue is > 1 and Number_Followers > 1, then wake up one more follower to help process completions.
b) While the completion queue is not empty:
remove the head result,
unlock the common Proactor mutex (Begin Unlock Point 1),
dispatch the result,
lock the common Proactor mutex (End Unlock Point 1),
delete the result or return it to the POSIX_AIO_Allocator.
c) If one or more results were processed, then return.
Otherwise, go to step 2.
2. For each AIO_Processor (primary and secondary), try to obtain leadership. An AIO_Processor can grant leadership if:
a) it has no leader;
b) it has at least one started or deferred AIO operation;
c) it is not a Dedicated AIO_Processor (see Dedicated AIO_Processor).
If the thread has become a leader, call AIO_Processor::handle_aio_events() and after that go to step 1.
If not, try the next AIO_Processor.
If there are no AIO_Processors left, go to step 3.
3. At this point the thread becomes a follower. It waits on the condition variable for the specified amount of time. During the wait, the common Proactor mutex is unlocked (Unlock Point 2). The thread is woken up on one of the following conditions:
a) another thread has released leadership (go to step 1);
b) there are new results in the completion queue (go to step 1);
c) the number of started/deferred operations in some AIO_Processor increased from 0 to 1 (go to step 1);
d) the time is out – return.
TODO: Step 3 – the wait as a follower can be used to service timer queues, avoiding an extra thread for asynchronous timer events.
The AIO_Processor is created and destroyed by the POSIX_Proactor.
Currently the POSIX_Proactor creates a maximum of two AIO_Processors, but in general the Proactor can support a collection of AIO_Processors.
Responsibilities:
a) Planning the start of asynchronous operations.
b) Planning the cancellation of asynchronous operation(s).
c) Management of the started and deferred queues.
d) Handling AIO completion events.
The AIO_Processor itself does not perform AIO operations; instead, it creates the AIO_Provider – the "workhorse" for executing start_aio, cancel_aio, and handle_aio_events (waiting for AIO completions).
The two reasons why AIO_Provider is not derived from AIO_Processor are:

- The AIO_Provider should be as simple as possible, like a "device driver".
- The intention was to isolate the AIO_Provider from communication with the AIO_Dispatcher.
The AIO_Processor obtains the following information from the AIO_Provider:

- the maximum number of AIOs allowed to start simultaneously;
- the mask of operation codes that the AIO_Provider supports (see Asynchronous result - stateful object);
- the flag START_FROM_ANY_THREAD, indicating whether start_aio is allowed from any thread or only in the leader thread;
- the flag SHOULD_START_INTERRUPT_LEADER, indicating whether we should interrupt the leader after start_aio (this flag is valid only if START_FROM_ANY_THREAD is allowed);
- the AIO_Wait_Strategy interface, which is used for waiting for AIO completions;
- the AIO_Interrupt_Strategy interface, which is used to interrupt waiting for AIO completions.
The flag START_FROM_ANY_THREAD is reserved for specific providers and wait strategies. Currently all AIO_Providers and Wait_Strategies allow starting AIO from any thread. There is only one exception: the combination of Dev_Poll_Strategy with SELECT_Provider allows starting AIO only from the leader thread.
The flag SHOULD_START_INTERRUPT_LEADER is used when it is necessary to interrupt the leader after start_aio, because the leader should re-issue its wait function.
Example 1: aio_suspend() should be reissued with an updated aiocb list, as we cannot insert an aiocb into the list that is currently used by a running aio_suspend().
Example 2: select() should be reissued with updated fd sets if a new descriptor was added to the pending set.
If a start_aio() call has to interrupt the leader to update the AIO_Processor wait state, the user has two run-time options:
a) Set both flags to 1. The Proactor can start AIO from any thread, and if the current thread is not the leader, the Proactor interrupts the existing leader thread to update the leader's wait state.
b) Set both flags to 0. If the Proactor has to start AIO from a non-leader thread, it will place the AIO into the deferred queue and will interrupt the leader to start the deferred AIOs.
The AIO_Processor has two public methods – start_aio() and cancel_aio() – accessible from external classes. Any thread can call these methods.
The AIO_Processor has three private methods accessible only from the friend class AIO_Dispatcher:

- handle_aio_events() (called only in leader-thread context)
- acquire_leadership()
- release_leadership()
There are two variations of the AIO_Processor: the Shared AIO_Processor and the Dedicated AIO_Processor.
The Shared AIO_Processor is the base flavor of AIO_Processor. It waits for AIO completions (see Handle_aio_events) in the caller's thread, i.e. in the thread where Proactor::handle_events, and therefore AIO_Dispatcher::handle_events (see Handle_Events), were called.
The AIO_Dispatcher delegates the handle_aio_events() function to the AIO_Processor. The AIO_Dispatcher guarantees that handle_aio_events() is called only in the leader context, i.e.:

- after a successful call to acquire_leadership();
- in the same thread where acquire_leadership() was called.

After handle_aio_events() returns control to the AIO_Dispatcher, the method release_leadership() is called in the same thread.
The Dedicated AIO_Processor class is derived from AIO_Processor. It has its own permanent thread-leader, which is responsible for handling AIO completions. This thread is created at start-up. It immediately obtains leadership and holds it until the Proactor is deactivated. The thread function does one thing: it calls AIO_Processor::handle_aio_events() forever. The AIO_Dispatcher can never obtain leadership for a Dedicated AIO_Processor (see Handle_Events).
Note 1. The Dedicated AIO_Processor is the only choice for LinuxRT_Strategy with SELECT_Provider. The Linux RT I/O completion signal for a given file descriptor is also associated with a process id (pid). Currently, in Linux each thread has its own pid. Therefore, any generic I/O completion demultiplexor based on Linux RT signals should have a separate thread, which will receive these signals.
Note 2. We can consider the class ACE_Asynch_Pseudo_Task in the old POSIX_Proactor as a very limited prototype of the Dedicated AIO_Processor.
It is obvious that the performance of the Dedicated AIO_Processor is much worse than the Shared AIO_Processor's performance: we have to pay for the context switching between threads.
The reason why the Dedicated AIO_Processor is still required is the following: POSIX_STD_Provider and SUN_Provider do not support asynchronous accept, connect and datagram operations. If the AIO_Processor uses SELECT_Provider, then we do not have to create a second AIO_Processor.
There is only one difference between the primary and secondary AIO_Processors.
When the Proactor wants to start a new AIO operation, it always asks the primary AIO_Processor first whether it supports the required operation. If not, the Proactor asks the secondary AIO_Processor. After an AIO_Processor is selected, the Proactor delegates the operation start to the chosen AIO_Processor.
Several configurations of two AIO_Processors are possible.
The POSIX_Proactor delegates the start_aio() function to the AIO_Processor.
Any thread can call start_aio().
Initially the POSIX_Proactor asks the primary AIO_Processor whether it supports the required operation. If yes, the POSIX_Proactor delegates start_aio to this AIO_Processor; if not, the POSIX_Proactor asks the secondary AIO_Processor. The AIO_Processor then does the following (a sketch in code appears after this list):
1) Check whether the limit on the maximum number of simultaneous AIOs has been exceeded. If the limit is reached, put the result in the deferred list and return.
2) Check whether START_FROM_ANY_THREAD is allowed. If not, and the current thread is not the leader thread, put the result in the deferred list, interrupt the leader and return.
3) Call AIO_Provider::start_aio.
4) If the start is deferred, add the result to the deferred list and return.
5) If the start failed, return (the caller is responsible for deleting the result).
6) If the operation started and finished immediately, call AIO_Dispatcher::post_completion with notification level NOTIFY_STRONG and return.
7) The AIO has started; add the result to the started list.
8) If the number of started operations increased from 0 to 1 and the current thread is not the leader, ask the AIO_Dispatcher to activate one follower (method AIO_Dispatcher::notify_only_follower). This means the AIO_Processor now has AIO events to wait for, and therefore it is ready for leadership (see Handle_Events). Return.
9) If the flag SHOULD_START_INTERRUPT_LEADER is on, then interrupt the leader.
10) Return.
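The same steps as a compact sketch. The helper methods and the provider return codes are stand-ins for the real interfaces described in this document:

#include <cstddef>

struct Demo_Result;

struct Demo_Processor
{
  std::size_t started_num_;            // operations in the started list
  std::size_t max_aio_num_;            // provider limit
  bool any_thread_ok_;                 // START_FROM_ANY_THREAD
  bool interrupt_after_start_;         // SHOULD_START_INTERRUPT_LEADER

  // Stand-ins for the real members/collaborators:
  bool is_leader_thread () { return true; }
  void defer (Demo_Result *) {}        // put into the deferred list
  void add_started (Demo_Result *) {}  // put into the started list
  void interrupt_leader () {}          // AIO_Interrupt_Strategy::interrupt()
  void notify_only_follower () {}      // AIO_Dispatcher wakeup
  void post_strong (Demo_Result *) {}  // post_completion (NOTIFY_STRONG)
  // Assumed codes: 0 = started, 1 = finished at once, 2 = deferred, -1 = error.
  int  provider_start (Demo_Result *) { return 0; }

  int start_aio (Demo_Result *r)
  {
    if (started_num_ >= max_aio_num_)               // step 1
      { defer (r); return 0; }
    if (!any_thread_ok_ && !is_leader_thread ())    // step 2
      { defer (r); interrupt_leader (); return 0; }
    int rc = provider_start (r);                    // step 3
    if (rc == 2)                                    // step 4: deferred
      { defer (r); return 0; }
    if (rc == -1)                                   // step 5: failed
      return -1;
    if (rc == 1)                                    // step 6: finished at once
      { post_strong (r); return 0; }
    add_started (r);                                // step 7
    if (++started_num_ == 1 && !is_leader_thread ())
      { notify_only_follower (); return 0; }        // step 8
    if (interrupt_after_start_)
      interrupt_leader ();                          // step 9
    return 0;                                       // step 10
  }
};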
The POSIX_Proactor delegates the cancel_aio() function to the AIO_Processor.
cancel_aio() can be called from any thread. (A sketch in code follows the list.)
1) The AIO_Processor scans the deferred list for results with the required file handle.
2) Each found result is removed from the deferred list, marked as cancelled, and AIO_Dispatcher::post_completion() is called with notification level NOTIFY_STRONG.
3) The AIO_Processor scans the started list for results with the required file handle.
4) For each found result, AIO_Provider::cancel_aio is called.
5) If the result state has changed to RESULT_READY, the result is removed from the started list and AIO_Dispatcher::post_completion() is called.
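A sketch of the scan, with std::list standing in for the intrusive queues and stubs for the provider and dispatcher calls (illustrative only):

#include <list>

struct Demo_Result { int handle_; bool ready_; };

// Stand-ins for the real collaborators:
inline void post_completion_strong (Demo_Result *) {}  // AIO_Dispatcher
inline bool provider_cancel (Demo_Result *r)           // AIO_Provider::cancel_aio
{ r->ready_ = true; return true; }                     // stub: always RESULT_READY

void cancel_aio (std::list<Demo_Result *> &deferred,
                 std::list<Demo_Result *> &started,
                 int handle)
{
  // Steps 1-2: deferred operations are cancelled unconditionally.
  for (std::list<Demo_Result *>::iterator i = deferred.begin ();
       i != deferred.end (); )
    if ((*i)->handle_ == handle)
      {
        post_completion_strong (*i);   // marked cancelled, posted
        i = deferred.erase (i);
      }
    else
      ++i;

  // Steps 3-5: started operations are cancelled via the provider.
  for (std::list<Demo_Result *>::iterator i = started.begin ();
       i != started.end (); )
    if ((*i)->handle_ == handle && provider_cancel (*i) && (*i)->ready_)
      {
        post_completion_strong (*i);
        i = started.erase (i);
      }
    else
      ++i;
}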
Handle_aio_events() waits for AIO completions and retrieves them. The notification level used for the obtained results depends on the processor mode:
1. If this AIO_Processor works in shared mode (see Shared AIO_Processor), we do not have to notify the AIO_Dispatcher, since the AIO_Processor returns control to the AIO_Dispatcher after all results are obtained. The notification level is NOTIFY_NONE.
2. If this AIO_Processor works in dedicated mode (see Dedicated AIO_Processor), we have to notify the AIO_Dispatcher. The notification level is NOTIFY_FOLLOWER.
The AIO_Provider is an abstract class that defines the common interface between the AIO_Processor and a concrete low-level system AIO engine.
The AIO_Processor has no knowledge of the specific concrete implementation; it always uses only the AIO_Provider interface.
Relationship: AIO_Provider – AIO_Processor is 1:1.
All pure virtual methods of AIO_Provider are divided into 3 groups.
These methods let the AIO_Processor know what this provider can do and how:

- get_supported_operations_mask() – tells which asynchronous operations this AIO_Provider supports;
- can_start_in_any_thread() – tells the AIO_Processor whether start_aio is allowed from any thread or only in the leader thread (the START_FROM_ANY_THREAD flag);
- should_start_interrupt_leader() – tells the AIO_Processor whether we should interrupt the leader after start_aio (the SHOULD_START_INTERRUPT_LEADER flag; valid only if START_FROM_ANY_THREAD is allowed);
- get_max_aio_num() – the maximum number of AIOs allowed to start simultaneously.
The AIO_Processor calls these methods only when it is required to start or cancel an asynchronous operation. Other classes cannot call these methods:

- start_read_stream()
- start_write_stream()
- start_read_file()
- start_write_file()
- start_read_dgram()
- start_write_dgram()
- start_accept()
- start_connect()
- cancel_aio()
The AIO_Provider should also support executor methods for waiting for AIO completions. However, some AIO_Providers have more than one option for waiting for and detecting AIO completions. Instead of declaring "wait for AIO completion" methods in the AIO_Provider interface, we declare a factory method for the AIO_Wait_Strategy interface, which is responsible for the wait functions. It is up to the AIO_Provider to implement this interface itself or to create an instance of some other class.

- create_wait_strategy() – creates the wait strategy.

Usually each provider has one wait strategy. Currently, only the POSIX_STD_Provider has several wait strategies.
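Collecting the three groups gives a declaration along the following lines (a sketch with simplified parameter lists; Demo_Result stands in for ACE_POSIX_Asynch_Result):

class AIO_Wait_Strategy;   // see the AIO_Wait_Strategy interface below
class Demo_Result;         // stands in for ACE_POSIX_Asynch_Result

class AIO_Provider
{
public:
  virtual ~AIO_Provider () {}

  // --- information methods ---
  virtual int  get_supported_operations_mask () const = 0;
  virtual bool can_start_in_any_thread () const = 0;       // START_FROM_ANY_THREAD
  virtual bool should_start_interrupt_leader () const = 0; // SHOULD_START_INTERRUPT_LEADER
  virtual int  get_max_aio_num () const = 0;

  // --- executive methods (called only by the AIO_Processor) ---
  virtual int start_read_stream  (Demo_Result *r) = 0;
  virtual int start_write_stream (Demo_Result *r) = 0;
  virtual int start_read_file    (Demo_Result *r) = 0;
  virtual int start_write_file   (Demo_Result *r) = 0;
  virtual int start_read_dgram   (Demo_Result *r) = 0;
  virtual int start_write_dgram  (Demo_Result *r) = 0;
  virtual int start_accept       (Demo_Result *r) = 0;
  virtual int start_connect      (Demo_Result *r) = 0;
  virtual int cancel_aio         (int handle) = 0;

  // --- factory methods ---
  virtual AIO_Wait_Strategy *create_wait_strategy () = 0;
};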
The AIO_Wait_Strategy is an abstract interface used by the AIO_Processor for waiting for AIO completions. Logically, the AIO_Wait_Strategy is a part of the AIO_Provider. The reason the "wait for AIO completions" methods were placed in a separate interface is that an AIO_Provider can have more than one way to detect AIO completions.
The AIO_Wait_Strategy should also support methods that allow us to interrupt the waiting for AIO completions. In general, "how to interrupt" depends on "how to wait". In addition, it is possible to interrupt by means of different mechanisms.
Similarly to the mechanism by which the AIO_Provider creates the implementation of AIO_Wait_Strategy, the AIO_Wait_Strategy creates the AIO_Interrupt_Strategy, which is used by the AIO_Processor and the AIO_Dispatcher when it is required to interrupt the leader.
It is up to the AIO_Wait_Strategy to implement the AIO_Interrupt_Strategy interface itself or to create an instance of some other class.
The AIO_Interrupt_Strategy is an abstract interface that is used to interrupt the leader thread while the latter waits for AIO completions.
There are two concrete implementations of AIO_Interrupt_Strategy suitable for any AIO_Wait_Strategy: Interrupt_Pipe_Strategy and Interrupt_Signal_Strategy.
The leader must be interrupted when:

- there are no followers and a new post-completed result was put in the AIO_Dispatcher completion queue;
- a new AIO was issued from a non-leader thread and the leader should update its wait state. For example, if the leader uses aio_suspend(), then the aio_suspend() parameter list should be synchronized with the collection of started operations each time a start occurs. The same is true for the select-based provider.
This is the default strategy; it is based on a notification pipe – the former ACE_Notification_Pipe_Manager in the old POSIX_Proactor.
The Interrupt_Pipe_Strategy has some optimizations compared with the ACE_Notification_Pipe_Manager. The AIO result used for the read operation by the Interrupt_Pipe_Strategy has a special internal flag that prevents the result from being passed to the completion queue, i.e. to the AIO_Dispatcher. Therefore, this result belongs to the Interrupt_Pipe_Strategy and can be re-used: there are no extra expenses for the new/delete operators or for dispatching the result. The Interrupt_Pipe_Strategy starts the AIO read operation only from the on_leader_activated() callback method, and only if the operation is not already started, i.e. the result is not in the STARTED state.
This strategy is based on sending a specified signal to the leader thread.
The signal number can be any number specified by the user, not necessarily an RT signal number; for example, it can be SIGUSR1. This strategy calls the thr_kill() or pthread_kill() API functions; therefore, the signal is delivered only to the specified leader thread and never comes to another thread. The interrupt() method is always called under the locked common Proactor mutex. Therefore, we know which thread is the leader, and the leader cannot release leadership at this point, since the release_leadership() method is also called under the locked common mutex. This strategy installs a do-nothing signal handler for the specified signal number. The goal of the interrupt function is to provoke an EINTR error in the AIO_Wait_Strategy::wait() method.
This interrupt strategy works fine on all POSIX platforms, including Linux, and is suitable for any AIO_Provider that recognizes the EINTR error. Currently, all providers can use this strategy.
Note 1. Unfortunately, to be ready to use the Interrupt_Signal_Strategy, the AIO_Provider should call the OS API functions – aio_suspend(), aiowait(), select(), etc. – directly. The AIO_Provider cannot use the ACE_OS wrappers for such functions, because the ACE_OS wrappers restart system calls on the EINTR error.
Note 2. It would be nice if we could specify for an ACE_OS wrapper function how to process EINTR – ignore or return. In addition, it would be nice if ACE_OS::select() and other functions with timeouts could restart the system call with a decremented amount of time after an EINTR.
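The essence of the strategy in a few lines of POSIX code (a sketch; the signal number, e.g. SIGUSR1, is a user choice):

#include <signal.h>
#include <pthread.h>

static void noop_handler (int) {}   // we only want the EINTR side effect

// Installed once for the chosen signal (for example, in the strategy
// constructor). SA_RESTART is deliberately NOT set, so
// aio_suspend()/select()/... really return with errno == EINTR.
int install_interrupt_signal (int signum)
{
  struct sigaction sa;
  sa.sa_handler = noop_handler;
  sigemptyset (&sa.sa_mask);
  sa.sa_flags = 0;                  // no SA_RESTART
  return sigaction (signum, &sa, 0);
}

// interrupt() is called under the common Proactor mutex, so the leader
// thread id is stable here; the signal is delivered only to that thread
// and provokes EINTR inside AIO_Wait_Strategy::wait().
int interrupt_leader (pthread_t leader, int signum)
{
  return pthread_kill (leader, signum);
}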
This provider is based on the standard POSIX AIO API:

- aio_read()
- aio_write()
- aio_cancel()

It can use several AIO_Wait_Strategies, described below.
The POSIX_STD_Strategy is the default strategy for the POSIX_STD_Provider. It is based on the aio_suspend() POSIX API call, similarly to the former POSIX_AIOCB_Proactor.
Possible interrupt strategies are:
Interrupt_Pipe_Strategy
Interrupt_Signal_Strategy
This wait strategy is based on the sigwaitinfo() POSIX API call, similarly to the former POSIX_SIG_Proactor.
It sets the aiocb.aio_sigevent field to the SIGEV_SIGNAL value during the on_aio_start() callback.
Possible interrupt strategies are:
Interrupt_Pipe_Strategy
Interrupt_Signal_Strategy
This wait strategy is based on system notification via a callback function registered in the aiocb – the former POSIX_CB_Proactor – and is currently supported only by the SGI IRIX AIO system.
It sets the aiocb.aio_sigevent field to the SIGEV_CALLBACK value during the on_aio_start() callback.
The notification callback function signals on a special semaphore to wake up the wait() method. This wait strategy does not require an external interrupt strategy, as it also implements the AIO_Interrupt_Strategy interface (it signals on the same semaphore).
Possible interrupt strategies are:
POSIX_SGI_Strategy (itself, default)
Interrupt_Pipe_Strategy
Interrupt_Signal_Strategy
This provider can be used only on SunOS platforms, as it is based on the SUN-specific AIO API (former SUN_Proactor):

- aioread()
- aiowrite()
- aiocancel()

This is the only and default strategy for the SUN_Provider; it is based on the aiowait() SUN API call.
Possible interrupt strategies are:
Interrupt_Pipe_Strategy
Interrupt_Signal_Strategy
This new AIO_Provider emulates any ACE AIO operation via the select() call. It is the only provider that can perform asynchronous accept, connect and read/write datagram operations. It can work on any platform, regardless of AIO support.
It uses select() and performs the real operations when a file descriptor becomes ready for read/write for the pending operations.
Possible interrupt strategies are:
Interrupt_Pipe_Strategy
Interrupt_Signal_Strategy
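The emulation principle in a minimal form: wait for readiness, then perform the real non-blocking operation and complete the result. A sketch for a single pending read (the real SELECT_Strategy also multiplexes many descriptors, accept/connect/datagram operations, and the interrupt pipe):

#include <sys/select.h>
#include <sys/time.h>
#include <unistd.h>

// One wait cycle for one pending read operation.
// Returns bytes read, 0 on timeout, or -1 on error
// (EINTR here means the leader was interrupted).
ssize_t wait_and_read (int fd, void *buf, size_t len, timeval *timeout)
{
  fd_set rds;
  FD_ZERO (&rds);
  FD_SET (fd, &rds);
  int n = select (fd + 1, &rds, 0, 0, timeout);  // the leader blocks here
  if (n <= 0)
    return n;                    // 0 = timeout, -1 = error/EINTR
  // The descriptor is ready: do the "asynchronous" operation for real.
  return read (fd, buf, len);
}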
This strategy uses poll() and performs the real operations when a file descriptor becomes ready for read/write for the pending operations.
Possible interrupt strategies are:
Interrupt_Pipe_Strategy
Interrupt_Signal_Strategy
Obviously, this wait strategy works only on Linux kernel 2.4.
It uses Linux real-time (RT) signals to detect readiness for the real I/O operations (see http://www.kegel.com/c10k.html).
It can work only with the Dedicated AIO_Processor, as the RT signals should be delivered to the same thread. This strategy is the best choice for processing more than 1024 file descriptors simultaneously.
Possible interrupt strategies are:
Interrupt_Pipe_Strategy
Interrupt_Signal_Strategy
This is the best wait strategy for Linux with kernel 2.6 (see http://www.kegel.com/c10k.html).
It uses the new Linux 2.6 epoll interface for demultiplexing I/O.
It allows processing more than 1024 file descriptors simultaneously.
Possible interrupt strategies are:
Interrupt_Pipe_Strategy
Interrupt_Signal_Strategy
Currently, this strategy works on SunOS 5.8 and higher systems.
It uses the /dev/poll interface instead of select(). This strategy ignores the flag START_FROM_ANY_THREAD and allows starting AIO only from the leader thread. The reason is that the /dev/poll interface does not permit the "declare interest" write() operation while another thread is waiting for events via ioctl(). See AIO_Processor. This strategy should be faster than select() if you have to process more than 1000 file descriptors.
Possible interrupt strategies are:
Interrupt_Pipe_Strategy
Interrupt_Signal_Strategy
This strategy is currently under development.
It will use kqueue() and perform the real operations when a file descriptor becomes ready for read/write for the pending operations.
Possible interrupt strategies are:
Interrupt_Pipe_Strategy
Interrupt_Signal_Strategy
This provider can be used only on Linux kernel 2.6, as it is based on the Linux 2.6 specific AIO API:

- io_submit() – to start AIO
- io_cancel() – to cancel AIO

This is the only and default strategy for the LINUX_Provider; it is based on the io_getevents() API call.
Possible interrupt strategies are:
Interrupt_Signal_Strategy
Interrupt_Pipe_Strategy (currently does not work – see the reason below)
Note! This is an experimental provider and should not be used for production applications. At present, the Linux 2.6 kernel does not support AIO for pipes, and socket AIO is performed synchronously (see http://lse.sourceforge.net/io/aio.html).
In the future, we hope the LINUX_Provider will be the main and best choice. Currently, the best choice for Linux 2.6 is still SELECT_Provider with Event_Poll_Strategy.
This class is designed to specify all desired Proactor parameters at run-time. More precisely, AIO_Config describes the parameters of an AIO_Processor. The POSIX_Proactor constructor has just two parameters, each of them a reference to a constant AIO_Config object. The first AIO_Config specifies the first AIO_Processor and the second AIO_Config specifies the second AIO_Processor.
Note! All classes and interfaces – AIO_Processor, AIO_Provider, AIO_Wait_Strategy, AIO_Interrupt_Strategy – receive a reference to the AIO_Config object as the first parameter of their constructors.
- max_num_aio_ – the maximum number of simultaneously started asynchronous operations supported by this AIO_Processor/AIO_Provider pair.
- processor_type_ – defined by the enumeration:

enum Processor_Type
{
  PCT_NONE,     // dummy Processor, does not exist
  PCT_SHARED,   // Shared Processor
  PCT_DEDICATED // Dedicated Processor
};
- processor_flags_ – reserved for future extensions.
- provider_type_ – defined by the enumeration:

enum Provider_Type
{
  PVT_NONE,   // Provider does not exist
  PVT_POSIX,  // POSIX_STD_Provider
  PVT_SUN,    // SUN_Provider
  PVT_SELECT, // SELECT_Provider
  PVT_LINUX   // LINUX_Provider
};
- provider_flags_ – a combination of the following bit flags (distinct bit values are required for a flag mask; the exact values shown are illustrative):

enum Provider_Flags
{
  PVF_AIO_START_ANY_THREAD     = 0x01,
  // ON  = starting AIO is allowed from any thread
  // OFF = starting AIO is allowed only in the leader thread
  PVF_AIO_START_INTERRUPT_WAIT = 0x02
  // ON  = starting AIO should interrupt the leader
  // OFF = waiting for AIO completions is independent of start AIO
};

Note 1. PVF_AIO_START_INTERRUPT_WAIT is valid only if PVF_AIO_START_ANY_THREAD is set.
Note 2. Some AIO_Wait_Strategies automatically work in the AIO_START_INTERRUPT_WAIT mode regardless of the provider_flags_ value. Examples: POSIX_STD_Strategy (aio_suspend) and SELECT_Strategy (select) always work in this mode. Other strategies can work without this flag.
- ws_type_ – the AIO_Wait_Strategy type, defined by the enumeration:

enum Wait_Strategy_Type
{
  // default strategies are marked with *
  WST_NATIVE,     // use the default Wait_Strategy
  WST_AIOCB,      // * aio_suspend ()
  WST_SIGWAIT,    // wait for RT completion signals
  WST_SGI,        // wait for notification from the callback function
  WST_SUN,        // * aiowait () for the SUN Provider
  WST_SELECT,     // * select ()
  WST_POLL,       // poll ()
  WST_DEV_POLL,   // /dev/poll interface (SunOS 5.8)
  WST_EVENT_POLL, // epoll for Linux 2.6+
  WST_LINUX_RT,   // Linux RT signals for Linux 2.4
  WST_LINUX_NAIO, // * Linux 2.6 io_getevents ()
  WST_KQUEUE      // FreeBSD kqueue
};
Possible combinations:

AIO_Provider   AIO_Wait_Strategy
-----------    --------------------------------------
PVT_POSIX      WST_AIOCB      = POSIX_STD_Strategy
PVT_POSIX      WST_SIGWAIT    = POSIX_SIG_Strategy
PVT_POSIX      WST_SGI        = POSIX_SGI_Strategy
PVT_SUN        WST_SUN        = SUN_Strategy
PVT_SELECT     WST_SELECT     = SELECT_Strategy
PVT_SELECT     WST_POLL       = POLL_Strategy
PVT_SELECT     WST_DEV_POLL   = Dev_Poll_Strategy
PVT_SELECT     WST_EVENT_POLL = Event_Poll_Strategy
PVT_SELECT     WST_LINUX_RT   = LinuxRT_Strategy
PVT_SELECT     WST_KQUEUE     = Kqueue_Strategy
PVT_LINUX      WST_LINUX_NAIO = Linux_NAIO_Strategy

If the value of ws_type_ is inappropriate for the AIO_Provider, the provider can ignore it and make its own choice.
- ws_sig_num_ – any integer; can be used by the AIO_Wait_Strategy as an extra parameter. Currently, only POSIX_SIG_Strategy considers this field, as the RT signal number used for AIO completions.
- is_type_ – the AIO_Interrupt_Strategy type, defined by the enumeration:

enum Interrupt_Strategy_Type // interrupt strategy types
{
  IST_DEFAULT, // default
  IST_PIPE,    // Interrupt_Pipe_Strategy
  IST_SIGNAL,  // Interrupt_Signal_Strategy
  IST_SGI      // POSIX_SGI_Strategy
};

If the value of is_type_ is inappropriate for the AIO_Wait_Strategy, the strategy can ignore it and make its own choice.
- is_sig_num_ – any integer; can be used by the AIO_Interrupt_Strategy as an extra parameter. Currently, Interrupt_Signal_Strategy considers this field as the signal number used to interrupt the leader.
It is very convenient to provide the AIO_Config with factory methods that can create the well-known implementations of AIO_Processor, AIO_Provider, AIO_Wait_Strategy, and AIO_Interrupt_Strategy. Currently, these methods are:

- create_interrupt_strategy()
- create_provider()
- create_processor()

TODO: add persistence methods to the AIO_Config to allow reading the Proactor configuration from XML.
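A hypothetical configuration example built from the fields and enumerations above. The field-assignment syntax and the POSIX_Proactor constructor shape are assumptions based on this document's descriptions, not verified signatures:

// Primary processor: standard POSIX AIO with aio_suspend() for
// read/write on streams and files.
AIO_Config primary;
primary.processor_type_ = PCT_SHARED;
primary.provider_type_  = PVT_POSIX;
primary.ws_type_        = WST_AIOCB;        // aio_suspend ()
primary.is_type_        = IST_PIPE;         // Interrupt_Pipe_Strategy
primary.max_num_aio_    = 1024;

// Secondary processor: select()-emulation with epoll (Linux 2.6)
// for accept, connect and datagram operations.
AIO_Config secondary;
secondary.processor_type_ = PCT_SHARED;
secondary.provider_type_  = PVT_SELECT;
secondary.ws_type_        = WST_EVENT_POLL; // epoll
secondary.is_type_        = IST_PIPE;
secondary.max_num_aio_    = 1024;

// The POSIX_Proactor constructor takes two const AIO_Config references.
POSIX_Proactor proactor (primary, secondary);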
The modified versions of Proactor_Test and TP_Reactor_Test were selected as testing tools.
The right column below shows the average throughput, calculated as the total number of bytes transmitted in all sessions divided by the test execution time.
--- Sessions=1 Threads=1 Blk=512 Win=1024 Delay=0 Time=2 sec
TP_Reactor 25955584.000000
Proactor=Select leader=SHARED 34612480.000000
Proactor=Select leader=DEDICATED 21715968.000000
Proactor=Linux_RT leader=DEDICATED, start_in=ANY 26846720.000000
--- Sessions=1 Threads=1 Blk=512 Win=1024 Delay=10 Time=2 sec
TP_Reactor 1574912.000000
Proactor=Select leader=SHARED 1575168.000000
Proactor=Select leader=DEDICATED 1575424.000000
Proactor=Linux_RT leader=DEDICATED, start_in=ANY 1575424.000000
--- Sessions=1 Threads=1 Blk=8192 Win=8192 Delay=0 Time=2 sec
TP_Reactor 206524416.000000
Proactor=Select leader=SHARED 280158208.000000
Proactor=Select leader=DEDICATED 198004736.000000
Proactor=Linux_RT leader=DEDICATED, start_in=ANY 217325568.000000
--- Sessions=1 Threads=5 Blk=8192 Win=8192 Delay=0 Time=1 sec
TP_Reactor 99426304.000000
Proactor=Select leader=SHARED 167583744.000000
Proactor=Select leader=DEDICATED 103571456.000000
Proactor=Linux_RT leader=DEDICATED, start_in=ANY 75456512.000000
--- Sessions=1 Threads=5 Blk=8192 Win=8192 Delay=10 Time=1 sec
TP_Reactor 16801792.000000
Proactor=Select leader=SHARED 16818176.000000
Proactor=Select leader=DEDICATED 16801792.000000
Proactor=Linux_RT leader=DEDICATED, start_in=ANY 16818176.000000
--- Sessions=40 Threads=5 Blk=8192 Win=8192 Delay=10 Time=5 sec
TP_Reactor 68858675.200000
Proactor=Select leader=SHARED 68470374.400000
Proactor=Select leader=DEDICATED 83825459.200000
Proactor=Linux_RT leader=DEDICATED, start_in=ANY 83568230.400000
--- Sessions=40 Threads=5 Blk=8192 Win=0 Delay=10 Time=2 sec
TP_Reactor 79896576.000000
Proactor=Select leader=SHARED 84480000.000000
Proactor=Select leader=DEDICATED 73015296.000000
Proactor=Linux_RT leader=DEDICATED, start_in=ANY 83648512.000000
--- Sessions=100 Threads=4 Blk=1024 Win=1024 Delay=10 Time=5 sec
TP_Reactor 7290880.000000
Proactor=Select leader=SHARED 8189132.800000
Proactor=Select leader=DEDICATED 8431411.200000
Proactor=Linux_RT leader=DEDICATED, start_in=ANY 8413593.600000
--- Sessions=100 Threads=5 Blk=8192 Win=8192 Delay=10 Time=5 sec
TP_Reactor 69946572.800000
Proactor=Select leader=SHARED 64448102.400000
Proactor=Select leader=DEDICATED 83247104.000000
Proactor=Linux_RT leader=DEDICATED, start_in=ANY 83699302.400000
--- Sessions=100 Threads=5 Blk=8192 Win=8192 Delay=0 Time=5 sec
TP_Reactor 91511193.600000
Proactor=Select leader=SHARED 130216755.200000
Proactor=Select leader=DEDICATED 114596249.600000
Proactor=Linux_RT leader=DEDICATED, start_in=ANY 126651596.800000
--- Sessions=100 Threads=5 Blk=8192 Win=0 Delay=10 Time=5 sec
TP_Reactor 61944627.200000
Proactor=Select leader=SHARED 77760102.400000
Proactor=Select leader=DEDICATED 69690982.400000
Proactor=Linux_RT leader=DEDICATED, start_in=ANY 79906406.400000
--- Sessions=100 Threads=5 Blk=8192 Win=0 Delay=0 Time=5 sec
TP_Reactor 85606400.000000
Proactor=Select leader=SHARED 122698137.600000
Proactor=Select leader=DEDICATED 101944524.800000
Proactor=Linux_RT leader=DEDICATED, start_in=ANY 120694374.400000