GNAT Zero Cost Exceptions and Asynchronous Task Aborting

February 12th, 2019

Recently, Diana Coman discovered a problem in the asynchronous task aborting code of the GNAT runtime, and kindly provided test cases, which were really helpful for seeing what goes wrong. The most helpful debugging tool when dealing with misbehaving Linux software is strace:

$ strace -f ./adatests
execve("./adatests", ["./adatests"], [/* 66 vars */]) = 0
arch_prctl(ARCH_SET_FS, 0x7f2efdda9ba8) = 0
set_tid_address(0x7f2efdda9be0)         = 23157
mprotect(0x7f2efdda6000, 4096, PROT_READ) = 0
mprotect(0x64b000, 4096, PROT_READ)     = 0
rt_sigprocmask(SIG_UNBLOCK, [RT_1 RT_2], NULL, 8) = 0
rt_sigaction(SIGABRT, {sa_handler=0x42d690, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, sa_restorer=0x7f2efdb61f5d}, NULL, 8) = 0
rt_sigaction(SIGFPE, {sa_handler=0x42d690, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, sa_restorer=0x7f2efdb61f5d}, NULL, 8) = 0
rt_sigaction(SIGILL, {sa_handler=0x42d690, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, sa_restorer=0x7f2efdb61f5d}, NULL, 8) = 0
rt_sigaction(SIGBUS, {sa_handler=0x42d690, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, sa_restorer=0x7f2efdb61f5d}, NULL, 8) = 0
sigaltstack({ss_sp=0x6530e0, ss_flags=0, ss_size=16384}, NULL) = 0
rt_sigaction(SIGSEGV, {sa_handler=0x42d690, sa_mask=[], sa_flags=SA_RESTORER|SA_STACK|SA_RESTART|SA_NODEFER|SA_SIGINFO, sa_restorer=0x7f2efdb61f5d}, NULL, 8) = 0
clock_getres(CLOCK_REALTIME, {tv_sec=0, tv_nsec=1}) = 0
sched_getaffinity(0, 128, [0, 1])       = 64
rt_sigaction(SIGFPE, {sa_handler=0x40f310, sa_mask=[ILL BUS FPE SEGV], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x7f2efdb61f5d}, {sa_handler=0x42d690, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, sa_restorer=0x7f2efdb61f5d}, 8) = 0
rt_sigaction(SIGILL, {sa_handler=0x40f310, sa_mask=[ILL BUS FPE SEGV], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x7f2efdb61f5d}, {sa_handler=0x42d690, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, sa_restorer=0x7f2efdb61f5d}, 8) = 0
rt_sigaction(SIGSEGV, {sa_handler=0x40f310, sa_mask=[ILL BUS FPE SEGV], sa_flags=SA_RESTORER|SA_STACK|SA_SIGINFO, sa_restorer=0x7f2efdb61f5d}, {sa_handler=0x42d690, sa_mask=[], sa_flags=SA_RESTORER|SA_STACK|SA_RESTART|SA_NODEFER|SA_SIGINFO, sa_restorer=0x7f2efdb61f5d}, 8) = 0
rt_sigaction(SIGBUS, {sa_handler=0x40f310, sa_mask=[ILL BUS FPE SEGV], sa_flags=SA_RESTORER|SA_SIGINFO, sa_restorer=0x7f2efdb61f5d}, {sa_handler=0x42d690, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, sa_restorer=0x7f2efdb61f5d}, 8) = 0
gettid()                                = 23157
sigaltstack({ss_sp=0x6530e0, ss_flags=0, ss_size=16384}, NULL) = 0
rt_sigaction(SIGABRT, {sa_handler=0x406e50, sa_mask=[], sa_flags=SA_RESTORER, sa_restorer=0x7f2efdb61f5d}, {sa_handler=0x42d690, sa_mask=[], sa_flags=SA_RESTORER|SA_RESTART|SA_NODEFER|SA_SIGINFO, sa_restorer=0x7f2efdb61f5d}, 8) = 0
sched_getaffinity(0, 128, [0, 1])       = 64
sched_getaffinity(0, 128, [0, 1])       = 64
sched_setscheduler(23157, SCHED_OTHER, [0]) = 0
sched_getaffinity(0, 128, [0, 1])       = 64
fstat(2, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 3), ...}) = 0
fstat(0, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 3), ...}) = 0
fstat(1, {st_mode=S_IFCHR|0620, st_rdev=makedev(136, 3), ...}) = 0
writev(1, [{iov_base="", iov_len=0}, {iov_base="Creating  10 tasks.\n", iov_len=20}], 2Creating  10 tasks.
) = 20
brk(NULL)                               = 0x10f2000
brk(0x10f5000)                          = 0x10f5000
rt_sigprocmask(SIG_UNBLOCK, [RT_1 RT_2], NULL, 8) = 0
mmap(NULL, 2121728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2efd911000
mprotect(0x7f2efd912000, 2117632, PROT_READ|PROT_WRITE) = 0
clone(strace: Process 23158 attached
child_stack=0x7f2efdb16ab8, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|0x400000, parent_tidptr=0x7f2efdb16b20, tls=0x7f2efdb16ae8) = 23158
[pid 23158] gettid()                    = 23158
[pid 23158] prctl(PR_SET_NAME, "a(1)" 
[pid 23157] sched_setscheduler(23158, SCHED_OTHER, [0] 
[pid 23158] <... prctl resumed> )       = 0
[pid 23157] <... sched_setscheduler resumed> ) = 0
[pid 23158] sigaltstack({ss_sp=0x7f2efdb12a70, ss_flags=0, ss_size=16384},  
[pid 23157] futex(0x7ffee67e6cd4, FUTEX_WAIT_PRIVATE, 2, NULL 
[pid 23158] <... sigaltstack resumed> NULL) = 0
[pid 23158] futex(0x7ffee67e6cd4, FUTEX_WAKE_PRIVATE, 1 
[pid 23157] <... futex resumed> )       = 0
[pid 23158] <... futex resumed> )       = 1
[pid 23157] futex(0x7f2efddaa31c, FUTEX_WAIT_PRIVATE, 2147483664, NULL 
[pid 23158] futex(0x7f2efddaa31c, FUTEX_WAKE_PRIVATE, 1 
[pid 23157] <... futex resumed> )       = -1 EAGAIN (Resource temporarily unavailable)
[pid 23158] <... futex resumed> )       = 0
[pid 23157] brk(0x10f6000)              = 0x10f6000
[pid 23157] brk(0x10f9000)              = 0x10f9000
[pid 23157] mmap(NULL, 2121728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2efd70b000
[pid 23157] mprotect(0x7f2efd70c000, 2117632, PROT_READ|PROT_WRITE) = 0
[pid 23157] clone(strace: Process 23159 attached
child_stack=0x7f2efd910ab8, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|0x400000, parent_tidptr=0x7f2efd910b20, tls=0x7f2efd910ae8) = 23159
[pid 23159] gettid()                    = 23159
[pid 23159] prctl(PR_SET_NAME, "a(2)")  = 0
[pid 23159] sigaltstack({ss_sp=0x7f2efd90ca70, ss_flags=0, ss_size=16384},  
[pid 23157] sched_setscheduler(23159, SCHED_OTHER, [0] 
[pid 23159] <... sigaltstack resumed> NULL) = 0
[pid 23157] <... sched_setscheduler resumed> ) = 0
[pid 23159] futex(0x64f244, FUTEX_WAIT_PRIVATE, 2147483664, NULL 
[pid 23157] futex(0x64f244, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 23159] <... futex resumed> )       = 0
[pid 23157] brk(0x10fc000)              = 0x10fc000
[pid 23157] mmap(NULL, 2121728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2efd505000
[pid 23157] mprotect(0x7f2efd506000, 2117632, PROT_READ|PROT_WRITE) = 0
[pid 23157] clone(child_stack=0x7f2efd70aab8, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|0x400000, parent_tidptr=0x7f2efd70ab20, tls=0x7f2efd70aae8) = 23160
[pid 23157] sched_setscheduler(23160, SCHED_OTHER, [0]) = 0
[pid 23157] futex(0x7ffee67e6cd4, FUTEX_WAIT_PRIVATE, 2, NULLstrace: Process 23160 attached
 
[pid 23160] gettid()                    = 23160
[pid 23160] prctl(PR_SET_NAME, "a(3)")  = 0
[pid 23160] sigaltstack({ss_sp=0x7f2efd706a70, ss_flags=0, ss_size=16384}, NULL) = 0
[pid 23160] futex(0x7ffee67e6cd4, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 23157] <... futex resumed> )       = 0
[pid 23157] brk(0x10fd000)              = 0x10fd000
[pid 23157] brk(0x1100000)              = 0x1100000
[pid 23157] mmap(NULL, 2121728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2efd2ff000
[pid 23157] mprotect(0x7f2efd300000, 2117632, PROT_READ|PROT_WRITE) = 0
[pid 23157] clone(child_stack=0x7f2efd504ab8, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|0x400000, parent_tidptr=0x7f2efd504b20, tls=0x7f2efd504ae8) = 23161
[pid 23157] sched_setscheduler(23161, SCHED_OTHER, [0]) = 0
[pid 23157] futex(0x7ffee67e6cd4, FUTEX_WAIT_PRIVATE, 2, NULLstrace: Process 23161 attached
 
[pid 23161] gettid()                    = 23161
[pid 23161] prctl(PR_SET_NAME, "a(4)")  = 0
[pid 23161] sigaltstack({ss_sp=0x7f2efd500a70, ss_flags=0, ss_size=16384}, NULL) = 0
[pid 23161] futex(0x7ffee67e6cd4, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 23157] <... futex resumed> )       = 0
[pid 23157] futex(0x7f2efddaa31c, FUTEX_WAIT_PRIVATE, 2147483664, NULL 
[pid 23161] futex(0x7f2efddaa31c, FUTEX_WAKE_PRIVATE, 1 
[pid 23157] <... futex resumed> )       = 0
[pid 23161] <... futex resumed> )       = 1
[pid 23157] brk(0x1103000)              = 0x1103000
[pid 23157] mmap(NULL, 2121728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2efd0f9000
[pid 23157] mprotect(0x7f2efd0fa000, 2117632, PROT_READ|PROT_WRITE) = 0
[pid 23157] clone(child_stack=0x7f2efd2feab8, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|0x400000, parent_tidptr=0x7f2efd2feb20, tls=0x7f2efd2feae8) = 23162
[pid 23157] sched_setscheduler(23162, SCHED_OTHER, [0]) = 0
[pid 23157] futex(0x7ffee67e6cd4, FUTEX_WAIT_PRIVATE, 2, NULLstrace: Process 23162 attached
 
[pid 23162] gettid()                    = 23162
[pid 23162] prctl(PR_SET_NAME, "a(5)")  = 0
[pid 23162] sigaltstack({ss_sp=0x7f2efd2faa70, ss_flags=0, ss_size=16384}, NULL) = 0
[pid 23162] futex(0x7ffee67e6cd4, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 23157] <... futex resumed> )       = 0
[pid 23157] brk(0x1106000)              = 0x1106000
[pid 23157] mmap(NULL, 2121728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2efcef3000
[pid 23157] mprotect(0x7f2efcef4000, 2117632, PROT_READ|PROT_WRITE) = 0
[pid 23157] clone(child_stack=0x7f2efd0f8ab8, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|0x400000, parent_tidptr=0x7f2efd0f8b20, tls=0x7f2efd0f8ae8) = 23163
[pid 23157] sched_setscheduler(23163, SCHED_OTHER, [0]) = 0
[pid 23157] futex(0x7ffee67e6cd4, FUTEX_WAIT_PRIVATE, 2, NULLstrace: Process 23163 attached
 
[pid 23163] gettid()                    = 23163
[pid 23163] prctl(PR_SET_NAME, "a(6)")  = 0
[pid 23163] sigaltstack({ss_sp=0x7f2efd0f4a70, ss_flags=0, ss_size=16384}, NULL) = 0
[pid 23163] futex(0x7ffee67e6cd4, FUTEX_WAKE_PRIVATE, 1 
[pid 23157] <... futex resumed> )       = 0
[pid 23163] <... futex resumed> )       = 1
[pid 23157] futex(0x7f2efddaa31c, FUTEX_WAIT_PRIVATE, 2147483664, NULL 
[pid 23163] futex(0x7f2efddaa31c, FUTEX_WAKE_PRIVATE, 1 
[pid 23157] <... futex resumed> )       = 0
[pid 23163] <... futex resumed> )       = 1
[pid 23157] brk(0x1107000)              = 0x1107000
[pid 23157] brk(0x110a000)              = 0x110a000
[pid 23157] mmap(NULL, 2121728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2efcced000
[pid 23157] mprotect(0x7f2efccee000, 2117632, PROT_READ|PROT_WRITE) = 0
[pid 23157] clone(child_stack=0x7f2efcef2ab8, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|0x400000, parent_tidptr=0x7f2efcef2b20, tls=0x7f2efcef2ae8) = 23164
[pid 23157] sched_setscheduler(23164, SCHED_OTHER, [0]) = 0
[pid 23157] futex(0x7ffee67e6cd4, FUTEX_WAIT_PRIVATE, 2, NULLstrace: Process 23164 attached
 
[pid 23164] gettid()                    = 23164
[pid 23164] prctl(PR_SET_NAME, "a(7)")  = 0
[pid 23164] sigaltstack({ss_sp=0x7f2efceeea70, ss_flags=0, ss_size=16384}, NULL) = 0
[pid 23164] futex(0x7ffee67e6cd4, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 23157] <... futex resumed> )       = 0
[pid 23157] brk(0x110d000)              = 0x110d000
[pid 23157] mmap(NULL, 2121728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2efcae7000
[pid 23157] mprotect(0x7f2efcae8000, 2117632, PROT_READ|PROT_WRITE) = 0
[pid 23157] clone(child_stack=0x7f2efccecab8, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|0x400000, parent_tidptr=0x7f2efccecb20, tls=0x7f2efccecae8) = 23165
[pid 23157] sched_setscheduler(23165, SCHED_OTHER, [0]) = 0
[pid 23157] futex(0x7ffee67e6cd4, FUTEX_WAIT_PRIVATE, 2, NULLstrace: Process 23165 attached
 
[pid 23165] gettid()                    = 23165
[pid 23165] prctl(PR_SET_NAME, "a(8)")  = 0
[pid 23165] sigaltstack({ss_sp=0x7f2efcce8a70, ss_flags=0, ss_size=16384}, NULL) = 0
[pid 23165] futex(0x7ffee67e6cd4, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 23157] <... futex resumed> )       = 0
[pid 23157] futex(0x7f2efddaa31c, FUTEX_WAIT_PRIVATE, 2147483664, NULL 
[pid 23165] futex(0x7f2efddaa31c, FUTEX_WAKE_PRIVATE, 1 
[pid 23157] <... futex resumed> )       = 0
[pid 23165] <... futex resumed> )       = 1
[pid 23157] brk(0x110e000)              = 0x110e000
[pid 23157] brk(0x1111000)              = 0x1111000
[pid 23157] mmap(NULL, 2121728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2efc8e1000
[pid 23157] mprotect(0x7f2efc8e2000, 2117632, PROT_READ|PROT_WRITE) = 0
[pid 23157] clone(child_stack=0x7f2efcae6ab8, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|0x400000, parent_tidptr=0x7f2efcae6b20, tls=0x7f2efcae6ae8) = 23166
[pid 23157] sched_setscheduler(23166, SCHED_OTHER, [0]) = 0
[pid 23157] futex(0x7ffee67e6cd4, FUTEX_WAIT_PRIVATE, 2, NULLstrace: Process 23166 attached
 
[pid 23166] gettid()                    = 23166
[pid 23166] prctl(PR_SET_NAME, "a(9)")  = 0
[pid 23166] sigaltstack({ss_sp=0x7f2efcae2a70, ss_flags=0, ss_size=16384}, NULL) = 0
[pid 23166] futex(0x7ffee67e6cd4, FUTEX_WAKE_PRIVATE, 1) = 1
[pid 23157] <... futex resumed> )       = 0
[pid 23157] brk(0x1114000)              = 0x1114000
[pid 23157] mmap(NULL, 2121728, PROT_NONE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f2efc6db000
[pid 23157] mprotect(0x7f2efc6dc000, 2117632, PROT_READ|PROT_WRITE) = 0
[pid 23157] clone(child_stack=0x7f2efc8e0ab8, flags=CLONE_VM|CLONE_FS|CLONE_FILES|CLONE_SIGHAND|CLONE_THREAD|CLONE_SYSVSEM|CLONE_SETTLS|CLONE_PARENT_SETTID|0x400000, parent_tidptr=0x7f2efc8e0b20, tls=0x7f2efc8e0ae8) = 23167
[pid 23157] sched_setscheduler(23167, SCHED_OTHER, [0]) = 0
[pid 23157] futex(0x7ffee67e6cd4, FUTEX_WAIT_PRIVATE, 2, NULLstrace: Process 23167 attached
 
[pid 23167] gettid()                    = 23167
[pid 23167] prctl(PR_SET_NAME, "a(10)") = 0
[pid 23167] sigaltstack({ss_sp=0x7f2efc8dca70, ss_flags=0, ss_size=16384}, NULL) = 0
[pid 23167] futex(0x7ffee67e6cd4, FUTEX_WAKE_PRIVATE, 1 
[pid 23157] <... futex resumed> )       = 0
[pid 23167] <... futex resumed> )       = 1
[pid 23157] futex(0x7f2efddaa31c, FUTEX_WAIT_PRIVATE, 2147483664, NULL 
[pid 23167] futex(0x7f2efddaa31c, FUTEX_WAKE_PRIVATE, 1 
[pid 23157] <... futex resumed> )       = 0
[pid 23167] <... futex resumed> )       = 1
[pid 23157] writev(1, [{iov_base="", iov_len=0}, {iov_base="PASS: created max_tasks.\n", iov_len=25}], 2PASS: created max_tasks.
) = 25
[pid 23157] writev(1, [{iov_base="", iov_len=0}, {iov_base="Aborting  10 tasks.\n", iov_len=20}], 2Aborting  10 tasks.
) = 20
[pid 23157] tkill(23158, SIGABRT)       = 0
[pid 23157] writev(1, [{iov_base="", iov_len=0}, {iov_base="FAIL: Task  1 NOT aborted.\n", iov_len=27}], 2FAIL: Task  1 NOT aborted.
) = 27
[pid 23157] tkill(23159, SIGABRT)       = 0
[pid 23157] writev(1, [{iov_base="", iov_len=0}, {iov_base="FAIL: Task  2 NOT aborted.\n", iov_len=27}], 2FAIL: Task  2 NOT aborted.
) = 27
[pid 23157] tkill(23160, SIGABRT)       = 0
[pid 23157] writev(1, [{iov_base="", iov_len=0}, {iov_base="FAIL: Task  3 NOT aborted.\n", iov_len=27}], 2FAIL: Task  3 NOT aborted.
) = 27
[pid 23157] tkill(23161, SIGABRT)       = 0
[pid 23157] writev(1, [{iov_base="", iov_len=0}, {iov_base="FAIL: Task  4 NOT aborted.\n", iov_len=27}], 2FAIL: Task  4 NOT aborted.
) = 27
[pid 23157] tkill(23162, SIGABRT)       = 0
[pid 23157] writev(1, [{iov_base="", iov_len=0}, {iov_base="FAIL: Task  5 NOT aborted.\n", iov_len=27}], 2FAIL: Task  5 NOT aborted.
) = 27
[pid 23157] tkill(23163, SIGABRT)       = 0
[pid 23157] writev(1, [{iov_base="", iov_len=0}, {iov_base="FAIL: Task  6 NOT aborted.\n", iov_len=27}], 2FAIL: Task  6 NOT aborted.
) = 27
[pid 23157] tkill(23164, SIGABRT)       = 0
[pid 23157] writev(1, [{iov_base="", iov_len=0}, {iov_base="FAIL: Task  7 NOT aborted.\n", iov_len=27}], 2FAIL: Task  7 NOT aborted.
) = 27
[pid 23157] tkill(23165, SIGABRT)       = 0
[pid 23157] writev(1, [{iov_base="", iov_len=0}, {iov_base="FAIL: Task  8 NOT aborted.\n", iov_len=27}], 2FAIL: Task  8 NOT aborted.
) = 27
[pid 23157] tkill(23166, SIGABRT)       = 0
[pid 23157] writev(1, [{iov_base="", iov_len=0}, {iov_base="FAIL: Task  9 NOT aborted.\n", iov_len=27}], 2FAIL: Task  9 NOT aborted.
) = 27
[pid 23157] tkill(23167, SIGABRT)       = 0
[pid 23157] writev(1, [{iov_base="", iov_len=0}, {iov_base="FAIL: Task  10 NOT aborted.\n", iov_len=28}], 2FAIL: Task  10 NOT aborted.
 
[pid 23167] --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=23157, si_uid=1000} ---
[pid 23157] <... writev resumed> )      = 28
[pid 23161] --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=23157, si_uid=1000} ---
[pid 23157] writev(1, [{iov_base="", iov_len=0}, {iov_base="FAIL: after abort:  10 tasks ali"..., iov_len=36}], 2FAIL: after abort:  10 tasks alive.
) = 36
[pid 23157] writev(1, [{iov_base="", iov_len=0}, {iov_base="Will abort main program with C's"..., iov_len=38}], 2Will abort main program with C's exit
) = 38
[pid 23157] exit_group(0 
[pid 23167] rt_sigreturn({mask=[]}

[pid 23157] <... exit_group resumed>)   = ?
[pid 23167] <... rt_sigreturn resumed>) = ?
[pid 23167] +++ exited with 0 +++
[pid 23163] +++ exited with 0 +++
[pid 23161] +++ exited with 0 +++
[pid 23158] +++ exited with 0 +++
[pid 23165] +++ exited with 0 +++
[pid 23166] +++ exited with 0 +++
[pid 23164] +++ exited with 0 +++
[pid 23162] +++ exited with 0 +++
[pid 23160] +++ exited with 0 +++
[pid 23159] +++ exited with 0 +++
+++ exited with 0 +++

It's immediately obvious that there is some interval of time between signalling an abort to a task and the signal delivery, so the code for checking if the task has aborted would need some changes.

[pid 23157] tkill(23167, SIGABRT)       = 0
[pid 23157] writev(1, [{iov_base="", iov_len=0}, {iov_base="FAIL: Task  10 NOT aborted.\n", iov_len=28}], 2FAIL: Task  10 NOT aborted.
 
[pid 23167] --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=23157, si_uid=1000} ---
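
To account for the delivery delay visible in the excerpt above, the check could wait for termination with a bounded timeout instead of testing the task right after the abort statement. The following is only a minimal sketch under stated assumptions, not the test suite's actual code: the names Wait_Demo, Worker and Wait_Terminated, the 0.05 s step and the 2.0 s timeout are all hypothetical, and the worker deliberately loops on a delay statement (an abort completion point per the Ada Reference Manual) so that it can be aborted at all.

```ada
with Ada.Text_IO;             use Ada.Text_IO;
with Ada.Task_Identification; use Ada.Task_Identification;

procedure Wait_Demo is

   --  Hypothetical worker: loops forever, but each delay statement is
   --  an abort completion point, so a pending abort can take effect.
   task type Worker;

   task body Worker is
   begin
      loop
         delay 0.01;
      end loop;
   end Worker;

   --  Poll for termination of T every Step seconds, giving up once
   --  roughly Timeout seconds have passed. Returns the final state.
   function Wait_Terminated
     (T : Task_Id; Timeout : Duration) return Boolean
   is
      Step   : constant Duration := 0.05;
      Waited : Duration := 0.0;
   begin
      while not Is_Terminated (T) and then Waited < Timeout loop
         delay Step;
         Waited := Waited + Step;
      end loop;
      return Is_Terminated (T);
   end Wait_Terminated;

   W : Worker;

begin
   abort W;

   --  Check only after giving the abort some time to be acted upon.
   if Wait_Terminated (W'Identity, Timeout => 2.0) then
      Put_Line ("PASS: task aborted.");
   else
      Put_Line ("FAIL: task NOT aborted.");
   end if;
end Wait_Demo;
```

Note that if the worker is never aborted, the main procedure will not return after printing the failure message, because it waits for its dependent task; the original test sidesteps this by ending the program with C's exit.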

It's also clear that the signal is indeed raised in the victim task, and then the signal handler exits.

[pid 23167] --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=23157, si_uid=1000} ---
...
[pid 23167] rt_sigreturn({mask=[]}

Looking at the signal handler code, System.Task_Primitives.Operations.Abort_Handler, you can see that it returns immediately when ZCX_By_Default is set:

procedure Abort_Handler (signo : Signal) is
   pragma Unreferenced (signo);

   Self_Id : constant Task_Id := Self;
   Result  : Interfaces.C.int;
   Old_Set : aliased sigset_t;

begin
   --  It's not safe to raise an exception when using GCC ZCX mechanism.
   --  Note that we still need to install a signal handler, since in some
   --  cases (e.g. shutdown of the Server_Task in System.Interrupts) we
   --  need to send the Abort signal to a task.

   if ZCX_By_Default then
      return;
   end if;

   if Self_Id.Deferral_Level = 0
     and then Self_Id.Pending_ATC_Level < Self_Id.ATC_Nesting_Level
     and then not Self_Id.Aborting
   then
      Self_Id.Aborting := True;

      --  Make sure signals used for RTS internal purpose are unmasked

      Result :=
        pthread_sigmask
          (SIG_UNBLOCK,
           Unblocked_Signal_Mask'Access,
           Old_Set'Access);
      pragma Assert (Result = 0);

      raise Standard'Abort_Signal;
   end if;
end Abort_Handler;

What could this ZCX mean? It's the zero cost exception mechanism of GCC, the money quote from the GNAT manual being:

Note however that the ZCX run-time does not support asynchronous abort of tasks (abort and select-then-abort constructs) and will instead implement abort by polling points in the runtime. You can also add additional polling points explicitly if needed in your application via pragma Abort_Defer.
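
To illustrate what abort by polling means for application code, here is a minimal sketch of a busy loop that offers the runtime a chance to act on a pending abort. It assumes that GNAT checks for a pending abort at a delay statement, which the Ada Reference Manual lists as an abort completion point; this is an assumption about the ZCX runtime's behaviour, not something confirmed by the quote above. The names Poll_Demo and Spinner are hypothetical.

```ada
with Ada.Text_IO; use Ada.Text_IO;

procedure Poll_Demo is

   task Spinner;

   task body Spinner is
      Count : Natural := 0;
   begin
      loop
         --  Stand-in for real work; mod keeps Count within range.
         Count := (Count + 1) mod 1_000;

         --  A delay statement is an abort completion point, so
         --  (by assumption) a pending abort is noticed here even
         --  though the signal handler itself does nothing under ZCX.
         delay 0.0;
      end loop;
   end Spinner;

begin
   delay 0.5;       --  let the task run for a while
   abort Spinner;
   Put_Line ("Abort requested.");
   --  The main procedure returns only once Spinner has terminated.
end Poll_Demo;
```

A loop without any such point, like the one in the test task, gives the runtime nothing to poll, which would be consistent with the failures seen in the strace output.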

What are the ways to get asynchronous abort in GNAT? One alternative is the setjmp/longjmp (SJLJ) runtime, which seems to work, at some as yet unclear runtime cost. I decided to try out another option, pragma Polling:

Syntax:

pragma Polling (ON | OFF);

This pragma controls the generation of polling code. This is normally off. If pragma Polling (ON) is used then periodic calls are generated to the routine Ada.Exceptions.Poll. This routine is a separate unit in the runtime library, and can be found in file a-excpol.adb.

Pragma Polling can appear as a configuration pragma (for example it can be placed in the gnat.adc file) to enable polling globally, or it can be used in the statement or declaration sequence to control polling more locally.

A call to the polling routine is generated at the start of every loop and at the start of every subprogram call. This guarantees that the Poll routine is called frequently, and places an upper bound (determined by the complexity of the code) on the period between two Poll calls.

The primary purpose of the polling interface is to enable asynchronous aborts on targets that cannot otherwise support it (for example Windows NT), but it may be used for any other purpose requiring periodic polling. The standard version is null, and can be replaced by a user program. This will require re-compilation of the Ada.Exceptions package that can be found in files a-except.ads and a-except.adb.

A standard alternative unit (in file 4wexcpol.adb in the standard GNAT distribution) is used to enable the asynchronous abort capability on targets that do not normally support the capability. The version of Poll in this file makes a call to the appropriate runtime routine to test for an abort condition.

Note that polling can also be enabled by use of the -gnatP switch. See the section on switches for gcc in the GNAT User’s Guide.

Modify the application:

  task body TestTask is
    Stop: Boolean;
  begin
    pragma Polling(ON);
    loop
      Stop := True;
    end loop;
  end TestTask;

  procedure Create_Tasks(N: in Task_Count) is
  begin
    for I in 1..N loop
      A(I) := new TestTask(I);
    end loop;
  end Create_Tasks;

  procedure Abort_Tasks is
  begin
    -- delay 2.0;
    for I in A'Range loop
      if A(I)/= null then
        abort A(I).all;
      end if;
    end loop;
    delay 2.0;
    for I in A'Range loop
      if A(I)/= null then
        abort A(I).all;
        if A(I)'Terminated and
           (not A(I)'Callable) then
          Put_Line("PASS: Task " & Natural'Image(I) & " aborted.");
        else
          Put_Line("FAIL: Task " & Natural'Image(I) & " NOT aborted.");
        end if;
      end if;
    end loop;
  end Abort_Tasks;

and...

$ ./adatests
Creating  10 tasks.
PASS: created max_tasks.
Aborting  10 tasks.
FAIL: Task  1 NOT aborted.
FAIL: Task  2 NOT aborted.
FAIL: Task  3 NOT aborted.
FAIL: Task  4 NOT aborted.
FAIL: Task  5 NOT aborted.
FAIL: Task  6 NOT aborted.
FAIL: Task  7 NOT aborted.
FAIL: Task  8 NOT aborted.
FAIL: Task  9 NOT aborted.
PASS: Task  10 aborted.
FAIL: after abort:  9 tasks alive.
Will abort main program with C's exit

There is some sort of non-determinism involved. Is it perhaps related to the task initialization order, that is, task 10 is not yet fully initialized when it is aborted? Uncommenting the first delay makes the last created task not abort reliably.

Also, as the documentation above tells us, this pragma is wired to a no-op on x86_64 Linux (a-excpol.adb). The functioning implementation is in a-excpol-abort.adb. After some messing with the GNAT Makefile and rebuilding GNAT (this is still GNAT2017-musl, not ave1's GNAT2016), with the same source files:

 # test with one task
Creating  1 tasks.
PASS: created max_tasks.
Aborting  1 tasks.
PASS: Task  1 aborted.
PASS: aborted all tasks.
Will abort main program with C's exit
 # works reliably. test with ten tasks
Creating  10 tasks.
PASS: created max_tasks.
Aborting  10 tasks.
Segmentation fault

Segfault; to be clear, this crash is also non-deterministic, but extremely probable.

A single line of code is responsible for aborting a task:

raise Standard'Abort_Signal;

and the segfault happens inside the GCC runtime when executing this line [1].

The same effect can be obtained with the following code for the task body (this is a race condition; increasing the task count to 20 makes it a lot more probable).

  task body TestTask is
    Stop: Boolean;
  begin
    raise Constraint_Error;
  end TestTask;

This requires additional investigation, because it is not normal at all. Given that Abort_Handler also aborts the task using the same exception mechanism, removing the ZCX_By_Default check from the Abort_Handler code should cause the same segfault. One of the immediate questions is whether the same problem is reproducible with ave1's GNAT; it's not reproducible with the AdaCore binary distribution. I consider conclusions about what exactly is broken here premature: it could be that the gcc5 breakage extends into the Ada world as well, or maybe musl implements some functionality in a way the GNAT runtime does not expect. Expect more results in Part 2, or you can always test this yourself.

  1. More precisely, the following assertion fails inside the GCC runtime.

