Say you have a very large array of elements, and you want to shift a large sub array of it from position a..b to c..d. This sounds simple enough doesn’t it? Just shift the elements one by one in serial, or several chunks in parallel.
Well it is… as long as you are doing it in serial, and mostly while doing it in parallel as well. However the is an easy to miss issue while parallelizing this. Everything goes perfectly fine unless the sub ranges a..b and c..d share a common region.
The issue arises due to a race condition in the ‘danger zone’ where elements from the beginning of the sub array can overwrite the elements in the danger zone before they are shifted, corrupting the data.
This may seem like a trivial issue, but it is exactly what I faced while trying to implement the TimSort algorithm using the Chapel Language. The issue only showed up while dealing with huge array sizes, making it agonizing to debug.
TimSort is a hybrid sort combining several sorting techniques like insertion and merge sort. Within, it has a phase known as Galloping Phase where, when initiated while merging two sub arrays, the algorithm starts to move larger chunks of elements per comparison.
in Chapel, the syntax of an array copy is:
Since Chapel is a parallel programming language, this copy is done in parallel. Hence, if array1 and array2 are the same, and a..b overlaps with c..d, we have a problem.
- Make the copy in serial. In the end correctness > performance.
- Copy the entire array into a temporary and then to the new location in parallel. But then there would be the cost of synchronizing the threads after copying into the temp.
- Copy only the overlapping region in serial, and then the rest in parallel.
In the latter 2 cases, the cost of thread creation should also be taken into account. The size of the sub array to be shifted may not be big enough to pay off. In the end, due to this reason, I chose to keep the shifting to be done in serial.