
I was able to write an even faster sort for integers!
It sorts faster than the array can be generated. It works by declaring a counting array whose length is the maximum value of the integer array to be sorted (plus one), initialized to zero. The array to be sorted is then looped through, using each of its values as an index into the counting array, which is incremented every time that value is encountered. Finally, the counting array is looped through in order, and each index is written back to the input/output array as many times as it was counted.
Code below:

SUBROUTINE icountSORT(arrA, nA)
  ! This is a count sort.  It counts the frequency of
  ! each element in the integer array to be sorted using
  ! an array with a length of MAXVAL(arrA)+1 such that
  ! 0's are counted at index 1, 1's are counted at index 2,
  ! etc.
  !
  ! ~ Derrel Walters
  IMPLICIT NONE

  INTEGER(KIND=8),INTENT(IN) :: nA
  INTEGER(KIND=8),DIMENSION(nA),INTENT(INOUT) :: arrA

  INTEGER(KIND=8),ALLOCATABLE,DIMENSION(:) :: arrB
  INTEGER(KIND=8) :: i, j, k, maxA
  INTEGER ::  iStat

  maxA = MAXVAL(arrA)
  ALLOCATE(arrB(maxA+1),STAT=iStat)

  arrB = 0

  DO i = 1, nA
    arrB(arrA(i)+1) = arrB(arrA(i)+1) + 1
  END DO

  k = 1
  DO i = 1, SIZE(arrB)
    DO j = 1, arrB(i)
      arrA(k) = i - 1
      k = k + 1
    END DO
  END DO

END SUBROUTINE icountSORT
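
For readers who prefer Python, here is a minimal sketch of the same counting idea (my own illustration, not part of the original routine; like the Fortran version, it assumes the values are non-negative integers):

def count_sort(arr):
    # Tally how many times each value occurs; index v holds the count of value v.
    counts = [0] * (max(arr) + 1)
    for v in arr:
        counts[v] += 1
    # Write each value back out in order, as many times as it was counted.
    k = 0
    for value, c in enumerate(counts):
        arr[k:k + c] = [value] * c
        k += c
    return arr

Like the Fortran routine, this runs in O(n + max(arr)) time and needs an auxiliary array of length max(arr)+1, so it is practical only when the value range is modest. It is a counting sort rather than a comparison sort, so the O(n log n) lower bound for comparison sorts does not apply to it.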

Posting more evidence. An n log n model predicts execution times that are too high at large array sizes. Further, the Fortran program posted near the end of this question writes the array (unsorted and sorted) to files and reports both the write and the sort times. File writing is a known O(n) process. The sort runs faster than the file writing all the way up to the largest arrays. If the sort were running at O(n log n), at some point the sorting time would cross the write time and become longer at large array sizes. Therefore, it has been shown that this sort routine executes with O(n) time complexity.
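
For anyone who wants to repeat the sort-versus-write comparison without building the Fortran program, a rough Python equivalent of the measurement might look like this (it times NumPy's built-in sort rather than the routine posted here, so it only illustrates the measurement itself):

import time
import numpy as np

n = 10_000_000
arr = np.random.randint(0, 100_000, size=n, dtype=np.int32)

t0 = time.time()
arr.sort()                                         # in-place sort
t_sort = time.time() - t0

t0 = time.time()
np.savetxt('rand-sorted-out.txt', arr, fmt='%d')   # one integer per line, like the Fortran program
t_write = time.time() - t0

print(f'sort:  {t_sort:.2f} s')
print(f'write: {t_write:.2f} s')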

I’ve added a complete Fortran program for compilation at the bottom of this post so that the output can be reproduced. The execution times are linear.

More timing data in a clearer format, produced with the code below in a Debian environment under Windows 10:

dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ for (( i=100000; i<=50000000; i=2*i )); do ./derrelSORT-example.py $i; done | awk  'BEGIN {print "N      Time(s)"}; {if ($1=="Creating") {printf $4" "} else if ($1=="Sorting" && $NF=="seconds") {print $3}}'
N      Time(s)
100000 0.01
200000 0.02
400000 0.04
800000 0.08
1600000 0.17
3200000 0.35
6400000 0.76
12800000 1.59
25600000 3.02

This code executes linearly with respect to the number of elements (an integer example is given here). It achieves this by exponentially increasing the size of the sorted chunks as the (merge) sort proceeds. To facilitate the exponentially growing chunks:

  1. The number of iterations needs to be calculated before the sort begins
  2. Index transformations need to be derived for the chunks (language specific, depending on the indexing convention) for passage to merge()
  3. The remainder at the tail of the list needs to be handled gracefully when the list length is not evenly divisible by the chunk size (a power of 2)

With these things in mind and starting, traditionally, by merging pairs of single-element arrays, the merged chunks can be grown from 2 to 4 to 8 to 16, and so on, up to 2^n. This single case is the exception that breaks the speed limit of O(n log n) time complexity for comparative sorts. This routine sorts linearly with respect to the number of elements to be sorted. A sketch of the general scheme follows.
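
Here is a compact Python sketch of that general scheme (an illustration only, not a line-for-line translation of the Fortran routine below): the chunk width doubles on every pass, and min() clamps the slice bounds so a short tail chunk is merged gracefully.

def merge(left, right):
    # Standard stable two-way merge of two already-sorted lists.
    out, i, j = [], 0, 0
    while i < len(left) and j < len(right):
        if right[j] < left[i]:
            out.append(right[j]); j += 1
        else:
            out.append(left[i]); i += 1
    out.extend(left[i:])
    out.extend(right[j:])
    return out

def bottom_up_mergesort(a):
    # Iterative (bottom-up) merge sort: sorted chunks grow 1 -> 2 -> 4 -> 8 -> ...
    n = len(a)
    width = 1
    while width < n:                       # about log2(n) passes in total
        for lo in range(0, n, 2 * width):
            mid = min(lo + width, n)       # clamp for the tail chunk
            hi = min(lo + 2 * width, n)
            a[lo:hi] = merge(a[lo:mid], a[mid:hi])
        width *= 2
    return a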

Can anyone sort faster? 😉

Fortran Code (derrelSORT.f90):

! Derrel Walters © 2019
! These sort routines were written by Derrel Walters ~ 2019-01-23


SUBROUTINE iSORT(arrA, nA)
  ! This implementation of derrelSORT is for integers,
  ! but the same principles apply for other datatypes.
  !
  ! ~ Derrel Walters
  IMPLICIT NONE

  INTEGER(KIND=8),INTENT(IN) :: nA
  INTEGER,DIMENSION(nA),INTENT(INOUT) :: arrA

  INTEGER,DIMENSION(nA) :: arrB
  INTEGER(KIND=8) :: lowIDX, highIDX, midIDX
  INTEGER ::  iStat
  INTEGER(KIND=8) :: i, j, A, B, C, thisHigh, mergeSize, nLoops
  INTEGER,DIMENSION(:),ALLOCATABLE :: iterMark
  LOGICAL,DIMENSION(:),ALLOCATABLE :: moreToGo

  arrB = arrA
  mergeSize = 2
  lowIDX = 1 - mergeSize
  highIDX = 0

  nLoops = INT(LOG(REAL(nA))/LOG(2.0))
  ALLOCATE(iterMark(nLoops), moreToGo(nLoops), STAT=iStat)
  moreToGo = .FALSE.
  iterMark = 0

  DO i = 1, nLoops
    iterMark(i) = FLOOR(REAL(nA)/2**i)
    IF (MOD(nA, 2**i) > 0) THEN
      moreToGo(i) = .TRUE.
      iterMark(i) = iterMark(i) + 1
    END IF
  END DO

  DO i = 1, nLoops
      DO j = 1, iterMark(i)
        A = 0
        B = 1
        C = 0
        lowIDX = lowIDX + mergeSize
        highIDX = highIDX + mergeSize
        midIDX = (lowIDX + highIDX + 1) / 2
        thisHigh = highIDX
        IF (j == iterMark(i).AND.moreToGo(i)) THEN
          lowIDX = lowIDX - mergeSize
          highIDX = highIDX - mergeSize
          midIDX = (lowIDX + highIDX + 1) / 2
          A = midIDX - lowIDX
          B = 2
          C = nA - 2*highIDX + midIDX - 1
          thisHigh = nA
        END IF
        CALL imerge(arrA(lowIDX:midIDX-1+A), B*(midIDX-lowIDX),    &
                    arrA(midIDX+A:thisHigh), highIDX-midIDX+1+C,   &
                    arrB(lowIDX:thisHigh), thisHigh-lowIDX+1)
        arrA(lowIDX:thisHigh) = arrB(lowIDX:thisHigh)
      END DO
      mergeSize = 2*mergeSize
      lowIDX = 1 - mergeSize
      highIDX = 0
  END DO

END SUBROUTINE iSORT

SUBROUTINE imerge(arrA, nA, arrB, nB, arrC, nC)
  ! This merge is a faster merge.  Array A arrives
  ! just to the left of Array B, and Array C is
  ! filled from both ends simultaneously - while
  ! still preserving the stability of the sort.
  ! The derrelSORT routine is so fast, that
  ! the merge does not affect the O(n) time
  ! complexity of the sort in practice
  !
  ! ~ Derrel Walters
  IMPLICIT NONE

  INTEGER(KIND=8),INTENT(IN) :: nA, nB , nC

  INTEGER,DIMENSION(nA),INTENT(IN) :: arrA
  INTEGER,DIMENSION(nB),INTENT(IN) :: arrB
  INTEGER,DIMENSION(nC),INTENT(INOUT) :: arrC

  INTEGER(KIND=8) :: i, j, k, x, y, z

  arrC = 0
  i = 1
  j = 1
  k = 1
  x = nA
  y = nB
  z = nC

  DO
    IF (i > x .OR. j > y) EXIT
    IF (arrB(j) < arrA(i)) THEN
      arrC(k) = arrB(j)
      j = j + 1
    ELSE
      arrC(k) = arrA(i)
      i = i + 1
    END IF
    IF (arrA(x) > arrB(y)) THEN
      arrC(z) = arrA(x)
      x = x - 1
    ELSE
      arrC(z) = arrB(y)
      y = y - 1
    END IF
    k = k + 1
    z = z - 1
  END DO

  IF (i <= x) THEN
    DO
      IF (i > x) EXIT
        arrC(k) = arrA(i)
        i = i + 1
        k = k + 1
    END DO
  ELSEIF (j <= y) THEN
    DO
      IF (j > y) EXIT
        arrC(k) = arrB(j)
        j = j + 1
        k = k + 1
    END DO
  END IF
END SUBROUTINE imerge

The times below were obtained after using f2py3 to compile the Fortran file above (derrelSORT.f90) into a module callable from Python. Here is the Python driver and the times it produced (derrelSORT-example.py):

#!/bin/python3

import numpy as np
import derrelSORT as dS
import time as t
import random as rdm
import sys

try:
  array_len = int(sys.argv[1])
except IndexError:
  array_len = 100000000

# Create an array with array_len elements
print(50*'-')
print("Creating array of", array_len, "random integers.")
t0 = t.time()
x = np.asfortranarray(np.array([round(100000*rdm.random(),0)
                      for i in range(array_len)]).astype(np.int32))
t1 = t.time()
print('Creation time:', round(t1-t0, 2), 'seconds')


# Sort the array using derrelSORT
print("Sorting the array with derrelSORT.")
t0 = t.time()
dS.isort(x, len(x))
t1 = t.time()
print('Sorting time:', round(t1-t0, 2), 'seconds')
print(50*'-')
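
The post does not show the f2py3 build step; a command along these lines should produce the derrelSORT module imported above (the exact invocation is my assumption, not taken from the original):

f2py3 -c -m derrelSORT derrelSORT.f90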

Output from the command line. Please note the times.

dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ ./derrelSORT-example.py 1000000
--------------------------------------------------
Creating array of 1000000 random integers.
Creation time: 0.78 seconds
Sorting the array with derrelSORT.
Sorting time: 0.1 seconds
--------------------------------------------------
dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ ./derrelSORT-example.py 10000000
--------------------------------------------------
Creating array of 10000000 random integers.
Creation time: 8.1 seconds
Sorting the array with derrelSORT.
Sorting time: 1.07 seconds
--------------------------------------------------
dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ ./derrelSORT-example.py 20000000
--------------------------------------------------
Creating array of 20000000 random integers.
Creation time: 15.73 seconds
Sorting the array with derrelSORT.
Sorting time: 2.21 seconds
--------------------------------------------------
dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ ./derrelSORT-example.py 40000000
--------------------------------------------------
Creating array of 40000000 random integers.
Creation time: 31.64 seconds
Sorting the array with derrelSORT.
Sorting time: 4.39 seconds
--------------------------------------------------
dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ ./derrelSORT-example.py 80000000
--------------------------------------------------
Creating array of 80000000 random integers.
Creation time: 64.03 seconds
Sorting the array with derrelSORT.
Sorting time: 8.92 seconds
--------------------------------------------------
dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ ./derrelSORT-example.py 160000000
--------------------------------------------------
Creating array of 160000000 random integers.
Creation time: 129.56 seconds
Sorting the array with derrelSORT.
Sorting time: 18.04 seconds
--------------------------------------------------

More output:

dwalters@Lapper3:~/PROGRAMMING/DATA-WATER$ for (( i=100000; i<=500000000; i=2*i )); do
> ./derrelSORT-example.py $i
> done
--------------------------------------------------
Creating array of 100000 random integers.
Creation time: 0.08 seconds
Sorting the array with derrelSORT.
Sorting time: 0.01 seconds
--------------------------------------------------
--------------------------------------------------
Creating array of 200000 random integers.
Creation time: 0.16 seconds
Sorting the array with derrelSORT.
Sorting time: 0.02 seconds
--------------------------------------------------
--------------------------------------------------
Creating array of 400000 random integers.
Creation time: 0.32 seconds
Sorting the array with derrelSORT.
Sorting time: 0.04 seconds
--------------------------------------------------
--------------------------------------------------
Creating array of 800000 random integers.
Creation time: 0.68 seconds
Sorting the array with derrelSORT.
Sorting time: 0.08 seconds
--------------------------------------------------
--------------------------------------------------
Creating array of 1600000 random integers.
Creation time: 1.25 seconds
Sorting the array with derrelSORT.
Sorting time: 0.15 seconds
--------------------------------------------------
--------------------------------------------------
Creating array of 3200000 random integers.
Creation time: 2.57 seconds
Sorting the array with derrelSORT.
Sorting time: 0.32 seconds
--------------------------------------------------
--------------------------------------------------
Creating array of 6400000 random integers.
Creation time: 5.23 seconds
Sorting the array with derrelSORT.
Sorting time: 0.66 seconds
--------------------------------------------------
--------------------------------------------------
Creating array of 12800000 random integers.
Creation time: 10.09 seconds
Sorting the array with derrelSORT.
Sorting time: 1.35 seconds
--------------------------------------------------
--------------------------------------------------
Creating array of 25600000 random integers.
Creation time: 20.25 seconds
Sorting the array with derrelSORT.
Sorting time: 2.74 seconds
--------------------------------------------------
--------------------------------------------------
Creating array of 51200000 random integers.
Creation time: 41.84 seconds
Sorting the array with derrelSORT.
Sorting time: 5.62 seconds
--------------------------------------------------
--------------------------------------------------
Creating array of 102400000 random integers.
Creation time: 93.19 seconds
Sorting the array with derrelSORT.
Sorting time: 11.49 seconds
--------------------------------------------------
--------------------------------------------------
Creating array of 204800000 random integers.
Creation time: 167.55 seconds
Sorting the array with derrelSORT.
Sorting time: 24.13 seconds
--------------------------------------------------
--------------------------------------------------
Creating array of 409600000 random integers.
Creation time: 340.84 seconds
Sorting the array with derrelSORT.
Sorting time: 47.21 seconds
--------------------------------------------------

When the array size doubles, the time doubles, as demonstrated. Thus, Mr. Mischel’s initial assessment was incorrect. The reason is that, while the outer loop determines the number of passes at each chunk size (which is log2(n)), the inner loop counter decreases exponentially as the sort proceeds. The proverbial proof is in the pudding, however. The times demonstrate the linearity clearly.
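
To make that loop structure concrete, here is the iterMark bookkeeping from iSORT restated in Python (a sketch for illustration only); it prints how many inner-loop merge calls each pass performs:

import math

def merge_counts(n):
    # Mirrors nLoops and iterMark from iSORT: merges per pass at each chunk size.
    n_loops = int(math.log(n) / math.log(2))
    for i in range(1, n_loops + 1):
        merges = n // 2**i
        if n % 2**i > 0:
            merges += 1              # one extra merge absorbs the tail chunk
        print(f'pass {i:2d}: chunk size {2**i:>9d}, merges {merges}')

merge_counts(100_000)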

If anyone needs any assistance reproducing the results, please let me know. I’m happy to help.

The Fortran program found at the end of this post is an as-is copy of the one I wrote in 2019. It is meant to be used on the command line. To compile it:

  1. Copy the Fortran code to a file with an .f90 extension
  2. Compile the code using a command such as:
gfortran -o derrelSORT-ex.x derrelSORT.f90
  3. Give yourself permission to run the executable:
chmod u+x derrelSORT-ex.x
  4. Execute the program from the command line with or without an integer argument:
./derrelSORT-ex.x

or

./derrelSORT-ex.x 10000000

The output should look something like this (here, I’ve used a bash c-style loop to call the command repeatedly). Notice that as the array sizes double with each iteration, the execution time also doubles.

SORT-RESEARCH$ for (( i=100000; i<500000000; i=2*i )); do
> ./derrelSORT-2022.x $i
> done

Derrel Walters © 2019

Demonstrating derrelSORT©
WARNING: This program can produce LARGE files!

Generating random array of length:           100000
Time =    0.0000 seconds
Writing Array to rand-in.txt:
Time =    0.0312 seconds
Sorting the Array
Time =    0.0156 seconds
Writing Array to rand-sorted-out.txt:
Time =    0.0469 seconds


Derrel Walters © 2019

Demonstrating derrelSORT©
WARNING: This program can produce LARGE files!

Generating random array of length:           200000
Time =    0.0000 seconds
Writing Array to rand-in.txt:
Time =    0.0625 seconds
Sorting the Array
Time =    0.0312 seconds
Writing Array to rand-sorted-out.txt:
Time =    0.0312 seconds


Derrel Walters © 2019

Demonstrating derrelSORT©
WARNING: This program can produce LARGE files!

Generating random array of length:           400000
Time =    0.0156 seconds
Writing Array to rand-in.txt:
Time =    0.1250 seconds
Sorting the Array
Time =    0.0625 seconds
Writing Array to rand-sorted-out.txt:
Time =    0.0938 seconds


Derrel Walters © 2019

Demonstrating derrelSORT©
WARNING: This program can produce LARGE files!

Generating random array of length:           800000
Time =    0.0156 seconds
Writing Array to rand-in.txt:
Time =    0.2344 seconds
Sorting the Array
Time =    0.1406 seconds
Writing Array to rand-sorted-out.txt:
Time =    0.2031 seconds


Derrel Walters © 2019

Demonstrating derrelSORT©
WARNING: This program can produce LARGE files!

Generating random array of length:          1600000
Time =    0.0312 seconds
Writing Array to rand-in.txt:
Time =    0.4219 seconds
Sorting the Array
Time =    0.2969 seconds
Writing Array to rand-sorted-out.txt:
Time =    0.3906 seconds


Derrel Walters © 2019

Demonstrating derrelSORT©
WARNING: This program can produce LARGE files!

Generating random array of length:          3200000
Time =    0.0625 seconds
Writing Array to rand-in.txt:
Time =    0.8281 seconds
Sorting the Array
Time =    0.6562 seconds
Writing Array to rand-sorted-out.txt:
Time =    0.7969 seconds


Derrel Walters © 2019

Demonstrating derrelSORT©
WARNING: This program can produce LARGE files!

Generating random array of length:          6400000
Time =    0.0938 seconds
Writing Array to rand-in.txt:
Time =    1.5938 seconds
Sorting the Array
Time =    1.3281 seconds
Writing Array to rand-sorted-out.txt:
Time =    1.6406 seconds


Derrel Walters © 2019

Demonstrating derrelSORT©
WARNING: This program can produce LARGE files!

Generating random array of length:         12800000
Time =    0.2500 seconds
Writing Array to rand-in.txt:
Time =    3.3906 seconds
Sorting the Array
Time =    2.7031 seconds
Writing Array to rand-sorted-out.txt:
Time =    3.2656 seconds


Derrel Walters © 2019

Demonstrating derrelSORT©
WARNING: This program can produce LARGE files!

Generating random array of length:         25600000
Time =    0.4062 seconds
Writing Array to rand-in.txt:
Time =    6.6250 seconds
Sorting the Array
Time =    5.6094 seconds
Writing Array to rand-sorted-out.txt:
Time =    6.5312 seconds


Derrel Walters © 2019

Demonstrating derrelSORT©
WARNING: This program can produce LARGE files!

Generating random array of length:         51200000
Time =    0.8281 seconds
Writing Array to rand-in.txt:
Time =   13.2656 seconds
Sorting the Array
Time =   11.5000 seconds
Writing Array to rand-sorted-out.txt:
Time =   13.1719 seconds


Derrel Walters © 2019

Demonstrating derrelSORT©
WARNING: This program can produce LARGE files!

Generating random array of length:        102400000
Time =    1.6406 seconds
Writing Array to rand-in.txt:
Time =   26.3750 seconds
Sorting the Array
Time =   23.3438 seconds
Writing Array to rand-sorted-out.txt:
Time =   27.0625 seconds


Derrel Walters © 2019

Demonstrating derrelSORT©
WARNING: This program can produce LARGE files!

Generating random array of length:        204800000
Time =    3.3438 seconds
Writing Array to rand-in.txt:
Time =   53.1094 seconds
Sorting the Array
Time =   47.3750 seconds
Writing Array to rand-sorted-out.txt:
Time =   52.8906 seconds


Derrel Walters © 2019

Demonstrating derrelSORT©
WARNING: This program can produce LARGE files!

Generating random array of length:        409600000
Time =    6.6562 seconds
Writing Array to rand-in.txt:
Time =  105.1875 seconds
Sorting the Array
Time =   99.5938 seconds
Writing Array to rand-sorted-out.txt:
Time =  109.9062 seconds

This is the program as-is from 2019 without modification:

SORT-RESEARCH$ cat derrelSORT.f90
! Derrel Walters © 2019
! These sort routines were written by Derrel Walters ~ 2019-01-23

PROGRAM sort_test
  ! This program demonstrates a linear sort routine
  ! by generating a random array (here integer), writing it
  ! to a file 'rand-in.txt', sorting it with an
  ! implementation of derrelSORT (here for integers -
  ! where the same principles apply for other applicable
  ! datatypes), and finally, printing the sorted array
  ! to a file 'rand-sorted-out.txt'.
  !
  ! To the best understanding of the author, the expert
  ! consensus is that a comparative sort can, at best,
  ! be done with O(nlogn) time complexity. Here a sort
  ! is demonstrated which experimentally runs O(n).
  !
  ! Such time complexity is currently considered impossible
  ! for a sort. Using this sort, extremely large amounts of data can be
  ! sorted on any modern computer using a single processor core -
  ! provided the computer has enough memory to hold the array! For example,
  ! the sorting time for a given array will be on par (perhaps less than)
  ! what it takes the same computer to write the array to a file.
  !
  ! ~ Derrel Walters

  IMPLICIT NONE

  INTEGER,PARAMETER :: in_unit = 21
  INTEGER,PARAMETER :: out_unit = 23

  INTEGER,DIMENSION(:),ALLOCATABLE :: iArrA
  REAL,DIMENSION(:),ALLOCATABLE :: rArrA
  CHARACTER(LEN=15) :: cDims
  CHARACTER(LEN=80) :: ioMsgStr
  INTEGER(KIND=8) :: nDims, i
  INTEGER :: iStat
  REAL :: start, finish

  WRITE(*,*) ''
  WRITE(*,'(A)') 'Derrel Walters © 2019'
  WRITE(*,*) ''
  WRITE(*,'(A)') 'Demonstrating derrelSORT©'
  WRITE(*,'(A)') 'WARNING: This program can produce LARGE files!'
  WRITE(*,*) ''

  CALL GET_COMMAND_ARGUMENT(1, cDims)
  IF (cDims == '') THEN
    nDims = 1000000
  ELSE
    READ(cDims,'(1I15)') nDims
  END IF
  ALLOCATE(iArrA(nDims),rArrA(nDims),STAT=iStat)

  WRITE(*,'(A,1X,1I16)') 'Generating random array of length:', nDims
  CALL CPU_TIME(start)
  CALL RANDOM_NUMBER(rArrA)
  iArrA = INT(rArrA*1000000)
  CALL CPU_TIME(finish)
  WRITE(*,'(A,1X,f9.4,1X,A)') 'Time =',finish-start,'seconds'
  DEALLOCATE(rArrA,STAT=iStat)

  WRITE(*,'(A)') 'Writing Array to rand-in.txt: '
  OPEN(UNIT=in_unit,FILE='rand-in.txt',STATUS='REPLACE',ACTION='WRITE',IOSTAT=iStat,IOMSG=ioMsgStr)
  IF (iStat /= 0) THEN
    WRITE(*,'(A)') ioMsgStr
  ELSE
    CALL CPU_TIME(start)
    DO i=1, nDims
      WRITE(in_unit,*) iArrA(i)
    END DO
    CLOSE(in_unit)
    CALL CPU_TIME(finish)
    WRITE(*,'(A,1X,f9.4,1X,A)') 'Time =',finish-start,'seconds'
  END IF
  WRITE(*,'(A)') 'Sorting the Array'

  CALL CPU_TIME(start)
  CALL iderrelSORT(iArrA, nDims) !! SIZE(iArrA))
  CALL CPU_TIME(finish)
  WRITE(*,'(A,1X,f9.4,1X,A)') 'Time =',finish-start,'seconds'

  WRITE(*,'(A)') 'Writing Array to rand-sorted-out.txt: '
  OPEN(UNIT=out_unit,FILE='rand-sorted-out.txt',STATUS='REPLACE',ACTION='WRITE',IOSTAT=iStat,IOMSG=ioMsgStr)
  IF (iStat /= 0) THEN
    WRITE(*,'(A)') ioMsgStr
  ELSE
    CALL CPU_TIME(start)
    DO i=1, nDims
      WRITE(out_unit,*) iArrA(i)
    END DO
    CLOSE(out_unit)
    CALL CPU_TIME(finish)
    WRITE(*,'(A,1X,f9.4,1X,A)') 'Time =',finish-start,'seconds'
  END IF
  WRITE(*,*) ''

END PROGRAM sort_test

SUBROUTINE iderrelSORT(arrA, nA)
  ! This implementation of derrelSORT is for integers,
  ! but the same principles apply for other datatypes.
  !
  ! ~ Derrel Walters
  IMPLICIT NONE

  INTEGER(KIND=8),INTENT(IN) :: nA
  INTEGER,DIMENSION(nA),INTENT(INOUT) :: arrA

  INTEGER,DIMENSION(nA) :: arrB
  INTEGER(KIND=8) :: lowIDX, highIDX, midIDX
  INTEGER ::  iStat
  INTEGER(KIND=8) :: i, j, A, B, C, thisHigh, mergeSize, nLoops
  INTEGER,DIMENSION(:),ALLOCATABLE :: iterMark
  LOGICAL,DIMENSION(:),ALLOCATABLE :: moreToGo

  arrB = arrA
  mergeSize = 2
  lowIDX = 1 - mergeSize
  highIDX = 0

  nLoops = INT(LOG(REAL(nA))/LOG(2.0))
  ALLOCATE(iterMark(nLoops), moreToGo(nLoops), STAT=iStat)
  moreToGo = .FALSE.
  iterMark = 0

  DO i = 1, nLoops
    iterMark(i) = FLOOR(REAL(nA)/2**i)
    IF (MOD(nA, 2**i) > 0) THEN
      moreToGo(i) = .TRUE.
      iterMark(i) = iterMark(i) + 1
    END IF
  END DO

  DO i = 1, nLoops
      DO j = 1, iterMark(i)
        A = 0
        B = 1
        C = 0
        lowIDX = lowIDX + mergeSize
        highIDX = highIDX + mergeSize
        midIDX = (lowIDX + highIDX + 1) / 2
        thisHigh = highIDX
        IF (j == iterMark(i).AND.moreToGo(i)) THEN
          lowIDX = lowIDX - mergeSize
          highIDX = highIDX - mergeSize
          midIDX = (lowIDX + highIDX + 1) / 2
          A = midIDX - lowIDX
          B = 2
          C = nA - 2*highIDX + midIDX - 1
          thisHigh = nA
        END IF
!! The traditional merge can also be used (see subroutine for comment). !!
!                                                                        !
!        CALL imerge(arrA(lowIDX:midIDX-1+A), B*(midIDX-lowIDX),   &     !
!                    arrA(midIDX+A:thisHigh), highIDX-midIDX+1+C, &      !
!                    arrB(lowIDX:thisHigh), thisHigh-lowIDX+1)           !
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
        CALL imerge2(arrA(lowIDX:midIDX-1+A), B*(midIDX-lowIDX),   &
                    arrA(midIDX+A:thisHigh), highIDX-midIDX+1+C,   &
                    arrB(lowIDX:thisHigh), thisHigh-lowIDX+1)
        arrA(lowIDX:thisHigh) = arrB(lowIDX:thisHigh)
      END DO
      mergeSize = 2*mergeSize
      lowIDX = 1 - mergeSize
      highIDX = 0
  END DO

END SUBROUTINE iderrelSORT

SUBROUTINE imerge(arrA, nA, arrB, nB, arrC, nC)
  ! This merge is a traditional merge that places
  ! the lowest element first. The form that the
  ! time complexity takes, O(n), is not affected
  ! by the merge routine - yet this routine
  ! does not run as fast as the merge used in
  ! imerge2.
  !
  ! ~Derrel Walters
  IMPLICIT NONE

  INTEGER(KIND=8),INTENT(IN) :: nA, nB , nC

  INTEGER,DIMENSION(nA),INTENT(IN) :: arrA
  INTEGER,DIMENSION(nB),INTENT(IN) :: arrB
  INTEGER,DIMENSION(nC),INTENT(INOUT) :: arrC

  INTEGER(KIND=8) :: i, j, k

  arrC = 0
  i = 1
  j = 1
  k = 1

  DO
    IF (i > nA .OR. j > NB) EXIT
    IF (arrB(j) < arrA(i)) THEN
      arrC(k) = arrB(j)
      j = j + 1
    ELSE
      arrC(k) = arrA(i)
      i = i + 1
    END IF
    k = k + 1
  END DO

  IF (i <= nA) THEN
    DO
      IF (i > nA) EXIT
        arrC(k) = arrA(i)
        i = i + 1
        k = k + 1
    END DO
  ELSEIF (j <= nB) THEN
    DO
      IF (j > nB) EXIT
        arrC(k) = arrB(j)
        j = j + 1
        k = k + 1
    END DO
  END IF

END SUBROUTINE imerge

SUBROUTINE imerge2(arrA, nA, arrB, nB, arrC, nC)
  ! This merge is a faster merge.  Array A arrives
  ! just to the left of Array B, and Array C is
  ! filled from both ends simultaneously - while
  ! still preserving the stability of the sort.
  ! The derrelSORT routine is so fast, that
  ! the merge does not affect the O(n) time
  ! complexity of the sort in practice
  ! (perhaps, making its execution more linear
  ! at small numbers of elements).
  !
  ! ~ Derrel Walters
  IMPLICIT NONE

  INTEGER(KIND=8),INTENT(IN) :: nA, nB , nC

  INTEGER,DIMENSION(nA),INTENT(IN) :: arrA
  INTEGER,DIMENSION(nB),INTENT(IN) :: arrB
  INTEGER,DIMENSION(nC),INTENT(INOUT) :: arrC

  INTEGER(KIND=8) :: i, j, k, x, y, z

  arrC = 0
  i = 1
  j = 1
  k = 1
  x = nA
  y = nB
  z = nC

  DO
    IF (i > x .OR. j > y) EXIT
    IF (arrB(j) < arrA(i)) THEN
      arrC(k) = arrB(j)
      j = j + 1
    ELSE
      arrC(k) = arrA(i)
      i = i + 1
    END IF
    IF (arrA(x) > arrB(y)) THEN
      arrC(z) = arrA(x)
      x = x - 1
    ELSE
      arrC(z) = arrB(y)
      y = y - 1
    END IF
    k = k + 1
    z = z - 1
  END DO

  IF (i <= x) THEN
    DO
      IF (i > x) EXIT
        arrC(k) = arrA(i)
        i = i + 1
        k = k + 1
    END DO
  ELSEIF (j <= y) THEN
    DO
      IF (j > y) EXIT
        arrC(k) = arrB(j)
        j = j + 1
        k = k + 1
    END DO
  END IF
END SUBROUTINE imerge2

MOAR data using the Fortran version. Anyone into straight lines?

SORT-RESEARCH$ for (( i=100000; i<500000000; i=2*i )); do ./derrelSORT-2022.x $i; done | awk 'BEGIN {old_1="Derrel"; print "N      Time(s)"};{if ($1 == "Generating") {printf $NF" "; old_1=$1} else if (old_1 == "Sorting") {print $3; old_1=$1} else {old_1=$1}}'
N      Time(s)
100000 0.0000
200000 0.0312
400000 0.0625
800000 0.1562
1600000 0.2969
3200000 0.6250
6400000 1.3594
12800000 2.7500
25600000 5.5625
51200000 11.8906
102400000 23.3750
204800000 47.3750
409600000 96.4531

Appears linear, doesn’t it? 😉
[Plot: the Fortran sorting times from above.]

4 Answers


  1. Your algorithm is not O(n). Your calculated number of loops (nLoops) is log2(n). The number of inner loops (the values in iterMark) will be essentially n/2, n/4, n/8, etc. But the segment sizes really don’t matter because every time through the outer loop you look at every item in the list.

    No matter how you obfuscate it, you’re doing log2(n) passes over n items: O(n log n).

    Your code is a fairly standard merge sort, which is proven to be O(n log n). It’s well proven that the general case for comparison sorts is O(n log n). Sure, some algorithms can sort some specific cases more quickly. Conversely, the same algorithms have pathological cases that will take O(n^2). Other comparison sorts (heap sort, merge sort, for example) are not highly subject to the order of items. But in the general case comparison sorts make on the order of n log n comparisons. See https://www.cs.cmu.edu/~avrim/451f11/lectures/lect0913.pdf for a detailed explanation.

    But don’t take my word for it. You can easily test yourself by doing some simple timings. Time how long it takes to sort, say, 100K items. If your algorithm is indeed O(n), then it should take approximately twice as long to sort 200K items, and ten times as long to sort 1 million items. But if it’s O(n log n), as I suspect, then the timings will be somewhat longer.

    Consider: log2 of 100K is 16.61, and log2 of 200K is 17.61. So sorting 100K items (if the algorithm is O(n log n)) will take time proportional to 100K * 16.61, and sorting 200K items will take time proportional to 200K * 17.61. Doing the arithmetic:

    100K * 16.61 = 1,661,000
    200K * 17.61 = 3,522,000
    

    So 200K items will take approximately 2.12 times (3,522,000 / 1,661,000) as long, or about 6% longer than if the algorithm were linear.

    If you’re still unsure, pump it up to a million items. If the algorithm is linear, then a million items will take 10x the time that 100K items took. If it’s O(n log n), then it will take 12 times as long.

    1M * 19.93 = 19,930,000
    (19,930,000 / 1,661,000) = 11.9987 (call it 12)
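
    The same arithmetic in a few lines of Python, in case you want to try other sizes (this only evaluates the two models; it does not time anything):

    import math

    def predicted_ratio(n_small, n_large):
        # Time ratio predicted by a linear model vs. an n log n model.
        linear = n_large / n_small
        nlogn = (n_large * math.log2(n_large)) / (n_small * math.log2(n_small))
        print(f"{n_small} -> {n_large}: linear x{linear:.2f}, n log n x{nlogn:.2f}")

    predicted_ratio(100_000, 200_000)    # linear x2.00, n log n x2.12
    predicted_ratio(100_000, 1_000_000)  # linear x10.00, n log n x12.00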
    
  2. My f2py skills are not strong, so I wrote a pure fortran wrapper for your code (posted below, if you want to check it), and the timings I got were:

     n                     time (s)          0.1*n/1e6       0.1*n*log(n)/1e6*log(1e6)
                  1000000  0.109375000      0.100000001      0.100000001
                  2000000  0.203125000      0.200000003      0.210034326
                  4000000  0.453125000      0.400000006      0.440137327
                  8000000  0.937500000      0.800000012      0.920411944
                 16000000   1.92187500       1.60000002       1.92109859
                 32000000   4.01562500       3.20000005       4.00274658
                 64000000   8.26562500       6.40000010       8.32659149
                128000000   17.0468750       12.8000002       17.2953815
                256000000   35.1406250       25.6000004       35.8751564
    

    This is… not looking good for your O(n) theory, I’m afraid.

    My wrapper:

    module m
    contains
    ! Your code goes here
    end module
    
    program p
      use m
      implicit none
    
      integer(8) :: i,n
      real, allocatable :: real_array(:)
      integer, allocatable :: int_array(:)
      real :: start
      real :: stop
    
      real_array = [0]
      int_array = [0]
    
      write(*,*) "n                     time (s)          0.1*n/1e6       0.1*n*log(n)/1e6*log(1e6)"
    
      do i=0,30
        n = 2**i*1e6
        deallocate(real_array, int_array)
        allocate(real_array(n), int_array(n))
        call random_number(real_array)
        int_array = -huge(0)*real_array + 2.0*huge(0)
    
        call cpu_time(start)
        call isort(int_array, n)
        call cpu_time(stop)
    
        write(*,*) n, stop-start, 0.1*n/1.0e6, 0.1*n*log(1.0*n)/(1.0e6*log(1.0e6))
      enddo
    end program
    
  3. I don’t doubt that your sort is fast, and I can believe that it compares favorably with the sort command-line utility. But it is an O(N log(N)) iterative merge sort, not an O(N) sort (nor a novel algorithm).

    Observe,

    • Your outer loop iterates O(log(N)) times.
    • On each of those iterations, the inner loop iterates O(N / 2^k) times for some k.
    • And the main work of each inner-loop iteration is to split O(2^k) items into two halves and merge those together, which involves examining and moving every single item. (And then moving them all again back to the original array.) That costs O(2^k) operations per inner-loop iteration.

    Those factors all multiply together:

    O(log(N)) * O(N / 2^k) * O(2^k)

    The factors of 2^k cancel each other, and you’re left with O(N log(N)). (The k’s are functions of N, so they cannot simply be ignored as constants.)

    The logarithm grows very slowly, so if you don’t look too closely then it is easy to be fooled into thinking you see linear growth when it’s really N log(N). You need to look at wide ranges of values to see the superlinearity, and some is indeed visible in your data.
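
    One quick way to see where the N log(N) total comes from (my own illustration, not code from the question) is to count how many element copies a bottom-up merge sort performs: every pass copies essentially all N items, and there are about log2(N) passes.

    import math

    def total_copies(n):
        # Elements written per pass of a bottom-up merge sort, summed over all passes.
        total, width = 0, 1
        while width < n:
            total += n          # each pass merges (and copies) every element once
            width *= 2
        return total

    for n in (10**5, 10**6, 10**7):
        print(n, total_copies(n), round(n * math.log2(n)))

    The two counts agree up to rounding of the pass count (ceil(log2 N) versus log2 N), which is exactly the N log(N) behavior described above.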

    As for your plot, there’s a problem with the result of your curve fitting: the y-intercept is significantly negative (for the scale of the data, and especially for the concentration of points with small y). Your data may fit a linear model well (even if not entirely sensibly), but they sure appear to fit an N log(N) model better.

  4. The other answers have explained why you do not have a linear comparison sort.

    I’ll try to explain why execution times will never prove a time complexity.

    Many times you can come up with specific cases, and an algorithm using various CPU-specific optimizations, where the job (whether that job is sorting or something else) appears to do better than O(n) according to a plot: if the time for x items is y, then according to the graph the time for 2x items is less than 2y. And this can happen for as large an x as you can fit into memory.

    Still, this proves nothing about time complexity. This could be an algorithm with time complexity O(n), or O(n log n) or maybe even O(log n) or O(n*n).

    Big-Oh notation hides the various constants that describe the number of operations performed by the algorithm, so such an algorithm could just be O(n log n) with a very small constant (as in, a constant < 1) or O(log n) with a huge constant.

    Big-Oh also does not care about real-life aspects such as system memory or disk space or how fast some CPU executes one instruction as opposed to another. Maybe the operations you use just execute really fast on that CPU. Regardless, if you have an O(n log n) algorithm, for large enough n, you would eventually see the graph look like an n log n graph.

    A real example of this could be the Disjoint set data structure, which uses something called an iterated logarithm and its complexity is O(m log* n). In practice, log* n will be something <= 5 for all practical values, so if you plot it for practical values, you might think it’s O(m) with a big constant, but that’s not the case.

    You could change your algorithm to read each number from a different file and write it back to that file, at each step, and remove your input array completely. It wouldn’t affect its time complexity, but it would definitely affect the nice execution time measurements you’re seeing, because storage is obviously slower than memory. Well, they’re all the same to Big-Oh.
