Chapter 3: Find Smallest K Numbers

Problem Description

Given n integers, output the smallest k numbers.

Analysis and Solutions

Solution One

To get the smallest k numbers, the simplest way is, we sort the array, then output the first k numbers.

As for the sorting algorithms,I guess quick-sort is a good choice ( as we known, the average time complexity of Quick-Sort is n*logn), then we just output the first k numbers. So the total time complexity is: O(n * log n) + O(k)=O(n * log n).

Solution Two

The problem does not require the smallest k numbers to be sorted, neither are the last n-k numbers. So we do not need to sort all the numbers. We can use Selection-Sort or Swap-Sort:

  1. Traverse the n numbers, put the first k numbers into a new array, assume they are the smallest k numbers;
  2. For these k numbers, use Selection-Sort or Swap-Sort to get the biggest element kmax (need to traverse these k numbers, the time complexity is O(k));
  3. Traverse the rest n-k numbers. Suppose each time the new element is x, compare x with kmax: if x < kmax, replace kmax with x, then jump to step 2 and find the biggest element kmax; if x >= kmax, just continue traversing.

In each traverse, the time complexity is O(k) if we update the array, O(1) if we do not. So totally the time complexity is n*O(k) = O(n*k).

Solution Three

A better way is maintaining a max-heap with capacity k, it is similiar with the solution two:

  1. Store the first k numbers in a max-heap with capacity k, assume they are the smallest k numbers;
  2. Sort the heap, make k1<k2<...<kmax (kmax is the biggest number in the heap);
  3. Traverse the rest n-k numbers. Suppose each time the new element is x, compare x with kmax: if x < kmax, replace kmax with x and reheapify the heap (O(logk)); Otherwise do nothing.

In this case the time complexity is: O(k+(n-k)*logk)=O(n*logk). This is because in the heap we just need O(logk) to operate a search or reheapify (In solution two, to find the biggest number in an array we need O(k)).

Solution Four

The book \<Date Structures and Algorithm Analysis in C>, in Chapter7, Section7.7.6, introduced a Quick-Selection algorithm. In average case, its time complexity is O(n):

  • Suppose the number set is S, in S we choose a number v as the pivot. Divide S-{v} into S1 and S2, like the Quick-Sort.
  • If k <= |S1|, the k-th smallest number is in S1. In this case, return QuickSelect(S1, k).
  • If k = 1 + |S1|, the pivot is the k-th smallest number, return it.
  • Otherwise, the k-th smallest number is in S2. It is the (k - |S1| - 1) smallest number in S2, so we return QuickSelect(S2, k - |S1| - 1).

The time complexity is O(n).

You can find the code below:

//Quick-Select, put the k-th smallest number in a[k-1]  
void QuickSelect( int a[], int k, int left, int right )
{
    int i, j;
    int pivot;

    if( left + cutoff <= right )
    {
        pivot = median3( a, left, right );
        //set the median as the pivot, this can avoid the wort case, to some extent
        i = left; j = right - 1;
        for( ; ; )
        {
            while( a[ ++i ] < pivot ){ }
            while( a[ --j ] > pivot ){ }
            if( i < j )
                swap( &a[ i ], &a[ j ] );
            else
                break;
        }
        //reset the pivot
        swap( &a[ i ], &a[ right - 1 ] );  

        if( k <= i )
            QuickSelect( a, k, left, i - 1 );
        else if( k > i + 1 )
            QuickSelect( a, k, i + 1, right );
    }
    else  
        InsertSort( a + left, right - left + 1 );
}

This Quick-Select algorithm is kind of like the partition method in Quick-Sort. Store the N numbers in array S, set the 'median of median' as the pivot. Then divide array into Sa and Sb, Sa<=X<=Sb. If there are more than k elements in Sa, return the smallest k numbers in Sa, otherwise return all the elements in Sa and the smallest k-|Sa| numbers in Sb. In average case the time complexity is O(n).

Furthermore, the book \<Introduction to Algorithms>, in Chapter9 Section9.3, introduced a Select Algorithm whose time complexity is O(n) in the worst case. I recommend you to have a look.

Think Deeper

1.Google interview: input two arrays consisting of integers, sum each two of them we can get a new array, how to find the k-th smallest numbers in this new array?

Analysis:

 ‘Suppose the two arrays are A and B, they all have N numbers, sum each two of them we can get a new array C, which has N^2 elements.
  Then consider these as N sorted arrays:
          A[1]+B[1] <= A[1]+B[2] <= A[1]+B[3] <=…
          A[2]+B[1] <= A[2]+B[2] <= A[2]+B[3] <=…
          …
         A[N]+B[1] <= A[N]+B[2] <= A[N]+B[3] <=…
  The problem turns into finding the smallest k numbers in these N^2 sorted array’

2.Suppose there are two increasing arrays A and B, A=(a1,a2,...,ak), B=(b1,b2,...,bk). For 1<=i, j<=k, find the smallest k (ai+bj). The algorithm is required to be efficient.

3.Given an array a1,a2,a3,...,an and m triples, each triple represents a query. For each query (i, j, k), output the k-th smallest number in ai,ai+1,...,aj.