How to sort 10 billion numbers





 Approach 1 :-  (External Sorting)
Here k is the size of maximum elements which  can be sorted internally.
Since the size of numbers is huge, we can't make use of the primary memory to serve the purpose.
1) Divide the numbers in group of k.
2) Sort each group independently in O(klogk). ( using internal sort - Quick Sort )
    Merge sort all the groups.
3) Create a min heap of size=10 billion/k.
4) Update the min heap with the minimum from each group.
5) Extract the minimum from the heap and write to a file.
6) Get the next number from the group from which extracted number belonged.
7) Update the min heap.
8) Go to step 5.



Approach 2:- Using Doubly Ended Priority Queue (DEPQ)

1) Read in as many elements as will fit into DEPQ.
   The elements in DEPQ will eventually be the middle group of elements.
2) Proces the remaining elements one at a time.
   If the next element <= smallest element in the DEPQ, output this element as a part of the left group.
   Else if the next element is greater  than the largest element in DEPQ, output it as a part of right group.
3) Otherwise, remove  either the min or max element from the DEPQ.
4) Output the elements in DEPQ in sorted order.
5) Sort the left and right groups recursively.

In approach 2, we are simulating the quick sort, In quick sort , the array is broken into  L,M,R.
where M is just a single pivot element. But in approach 2 , the whole DEPQ is acting as a pivot.


 package com.kartik.org;

import java.util.Random;

public class SortOneMilionData {

 // Main function to test performance sorting 1 million integers.
    // Results in about 220 ms on a 2.3 Ghz Core i5 processor w/4GB 1333 Mhz RAM
    public static void main(String[] args){
        final int SIZE = 1000000;

        Random r = new Random();
        int[] test = new int[SIZE];

        for (int i = 0; i < SIZE; i++){
            test[i] = r.nextInt(Integer.MAX_VALUE);
        }

        long start = System.currentTimeMillis();
        test = sort(test);
        long end = System.currentTimeMillis();

        for (Integer i : test){
            System.out.println(i);
        }

        System.out.println(end-start);
    }

    // Sort the numbers beginning with least-significant digit
    public static int[] sort(int[] input){

        // Largest place for a 32-bit int is the 1 billion's place
        for(int place=1; place <= 1000000000; place *= 10){
            // Use counting sort at each digit's place
            input = countingSort(input, place);
        }

        return input;
    }

    private static int[] countingSort(int[] input, int place){
        int[] out = new int[input.length];

        int[] count = new int[10];

        for(int i=0; i < input.length; i++){
            int digit = getDigit(input[i], place);
            count[digit] += 1;
        }

        for(int i=1; i < count.length; i++){
            count[i] += count[i-1];
        }

        for(int i = input.length-1; i >= 0; i--){
            int digit = getDigit(input[i], place);

            out[count[digit]-1] = input[i];
            count[digit]--;
        }

        return out;

    }

    private static int getDigit(int value, int digitPlace){
        return ((value/digitPlace ) % 10);
    }

}


       


If you find this post helpful, I would really appreciate if you can share it with your friends. Also you can check more questions and analysis here.








Previous
Next Post »