Tuesday, 15 November 2011

External merge sort

One example of external sorting is the external merge sort algorithm, which sorts chunks that each fit in RAM, then merges the sorted chunks together.[1][2] For example, for sorting 900 megabytes of data using only 100 megabytes of RAM:

1. Read 100 MB of the data into main memory and sort it by some conventional method, like quicksort.

2. Write the sorted data to disk.

3. Repeat steps 1 and 2 until all of the data is in sorted 100 MB chunks (there are 900MB / 100MB = 9 chunks), which now need to be merged into one single output file.

4. Read the first 10 MB (= 100MB / (9 chunks + 1)) of each sorted chunk into input buffers in main memory and allocate the remaining 10 MB for an output buffer. (In practice, it might provide better performance to make the output buffer larger and the input buffers slightly smaller.)

5. Perform a 9-way merge and store the result in the output buffer. If the output buffer is full, write it to the final sorted file, and empty it. If any of the 9 input buffers gets empty, fill it with the next 10 MB of its associated 100 MB sorted chunk until no more data from the chunk is available. This is the key step that makes external merge sort work externally: because the merge algorithm only makes one pass sequentially through each of the chunks, each chunk does not have to be loaded completely; rather, sequential parts of the chunk can be loaded as needed.
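The steps above can be sketched in Python. This is a minimal illustration, not a tuned implementation: it assumes the input is a text file with one integer per line, measures chunk size in lines rather than megabytes, and relies on Python's ordinary file buffering to stand in for the explicit input and output buffers. The names `sort_chunks`, `merge_runs` and `external_sort` are made up for this example.

```python
import heapq
import itertools
import os
import tempfile


def sort_chunks(input_path, chunk_lines, tmp_dir):
    """Sort pass: read chunk_lines lines at a time, sort them, write each run to disk."""
    run_paths = []
    with open(input_path) as src:
        while True:
            chunk = [int(line) for line in itertools.islice(src, chunk_lines)]
            if not chunk:
                break
            chunk.sort()
            run_path = os.path.join(tmp_dir, f"run_{len(run_paths)}.txt")
            with open(run_path, "w") as run:
                run.writelines(f"{value}\n" for value in chunk)
            run_paths.append(run_path)
    return run_paths


def merge_runs(run_paths, output_path):
    """Merge pass: k-way merge that reads every run sequentially, exactly once."""
    files = [open(path) for path in run_paths]
    try:
        # Each stream yields integers lazily, so no run is loaded completely.
        streams = [(int(line) for line in f) for f in files]
        with open(output_path, "w") as out:
            out.writelines(f"{value}\n" for value in heapq.merge(*streams))
    finally:
        for f in files:
            f.close()


def external_sort(input_path, output_path, chunk_lines):
    """Two-pass sort: one chunk-sorting pass followed by one merge pass."""
    with tempfile.TemporaryDirectory() as tmp_dir:
        run_paths = sort_chunks(input_path, chunk_lines, tmp_dir)
        merge_runs(run_paths, output_path)
```

Calling `external_sort("in.txt", "out.txt", chunk_lines=1_000_000)` would keep at most about a million values in memory during the sort pass; the file names here are placeholders.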

Additional passes

That example shows a two-pass sort: a sort pass followed by a merge pass. Note that we had one merge pass that merged all the chunks at once, rather than in regular merge sort, where we merge two chunks at each step, and take log n merge passes total. The reason for this is that every merge pass requires reading and writing every value in the array from and to disk once. Disk access is usually slow, and so reads and writes should be avoided as much as possible.

However, there is a trade-off with using fewer merge passes. As the number of chunks increases, the amount of data we can read from each chunk at a time during the merge process decreases. For sorting, say, 50 GB in 100 MB of RAM, using a single merge pass isn't efficient: the disk seeks required to fill the input buffers with data from each of the 500 chunks (we read 100MB / 501 ~ 200KB from each chunk at one time) take up most of the sort time. Using two merge passes solves the problem. Then the sorting process might look like this:

1. Run the initial chunk-sorting pass as before.

2. Run a first merge pass combining 25 chunks at a time, resulting in 20 larger sorted chunks.

3. Run a second merge pass to merge the 20 larger sorted chunks.
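The arithmetic behind this plan can be checked with a few lines of Python. This is only a sketch: it assumes decimal units (1 MB = 10^6 bytes), assumes RAM is split evenly among the input buffers plus one output buffer, and uses the made-up helper names `buffer_size` and `merge_schedule`.

```python
MB = 10**6  # decimal units assumed for this sketch
GB = 10**9


def buffer_size(ram_bytes, fan_in):
    """Bytes per buffer when RAM is split among fan_in input buffers plus one output buffer."""
    return ram_bytes // (fan_in + 1)


def merge_schedule(num_chunks, fan_in):
    """Number of sorted runs left after each merge pass that combines fan_in runs at a time."""
    if fan_in < 2:
        raise ValueError("fan_in must be at least 2")
    schedule = []
    runs = num_chunks
    while runs > 1:
        runs = -(-runs // fan_in)  # ceiling division
        schedule.append(runs)
    return schedule


num_chunks = 50 * GB // (100 * MB)             # 500 initial sorted chunks
single_pass_buffer = buffer_size(100 * MB, 500)  # 199600 bytes, roughly 200 KB
two_pass_buffer = buffer_size(100 * MB, 25)      # 3846153 bytes, roughly 3.8 MB
print(merge_schedule(num_chunks, 25))            # [20, 1]
```

With a fan-in of 25, each read from a chunk is roughly 19 times larger than in the single-pass plan, so far fewer seeks are needed per byte merged, at the price of reading and writing the data one extra time.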

Like in-memory sorts, efficient external sorts require O(n log n) time: exponential increases in data size require linear increases in the number of passes. If one makes liberal use of the gigabytes of RAM provided by modern computers, the logarithmic factor grows very slowly: under reasonable assumptions, one could sort at least 500 GB of data using 1 GB of main memory before a third pass became advantageous, and could sort many times that before a fourth pass became useful.[3]
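A back-of-the-envelope estimate shows why the number of passes grows so slowly. The sketch below assumes decimal units, initial sorted chunks as large as RAM, and a fixed 2 MB merge buffer; the buffer size is an assumption chosen for illustration, not a figure taken from the cited source, and `max_sortable_bytes` is a made-up name.

```python
def max_sortable_bytes(ram_bytes, buffer_bytes, merge_passes):
    """Rough upper bound on the data that one sort pass plus merge_passes merge passes can handle."""
    fan_in = ram_bytes // buffer_bytes - 1  # one buffer is reserved for output
    # Each merge pass multiplies the size of a sorted run by the fan-in.
    return ram_bytes * fan_in ** merge_passes


GB = 10**9
MB = 10**6

one_merge = max_sortable_bytes(1 * GB, 2 * MB, 1)   # 499 GB
two_merges = max_sortable_bytes(1 * GB, 2 * MB, 2)  # 249001 GB, about 249 TB
```

Under these assumptions, each extra merge pass multiplies the sortable data size by about 500, which is the exponential-versus-linear relationship described above.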

Tuning performance

The Sort Benchmark, created by computer scientist Jim Gray, compares external sorting algorithms implemented using finely tuned hardware and software. Winning implementations use several techniques:

Using parallelism

Multiple disk drives can be used in parallel in order to improve sequential read and write speed. This can be a very cost-efficient improvement: a recent Sort Benchmark winner in the cost-centric Penny Sort category uses six hard drives in an otherwise midrange machine.[4]

Sorting software can use multiple threads to speed up the process on modern multicore computers.

Software can use asynchronous I/O so that one run of data can be sorted or merged while other runs are being read from or written to disk.

Multiple machines connected by fast network links can each sort part of a huge dataset in parallel.[5]
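To illustrate the idea of overlapping I/O with sorting, here is a small Python sketch in which a background thread reads the next chunk while the current one is being sorted. It is not how any particular benchmark entry works; `read_chunk` is a hypothetical callable, supplied by the caller, that returns chunk number `i` as a list.

```python
from concurrent.futures import ThreadPoolExecutor


def sorted_runs_with_prefetch(read_chunk, num_chunks):
    """Sort chunk i while a background thread is already reading chunk i + 1."""
    runs = []
    if num_chunks == 0:
        return runs
    with ThreadPoolExecutor(max_workers=1) as reader:
        pending = reader.submit(read_chunk, 0)
        for i in range(num_chunks):
            chunk = pending.result()  # wait for the read that was issued earlier
            if i + 1 < num_chunks:
                # Start reading the next chunk before sorting this one.
                pending = reader.submit(read_chunk, i + 1)
            runs.append(sorted(chunk))  # this sort overlaps with the pending read
    return runs
```

A real implementation would write each sorted run back to disk instead of keeping it in a list, and would likely overlap the writes as well.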

Increasing hardware speed

Using more RAM for sorting can reduce the number of disk seeks and avoid the need for more passes.

Fast external memory, like 15K RPM disks or solid-state drives, can speed sorts (but adds substantial costs proportional to the data size).

Many different factors can affect hardware's maximum sorting speed: CPU speed and number of cores, RAM access latency, input/output bandwidth, disk read/write speed, disk seek time, and others.

Cost-efficiency as well as absolute speed can be critical, especially in cluster environments where lower node costs allow purchasing more nodes.

Increasing software speed

Some Sort Benchmark entrants use a variation on radix sort for the first phase of sorting: they separate data into one of many "bins" based on the beginning of its value. Sort Benchmark data is random and especially well-suited to this optimization.
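The binning idea can be sketched as follows. This is a simplified in-memory illustration, assuming non-empty byte-string keys and 256 bins chosen by the first byte; actual benchmark entries may pick bins differently, and `bin_by_first_byte` is a made-up name.

```python
def bin_by_first_byte(keys):
    """Distribute byte-string keys into 256 bins by leading byte, then sort each bin."""
    bins = [[] for _ in range(256)]
    for key in keys:
        bins[key[0]].append(key)  # key[0] is an int in range 0..255
    result = []
    for bucket in bins:
        bucket.sort()  # every bin can be sorted independently of the others
        result.extend(bucket)
    return result
```

Because every key in a lower-numbered bin sorts before every key in a higher-numbered one, concatenating the sorted bins gives the fully sorted output without a merge step. Uniformly random keys spread evenly across the bins, which is why random data suits this approach.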

Compressing the input, intermediate files, and output can reduce time spent on I/O, but is not allowed in the Sort Benchmark.

Because the Sort Benchmark sorts long (100-byte) records using short (10-byte) keys, sorting software sometimes rearranges the keys separately from the values to reduce memory I/O volume.
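One way to picture this key/value separation is to sort small (key, index) pairs and only move the full records once at the end. The sketch below assumes each record is a byte string whose first 10 bytes are the key; `sort_by_key_index` is a made-up name and this is not taken from any specific benchmark entry.

```python
def sort_by_key_index(records, key_len=10):
    """Sort small (key, index) pairs, then gather the full records in a single pass."""
    # Only the short keys and integer indexes are shuffled around during the sort.
    tagged = sorted((record[:key_len], i) for i, record in enumerate(records))
    # Each long record is copied exactly once, into its final position.
    return [records[i] for _, i in tagged]
```

Records with equal keys keep their original relative order here, because ties on the key are broken by the index.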
