RP66V1 Performance

This describes the performance of processing RP66V1 binary files by TotalDepth.

Note

This data refers to version 0.3.0 and may or may not be relevant to the current version, 0.4.0.

Scanning RP66V1 to produce an HTML Summary

The test set was just over 100 files that ranged in size from 80kb to 4GB, totalling 17GB. The average file size was about 150Mb. These tests were run on a 2.7 GHz Intel Core i7 machine with 4 cores and hyper-threading.

An archive of RP66V1 files can be scanned to provide a summary in HTML by TotalDepth.RP66V1.ScanHTML. This produces an HTML page with every EFLR and a summary of all the log frame data. This is exposed as the command line tool tdrp66v1scanhtml.

By default this processes every byte of the file, so it can take a long time with large files. Here is the execution time for processing all frames, plotted against RP66V1 file size, together with the size of each HTML file produced.

../_images/RP66V1_ScanHTML_time_size.svg.png

The asymptotic processing rate is around 800 ms/Mb.
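
As a rough guide, the asymptotic rate can be turned into an estimate of scan time for a given file size. This is only a sketch: the function name is hypothetical, the rate is the 800 ms/Mb figure quoted above, and small files will not follow the asymptotic rate.

```python
def estimate_seconds(size_mb, rate_ms_per_mb=800.0):
    """Estimated processing time in seconds for a file of size_mb megabytes.

    Hypothetical helper for illustration; it simply scales the file size
    by an asymptotic rate in ms/Mb.
    """
    return size_mb * rate_ms_per_mb / 1000.0


if __name__ == "__main__":
    # Sizes loosely based on the test set: average, 1GB and the largest file.
    for size_mb in (150, 1024, 4096):
        print(f"{size_mb:>5} Mb -> {estimate_seconds(size_mb):8.1f} s")
```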

And here is the memory and CPU usage:

../_images/RP66V1_ScanHTML_process.svg.png

Performance Improvements

With very large sets of frame data not every frame needs to be processed. The option --frame-slice can be used to sample a subset of frames. For example:

  • --frame-slice=1024,2048,64 will process every 64th frame from frame 1024 to 2048
  • --frame-slice=64 will process at most 64 frames, roughly evenly spaced across those available.
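
The two forms of --frame-slice can be illustrated with a small Python sketch. This is not TotalDepth's implementation: select_frames is a hypothetical helper, and the way the sample form spaces its frames here (a fixed step from frame 0) is an assumption made for illustration.

```python
def select_frames(total_frames, frame_slice):
    """Return the frame indices selected by a --frame-slice style string.

    Hypothetical helper for illustration only.
    'N' means a sample of at most N roughly evenly spaced frames.
    'start,stop,step' means a slice; any of the three may be empty.
    """
    parts = frame_slice.split(",")
    if len(parts) == 1:
        count = int(parts[0])
        if count <= 0:
            raise ValueError("sample size must be positive")
        # Ceiling division so that no more than count frames are chosen.
        step = max(1, -(-total_frames // count))
        return list(range(0, total_frames, step))
    if len(parts) != 3:
        raise ValueError("expected 'N' or 'start,stop,step'")
    start, stop, step = (int(p) if p.strip() else None for p in parts)
    return list(range(total_frames)[slice(start, stop, step)])


if __name__ == "__main__":
    print(select_frames(4096, "1024,2048,64"))  # every 64th frame in [1024, 2048)
    print(len(select_frames(10000, "64")))      # a sample of at most 64 frames
```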

Multiprocessing will help proportionally. In one test case the processing time for the archive fell from 11,000 seconds to 800 seconds using --frame-slice=,,64 and --jobs=4 on a four core machine. The performance improvement of around x15 was attributed to x5 (frame slicing) and x3 (multiprocessing).
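
The arithmetic behind these figures can be checked directly. The snippet below only uses the numbers quoted above; the variable names are illustrative.

```python
baseline_seconds = 11_000   # every frame, single process
optimised_seconds = 800     # --frame-slice=,,64 and --jobs=4

# Overall speed-up measured from the two timings.
measured_speedup = baseline_seconds / optimised_seconds

# Product of the two attributed factors: frame slicing and multiprocessing.
attributed_speedup = 5 * 3

print(f"measured: x{measured_speedup:.2f}, attributed: x{attributed_speedup}")
```

The measured ratio is a little under the product of the two rounded factors, which is consistent with "around x15".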

Converting RP66V1 to LAS

The test set was around 52 files that ranged in size from 400kb to 2.2GB totalling 12GB in all. The average file size was about 235Mb. These tests were run on a 2.7 GHz Intel Core i7 machine with 4 cores and hyper-threading.

An Example File

An example file in the test archive is 2GB in size and contains this Log Pass. The size of the numpy array needed to hold all the data for each Frame Array is also shown:

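=================  ========  ======  ========  ===========================
Frame Array        Channels  Frames  Spacing   Size of Numpy array (bytes)
=================  ========  ======  ========  ===========================
1B (O: 35 C: 0)    12        653880  0.1 inch  31,386,240
2B (O: 35 C: 0)    6         326940  0.2 inch  7,846,560
10B (O: 35 C: 0)   19        65388   1 inch    10,200,528
15B (O: 35 C: 0)   14        43592   1.5 inch  2,441,152
20B (O: 35 C: 0)   83        32694   2 inch    10,462,080
60B (O: 35 C: 0)   204       10899   6 inch    2,294,980,632
120B (O: 35 C: 0)  6         5449    12 inch   11,028,776
=================  ========  ======  ========  ===========================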
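
The array sizes in the table above divide exactly by the frame counts, so the storage per frame can be recovered with a few lines of Python. This is a sketch using only the table's numbers; the dictionary layout and function name are illustrative and it does not use numpy itself.

```python
# Frame Array name -> (channels, frames, numpy array size in bytes)
frame_arrays = {
    "1B": (12, 653_880, 31_386_240),
    "2B": (6, 326_940, 7_846_560),
    "10B": (19, 65_388, 10_200_528),
    "15B": (14, 43_592, 2_441_152),
    "20B": (83, 32_694, 10_462_080),
    "60B": (204, 10_899, 2_294_980_632),
    "120B": (6, 5_449, 11_028_776),
}


def bytes_per_frame(frames, total_bytes):
    """Bytes stored for one frame; raises if the size is not a whole multiple."""
    per_frame, remainder = divmod(total_bytes, frames)
    if remainder:
        raise ValueError("array size is not a whole number of frames")
    return per_frame


if __name__ == "__main__":
    for name, (channels, frames, total_bytes) in frame_arrays.items():
        per_frame = bytes_per_frame(frames, total_bytes)
        print(f"{name:>4}: {per_frame:>7} bytes/frame, "
              f"{per_frame / channels:8.1f} bytes/channel on average")
```

The 60B Frame Array stands out with over 200,000 bytes per frame, compared with 48 bytes per frame for 1B.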

The 60B Frame Array contains some complex waveform data. Processing this file produces seven LAS files. Each LAS file contains all the parameter data and the data from a single Frame Array. Depending on the frame slice, the following processing times and LAS sizes were observed.

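===========  ========  ================  =====
Frame Slice  Time (s)  LAS Size (bytes)  ms/Mb
===========  ========  ================  =====
1            2922      312,604,951       1282
4            855       100,570,379       375
16           303       47,562,819        133.4
64           188       34,309,331        82.8
256          156       30,995,584        68.6
512          152       30,444,007        66.9
1024         148       30,166,341        65.0
===========  ========  ================  =====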
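
To make the trend in these measurements easier to see, the following sketch computes the speed-up and the LAS size of each slice relative to processing every frame. It uses only the figures in the table above; the variable names are illustrative.

```python
# (frame slice, time in seconds, LAS size in bytes)
rows = [
    (1, 2922, 312_604_951),
    (4, 855, 100_570_379),
    (16, 303, 47_562_819),
    (64, 188, 34_309_331),
    (256, 156, 30_995_584),
    (512, 152, 30_444_007),
    (1024, 148, 30_166_341),
]

# The first row (every frame) is the baseline.
_, base_time, base_size = rows[0]

speedups = {s: base_time / t for s, t, _ in rows}
size_ratios = {s: size / base_size for s, _, size in rows}

if __name__ == "__main__":
    for frame_slice, _, _ in rows:
        print(f"slice {frame_slice:>4}: x{speedups[frame_slice]:5.1f} faster, "
              f"LAS size {size_ratios[frame_slice]:6.1%} of the full output")
```

The returns diminish quickly: most of the gain is achieved by a slice of 64, and larger slices reduce the time only slightly further.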

And of course limiting the number of channels has a proportionately similar effect.

Processing the Test Archive to LAS

Here is the time taken to process each file plotted against the RP66V1 file size. Also plotted on the right scale is the total size of the LAS file(s). This is converting every frame to LAS (click to see the original):

../_images/tdrp66v1tolas_time_size_1.svg.png

The asymptotic processing rate is around 1230 ms/Mb. In this test set, LAS files produced from RP66V1 files below 10Mb tend to be larger than the original, while those from files above 100Mb tend to be around 20% smaller.

And this is the memory usage (click to see the original):

../_images/tdrp66v1tolas_process_1.svg.png

The peaks are caused by the multi-gigabyte Numpy arrays needed for some files.

Sub-sampling

Here is the performance of the same data with --frame-slice=64 that just writes every 64th frame to LAS (click to see the original):

../_images/tdrp66v1tolas_time_size_64.svg.png

And this is the memory usage (click to see the original):

../_images/tdrp66v1tolas_process_64.svg.png

Memory usage is vastly reduced compared with processing every frame.

Frame slicing can improve the performance dramatically. Here is the time to process the test archive and the size of the finished LAS archive for each slice value. For example, 64 on the X-axis means that only every 64th frame is written:

../_images/tdrp66v1tolas_time_size_by_frame.svg.png

Multi-Processing

Using --jobs can also improve performance proportionally if you have lots of cores and good I/O.