Comparison of different operating systems on a multi-core computer running Linrad.
(Aug 25 2009)
Hardware is developing rapidly and the current trend is towards systems with many processor cores. Linrad has been multithreaded since 2006. Originally it could not take much advantage of more than one CPU core, but it has gradually improved, and today (2009) Linrad benefits from having access to three CPU cores.

Operating systems with support for SMP, symmetric multiprocessing, have been around for quite a while. Table 1 shows the timing for Linrad when processing a file from the hard disk. The file has a sampling rate of 2 MHz and was recorded as a wav file using Perseus.exe. The Linrad parameters were deliberately selected to cause a high CPU load on several threads. It is obvious from the table that the operating systems behave differently. The tests were performed with Linrad-03.07 using these parameters: xeontest.zip (2981 bytes) (Saved here mainly for my own reference.)


OS                      Bit    Tot   Rxfi   RxDA   Scre   Wdsp   Ndsp   fft2  kernel
Debian squeeze X11       32  23.90   0.77   0.10   6.85  91.83  26.70  64.75  2.6.30
Debian squeeze X11 SHM   32  23.68   0.79   0.10   5.33  91.72  26.67  64.74  2.6.30
Debian squeeze svgalib   32  24.64   0.86   0.12  13.57  91.48  26.69  64.26  2.6.30
Fedora-11 X11            64  22.76   0.73   0.14  11.24  87.70  27.00  55.09  2.6.29
Fedora-11 X11 SHM        64  22.21   0.70   0.14   6.95  87.61  27.00  55.20  2.6.29
Ubuntu 9.04 svgalib      32  25.02   0.92   0.10  17.10  89.62  27.13  65.41  2.6.28
Ubuntu 9.04 X11          32  24.62   0.92   0.12   9.38  92.16  27.55  66.66  2.6.28
Ubuntu 9.04 X11 SHM      32  24.31   0.81   0.11   7.17  92.06  27.51  66.70  2.6.28
Ubuntu 9.04 X11          64  24.53   0.72   0.11   7.81  92.30  27.47  67.48  2.6.28
Ubuntu 9.04 X11 SHM      64  24.38   0.91   0.12   6.67  92.18  27.51  67.33  2.6.28
Mandriva2009.0 X11       32  24.59   0.76   0.11   7.06  93.72  27.15  67.76  2.6.27
Mandriva2009.0 X11 SHM   32  24.26   0.80   0.10   5.36  92.71  27.27  67.59  2.6.27
Mandriva2009.0 svgalib   32  24.78   1.03   0.08  13.78  90.69  26.73  65.95  2.6.27
openSUSE11.1 X11         64  23.81   0.69   0.10   6.13  91.86  26.70  64.54  2.6.27
openSUSE11.1 X11 SHM     64  23.79   0.77   0.09   5.14  92.32  26.79  64.87  2.6.27
Debian lenny svgalib     32     25     51      1     74     87     79     77  2.6.26
Debian lenny X11         32     24     35      1     16     89     76     78  2.6.26
Debian lenny X11 SHM     32 not OK                                            2.6.26
Ubuntu 6.10 X11          32 not OK                                            2.6.17
Ubuntu 6.10 X11 SHM      32 not OK                                            2.6.17
Ubuntu 6.10 svgalib      32 not OK                                            2.6.17
Windows XP pro           32  26.54   0.00   0.15   5.36  98.22  51.39  57.30
Windows XP pro           64  25.96   0.01   0.24   5.13  91.94  50.98  59.31
Windows Vista            32  26.71   0.02   0.10  12.39  91.47  50.77  58.95
Windows Vista            64  26.25   0.06   0.12   7.86  90.99  51.51  59.40
Windows 7                32  26.10   0.06   0.10   7.33  90.66  51.33  59.18
Windows 7                64  21.40   0.00   0.15   1.28  71.81  53.06  44.69
Windows 7                64  24.20   0.10   0.07   5.34  94.89  52.72  40.35
Table 1. CPU load in percent when running Linrad with the same parameters on an 8-core Xeon E5410 system under various operating systems. Tot is the total load. About 75% of the CPU capacity is unused; most of the cores are idle most of the time. Some of the threads nevertheless have a high CPU load. It means that they run nearly all the time and that just a little more processing would cause a malfunction.
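The relation between Tot and the per-thread columns can be checked with a little arithmetic. The sketch below assumes (this is not stated on the page, it is inferred from the numbers) that each thread load is given in percent of one core and that Tot is roughly the sum of the listed thread loads divided by the eight cores. The names ROWS, N_CORES and total_from_threads are made up for this illustration.

```python
# Three rows copied from Table 1.
# Each entry: (listed Tot, (Rxfi, RxDA, Scre, Wdsp, Ndsp, fft2)), all in percent.
ROWS = {
    "Debian squeeze X11, 32 bit": (23.90, (0.77, 0.10, 6.85, 91.83, 26.70, 64.75)),
    "Windows XP pro, 32 bit": (26.54, (0.00, 0.15, 5.36, 98.22, 51.39, 57.30)),
    "Windows 7, 64 bit (first row)": (21.40, (0.00, 0.15, 1.28, 71.81, 53.06, 44.69)),
}

N_CORES = 8  # the Xeon E5410 system has eight cores


def total_from_threads(thread_loads, n_cores=N_CORES):
    """Assumed relation: total load = sum of per-thread loads / number of cores."""
    return sum(thread_loads) / n_cores


for name, (listed_tot, loads) in ROWS.items():
    estimate = total_from_threads(loads)
    busy_cores = sum(loads) / 100.0  # thread loads expressed as whole cores
    print(f"{name}: listed Tot {listed_tot:.2f}, "
          f"sum/8 = {estimate:.2f}, about {busy_cores:.1f} cores busy")
```

Under this assumption the listed threads account for nearly all of Tot, and the whole program keeps only about two of the eight cores busy, which is where the figure of about 75% unused capacity comes from.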

Comments to table 1:

Ubuntu 6.10: The thread doing fft1 transforms runs at 100% and does not have time to do everything it should. There are underrun errors and short interruptions in the loudspeaker output. Presumably the scheduler

Debian lenny: The thread reading the hard disk sometimes runs with a very low CPU load, but sometimes it reports a very high one (0.5% or 50%). The reported timings are averages over about 10 minutes. The timings for individual threads are probably incorrect with the 2.6.26 kernels of Debian. MIT-SHM does not work because the X11 server does not report keyboard and mouse events to the event handler until after a long delay (seconds). This seems to be an X11 bug because there are plenty of idle CPUs and none of the CPUs is near 100% load.

Windows 7, 64 bit: This operating system runs Linrad in several distinctly different modes. The mode seems to be selected at random at the moment Linrad opens the processing threads. Once Linrad has started, the threads have a fixed CPU load. It seems the scheduler can adopt different strategies that make good or poor use of the processor caches. Figures 1 and 2 show the Linrad screen with the different thread loads as well as the system monitor that displays how the work is distributed between the processors.



Fig 1. Sometimes all of the work is done by three of the CPU cores. The speed is then good because the data is more likely to be in the cache memory.



Fig 2. Sometimes the work is distributed over more CPU cores. The speed is then poor because the data is less likely to be in the cache memory.



The main reason for doing the tests presented on this page was not to check the effectiveness of the schedulers in the different operating systems but to look for possible errors in the Linrad code that might become visible.

The ratios fastest/slowest for the different threads are as follows:

File input Rxfi 0.0
Soundcard output RxDA 0.29
Screen Scre 0.07
Wideband dsp (fft1) Wdsp 0.75
Narrowband dsp (fft3, FIR) Ndsp 0.50
Second FFT fft2 0.59
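The ratios above can be approximately reproduced from Table 1. The sketch below assumes that "fastest/slowest" means the lowest CPU load divided by the highest CPU load in each thread column, and that the Debian lenny rows (with unreliable timings) and the "not OK" rows are left out. Both points are assumptions, and the names THREADS, LOADS and load_ratios are invented for this illustration.

```python
# Per-thread CPU loads in percent, copied from Table 1.
# Columns: Rxfi, RxDA, Scre, Wdsp, Ndsp, fft2
THREADS = ("Rxfi", "RxDA", "Scre", "Wdsp", "Ndsp", "fft2")
LOADS = [
    (0.77, 0.10, 6.85, 91.83, 26.70, 64.75),   # Debian squeeze X11, 32 bit
    (0.79, 0.10, 5.33, 91.72, 26.67, 64.74),   # Debian squeeze X11 SHM, 32 bit
    (0.86, 0.12, 13.57, 91.48, 26.69, 64.26),  # Debian squeeze svgalib, 32 bit
    (0.73, 0.14, 11.24, 87.70, 27.00, 55.09),  # Fedora-11 X11, 64 bit
    (0.70, 0.14, 6.95, 87.61, 27.00, 55.20),   # Fedora-11 X11 SHM, 64 bit
    (0.92, 0.10, 17.10, 89.62, 27.13, 65.41),  # Ubuntu 9.04 svgalib, 32 bit
    (0.92, 0.12, 9.38, 92.16, 27.55, 66.66),   # Ubuntu 9.04 X11, 32 bit
    (0.81, 0.11, 7.17, 92.06, 27.51, 66.70),   # Ubuntu 9.04 X11 SHM, 32 bit
    (0.72, 0.11, 7.81, 92.30, 27.47, 67.48),   # Ubuntu 9.04 X11, 64 bit
    (0.91, 0.12, 6.67, 92.18, 27.51, 67.33),   # Ubuntu 9.04 X11 SHM, 64 bit
    (0.76, 0.11, 7.06, 93.72, 27.15, 67.76),   # Mandriva2009.0 X11, 32 bit
    (0.80, 0.10, 5.36, 92.71, 27.27, 67.59),   # Mandriva2009.0 X11 SHM, 32 bit
    (1.03, 0.08, 13.78, 90.69, 26.73, 65.95),  # Mandriva2009.0 svgalib, 32 bit
    (0.69, 0.10, 6.13, 91.86, 26.70, 64.54),   # openSUSE11.1 X11, 64 bit
    (0.77, 0.09, 5.14, 92.32, 26.79, 64.87),   # openSUSE11.1 X11 SHM, 64 bit
    (0.00, 0.15, 5.36, 98.22, 51.39, 57.30),   # Windows XP pro, 32 bit
    (0.01, 0.24, 5.13, 91.94, 50.98, 59.31),   # Windows XP pro, 64 bit
    (0.02, 0.10, 12.39, 91.47, 50.77, 58.95),  # Windows Vista, 32 bit
    (0.06, 0.12, 7.86, 90.99, 51.51, 59.40),   # Windows Vista, 64 bit
    (0.06, 0.10, 7.33, 90.66, 51.33, 59.18),   # Windows 7, 32 bit
    (0.00, 0.15, 1.28, 71.81, 53.06, 44.69),   # Windows 7, 64 bit (first row)
    (0.10, 0.07, 5.34, 94.89, 52.72, 40.35),   # Windows 7, 64 bit (second row)
]


def load_ratios(rows):
    """Return {thread: lowest load / highest load} taken over all rows."""
    ratios = {}
    # zip(*rows) turns the list of rows into one tuple per thread column.
    for name, column in zip(THREADS, zip(*rows)):
        ratios[name] = min(column) / max(column)
    return ratios


RATIOS = load_ratios(LOADS)
for name, ratio in RATIOS.items():
    print(f"{name}: {ratio:.2f}")
```

Computed this way the results agree with the list to within a few hundredths. The small differences, for example for Wdsp, suggest that the listed values may have been derived from a slightly different selection of rows or with different rounding.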

To what extent the fairly large differences are only due to cache usage or whether there are also differences in program flow remains to be investigated. The testing reported on this page did not reveal any obvious malfunction of Linrad-03.07.

