Posted at 2007-05-17 14.34
I asked an interviewee how long a disk seek takes, and he replied that he thought it was between 4 and 8 milliseconds. Among my colleagues, the conventional wisdom was that it was around 10 ms. I was unsatisfied, so I thought I would measure it for myself.
time sudo perl -we '$disk="/dev/sda"; $n=1500; $blocks=`blockdev --getsz $disk`; if (!$blocks) {print "Enter capacity in manufacturer GB\n: "; $blocks=1953125*(<>)}; use Time::HiRes "time"; open(DISK, "<", $disk) or die "open $disk: $!"; $start=time; for (1..$n) {sysseek(DISK, int(rand($blocks))*512, 0) or die "seek: $!"; sysread(DISK, $x, 512) or die "read: $!"; $now=time; $times[int(($now-$start)*1000)]++; $x=$now-$start; $s2+=$x**2; $s+=$x; $start=$now}; for (0..$#times) {if ($t=$times[$_]) { $tot+=$t; $median||=$_ if $tot>=$n/2; printf "%3d %s\n", $_, "x" x ($t/2) . ($t%2?":":"")}}; printf "\nTook %3.4gs for %d seeks of %s (%d GB)\n", $s, $n, $disk, $blocks/2097152; printf "Mean: %2.03gms; Median: %d-%dms; Std dev: %2.03gms\n", 1000*$s/$n, $median-1, $median, 1000*sqrt($s2/$n - ($s/$n)**2);'
It should be obvious that before you run this, you should check for yourself that it doesn't do anything dangerous: it runs as root and opens the raw disk device, although only for reading. At the very least, check that $disk is set appropriately for your hardware and operating system. If you don't have blockdev, the script asks for the capacity in manufacturer gigabytes and estimates the number of 512-byte blocks from that. (There are 1953125 blocks in a hard disk manufacturer’s "Gigabyte".)
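To make the two magic numbers in the one-liner (1953125 and 2097152) less mysterious, here is a small sketch of the block arithmetic. It assumes 512-byte blocks, which is the unit blockdev --getsz reports, and a nominal 160 GB drive; the helper names are made up for illustration.

```perl
use strict;
use warnings;

# Assumption: blocks are 512 bytes, the unit `blockdev --getsz` reports.
my $BLOCK_SIZE = 512;

# Blocks in a manufacturer's (decimal) gigabyte: 10**9 bytes / 512.
sub blocks_per_decimal_gb { return 10**9 / $BLOCK_SIZE }

# Blocks in a binary gigabyte: 2**30 bytes / 512.
sub blocks_per_binary_gb { return 2**30 / $BLOCK_SIZE }

# Hypothetical helper: block count for a nominal capacity in manufacturer GB.
sub blocks_for_capacity {
    my ($gb) = @_;
    return $gb * blocks_per_decimal_gb();
}

my $blocks = blocks_for_capacity(160);
printf "%d blocks per decimal GB\n", blocks_per_decimal_gb();
printf "%d blocks per binary GB\n",  blocks_per_binary_gb();
printf "160 GB is about %d blocks, or %d binary GB\n",
    $blocks, $blocks / blocks_per_binary_gb();
```

The first constant is what the script multiplies by when it has to ask for the capacity, and the second is what it divides by when it prints the size in the summary line, which is why a nominal 160 GB disk is reported as 149 GB.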
The graph it produces has one row per whole number of milliseconds, with an 'x' for every two seeks that took that long and a trailing ':' if there was one left over. For me, on my one-year-old Linux 2.6.18 workstation with a 160 "GB" Western Digital disk (WDC WD1600JS, quoted seek time 8.9 ms), the typical output is:
  0 :
  3 :
  4 :
  5 x:
  6 xxxx:
  7 xxxxxxx
  8 xxxxxxxxxxx:
  9 xxxxxxxxxxxxxxxxx
 10 xxxxxxxxxxxxxxxxxxxxxxx
 11 xxxxxxxxxxxxxxxxxxxxxxxxxxxx:
 12 xxxxxxxxxxxxxxxxxxxxxxxxx
 13 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:
 14 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 15 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:
 16 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 17 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:
 18 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:
 19 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 20 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 21 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
 22 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:
 23 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:
 24 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxx:
 25 xxxxxxxxxxxxxxxxxxxxxxxxxxxxx:
 26 xxxxxxxxxxxxxxxxxx
 27 xxxxxxxxxxxxxxx
 28 xxxxxx:
 29 xxxxx:
 30 xxxx
 31 :
 32 :
 34 x
 35 :
 57 :

Took 27.69s for 1500 seeks of /dev/sda (149 GB)
Mean: 18.5ms; Median: 17-18ms; Std dev: 5.24ms

real 0m27.859s
user 0m0.116s
sys 0m0.044s
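Since the one-liner is rather dense, here is its bookkeeping written out as ordinary functions. This is only a sketch: it takes a list of per-seek durations in seconds instead of touching a disk, so it is safe to play with. The names summarize_seeks and histogram_row, and the sample timings, are invented for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: summarise per-seek durations (in seconds) the way the
# one-liner does: millisecond buckets, mean and standard deviation.
sub summarize_seeks {
    my @durations = @_;
    my $n = @durations;
    my (@buckets, $median_bucket);
    my ($sum, $sum_sq, $seen) = (0, 0, 0);

    for my $d (@durations) {
        # Bucket k counts seeks that took at least k ms and less than k+1 ms.
        $buckets[ int($d * 1000) ]++;
        $sum    += $d;
        $sum_sq += $d**2;
    }

    # The median bucket is the first one at which the running count
    # reaches half of the samples.
    for my $ms (0 .. $#buckets) {
        $seen += $buckets[$ms] // 0;
        if ($seen >= $n / 2) { $median_bucket = $ms; last }
    }

    my $mean     = $sum / $n;
    my $variance = $sum_sq / $n - $mean**2;
    $variance = 0 if $variance < 0;    # guard against rounding error

    return {
        buckets       => \@buckets,
        mean_ms       => 1000 * $mean,
        median_bucket => $median_bucket,
        stddev_ms     => 1000 * sqrt($variance),
    };
}

# One histogram row: an "x" per two seeks, plus ":" for an odd one left over.
sub histogram_row {
    my ($ms, $count) = @_;
    return sprintf "%3d %s", $ms,
        ("x" x int($count / 2)) . ($count % 2 ? ":" : "");
}

my @sample = (0.0042, 0.0121, 0.0125, 0.0128, 0.0187);    # made-up timings
my $stats  = summarize_seeks(@sample);

for my $ms (0 .. $#{ $stats->{buckets} }) {
    my $count = $stats->{buckets}[$ms] or next;    # skip empty buckets
    print histogram_row($ms, $count), "\n";
}
printf "Mean: %.3gms; Median bucket: %d; Std dev: %.3gms\n",
    $stats->{mean_ms}, $stats->{median_bucket}, $stats->{stddev_ms};
```

Feeding it timings recorded from a real run should reproduce the same kind of histogram as above, without needing root or a spare disk.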
Why might I care? 18 milliseconds is almost a lifetime compared with anything else a modern PC does. It’s slower than my monitors' refresh period! In this time I can read around 1 MB sequentially from disk, about 2 MB over a GigE link (1 Gbit/s is at most 125 MB/s) or 20 MB from RAM. I can ping from Ireland to England, crossing 40 routers there and back, in the time it takes for my disk head to seek.
Every time you read something from a previously unread file, it costs on average EIGHTEEN MILLISECONDS before the read even starts. That’s just 55 seeks per second. This has obvious implications for the performance of I/O-bound programs, which includes most servers.
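As a rough worked example of what that budget means, the sketch below turns a mean seek time into seeks per second and into the time spent seeking when opening many cold files. It assumes exactly one seek per file and ignores transfer time, caching and request reordering; the helper names and the file count are made up.

```perl
use strict;
use warnings;

# Seeks one disk can serve per second at a given mean seek time (in ms).
sub seeks_per_second {
    my ($mean_ms) = @_;
    return 1000 / $mean_ms;
}

# Hypothetical estimate: seconds spent seeking to read $files uncached files,
# assuming one seek per file and no overlap between requests.
sub cold_read_seconds {
    my ($files, $mean_ms) = @_;
    return $files * $mean_ms / 1000;
}

printf "%d seeks per second at 18 ms each\n", seeks_per_second(18);
printf "%d seconds of seeking for 10000 cold files\n",
    cold_read_seconds(10_000, 18);
```

At 18 ms per seek that works out to 55 seeks per second, and three minutes of pure seeking to touch ten thousand cold files.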
If anyone has a concern about my method, I'd be interested to hear it.
I'd also like to see the timings for different disks. Either post the whole histogram or just the stats, plus the make and model of the disk. On Linux you can get the make and model from dmesg or hdparm -I device, where device is the disk you tested (/dev/sda in my case).
Update: I've made the script more robust when blockdev is not available.