.NH An Experiment .PP In order to substantiate our claim that the suggested optimizations provided substantial increase in speed, we implemented four versions of the faster DES algorithm, and four versions of the faster .UX password algorithm: .de Ip .IP \(bu \w'\(bu\0'u .. .RS .Ip one without #G# and with a 6 bit data path to #F sub 2# .Ip one with #G# and with a 6 bit data path to #F sub 2# .Ip one without #G# and with a 12 bit data path to #F sub 2# .Ip one with #G# and with a 12 bit data path to #F sub 2# .RE .LP The routines were timed on several different architectures, and the results collected below. As a control, the standard .UX implementation, which stores one bit per storage unit (byte) and uses a straightforward translation of the algorithm into C, was also timed. .LP .KS .TS center, doublebox; c s s s c c c c c c c c l n n l. Table I. The Computers = computer timer intervals bits manufacturer \^ (per sec) (per word) \^ _ Amdahl 5880 60 32 Amdahl Corp. Convex 1 (32) 100 32 Convex Computer Corp. Convex 1 (64) 100 64 Convex Computer Corp. Cray 2 243902439 64 Cray Research Inc. IBM PC/RT 100 32 IBM Corp. IRIS 2500T 60 32 Silicon Graphics Inc. Sequent 21000 100 32 Sequent Computer Systems Sun 3/50 60 32 Sun Microsystems, Inc. VAX 11/780 100 32 Digital Equipment Corp. VAX 11/785 100 32 Digital Equipment Corp. .TE .KE .PP Table I summarizes the machines and their relevant characteristics. (More detailed information is in appendix IV.) In particular, note the clock rates; with all except the Cray 2, the routines were expected to execute much faster than the clock tick. The usual way to make timings is to start the clock, run the routine, stop the clock, and determine the elapsed time. However, since this would be on the order of one tick, the results gleaned this way would be highly suspect. So, what was done instead was to start the clock and execute a loop for approximately 10 seconds (virtual time.) A counter in this loop was incremented every time the routine completed execution. When the time was up, the next return ended the loop, and the clock was stopped and the elapsed time computed. Another loop, just like the first but without the call to the routine, was executed for the same number of iterations; the time to complete this loop was then subtracted from the elapsed time. This way, the total time was subject to an error of at most two clock ticks. This procedure was repeated ten times, and the times and ratios computed by averaging over the ten results. (The tables in Appendix V give the actual timings.) .PP We should also note that the timings are not meaningful when comparing among many machines, because the loads were very different. On the assumption that none of the loads changed dramatically during the testing (an assumption that in fact held), the ratios express the amount of speedup quite accurately; this is why we use them in this section, rather than the times. .LP .KS .TS center, doublebox; c s s s s s c s s s s s c | c s | c s | c c | c | c | c | c | c c | c | c | c | c | c l n n n n n. Table II. Ratio of Mean Execution Times: DES Encryption Routines \*(Ux Interface = computer 6 bit path 12 bit path standard \^ _ _ _ _ \^ without \f2G\fP with \f2G\fP without \f2G\fP with \f2G\fP function _ Amdahl 5880 7.18 7.19 7.97 8.00 1.00 Convex 1 (32) 4.29 4.26 5.09 5.04 1.00 Convex 1 (64) 4.85 4.84 5.62 5.60 1.00 Cray 2 9.50 9.51 10.28 10.30 1.00 IBM PC/RT 7.88 7.89 8.67 8.74 1.00 IRIS 2500T 8.37 8.32 10.98 10.95 1.00 Sequent 21000 8.89 8.67 9.74 9.77 1.00 Sun 3/50 7.33 7.21 9.45 9.29 1.00 VAX 11/780 6.12 5.88 6.15 6.00 1.00 VAX 11/785 6.78 6.82 7.56 7.77 1.00 .TE .KE .PP Table II gives the ratio of the mean execution time of the DES encryption routines to the .UX standard DES encryption routines. The interface is the same; the standard .UX routines .I setkey and .I encrypt (see .I crypt (3) in [.bsd reference manual.] or .I crypt (3C) in [.system five reference manual.]) can be replaced by these simply by naming a library to the linking loader. In all cases, there is a substantial speedup, from a factor of 4.3 for the Convex running with 32 bits per word and using the 6 bit path version not using #G#, to 11.0 on the IRIS 2500T running the 12 bit path version. .LP .KS .TS center, doublebox; c s s s s s c | c s | c s | c c | c | c | c | c | c c | c | c | c | c | c l n n n n n. Table III. Ratio of Mean Execution Times: DES Encryption Routines = computer 6 bit path 12 bit path standard \^ _ _ _ _ \^ without \f2G\fP with \f2G\fP without \f2G\fP with \f2G\fP function _ Amdahl 5880 11.99 12.04 14.46 14.80 1.00 Convex 1 (32) 5.85 5.87 7.52 7.31 1.00 Convex 1 (64) 6.99 7.00 8.64 8.59 1.00 Cray 2 18.51 18.55 21.60 21.66 1.00 IBM PC/RT 13.14 13.24 15.39 15.69 1.00 IRIS 2500T 12.99 12.96 20.69 20.66 1.00 Sequent 21000 17.14 17.22 20.97 21.27 1.00 Sun 3/50 11.29 11.07 17.36 16.71 1.00 VAX 11/780 12.24 11.44 11.41 11.69 1.00 VAX 11/785 15.66 14.55 17.65 20.07 1.00 .TE .KE .PP Based on these figures, we suspect that a good portion of the time involved is bit packing; the interface packs 64 bits, each initially in its own byte, into 8 bytes. So, if we eliminate this packing, we should see the new routines speed up. In fact this is what happened. As evidence of this claim, consider table III. Except for the last column, all counts are from calling the routines directly, without the .UX interface; so, there is no bit packing done. In all cases, the new routines are substantially faster than the standard .UX functions, with the improvement ranging from a factor of 5.9 for the Convex using 32 bits per word and running the 6 bit path version without #G# to 21.7 on the Cray 2 running the 12 bit path version without #G#. .LP .KS .TS center, doublebox; c s s s s s c s s s s s c | c s | c s | c c | c | c | c | c | c c | c | c | c | c | c l n n n n n. Table IV. Ratio of Mean Execution Times: Password Encryption Routines \*(Ux Interface = computer 6 bit path 12 bit path standard \^ _ _ _ _ \^ without \f2G\fP with \f2G\fP without \f2G\fP with \f2G\fP function _ Amdahl 5880 17.25 16.82 19.52 19.50 1.00 Convex 1 (32) 10.56 10.55 16.29 16.11 1.00 Convex 1 (64) 13.15 13.01 19.60 19.67 1.00 Cray 2 42.09 41.99 59.35 59.92 1.00 IBM PC/RT 19.25 18.93 24.12 23.96 1.00 IRIS 2500T 16.66 16.87 30.55 31.47 1.00 Sequent 21000 23.62 23.39 29.96 29.41 1.00 Sun 3/50 14.70 14.88 26.00 26.49 1.00 VAX 11/780 17.01 15.67 16.57 15.90 1.00 VAX 11/785 19.71 17.89 24.10 22.36 1.00 .TE .KE .PP The increase in speed of the .UX password encryption routine is far more dramatic; Table IV documents them. Although the password encryption algorithm is essentially 25 iterations of the DES encryption routine, all #F sub 1# and all but one of the #F sub 3# are omitted; hence, we expect the ratios to be more dramatic than those in Table III. Table IV shows our expectations are fulfilled. The factors of improvement range from a low of 10.6 for the Convex using 32 bits per word and running the 6 bit path version (with or without #G#) to a high of 60.0 for the \s-2IRIS\s0 2500T running the 12 bit path version using #G#. .LP .KS .TS center, doublebox; c s s s s s c | c s | c s | c c | c | c | c | c | c c | c | c | c | c | c l n n n n n. Table V. Ratio of Mean Execution Times: Password Encryption Routines = computer 6 bit path 12 bit path standard \^ _ _ _ _ \^ without \f2G\fP with \f2G\fP without \f2G\fP with \f2G\fP function _ Amdahl 5880 18.24 17.89 21.56 21.58 1.00 Convex 1 (32) 11.13 10.99 17.55 17.38 1.00 Convex 1 (64) 14.11 14.03 22.08 21.91 1.00 Cray 2 47.40 47.06 71.55 71.26 1.00 IBM PC/RT 22.77 22.77 28.00 27.66 1.00 IRIS 2500T 17.33 17.58 32.82 33.78 1.00 Sequent 21000 25.04 24.12 32.63 31.97 1.00 SUN 3/50 15.26 15.38 27.71 28.12 1.00 VAX 11/780 18.87 15.41 17.74 17.06 1.00 VAX 11/785 20.42 19.97 27.61 25.11 1.00 .TE .KE .PP Finally consider the speedup mentioned in the previous section, namely omitting the transformation #F sub 3#. How much this affects the speed depends quite a bit on the architecture of the machine; if it can handle bytes well, there should be little effect, but if it is optimized to work with words the omission may improve things substantially. Table V bears this out; the incerase for byte-oriented machines is typically 1 or 2 more iterations than when #F sub 3# is included, but for machines such as the Cray 2, the speedup is far more substantial (on the order of 11 iterations when a 12 bit data path is used). .PP These timings demonstrate that there is a substantial performance gain by using the suggested techniques to speed up the DES routines and the .UX algorithm. In fact, the speedup is so substantial that trying each word in a dictionary to see if it matches a user's encrypted password becomes feasible. (One of the goals of salting was to avoid this attack [.morris thompson password security case history.].) Say an on-line dictionary contains 25,000 words. Using the standard password encryption function on a \s-2VAX\s0 11/780, it would take 17,730.5 seconds (about 5 hours) to check a particular encrypted password; to check a collection of 100 passwords, it would take 1,773,049.6 seconds (about 20.5 days) if each used a different salt. But using this method, it would take 1002 seconds (about 17 minutes) to check a particular encrypted password, and 100,200.0 seconds (about 28 hours) to check 100 encrypted passwords each with a different salt. This shows the danger of relying on a routine's computing something slowly to provide protection; it also demonstrates the old adage that making a cipher more complex does not make it more secure.