.NH
An Experiment
.PP
In order to substantiate our claim
that the suggested optimizations provided substantial increase
in speed,
we implemented four versions of the faster DES algorithm,
and four versions of the faster
.UX
password algorithm:
.de Ip
.IP \(bu \w'\(bu\0'u
..
.RS
.Ip
one without #G# and with a 6 bit data path to #F sub 2#
.Ip
one with #G# and with a 6 bit data path to #F sub 2#
.Ip
one without #G# and with a 12 bit data path to #F sub 2#
.Ip
one with #G# and with a 12 bit data path to #F sub 2#
.RE
.LP
The routines were timed on several different architectures,
and the results collected below.
As a control,
the standard
.UX
implementation,
which stores one bit per storage unit (byte)
and uses a straightforward translation of the algorithm into C,
was also timed.
.LP
.KS
.TS
center, doublebox;
c s s s
c c c c
c c c c
l n n l.
Table I.  The Computers
=
computer	timer intervals	bits	manufacturer
\^	(per sec)	(per word)	\^
_
Amdahl 5880	60	32	Amdahl Corp.
Convex 1 (32)	100	32	Convex Computer Corp.
Convex 1 (64)	100	64	Convex Computer Corp.
Cray 2	243902439	64	Cray Research Inc.
IBM PC/RT	100	32	IBM Corp.
IRIS 2500T	60	32	Silicon Graphics Inc.
Sequent 21000	100	32	Sequent Computer Systems
Sun 3/50	60	32	Sun Microsystems, Inc.
VAX 11/780	100	32	Digital Equipment Corp.
VAX 11/785	100	32	Digital Equipment Corp.
.TE
.KE
.PP
Table I summarizes the machines and their relevant characteristics.
(More detailed information is in appendix IV.)
In particular,
note the clock rates;
with all except the Cray 2,
the routines were expected to execute much faster than the clock tick.
The usual way to make timings is to start the clock,
run the routine,
stop the clock,
and determine the elapsed time.
However,
since this would be on the order of one tick,
the results gleaned this way would be highly suspect.
So,
what was done instead was to start the clock
and execute a loop for approximately 10 seconds (virtual time.)
A counter in this loop was incremented every time
the routine completed execution.
When the time was up,
the next return ended the loop,
and the clock was stopped and the elapsed time computed.
Another loop,
just like the first but without the call to the routine,
was executed for the same number of iterations;
the time to complete this loop was then subtracted from the elapsed time.
This way,
the total time was subject to an error of at most two clock ticks.
This procedure was repeated ten times,
and the times and ratios computed by averaging over the ten results.
(The tables in Appendix V give the actual timings.)
.PP
We should also note that the timings are not meaningful when comparing
among many machines,
because the loads were very different.
On the assumption that none of the loads changed dramatically during the
testing
(an assumption that in fact held),
the ratios express the amount of speedup quite accurately;
this is why we use them in this section,
rather than the times.
.LP
.KS
.TS
center, doublebox;
c   s   s   s   s   s
c   s   s   s   s   s
c | c   s | c   s | c
c | c | c | c | c | c
c | c | c | c | c | c
l   n   n   n   n   n.
Table II. Ratio of Mean Execution Times: DES Encryption Routines
\*(Ux Interface
=
computer	6 bit path	12 bit path	standard
\^	_	_	_	_
\^	without \f2G\fP	with \f2G\fP	without \f2G\fP	with \f2G\fP	function
_
Amdahl 5880	7.18	7.19	7.97	8.00	1.00
Convex 1 (32)	4.29	4.26	5.09	5.04	1.00
Convex 1 (64)	4.85	4.84	5.62	5.60	1.00
Cray 2	9.50	9.51	10.28	10.30	1.00
IBM PC/RT	7.88	7.89	8.67	8.74	1.00
IRIS 2500T	8.37	8.32	10.98	10.95	1.00
Sequent 21000	8.89	8.67	9.74	9.77	1.00
Sun 3/50	7.33	7.21	9.45	9.29	1.00
VAX 11/780	6.12	5.88	6.15	6.00	1.00
VAX 11/785	6.78	6.82	7.56	7.77	1.00
.TE
.KE
.PP
Table II gives the ratio of the mean execution time
of the DES encryption routines to the
.UX
standard DES encryption routines.
The interface is the same;
the standard
.UX
routines
.I setkey
and
.I encrypt
(see
.I crypt (3)
in
[.bsd reference manual.]
or
.I crypt (3C)
in
[.system five reference manual.])
can be replaced by these simply by naming a library
to the linking loader.
In all cases, there is a substantial speedup,
from a factor of 4.3 for the Convex running with 32 bits per word and using
the 6 bit path version not using #G#,
to 11.0 on the IRIS 2500T running the 12 bit path version.
.LP
.KS
.TS
center, doublebox;
c   s   s   s   s   s
c | c   s | c   s | c
c | c | c | c | c | c
c | c | c | c | c | c
l   n   n   n   n   n.
Table III. Ratio of Mean Execution Times: DES Encryption Routines
=
computer	6 bit path	12 bit path	standard
\^	_	_	_	_	
\^	without \f2G\fP	with \f2G\fP	without \f2G\fP	with \f2G\fP	function
_
Amdahl 5880	11.99	12.04	14.46	14.80	1.00
Convex 1 (32)	5.85	5.87	7.52	7.31	1.00
Convex 1 (64)	6.99	7.00	8.64	8.59	1.00
Cray 2	18.51	18.55	21.60	21.66	1.00
IBM PC/RT	13.14	13.24	15.39	15.69	1.00
IRIS 2500T	12.99	12.96	20.69	20.66	1.00
Sequent 21000	17.14	17.22	20.97	21.27	1.00
Sun 3/50	11.29	11.07	17.36	16.71	1.00
VAX 11/780	12.24	11.44	11.41	11.69	1.00
VAX 11/785	15.66	14.55	17.65	20.07	1.00
.TE
.KE
.PP
Based on these figures,
we suspect that a good portion of the time involved is bit packing;
the interface packs 64 bits,
each initially in its own byte,
into 8 bytes.
So,
if we eliminate this packing,
we should see the new routines speed up.
In fact this is what happened.
As evidence of this claim,
consider table III.
Except for the last column,
all counts are from calling the routines directly,
without the
.UX
interface;
so,
there is no bit packing done.
In all cases,
the new routines are substantially faster than the standard
.UX
functions,
with the improvement ranging from
a factor of 5.9 for the Convex using 32 bits per word
and running the 6 bit path version without #G#
to 21.7 on the Cray 2 running the 12 bit path version without #G#.
.LP
.KS
.TS
center, doublebox;
c   s   s   s   s   s
c   s   s   s   s   s
c | c   s | c   s | c
c | c | c | c | c | c
c | c | c | c | c | c
l   n   n   n   n   n.
Table IV. Ratio of Mean Execution Times: Password Encryption Routines
\*(Ux Interface
=
computer	6 bit path	12 bit path	standard
\^	_	_	_	_
\^	without \f2G\fP	with \f2G\fP	without \f2G\fP	with \f2G\fP	function
_
Amdahl 5880	17.25	16.82	19.52	19.50	1.00
Convex 1 (32)	10.56	10.55	16.29	16.11	1.00
Convex 1 (64)	13.15	13.01	19.60	19.67	1.00
Cray 2	42.09	41.99	59.35	59.92	1.00
IBM PC/RT	19.25	18.93	24.12	23.96	1.00
IRIS 2500T	16.66	16.87	30.55	31.47	1.00
Sequent 21000	23.62	23.39	29.96	29.41	1.00
Sun 3/50	14.70	14.88	26.00	26.49	1.00
VAX 11/780	17.01	15.67	16.57	15.90	1.00
VAX 11/785	19.71	17.89	24.10	22.36	1.00
.TE
.KE
.PP
The increase in speed of the
.UX
password encryption routine is far more dramatic;
Table IV documents them.
Although the password encryption algorithm
is essentially 25 iterations of the DES encryption routine,
all #F sub 1# and all but one of the #F sub 3# are omitted;
hence,
we expect the ratios to be more dramatic than those in Table III.
Table IV shows our expectations are fulfilled.
The factors of improvement range from a low of
10.6 for the Convex using 32 bits per word and
running the 6 bit path version (with or without #G#)
to a high of 60.0 for the \s-2IRIS\s0 2500T
running the 12 bit path version using #G#.
.LP
.KS
.TS
center, doublebox;
c   s   s   s   s   s
c | c   s | c   s | c
c | c | c | c | c | c
c | c | c | c | c | c
l   n   n   n   n   n.
Table V. Ratio of Mean Execution Times: Password Encryption Routines
=
computer	6 bit path	12 bit path	standard
\^	_	_	_	_
\^	without \f2G\fP	with \f2G\fP	without \f2G\fP	with \f2G\fP	function
_
Amdahl 5880	18.24	17.89	21.56	21.58	1.00
Convex 1 (32)	11.13	10.99	17.55	17.38	1.00
Convex 1 (64)	14.11	14.03	22.08	21.91	1.00
Cray 2	47.40	47.06	71.55	71.26	1.00
IBM PC/RT	22.77	22.77	28.00	27.66	1.00
IRIS 2500T	17.33	17.58	32.82	33.78	1.00
Sequent 21000	25.04	24.12	32.63	31.97	1.00
SUN 3/50	15.26	15.38	27.71	28.12	1.00
VAX 11/780	18.87	15.41	17.74	17.06	1.00
VAX 11/785	20.42	19.97	27.61	25.11	1.00
.TE
.KE
.PP
Finally consider the speedup mentioned in the previous section,
namely omitting the transformation #F sub 3#.
How much this affects the speed depends quite a bit on the
architecture of the machine;
if it can handle bytes well,
there should be little effect,
but if it is optimized to work with words the omission may improve
things substantially.
Table V bears this out;
the incerase for byte-oriented machines is typically
1 or 2 more iterations than when #F sub 3# is included,
but for machines such as the Cray 2,
the speedup is far more substantial
(on the order of 11 iterations when a 12 bit data path is used).
.PP
These timings demonstrate that there is a substantial performance gain
by using the suggested techniques to speed up the DES routines
and the
.UX
algorithm.
In fact,
the speedup is so substantial that trying each word in a dictionary
to see if it matches a user's encrypted password becomes feasible.
(One of the goals of salting was to avoid this attack
[.morris thompson password security case history.].)
Say an on-line dictionary contains 25,000 words.
Using the standard password encryption function on a \s-2VAX\s0 11/780,
it would take 17,730.5 seconds (about 5 hours) to check a particular
encrypted password;
to check a collection of 100 passwords,
it would take 1,773,049.6 seconds (about 20.5 days)
if each used a different salt.
But using this method,
it would take 1002 seconds (about 17 minutes) to check a particular
encrypted password,
and 100,200.0 seconds (about 28 hours) to check 100 encrypted passwords
each with a different salt.
This shows the danger of relying on a routine's computing something slowly
to provide protection;
it also demonstrates the old adage that making a cipher more complex
does not make it more secure.