The National Semiconductor 32K family
"Elegance and regular design was a main goal of this
processor, as well as completeness. It was similar to the 68000
in basic features, such as byte addressing, 24-bit address bus in
the first version, memory to memory instructions, and so on (The
320xx also includes a string and array instruction). Unlike the
68000, the 320xx had eight instead of sixteen 32-bit registers,
and they were all general purpose, not split into data and
address registers. There was also a useful scaled-index
addressing mode, and unlike other CPUs of the time, only a few
operations affected the condition codes (as in more modern CPUs).
Also different, the PC and stack registers were separate from the
general register set - they were special purpose registers, along
with the interrupt stack, and several "base registers"
to provide multitasking support - the base data register pointed
to the working memory of the current module (or process), the
interrupt base register pointed to a table of interrupt handling
procedures anywhere in memory (rather than a fixed location), and
the module register pointed to a table of active modules.
The 320xx also had a coprocessor bus, similar to the 8-bit
Ferranti F100-L CPU, and coprocessor instructions. Coprocessors
included an MMU, and a Floating Point unit which included eight
32-bit registers, which could be used as four 64-bit registers.
The series found use mainly in embedded applications, and was
expanded to that end, with timers, graphics enhancements, and
even a Digital Signal Processor unit in the Swordfish version
(1991, also known as 32732 and 32764). The Swordfish was among
the first truly superscalar microprocessors, with two 5-stage
pipelines (integer A, and B, which consisted of an integer and
floating point pipeline - an instruction dispatched to B would
execute in the appropriate pipe, leaving the other with an empty
slot. The integer pipe could cycle twice in the memory stage to
synchronise with the result of the floating point pipe, to ensure
in-order completion when floating point operations could trap. B
could also execute branches). This strategy was influenced by the
Multiflow VLIW design. Instructions were always fetched two at a
time from the instruction cache which partially decoded the
instruction pairs and set a bit to indicate whether they were
dependent or could be issued simultaneously (effectively
generating two-word VLIWs in the cache from an external stream of
instructions). The cache decoder also generated branch target
addresses to reduce branch latency as in the AT&T
CRISP/Hobbit CPU.
The Swordfish implemented the NS32K instruction set using a
reduced instruction core - NS32K instructions were translated by
the cache decoder into either: one internal instruction, a pair
of internal instructions in the cache, or a partially decoded
NS32K instruction which would be fully decoded into internal
instructions after being fetched by the CPU. The Swordfish also
had dynamic bus resizing (8, 16, 32, or 64 bits, allowing 2
instructions to be fetched at once) and clock doubling, 2 DMA
channels, and in circuit emulation (ICE) support for debugging.
The Swordfish was later simplified into a load-store design and
used to implement an instruction set called CompactRISC (also
known as Piranha, an implementation independent instruction set
supporting designs from 8 to 64 bits)."
(Quoted from Great Microprocessors of the Past and Present, V 12.1.2)
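To make the Swordfish pairing bit from the quoted description a little more concrete, here is a minimal Python sketch. It is not National Semiconductor's actual logic. The Instr record, the three-operand mnemonics, the register names and the dependence rule (the second instruction of a pair must neither read nor overwrite a register that the first one writes) are all assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Tuple


@dataclass(frozen=True)
class Instr:
    """Hypothetical, simplified instruction: one name, registers written, registers read."""
    name: str
    dest: FrozenSet[str]
    srcs: FrozenSet[str]


def instr(name: str, dest: str, *srcs: str) -> Instr:
    """Convenience constructor for the toy instruction format (an assumption)."""
    return Instr(name, frozenset([dest]), frozenset(srcs))


def can_pair(first: Instr, second: Instr) -> bool:
    """Toy dependence rule: no read-after-write and no write-after-write hazard."""
    read_after_write = first.dest & second.srcs
    write_after_write = first.dest & second.dest
    return not read_after_write and not write_after_write


def predecode(stream: List[Instr]) -> List[Tuple[Instr, Instr, bool]]:
    """Group the stream into pairs and attach a 'may issue together' bit,
    loosely mimicking what the text says the cache decoder did.
    A trailing unpaired instruction is ignored for brevity."""
    lines = []
    for i in range(0, len(stream) - 1, 2):
        first, second = stream[i], stream[i + 1]
        lines.append((first, second, can_pair(first, second)))
    return lines


stream = [
    instr("add", "r0", "r1", "r2"),  # writes r0
    instr("sub", "r3", "r0", "r4"),  # reads r0, so it depends on the add
    instr("mov", "r5", "r6"),
    instr("mov", "r7", "r1"),        # independent of the previous mov
]

for first, second, pair_bit in predecode(stream):
    print(first.name, second.name, "dual-issue" if pair_bit else "serial")
```

The point of doing this work at cache-fill time, as the quoted text suggests, is that the check is paid once per cache line rather than on every fetch. The real decoder would also have had to consider which pipeline each instruction could use, which this sketch leaves out.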
One additional note, since Wikipedia, the internet magazine for modern myths, tries to cast these processors in a bad light: in those days you had to build all the logic needed for communication between the various chips and memory yourself. This was usually done with ASICs, and it was a complicated business, as anyone who worked in the field back then can tell you. But this was true of all processors at that time, or at least of those with many support chips. What's more, all processors of that era needed a very careful layout and a very sophisticated power supply and power supply decoupling scheme. The NS processor was a little, if not to say very, delicate in this respect, that's true. So in this respect, too, it was very similar to more modern CPUs: you almost had to observe the rules of analog design if you wanted a layout that ran stably. Admittedly, those who were used to the rock-steady behavior of a Z-80 or the 68000 certainly had problems when working with the 32000 family. But those two were the exception to the rule: they worked in nearly every design.
What is certainly true is that the 16032 had many bugs in its infancy. But in those years this was by no means unusual, especially for the more complex designs. As an example: when the Intel 387 FPU appeared (delayed by two years), it had so many bugs that it was almost useless. To this day there are test programs on the web that check for these errors. The problem in those days was not the bugs themselves but the lack of knowledge about them (the manufacturers did not communicate them), knowledge you needed in order to program around them.
Developers (former members of the VAX team accompanied the whole
design phase as consultants):
National Semiconductor 32K - Dan O'Dowd and Les Kohn
32016/16032 - Avraham Menachem (microarchitecture and chip
design), Asher Kaminker (microcode), and Yoav Lavy (BIU,
processor buses, external MMU, and interrupt controller)
32332 - Ran Talmudi
32532, 1987 - Uri Weiser, Don Alpert, Gigi Licht, Jonathan Levy
(BIU, MMU, and dcache), and Sidi Yom Tov (design manager)
See B. Maytal, S. Iacobovici, D. Alpert, D. Biran, J. Levy, and
S.Y. Tov, "Design Considerations for a General Purpose
Microprocessor," IEEE Computer, January 1989, pp. 66-76.
See D. Alpert, J. Levy, and B. Maytal, "Architecture of the
NS32532 Microprocessor," Proceedings ICCD, October 1987, pp.
168-172.
32732 (a.k.a. 32764 and Swordfish, superscalar design, not
delivered as an NS32K family member), 1991 - Don Alpert (see
Swordfish web page and CompactRISC