

An Overview of the ZSP™ Architecture
For more information, contact: Karen Suty, LSI Logic Corporation Public Relations, Tel: (408) 433-6855, Fax: (408) 433-8989
Email: suty@lsil.com
September 27, 1999
Copyright (c) 1999 LSI Logic Corporation. All rights reserved.
The ZSP Architecture
The ZSP Architecture offers the optimum solution for many next-generation DSP applications. The Architecture borrows design techniques from advanced microprocessors and is optimized for digital communications and wireless network applications.

Signal Processing Performance
* Superscalar DSP architecture
* Dispatches up to four instructions per cycle
* Dual multiply-accumulate (MAC) units and dual arithmetic logic units (ALUs)
* Short five-stage pipeline minimizes branch-misprediction penalties
* RISC-based instruction set architecture: all instructions execute in a single cycle
* Fixed-length instructions allow efficient pre-fetch and decoding, increasing throughput
* Load/store architecture: memory loads and stores are decoupled from computational instructions
Ease of Use
* Register-based, orthogonal instruction set
* Highly programmable: compiler- and programmer-friendly
* Familiar programming paradigm minimizes software risk
* Hardware scheduling eliminates conflicts and ensures deterministic behavior
* Compiler produces efficient DSP assembly from high-level languages
* Hardware caches and pre-fetch logic minimize programmer exposure to memory-access bottlenecks
Cost-effective
* High code efficiency: high performance without unrolling inner loops
* Microprocessor-based programming paradigm for control applications
* Efficient context save and a multi-level interrupt structure for multitasking

Flexibility
* High performance enables programmable solutions for new applications
* Hidden pipeline enables software compatibility across ZSP processor families
* Scalable: the architecture may be reconfigured for varying processing needs without affecting the programming model
Functional Units
The ZSP Architecture comprises the Instruction Unit, Data Unit, Pipeline Control Unit, two MAC units, two ALUs, and a register file. Figure 1 shows an example of a complete system, with the core of the ZSP Architecture indicated by a dotted line. The complete system includes boot ROM, dual-access internal RAM, a memory interface unit (MIU) with DMA controller, a serial port, a host processor interface (HPI), a debugger unit (DEU), and a JTAG interface.
Figure 1. ZSP Processor Architecture
The Instruction Unit (IU)
The Instruction Unit comprises a direct-mapped instruction cache, a pre-fetch unit, and the instruction dispatch unit. Every cycle, the IU pre-fetches four instructions, decodes and dispatches instructions, and places instructions into the cache. The pre-fetch unit uses static branch prediction and pre-fetch techniques to minimize cache-miss penalties and reduce pipeline stalls. Because DSP programs typically execute in tight loops, the instruction cache reduces memory accesses and thereby reduces power consumption. The ZSP Architecture places no restriction on the size of cacheable loops, which may also be nested and interrupted. The Architecture also provides hardware-supported zero-overhead loops.
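As an illustration of why a direct-mapped instruction cache suits tight DSP loops, here is a minimal C model. It is a sketch under stated assumptions: the geometry (64 lines of 4 words) and the names `DirectMappedCache`, `cache_access`, and `run_loop` are hypothetical, since this overview does not give the ZSP cache size or line length. Each address maps to exactly one cache slot (the index), and a stored tag records which memory line currently occupies that slot.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical geometry: 64 lines x 4 words. The real ZSP cache
   size and line length are not given in this overview. */
#define LINE_WORDS 4u
#define NUM_LINES  64u

typedef struct {
    bool     valid[NUM_LINES];
    uint32_t tag[NUM_LINES];
    unsigned hits, misses;
} DirectMappedCache;

static void cache_init(DirectMappedCache *c) {
    memset(c, 0, sizeof *c);
}

/* Look up one word address. On a miss the line is filled, evicting
   whatever occupied its single slot. Returns true on a hit. */
static bool cache_access(DirectMappedCache *c, uint32_t word_addr) {
    uint32_t line  = word_addr / LINE_WORDS; /* memory line number        */
    uint32_t index = line % NUM_LINES;       /* the only slot it may use  */
    uint32_t tag   = line / NUM_LINES;       /* which line is in the slot */

    if (c->valid[index] && c->tag[index] == tag) {
        c->hits++;
        return true;
    }
    c->misses++;
    c->valid[index] = true;
    c->tag[index]   = tag;
    return false;
}

/* Fetch a straight-line loop body of `len` words starting at `start`,
   `passes` times, as an instruction fetcher would. */
static void run_loop(DirectMappedCache *c, uint32_t start, uint32_t len,
                     unsigned passes) {
    for (unsigned p = 0; p < passes; p++)
        for (uint32_t pc = start; pc < start + len; pc++)
            cache_access(c, pc);
}
```

Running a 16-word loop body ten times from word address 0x100 yields 4 misses (one per line on the first pass) and 156 hits: every later pass is served entirely from the cache.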
The Data Unit (DU)
The DU comprises the direct-mapped data cache, data pre-fetch unit, circular buffer unit, and load/store arbiter. The DU pre-fetches and buffers four data words per cycle, and it can also write two data words per cycle if required. Like the instruction cache, the data cache reduces power dissipation by minimizing memory accesses. The Data Unit provides hardware for implementing two circular buffers and for sustaining the data throughput required by DSP applications.
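The circular buffer hardware can be pictured as an address generator that wraps a pointer back to the start of the buffer instead of running past its end. The C sketch below models one such generator; the struct fields (`base`, `length`, `index`), the post-modify behavior, and the name `circ_next` are assumptions for illustration, not the documented ZSP control registers.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of one circular-buffer address generator. */
typedef struct {
    uint16_t base;   /* first word address of the buffer    */
    uint16_t length; /* number of words in the buffer (> 0) */
    uint16_t index;  /* current offset, always < length     */
} CircBuf;

/* Return the current address, then advance by `step` words with
   wrap-around, as a post-modify load or store would. */
static uint16_t circ_next(CircBuf *cb, uint16_t step) {
    assert(cb->length > 0 && step <= cb->length);
    uint16_t addr = (uint16_t)(cb->base + cb->index);
    uint32_t next = (uint32_t)cb->index + step;
    if (next >= cb->length)
        next -= cb->length;   /* one subtract suffices since step <= length */
    cb->index = (uint16_t)next;
    return addr;
}
```

With base 0x200, length 5, and a step of 2, successive calls return 0x200, 0x202, 0x204, 0x201, 0x203, then 0x200 again. Using compare-and-subtract instead of `%` keeps the update to an add, a compare, and a subtract.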
The Pipeline Control Unit (PCU)
The PCU's role is to group instructions for parallel execution. In this task, the PCU resolves data and resource dependencies in the program sequence. In other words, this hardware schedules instructions for execution by the four functional units (two MACs and two ALUs), simplifying the task of the programmer or the compiler. The PCU also synchronizes the operation of the entire pipeline, arranges operand bypassing, and processes interrupt requests.

The ZSP Processor is four-way superscalar and employs a five-stage pipeline, so at any time up to twenty instructions (four per stage) may be in various stages of execution. The five pipeline stages are Fetch/Decode (F/D), Group (G), Read (R), Execute (E), and Write (W):
* Fetch/Decode: instructions are fetched, decoded, and issued.
* Group: after thorough dependency checking, instructions are grouped for parallel execution.
* Read: the operand register file and the data cache are read, and functional-unit bypassing is performed. Bypassing lets a functional unit use the result of the previous instruction without waiting for that result to be written to the operand register file.
* Execute: ALU, MAC, and any internal memory access operations are completed.
* Write: the operand register file is written, and store data is written to memory.

As noted above, the PCU processes interrupt requests. The interrupt structure provides four user-defined priority levels. When an interrupt occurs and is enabled (unmasked), its priority is compared with the current execution priority level of the machine. If the new interrupt is of equal or higher priority, its level is saved as the current execution priority level and the interrupt is taken.
If the new interrupt is of lower priority than the execution priority level of the machine, it remains pending until the execution priority level of the machine drops. The user can change the priority of an interrupt source "on-the-fly" by writing a new priority level to the appropriate field of the Interrupt Priority registers; this can be used, for example, to raise the priority level of another interrupt source while a lower-priority interrupt routine is executing. The user can also change the execution priority level of the machine on-the-fly by writing to the Interrupt Priority register, allowing interrupts of lower priority to be taken without modifying any of the assigned interrupt priorities. This flexible, programmable interrupt mechanism enables quick response to real-time tasks without risk of task starvation. The processor responds to an interrupt in five cycles, and the double-word load and store instructions enable fast context saving during interrupt handling.
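The priority rule above can be sketched in C as an arbiter consulted whenever interrupts are pending. This is a hypothetical model: the number of sources (`NUM_SOURCES`), the 0..3 encoding with 3 as highest, the tie-breaking order, and all names are assumptions. Only the rule itself comes from the text: an enabled interrupt is taken if its priority is equal to or higher than the current execution level, which then becomes that priority; otherwise it stays pending.

```c
#include <stdbool.h>
#include <stdint.h>

#define NUM_SOURCES 8   /* hypothetical number of interrupt sources */

/* Assumed encoding: priority 0 (lowest) .. 3 (highest). */
typedef struct {
    uint8_t priority[NUM_SOURCES]; /* per-source level, 0..3          */
    bool    enabled[NUM_SOURCES];  /* unmasked?                        */
    bool    pending[NUM_SOURCES];  /* requested but not yet taken      */
    uint8_t exec_level;            /* current execution priority level */
} IntCtrl;

/* Returns the source taken now, or -1 if none qualifies. Among
   qualifying sources the highest priority wins; ties go to the
   lower source number (an assumption). */
static int int_arbitrate(IntCtrl *ic) {
    int best = -1;
    for (int s = 0; s < NUM_SOURCES; s++) {
        if (!ic->pending[s] || !ic->enabled[s])
            continue;
        if (ic->priority[s] < ic->exec_level)
            continue;                      /* lower priority: stays pending */
        if (best < 0 || ic->priority[s] > ic->priority[best])
            best = s;
    }
    if (best >= 0) {
        ic->pending[best] = false;
        ic->exec_level = ic->priority[best]; /* saved as new execution level */
    }
    return best;
}
```

Lowering `exec_level` in software, as the text describes for the Interrupt Priority register, is what lets a previously blocked lower-priority request be taken on the next arbitration. Restoring the previous level when a handler returns is not modeled here.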
The Multiply Accumulate Unit (MAC)
The ZSP Architecture has two MAC units that can work together or independently. Independently, each can perform a 16-bit by 16-bit multiply with 40-bit accumulation in a single cycle.
Together, they perform a single 32-bit by 32-bit multiply with 40-bit accumulation in a single cycle. The MACs also contain hardware support for complex multiplies and can perform a single-cycle add-compare-select for Viterbi decoding. The MACs also support parallel addition and parallel subtraction. MAC operations affect the v (32-bit overflow), gv (40-bit overflow), c (carry), and ge (greater than or equal to zero) flags, and results are available to any other functional unit in the next cycle while being written to the register file. "Sticky" overflow flags record 32- and 40-bit overflow conditions; they can only be cleared by software. The Functional Mode register controls rounding, saturation on overflow, and fractional number support for most MAC operations.
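The following C sketch models one MAC in integer mode: a 16 x 16-bit product accumulated into a 40-bit accumulator, with per-operation v/gv flags and sticky copies that only a software call clears. It is illustrative only. The exact ZSP flag definitions and the effect of the Functional Mode register's rounding and fractional modes are not specified here, so the choices below (v = result outside the signed 32-bit range, gv = result outside the signed 40-bit range, saturate-or-wrap selected by a parameter) and the names `Mac40`, `mac16`, and `clear_sticky` are assumptions.

```c
#include <stdbool.h>
#include <stdint.h>

static const int64_t ACC40_MAX = ((int64_t)1 << 39) - 1;
static const int64_t ACC40_MIN = -((int64_t)1 << 39);

typedef struct {
    int64_t acc;               /* 40-bit accumulator held in 64 bits    */
    bool    v, gv;             /* flags from the most recent operation  */
    bool    v_sticky, gv_sticky; /* latched; cleared only by software   */
} Mac40;

/* Reduce x modulo 2^40 and sign-extend from bit 39. */
static int64_t wrap40(int64_t x) {
    uint64_t u = (uint64_t)x & (((uint64_t)1 << 40) - 1);
    return (u >> 39) ? (int64_t)u - ((int64_t)1 << 40) : (int64_t)u;
}

/* acc += a * b, with optional saturation at the 40-bit limits. */
static void mac16(Mac40 *m, int16_t a, int16_t b, bool saturate) {
    int64_t sum = m->acc + (int64_t)((int32_t)a * b);
    m->gv = (sum > ACC40_MAX || sum < ACC40_MIN);
    if (m->gv)
        sum = saturate ? (sum > 0 ? ACC40_MAX : ACC40_MIN) : wrap40(sum);
    m->v = (sum > INT32_MAX || sum < INT32_MIN);
    m->acc = sum;
    m->v_sticky  |= m->v;      /* sticky flags only ever get set here */
    m->gv_sticky |= m->gv;
}

static void clear_sticky(Mac40 *m) {
    m->v_sticky = m->gv_sticky = false;
}
```

Without saturation, an accumulation one past the 40-bit maximum wraps to the 40-bit minimum; the sticky flag still records that overflow happened even after later operations clear v or gv.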
The Arithmetic Logic Unit (ALU)
The ZSP Architecture provides two identical 16-bit ALUs that can work independently in parallel or together to form a single 32-bit ALU. In addition to traditional ALU functionality, these ALUs also provide bit manipulation and normalization capability. All ALU operations are single-cycle and affect the gt (greater than zero), ge (greater than or equal to zero), z (zero), v (overflow), and c (carry) flags. The result of any ALU operation is available to any functional unit in the next cycle while being written to the register file. The Functional Mode register controls rounding, saturation, and fractional number support for most ALU operations.
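Normalization relies on exponent detection: counting the redundant sign bits of a value, which equals the left shift that normalizes it. A minimal C sketch for 16-bit values follows. It adopts the edge-case conventions of the ETSI basic operator `norm_s` (0 returns 0, -1 returns 15); whether the ZSP exponent instruction uses exactly these conventions is an assumption, and `norm16` is a hypothetical name.

```c
#include <stdint.h>

/* Number of left shifts needed to normalize a 16-bit value, i.e. the
   count of redundant sign bits below the sign bit. */
static int norm16(int16_t x) {
    if (x == 0)
        return 0;
    /* Fold negatives onto non-negatives: ~x has the same count of
       leading sign bits as x, but with zeros instead of ones. */
    uint16_t v = (uint16_t)(x < 0 ? ~x : x);
    int n = 0;
    while (n < 15 && !(v & (0x4000u >> n)))
        n++;
    return n;
}
```

For example, 0x0123 has six redundant sign bits, so shifting it left by 6 gives the normalized value 0x48C0.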
The Register Files
There are two types of register files: the operand register file (ORF) and the control register file (CRF). All registers are 16-bit, memory-mapped, and can be read or written by the user. The operand register file (ORF) has a total of sixteen 16-bit registers. Any of these registers may be used as the input or output of any ALU operation, or as a pointer to memory for register-indirect addressing. For MAC and MUL instructions, all the registers can be used as inputs, but only a subset of them can be destination registers. In addition, any operand register may be used as a stack pointer. The registers are denoted rX, where 0 ≤ X ≤ 15. The ORF is accessed via load/store instructions.

The ZSP Architecture also provides support for extended-precision (32-bit) operations. In these operations (typically denoted in the instruction set with a ".e" extension), registers are used in pairs and are referenced by the lower, even-numbered register. For example, the instruction add.e r2, r4 adds the 32-bit operands contained in register pairs {r3 r2} and {r5 r4}, and stores the result in {r3 r2}.

The control register file (CRF) provides mode control as well as status and flag information. The CRF contains thirty-two 16-bit registers.
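Register pairing for ".e" operations can be modeled in C by treating the ORF as an array of sixteen 16-bit words, with pair {r(n+1) r(n)} holding the high and low halves. The sketch below (hypothetical names `ORF`, `pair_get`, `pair_set`, `add_e`) reproduces the add.e r2, r4 example; flag updates and saturation are deliberately omitted, so the sum wraps modulo 2^32.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the sixteen 16-bit operand registers. */
typedef struct { uint16_t r[16]; } ORF;

/* 32-bit value of pair {r(n+1) r(n)}: r(n+1) is the high word. */
static uint32_t pair_get(const ORF *f, int n) {
    return ((uint32_t)f->r[n + 1] << 16) | f->r[n];
}

static void pair_set(ORF *f, int n, uint32_t v) {
    f->r[n]     = (uint16_t)(v & 0xFFFFu);  /* low word  */
    f->r[n + 1] = (uint16_t)(v >> 16);      /* high word */
}

/* add.e rd, rs : {rd+1 rd} = {rd+1 rd} + {rs+1 rs} (mod 2^32). */
static void add_e(ORF *f, int rd, int rs) {
    assert(rd >= 0 && rd < 16 && rd % 2 == 0);
    assert(rs >= 0 && rs < 16 && rs % 2 == 0);
    pair_set(f, rd, pair_get(f, rd) + pair_get(f, rs));
}
```

Adding 0x00000001 to 0x0001FFFF carries out of the low register r2 into r3, giving {r3 r2} = 0x00020000, which is the carry propagation a pair of independent 16-bit adds would not provide.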
System Memory and the Memory Interface Unit (MIU)
The system memory is logically segmented into internal program, external program, internal data, and external data spaces. Each segment is 64K words in size (16-bit address). The external program and data address space is extended from 16 to 20 bits by a page register, giving a total of 1M words of address space. The example in Figure 1 has an internal memory of 2K words of boot ROM and 62K words of dual-access RAM for instructions and data. This dual-access RAM is connected to both the IU and the DU by separate 64-bit data buses and can be accessed simultaneously from both the instruction and data memory spaces with no speed penalty. These buses provide the bandwidth to fetch four instructions and four 16-bit data words per cycle.

The Memory Interface Unit (MIU) is a pin-multiplexed external memory interface with a 20-bit address and 32-bit data path, allowing glueless external memory or peripheral expansion. The MIU provides bus arbitration logic for easy interfacing to external shared memory. Pin multiplexing lets instruction and data addresses share the same address lines, reducing the pin count. The 4-bit page register resides in the MIU and extends the memory address space to 20 bits. The ZSP Architecture can also support Harvard (separate program and data) memory systems, as well as low-power, high-density memory systems based on single-port memories.
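The page-register arithmetic can be checked directly: a 4-bit page selects one of 16 banks of 64K words, and 16 x 65,536 = 1,048,576 words, the 1M-word space. The C sketch below forms the 20-bit address; placing the page in bits 19:16 is an assumption consistent with that arithmetic, and `ext_address` is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/* Combine the 4-bit page register with a 16-bit word address into
   a 20-bit external address (page assumed to occupy bits 19:16). */
static uint32_t ext_address(uint8_t page, uint16_t addr) {
    assert(page < 16);                 /* the page register is 4 bits */
    return ((uint32_t)page << 16) | addr;
}
```

Page 0 covers 0x00000 to 0x0FFFF and page 15 ends at 0xFFFFF, the top of the 20-bit space.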
Programming Example
The ZSP Instruction Set is based on a load/store RISC paradigm. All instructions can use any of the general-purpose registers as source and destination. The instruction set provides the following features:
* Single-word (16-bit) instructions
* Single-cycle execution of all instructions
* Any two 16-bit ALU operations
* Any 32-bit ALU operation
* Two 16-bit x 16-bit multiplies with a single 40-bit accumulation
* One 32-bit x 32-bit multiply with 40-bit accumulation
* Exponent detection for 16/32-bit variables
* Square of a 16/32-bit variable
* 16/32-bit min and max instructions
* Most of the basic functions defined by ETSI for speech coding applications
* Two additions or subtractions in the MAC units to support ALU-intensive code
* Single-cycle compare-select instruction that supports a two-cycle Viterbi butterfly operation
* 16-bit complex multiplication or multiply-accumulate in two cycles
* Extensive bit manipulation instructions: bit reversal, bit shift, bit set, bit clear, and bit invert
* Excellent conditional branch support
* Zero-overhead looping
* Circular buffer support
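The add-compare-select (ACS) operation at the heart of a Viterbi butterfly can be sketched in C as follows. This is a generic radix-2 butterfly for a rate-1/2 code with antipodal branch metrics (+m and -m), using larger-is-better path metrics; it shows the arithmetic only and does not attempt to map it onto the ZSP's two-cycle instruction sequence. The metric convention, state ordering, and the names `acs` and `butterfly` are assumptions.

```c
#include <stdint.h>

/* Compare-select: keep the larger candidate metric and record which
   predecessor survived (0 = first, 1 = second). */
static int32_t acs(int32_t a, int32_t b, uint8_t *decision) {
    *decision = (uint8_t)(b > a);
    return b > a ? b : a;
}

/* Radix-2 butterfly: old states j and j+half feed new states 2j and
   2j+1. Each new metric is an add followed by a compare-select. */
static void butterfly(const int32_t *old_pm, int32_t *new_pm, uint8_t *dec,
                      int j, int half, int32_t m) {
    new_pm[2 * j]     = acs(old_pm[j] + m, old_pm[j + half] - m, &dec[2 * j]);
    new_pm[2 * j + 1] = acs(old_pm[j] - m, old_pm[j + half] + m, &dec[2 * j + 1]);
}
```

Each butterfly needs four additions and two compare-selects; the decision bits are what a traceback later uses to recover the decoded path.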
The following assembly code implements the vector add Z = X + Y, where X, Y, and Z are arrays of eight 32-bit elements.

        .segment "text"
        .global main

    main:
        lda     r14, ARRAY_X      /* r14 is a pointer to input data X */
        lda     r15, ARRAY_Y      /* r15 is a pointer to input data Y */
        mov     %loop0, 7         /* execute this loop 8 times */
        lda     r13, RESULT       /* r13 points to result array Z */

    vector_add_32:
        lddu    r4, r14, 2        /* r4=M(r14), r5=M(r14+1), r14=r14+2 */
        lddu    r8, r15, 2        /* r8=M(r15), r9=M(r15+1), r15=r15+2 */
        add.e   r4, r8            /* (r5 r4) = (r5 r4) + (r9 r8) */
        stdu    r4, r13, 2        /* M(r13)=r4, M(r13+1)=r5, r13=r13+2 */
        agn0    vector_add_32     /* test, decrement counter, then branch */

    end_vector_add:
        halt

        .segment "data"
    RESULT: .wspace 16            /* data space for result array Z */
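For reference, the same computation can be written in C against the word-level memory layout the assembly uses: each 32-bit element occupies two consecutive 16-bit words, low word first, matching `lddu r4, r14, 2` loading r4 = M(r14) and r5 = M(r14+1). This is an illustrative sketch; `vector_add_32` mirrors the assembly label, and wrap-around on overflow is assumed (with saturation enabled in the Functional Mode register, add.e would presumably saturate instead).

```c
#include <stdint.h>

/* z = x + y over n 32-bit elements, each stored as two 16-bit words
   (low word at the lower address). The assembly uses n = 8. */
static void vector_add_32(const uint16_t *x, const uint16_t *y,
                          uint16_t *z, int n) {
    for (int i = 0; i < n; i++) {
        /* lddu: build the {high low} pairs from memory */
        uint32_t a = ((uint32_t)x[2 * i + 1] << 16) | x[2 * i];
        uint32_t b = ((uint32_t)y[2 * i + 1] << 16) | y[2 * i];
        uint32_t s = a + b;                 /* add.e, wrapping mod 2^32 */
        z[2 * i]     = (uint16_t)s;         /* stdu: low word to M(r13)    */
        z[2 * i + 1] = (uint16_t)(s >> 16); /* high word to M(r13+1)       */
    }
}
```

Eight elements of two words each fill exactly the 16 words reserved by `RESULT: .wspace 16`.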
Development Tools
The ZSP Processor family is fully supported by a GNU-based compiler, linker, and assembler, available for Windows 95, Windows NT, and Solaris 2 platforms. The ZSP Architecture enables the C compiler to produce code unrivaled in code density and execution speed by any DSP in its class, offering fast time to market with minimal compromise on performance and cost. On-chip debugging is supported through the JTAG port of ZSP devices, which connects to host PC platforms via a PCMCIA interface. Development platforms are available with the following features:
* Integrated Debug Environment
* FLASH EPROM
* RS232C- and JTAG-based host communication and code download
* Expansion to 64K words of data and program memory
* Two voice-band codecs
* Analog I/O interfaces
