Hitachi SuperH RISC Engine SH-1/SH-2/SH-DSP Programming Manual
ADE-602-063C, Rev. 4.0, 3/25/99, Hitachi, Ltd.

Cautions

1. Hitachi neither warrants nor grants licenses of any rights of Hitachi's or any third party's patent, copyright, trademark, or other intellectual property rights for information contained in this document. Hitachi bears no responsibility for problems that may arise with third party's rights, including intellectual property rights, in connection with use of the information contained in this document.
2. Products and product specifications may be subject to change without notice. Confirm that you have received the latest product standards or specifications before final design, purchase or use.
3. Hitachi makes every attempt to ensure that its products are of high quality and reliability. However, contact Hitachi's sales office before using the product in an application that demands especially high quality and reliability or where its failure or malfunction may directly threaten human life or cause risk of bodily injury, such as aerospace, aeronautics, nuclear power, combustion control, transportation, traffic, safety equipment or medical equipment for life support.
4. Design your application so that the product is used within the ranges guaranteed by Hitachi, particularly for maximum rating, operating supply voltage range, heat radiation characteristics, installation conditions and other characteristics. Hitachi bears no responsibility for failure or damage when used beyond the guaranteed ranges. Even within the guaranteed ranges, consider normally foreseeable failure rates or failure modes in semiconductor devices and employ systemic measures such as fail-safes, so that the equipment incorporating the Hitachi product does not cause bodily injury, fire or other consequential damage due to operation of the Hitachi product.
5. This product is not designed to be radiation resistant.
6. No one is permitted to reproduce or duplicate, in any form, the whole or part of this document without written approval from Hitachi.
7. Contact Hitachi's sales office for any questions regarding this document or Hitachi semiconductor products.

Introduction

The SH-1 and SH-2 incorporate a RISC (reduced instruction set computer) type CPU. A basic instruction can be executed in one clock cycle, realizing high performance operation. A built-in multiplier can execute multiplication and addition as quickly as a DSP.

The SH-DSP is a 32-bit microcontroller based on Hitachi's SuperH™ RISC engine that realizes the same signal processing capability as a general usage DSP (digital signal processor). The SH-DSP improves on the DSP functions of multiplication and multiply-and-accumulate in SuperH microprocessors by using a DSP style data path function. It maintains upward compatibility at the object code level with the SH-1 and SH-2 microprocessors and has the many functions, low power usage, and low price of other SuperH microprocessors. The SH-DSP achieves high performance in processing operations by using a RISC CPU core and a DSP unit with DSP functions. This new type of single-chip RISC-DSP also integrates the peripheral functions needed to build systems and provides the low power consumption vital to microprocessor applications.

This programming manual describes in detail the basic architecture and instructions for the SH-1, SH-2, and SH-DSP and is intended as a reference on instruction operation and architecture. It also covers the operation of pipelines, which are a feature of the SuperH microprocessor. For information on the software development environment, contact your Hitachi sales office.

Note: SuperH™ is a trademark of Hitachi, Ltd.

Contents

Section 1 Features
  1.1 SH-1 and SH-2 Features
  1.2 SH-DSP Features
Section 2 Register Configuration
  2.1 General Registers
  2.2 Control Registers
  2.3 System Registers
  2.4 DSP Registers
  2.5 Precautions for Handling of Guard Bit and Overflow
  2.6 Initial Values of Registers
Section 3 Data Formats
  3.1 Data Format in Registers
  3.2 Data Format in Memory
  3.3 Immediate Data Format
  3.4 DSP Type Data Formats
  3.5 DSP Instructions and Data Formats
    3.5.1 DSP Data Processing
    3.5.2 X and Y Data Transfers
    3.5.3 Single Data Transfers
Section 4 Instruction Features
  4.1 RISC-Type Instruction Set
  4.2 Addressing Modes
  4.3 Instruction Format
  4.4 DSP
  4.5 DSP Data Addressing
    4.5.1 X and Y Data Addressing
    4.5.2 Single Data Addressing
    4.5.3 Modulo Addressing
    4.5.4 DSP Addressing Operation
  4.6 Instruction Formats for DSP Instructions
    4.6.1 Double and Single Data Transfer Instructions
    4.6.2 Parallel Processing Instructions
  4.7 ALU Fixed Decimal Point Operations
    4.7.1 Function
    4.7.2 Instructions and Operands
    4.7.3 DC Bit
    4.7.4 Condition Bits
    4.7.5 Overflow Prevention Function (Saturation Operation)
  4.8 ALU Integer Operations
  4.9 ALU Logical Operations
    4.9.1 Function
    4.9.2 Instructions and Operands
    4.9.3 DC Bit
    4.9.4 Condition Bits
  4.10 Fixed Decimal Point Multiplication
  4.11 Shift Operations
    4.11.1 Arithmetic Shift Operations
    4.11.2 Logical Shift Operations
  4.12 The MSB Detection Instruction
    4.12.1 Function
    4.12.2 Instructions and Operands
    4.12.3 DC Bit
    4.12.4 Condition Bits
  4.13 Rounding
    4.13.1 Operation Function
    4.13.2 Instructions and Operands
    4.13.3 DC Bit
    4.13.4 Condition Bits
    4.13.5 Overflow Prevention Function (Saturation Operation)
  4.14 Condition Select Bits (CS) and the DSP Condition Bit (DC)
  4.15 Overflow Prevention Function (Saturation Operation)
  4.16 Data Transfers
    4.16.1 X and Y Memory Data Transfer
    4.16.2 Single Data Transfers
  4.17 Operand Contention
  4.18 DSP Repeat (Loop) Control
    4.18.1 Actual Programming
  4.19 Conditional Instructions and Data Transfers
Section 5 Instruction Set
  5.1 Instruction Set for CPU Instructions
    5.1.1 Data Transfer Instructions
    5.1.2 Arithmetic Instructions
    5.1.3 Logic Operation Instructions
    5.1.4 Shift Instructions
    5.1.5 Branch Instructions
    5.1.6 System Control Instructions
    5.1.7 CPU Instructions That Support DSP Functions
  5.2 DSP Data Transfer Instruction Set
    5.2.1 Double Data Transfer Instructions (X Memory Data)
    5.2.2 Double Data Transfer Instructions (Y Memory Data)
    5.2.3 Single Data Transfer Instructions
  5.3 DSP Operation Instruction Set
    5.3.1 ALU Arithmetic Operation Instructions
    5.3.2 ALU Logical Operation Instructions
    5.3.3 Fixed Decimal Point Multiplication Instructions
    5.3.4 Shift Operation Instructions
    5.3.5 System Control Instructions
    5.3.6 NOPX and NOPY Instruction Code
Section 6 Instruction Descriptions
  6.1 Instruction Descriptions
    6.1.1 Sample Description (Name): Classification
    6.1.2 ADD (ADD Binary): Arithmetic Instruction
    6.1.3 ADDC (ADD with Carry): Arithmetic Instruction
    6.1.4 ADDV (ADD with V Flag Overflow Check): Arithmetic Instruction
    6.1.5 AND (AND Logical): Logic Operation Instruction
    6.1.6 BF (Branch if False): Branch Instruction
    6.1.7 BF/S (Branch if False with Delay Slot): Branch Instruction
    6.1.8 BRA (Branch): Branch Instruction
    6.1.9 BRAF (Branch Far): Branch Instruction
    6.1.10 BSR (Branch to Subroutine): Branch Instruction
    6.1.11 BSRF (Branch to Subroutine Far): Branch Instruction
    6.1.12 BT (Branch if True): Branch Instruction
    6.1.13 BT/S (Branch if True with Delay Slot): Branch Instruction
    6.1.14 CLRMAC (Clear MAC Register): System Control Instruction
    6.1.15 CLRT (Clear T Bit): System Control Instruction
    6.1.16 CMP/cond (Compare Conditionally): Arithmetic Instruction
    6.1.17 DIV0S (Divide Step 0 as Signed): Arithmetic Instruction
    6.1.18 DIV0U (Divide Step 0 as Unsigned): Arithmetic Instruction
    6.1.19 DIV1 (Divide 1 Step): Arithmetic Instruction
    6.1.20 DMULS.L (Double-Length Multiply as Signed): Arithmetic Instruction
    6.1.21 DMULU.L (Double-Length Multiply as Unsigned): Arithmetic Instruction
    6.1.22 DT (Decrement and Test): Arithmetic Instruction
    6.1.23 EXTS (Extend as Signed): Arithmetic Instruction
    6.1.24 EXTU (Extend as Unsigned): Arithmetic Instruction
    6.1.25 JMP (Jump): Branch Instruction
    6.1.26 JSR (Jump to Subroutine): Branch Instruction (Class: Delayed Branch Instruction)
    6.1.27 LDC (Load to Control Register): System Control Instruction (Class: Interrupt Disabled Instruction)
    6.1.28 LDRE (Load Effective Address to RE Register): System Control Instruction
    6.1.29 LDRS (Load Effective Address to RS Register): System Control Instruction
    6.1.30 LDS (Load to System Register): System Control Instruction
    6.1.31 MAC.L (Multiply and Accumulate Calculation Long): Arithmetic Instruction
    6.1.32 MAC.W (Multiply and Accumulate Calculation Word): Arithmetic Instruction
    6.1.33 MOV (Move Data): Data Transfer Instruction
    6.1.34 MOV (Move Immediate Data): Data Transfer Instruction
    6.1.35 MOV (Move Peripheral Data): Data Transfer Instruction
    6.1.36 MOV (Move Structure Data): Data Transfer Instruction
    6.1.37 MOVA (Move Effective Address): Data Transfer Instruction
    6.1.38 MOVT (Move T Bit): Data Transfer Instruction
    6.1.39 MUL.L (Multiply Long): Arithmetic Instruction
    6.1.40 MULS.W (Multiply as Signed Word): Arithmetic Instruction
    6.1.41 MULU.W (Multiply as Unsigned Word): Arithmetic Instruction
    6.1.42 NEG (Negate): Arithmetic Instruction
    6.1.43 NEGC (Negate with Carry): Arithmetic Instruction
    6.1.44 NOP (No Operation): System Control Instruction
    6.1.45 NOT (NOT-Logical Complement): Logic Operation Instruction
    6.1.46 OR (OR Logical): Logic Operation Instruction
    6.1.47 ROTCL (Rotate with Carry Left): Shift Instruction
    6.1.48 ROTCR (Rotate with Carry Right): Shift Instruction
    6.1.49 ROTL (Rotate Left): Shift Instruction
    6.1.50 ROTR (Rotate Right): Shift Instruction
    6.1.51 RTE (Return from Exception): System Control Instruction
    6.1.52 RTS (Return from Subroutine): Branch Instruction (Class: Delayed Branch Instruction)
    6.1.53 SETRC (Set Repeat Count to RC): System Control Instruction
    6.1.54 SETT (Set T Bit): System Control Instruction
    6.1.55 SHAL (Shift Arithmetic Left): Shift Instruction
    6.1.56 SHAR (Shift Arithmetic Right): Shift Instruction
    6.1.57 SHLL (Shift Logical Left): Shift Instruction
    6.1.58 SHLLn (Shift Logical Left n Bits): Shift Instruction
    6.1.59 SHLR (Shift Logical Right): Shift Instruction
    6.1.60 SHLRn (Shift Logical Right n Bits): Shift Instruction
    6.1.61 SLEEP (Sleep): System Control Instruction
    6.1.62 STC (Store Control Register): System Control Instruction (Interrupt Disabled Instruction)
    6.1.63 STS (Store System Register): System Control Instruction (Interrupt Disabled Instruction)
    6.1.64 SUB (Subtract Binary): Arithmetic Instruction
    6.1.65 SUBC (Subtract with Carry): Arithmetic Instruction
    6.1.66 SUBV (Subtract with V Flag Underflow Check): Arithmetic Instruction
    6.1.67 SWAP (Swap Register Halves): Data Transfer Instruction
    6.1.68 TAS (Test and Set): Logic Operation Instruction
    6.1.69 TRAPA (Trap Always): System Control Instruction
    6.1.70 TST (Test Logical): Logic Operation Instruction
    6.1.71 XOR (Exclusive OR Logical): Logic Operation Instruction
    6.1.72 XTRCT (Extract): Data Transfer Instruction
  6.2 DSP Data Transfer Instructions
    6.2.1 X and Y Data Transfers (MOVX.W and MOVY.W)
    6.2.2 Single Data Transfers (MOVS.W and MOVS.L)
    6.2.3 Sample Description (Name): Classification
    6.2.4 MOVS (Move Single Data between Memory and DSP Register): DSP Data Transfer Instruction
    6.2.5 MOVX (Move between X Memory and DSP Register): DSP Data Transfer Instruction
    6.2.6 MOVY (Move between Y Memory and DSP Register): DSP Data Transfer Instruction
    6.2.7 NOPX (No Access Operation for X Memory): DSP Data Transfer Instruction
  6.3 DSP Operation Instructions
    6.3.1 PABS (Absolute): DSP Arithmetic Operation Instruction
    6.3.2 [if cc] PADD (Addition with Condition): DSP Arithmetic Operation Instruction
    6.3.3 PADD PMULS (Addition & Multiply Signed by Signed): DSP Arithmetic Operation Instruction
    6.3.4 PADDC (Addition with Carry): DSP Arithmetic Operation Instruction
    6.3.5 [if cc] PAND (Logical AND): DSP Logical Operation Instruction
    6.3.6 [if cc] PCLR (Clear): DSP Arithmetic Operation Instruction
    6.3.7 PCMP (Compare Two Data): DSP Arithmetic Operation Instruction
    6.3.8 [if cc] PCOPY (Copy with Condition): DSP Arithmetic Operation Instruction
    6.3.9 [if cc] PDEC (Decrement by 1): DSP Arithmetic Operation Instruction
    6.3.10 [if cc] PDMSB (Detect MSB with Condition): DSP Arithmetic Operation Instruction
    6.3.11 [if cc] PINC (Increment by 1 with Condition): DSP Arithmetic Operation Instruction
    6.3.12 [if cc] PLDS (Load System Register): DSP System Control Instruction
    6.3.13 PMULS (Multiply Signed by Signed): DSP Arithmetic Operation Instruction
    6.3.14 [if cc] PNEG (Negate): DSP Arithmetic Operation Instruction
    6.3.15 [if cc] POR (Logical OR): DSP Logical Operation Instruction
    6.3.16 PRND (Rounding): DSP Arithmetic Operation Instruction
    6.3.17 [if cc] PSHA (Shift Arithmetically with Condition): DSP Arithmetic Shift Instruction
    6.3.18 [if cc] PSHL (Shift Logically with Condition): DSP Logical Shift Instruction
    6.3.19 [if cc] PSTS (Store System Register): DSP System Control Instruction
    6.3.20 [if cc] PSUB (Subtract with Condition): DSP Arithmetic Operation Instruction
    6.3.21 PSUB PMULS (Subtraction & Multiply Signed by Signed): DSP Arithmetic Operation Instruction
    6.3.22 PSUBC (Subtraction with Carry): DSP Arithmetic Operation Instruction
    6.3.23 [if cc] PXOR (Logical Exclusive OR): DSP Logical Operation Instruction
Section 7 Pipeline Operation
  7.1 Basic Configuration of Pipelines
    7.1.1 The Five-Stage Pipeline
    7.1.2 Slot and Pipeline Flow
    7.1.3 Slot Length
    7.1.4 Number of Instruction Execution Cycles
  7.2 Contention
    7.2.1 Contention between Instruction Fetch (IF) and Memory Access (MA)
    7.2.2 Contention When the Previous Instruction's Destination Register Is Used
    7.2.3 Multiplier Access Contention
    7.2.4 Contention between Memory Stores and DSP Operations
  7.3 Programming Guide
    7.3.1 Types of Contention and Affected Instructions
    7.3.2 Increasing Instruction Execution Speed
    7.3.3 Cycles
  7.4 Operation of Instruction Pipelines
    7.4.1 Data Transfer Instructions
    7.4.2 Arithmetic Instructions
    7.4.3 Logic Operation Instructions
    7.4.4 Shift Instructions
    7.4.5 Branch Instructions
    7.4.6 System Control Instructions
    7.4.7 Exception Processing
Appendix A CPU Instructions
  A.1 CPU Instructions

Section 1 Features

1.1 SH-1 and SH-2 Features

The SH-1 and SH-2 CPUs have RISC-type instruction sets. Basic instructions are executed in one clock cycle, which dramatically improves instruction execution speed. The CPU also has an internal 32-bit architecture for enhanced data processing ability. Table 1.1 lists the SH-1 and SH-2 CPU features.

Table 1.1 SH-1 and SH-2 CPU Features

Item                           Feature
Architecture                   - Original Hitachi architecture
                               - 32-bit internal data bus
General-register machine       - Sixteen 32-bit general registers
                               - Three 32-bit control registers
                               - Four 32-bit system registers
Instruction set                - Instruction length: 16-bit fixed length for improved code efficiency
                               - Load-store architecture (basic arithmetic and logic operations are executed between registers)
                               - Delayed branch system used for reduced pipeline disruption
                               - Instruction set optimized for C language
Instruction execution time     - One instruction/cycle for basic instructions
Address space                  - Architecture makes 4 Gbytes available
On-chip multiplier (SH-1 CPU)  - Multiplication operations (16 bits × 16 bits → 32 bits) executed in 1 to 3 cycles
                               - Multiplication/accumulation operations (16 bits × 16 bits + 42 bits → 42 bits) executed in 3/(2)* cycles
On-chip multiplier (SH-2 CPU)  - Multiplication operations executed in 1 to 2 cycles (16 bits × 16 bits → 32 bits) or 2 to 4 cycles (32 bits × 32 bits → 64 bits)
                               - Multiplication/accumulation operations executed in 3/(2)* cycles (16 bits × 16 bits + 64 bits → 64 bits) or 3/(2 to 4)* cycles (32 bits × 32 bits + 64 bits → 64 bits)
Pipeline                       - Five-stage pipeline
Processing states              - Reset state
                               - Exception processing state
                               - Program execution state
                               - Power-down state
                               - Bus release state
Power-down states              - Sleep mode
                               - Standby mode

Note: * The normal minimum number of execution cycles. (The number in parentheses is the number of cycles when there is contention with the preceding or following instructions.)

1.2 SH-DSP Features

The SH-DSP is a 32-bit microcontroller based on the Hitachi SuperH RISC engine (abbreviated below as "SuperH") and incorporating the signal processing performance of a general-use digital signal processor (DSP). The SuperH already supported some DSP type instructions, such as multiply and accumulate. In the SH-DSP, the DSP functions have been enhanced, and full DSP data buses have been implemented.

The SH-DSP is backward compatible at the object code level with the SH-1 and SH-2 CPUs. The SuperH only has 16-bit instructions. The SH-DSP basically has the same 16-bit instructions, but it also has additional 32-bit DSP instructions that it uses for parallel processing of DSP type instructions. The SuperH uses a standard von Neumann architecture, but the SH-DSP has the DSP data bus of the expanded Harvard architecture. Table 1.2 lists the added features of the SH-DSP.

Table 1.2 Features of SH-DSP Series Microprocessor CPUs

Feature                        Description
DSP unit                       - 1-cycle multiplier
                               - 16 bits × 16 bits → 32 bits (fixed decimal point)
                               - Arithmetic logic unit (ALU)
                               - Barrel shifter
                               - DSP registers
                               - MSB detection
DSP registers                  - Two 40-bit data registers
                               - Six 32-bit data registers
                               - DSP status register (DSR)
                               - Modulo register (MOD, 32 bits) added to control registers
                               - Repeat counter (RC) added to status register (SR)
                               - Repeat start register (RS) and repeat end register (RE) added to control registers
DSP data bus                   - Expanded Harvard architecture
                               - Simultaneous access of two data buses and one instruction bus
Parallel processing            - Maximum of four parallel processes (ALU operation, multiplication, and two loads or stores)
Address operator               - Two address operators
                               - Address operations for accessing two memories
DSP data addressing modes      - Increment, decrement, and index
                               - Increment, decrement, and index can each be used with or without modulo addressing
Repeat control                 - Zero-overhead repeat control (loop)
Instruction set                - 16 or 32 bits
                                 16 bits (for load or store only)
                                 32 bits (including for ALU operations and multiplication)
                               - SuperH microprocessor instructions added for accessing DSP registers
Pipeline                       - Five-stage pipeline
                               - Fifth stage is both the WB stage and the DSP stage

Section 2 Register Configuration

The register set of the SH-1 and SH-2 consists of sixteen 32-bit general registers, three 32-bit control registers and four 32-bit system registers.

The SH-DSP maintains upward compatibility with the SH-1 and SH-2 microprocessors on the object code level. To this end, it has the same registers as the SuperH microprocessors, with the addition of several other registers. Three control registers have been added: the repeat start register (RS), the repeat end register (RE), and the modulo register (MOD). Other registers have also been added: the DSP status register (DSR), which is a system register, and eight DSP data registers (A0, A1, X0, X1, Y0, Y1, M0, and M1).

The general registers are used the same as in the SH-1 and SH-2 when SuperH type instructions are involved.
With DSP type instructions, however, they are used as address registers and index registers for accessing memory.

2.1 General Registers

There are 16 general registers (Rn), numbered R0 to R15, which are 32 bits in length (figure 2.1). General registers are used for data processing and address calculation. R0 is also used as an index register. Several instructions use R0 as a fixed source or destination register. R15 is used as the hardware stack pointer (SP). Saving and recovering the status register (SR) and program counter (PC) in exception processing is accomplished by referencing the stack using R15.

[Figure 2.1: sixteen 32-bit registers (bits 31 to 0): R0*1, R1 to R14, and R15, SP (hardware stack pointer)*2]

Notes: 1. R0 functions as an index register in the indirect indexed register addressing mode and indirect indexed GBR addressing mode. In some instructions, R0 functions as a fixed source register or destination register.
       2. R15 functions as a hardware stack pointer (SP) during exception processing.

Figure 2.1 General Registers (SH-1 and SH-2)

With DSP type instructions, eight of the 16 general registers are used in addressing the X and Y data memory and the data memory that uses the I bus (single data). To access X memory, R4 and R5 are used as the X address register [Ax] and R8 is used as the X index register [Ix]. To access Y memory, R6 and R7 are used as the Y address register [Ay] and R9 is used as the Y index register [Iy]. To access single data using the I bus, R2, R3, R4, and R5 are used as the single data address register [As] and R8 as the single data index register [Is].

DSP type instructions can simultaneously access X and Y memory. There are two groups of address pointers for specifying the X and Y data memory addresses. Figure 2.2 shows the general registers.
[Figure 2.2: sixteen 32-bit registers (bits 31 to 0): R0*1, R1, R2 [As]*2, R3 [As]*2, R4 [As, Ax]*2, R5 [As, Ax]*2, R6 [Ay]*2, R7 [Ay]*2, R8 [Ix, Is]*2, R9 [Iy]*2, R10 to R14, and R15, SP*3]

Notes: 1. R0 functions as an index register in the indirect indexed register addressing mode and indirect indexed GBR addressing mode. In some instructions, R0 functions as a source register or destination register.
       2. Used as memory address register and memory index register with DSP instructions.
       3. R15 functions as a hardware stack pointer (SP) during exception processing.

Figure 2.2 Organization of General Registers (SH-DSP)

The symbols R2 to R9 are used by the assembler. To give a register a name that indicates its role in DSP instructions, use an alias. The alias is written for the assembler as follows:

    Ix:  .REG (R8)

The name Ix becomes an alias for R8. Aliases are also assigned as follows:

    Ax0: .REG (R4)
    Ax1: .REG (R5)
    Ix:  .REG (R8)
    Ay0: .REG (R6)
    Ay1: .REG (R7)
    Iy:  .REG (R9)
    As0: .REG (R4)  ; defined when an alias is needed for a single data transfer
    As1: .REG (R5)  ; defined when an alias is needed for a single data transfer
    As2: .REG (R2)  ; defined when an alias is needed for a single data transfer
    As3: .REG (R3)  ; defined when an alias is needed for a single data transfer
    Is:  .REG (R8)  ; defined when an alias is needed for a single data transfer

2.2 Control Registers

The 32-bit control registers consist of the 32-bit status register (SR), global base register (GBR), and vector base register (VBR) (figure 2.3). The status register indicates processing states. The global base register functions as a base address for the indirect GBR addressing mode to transfer data to the registers of on-chip peripheral modules. The vector base register functions as the base address of the exception processing vector area (including interrupts).

[Figure 2.3: SR (32 bits, with M in bit 9, Q in bit 8, I3 to I0 in bits 7 to 4, S in bit 1, T in bit 0, and all other bits reserved), GBR (32 bits), and VBR (32 bits)]

SR: Status register
  T bit: The MOVT, CMP/cond, TAS, TST, BT (BT/S), BF (BF/S), SETT, and CLRT instructions use the T bit to indicate true (1) or false (0). The ADDV/C, SUBV/C, DIV0U/S, DIV1, NEGC, SHAR/L, SHLR/L, ROTR/L, and ROTCR/L instructions also use the T bit to indicate carry/borrow or overflow/underflow.
  S bit: Used by the multiply/accumulate instruction.
  Reserved bits: Always read as 0, and should always be written with 0.
  Bits I3 to I0: Interrupt mask bits.
  M and Q bits: Used by the DIV0U/S and DIV1 instructions.
Global base register (GBR): Indicates the base address of the indirect GBR addressing mode. The indirect GBR addressing mode is used in data transfer for on-chip peripheral module register areas and in logic operations.
Vector base register (VBR): Indicates the base address of the exception processing vector area.

Figure 2.3 Control Registers (SH-1 and SH-2)

The SH-DSP additionally has a repeat start (RS) register, a repeat end (RE) register, and a modulo (MOD) register.

The RS and RE registers are used to control program repetition (loops). The number of iterations is specified in the SR register's repeat counter (RC), the repeat start address is specified in the RS register, and the repeat end address is specified in the RE register. The address values stored in the RS and RE registers are not always the same as the physical starting address and ending address of the repeat.

The MOD register is used with modulo addressing to buffer the repeat data. Modulo addressing is specified by DMX or DMY, the modulo end address (ME) is specified in the top 16 bits of the MOD register, and the modulo start address (MS) is specified in the bottom 16 bits. The DMX and DMY bits cannot simultaneously specify modulo addressing. Modulo addressing can be used for X and Y data transfers (MOVX and MOVY). It cannot be used in single data transfers (MOVS).

Figure 2.4 shows the control registers. Table 2.1 shows the bits of the SR register.
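As a rough illustration of the MOD layout described above (ME in the top 16 bits, MS in the bottom 16 bits), the following Python sketch packs and unpacks a 32-bit MOD value. The helper names pack_mod and unpack_mod are hypothetical and are not part of any Hitachi tool, and the example addresses are arbitrary. The sketch models only the bit layout, not the modulo addressing behavior itself.

```python
def pack_mod(ms, me):
    """Build a 32-bit MOD value: ME in bits 31-16, MS in bits 15-0."""
    if not (0 <= ms <= 0xFFFF and 0 <= me <= 0xFFFF):
        raise ValueError("MS and ME must each fit in 16 bits")
    return (me << 16) | ms


def unpack_mod(mod):
    """Split a 32-bit MOD value into (MS, ME)."""
    return mod & 0xFFFF, (mod >> 16) & 0xFFFF


# Arbitrary example values: start H'8000, end H'80FF.
mod = pack_mod(ms=0x8000, me=0x80FF)
print("MOD = H'%08X" % mod)
```

A value built this way is what an LDC Rm,MOD instruction would load from Rm; which address bits the hardware actually compares during modulo addressing is outside the scope of this sketch.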
[Figure 2.4: status register (SR): bits 31 to 28 reserved, bits 27 to 16 RC, bits 15 to 12 reserved, bit 11 DMY, bit 10 DMX, bit 9 M, bit 8 Q, bits 7 to 4 I3 to I0, bits 3 and 2 RF1 and RF0, bit 1 S, bit 0 T; repeat start register (RS), bits 31 to 0; repeat end register (RE), bits 31 to 0; modulo register (MOD): bits 31 to 16 ME, bits 15 to 0 MS]

ME: Modulo end address
MS: Modulo start address

Figure 2.4 Organization of the Control Registers (SH-DSP)

Table 2.1 SR Register Bits

Bits               Name                                                      Function
27 to 16           Repeat counter (RC)                                       Specifies the number of iterations for repeat (loop) control (2 to 4095)
11                 Specification of modulo addressing for Y pointer (DMY)    1: Modulo addressing mode becomes valid for the Y memory address register Ay (R6, R7)
10                 Specification of modulo addressing for X pointer (DMX)    1: Modulo addressing mode becomes valid for the X memory address register Ax (R4, R5)
9                  Bit M                                                     Used by the DIV0S/U and DIV1 instructions
8                  Bit Q                                                     Used by the DIV0S/U and DIV1 instructions
7 to 4             Interrupt request mask (IMASK)                            Indicates the level of interrupt request accepted (0 to 15)
3 to 2             Repeat flag (RF1, RF0)                                    Used to control zero-overhead repeating (loop)
                                                                             00: 1 step repeat
                                                                             01: 2 step repeat
                                                                             11: 3 step repeat
                                                                             10: Repeat of 4 or more steps
1                  Saturation operation bit (S)                              Used by MAC and DSP instructions
                                                                             1: Specifies saturation operation (prevents overflows)
0                  Bit T                                                     For MOVT, CMP/cond, TAS, TST, BT, BF, SETT, CLRT, and DT instructions:
                                                                             0: False
                                                                             1: True
                                                                             For ADDV/C, SUBV/C, DIV0U/S, DIV1, NEGC, SHAR/L, SHLR/L, ROTR/L, and ROTCR/L instructions:
                                                                             1: Indicates a carry, borrow, overflow, or underflow
31 to 28, 15 to 12 Reserved                                                  0: Always reads 0; always write 0

Dedicated load and store instructions are used to access the RS, RE, and MOD registers. For example, to access the RS register, do the following:

    LDC   Rm,RS      ; Rm → RS
    LDC.L @Rm+,RS    ; (Rm) → RS, Rm + 4 → Rm
    STC   RS,Rn      ; RS → Rn
    STC.L RS,@-Rn    ; Rn - 4 → Rn, RS → (Rn)

The following instructions set addresses in the RS and RE registers for zero-overhead repeat control:

    LDRS  @(disp,PC) ; disp × 2 + PC → RS
    LDRE  @(disp,PC) ; disp × 2 + PC → RE

The GBR and VBR registers are the same as the previous SuperH registers.
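The SR bit assignments in table 2.1 can be restated as shifts and masks. The Python sketch below is only an illustration of that layout; the function name decode_sr, the field names in the returned dictionary, and the sample value are assumptions made for this example.

```python
# Reserved SR bits on the SH-DSP: bits 31 to 28 and bits 15 to 12.
SR_RESERVED_MASK = 0xF000F000


def decode_sr(sr):
    """Split a 32-bit SH-DSP SR value into the fields of table 2.1."""
    return {
        "RC": (sr >> 16) & 0xFFF,   # repeat counter, bits 27 to 16
        "DMY": (sr >> 11) & 1,      # modulo addressing for Y pointer
        "DMX": (sr >> 10) & 1,      # modulo addressing for X pointer
        "M": (sr >> 9) & 1,
        "Q": (sr >> 8) & 1,
        "IMASK": (sr >> 4) & 0xF,   # I3 to I0
        "RF": (sr >> 2) & 0x3,      # RF1 (bit 3) and RF0 (bit 2)
        "S": (sr >> 1) & 1,
        "T": sr & 1,
    }


# Hypothetical sample: RC = 10, DMX = 1, interrupt mask = 15, T = 1.
fields = decode_sr(0x000A04F1)
print(fields)
```

On the SH-1 and SH-2 only the M, Q, I3 to I0, S, and T fields exist; the remaining positions are reserved there and read as 0.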
Four control bits (the DMX, DMY, RF1, and RF0 bits) and the repeat counter (RC) have been added to the SR register. The RS, RE, and MOD registers are new registers.

2.3 System Registers

System registers consist of four 32-bit registers: the high and low multiply and accumulate registers (MACH and MACL), the procedure register (PR), and the program counter (PC). The multiply and accumulate registers store the results of multiply and multiply and accumulate operations. The procedure register stores the return address from a subroutine procedure. The program counter indicates the address of the program being executed and controls the flow of processing. The PC points four bytes ahead of the instruction currently executing. These registers are the same as the SuperH microprocessor registers.

[Figure 2.5: MACH (bits 31 to 0 on the SH-2; bits 9 to 0 valid on the SH-1), MACL, PR, and PC (bits 31 to 0 each)]

Multiply and accumulate register high (MACH), multiply and accumulate register low (MACL): These are the registers for storing the results of multiply and accumulate operations. On the SH-2 CPU, MACH has 32 valid bits. On the SH-1 CPU, only the lower 10 bits of MACH are valid, and data is sign-extended to 32 bits when read.
Procedure register (PR): This register is used to store the return destination addresses for subroutine procedures.
Program counter (PC): The PC indicates the next four bytes (two instructions) following the instruction currently being executed.

Note: MACH and MACL are used only when executing an instruction that was supported by the SH-1 and SH-2. They are not used for the multiplication instruction newly added for the SH-DSP (PMULS).

Figure 2.5 Organization of the System Registers

In addition, the SH-DSP also uses as its system registers the DSP status register (DSR) and five of the eight data registers (A0, X0, X1, Y0, Y1), which are all registers of the DSP unit and will be described later (DSP registers). The A0 register is a 40-bit register, but the guard bit section (A0G) is ignored in data read from A0.
When data is input to the A0 register, the MSB of the data is copied to the guard bit section (A0G).

2.4 DSP Registers

The DSP unit has nine DSP registers, divided into eight data registers and one control register.

The DSP data registers include two 40-bit registers (A0 and A1) and six 32-bit registers (M0, M1, X0, X1, Y0, and Y1). The A0 and A1 registers each have eight guard bits, A0G and A1G.

The DSP data registers are used in transferring and processing DSP data as the operands of DSP instructions. There are three types of instructions that access the DSP data registers: DSP data processing, X data processing, and Y data processing.

The 32-bit DSP status register (DSR) is the control register, which indicates the results of operations. The DSR register has bits that show the result of the operation: a signed greater than bit (GT), a zero value bit (Z), a negative value bit (N), an overflow bit (V), a DSP condition bit (DC), and condition select bits (CS), which control how the DC bit is set.

The DC bit is one of the status flags; it is very similar to the T bit of the SuperH CPU core. In the case of conditional DSP type instructions, the execution of DSP data processing is controlled in accordance with the DC bit. This control is related to DSP unit execution only, and only the DSP registers are updated. It is not related to the instructions executed by the SuperH microprocessor's CPU core, such as address calculation and load/store instructions. The three condition select bits (CS2 to CS0, DSR bits 3 to 1) specify the condition under which the DC bit is set.

DSP instructions include both unconditional DSP instructions and conditional DSP instructions. Data processing of unconditional DSP instructions updates the condition bits and the DC bit, except for the PMULS, PWAD, PWSB, MOVX, MOVY, and MOVS instructions. Conditional DSP type instructions are executed in accordance with the status of the DC bit.
The DSR register is not updated, regardless of whether these instructions are executed or not.

Note that five registers, A0, X0, X1, Y0, and Y1, can also be used as system registers.

Figure 2.6 shows the DSP registers. Table 2.2 lists the DSR register bit functions.

[Figure 2.6: DSP data registers A0 and A1 (bits 31 to 0) with guard bits A0G and A1G (bits 39 to 32), and M0, M1, X0, X1, Y0, and Y1 (bits 31 to 0); DSP status register (DSR, 32 bits) with GT in bit 7, Z in bit 6, N in bit 5, V in bit 4, CS[2:0] in bits 3 to 1, and DC in bit 0]

Figure 2.6 Organization of the DSP Registers

Table 2.2 DSR Register Bits

Bits      Name                            Function
31 to 8   Reserved                        0: Always reads 0. Always write 0.
7         Signed greater than bit (GT)    Indicates whether the operation result is positive (and nonzero) or whether operand 1 is larger than operand 2.
                                          1: Operation result is positive or operand 1 is larger.
6         Zero value bit (Z)              Indicates whether the operation result is zero or whether operands 1 and 2 are the same.
                                          1: Operation result is zero or operands 1 and 2 are the same.
5         Negative value bit (N)          Indicates whether the operation result is negative or whether operand 1 is smaller than operand 2.
                                          1: Operation result is negative or operand 1 is smaller.
4         Overflow bit (V)                Indicates that the operation result overflowed.
                                          1: Operation result overflowed.
3 to 1    Condition select bits (CS)      Specify the mode for selecting the status of the operation result set in the DC bit. Do not specify 110 or 111.
                                          000: Carry/borrow mode
                                          001: Negative value mode
                                          010: Zero value mode
                                          011: Overflow mode
                                          100: Signed greater than mode
                                          101: Signed equal or greater than mode
0         DSP condition bit (DC)          Sets the operation result status in the mode specified by the CS bits.
                                          0: Specified mode status not achieved
                                          1: Specified mode status achieved

CPU core instructions use the A0, X0, X1, Y0, Y1, and DSR registers as system registers.

2.5 Precautions for Handling of Guard Bit and Overflow

Data operation in the DSP unit is basically executed in 32 bits. The actual operation, however, is performed in 40-bit length, including the 8 guard bits.
When the guard bits are inconsistent with the value of the MSB of the 32 bits, the operation result is handled as an overflow. In this case, the N bit indicates the correct condition of the operation result, whether an overflow has occurred or not. The same applies when the destination operand is a register 32 bits in length. Each status flag is always updated on the assumption that there are 8 guard bits. If an overflow occurs that is large enough that the result cannot be represented correctly even with the guard bits, the N flag cannot show the correct condition. Refer to section 8.1, ALU Fixed Decimal Point Operation, DC Bit, for details.

2.6 Initial Values of Registers

Table 2.3 lists the values of the registers after reset.

Table 2.3 Initial Values of Registers

Classification      Register                                       Initial value
General registers   R0 to R14                                      Undefined
                    R15 (SP)                                       Value of the stack pointer in the vector address table
Control registers   SR                                             Bits I3 to I0 are 1111 (H'F), reserved bits are 0, and other bits are undefined
                                                                   RC, DMY, DMX, RF1, and RF0 are 0 (additional bits on SH-DSP)
                    RS, RE                                         Undefined
                    GBR                                            Undefined
                    VBR                                            H'00000000
                    MOD                                            Undefined
System registers    MACH, MACL, PR                                 Undefined
                    PC                                             Value of the program counter in the vector address table
DSP registers       A0, A0G, A1, A1G, M0, M1, X0, X1, Y0, Y1       Undefined
                    DSR                                            H'00000000

Section 3 Data Formats

3.1 Data Format in Registers

Register operands are always longwords (32 bits). When data in memory is loaded to a register and the memory operand is only a byte (8 bits) or a word (16 bits), it is sign-extended into a longword when stored into the register.

[Figure 3.1: one longword occupying bits 31 to 0]

Figure 3.1 Data Format in Registers

3.2 Data Format in Memory

Memory data formats are classified into bytes, words, and longwords. Byte data can be accessed from any address, but an address error will occur if you try to access word data starting from an address other than 2n or longword data starting from an address other than 4n. In such cases, the data accessed cannot be guaranteed.
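The two rules above (sign extension of byte and word data on load, and the 2n/4n alignment requirement) can be sketched in Python as follows. This is a simplified model for illustration only: the function names are hypothetical, registers are represented as unsigned 32-bit integers, and a misaligned access is merely reported rather than raising the address error exception that the hardware would.

```python
def sign_extend(value, bits):
    """Sign-extend a `bits`-wide value to a 32-bit register image."""
    value &= (1 << bits) - 1
    sign = 1 << (bits - 1)
    return ((value ^ sign) - sign) & 0xFFFFFFFF


def is_aligned(address, size):
    """True when `address` is a multiple of the access size (1, 2, or 4)."""
    return address % size == 0


# A byte H'80 and a word H'FFFE become negative longwords when loaded.
print("H'%08X" % sign_extend(0x80, 8))
print("H'%08X" % sign_extend(0xFFFE, 16))
# A word access at an odd address would cause an address error.
print(is_aligned(0x1001, 2))
```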
The hardware stack area, which is referred to by the hardware stack pointer (SP, R15), uses only longword data starting from address 4n because this area stores the program counter (PC) and status register (SR). See the hardware manual for more information on address errors.

[Figure 3.2: a longword at address 4n (bits 31 to 0), made up of two words at addresses 2n and four bytes which, from bit 31 down to bit 0, are at addresses m, m + 1, m + 2, and m + 3]

Figure 3.2 Data Format in Memory (Big Endian)

Byte data is arranged as shown below for products with a built-in little endian function. To determine whether a specific product supports little endian operation, refer to the corresponding hardware manual.

[Figure 3.3: a longword at address 4n (bits 31 to 0), made up of two words at addresses 2n and four bytes which, from bit 31 down to bit 0, are at addresses m + 3, m + 2, m + 1, and m]

Figure 3.3 Data Format in Memory (Little Endian)

3.3 Immediate Data Format

Byte immediate data is located in the instruction code. Immediate data accessed by the MOV, ADD, and CMP/EQ instructions is sign-extended and is handled in registers as longword data. Immediate data accessed by the TST, AND, OR, and XOR instructions is zero-extended and is handled as longword data. Consequently, AND instructions with immediate data always clear the upper 24 bits of the destination register.

Word or longword immediate data is not located in the instruction code but rather is stored in a memory table. The memory table is accessed by an immediate data transfer instruction (MOV) using the PC relative addressing mode with displacement. Specific examples are given in section 7, CPU Core Instruction Features, instruction 8, and table 7.4.

3.4 DSP Type Data Formats

The SH-DSP uses three different data formats for instructions: the fixed decimal point data format, the integer data format, and the logical data format.

The DSP type fixed decimal point data format places a binary decimal point between bits 31 and 30. This data format can have guard bits, no guard bits, or be multiplication input.
The valid bit lengths and values displayed vary for each.

The DSP type integer data format places a binary decimal point between bits 16 and 15. This data format can have guard bits, no guard bits, or be a shift amount. The valid bit lengths and values displayed vary for each. The shift amount for an arithmetic shift (PSHA) is a seven-bit field between -64 and +63, although only values between -32 and +32 are valid. The shift amount for a logical shift (PSHL) is a six-bit field, although, in the same fashion, only values between -16 and +16 are valid.

The DSP type logical data format has no decimal point.

The data format and valid data length vary with the instruction and DSP register.

Figure 3.4 shows the three DSP data formats and the position of the two binary decimal points, as well as the SuperH data format (as reference).

[Figure 3.4: bit layouts of the following formats, with sign bit S, binary decimal point, and bits unrelated to processing (ignored) marked]

Format                          Variant                     Valid bits   Range
DSP fixed decimal point data    With guard bits             39 to 0      -2^8 to +2^8 - 2^-31
                                No guard bits               31 to 0      -1 to +1 - 2^-31
                                Multiplication input        31 to 16     -1 to +1 - 2^-15
DSP integer data                With guard bits             39 to 16     -2^23 to +2^23 - 1
                                No guard bits               31 to 16     -2^15 to +2^15 - 1
                                Arithmetic shift (PSHA)     22 to 16     -32 to +32
                                Logical shift (PSHL)        21 to 16     -16 to +16
DSP logical data                                            31 to 16     (16 bits)
SuperH integer (reference)                                  31 to 0      -2^31 to +2^31 - 1

Figure 3.4 DSP Data Formats

3.5 DSP Instructions and Data Formats

The data format and valid data length vary with the instruction and DSP register. Instructions that access the DSP data registers fall into three categories: DSP data processing, X and Y data transfer processing, and single data transfer processing.

3.5.1 DSP Data Processing

When the A0 or A1 register is used as the source register in DSP fixed decimal point data processing, the guard bits (bits 32 to 39) are enabled. When any other register is used as the source register (M0, M1, X0, X1, Y0, or Y1), the sign-extended portion of the register data goes to bits 32 to 39.
When the A0 or A1 register is used as the destination register, the guard bits (bits 32 to 39) are enabled. When any other register is used as the destination register, bits 32 to 39 of the resulting data are ignored.

DSP integer data processing is the same as DSP fixed decimal point data processing. The bottom word (the bottom 16 bits, or bits 0 to 15) of the source register, however, is ignored. The bottom word of the destination register is cleared with zeros.

In DSP logical data processing, the top word (the top 16 bits, or bits 16 to 31) of the source register is enabled. The bottom word and the guard bits of registers A0 and A1 are ignored. The top word of the destination register is enabled. The bottom word and the guard bits of registers A0 and A1 are cleared with zeros.

3.5.2 X and Y Data Transfers

The MOVX.W and MOVY.W instructions access the X and Y memory through the 16-bit X and Y data buses. The part of the data loaded to a register or stored from a register is the top word (bits 16 to 31). On a load, the bottom word is cleared with zeros.

3.5.3 Single Data Transfers

The MOVS.W and MOVS.L instructions can access any memory through the instruction data bus (IDB). All DSP registers are connected to the IDB and can serve as either the source or the destination register during a data transfer. There are two data transfer modes: word and longword. In word mode, data is loaded to the top word of the DSP register or stored from the top word, except for the A0G and A1G registers. In longword mode, data is loaded to the 32 bits of the DSP register or stored from the 32 bits, except for the A0G and A1G registers.

In single data transfers, the A0G and A1G registers can be handled as independent registers. Eight bits of data can be loaded to or stored from the A0G and A1G registers.

When the A0G or A1G register is the source register, only eight bits are stored from the register. The top bits are sign-extended.
When the A0G or A1G register is the destination register, the bottom eight bits are loaded to the register. The A0 and A1 registers are not cleared with zeros, so their values are preserved.

Tables 3.1 and 3.2 list the data formats in the registers for the DSP instructions. With some instructions, not all registers can be accessed. For example, the PMULS instruction can specify the A1 register as the source register, but not the A0 register. For more information, see the description of the instruction.

Figure 3.5 shows the relationship between the DSP registers and buses during data transfers.

Table 3.1 Data Format of DSP Instruction Source Register

Register                  Operation       Instruction                       Guard bits (39 to 32) and register bits (31 to 16, 15 to 0)
A0, A1                    DSP operation   Fixed decimal, PDMSB, PSHA        40-bit data
                                          Integer                           24-bit data
                                          Logic, PSHL, PMULS                16-bit data
                          Data transfer   MOVX.W, MOVY.W, MOVS.W            16-bit data
                                          MOVS.L                            32-bit data
A0G, A1G                  Data transfer   MOVS.W                            Data (guard bits)
                                          MOVS.L                            Data (guard bits)
X0, X1, Y0, Y1, M0, M1    DSP operation   Fixed decimal, PDMSB, PSHA        Sign*, 32-bit data
                                          Integer                           16-bit data
                                          Logic, PSHL, PMULS                16-bit data
                          Data transfer   MOVS.W                            16-bit data
                                          MOVS.L                            32-bit data

Note: * The sign is extended and stored in the ALU's guard bits.

Table 3.2 Data Format of DSP Instruction Destination Register

Register                  Operation       Instruction                       Guard bits (39 to 32)   Bits 31 to 16 and 15 to 0
A0, A1                    DSP operation   Fixed decimal, PSHA, PMULS        (Sign extend)           40-bit result (including guard bits)
                                          Integer, PDMSB                    (Sign extend)           24-bit result (including guard bits); bits 15 to 0 cleared to 0
                                          Logic, PSHL                       Clear to 0              16-bit result; bits 15 to 0 cleared to 0
                          Data transfer   MOVS.W                            Sign extend             16-bit data; bits 15 to 0 cleared to 0
                                          MOVS.L                            Sign extend             32-bit data
A0G, A1G                  Data transfer   MOVS.W                            Data                    Not updated
                                          MOVS.L                            Data                    Not updated
X0, X1, Y0, Y1, M0, M1    DSP operation   Fixed decimal, PSHA, PMULS                                32-bit result
                                          Integer, logic, PDMSB, PSHL                               16-bit result; bits 15 to 0 cleared to 0
                          Data transfer   MOVX.W, MOVY.W, MOVS.W                                    16-bit data; bits 15 to 0 cleared to 0
                                          MOVS.L                                                    32-bit data

[Figure 3.5: DSP registers A0G, A1G (bits 39 to 32), A0, A1, M0, M1, X0, X1, Y0, Y1 (bits 31 to 0), and DSR, connected to the main bus (32 bits; MOVS.W, MOVS.L), to the XDB and YDB (16 bits each; MOVX.W, MOVY.W, using register bits 31 to 16), and, for the guard bits, to bits 7 to 0 of the main bus (8 bits; MOVS.W, MOVS.L)]

Figure 3.5 Relationship between DSP Registers and Buses during Data Transfer

Section 4 Instruction Features

4.1 RISC-Type Instruction Set

All instructions are RISC type. Their features are detailed in this section.

16-bit fixed length: All instructions are 16 bits long, increasing program coding efficiency.

One instruction/cycle: Basic instructions can be executed in one cycle using the pipeline system. Instructions are executed in 50 ns at 20 MHz and in 35 ns at 28.7 MHz.

Data length: Longword is the standard data length for all operations. Memory can be accessed in bytes, words, or longwords. Byte or word data accessed from memory is sign-extended and calculated with longword data (table 4.1). Immediate data is sign-extended for arithmetic operations or zero-extended for logic operations. It also is calculated with longword data.

Table 4.1 Sign Extension of Word Data

SH-1/SH-2/SH-DSP CPU       Description                                                     Example for other CPU
MOV.W  @(disp,PC),R1       Data is sign-extended to 32 bits, and R1 becomes H'00001234.    ADD.W  #H'1234,R0
ADD    R1,R0               It is next operated upon by an ADD instruction.
.........
.DATA.W H'1234

Note: The address of the immediate data is accessed by @(disp, PC).

Load-store architecture: Basic operations are executed between registers.
For operations that involve memory access, data is loaded to the registers and executed (load-store architecture). Instructions such as AND that manipulate bits, however, are executed directly in memory.

Delayed branch instructions: Unconditional branch instructions are delayed. Pipeline disruption during branching is reduced by first executing the instruction that follows the branch instruction, and then branching (table 4.2). With delayed branching, branching occurs after execution of the slot instruction. However, operations such as register changes are executed in the order of delayed branch instruction, then delay slot instruction. For example, even if the register in which the branch destination address has been loaded is changed by the delay slot instruction, the branch will still be made using the value of the register prior to the change as the branch destination address.

Table 4.2 Delayed Branch Instructions

SH-1/SH-2/SH-DSP CPU       Description                                     Example for other CPU
BRA    TRGET               Executes an ADD before branching to TRGET.      ADD.W  R1,R0
ADD    R1,R0                                                               BRA    TRGET

Multiplication/accumulation operation:
SH-1 CPU: 16-bit × 16-bit → 32-bit multiplication operations are executed in one to three cycles. 16-bit × 16-bit + 42-bit → 42-bit multiplication/accumulation operations are executed in two to three cycles.
SH-2/SH-DSP CPU: 16-bit × 16-bit → 32-bit multiplication operations are executed in one to two cycles. 16-bit × 16-bit + 64-bit → 64-bit multiplication/accumulation operations are executed in two to three cycles. 32-bit × 32-bit → 64-bit multiplication and 32-bit × 32-bit + 64-bit → 64-bit multiplication/accumulation operations are executed in two to four cycles.

T bit: The T bit in the status register changes according to the result of the comparison, and in turn is the condition (true/false) that determines if the program will branch (table 4.3). The number of instructions that alter the T bit in the status register is kept to a minimum to improve the processing speed.
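The compare-then-branch pattern of table 4.3 can be modeled in a few lines of Python. This is an illustrative sketch rather than a description of the hardware: the function names are hypothetical, register values are passed as unsigned 32-bit integers, and CMP/GE is modeled as a signed comparison that sets T when Rn is greater than or equal to Rm.

```python
def to_signed32(x):
    """Interpret a 32-bit register image as a signed integer."""
    x &= 0xFFFFFFFF
    return x - 0x100000000 if x & 0x80000000 else x


def cmp_ge(rm, rn):
    """Model of CMP/GE Rm,Rn: returns the new T bit (1 when Rn >= Rm, signed)."""
    return 1 if to_signed32(rn) >= to_signed32(rm) else 0


def next_target(t, bt_target, bf_target):
    """BT branches when T = 1; otherwise the following BF branches."""
    return bt_target if t == 1 else bf_target


# CMP/GE R1,R0 followed by BT TRGET0 and BF TRGET1, with R0 = 7 and R1 = 5.
r0, r1 = 7, 5
t = cmp_ge(rm=r1, rn=r0)
print(next_target(t, "TRGET0", "TRGET1"))
```

Because only the comparison writes T, an ADD placed between the compare and the branch would not disturb the branch condition, which is the point the second row of table 4.3 makes.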
Table 4.3 T Bit

SH-1/SH-2/SH-DSP CPU       Description                                               Example for other CPU
CMP/GE R1,R0               T bit is set when R0 ≥ R1. The program branches to        CMP.W  R1,R0
BT     TRGET0              TRGET0 when R0 ≥ R1 and to TRGET1 when R0 < R1.           BGE    TRGET0
BF     TRGET1                                                                        BLT    TRGET1

ADD    #-1,R0              T bit is not changed by ADD. T bit is set when R0 = 0.    SUB.W  #1,R0
CMP/EQ #0,R0               The program branches if R0 = 0.                           BEQ    TRGET
BT     TRGET

Immediate data: Byte immediate data is located in the instruction code. Word or longword immediate data is not input via instruction codes but is stored in a memory table. The memory table is accessed by an immediate data transfer instruction (MOV) using the PC relative addressing mode with displacement (table 4.4).

Table 4.4 Immediate Data Accessing

Classification         SH-1/SH-2/SH-DSP CPU       Example for other CPU
8-bit immediate        MOV    #H'12,R0            MOV.B  #H'12,R0
16-bit immediate       MOV.W  @(disp,PC),R0       MOV.W  #H'1234,R0
                       .................
                       .DATA.W H'1234
32-bit immediate       MOV.L  @(disp,PC),R0       MOV.L  #H'12345678,R0
                       .................
                       .DATA.L H'12345678

Note: The address of the immediate data is accessed by @(disp, PC).

Absolute address: When data is accessed by absolute address, the value of the absolute address is placed in the memory table beforehand. Loading that value as immediate data when the instruction is executed transfers it to a register, and the data is then accessed in the indirect register addressing mode.

Table 4.5 Absolute Address

Classification         SH-1/SH-2/SH-DSP CPU       Example for other CPU
Absolute address       MOV.L  @(disp,PC),R1       MOV.B  @H'12345678,R0
                       MOV.B  @R1,R0
                       ..................
                       .DATA.L H'12345678

16-bit/32-bit displacement: When data is accessed by 16-bit or 32-bit displacement, the displacement value is placed in the memory table beforehand. Loading that value as immediate data when the instruction is executed transfers it to a register, and the data is then accessed in the indirect indexed register addressing mode.
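The memory-table accesses described above all depend on the PC relative addressing mode with displacement, whose effective address is given in table 4.7 as PC + disp × 2 for words and (PC & H'FFFFFFFC) + disp × 4 for longwords. The Python sketch below works through that arithmetic. It assumes that the PC value is the address of the MOV instruction plus 4, as stated in section 2.3, and that the instruction is not in a delay slot; the function name and the sample addresses are hypothetical.

```python
def pc_relative_ea(instr_addr, disp, size):
    """Effective address of MOV.W/MOV.L @(disp,PC),Rn.

    instr_addr: address of the MOV instruction itself
    disp:       8-bit displacement field (zero-extended, 0 to 255)
    size:       2 for MOV.W, 4 for MOV.L
    """
    if not 0 <= disp <= 0xFF:
        raise ValueError("disp is an 8-bit zero-extended field")
    pc = (instr_addr + 4) & 0xFFFFFFFF  # PC points four bytes ahead
    if size == 2:
        return (pc + disp * 2) & 0xFFFFFFFF
    if size == 4:
        # The lowest two bits of the PC are masked for longword access.
        return ((pc & 0xFFFFFFFC) + disp * 4) & 0xFFFFFFFF
    raise ValueError("size must be 2 (word) or 4 (longword)")


# A MOV.L at H'1002 with disp = 1 reads its constant from H'1008.
print("H'%08X" % pc_relative_ea(0x1002, 1, 4))
```

The masking step is why a longword constant in the memory table must sit on a 4n boundary even when the MOV.L instruction itself is at an address that is only a multiple of 2.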
table 4.6 displacement accessing

  classification:         16-bit displacement
  sh-1/sh-2/sh-dsp cpu:   mov.w @(disp,pc),r0
                          mov.w @(r0,r1),r2
                          ..................
                          .data.w h'1234
  example for other cpu:  mov.w @(h'1234,r1),r2

4.2 addressing modes

the addressing modes and the effective address calculation performed by the cpu core are described below.

table 4.7 addressing modes and effective addresses

  direct register addressing
    instruction format:  rn
    effective address:   the effective address is register rn. (the operand is the contents of register rn.)

  indirect register addressing
    instruction format:  @rn
    effective address:   the effective address is the content of register rn.
    formula:             rn

  post-increment indirect register addressing
    instruction format:  @rn+
    effective address:   the effective address is the content of register rn. a constant is added to the content of rn after the instruction is executed. 1 is added for a byte operation, 2 for a word operation, or 4 for a longword operation.
    formula:             rn (after the instruction is executed)
                         byte: rn + 1 → rn
                         word: rn + 2 → rn
                         longword: rn + 4 → rn

  pre-decrement indirect register addressing
    instruction format:  @-rn
    effective address:   the effective address is the value obtained by subtracting a constant from rn. 1 is subtracted for a byte operation, 2 for a word operation, or 4 for a longword operation.
    formula:             byte: rn - 1 → rn
                         word: rn - 2 → rn
                         longword: rn - 4 → rn
                         (instruction executed with rn after calculation)

  indirect register addressing with displacement
    instruction format:  @(disp:4, rn)
    effective address:   the effective address is rn plus a 4-bit displacement (disp). the value of disp is zero-extended, and remains the same for a byte operation, is doubled for a word operation, or is quadrupled for a longword operation.
    formula:             byte: rn + disp
                         word: rn + disp × 2
                         longword: rn + disp × 4

  indirect indexed register addressing
    instruction format:  @(r0, rn)
    effective address:   the effective address is the rn value plus r0.
    formula:             rn + r0

  indirect gbr addressing with displacement
    instruction format:  @(disp:8, gbr)
    effective address:   the effective address is the gbr value plus an 8-bit displacement (disp). the value of disp is zero-extended, and remains the same for a byte operation, is doubled for a word operation, or is quadrupled for a longword operation.
    formula:             byte: gbr + disp
                         word: gbr + disp × 2
                         longword: gbr + disp × 4

  indirect indexed gbr addressing
    instruction format:  @(r0, gbr)
    effective address:   the effective address is the gbr value plus r0.
    formula:             gbr + r0

  pc relative addressing with displacement
    instruction format:  @(disp:8, pc)
    effective address:   the effective address is the pc value plus an 8-bit displacement (disp). the value of disp is zero-extended, and disp is doubled for a word operation, or is quadrupled for a longword operation. for a longword operation, the lowest two bits of the pc are masked.
    formula:             word: pc + disp × 2
                         longword: (pc & h'fffffffc) + disp × 4

  pc relative addressing
    instruction format:  disp:8
    effective address:   the effective address is the pc value plus an 8-bit displacement (disp) that is sign-extended and doubled.
    formula:             pc + disp × 2

    instruction format:  disp:12
    effective address:   the effective address is the pc value plus a 12-bit displacement (disp) that is sign-extended and doubled.
    formula:             pc + disp × 2

    instruction format:  rn*
    effective address:   the effective address is the register pc plus rn.
    formula:             pc + rn

  immediate addressing
    instruction format:  #imm:8
    effective address:   the 8-bit immediate data (imm) for the tst, and, or, and xor instructions is zero-extended.

    instruction format:  #imm:8
    effective address:   the 8-bit immediate data (imm) for the mov, add, and cmp/eq instructions is sign-extended.

    instruction format:  #imm:8
    effective address:   immediate data (imm) for the trapa instruction is zero-extended and is quadrupled.

note: * applies to the sh-2 and sh-dsp. this addressing mode is not supported by the sh-1.

4.3 instruction format

the instruction format table, table 4.8, refers to the source operand and the destination operand. the meaning of the operand depends on the instruction code. the symbols are used as follows:

  xxxx: instruction code
  mmmm: source register
  nnnn: destination register
  iiii: immediate data
  dddd: displacement

table 4.8 instruction formats

each bit layout is written from bit 15 (left) to bit 0 (right).

  0 format: xxxx xxxx xxxx xxxx
    source: none; destination: none; example: nop

  n format: xxxx nnnn xxxx xxxx
    source: none; destination: nnnn: direct register; example: movt rn
    source: control register or system register; destination: nnnn: direct register; example: sts mach,rn
    source: control register or system register; destination: nnnn: indirect pre-decrement register; example: stc.l sr,@-rn

  m format: xxxx mmmm xxxx xxxx
    source: mmmm: direct register; destination: control register or system register; example: ldc rm,sr
    source: mmmm: indirect post-increment register; destination: control register or system register; example: ldc.l @rm+,sr
    source: mmmm: direct register; destination: none; example: jmp @rm
    source: mmmm: pc relative using rm*; destination: none; example: braf rm

  nm format: xxxx nnnn mmmm xxxx
    source: mmmm: direct register; destination: nnnn: direct register; example: add rm,rn
    source: mmmm: direct register; destination: nnnn: indirect register; example: mov.l rm,@rn
    source: mmmm: indirect post-increment register (multiply/accumulate) and nnnn*: indirect post-increment register (multiply/accumulate); destination: mach, macl; example: mac.w @rm+,@rn+
    source: mmmm: indirect post-increment register; destination: nnnn: direct register; example: mov.l @rm+,rn
    source: mmmm: direct register; destination: nnnn: indirect pre-decrement register; example: mov.l rm,@-rn
    source: mmmm: direct register; destination: nnnn: indirect indexed register; example: mov.l rm,@(r0,rn)
  md format: xxxx xxxx mmmm dddd
    source: mmmmdddd: indirect register with displacement; destination: r0 (direct register); example: mov.b @(disp,rm),r0

  nd4 format: xxxx xxxx nnnn dddd
    source: r0 (direct register); destination: nnnndddd: indirect register with displacement; example: mov.b r0,@(disp,rn)

  nmd format: xxxx nnnn mmmm dddd
    source: mmmm: direct register; destination: nnnndddd: indirect register with displacement; example: mov.l rm,@(disp,rn)
    source: mmmmdddd: indirect register with displacement; destination: nnnn: direct register; example: mov.l @(disp,rm),rn

  d format: xxxx xxxx dddd dddd
    source: dddddddd: indirect gbr with displacement; destination: r0 (direct register); example: mov.l @(disp,gbr),r0
    source: r0 (direct register); destination: dddddddd: indirect gbr with displacement; example: mov.l r0,@(disp,gbr)
    source: dddddddd: pc relative with displacement; destination: r0 (direct register); example: mova @(disp,pc),r0
    source: dddddddd: pc relative; destination: none; example: bf label

  d12 format: xxxx dddd dddd dddd
    source: dddddddddddd: pc relative; destination: none; example: bra label (label = disp + pc)

  nd8 format: xxxx nnnn dddd dddd
    source: dddddddd: pc relative with displacement; destination: nnnn: direct register; example: mov.l @(disp,pc),rn

  i format: xxxx xxxx iiii iiii
    source: iiiiiiii: immediate; destination: indirect indexed gbr; example: and.b #imm,@(r0,gbr)
    source: iiiiiiii: immediate; destination: r0 (direct register); example: and #imm,r0
    source: iiiiiiii: immediate; destination: none; example: trapa #imm

  ni format: xxxx nnnn iiii iiii
    source: iiiiiiii: immediate; destination: nnnn: direct register; example: add #imm,rn

notes: in multiply/accumulate instructions, nnnn is the source register. the braf instruction applies to the sh-2 and sh-dsp; it is not supported by the sh-1.

4.4 dsp

dsp operations and data transfers are listed below:

alu fixed decimal point operations: these are fixed decimal point operations with either 40-bit (with guard bits) or 32-bit (with no guard bits) fixed decimal point data. these include addition, subtraction, and comparison instructions.

alu integer operations: these are integer arithmetic operations with either 24-bit (with guard bits) or 16-bit (with no guard bits) integer data. they include increment and decrement instructions.

alu logical operations: these are logical operations with 16-bit logical data. they include and, or, and exclusive or.

fixed decimal point multiplication: this is fixed decimal point multiplication (arithmetic operation) of the top 16 bits of fixed decimal point data. condition bits such as the dc bit are not updated.

shift operations: these are arithmetic and logical shift operations. arithmetic shift operations are arithmetic shifts of 40 bits (with guard bits) or 32 bits (with no guard bits) of fixed decimal point data. logical shift operations are logical operations on 16 bits of logical data. the amount of the arithmetic shift operation is -32 to +32 (negative for right shifts, positive for left shifts); for logical shifts, the amount is -16 to +16.

msb detection instruction: this operation finds the amount of the shift needed to normalize the data. it finds the position of the msb in either 40-bit (with guard bits) or 32-bit (with no guard bits) fixed decimal point data and gives it as either 24-bit (with guard bits) or 16-bit (with no guard bits) integer data.

rounding operation: rounds 40-bit fixed decimal point data (with guard bits) to 24 bits, or 32-bit fixed decimal point data (with no guard bits) to 16 bits.

data transfers: data transfers consist of x and y data transfers, which load or store 16-bit data to and from x and y memory, and single data transfers, which load and store 16- or 32-bit data from all memories. two x and y data transfers can be processed in parallel. condition bits such as the dc bit are not updated.

the operation instructions include both conditional operation instructions and instructions that are conditionally executed depending on the dc bit. condition bits such as the dc bit are not updated by conditional instructions. their settings vary for arithmetic operations, logical operations, arithmetic shifts, and logical shifts. for msb detection instructions and rounding instructions, the condition bits are set in the same way as for arithmetic operations.

arithmetic operations include overflow preventing instructions (saturation operations). when saturation operation is specified with the s bit in the sr register, the maximum (positive) or minimum (negative) value is stored when the result of the operation overflows.

4.5 dsp data addressing

dsp instructions perform two different types of memory access. one uses the x and y data transfer instructions (movx.w and movy.w) while the other uses the single data transfer instructions (movs.w and movs.l). data addressing for these two types of instructions also differs. table 4.10 summarizes the data transfer instructions.

table 4.10 summary of data transfer instructions

  item                    x and y data transfer processing        single data transfer processing
                          (movx.w and movy.w)                     (movs.w and movs.l)
  address registers       ax: r4, r5; ay: r6, r7                  as: r2, r3, r4, r5
  index registers         ix: r8; iy: r9                          is: r8
  addressing              nop/inc(+2)/index addition:             nop/inc(+2, +4)/index addition:
                          post-increment                          post-increment
                                                                  dec(-2, -4): pre-decrement
  modulo addressing       available                               not available
  data buses              xdb, ydb                                idb
  data length             16 bits (word)                          16 or 32 bits (word or longword)
  bus contention          none                                    occurs
  memory                  x and y data memories                   all memory spaces
  source registers        da: a0, a1                              ds: a0/a1, m0/m1, x0/x1, y0/y1, a0g, a1g
  destination registers   dx: x0/x1; dy: y0/y1                    ds: a0/a1, m0/m1, x0/x1, y0/y1, a0g, a1g

4.5.1 x and y data addressing

dsp instructions allow the x and y data memories to be accessed simultaneously using the movx.w and movy.w instructions. dsp instructions have two pointers so they can access the x and y data memories simultaneously. dsp instructions have only pointer addressing; immediate addressing is not available. the address registers are divided in two. the r4 and r5 registers become the x memory address register (ax) while the r6 and r7 registers become the y memory address register (ay).
the following three types of addressing may be used with x and y data transfer instructions.

address registers with no update: the ax and ay registers are address pointers. they are not updated.

addition index register addressing: the ax and ay registers are address pointers. the values of the ix and iy registers are added to the ax and ay registers respectively after data transfer (post-increment).

increment address register addressing: the ax and ay registers are address pointers. +2 is added to them after data transfer (post-increment).

each of the address pointers has an index register. register r8 becomes the index register (ix) for the x memory address register (ax); register r9 becomes the index register (iy) for the y memory address register (ay).

x and y data transfer instructions are processed in words: x and y data memory is accessed in 16-bit units, so increment processing adds two to the address register. to decrement an address register, set -2 in the index register and specify addition index register addressing. for x and y data addressing, only bits 1 to 15 of the address pointer are valid. when performing x and y data addressing, make sure to write 0 to bit 0 of the address pointer and index register.

figure 4.1 shows the x and y data transfer addressing. when the x or y bus is used to access x memory or y memory, the upper words of ax (r4 or r5) and ay (r6 or r7) are ignored. also, the result of an ay+ or ay+iy update is stored in the lower word of ay, and the previous value of the upper word is retained.

[figure 4.1: x and y data transfer addressing. r4[ax] and r5[ax] are updated through the alu by +2 (inc), r8[ix], or +0 (no update); r6[ay] and r7[ay] are updated through the au (note 1) by +2 (inc), r9[iy], or +0 (no update).]

notes: 1. adder added for dsp processing.
2. all three addressing methods (increment, index register addition (ix, iy), and no update) are post-increment methods. to decrement the address pointer, set the index register to -2 or -4.

4.5.2 single data addressing

the dsp has single data transfer instructions (movs.w and movs.l) that load data to dsp registers and store data from dsp registers. with these instructions, the r2 to r5 registers are used as address registers (as) for single data transfers. there are four types of data addressing for single data transfer instructions.

address registers with no update: the as register is the address pointer. it is not updated.

addition index register addressing: the as register is the address pointer. the value of the is register is added to the as register after data transfer (post-increment).

increment address register addressing: the as register is the address pointer. +2 or +4 is added to it after data transfer (post-increment).

decrement address register addressing: the as register is the address pointer. -2 or -4 is added to it (that is, 2 or 4 is subtracted from it) before data transfer (pre-decrement).

the address pointer uses the r8 register as its index register (is). figure 4.2 shows the single data transfer addressing.

[figure 4.2: single data transfer addressing. r2[as], r3[as], r4[as], and r5[as] are updated through the alu by -2/-4 (dec), +2/+4 (inc), r8[is], or +0 (no update).]

note: there are four addressing methods (no update, index register addition (is), increment, and decrement). index register addition and increment are post-increment methods. decrement is a pre-decrement method.

4.5.3 modulo addressing

like other dsps, the sh-dsp has a modulo addressing mode. address registers are updated in the same way in this mode, except that when the address pointer reaches the preset modulo end address, it is set to the modulo start address. modulo addressing is only effective for x and y data transfer instructions (movx.w and movy.w). when the dmx bit of the sr register is set, the x address register enters modulo addressing mode; when the dmy bit is set, the y address register enters modulo addressing mode.
modulo addressing cannot be used on both x and y address registers at once. accordingly, do not set dmx and dmy at the same time. should they both be set at once, only dmy will be valid. the mod register is provided for specifying the start and end addresses of the modulo address area. the mod register stores the ms (modulo start) and me (modulo end) values. the following shows how to use the modulo register (ms and me).

  mov.l modaddr,rn      ; rn=modend, modstart
  ldc rn,mod            ; me=modend, ms=modstart
  modaddr:  .data.w mend    ; lower 16bit of modend
            .data.w mstart  ; lower 16bit of modstart
  modstart: .data
            :
  modend:   .data

set the start and end addresses in ms and me and then set the dmx or dmy bit to 1. the bottom 16 bits of the address register are compared to me. if they match me, the start address ms is stored in the bottom 16 bits of the address register. the maximum modulo size is 64 kbytes. this is ample for accessing the x and y data memory. figure 4.3 shows a block diagram of modulo addressing.

[figure 4.3: modulo addressing block diagram. blocks shown: instruction (movx/movy), dmx, dmy, cont, ms, me, cmp, alu, au, abx, aby, r4[ax], r5[ax], r6[ay], r7[ay], r8[ix], r9[iy], with +2 and +0 update inputs and the xab and yab address outputs (bits 15 to 1).]

the following is an example of modulo addressing.

  ms=h'c008; me=h'c00c; r4=h'c008;
  dmx=1; dmy=0; (sets modulo addressing for address register ax (r4, r5))

the above setting changes the r4 register as shown below.

  r4: h'c008
  inc. r4: h'c00a
  inc. r4: h'c00c
  inc. r4: h'c008 (becomes the modulo start address when the modulo end address is reached)

place data so the top 16 bits of the modulo start and end addresses are the same, since only the bottom 16 bits of the address register are replaced with the modulo start address.

note: when using addition index as the dsp data addressing, the address pointer may exceed the modulo end address without matching me. should this occur, the address pointer will not return to the modulo start address.
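The wrap-around rule described in this section can be sketched in C as follows. This is a minimal illustration under stated assumptions, not a model of the hardware: the names modulo_update and modulo_sequence are hypothetical, and the update is reduced to the compare-with-me rule described above (if the lower 16 bits equal me, they are replaced with ms; otherwise the index is added).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: one post-update of an X/Y address register in
   modulo mode. If the lower 16 bits match ME, they are replaced with MS
   and the upper 16 bits are kept; otherwise the index is added. */
static uint32_t modulo_update(uint32_t addr, int32_t index,
                              uint16_t ms, uint16_t me)
{
    if ((addr & 0xFFFFu) == me)
        return (addr & 0xFFFF0000u) | ms;
    return addr + (uint32_t)index;
}

/* Hypothetical helper: records the pointer value used by each of n
   successive accesses, starting from start and stepping by index. */
static void modulo_sequence(uint32_t start, int32_t index,
                            uint16_t ms, uint16_t me,
                            uint32_t *out, size_t n)
{
    assert(out != NULL);
    uint32_t addr = start;
    for (size_t i = 0; i < n; i++) {
        out[i] = addr;                     /* address used for this access */
        addr = modulo_update(addr, index, ms, me);
    }
}
```

With ms=h'c008, me=h'c00c, a start value of h'c008 and an increment of +2, modulo_sequence reproduces the r4 sequence listed above. With an index of +8 the pointer steps from h'c008 to h'c010 without ever equalling me, which is the situation the note warns about.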
4.5.4 dsp addressing operation

the following shows how dsp addressing works in the execution stage (ex) of a pipeline (including modulo addressing).

  if ( operation is movx.w or movy.w ) {
      abx=ax; aby=ay;
      /* memory access cycle uses abx and aby. the addresses to be used have not been updated */

      /* ax is one of r4,5 */
      if ( dmx==0 || (dmx==1 && dmy==1) )
          ax=ax+(+2 or r8[ix] or +0);           /* inc,index,not-update */
      else if (!not-update)
          ax=modulo( ax, (+2 or r8[ix]) );

      /* ay is one of r6,7 */
      if ( dmy==0 )
          ay=ay+(+2 or r9[iy] or +0);           /* inc,index,not-update */
      else if (!not-update)
          ay=modulo( ay, (+2 or r9[iy]) );
  }
  else if ( operation is movs.w or movs.l ) {
      if ( addressing is nop, inc, add-index-reg ) {
          mab=as;
          /* memory access cycle uses mab. the address to be used has not been updated */
          /* as is one of r2 to r5 */
          as=as+(+2 or +4 or r8[is] or +0);     /* inc,index,not-update */
      }
      else {  /* decrement, pre-update */
          /* as is one of r2 to r5 */
          as=as+(-2 or -4);
          mab=as;
          /* memory access cycle uses mab. the address to be used has been updated */
      }
  }

  /* the value to be added to the address register depends on addressing operations.
     for example, (+2 or r8[ix] or +0) means that
       +2:     if operation is increment
       r8[ix]: if operation is add-index-reg
       +0:     if operation is not-update */

  function modulo ( addrreg, index ) {
      if ( addrreg[15:0]==me )
          addrreg[15:0]=ms;
      else
          addrreg=addrreg+index;
      return addrreg;
  }

4.6 instruction formats for dsp instructions

new instructions have been added to the sh-dsp for use in digital signal processing. the new instructions are divided into two groups:

  double and single data transfer instructions for memory and dsp registers (16 bits)
  parallel processing instructions processed by the dsp unit (32 bits)

figure 4.4 shows their instruction formats.
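The grouping above can be illustrated with a small C sketch that classifies an instruction by the upper bits of its first 16-bit word. It assumes the prefixes as they appear in figure 4.4 and tables 4.11 to 4.13 (0000 to 1110 for cpu core instructions, 111100 for double data transfers, 111101 for single data transfers, 111110 for parallel processing instructions). The type and function names are hypothetical, and the 111111 prefix is reported as unclassified because this section does not describe it.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical classification of SH-DSP instruction words, based only on
   the opcode prefixes listed in this section. */
typedef enum {
    SH_CPU_CORE,          /* bits 15-12 = 0000 to 1110               */
    SH_DSP_DOUBLE_XFER,   /* bits 15-10 = 111100, 16-bit instruction */
    SH_DSP_SINGLE_XFER,   /* bits 15-10 = 111101, 16-bit instruction */
    SH_DSP_PARALLEL,      /* bits 31-26 = 111110, 32-bit instruction */
    SH_UNCLASSIFIED       /* 111111: not covered by this section     */
} sh_insn_class;

/* first_word is the first (upper) 16-bit word of the instruction. */
static sh_insn_class classify_insn(uint16_t first_word)
{
    if ((first_word >> 12) != 0xFu)
        return SH_CPU_CORE;
    switch (first_word >> 10) {
    case 0x3C: return SH_DSP_DOUBLE_XFER;   /* 111100 */
    case 0x3D: return SH_DSP_SINGLE_XFER;   /* 111101 */
    case 0x3E: return SH_DSP_PARALLEL;      /* 111110 */
    default:   return SH_UNCLASSIFIED;      /* 111111 */
    }
}

/* Instruction length in bytes; 0 means the class is not described here. */
static unsigned insn_length_bytes(sh_insn_class c)
{
    switch (c) {
    case SH_DSP_PARALLEL: return 4;
    case SH_UNCLASSIFIED: return 0;
    default:              return 2;
    }
}

/* Small self-check showing the intended use. */
static void classify_self_check(void)
{
    assert(classify_insn(0xF800u) == SH_DSP_PARALLEL);
    assert(insn_length_bytes(classify_insn(0xF800u)) == 4);
}
```

A decoder built this way would fetch one 16-bit word, classify it, and fetch a second word only for the parallel processing class.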
cpu core instructions 0 0 0 0 to 1 1 1 0 double data transfer instructions single data transfer instructions parallel processing instructions b field a field a field a field 1 1 1 1 0 0 1 1 1 1 0 1 1 1 1 1 1 0 15 15 15 15 0 0 0 0 31 10 10 9 9 16 26 25 figure 4.4 instruction formats of dsp instructions 4.6.1 double and single data transfer instructions table 4.11 shows the instruction formats for double data transfer instructions. table 4.12 shows the instruction formats for single data transfer instructions 40 table 4.11 instruction formats for double data transfers category mnemonic 15 14 13 12 11 10 9 8 x memory nopx 11110 0 0 data transfers movx.w @ax,dx movx.w @ax+,dx movx.w @ax+ix,dx ax movx.w da,@ax movx.w da,@ax+ movx.w da,@ax+ix y memory nopy 11110 0 0 data transfers movy.w @ay,dy movy.w @ay+,dy movy.w @ay+iy,dy ay movy.w da,@ay movy.w da,@ay+ movy.w da,@ay+iy table 4.11 instruction formats for double data transfers (cont) category mnemonic 7 6 5 4 3 2 1 0 x memory nopx 0000 data transfers movx.w @ax,dx movx.w @ax+,dx movx.w @ax+ix,dx dx 0 0 1 1 1 0 1 movx.w da,@ax movx.w da,@ax+ movx.w da,@ax+ix da 1 0 1 1 1 0 1 y memory nopy 00 00 data transfers movy.w @ay,dy movy.w @ay+,dy movy.w @ay+iy,dy dy 0 0 1 1 1 0 1 movy.w da,@ay movy.w da,@ay+ movy.w da,@ay+iy da 1 0 1 1 1 0 1 ax: 0=r4, 1=r5 ay: 0=r6, 1=r7 dx: 0=x0, 1=x1 dy: 0=y0, 1=y1 da: 0=a0, 1=a1 41 table 4.12 instruction formats for single data transfers category mnemonic 15 14 13 12 11 10 9 8 single data transfer movs.w @?s,ds movs.w @as,ds movs.w @as+,ds movs.w @as+is,ds 111101 as 0: r4 1: r5 2: r2 movs.w ds,@a? movs.w ds,@as movs.w ds,@as+ movs.w ds,@as+is 3: r3 movs.l @ as,ds movs.l @as,ds movs.l @as+,ds movs.l @as+is,ds movs.l ds,@a? 
movs.l ds,@as movs.l ds,@as+ movs.l ds,@as+is table 4.12 instruction formats for single data transfers (cont) category mnemonic 7 6 5 43210 single data transfer movs.w @?s,ds movs.w @as,ds movs.w @as+,ds movs.w @as+is,ds ds 0: (*) 1: (*) 2: (*) 3: (*) 0 0 1 1 0 1 0 1 00 movs.w ds,@a? movs.w ds,@as movs.w ds,@as+ movs.w ds,@as+is 4: (*) 5: a1 6: (*) 7: a0 0 0 1 1 0 1 0 1 01 movs.l @?s,ds movs.l @as,ds movs.l @as+,ds movs.l @as+is,ds 8: x0 9: x1 a: y0 b: y1 0 0 1 1 0 1 0 1 10 movs.l ds,@a? movs.l ds,@as movs.l ds,@as+ movs.l ds,@as+is c: m0 d: a1g e:m1 f:a0g 0 0 1 1 0 1 0 1 11 note: system reserved code 42 4.6.2 parallel processing instructions parallel processing instructions are used by the sh-dsp to increase the execution efficiency of digital signal processing using the dsp unit. they are 32 bits long and four can be processed in parallel (one alu operation, one multiplication, and two data transfers). parallel processing instructions are divided into two fields, a and b. the data transfer instructions are defined in field a and the alu operation instruction and multiplication instruction are defined in field b. these instructions can be defined independently, processed independently, and can be executed simultaneously in parallel. table 4.13 lists the field a parallel data transfer instructions; figure 4.14 shows the field b alu operation instructions and multiplication instructions. the field a instructions are identical to the double data transfer instructions shown in table 4.11. table 4.13 field a parallel data transfer instructions category mnemonic 31 30 29 28 27 26 25 24 23 x memory nopx 11 111 00 0 data transfers movx.w @ax,dx movx.w @ax+,dx movx.w @ax+ix,dx ax dx movx.w da,@ax movx.w da,@ax+ movx.w da,@ax+ix da y memory nopy 0 data transfers movy.w @ay,dy movy.w @ay+,dy movy.w @ay+iy,dy movy.w da,@ay movy.w da,@ay+ movy.w da,@ay+iy ay 43 table 4.13 field a parallel data transfer instructions (cont) category mnemonic 22 21 20 19 18 17 16 15? 
x memory nopx 0 0 0 field b data transfers movx.w @ax,dx movx.w @ax+,dx movx.w @ax+ix,dx 00 1 1 1 0 1 movx.w da,@ax movx.w da,@ax+ movx.w da,@ax+ix 10 1 1 1 0 1 y memory nopy 00 00 data transfers movy.w @ay,dy movy.w @ay+,dy movy.w @ay+iy,dy dy 0 0 1 1 1 0 1 movy.w da,@ay movy.w da,@ay+ movy.w da,@ay+iy da 1 0 1 1 1 0 1 ax: 0=r4, 1=r5 ay: 0=r6, 1=r7 dx: 0=x0, 1=x1 dy: 0=y0, 1=y1 da: 0=a0, 1=a1 44 category mnemonic 14 13 12 10 9 8 7 6 5 4 3 2 1 0 15 dz 00 0 00 0 ?6 imm +16 ?32 imm +32 0 1 0se sf sx sy dgdu 00 0 00 1 01 0 0 01 1 1 01 1 0 10 0 0 00 dz 0: (* 1 ) 1: (* 1 ) 2: (* 1 ) 3: (* 1 ) 4: (* 1 ) 5: a1 6: (* 1 ) 7: a0 8: x0 9: x1 a: y0 b: y1 c: m0 d: (* 1 ) e: m1 f: (* 1 ) 0 0 01 0 0 00 1 1 0 1 0 1 0 0 1 0 1 0 1 1 1 1 1 1 1 0 01 1 1 0 0 1 1 1 1 0:x0 1:x1 2:y0 3:a1 0:x0 1:x1 2:a0 3:a1 0:x0 0:y0 1:y1 2:x0 3:a1 0:y0 0:m0 1:y0 1:y1 1:m1 2:a0 2:m0 2:a0 3:a1 3:m1 3:a1 01 0 pshl #imm, dz psha #imm, dz pmuls se, sf, dg reserved reserved reserved reserved pwsb sx, sy, dz pwad sx, sy, dz pabs sx, dz prnd sx, dz prnd sy, dz pabs sy, dz reserved psubc sx, sy, dz paddc sx, sy, dz pcmp sx, sy psub sx, sy, du pmuls se, sf, dg padd sx, sy, du pmuls se, sf, dg imm. 
shift six operand parallel instruction three operand instructions 31?7 25?6 26 10 field a 11 0 0 1 a b c d e figure 4.5 field b alu operation instructions and multiplication instructions 45 category a mnemonic 14 13 12 10 9 8 7 6 5 4 3 2 1 0 15 11 0 0 0 10 0 1 0 0 1 1 1 0 01 0 1 0 0 1 1 1 0 00 1 1 0 0 1 1 1 0 01 1 1 0 0 1 1 1 0 00 1 1 0 0 1 1 1 0 01 if cc 1 0* 3 00 1 0 0 1 1 1 reserved reserved reserved reserved reserved (if cc) *1 pshl sx, sy, dz (if cc) psha sx, sy, dz (if cc) psub sx, sy, dz (if cc) padd sx, sy, dz (if cc) pand sx, sy, dz (if cc) pxor sx, sy, dz (if cc) por sx, sy, dz (if cc) pdec sx, dz (if cc) pdec sy, dz (if cc) pinc sx, dz (if cc) pinc sy, dz (if cc) pclr dz (if cc) pdmsb sx, dz (if cc) pdmsb sy, dz (if cc) pneg sx, dz (if cc) pneg sy, dz (if cc) pcopy sx, dz (if cc) pcopy sy, dz (if cc) psts mach, dz (if cc) psts macl, dz (if cc) plds dz, macl (if cc) plds dz, mach conditional three operand instructions 00 if cc 1 0 field a sx 0:x0 1:x1 2:y0 3:y1 sy 0:y0 1:y1 2:m0 3:m1 dz 0:(* 1 ) 1:(* 1 ) 2:(* 1 ) 3:(* 1 ) 4:(* 1 ) 5:a1 6:(* 1 ) 7:a0 8:x0 9:x1 a:y0 b:y1 c:m0 d:(* 1 ) e:m1 f:(* 1 ) 10:dct 11:dcf 01* 2 31?7 25?6 26 11 11 notes: 1. 2. 3. [if cc]: dct (dc bit true), dcf (dc bit false), or none (unconditional instruction) unconditional system reserved code b c d e figure 4.5 field b alu operation instructions and multiplication instructions (cont) 46 4.7 alu fixed decimal point operations 4.7.1 function alu fixed decimal point operations basically work with a 32-bit unit to which 8 guard bits are added for a total of 40 bits. when the source operand is a register without guard bits, the register? sign bit is extended and copied to the guard bits. when the destination operand is a register without guard bits, the lower 32 bits of the operation result are stored in the destination register. alu fixed decimal point operations are performed between registers. the source and destination operands are selected independently from the dsp register. 
when there are guard bits in the selected register, the operation is also executed on the guard bits. these operations are executed in the dsp stage (the last stage) of the pipeline.

whenever an alu arithmetic operation is executed, the dsr register's dc, n, z, v, and gt bits are updated by the operation result. for conditional instructions, however, condition bits are not updated even when the specified condition is achieved. for unconditional instructions, the bits are updated according to the operation result. the condition reflected in the dc bit is selected with the cs[2:0] bits. the dc bit for the paddc and psubc instructions, however, is updated regardless of the cs bit settings. in the paddc instruction, it is updated as a carry flag; in the psubc instruction, it is updated as a borrow flag.

figure 4.6 shows the alu fixed decimal point operation flowchart.

[figure 4.6: alu fixed decimal point operation flowchart. source 1 and source 2 (bits 31 to 0 plus guard bits) feed the alu; the result is written to the destination (bits 31 to 0 plus guard bits) and the gt, z, n, v, and dc bits of the dsr.]

when a data transfer instruction is written on the same line as an alu operation and its memory read destination operand is the same as a source operand of the alu operation, the data loaded from memory in the memory access stage (ma) cannot be used as the source operand of that alu operation. in this case, the result of the instruction executed first is used as the source operand of the alu operation, and the register is then updated as the destination operand of the data load instruction. figure 4.7 is a flowchart of the operation.

  slot                                    1    2    3                4                5           6
  movx.w @r4+r8,x0                        if   id   ex (addressing)  ma (movx)        dsp (nop)
  movx.w @r4+,x0  padd x0,y0,a0                if   id               ex (addressing)  ma (movx)   dsp (add)

  the padd in the second line uses the result of the previous step (the x0 loaded by the first movx.w).

figure 4.7 sample processing flowchart

4.7.2 instructions and operands

table 4.14 shows the types of alu fixed decimal point arithmetic operations.
table 4.15 shows the correspondence between the operands and registers.

table 4.14 types of alu fixed decimal point arithmetic operations

  mnemonic   function                   source 1   source 2   destination
  padd       addition                   sx         sy         dz (du)
  psub       subtraction                sx         sy         dz (du)
  paddc      addition with carry        sx         sy         dz
  psubc      subtraction with borrow    sx         sy         dz
  pcmp       compare                    sx         sy         -
  pcopy      copy data                  sx         -          dz
                                        -          sy         dz
  pabs       absolute value             sx         -          dz
                                        -          sy         dz
  pneg       invert sign                sx         -          dz
                                        -          sy         dz
  pclr       zero clear                 -          -          dz

table 4.15 correspondence between operands and registers for alu fixed decimal point arithmetic operations

  operand   x0      x1    y0    y1    m0    m1    a0    a1
  sx        yes*1   yes                           yes   yes
  sy                      yes   yes   yes   yes
  dz        yes     yes   yes   yes   yes   yes   yes   yes
  du*2      yes           yes                     yes   yes

notes: 1. yes: register can be used with operand.
2. du: operand when used in combination with multiplication.

4.7.3 dc bit

the dc bit is set as follows depending on the specification of the cs0 to cs2 bits (condition select bits) of the dsr register.

carry/borrow mode: cs[2:0] = 000: the dc bit indicates whether a carry or borrow has occurred from the msb of the operation result. the guard bits have no effect on this. this mode is the default. figure 4.8 shows examples when carries and borrows occur.

[figure 4.8: examples of carries and borrows. examples 1 and 2 show additions that produce a carry; examples 3 and 4 show subtractions that produce a borrow. each example marks the guard bits and the position where the carry or borrow is detected.]

negative mode: cs[2:0] = 001: in this mode, the dc bit is the same as the msb of the operation result. when the result is negative, the dc bit is 1. when the result is positive, the dc bit is 0. alu arithmetic operations are always done in 40 bits. the sign bit indicating positive or negative is thus the msb included in the guard bits of the operation result rather than the msb of the destination operand. figure 4.9 shows an example of distinguishing negative from positive. in this mode, the dc bit has the same value as the condition bit n.

[figure 4.9: distinguishing negative and positive. example 1 shows a negative result and example 2 a positive result; each marks the guard bits and the sign bit.]

zero mode: cs[2:0] = 010: the dc bit indicates whether the operation result is zero. when it is, the dc bit is 1. when the operation result is nonzero, the dc bit is 0. in this mode, the dc bit has the same value as the condition bit z.

overflow mode: cs[2:0] = 011: the dc bit indicates whether the operation result has caused an overflow. when the operation result without the guard bits has exceeded the bounds of the destination register, the dc bit is set to 1. the dc bit is determined as if there were no guard bits, so an overflow is reported even when the destination has guard bits. this means that the dc bit is always set to 1 when large numbers use the guard bits. in this mode, the dc bit has the same value as the condition bit v. figure 4.10 shows an example of distinguishing overflows.
[figure 4.10: distinguishing overflows. example 1 shows an overflow and example 2 shows no overflow; each marks the guard bits and the overflow detection range.]

signed greater than mode: cs[2:0] = 100: the dc bit indicates whether the source 1 data (signed) is greater than the source 2 data (signed) in the result of a comparison instruction pcmp. for that reason, execute the pcmp instruction before checking the dc bit in this mode. when the source 1 data is larger than the source 2 data, the result of the comparison is positive, so this mode is similar to the negative mode. when the source 1 data is larger than the source 2 data and the bounds of the destination operand are exceeded, however, the sign of the result of the comparison becomes negative. the dc bit is updated. in this mode, the dc bit has the same value as the condition bit gt. the equation shown below defines the dc bit in this mode. here, vr becomes a positive value when the result, including the guard bit area, exceeds the display range of the destination operand.

  dc bit = ~{(n bit ^ vr) | z bit}

when the pcmp instruction is executed in this mode, the dc bit becomes the same value as the t bit that indicates the result of the sh core's cmp/gt instruction. in this mode, the dc bit is updated according to the above definition for instructions other than the pcmp instruction as well.

signed greater than or equal to mode: cs[2:0] = 101: the dc bit indicates whether or not the source 1 data (signed) is greater than or equal to the source 2 data (signed) in the result of the execution of a comparison instruction pcmp. for that reason, execute the pcmp instruction before checking the dc bit in this mode. this mode is similar to the signed greater than mode except that it also accepts the case where the operands are the same. the equation shown below defines the dc bit in this mode. here, vr becomes a positive value when the result, including the guard bit area, exceeds the display range of the destination operand.

  dc bit = ~(n bit ^ vr)

when the pcmp instruction is executed in this mode, the dc bit becomes the same value as the t bit that indicates the result of the superh core's cmp/ge instruction. in this mode, the dc bit is updated according to the above definition for instructions other than the pcmp instruction as well.

4.7.4 condition bits

the condition bits are set as follows:

the n (negative) bit has the same value as the dc bit when the cs bits specify negative mode. when the operation result is negative, the n bit is 1. when the operation result is positive, the n bit is 0.

the z (zero) bit has the same value as the dc bit when the cs bits specify zero mode. when the operation result is zero, the z bit is 1. when the operation result is nonzero, the z bit is 0.

the v (overflow) bit has the same value as the dc bit when the cs bits specify overflow mode. when the operation result exceeds the bounds of the destination register without the guard bits, the v bit is 1. otherwise, the v bit is 0.

the gt (greater than) bit has the same value as the dc bit when the cs bits specify signed greater than mode. when the comparison result indicates the source 1 data is greater than the source 2 data, the gt bit is 1. otherwise, the gt bit is 0.

4.7.5 overflow prevention function (saturation operation)

when the s bit of the sr register is set to 1, the overflow prevention function is engaged for the alu fixed decimal point arithmetic operation executed by the dsp unit. when the operation result overflows, the maximum (positive) or minimum (negative) value is stored.

4.8 alu integer operations

alu integer operations are basically 24-bit operations on the top word (the top 16 bits, or bits 16 through 31) and 8 guard bits.
in alu integer operations, the bottom word of the source operand (the bottom 16 bits, or bits 0-15) is ignored and the bottom word of the destination operand is cleared with zeros. when the source operand has no guard bits, the sign bit is extended to fill the guard bits. when the destination operand has no guard bits, the top word of the operation result (not including the guard bits) is stored in the top word of the destination register.

integer operations are basically the same as alu fixed decimal point arithmetic operations. there are only two types of integer operation instructions, increment and decrement, which change the second operand by +1 or -1. 16 bits of integer data (word data) are loaded to the dsp register and stored in the top word. the operation is performed using the top word in the dsp register. when there are guard bits, they are valid as well. these operations are executed in the dsp stage (the last stage) of the pipeline.

whenever an alu integer arithmetic operation is executed, the dsr register's dc, n, z, v, and gt bits are basically updated by the operation result. this is the same as for alu fixed decimal point operations. for conditional instructions, condition bits and flags are not updated even when the specified condition is achieved and the instruction executed. for unconditional instructions, the bits are always updated according to the operation result. figure 4.11 shows the alu integer operation flowchart.

[figure 4.11 alu integer operation flowchart: source 1 and source 2 (bits 31-0 plus guard bits) feed the alu; the result goes to the destination (bits 31-0 plus guard bits) and updates the dsr bits gt, v, n, z, dc; legend: cleared to 0, ignored]

table 4.16 lists the types of alu integer operations. table 4.17 shows the correspondence between the operands and registers.

table 4.16 types of alu integer operations

mnemonic  function        source 1  source 2  destination
pinc      increment by 1  sx        (+1)      dz
                          (+1)      sy        dz
pdec      decrement by 1  sx        (-1)      dz
                          (-1)      sy        dz

table 4.17 correspondence between operands and registers for alu integer operations

operand  x0   x1   y0   y1   m0   m1   a0   a1
sx       yes  yes  -    -    -    -    yes  yes
sy       -    -    yes  yes  yes  yes  -    -
dz       yes  yes  yes  yes  yes  yes  yes  yes

note: yes: register can be used with operand; -: register cannot be used.

when the s bit of the sr register is set to 1, the overflow prevention function (saturation operation) is engaged. the overflow prevention function can be specified for alu integer arithmetic operations executed by the dsp unit. when the operation result overflows, the maximum (positive) or minimum (negative) value is stored.

4.9 alu logical operations

4.9.1 function

alu logical operations are performed between registers. the source and destination operands are selected independently from the dsp registers. these operations use only the top word of the respective operands. the bottom word and guard bits of the source operand are ignored, and the bottom word and guard bits of the destination operand are cleared with zeros. these operations are executed in the dsp stage (the last stage) of the pipeline.

whenever an alu logical operation is executed, the dsr register's dc, n, z, v, and gt bits are basically updated by the operation result. for conditional instructions, condition bits and flags are not updated even when the specified condition is achieved and the instruction executed. for unconditional instructions, the bits are always updated according to the operation result. the dc bit is updated as specified in the cs bits. figure 4.12 shows the alu logical operation flowchart.

[figure 4.12 alu logical operation flowchart: source 1 and source 2 (bits 31-0 plus guard bits) feed the alu; the result goes to the destination (bits 31-0 plus guard bits) and updates the dsr bits gt, v, n, z, dc; legend: cleared to 0, ignored]

4.9.2 instructions and operands

table 4.18 lists the types of alu logical arithmetic operations. table 4.19 shows the correspondence between the operands and registers, which is the same as for alu fixed decimal point operations.
table 4.18 types of alu logical arithmetic operations

mnemonic  function      source 1  source 2  destination
pand      and           sx        sy        dz
por       or            sx        sy        dz
pxor      exclusive or  sx        sy        dz

table 4.19 correspondence between operands and registers for alu logical arithmetic operations

operand  x0   x1   y0   y1   m0   m1   a0   a1
sx       yes  yes  -    -    -    -    yes  yes
sy       -    -    yes  yes  yes  yes  -    -
dz       yes  yes  yes  yes  yes  yes  yes  yes

note: yes: register can be used with operand; -: register cannot be used.

4.9.3 dc bit

the dc bit is set in logical operations as follows:
carry/borrow mode: cs2-cs0 = 000: the dc bit is always 0.
negative mode: cs2-cs0 = 001: the dc bit is the same as bit 31 of the operation result. in this mode, the dc bit has the same value as bit n.
zero mode: cs2-cs0 = 010: the dc bit is 1 when the operation result is zero; otherwise, the dc bit is 0. in this mode, the dc bit has the same value as bit z.
overflow mode: cs2-cs0 = 011: the dc bit is always 0. in this mode, the dc bit has the same value as bit v.
signed greater than mode: cs2-cs0 = 100: the dc bit is always 0. in this mode, the dc bit has the same value as bit gt.
signed greater than or equal to mode: cs2-cs0 = 101: the dc bit is always 0.

4.9.4 condition bits

the condition bits are set as follows.
the n bit is the value of bit 31 of the operation result.
the z bit is 1 when the operation result is zero; otherwise, the z bit is 0.
the v bit is always 0.
the gt bit is always 0.

4.10 fixed decimal point multiplication

multiplication in the dsp unit is between signed single-length operands. it is processed in one cycle. when double-length multiplication is needed, use the superh risc engine's double-length multiplication. basically, the operation result for multiplication is 32 bits. when a register that has guard bits is specified as the destination operand, the result is sign-extended into the guard bits.

in the dsp unit, multiplication is a fixed decimal point arithmetic operation, not an integer operation. this means the top words of the multiplier and multiplicand are entered into the mac operator.
in superh risc engine multiplication, the bottom words of the two operands are entered into the mac operator. the operation result is thus different from that of the superh risc engine. the superh risc engine operation result is aligned to the lsb of the destination, while the fixed decimal point multiplication result is aligned to the msb. the lsb of the operation result in fixed decimal point multiplication is thus always 0. figure 4.13 shows a flowchart of fixed decimal point multiplication.

[figure 4.13 fixed decimal point multiplication flowchart: the top words of the two source operands (bits 31-16; guard bits and bottom words ignored) feed the mac; the result is stored in the destination with sign (s) extension into the guard bits and 0 in the lsb]

table 4.20 shows the fixed decimal point multiplication instruction. table 4.21 shows the correspondence between the operands and registers.

table 4.20 fixed decimal point multiplication

mnemonic  function               source 1  source 2  destination
pmuls     signed multiplication  se        sf        dg

table 4.21 correspondence between operands and registers for fixed decimal point multiplication

operand  x0   x1   y0   y1   m0   m1   a0   a1
se       yes  yes  yes  -    -    -    -    yes
sf       yes  -    yes  yes  -    -    -    yes
dg       -    -    -    -    yes  yes  yes  yes

note: yes: register can be used with operand; -: register cannot be used.

dsp unit fixed decimal point multiplication completes a single-length 16 bit x 16 bit operation in one cycle. other multiplication is the same as in the superh risc engines.

multiplication instructions do not update the dc, n, z, v, gt, or any other condition bit of the dsr register.

the overflow prevention function is valid for dsp unit multiplication. specify it by setting the s bit of the sr register to 1. when an overflow or underflow occurs, the operation result is the maximum or minimum value respectively.

in dsp unit fixed decimal point multiplication, overflows only occur for h'8000 x h'8000 ((-1.0) x (-1.0)). when the s bit is 0, the operation result is h'80000000, which means -1.0 rather than the correct answer of +1.0. when the s bit is 1, the overflow prevention function is engaged and the result is h'007fffffff.
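the multiplication rules above can be sketched in python. this is a minimal model under stated assumptions, not the hardware's implementation: the name pmuls_model, and the convention of passing 32-bit register images and returning a 40-bit image (8 guard bits followed by 32 data bits), are introduced here for illustration only.

```python
MASK40 = (1 << 40) - 1  # 8 guard bits + 32 data bits


def pmuls_model(se: int, sf: int, s_bit: bool = False) -> int:
    """Hypothetical model of a signed 16 x 16 fixed-point multiply.

    se, sf: 32-bit register images; only the top words (bits 31-16) are used.
    Returns a 40-bit destination image, sign-extended into the guard bits.
    """
    def top_word_signed(reg: int) -> int:
        word = (reg >> 16) & 0xFFFF
        return word - 0x10000 if word & 0x8000 else word

    # Aligning the product to the msb leaves the lsb at 0.
    product = (top_word_signed(se) * top_word_signed(sf)) << 1

    if product == 0x80000000:  # the only overflow case: (-1.0) x (-1.0)
        # S = 1 saturates to the maximum; S = 0 wraps to h'80000000 (-1.0).
        product = 0x7FFFFFFF if s_bit else -0x80000000

    return product & MASK40


if __name__ == "__main__":
    print(hex(pmuls_model(0x40000000, 0x40000000)))        # 0.5 x 0.5
    print(hex(pmuls_model(0x80000000, 0x80000000)))        # overflow, S = 0
    print(hex(pmuls_model(0x80000000, 0x80000000, True)))  # overflow, S = 1
```

the bottom words of both inputs are ignored, so h'7fff1234 and h'7fff0000 give the same product in this model.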
4.11 shift operations

the amount of shift in shift operations is specified either through a register or using a direct immediate value. other source operands and destination operands are registers. there are two types of shift operations: arithmetic and logical. table 4.22 shows the operation types. the correspondence between operands and registers is the same as for alu fixed decimal point operations, except for immediate operands. the correspondence is shown in table 4.23.

table 4.22 types of shift operations

mnemonic         function                              source 1  source 2  destination
psha sx, sy, dz  arithmetic shift                      sx        sy        dz
pshl sx, sy, dz  logical shift                         sx        sy        dz
psha #imm, dz    arithmetic shift with immediate data  dz        imm1      dz
pshl #imm, dz    logical shift with immediate data     dz        imm2      dz

-32 <= imm1 <= +32, -16 <= imm2 <= +16

table 4.23 correspondence between operands and registers for shift operations

operand  x0   x1   y0   y1   m0   m1   a0   a1
sx       yes  yes  -    -    -    -    yes  yes
sy       -    -    yes  yes  yes  yes  -    -
dz       yes  yes  yes  yes  yes  yes  yes  yes

note: yes: register can be used with operand; -: register cannot be used.

4.11.1 arithmetic shift operations

function: alu arithmetic shift operations basically work with a 32-bit unit to which 8 guard bits are added for a total of 40 bits. alu fixed decimal point operations are basically performed between registers. when the source operand has no guard bits, the register's sign bit is copied to the guard bits. when the destination operand has no guard bits, the lower 32 bits of the operation result are stored in the destination register.

in arithmetic shifts, all bits of the source 1 operand and destination operand are valid. the source 2 operand, which specifies the shift amount, is integer data. the source 2 operand is specified as a register or immediate operand. the valid amount of shift is -32 to +32. negative values are shifts to the right; positive values are shifts to the left. values between -64 and +63 can be specified for the source 2 operand, but only -32 to +32 is valid.
when an invalid number is specified, the results cannot be guaranteed. when an immediate value is specified for the shift amount, the source 1 operand must be the same as the destination operand. the action of the operation is the same as for fixed decimal point operations and is executed in the dsp stage (the last stage) of the pipeline.

whenever an arithmetic shift operation is executed, the dsr register's dc, n, z, v, and gt bits are basically updated by the operation result. this is the same as for alu fixed decimal point operations. for conditional instructions, condition bits are not updated even when the specified condition is achieved and the instruction executed. for unconditional instructions, the bits are always updated according to the operation result. figure 4.14 shows the arithmetic shift operation flowchart.

[figure 4.14 arithmetic shift operation flowchart: the source (guard bits 7g-0g, bits 31-0) is shifted left when the shift amount is >= 0 or right when it is < 0 (copying the msb), with the shifted-out bits discarded; the shift amount data (source 2, +32 to -32) comes from bits 22-16 of the register or from imm1 (bits 6-0); the result goes to dz (guard bits 7g-0g, bits 31-0) and updates the dsr bits gt, z, n, v, dc; legend: ignored]

dc bit: the dc bit is set as follows depending on the mode specified by the cs bits:
carry/borrow mode: cs2-cs0 = 000: the dc bit is the value of the bit pushed out by the last shift.
negative mode: cs2-cs0 = 001: set to 1 for a negative operation result and 0 for a positive operation result. in this mode, the dc bit has the same value as bit n.
zero mode: cs2-cs0 = 010: the dc bit is 1 when the operation result is zero; otherwise, the dc bit is 0. in this mode, the dc bit has the same value as bit z.
overflow mode: cs2-cs0 = 011: the dc bit is set to 1 by an overflow. in this mode, the dc bit has the same value as bit v.
signed greater than mode: cs2-cs0 = 100: the dc bit is always 0. in this mode, the dc bit has the same value as bit gt.
signed greater than or equal to mode: cs2-cs0 = 101: the dc bit is always 0.
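the arithmetic shift and the dc modes listed above can be sketched as follows. this is a minimal python model, not the hardware's implementation. the names psha_model and dc_for_shift are hypothetical, and two points are assumptions rather than statements from the text: the "bit pushed out" of a left shift is taken at the 40-bit boundary (bit 7g), and a shift amount of 0 is modeled as pushing out a 0. overflow mode is left out because the text does not define the overflow test for shifts in enough detail to model it.

```python
MASK40 = (1 << 40) - 1  # 8 guard bits + 32 data bits


def to_signed40(value: int) -> int:
    """Interpret a 40-bit image as a two's complement integer."""
    value &= MASK40
    return value - (1 << 40) if value & (1 << 39) else value


def psha_model(value40: int, shift: int) -> tuple[int, int]:
    """Return (40-bit result, last bit shifted out) for an arithmetic shift.

    Positive shift amounts shift left, negative amounts shift right.
    """
    if not -32 <= shift <= 32:
        raise ValueError("shift amount outside the valid range -32 to +32")
    unsigned = value40 & MASK40
    signed = to_signed40(unsigned)
    if shift > 0:
        last_out = (unsigned >> (40 - shift)) & 1  # assumption: 40-bit boundary
        result = (signed << shift) & MASK40
    elif shift < 0:
        last_out = (unsigned >> (-shift - 1)) & 1
        result = (signed >> -shift) & MASK40  # >> on a negative int copies the sign
    else:
        last_out = 0  # assumption: nothing is pushed out by a zero shift
        result = unsigned
    return result, last_out


def dc_for_shift(cs: int, result40: int, last_out: int) -> int:
    """DC bit after an arithmetic shift for the modes modeled here."""
    if cs == 0b000:   # carry/borrow mode
        return last_out
    if cs == 0b001:   # negative mode: msb including the guard bits
        return (result40 >> 39) & 1
    if cs == 0b010:   # zero mode
        return int(result40 & MASK40 == 0)
    if cs in (0b100, 0b101):  # signed greater than (or equal to): always 0
        return 0
    raise NotImplementedError("overflow and reserved modes are not modeled")
```

for example, psha_model(0xFF80000000, -4) shifts -1.0 right by four places and keeps the sign in the guard bits.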
condition bits: the condition bits are set as follows:
the n bit is the same as the result of the alu fixed decimal point arithmetic operation. it is set to 1 for a negative operation result and 0 for a positive operation result.
the z bit is the same as the result of the alu fixed decimal point arithmetic operation. it is set to 1 when the operation result is zero; otherwise, the z bit is 0.
the v bit is the same as the result of the alu fixed decimal point arithmetic operation. it is set to 1 for an overflow.
the gt bit is always 0.

overflow prevention function (saturation operation): when the s bit of the sr register is set to 1, the overflow prevention function is engaged for the alu fixed decimal point arithmetic operation executed by the dsp unit. when the operation result overflows, the maximum (positive) or minimum (negative) value is stored.

4.11.2 logical shift operations

function: logical shift operations use the top words of the source 1 operand and the destination operand. as in alu logical operations, the guard bits and bottom word of the operands are ignored. the source 2 operand, which specifies the shift amount, is integer data. the source 2 operand is specified as a register or immediate operand. the valid amount of shift is -16 to +16. negative values are shifts to the right; positive values are shifts to the left. values between -32 and +31 can be specified for the source 2 operand, but only -16 to +16 is valid. when an invalid number is specified, the results cannot be guaranteed. when an immediate value is specified for the shift amount, the source 1 operand must be the same as the destination operand. the action of the operation is the same as for fixed decimal point operations and is executed in the dsp stage (the last stage) of the pipeline.

whenever a logical shift operation is executed, the dsr register's dc, n, z, v, and gt bits are basically updated by the operation result. this is the same as for alu logical operations.
for conditional instructions, condition bits are not updated even when the specified condition is achieved and the instruction executed. for unconditional instructions, the bits are always updated according to the operation result. figure 4.15 shows the logical shift operation flowchart.

[figure 4.15 logical shift operation flowchart: the top word of the source (bits 31-16; guard bits 7g-0g and bits 15-0 ignored) is shifted left when the shift amount is >= 0 or right when it is < 0 (filling with 0), with the shifted-out bits discarded; the shift amount data (source 2, +16 to -16) comes from bits 22-16 of the register or from imm2 (bits 5-0); the result goes to the top word of dz and updates the dsr bits gt, z, n, v, dc; legend: ignored, cleared to 0]

dc bit: the dc bit is set as follows depending on the mode specified by the cs bits.
carry/borrow mode: cs2-cs0 = 000: the dc bit is the value of the bit pushed out by the last shift.
negative mode: cs2-cs0 = 001: the dc bit is the same as bit 31 of the operation result. in this mode, the dc bit has the same value as bit n.
zero mode: cs2-cs0 = 010: the dc bit is 1 when the operation result is all zeros; otherwise, the dc bit is 0. in this mode, the dc bit has the same value as bit z.
overflow mode: cs2-cs0 = 011: the dc bit is always 0. in this mode, the dc bit has the same value as bit v.
signed greater than mode: cs2-cs0 = 100: the dc bit is always 0. in this mode, the dc bit has the same value as bit gt.
signed greater than or equal to mode: cs2-cs0 = 101: the dc bit is always 0.

condition bits: the condition bits are set as follows.
the n bit is the same as the result of the alu logical operation. it is set to the value of bit 31 of the operation result.
the z bit is the same as the result of the alu logical operation. it is set to 1 when the operation result is all zeros; otherwise, the z bit is 0.
the v bit is always 0.
the gt bit is always 0.

4.12 the msb detection instruction

4.12.1 function

the msb detection instruction (pdmsb: most significant bit detection) finds the amount of shift for normalizing the data. the operation result is the same as for alu integer operations.
basically, the top 16 bits and 8 guard bits are valid for a total of 24 bits. when the destination operand is a register that has no guard bits, the result is stored in the top 16 bits of the destination register.

the msb detection instruction works on all bits of the source operand, but produces its operation result as integer data. this is because the shift amount for normalization must be integer data for the arithmetic shift operation. the action of the operation is the same as for fixed decimal point operations and is executed in the dsp stage (the last stage) of the pipeline.

whenever a pdmsb instruction is executed, the dsr register's dc, n, z, v, and gt bits are basically updated by the operation result. for conditional instructions, condition bits are not updated even when the specified condition is achieved and the instruction executed. for unconditional instructions, the bits are always updated according to the operation result.

figure 4.16 shows the msb detection instruction flowchart. table 4.24 shows the relationship between source data and destination data.

[figure 4.16 msb detection flowchart: source 1 or 2 (bits 31-0 plus guard bits) feeds a priority encoder; the result goes to the destination (bits 31-0 plus guard bits) and updates the dsr bits gt, v, n, z, dc; legend: cleared to 0]

table 4.24 relationship between source data and destination data

source data                           destination result
guard bits  top and bottom word       guard bits 7g-0g   bits 21-16  decimal
7g-0g       bits 31-0                 and bits 31-22
0000 0000   0000 ... 0000             all 0              011111      +31
0000 0000   0000 ... 0001             all 0              011110      +30
0000 0000   0000 ... 001*             all 0              011101      +29
0000 0000   0000 ... 01**             all 0              011100      +28
...
0000 0000   0001 **** ... ****        all 0              000010      +2
0000 0000   001* **** ... ****        all 0              000001      +1
0000 0000   01** **** ... ****        all 0              000000      0
0000 0000   1*** **** ... ****        all 1              111111      -1
0000 0001   **** **** ... ****        all 1              111110      -2
...
01** ****   **** **** ... ****        all 1              111000      -8
10** ****   **** **** ... ****        all 1              111000      -8
...
1111 1110   **** **** ... ****        all 1              111110      -2
1111 1111   0*** **** ... ****        all 1              111111      -1
1111 1111   10** **** ... ****        all 0              000000      0
1111 1111   110* **** ... ****        all 0              000001      +1
1111 1111   1110 **** ... ****        all 0              000010      +2
...
1111 1111   1111 ... 10**             all 0              011100      +28
1111 1111   1111 ... 110*             all 0              011101      +29
1111 1111   1111 ... 1110             all 0              011110      +30
1111 1111   1111 ... 1111             all 0              011111      +31

note: don't care bits (*) have no effect.

4.12.2 instructions and operands

table 4.25 shows the msb detection instruction. the correspondence between the operands and registers is the same as for alu fixed decimal point operations. it is shown in table 4.26.

table 4.25 msb detection instruction

mnemonic  function       source 1  source 2  destination
pdmsb     msb detection  sx        -         dz
                         -         sy        dz

table 4.26 correspondence between operands and registers for msb detection instructions

operand  x0   x1   y0   y1   m0   m1   a0   a1
sx       yes  yes  -    -    -    -    yes  yes
sy       -    -    yes  yes  yes  yes  -    -
dz       yes  yes  yes  yes  yes  yes  yes  yes

note: yes: register can be used with operand; -: register cannot be used.

4.12.3 dc bit

the dc bit is set as follows depending on the mode specified by the cs bits:
carry/borrow mode: cs2-cs0 = 000: the dc bit is always 0.
negative mode: cs2-cs0 = 001: set to 1 for a negative operation result and 0 for a positive operation result. in this mode, the dc bit has the same value as bit n.
zero mode: cs2-cs0 = 010: the dc bit is 1 when the operation result is zero; otherwise, the dc bit is 0. in this mode, the dc bit has the same value as bit z.
overflow mode: cs2-cs0 = 011: the dc bit is always 0. in this mode, the dc bit has the same value as bit v.
signed greater than mode: cs2-cs0 = 100: set to 1 for a positive operation result and 0 for a negative operation result. in this mode, the dc bit has the same value as bit gt.
signed greater than or equal to mode: cs2-cs0 = 101: set to 1 for a positive or zero operation result and 0 for a negative operation result.

4.12.4 condition bits

the condition bits are set as follows.
the n bit is the same as the result of the alu integer operation. it is set to 1 for a negative operation result and 0 for a positive operation result.
the z bit is the same as the result of the alu integer operation. it is set to 1 when the operation result is zero; otherwise, the z bit is 0.
the v bit is always 0.
the gt bit is the same as the result of the alu integer operation. it is set to 1 for a positive operation result; otherwise, it is set to 0.

4.13 rounding

4.13.1 operation function

the dsp unit has a function for rounding 32-bit values to 16-bit values. when the value has guard bits, 40 bits are rounded to 24 bits. when the rounding instruction is executed, h'00008000 is added to the source operand and the bottom word is then cleared to zeros. rounding uses all bits of the source and destination operands. the action of the operation is the same as for fixed decimal point operations and is executed in the dsp stage (the last stage) of the pipeline. the rounding instruction is unconditional. the dsr register's dc, n, z, v, and gt bits are thus always updated according to the operation result. figure 4.17 shows the rounding flowchart. figure 4.18 shows the rounding process definitions.

[figure 4.17 rounding flowchart: source 1 or 2 (bits 31-0 plus guard bits) feeds the alu, which adds h'00008000; the result goes to the destination (bits 31-0 plus guard bits) and updates the dsr bits gt, v, n, z, dc; legend: cleared to 0]

[figure 4.18 rounding process definitions: rounding results 0, h'000001, and h'000002 plotted against the actual values h'0000018000, h'0000020000, and h'0000028000, alongside the analog values]

4.13.2 instructions and operands

table 4.27 shows the instruction.
the correspondence between the operands and registers is the same as for alu fixed decimal point operations. it is shown in table 4.28.

table 4.27 rounding instruction

mnemonic  function  source 1  source 2  destination
prnd      rounding  sx        -         dz
                    -         sy        dz

table 4.28 correspondence between operands and registers for rounding instruction

operand  x0   x1   y0   y1   m0   m1   a0   a1
sx       yes  yes  -    -    -    -    yes  yes
sy       -    -    yes  yes  yes  yes  -    -
dz       yes  yes  yes  yes  yes  yes  yes  yes

note: yes: register can be used with operand; -: register cannot be used.

4.13.3 dc bit

the dc bit is updated as follows depending on the mode specified by the cs bits. condition bits are updated as for alu fixed decimal point arithmetic operations.
carry/borrow mode: cs2-cs0 = 000: the dc bit is set to 1 when a carry or borrow from the msb of the operation result occurs; otherwise, it is set to 0.
negative mode: cs2-cs0 = 001: set to 1 for a negative operation result and 0 for a positive operation result. in this mode, the dc bit has the same value as bit n.
zero mode: cs2-cs0 = 010: the dc bit is 1 when the operation result is zero; otherwise, the dc bit is 0. in this mode, the dc bit has the same value as bit z.
overflow mode: cs2-cs0 = 011: the dc bit is set to 1 by an overflow; otherwise, it is set to 0. in this mode, the dc bit has the same value as bit v.
signed greater than mode: cs2-cs0 = 100: set to 1 for a positive operation result; otherwise, it is set to 0. in this mode, the dc bit has the same value as bit gt.
signed greater than or equal to mode: cs2-cs0 = 101: set to 1 for a positive or zero operation result; otherwise, it is set to 0.

4.13.4 condition bits

the condition bits are set as follows. they are updated as for alu fixed decimal point arithmetic operations.
the n bit is the same as the result of the alu fixed decimal point arithmetic operation. it is set to 1 for a negative operation result and 0 for a positive operation result.
the z bit is the same as the result of the alu fixed decimal point arithmetic operation.
it is set to 1 when the operation result is zero; otherwise, the z bit is 0.
the v bit is the same as the result of the alu fixed decimal point arithmetic operation. it is set to 1 for an overflow; otherwise, the v bit is 0.
the gt bit is the same as the result of the alu fixed decimal point arithmetic operation and the alu integer operation. it is set to 1 for a positive operation result; otherwise, the gt bit is 0.

4.13.5 overflow prevention function (saturation operation)

when the s bit of the sr register is set to 1, the overflow prevention function can be specified for all rounding processing executed by the dsp unit. when the operation result overflows, the maximum (positive) or minimum (negative) value is stored.

4.14 condition select bits (cs) and the dsp condition bit (dc)

dsp instructions may be either conditional or unconditional. unconditional instructions are executed without regard to the dsp condition bit (dc bit), but conditional instructions may reference the dc bit before they are executed. with unconditional instructions, the dsr register's dc bit and condition bits (n, z, v, and gt) are updated according to the results of the alu operation or shift operation. with conditional instructions, the dc bit and condition bits (n, z, v, and gt) are not updated, regardless of whether the instruction is executed.

the dc bit is updated according to the specifications of the condition select (cs) bits. updates differ for arithmetic operations, logical operations, arithmetic shifts and logical shifts. table 4.29 shows the relationship between the cs bits and the dc bit.

table 4.29 condition select bits (cs) and dsp condition bit (dc)

cs bits 2 1 0, condition mode: description
0 0 0, carry/borrow: the dc bit is set to 1 when a carry or borrow occurs in the result of an alu arithmetic operation. otherwise, it is cleared to 0. in logical operations, the dc bit is always cleared to 0. for shift operations (the psha and pshl instructions), the bit shifted out last is copied to the dc bit.
0 0 1, negative: in alu arithmetic operations or arithmetic shifts (psha), the msb of the result (including the guard bits) is copied to the dc bit. in alu logical operations and logical shifts (pshl), the msb of the result (not including the guard bits) is copied to the dc bit.
0 1 0, zero: when the result of an alu or shift operation is all zeros (0), the dc bit is set to 1. otherwise, it is cleared to 0.
0 1 1, overflow: in alu arithmetic operations or arithmetic shifts (psha), when the operation result (not including the guard bits) exceeds the destination register's value range, the dc bit is set to 1. otherwise, it is cleared to 0. in alu logical operations and logical shifts (pshl), the dc bit is always cleared to 0.
1 0 0, signed greater than: this mode is like the greater than or equal to mode, but the dc bit is cleared to 0 when the operation result is zero (0). vr is true when the operation result (including the guard bits) exceeds the expressible limits. dc bit = ~{(n bit ^ vr) | z bit} for arithmetic operations; dc bit = 0 for logical operations.
1 0 1, greater than or equal to: in alu arithmetic operations or arithmetic shifts (psha), when the result does not overflow, the value is the inversion of the negative mode's dc bit. when the operation result (including the guard bits) exceeds the expressible limits, the value is the same as the negative mode's dc bit. in alu logical operations and logical shifts (pshl), the dc bit is always cleared to 0. dc bit = ~(n bit ^ vr) for arithmetic operations; dc bit = 0 for logical operations.
1 1 0, reserved
1 1 1, reserved

4.15 overflow prevention function (saturation operation)

the overflow prevention function (saturation operation) is specified by the s bit of the sr register.
this function is valid for arithmetic operations executed by the dsp unit and for multiply and accumulate operations executed by the existing sh-1 and sh-2. an overflow occurs when the operation result exceeds the bounds that can be expressed as a two's complement (not including the guard bits).

table 4.30 shows the overflow definitions for fixed decimal point arithmetic operations. table 4.31 shows the overflow definitions for integer arithmetic operations. multiply/accumulate calculation instructions (mac) supported by previous superh risc engines are performed on 64-bit registers (mach and macl), so their overflow values differ from the maximum and minimum values given here. they are defined exactly the same as before.

table 4.30 overflow definitions for fixed decimal point arithmetic operations

sign      overflow condition  maximum/minimum  hexadecimal display
positive  result > 1 - 2^-31  1 - 2^-31        007fffffff
negative  result < -1         -1               ff80000000

table 4.31 overflow definitions for integer arithmetic operations

sign      overflow condition  maximum/minimum  hexadecimal display
positive  result > 2^15 - 1   2^15 - 1         007fff****
negative  result < -2^15      -2^15            ff8000****

note: don't care bits (*) have no effect.

when the overflow prevention function is specified, overflows do not occur. naturally, the overflow bit (v bit) is not set. when the cs bits specify overflow mode, the dc bit is not set either.

4.16 data transfers

the sh-dsp can perform up to two data transfers in parallel between the dsp registers and on-chip memory with the dsp unit. the sh-dsp has the following types of data transfers:

1. x and y memory data transfers: data transfer to x and y memory using the xdb and ydb buses
   double data transfer: data transfer only, where transfer in one direction only is permitted
   parallel data transfers: data transfer that proceeds in parallel with alu operation processing
2. single data transfers: data transfer to on-chip memory using the idb bus

note: data transfer instructions do not update the dsr register's condition bits.

table 4.32 shows the various functions.

table 4.32 data transfer functions

category                      bus           length           parallel processing   parallel processing       instruction
                                                             with alu operation    with data transfer        length
x and y memory data transfer  x bus, y bus  16 bits          none (double)         none (x or y bus)         16 bits
                                                                                   available (x and y bus)   16 bits
                                                             available (parallel)  none (x or y bus)         32 bits
                                                                                   available (x and y bus)   32 bits
single data transfer          idb bus       32 bits/16 bits  none                  none                      16 bits

4.16.1 x and y memory data transfer

x and y memory data transfers allow two data transfers to be executed in parallel and allow data transfers to be executed in parallel with dsp data operations. 32-bit instruction code is required for executing dsp data operations and transfers in parallel. this is called a parallel data transfer. when executing an x and y memory data transfer by itself, 16-bit instruction code is used. this is called a double data transfer.

data transfers consist of x memory data transfers and y memory data transfers. x memory data is loaded to either the x0 or x1 register; y memory data is loaded to the y0 or y1 register. the x0, x1, y0, and y1 registers become the destination registers. data can be stored in the x and y memory if the a0 or a1 register is the source register. all these data transfers involve word data (16 bits). data is transferred from the top word of the source register. data is transferred to the top word of the destination register and the bottom word is automatically cleared with zeros.

specifying a conditional instruction as the operation instruction executed in parallel has no effect on the data transfer instructions. x and y memory data transfers access only the x and y memory; they cannot access other memory areas.
[figure 4.19 flowchart of x and y memory data transfers: the x pointer (r4, r5; update 0, +2, +r8) drives xab[15:1] and the y pointer (r6, r7; update 0, +2, +r9) drives yab[15:1]; x memory (ram, rom) connects over xdb[15:0] and y memory (ram, rom) over ydb[15:0] to the dsp registers x0, x1, y0, y1, m0, m1, a0, a1, a0g, a1g, dsr; legend: cannot be set; not affected for storing, cleared for loading]

4.16.2 single data transfers

single data transfers execute only one data transfer. they use 16-bit instruction code. single data transfers cannot be processed in parallel with alu operations. the x pointer, which accesses x memory, and two added pointers are valid; the y pointer is not valid. as with the superh risc engine, single data transfers can access all memory areas, including external memory. except for the dsr register, the dsp registers can be specified as source and destination operands. (the dsr register is defined as a system register, so it transfers data with the lds and sts instructions.) the guard bit registers a0g and a1g can be specified for operands as independent registers.

single data transfers use the iab and idb buses in place of the x bus and y bus, so contention occurs on the idb bus between data transfers and instruction fetches.

single data transfers handle word and longword data. word data transfers involve only the top word of the register. when data is loaded to a register, it goes to the top word and the bottom word is automatically filled with zeros. if there are guard bits, the sign bit is extended to fill them. when storing from a register, the top word is stored.

when a longword is transferred, 32 bits are valid. when loading a register that has guard bits, the sign bit is extended to fill the guard bits.

when a guard bit register is stored, the data is read out to the idb bus and the top 24 bits are undefined. when the guard bit registers a0g and a1g load word data as the destination registers of the movs.w instruction, the bottom byte is written to the register.
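the load and store rules for word and longword single data transfers can be sketched as follows. this is a minimal python model for illustration only; the function names and the convention of returning a (guard bits, 32-bit data) pair are assumptions introduced here, and the guard-bit register cases (a0g, a1g) are not modeled.

```python
def guard_from_sign(data32: int) -> int:
    """Guard bits after a load: the sign bit (bit 31) extended to 8 bits."""
    return 0xFF if data32 & 0x80000000 else 0x00


def load_word(word16: int) -> tuple[int, int]:
    """Word load: data goes to the top word, the bottom word is zero-filled."""
    data = (word16 & 0xFFFF) << 16
    return guard_from_sign(data), data


def load_longword(long32: int) -> tuple[int, int]:
    """Longword load: all 32 bits are valid."""
    data = long32 & 0xFFFFFFFF
    return guard_from_sign(data), data


def store_word(data32: int) -> int:
    """Word store: only the top word of the register is stored."""
    return (data32 >> 16) & 0xFFFF


if __name__ == "__main__":
    guard, data = load_word(0x8001)
    print(f"{guard:02x} {data:08x}")        # guard bits follow the sign bit
    print(f"{store_word(0x12345678):04x}")  # the bottom word is not stored
```

for a register without guard bits, only the second element of the returned pair applies.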
[figure 4.20 single data transfer flowchart (word): the pointer (r2, r3, r4, r5; update pre-decrement, 0, +2, +r8) drives iab[31:0]; data moves over idb[15:0] between all memory areas and the x0, x1, y0, y1, m0, m1, a0, a1, a0g, and a1g registers. dsr cannot be set. the bottom word is not affected for storing and is cleared for loading; see the text for information about a0g and a1g.]

[figure 4.21 single data transfer flowchart (longword): the pointer (r2, r3, r4, r5; update pre-decrement, 0, +4, +r8) drives iab[31:0]; data moves over idb[31:0] between all memory areas and the x0, x1, y0, y1, m0, m1, a0, a1, a0g, and a1g registers. dsr cannot be set.]

data transfers are executed in the ma stage of the pipeline while dsp operations are executed in the dsp stage. since a data store instruction starts before the preceding data operation instruction has finished, a stall cycle is inserted when the store instruction immediately follows the data operation instruction. this overhead cycle can be avoided by adding one instruction between the data operation instruction and the data transfer instruction. figure 4.22 shows an example.

[figure 4.22 example of the execution of operation and data store instructions: pipeline diagram (slots 1 to 7; stages if, id, ex (addressing), movx, dsp) for padd x0, y0, a0 followed directly by movx.w a0, @r4+, where a nop cycle is inserted, and for the same pair with the unrelated instruction movx.w @r5, x1 inserted between the data operation instruction and the store instruction.]

4.17 operand contention

data contention occurs when the same register is specified as the destination operand for two or more parallel processing instructions. it occurs in three cases.

1. when the same destination operand is specified for an alu operation and multiplication (du, dg)
2. when the same destination operand is specified for an x memory load and an alu operation (dx, du, dz)
3. when the same destination operand is specified for a y memory load and an alu operation (dy, du, dz)

results cannot be guaranteed when contention occurs. table 4.33 shows the operand and register combinations that cause contention.
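the three contention cases can be expressed as a simple check over the destination operands of one parallel instruction. the sketch below is illustrative only: find_contention is a hypothetical helper, register names are plain strings, and the y memory case is keyed on dy, following the notes to table 4.33.

```python
def find_contention(dx=None, dy=None, du=None, dz=None, dg=None):
    """Return a list of (operand_a, operand_b, register) tuples for every
    pair of destination operands that name the same register.

    Pairs checked (hypothetical model of the three cases):
      du/dg           alu operation and multiplication
      dx/du, dx/dz    x memory load and alu operation
      dy/du, dy/dz    y memory load and alu operation
    """
    operands = {"dx": dx, "dy": dy, "du": du, "dz": dz, "dg": dg}
    pairs = [("du", "dg"), ("dx", "du"), ("dx", "dz"),
             ("dy", "du"), ("dy", "dz")]
    hits = []
    for a, b in pairs:
        if operands[a] is not None and operands[a] == operands[b]:
            hits.append((a, b, operands[a]))
    return hits


# x memory load into x0 while the alu operation also writes x0
print(find_contention(dx="x0", du="x0"))
# no shared destination
print(find_contention(dx="x1", dy="y1", du="a0", dg="m0"))
```

an assembler-side check of this kind only reports the conflict; it does not decide which result the hardware would produce.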
some assemblers can detect these types of contention, so pay attention to assembler functions when selecting one.

table 4.33 operand and register combinations that create contention

                                    dsp register
operation                 operand   x0   x1   y0   y1   m0   m1   a0   a1
x memory load             ax
                          ix
                          dx        *2   *2
y memory load             ay
                          iy
                          dy                  *3   *3
6-operand alu operation   sx        *1   *1                       *1   *1
                          sy                  *1   *1   *1   *1
                          du        *2        *3                  *4   *4
3-operand multiplication  se        *1   *1   *1                       *1
                          sf        *1        *1   *1                  *1
                          dg                            *1   *1   *4   *4
3-operand alu operation   sx        *1   *1                       *1   *1
                          sy                  *1   *1   *1   *1
                          dz        *2   *2   *3   *3   *1   *1   *1   *1

notes: 1. register is settable for the operand
       2. dx, du, and dz contend
       3. dy, du, and dz contend
       4. du and dg contend

4.18 dsp repeat (loop) control

the sh-dsp repeat (loop) control function is a special utility for controlling repetition efficiently. the setrc instruction is executed to hold a repeat count in the repeat counter (rc, 12 bits) and set an execution mode in which the repeat (loop) program is repeated until the rc is 1. upon completion of the repeat operation, the content of the rc becomes 0.

the repeat start register (rs) holds the start address of the repeated section. the repeat end register (re) holds the ending address of the repeated section. (there are some exceptions. see 4.18.1.) the repeat counter (rc) holds the repeat count. the procedure for executing repeat control is shown below:

1. set the repeat start address in the rs register.
2. set the repeat end address in the re register.
3. set the repeat count in the rc counter.
4. execute the repeated program (loop).

the following instructions are used for executing 1 and 2:

    ldrs @(disp,pc);
    ldre @(disp,pc);

the setrc instruction is used to execute 3 and 4. immediate data or a general register may be used to specify the repeat count as the operand of the setrc instruction:

    setrc #imm;   #imm → rc, enable repeat control
    setrc rm;     rm →
rc, enable repeat control

#imm is 8 bits and the rc counter is 12 bits, so to set the rc counter to a value of 256 or greater, use the rm register. a sample program is shown below.

              ldrs rptstart;
              ldre rptend;
              setrc #imm;       rc=#imm
              instr0;
                                ; instr1~5 executes repeatedly
    rptstart: instr1;
              instr2;
              instr3;
              instr4;
    rptend:   instr5;
              instr6;

there are several restrictions on repeat control:

1. at least one instruction must come between the setrc instruction and the first instruction of the repeat program (loop).

2. execute the setrc instruction after executing the ldrs and ldre instructions.

3. when there are more than four instructions in the repeat program (loop) and the repeat start address (in the above example, the address of instr1) is not on a longword boundary, a one-cycle stall (a cycle awaiting execution) is required for each repeat.

4. when there are three or fewer instructions in the loop, branch instructions (bra, bsr, bt, bf, bt/s, bf/s, bsrf, rts, braf, rte, jsr, jmp), repeat control instructions (setrc, ldrs, ldre), sr, rs, and re load instructions, and trapa cannot be used. if they are used, error exception processing is started and the address values shown in table 4.34 are pushed onto the stack area pointed to by r15.

table 4.34 pc values pushed out (1)

conditions   position   address pushed out
rc>=2        any        rptstart
rc=1         any        program address of illegal instruction

5. if there are four or more instructions in the loop, branch instructions (bra, bsr, bt, bf, bt/s, bf/s, bsrf, rts, braf, rte, jsr, jmp), repeat control instructions (setrc, ldrs, ldre), sr, rs, and re load instructions, and trapa cannot be used for the last three instructions in the repeat program (loop). if they are used, error exception processing is started and the address values shown in table 4.35 are pushed onto the stack area pointed to by r15.
in case of repeat control instructions (setrc, ldrs, ldre), and sr, rs, and re load instructions, they cannot be written in positions other than the repeat module. if they are, proper operation cannot be guaranteed.

table 4.35 pc values pushed out (2)

conditions   position   address pushed out
rc>=2        instr3     program address of illegal instruction
             instr4     rptstart-4
             instr5     rptstart-2
rc=1         any        program address of illegal instruction

6. when there are three or fewer instructions in the loop, pc relative instructions (mova (disp,pc), r0, or the like) can only be used at the first instruction (instr1).

7. if there are four or more instructions in the loop, pc relative instructions (mova (disp,pc), r0, or the like) cannot be used in the final two instructions.

8. the sh-dsp does not have a repeat valid flag; repeats become invalid when the rc counter becomes 0. when the rc counter is not 0 and the pc counter matches the re register contents, repeating begins. when the rc counter is set to 0, the repeat program (loop) is invalid, but the loop is executed only once and does not return to the starting instruction of the loop, as when rc is 1. when the rc counter is set to 1, the repeat module is executed only once. though it does not return to the repeat program (loop) start instruction, the rc counter becomes zero when the repeat module is executed.

9. if there are four or more instructions in the loop, branch instructions, including subroutine call and return instructions, cannot use instr3 through instr5 as the branch destination address. if such a branch is executed, the repeat control does not work correctly. if the branch destination is rptstart or any address ahead of it, the content of rc in the sr register is not updated.

10. while the repeat is being executed, interruption is restricted. figure 4.23 shows the flow for each stage of ex.
the initial ex stage of an interruption or bus error exception usually starts immediately after the ex stage of an instruction is completed (indicated by a in figure 4.23). however, after the ex stage of instr0, only the bus error exception can be accepted (indicated by b). after the ex stage of instr1, neither an interruption nor a bus error exception can be accepted (indicated by c). they can be accepted again only after the ex stage of instr2.

[figure 4.23 restriction on acceptance of interruption by repeat module. acceptance at each instruction boundary is listed in order.
  a: all interruption and bus error exceptions are accepted.
  b: only the bus error exception is accepted.
  c: no interruption and bus error exceptions are accepted.
 when rc>=1:
  1-step repeat (instr0, instr1, instr2): a, b, c, a
  2-step repeat (instr0 to instr3): a, b, c, c, a
  3-step repeat (instr0 to instr4): a, b, c, c, c, a
  repeat of more than 4 steps (instr0, instr1, ..., instr n-3, instr n-2, instr n-1, instr n, instr n+1): a, a or c (when returning from instr n), a, a, b, c, c, c, a
 when rc=0: all interruptions and bus errors are accepted.]

4.18.1 actual programming

the repeat start register (rs) and repeat end register (re) store the repeat start address and repeat end address respectively. the addresses stored in these registers change depending on the number of instructions in the repeat program (loop). this rule is shown below.
repeat_start:  address of the repeat start instruction
repeat_start0: address of the instruction one higher than the repeat start instruction
repeat_end3:   address of the instruction three higher than the repeat end instruction

table 4.36 rs and re setup rule

            number of instructions in repeat program (loop)
register    1                 2                 3                 >=4
rs          repeat_start0+8   repeat_start0+6   repeat_start0+4   repeat_start
re          repeat_start0+4   repeat_start0+4   repeat_start0+4   repeat_end3+4

an example of an actual repeat program (loop) for each case in the above table is given below:

case 1: one repeat instruction

               ldrs rptstart0+8;(rptstart)
               ldre rptstart0+4;(rptstart)
               setrc rptcount;
               ----
    rptstart0: instr0;
    rptstart:  instr1;      repeat instruction
               instr2;

case 2: two repeat instructions

               ldrs rptstart0+6;(rptstart)
               ldre rptstart0+4;(rptend)
               setrc rptcount;
               ----
    rptstart0: instr0;
    rptstart:  instr1;      repeat instruction 1
    rptend:    instr2;      repeat instruction 2
               instr3;

case 3: three repeat instructions

               ldrs rptstart0+4;(rptstart)
               ldre rptstart0+4;(rptend)
               setrc rptcount;
               ----
    rptstart0: instr0;
    rptstart:  instr1;      repeat instruction 1
               instr2;      repeat instruction 2
    rptend:    instr3;      repeat instruction 3
               instr4;

case 4: four or more instructions

               ldrs rptstart;
               ldre rptend3+4;(rptend)
               setrc rptcount;
               ----
    rptstart0: instr0;
    rptstart:  instr1;      repeat instruction 1
               instr2;      repeat instruction 2
               instr3;      repeat instruction 3
               -----------------------------------------
    rptend3:   instrn-3;    repeat instruction n-3
               instrn-2;    repeat instruction n-2
               instrn-1;    repeat instruction n-1
    rptend:    instrn;      repeat instruction n
               instrn+1;

the above example can be used as a template when programming this repeat program (loop) sequence. the extension instruction repeat can simplify the problems of such complicated labeling and offset. details are described in note 2 below.

note 2. extension instruction repeat

the extension instruction repeat can simplify the delicate handling of the labeling and offset described in table 4.36 and note 1.
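as a rough illustration of the offset handling that has to be done for each case, the rs and re setup rule can be written as a small python function. this is a sketch under stated assumptions, not the assembler's actual expansion: rs_re is a hypothetical name, the arguments are the label addresses named in the setup rule, and the addresses in the usage lines are made up.

```python
def rs_re(n_instr, repeat_start, repeat_start0, repeat_end3=None):
    """Return the (rs, re) values for a repeat program of n_instr
    instructions, following the rs and re setup rule.

    repeat_start0: address of the instruction before the loop start
    repeat_end3:   address of the instruction three before the loop end
                   (only needed when n_instr >= 4)
    """
    if n_instr < 1:
        raise ValueError("a repeat program needs at least one instruction")
    if n_instr == 1:
        return repeat_start0 + 8, repeat_start0 + 4
    if n_instr == 2:
        return repeat_start0 + 6, repeat_start0 + 4
    if n_instr == 3:
        return repeat_start0 + 4, repeat_start0 + 4
    if repeat_end3 is None:
        raise ValueError("repeat_end3 is required for four or more instructions")
    return repeat_start, repeat_end3 + 4


# hypothetical addresses: instr0 at 0x1000, loop starts at 0x1002
print([hex(v) for v in rs_re(1, 0x1002, 0x1000)])
print([hex(v) for v in rs_re(5, 0x1002, 0x1000, repeat_end3=0x1004)])
```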
labels used are shown below.

rptstart: address of first instruction of repeat program (loop)
rptend:   address of last instruction of repeat program (loop)
rptcount: repeat count immediate no.

use this instruction as described below. the repeat count can be designated as immediate value #imm or register indirect value rn.

case 1: one repeat instruction

              repeat rptstart, rptstart, rptcount
              ----
              instr0;
    rptstart: instr1;      repeat instruction 1
              instr2;

case 2: two repeat instructions

              repeat rptstart, rptend, rptcount
              ----
              instr0;
    rptstart: instr1;      repeat instruction 1
    rptend:   instr2;      repeat instruction 2

case 3: three repeat instructions

              repeat rptstart, rptend, rptcount
              ----
              instr0;
    rptstart: instr1;      repeat instruction 1
              instr2;      repeat instruction 2
    rptend:   instr3;      repeat instruction 3

case 4: four or more instructions

              repeat rptstart, rptend, rptcount
              ----
              instr0;
    rptstart: instr1;      repeat instruction 1
              instr2;      repeat instruction 2
              instr3;      repeat instruction 3
              -----------------------------------------
              instrn-3;    repeat instruction n-3
              instrn-2;    repeat instruction n-2
              instrn-1;    repeat instruction n-1
    rptend:   instrn;      repeat instruction n
              instrn+1;

result of extension of each case corresponds to the case 1 in note 1.

4.19 conditional instructions and data transfers

data operation instructions include both unconditional and conditional instructions. data transfer instructions can be specified in parallel with either type. a data transfer specified in parallel with a conditional instruction always executes, regardless of whether the condition is met; the condition has no effect on the data transfer instruction.
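the independence of the condition and the data transfer can be shown with a short python model of a dct padd issued in parallel with an x memory load. this is an illustrative sketch only: the function names are hypothetical, the addition is assumed to be carried out on 40 bits (32 data bits plus 8 guard bits) with sign-extended sources, and pointer updates and the y memory store are left out.

```python
MASK40 = (1 << 40) - 1


def sext32_to_40(value):
    """Sign-extend a 32-bit register value to 40 bits."""
    value &= 0xFFFFFFFF
    if value & 0x80000000:
        return value | 0xFF00000000
    return value


def dct_padd_with_load(dc, sx, sy, dz, x_mem_word):
    """Model 'dct padd sx,sy,dz' in parallel with 'movx.w @ax,dx'.

    Returns (new_dz, loaded_dx). The load happens whether or not the
    dc condition is true; only the padd result depends on dc.
    """
    loaded = (x_mem_word & 0xFFFF) << 16
    if dc:
        dz = (sext32_to_40(sx) + sext32_to_40(sy)) & MASK40
    return dz, loaded


for dc in (True, False):
    a0, x0 = dct_padd_with_load(dc, 0x33333333, 0x55555555,
                                0x123456789A, 0x1111)
    print(dc, hex(a0), hex(x0))
```

in both runs the loaded value is the same; only the accumulator differs.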
the following is an example of a conditional instruction and a data transfer:

    dct padd x0, y0, a0   movx.w @r4+, x0   movy.w a0, @r6+r9

when condition is true:

before execution: x0=h'33333333, y0=h'55555555, a0=h'123456789a, r4=h'00008000, r6=h'00008233, r9=h'00000004, (r4)=h'1111, (r6)=h'2222
after execution:  x0=h'11110000, y0=h'55555555, a0=h'0088888888, r4=h'00008002, r6=h'00008237, r9=h'00000004, (r4)=h'1111, (r6)=h'3456

when condition is false:

before execution: x0=h'33333333, y0=h'55555555, a0=h'123456789a, r4=h'00008000, r6=h'00008233, r9=h'00000004, (r4)=h'1111, (r6)=h'2222
after execution:  x0=h'11110000, y0=h'55555555, a0=h'123456789a, r4=h'00008002, r6=h'00008237, r9=h'00000004, (r4)=h'1111, (r6)=h'3456

section 5 instruction set

the sh-dsp instructions are divided into three groups. cpu instructions are executed by the cpu core, and dsp data transfer instructions and dsp operation instructions are executed by the dsp unit. some cpu instructions support dsp functions. the description of the instruction set is divided into these three groups.

5.1 instruction set for cpu instructions

table 5.1 lists instructions by classification.

table 5.1 classification of cpu instructions

applicable instructions classification types operation code function sh-1 sh-2 sh-dsp no.
of instructions data transfer 5 mov data transfer immediate data transfer peripheral module data transfer structure data transfer 39 mova effective address transfer movt t bit transfer swap swap of upper and lower bytes xtrct extraction of the middle of registers connected arithmetic 21 add binary addition 33 operations addc binary addition with carry addv binary addition with overflow check cmp/cond comparison div1 division div0s initialization of signed division div0u initialization of unsigned division dmuls signed double-length multiplication dmulu unsigned double-length multiplication dt decrement and test exts sign extension extu zero extension mac multiply/accumulate double-length multiply/accumulate operation 88 table 5.1 classification of cpu instructions (cont) applicable instructions classification types operation code function sh-1 sh-2 sh- dsp no. of instructions arithmetic operations mul double-length multiplication (32 32 bits) (cont) muls signed multiplication (16 16 bits) mulu unsigned multiplication (16 16 bits) neg negation negc negation with borrow sub binary subtraction subc binary subtraction with carry subv binary subtraction with underflow check logic 6 and logical and 14 operations not bit inversion or logical or tas memory test and bit set tst logical and and t bit set xor exclusive or shift 10 rotcl one-bit left rotation with t bit 14 rotcr one-bit right rotation with t bit rotl one-bit left rotation rotr one-bit right rotation shal one-bit arithmetic left shift shar one-bit arithmetic right shift shll one-bit logical left shift shlln n-bit logical left shift shlr one-bit logical right shift shlrn n-bit logical right shift 89 table 5.1 classification of cpu instructions (cont) applicable instructions classification types operation code function sh-1 sh-2 sh- dsp no. 
of instructions branch 9 bf conditional branch (t = 0) 11 conditional branch with delay bt conditional branch (t = 1) conditional branch with delay bra unconditional branch braf unconditional branch bsr branch to subroutine procedure bsrf branch to subroutine procedure jmp unconditional branch jsr branch to subroutine procedure rts return from subroutine procedure system 14 clrmac mac register clear 71 control clrt t bit clear ldc load to control register ldre load to repeat end register ldrs load to repeat start register lds load to system register nop no operation rte return from exception processing setrc set number of repeats sett t bit set sleep shift into power-down state stc storing control register data sts storing system register data trapa trap exception handling total:65 182 instruction codes, operation, and execution cycles are listed as shown in table 10.2 by classification. 90 table 5.2 instruction code format item format explanation instruction mnemonic op.sz src,dest op: operation code sz: size src: source dest: destination rm: source register rn: destination register imm: immediate data disp: displacement* 1 instruction code msb ? lsb mmmm: source register nnnn: destination register 0000: r0 0001: r1 ........... 1111: r15 iiii: immediate data dddd: displacement operation summary ? , ? (xx) m/q/t & | ^ ~ < 91 5.1.1 data transfer instructions table 5.3 data transfer instructions applicable instructions instruction operation cycles t bit sh-1 sh-2 sh- dsp mov #imm,rn imm ? sign extension ? rn 1 mov.w @(disp,pc),rn (disp 2 + pc) ? sign extension ? rn 1 mov.l @(disp,pc),rn (disp 4 + pc) ? rn 1 mov rm,rn rm ? rn 1 mov.b rm,@rn rm ? (rn) 1 mov.w rm,@rn rm ? (rn) 1 mov.l rm,@rn rm ? (rn) 1 mov.b @rm,rn (rm) ? sign extension ? rn 1 mov.w @rm,rn (rm) ? sign extension ? rn 1 mov.l @rm,rn (rm) ? rn 1 mov.b rm,@?n rn? ? rn, rm ? (rn) 1 mov.w rm,@?n rn? ? rn, rm ? (rn) 1 mov.l rm,@?n rn? ? rn, rm ? (rn) 1 mov.b @rm+,rn (rm) ? sign extension ? rn, rm + 1 ? 
rm 1 mov.w @rm+,rn (rm) ? sign extension ? rn, rm + 2 ? rm 1 mov.l @rm+,rn (rm) ? rn, rm + 4 ? rm 1 mov.b r0,@(disp,rn) r0 ? (disp + rn) 1 mov.w r0,@(disp,rn) r0 ? (disp 2 + rn) 1 mov.l rm,@(disp,rn) rm ? (disp 4 + rn) 1 mov.b @(disp,rm),r0 (disp + rm) ? sign extension ? r0 1 mov.w @(disp,rm),r0 (disp 2 + rm) ? sign extension ? r0 1 mov.l @(disp,rm),rn (disp 4 + rm) ? rn 1 mov.b rm,@(r0,rn) rm ? (r0 + rn) 1 mov.w rm,@(r0,rn) rm ? (r0 + rn) 1 92 table 5.3 data transfer instructions (cont) applicable instructions instruction operation cycles t bit sh-1 sh-2 sh- dsp mov.l rm,@(r0,rn) rm ? (r0 + rn) 1 mov.b @(r0,rm),rn (r0 + rm) ? sign extension ? rn 1 mov.w @(r0,rm),rn (r0 + rm) ? sign extension ? rn 1 mov.l @(r0,rm),rn (r0 + rm) ? rn 1 mov.b r0,@(disp, gbr) r0 ? (disp + gbr) 1 mov.w r0,@(disp, gbr) r0 ? (disp 2 + gbr) 1 mov.l r0,@(disp, gbr) r0 ? (disp 4 + gbr) 1 mov.b @(disp,gbr) ,r0 (disp + gbr) ? sign extension ? r0 1 mov.w @(disp,gbr) ,r0 (disp 2 + gbr) ? sign extension ? r0 1 mov.l @(disp,gbr) ,r0 (disp 4 + gbr) ? r0 1 mova @(disp,pc), r0 disp 4 + pc ? r0 1 movt rn t ? rn 1 swap.b rm,rn rm ? swap the bottom two bytes ? reg 1 swap.w rm,rn rm ? swap two consecutive words ? rn 1 xtrct rm,rn rm: middle 32 bits of rn ? rn 1 93 5.1.2 arithmetic instructions table 5.4 arithmetic instructions applicable instructions instruction operation cycles t bit sh-1 sh-2 sh- dsp add rm,rn rn + rm ? rn 1 add #imm,rn rn + imm ? rn 1 addc rm,rn rn + rm + t ? rn, carry ? t 1 carry addv rm,rn rn + rm ? rn, overflow ? t 1 overflow cmp/eq #imm,r0 if r0 = imm, 1 ? t, if r0 1 imm, 0 ? t 1 comparison result cmp/eq rm,rn if rn = rm, 1 ? t, if rn 1 rm, 0 ? t 1 comparison result cmp/hs rm,rn if rn 3 rm with unsigned data, 1 ? t, if rn < rm, 0 ? t 1 comparison result cmp/ge rm,rn if rn 3 rm with signed data, 1 ? t, if rn < rm, 0 ? t 1 comparison result cmp/hi rm,rn if rn > rm with unsigned data, 1 ? t, if rn rm, 0 ? t 1 comparison result cmp/gt rm,rn if rn > rm with signed data, 1 ? 
t, if rn rm, 0 ? t 1 comparison result cmp/pl rn if rn > 0, 1 ? t, if rn 0, 0 ? t 1 comparison result cmp/pz rn if rn 3 0, 1 ? t, if rn < 0, 0 ? t 1 comparison result cmp/str rm,rn if rn and rm have an equivalent byte, 1 ? t, if not equivalent byte, 0 ? t 1 comparison result div1 rm,rn single-step division (rn/rm) 1 calculation result div0s rm,rn msb of rn ? q, msb of rm ? m, m? q ? t 1 calculation result div0u 0 ? m/q/t 1 0 94 table 5.4 arithmetic instructions (cont) applicable instructions instruction operation cycles t bit sh-1 sh-2 sh- dsp dmuls.l rm,rn signed operation of rn rm ? mach, macl 32 32 ? 64 bits 2?* dmulu.l rm,rn unsigned operation of rn rm ? mach, macl 32 32 ? 64 bits 2?* dt rn rn ?1 ? rn, if rn = 0, 1 ? t, else 0 ? t 1 comparison result exts.b rm,rn a byte in rm is sign-extended ? rn 1 exts.w rm,rn a word in rm is sign- extended ? rn 1 extu.b rm,rn a byte in rm is zero-extended ? rn 1 extu.w rm,rn a word in rm is zero- extended ? rn 1 mac.l @rm+,@rn+ signed operation of (rn) (rm) + mac ? mac 3/(2?)* mac.w @rm+,@rn+ signed operation of (rn) (rm) + mac ? mac (sh-2) 16 16 + 64 ? 64 bits (sh-1) 16 16 + 42 ? 42 bits 3/(2)* mul.l rm,rn rn rm ? macl 32 32 ? 32 bits 2?* muls.w rm,rn signed operation of rn rm ? mac 16 16 ? 32 bits 1?* mulu.w rm,rn unsigned operation of rn rm ? mac 16 16 ? 32 bits 1?* 95 table 5.4 arithmetic instructions (cont) applicable instructions instruction operation cycles t bit sh-1 sh-2 sh- dsp neg rm,rn 0?m ? rn 1 negc rm,rn 0?m? ? rn, borrow ? t 1 borrow sub rm,rn rn?m ? rn 1 subc rm,rn rn?m? ? rn, borrow ? t 1 borrow subv rm,rn rn?m ? rn, underflow ? t 1 underflow note: the normal minimum number of execution cycles. (the number in parentheses is the number of cycles when there is contention with following instructions.) 5.1.3 logic operation instructions table 5.5 logic operation instructions applicable instructions instruction operation cycles t bit sh-1 sh-2 sh- dsp and rm,rn rn & rm ? rn 1 and #imm,r0 r0 & imm ? 
r0 1 and.b #imm,@(r0,gbr) (r0 + gbr) & imm ? (r0 + gbr) 3 not rm,rn ~rm ? rn 1 or rm,rn rn | rm ? rn 1 or #imm,r0 r0 | imm ? r0 1 or.b #imm,@(r0,gbr) (r0 + gbr) | imm ? (r0 + gbr) 3 tas.b @rn if (rn) is 0, 1 ? t; if not 0, 0 ? t. also, 1 ? msb of (rn) regardless of value of (rn) 4 test result tst rm,rn rn & rm; if the result is 0, 1 ? t, if not 0, 0 ? t 1 test result 96 table 5.5 logic operation instructions (cont) applicable instructions instruction operation cycles t bit sh-1 sh-2 sh- dsp tst #imm,r0 r0 & imm; if the result is 0, 1 ? t, if not 0, 0 ? t 1 test result tst.b #imm,@(r0,gbr) (r0 + gbr) & imm; if the result is 0, 1 ? t, if not 0, 0 ? t 3 test result xor rm,rn rn ^ rm ? rn 1 xor #imm,r0 r0 ^ imm ? r0 1 xor.b #imm,@(r0,gbr) (r0 + gbr) ^ imm ? (r0 + gbr) 3 5.1.4 shift instructions table 5.6 shift instructions applicable instructions instruction operation cycles t bit sh-1 sh-2 sh- dsp rotl rn t ? rn ? msb 1 msb rotr rn lsb ? rn ? t 1 lsb rotcl rn t ? rn ? t 1 msb rotcr rn t ? rn ? t 1 lsb shal rn t ? rn ? 0 1 msb shar rn msb ? rn ? t 1 lsb shll rn t ? rn ? 0 1 msb shlr rn 0 ? rn ? t 1 lsb shll2 rn rn << 2 ? rn 1 shlr2 rn rn >> 2 ? rn 1 shll8 rn rn << 8 ? rn 1 shlr8 rn rn >> 8 ? rn 1 shll16 rn rn << 16 ? rn 1 shlr16 rn rn >> 16 ? rn 1 97 5.1.5 branch instructions table 5.7 branch instructions applicable instructions instruction operation cycles t bit sh-1 sh-2 sh- dsp bf label if t = 0, disp 2 + pc ? pc; if t = 1, nop (where label is disp + pc) 3/1* bf/s label delayed branch, if t = 0, disp 2 + pc ? pc; if t = 1, nop 2/1* bt label delayed branch, if t = 1, disp 2 + pc ? pc; if t = 0, nop 3/1* bt/s label if t = 1, disp 2 + pc ? pc; if t = 0, nop 2/1* bra label delayed branch, disp 2 + pc ? pc 2 braf rm delayed branch, rm + pc ? pc 2 bsr label delayed branch, pc ? pr, disp 2 + pc ? pc 2 bsrf rm delayed branch, pc ? pr, rm + pc ? pc 2 jmp @rm delayed branch, rm ? pc 2 jsr @rm delayed branch, pc ? pr, rm ? pc 2 rts delayed branch, pr ? 
pc 2 note: one state when it does not branch. 98 5.1.6 system control instructions table 5.8 system control instructions applicable instructions instruction operation cycles t bit sh-1 sh-2 sh- dsp clrmac 0 ? mach,macl 1 clrt 0 ? t10 ldc rm,sr rm ? sr 1 lsb ldc rm,gbr rm ? gbr 1 ldc rm,vbr rm ? vbr 1 ldc rm,mod rm ? mod 1 ldc rm,re rm ? re 1 ldc rm,rs rm ? rs 1 ldc.l @rm+,sr (rm) ? sr,rm+4 ? rm 3 lsb ldc.l @rm+,gbr (rm) ? gbr,rm+4 ? rm 3 ldc.l @rm+,vbr (rm) ? vbr,rm+4 ? rm 3 ldc.l @rm+,mod (rm) ? mod,rm+4 ? rm 3 ldc.l @rm+,re (rm) ? re,rm+4 ? rm 3 ldc.l @rm+,rs (rm) ? rs,rm+4 ? rm 3 ldre @(disp,pc) disp 2+pc ? re 1 ldrs @(disp,pc) disp 2+pc ? rs 1 lds rm,mach rm ? mach 1 lds rm,macl rm ? macl 1 lds rm,pr rm ? pr 1 lds rm,dsr rm ? dsr 1 lds rm,a0 rm ? a0 1 lds rm,x0 rm ? x0 1 lds rm,x1 rm ? x1 1 lds rm,y0 rm ? y0 1 lds rm,y1 rm ? y1 1 lds.l @rm+,mach (rm) ? mach,rm+4 ? rm 1 lds.l @rm+,macl (rm) ? macl,rm+4 ? rm 1 lds.l @rm+,pr (rm) ? pr,rm+4 ? rm 1 lds.l @rm+,dsr (rm) ? dsr,rm+4 ? rm 1 99 table 5.8 system control instructions (cont) applicable instructions instruction operation cycles t bit sh-1 sh-2 sh- dsp lds.l @rm+,a0 (rm) ? a0,rm+4 ? rm 1 lds.l @rm+,x0 (rm) ? x0,rm+4 ? rm 1 lds.l @rm+,x1 (rm) ? x1,rm+4 ? rm 1 lds.l @rm+,y0 (rm) ? y0,rm+4 ? rm 1 lds.l @rm+,y1 (rm) ? y1,rm+4 ? rm 1 nop no operation 1 rte delayed branch, stack area, ? pc/sr 4 lsb setrc rn rn[11:0] ? rc (sr[27:16]) 1 setrc #imm imm ? rc(sr[23:16]),zeros ? sr[27:24] 1 sett 1 ? t1 sleep sleep 3* stc sr,rn sr ? rn 1 stc gbr,rn gbr ? rn 1 stc vbr,rn vbr ? rn 1 stc mod,rn mod ? rn 1 stc re,rn re ? rn 1 stc rs,rn rs ? rn 1 stc.l sr,@-rn rn? ? rn,sr ? (rn) 2 stc.l gbr,@-rn rn? ? rn,gbr ? (rn) 2 stc.l vbr,@-rn rn? ? rn,vbr ? (rn) 2 stc.l mod,@-rn rn? ? rn,mod ? (rn) 2 stc.l re,@-rn rn? ? rn,re ? (rn) 2 stc.l rs,@-rn rn? ? rn,rs ? (rn) 2 sts mach,rn mach ? rn 1 sts macl,rn macl ? rn 1 sts pr,rn pr ? rn 1 sts dsr,rn dsr ? rn 1 sts a0,rn a0 ? rn 1 sts x0,rn x0 ? 
rn 1

table 5.8 system control instructions (cont)

instruction        operation                                    cycles   t bit
sts x1,rn          x1 → rn                                      1
sts y0,rn          y0 → rn                                      1
sts y1,rn          y1 → rn                                      1
sts.l mach,@-rn    rn-4 → rn, mach → (rn)                       1
sts.l macl,@-rn    rn-4 → rn, macl → (rn)                       1
sts.l pr,@-rn      rn-4 → rn, pr → (rn)                         1
sts.l dsr,@-rn     rn-4 → rn, dsr → (rn)                        1
sts.l a0,@-rn      rn-4 → rn, a0 → (rn)                         1
sts.l x0,@-rn      rn-4 → rn, x0 → (rn)                         1
sts.l x1,@-rn      rn-4 → rn, x1 → (rn)                         1
sts.l y0,@-rn      rn-4 → rn, y0 → (rn)                         1
sts.l y1,@-rn      rn-4 → rn, y1 → (rn)                         1
trapa #imm         pc/sr → stack area, (imm × 4 + vbr) → pc     6

note: the number of execution states before the chip enters the sleep state.

this table lists the minimum execution cycles. in practice, the number of execution cycles increases when the instruction fetch is in contention with data access, when the destination register of a load instruction (memory → register) is the same as the register used by the next instruction, or when the branch destination address of a branch instruction is a 4n + 2 address.

5.1.7 cpu instructions that support dsp functions

several system control instructions have been added to the cpu core instructions to support dsp functions. the rs, re, and mod registers (which support modulo addressing) have been added, and an rc counter has been added to the sr register. ldc and stc instructions have been added to access these. lds and sts instructions have also been added for accessing the dsp registers dsr, a0, x0, x1, y0, and y1.

a setrc instruction has been added for setting the value of the repeat counter (rc) in the sr register (bits 16 to 27). when the operand of the setrc instruction is immediate, 8 bits of immediate data are set in bits 16 to 23 of the sr register and bits 24 to 27 are cleared. when the operand is a register, the 12 bits 0 to 11 of the register are set in bits 16 to 27 of the sr register.
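the two setrc forms can be modeled as bit-field updates on a 32-bit sr value. the python sketch below is illustrative only: setrc_imm and setrc_reg are hypothetical names, and the model covers just the rc field in sr bits 16 to 27, ignoring every other effect of the instruction.

```python
RC_SHIFT = 16
RC_MASK = 0xFFF << RC_SHIFT   # rc occupies sr bits 16 to 27


def setrc_imm(sr, imm):
    """Model 'setrc #imm': 8 bits of immediate data go to sr bits 16 to 23
    and sr bits 24 to 27 are cleared."""
    if not 0 <= imm <= 0xFF:
        raise ValueError("#imm is 8 bits; use the register form for 256 or more")
    return (sr & ~RC_MASK & 0xFFFFFFFF) | (imm << RC_SHIFT)


def setrc_reg(sr, rm):
    """Model 'setrc rm': bits 0 to 11 of rm go to sr bits 16 to 27."""
    return (sr & ~RC_MASK & 0xFFFFFFFF) | ((rm & 0xFFF) << RC_SHIFT)


def read_rc(sr):
    """Extract the 12-bit repeat counter from sr."""
    return (sr >> RC_SHIFT) & 0xFFF


sr = setrc_imm(0x0FFF00F0, 0x20)
print(hex(sr), read_rc(sr))
sr = setrc_reg(sr, 300)
print(hex(sr), read_rc(sr))
```

the register form is the only way in this model to reach counts of 256 or more, which matches the 8-bit limit on the immediate operand.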
in addition to the new ldc instructions, the ldre and ldrs instructions have been added for setting the repeat start address and repeat end address in the rs and re registers. table 5.9 shows the added instructions. 101 table 5.9 added cpu instructions instruction operation code cycles t bit ldc rm,mod rm ? mod 0100mmmm01011110 1 ldc rm,re rm ? re 0100mmmm01111110 1 ldc rm,rs rm ? rs 0100mmmm01101110 1 ldc.l @rm+,mod (rm) ? mod , rm + 4 ? rm 0100mmmm01010111 3 ldc.l @rm+,re (rm) ? re , rm + 4 ? rm 0100mmmm01110111 3 ldc.l @rm+,rs (rm) ? rs , rm + 4 ? rm 0100mmmm01100111 3 stc mod,rn mod ? rn 0000nnnn01010010 1 stc re,rn re ? rn 0000nnnn01110010 1 stc rs,rn rs ? rn 0000nnnn01100010 1 stc.l mod,@-rn rn? ? rn , mod ? (rn) 0100nnnn01010011 2 stc.l re,@-rn rn? ? rn , re ? (rn) 0100nnnn01110011 2 stc.l rs,@-rn rn? ? rn , rs ? (rn) 0100nnnn01100011 2 lds rm,dsr rm ? dsr 0100mmmm01101010 1 lds.l @rm+,dsr (rm) ? dsr , rm + 4 ? rm 0100mmmm01100110 1 lds rm,a0 rm ? a0 0100mmmm01110110 1 lds.l @rm+,a0 (rm) ? a0 , rm + 4 ? rm 0100mmmm01100110 1 lds rm,x0 rm ? x0 0100mmmm01110110 1 lds.l @rm+,x0 (rm) ? x0 , rm + 4 ? rm 0100mmmm01100110 1 lds rm,x1 rm ? x1 0100mmmm01110110 1 lds.l @rm+,x1 (rm) ? x1 , rm + 4 ? rm 0100mmmm01100110 1 lds rm,y0 rm ? y0 0100mmmm01110110 1 lds.l @rm+,y0 (rm) ? y0 , rm + 4 ? rm 0100mmmm01100110 1 lds rm,y1 rm ? y1 , rm + 4 ? rm 0100mmmm01110110 1 lds.l @rm,y1 (rm) ? y1 , rm + 4 ? rm 0100mmmm01100110 1 sts dsr,rn dsr ? rn 0000nnnn01101010 1 sts.l dsr,@-rn rn? ? rn , dsr ? (rn) 0100nnnn01100010 1 sts a0,rn a0 ? rn 0000nnnn01111010 1 sts.l a0,@-rn rn? ? rn , a0 ? (rn) 0100nnnn01110010 1 sts x0,rn x0 ? rn 0000nnnn01111010 1 sts.l x0,@-rn rn? ? rn , x0 ? (rn) 0100nnnn01110010 1 sts x1,rn x1 ? rn 0000nnnn01111010 1 sts.l x1,@-rn rn? ? rn , x1 ? (rn) 0100nnnn01110010 1 102 table 5.9 added cpu instructions (cont) instruction operation code cycles t bit sts y0,rn y0 ? rn 0000nnnn10101010 1 sts.l y0,@-rn rn? ? rn , y0 ? (rn) 0100nnnn10100010 1 sts y1,rn y1 ? 
rn 0000nnnn10111010 1 sts.l y1,@-rn rn? ? rn , y1 ? (rn) 0100nnnn10110010 1 setrc rm rm[11:0] ? rc (sr[27:16]) repeat flag ? rf1, rf0 0100mmmm00010100 1 setrc #imm imm ? rc(sr[23:16]), zeros ? sr[27:24], repeat flag ? rf1, rf0 10000010iiiiiiii 1 ldrs @(disp,pc) disp 2+pc ? rs 10001100dddddddd 1 ldre @(disp,pc) disp 2+pc ? re 10001110dddddddd 1 5.2 dsp data transfer instruction set table 5.10 shows the dsp data transfer instructions by category. table 5.10 dsp data transfer instruction categories category instruction types operation code function no. of instructions double data transfer instructions 4 nopx x memory no operation 14 movx x memory data transfer nopy y memory no operation movy y memory data transfer single data transfer instructions 1 movs single data transfer 16 total 5 total 30 the data transfer instructions are divided into two groups, double data transfers and single data transfers. double data transfers are combined with dsp operation instructions to create dsp parallel processing instructions. parallel processing instructions are 32 bits long and include a double data transfer instruction in field a. double data transfers that are not parallel processing instructions and single data transfer instructions are 16 bits long. in double data transfers, x memory and y memory can be accessed simultaneously in parallel. one instruction is specified each for the respective x and y memory data accesses. the ax pointer is used for accessing x memory; the ay pointer is used for accessing y memory. double data transfers can only access x and y memory. 103 single data transfers can be accessed from any area. in single data transfers, the ax pointer and two other pointers are used as the as pointer. 5.2.1 double data transfer instructions (x memory data) table 5.11 double data transfer instructions (x memory data) instruction operation code cycles t bit nopx no operation 1111000*0*0*00** 1 movx.w @ax,dx (ax) ? msw of dx,0 ? 
LSW of Dx  111100a*d*0*01**  1
MOVX.W @Ax+,Dx      (Ax) → MSW of Dx, 0 → LSW of Dx, Ax+2 → Ax     111100a*d*0*10**  1
MOVX.W @Ax+Ix,Dx    (Ax) → MSW of Dx, 0 → LSW of Dx, Ax+Ix → Ax    111100a*d*0*11**  1
MOVX.W Da,@Ax       MSW of Da → (Ax)                                111100a*d*1*01**  1
MOVX.W Da,@Ax+      MSW of Da → (Ax), Ax+2 → Ax                     111100a*d*1*10**  1
MOVX.W Da,@Ax+Ix    MSW of Da → (Ax), Ax+Ix → Ax                    111100a*d*1*11**  1

5.2.2 Double Data Transfer Instructions (Y Memory Data)

Table 5.12 Double Data Transfer Instructions (Y Memory Data)

Instruction         Operation                                       Code              Cycles  T Bit
NOPY                No operation                                    111100*0*0*0**00  1
MOVY.W @Ay,Dy       (Ay) → MSW of Dy, 0 → LSW of Dy                 111100*a*d*0**01  1
MOVY.W @Ay+,Dy      (Ay) → MSW of Dy, 0 → LSW of Dy, Ay+2 → Ay      111100*a*d*0**10  1
MOVY.W @Ay+Iy,Dy    (Ay) → MSW of Dy, 0 → LSW of Dy, Ay+Iy → Ay     111100*a*d*0**11  1
MOVY.W Da,@Ay       MSW of Da → (Ay)                                111100*a*d*1**01  1
MOVY.W Da,@Ay+      MSW of Da → (Ay), Ay+2 → Ay                     111100*a*d*1**10  1
MOVY.W Da,@Ay+Iy    MSW of Da → (Ay), Ay+Iy → Ay                    111100*a*d*1**11  1

5.2.3 Single Data Transfer Instructions

Table 5.13 Single Data Transfer Instructions

Instruction         Operation                                       Code              Cycles  T Bit
MOVS.W @-As,Ds      As-2 → As, (As) → MSW of Ds, 0 → LSW of Ds      111101aadddd0000  1
MOVS.W @As,Ds       (As) → MSW of Ds, 0 → LSW of Ds                 111101aadddd0100  1
MOVS.W @As+,Ds      (As) → MSW of Ds, 0 → LSW of Ds, As+2 → As      111101aadddd1000  1
MOVS.W @As+Ix,Ds    (As) → MSW of Ds, 0 → LSW of Ds, As+Ix → As     111101aadddd1100  1
MOVS.W Ds,@-As      As-2 → As, MSW of Ds → (As)*                    111101aadddd0001  1
MOVS.W Ds,@As       MSW of Ds → (As)*                               111101aadddd0101  1
MOVS.W Ds,@As+      MSW of Ds → (As), As+2 → As*                    111101aadddd1001  1
MOVS.W Ds,@As+Is    MSW of Ds → (As), As+Is → As*                   111101aadddd1101  1
MOVS.L @-As,Ds      As-4 → As, (As) → Ds                            111101aadddd0010  1
MOVS.L @As,Ds       (As) → Ds                                       111101aadddd0110  1
MOVS.L @As+,Ds      (As) → Ds, As+4 → As                            111101aadddd1010  1
MOVS.L @As+Is,Ds    (As) → Ds, As+Is → As                           111101aadddd1110  1
MOVS.L Ds,@-As      As-4 → As, Ds → (As)                            111101aadddd0011  1
MOVS.L Ds,@As       Ds → (As)                                       111101aadddd0111  1
MOVS.L Ds,@As+      Ds → (As), As+4 → As                            111101aadddd1011  1
MOVS.L Ds,@As+Is    Ds → (As), As+Is → As                           111101aadddd1111  1

Note: * When guard bit registers A0G and A1G (eight-bit registers) are specified as the source operand Ds, the data is sign-extended and used.

Table 5.14 lists the correspondence between DSP data transfer operands and registers. CPU core registers are used as pointer addresses to indicate memory addresses.

Table 5.14 Correspondence between DSP Data Transfer Operands and Registers

SuperH (CPU core) registers:

Operand   R0   R1   R2      R3      R4            R5            R6      R7      R8     R9
                    (As2)   (As3)   (Ax0) (As0)   (Ax1) (As1)   (Ay0)   (Ay1)   (Ix)   (Iy)
Ax        -    -    -       -       Yes           Yes           -       -       -      -
Ix (Is)   -    -    -       -       -             -             -       -       Yes    -
Dx        -    -    -       -       -             -             -       -       -      -
Ay        -    -    -       -       -             -             Yes     Yes     -      -
Iy        -    -    -       -       -             -             -       -       -      Yes
Dy        -    -    -       -       -             -             -       -       -      -
Da        -    -    -       -       -             -             -       -       -      -
As        -    -    Yes     Yes     Yes           Yes           -       -       -      -
Ds        -    -    -       -       -             -             -       -       -      -

DSP registers:

Operand   X0    X1    Y0    Y1    M0    M1    A0    A1    A0G   A1G
Ax        -     -     -     -     -     -     -     -     -     -
Ix (Is)   -     -     -     -     -     -     -     -     -     -
Dx        Yes   Yes   -     -     -     -     -     -     -     -
Ay        -     -     -     -     -     -     -     -     -     -
Iy        -     -     -     -     -     -     -     -     -     -
Dy        -     -     Yes   Yes   -     -     -     -     -     -
Da        -     -     -     -     -     -     Yes   Yes   -     -
As        -     -     -     -     -     -     -     -     -     -
Ds        Yes   Yes   Yes   Yes   Yes   Yes   Yes   Yes   Yes   Yes

Note: Yes indicates that the register can be set.

5.3 DSP Operation Instruction Set

DSP operation instructions are digital signal processing instructions that are processed by the DSP unit. Their instruction code is 32 bits long. Multiple instructions can be processed in parallel. The instruction code is divided into two fields, A and B. Field A specifies a parallel data transfer instruction and field B specifies a single or double data operation instruction. The two fields are specified independently, and the instructions in them execute independently and in parallel. Parallel data transfer instructions specified in field A are exactly the same as double data transfer instructions. The data operation instructions of field B are of three types: double data operation instructions, conditional single data operation instructions, and unconditional single data operation instructions. Table 5.15 shows the format of DSP operation instructions. The operands are selected independently from the DSP registers. Table 5.16 shows the correspondence of DSP operation instruction operands and registers.
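The code columns in the tables of this section suggest a simple way to tell the three DSP instruction groups apart: double data transfer codes start with 111100, single data transfer codes with 111101, and DSP operation codes (the only 32-bit group) with 111110. The following is a minimal sketch of that observation, not a decoder taken from the manual; the names `dsp_group`, `classify_dsp` and `dsp_code_length` are hypothetical.

```c
#include <stdint.h>

/* Hypothetical names for illustration; they do not appear in the manual. */
enum dsp_group {
    NOT_DSP = 0,
    DSP_DOUBLE_TRANSFER,   /* movx.w / movy.w / nopx / nopy, 16 bits */
    DSP_SINGLE_TRANSFER,   /* movs.w / movs.l, 16 bits               */
    DSP_OPERATION          /* field A + field B, 32 bits             */
};

/* Classify an instruction by the upper six bits of its first code word. */
static enum dsp_group classify_dsp(uint16_t first_word)
{
    switch (first_word >> 10) {
    case 0x3C: return DSP_DOUBLE_TRANSFER;  /* 111100 */
    case 0x3D: return DSP_SINGLE_TRANSFER;  /* 111101 */
    case 0x3E: return DSP_OPERATION;        /* 111110 */
    default:   return NOT_DSP;
    }
}

/* Code length in bytes: only DSP operation instructions are 32 bits long. */
static int dsp_code_length(uint16_t first_word)
{
    return classify_dsp(first_word) == DSP_OPERATION ? 4 : 2;
}
```

For instance, the first word of a DSP operation instruction (111110 followed by the field A bits) yields a length of 4, so a disassembler built this way would fetch one more word before decoding field B.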
Table 5.15 Instruction Formats for DSP Operation Instructions

Classification                      Operands     Instruction Forms          Instructions
Double data operation               6 operands   ALUop. Sx, Sy, Du          PADD PMULS, PSUB PMULS
instructions                                     MLTop. Se, Sf, Dg
Conditional single data             3 operands   ALUop. Sx, Sy, Dz          PADD, PAND, POR, PSHA, PSHL,
operation instructions                           DCT ALUop. Sx, Sy, Dz      PSUB, PXOR
                                                 DCF ALUop. Sx, Sy, Dz
                                    2 operands   ALUop. Sx, Dz              PCOPY, PDEC, PDMSB, PINC,
                                                 DCT ALUop. Sx, Dz          PLDS, PSTS, PNEG
                                                 DCF ALUop. Sx, Dz
                                                 ALUop. Sy, Dz
                                                 DCT ALUop. Sy, Dz
                                                 DCF ALUop. Sy, Dz
                                    1 operand    ALUop. Dz                  PCLR, PSHA #imm, PSHL #imm
                                                 DCT ALUop. Dz
                                                 DCF ALUop. Dz
Unconditional single data           3 operands   ALUop. Sx, Sy, Du          PADDC, PSUBC, PMULS
operation instructions                           MLTop. Se, Sf, Dg
                                    2 operands   ALUop. Sx, Dz              PCMP, PABS, PRND
                                                 ALUop. Sy, Dz

Table 5.16 Correspondence between DSP Operation Instruction Operands and Registers

            ALU and BPU Instructions      Multiplication Instructions
Register    Sx    Sy    Dz    Du          Se    Sf    Dg
A0          Yes   -     Yes   Yes         -     -     Yes
A1          Yes   -     Yes   Yes         Yes   Yes   Yes
M0          -     Yes   Yes   -           -     -     Yes
M1          -     Yes   Yes   -           -     -     Yes
X0          Yes   -     Yes   Yes         Yes   Yes   -
X1          Yes   -     Yes   -           Yes   -     -
Y0          -     Yes   Yes   Yes         Yes   Yes   -
Y1          -     Yes   Yes   -           -     Yes   -

When writing parallel instructions, first write the field B instruction, then the field A instruction. The following is an example of a parallel processing program.

    padd a0,m0,a0   pmuls x0,y0,m0   movx.w @r4+,x0      movy.w @r6+,y0 [;]
    dcf pinc x1,a1                   movx.w a0,@r5+r8    movy.w @r7+,y0 [;]
    pcmp x1,m0                       movx.w @r4          [nopy] [;]

Text in brackets ([]) can be omitted. The no operation instructions NOPX and NOPY can be omitted. Semicolons (;) are used to demarcate instruction lines, but can be omitted. If semicolons are used, the space after the semicolon can be used for comments.

The individual status codes (DC, N, Z, V, GT) of the DSR register are always updated by unconditional ALU operation instructions and shift operation instructions. Conditional instructions do not update the status codes, even if the conditions have been met. Multiplication instructions also do not update the status codes.
DC bit definitions are determined by the specifications of the CS bits in the DSR register. Table 5.17 shows the DSP operation instructions by category.

Table 5.17 DSP Operation Instruction Categories

Classification           Instruction Types                      Operation Code  Function                                 No. of Instructions
ALU arithmetic           ALU fixed decimal point            11  PABS            Absolute value operation                 28
operation instructions   operation instructions                 PADD            Addition
                                                                PADD PMULS      Addition and signed multiplication
                                                                PADDC           Addition with carry
                                                                PCLR            Clear
                                                                PCMP            Compare
                                                                PCOPY           Copy
                                                                PNEG            Invert sign
                                                                PSUB            Subtraction
                                                                PSUB PMULS      Subtraction and signed multiplication
                                                                PSUBC           Subtraction with borrow
                         ALU integer operation               2  PDEC            Decrement                                12
                         instructions                           PINC            Increment
                         MSB detection instruction           1  PDMSB           MSB detection                            6
                         Rounding operation instruction      1  PRND            Rounding                                 2
ALU logical operation instructions                           3  PAND            Logical AND                              9
                                                                POR             Logical OR
                                                                PXOR            Logical exclusive OR
Fixed decimal point multiplication instruction               1  PMULS           Signed multiplication                    1
Shift                    Arithmetic shift operation          1  PSHA            Arithmetic shift                         4
                         instruction
                         Logical shift operation             1  PSHL            Logical shift                            4
                         instruction
System control instructions                                  2  PLDS            System register load                     12
                                                                PSTS            Store from system register
Total                                                       23                  Total                                    78

5.3.1 ALU Arithmetic Operation Instructions

Table 5.18 ALU Fixed Decimal Point Operation Instructions

Instruction               Operation                                     Code                                Cycles  DC Bit
PABS Sx,Dz                If Sx ≥ 0, Sx → Dz; if Sx < 0, 0 - Sx → Dz    111110********** 10001000xx00zzzz   1       Update
PABS Sy,Dz                If Sy ≥ 0, Sy → Dz; if Sy < 0, 0 - Sy → Dz    111110********** 1010100000yyzzzz   1       Update
PADD Sx,Sy,Dz             Sx + Sy → Dz                                  111110********** 10110001xxyyzzzz   1       Update
DCT PADD Sx,Sy,Dz         If DC=1, Sx + Sy → Dz; if 0, nop              111110********** 10110010xxyyzzzz   1
DCF PADD Sx,Sy,Dz         If DC=0, Sx + Sy → Dz; if 1, nop              111110********** 10110011xxyyzzzz   1
PADD Sx,Sy,Du             Sx + Sy → Du;                                 111110********** 0111eeffxxyygguu   1       Update
  PMULS Se,Sf,Dg          MSW of Se × MSW of Sf → Dg
PADDC Sx,Sy,Dz            Sx + Sy + DC → Dz                             111110********** 10110000xxyyzzzz   1       Update
PCLR Dz                   H'00000000 → Dz                               111110********** 100011010000zzzz   1       Update
DCT PCLR Dz               If DC=1, H'00000000 → Dz; if 0, nop           111110********** 100011100000zzzz   1
DCF PCLR Dz               If DC=0, H'00000000 → Dz; if 1, nop           111110********** 100011110000zzzz   1
PCMP Sx,Sy                Sx - Sy                                       111110********** 10000100xxyy0000   1       Update
PCOPY Sx,Dz               Sx → Dz                                       111110********** 11011001xx00zzzz   1       Update
PCOPY Sy,Dz               Sy → Dz                                       111110********** 1111100100yyzzzz   1       Update
DCT PCOPY Sx,Dz           If DC=1, Sx → Dz; if 0, nop                   111110********** 11011010xx00zzzz   1
DCT PCOPY Sy,Dz           If DC=1, Sy → Dz; if 0, nop                   111110********** 1111101000yyzzzz   1
DCF PCOPY Sx,Dz           If DC=0, Sx → Dz; if 1, nop                   111110********** 11011011xx00zzzz   1
DCF PCOPY Sy,Dz           If DC=0, Sy → Dz; if 1, nop                   111110********** 1111101100yyzzzz   1
PNEG Sx,Dz                0 - Sx → Dz                                   111110********** 11001001xx00zzzz   1       Update
PNEG Sy,Dz                0 - Sy → Dz                                   111110********** 1110100100yyzzzz   1       Update
DCT PNEG Sx,Dz            If DC=1, 0 - Sx → Dz; if 0, nop               111110********** 11001010xx00zzzz   1
DCT PNEG Sy,Dz            If DC=1, 0 - Sy → Dz; if 0, nop               111110********** 1110101000yyzzzz   1
DCF PNEG Sx,Dz            If DC=0, 0 - Sx → Dz; if 1, nop               111110********** 11001011xx00zzzz   1
DCF PNEG Sy,Dz            If DC=0, 0 - Sy → Dz; if 1, nop               111110********** 1110101100yyzzzz   1
PSUB Sx,Sy,Dz             Sx - Sy → Dz                                  111110********** 10100001xxyyzzzz   1       Update
DCT PSUB Sx,Sy,Dz         If DC=1, Sx - Sy → Dz; if 0, nop              111110********** 10100010xxyyzzzz   1
DCF PSUB Sx,Sy,Dz         If DC=0, Sx - Sy → Dz; if 1, nop              111110********** 10100011xxyyzzzz   1
PSUB Sx,Sy,Du             Sx - Sy → Du;                                 111110********** 0110eeffxxyygguu   1       Update
  PMULS Se,Sf,Dg          MSW of Se × MSW of Sf → Dg
PSUBC Sx,Sy,Dz            Sx - Sy - DC → Dz                             111110********** 10100000xxyyzzzz   1       Update

Table 5.19 ALU Integer Operation Instructions

Instruction       Operation                                                          Code                                Cycles  DC Bit
PDEC Sx,Dz        MSW of Sx - 1 → MSW of Dz, clear LSW of Dz                         111110********** 10001001xx00zzzz   1       Update
PDEC Sy,Dz        MSW of Sy - 1 → MSW of Dz, clear LSW of Dz                         111110********** 1010100100yyzzzz   1       Update
DCT PDEC Sx,Dz    If DC=1, MSW of Sx - 1 → MSW of Dz, clear LSW of Dz; if 0, nop     111110********** 10001010xx00zzzz   1
DCT PDEC Sy,Dz    If DC=1, MSW of Sy - 1 → MSW of Dz, clear LSW of Dz; if 0, nop     111110********** 1010101000yyzzzz   1
DCF PDEC Sx,Dz    If DC=0, MSW of Sx - 1 → MSW of Dz, clear LSW of Dz; if 1, nop     111110********** 10001011xx00zzzz   1
DCF PDEC Sy,Dz    If DC=0, MSW of Sy - 1 → MSW of Dz, clear LSW of Dz; if 1, nop     111110********** 1010101100yyzzzz   1
PINC Sx,Dz        MSW of Sx + 1 → MSW of Dz, clear LSW of Dz                         111110********** 10011001xx00zzzz   1       Update
PINC Sy,Dz        MSW of Sy + 1 → MSW of Dz, clear LSW of Dz                         111110********** 1011100100yyzzzz   1       Update
DCT PINC Sx,Dz    If DC=1, MSW of Sx + 1 → MSW of Dz, clear LSW of Dz; if 0, nop     111110********** 10011010xx00zzzz   1
DCT PINC Sy,Dz    If DC=1, MSW of Sy + 1 → MSW of Dz, clear LSW of Dz; if 0, nop     111110********** 1011101000yyzzzz   1
DCF PINC Sx,Dz    If DC=0, MSW of Sx + 1 → MSW of Dz, clear LSW of Dz; if 1, nop     111110********** 10011011xx00zzzz   1
DCF PINC Sy,Dz    If DC=0, MSW of Sy + 1 → MSW of Dz, clear LSW of Dz; if 1, nop     111110********** 1011101100yyzzzz   1

Table 5.20 MSB Detection Instructions

Instruction        Operation                                                                Code                                Cycles  DC Bit
PDMSB Sx,Dz        Sx data MSB position → MSW of Dz, clear LSW of Dz                        111110********** 10011101xx00zzzz   1       Update
PDMSB Sy,Dz        Sy data MSB position → MSW of Dz, clear LSW of Dz                        111110********** 1011110100yyzzzz   1       Update
DCT PDMSB Sx,Dz    If DC=1, Sx data MSB position → MSW of Dz, clear LSW of Dz; if 0, nop    111110********** 10011110xx00zzzz   1
DCT PDMSB Sy,Dz    If DC=1, Sy data MSB position → MSW of Dz, clear LSW of Dz; if 0, nop    111110********** 1011111000yyzzzz   1
DCF PDMSB Sx,Dz    If DC=0, Sx data MSB position → MSW of Dz, clear LSW of Dz; if 1, nop    111110********** 10011111xx00zzzz   1
DCF PDMSB Sy,Dz    If DC=0, Sy data MSB position → MSW of Dz, clear LSW of Dz; if 1, nop    111110********** 1011111100yyzzzz   1

Table 5.21 Rounding Operation Instructions

Instruction    Operation                                 Code                                Cycles  DC Bit
PRND Sx,Dz     Sx + H'00008000 → Dz, clear LSW of Dz     111110********** 10011000xx00zzzz   1       Update
PRND Sy,Dz     Sy + H'00008000 → Dz, clear LSW of Dz     111110********** 1011100000yyzzzz   1       Update

5.3.2 ALU Logical Operation Instructions

Table 5.22 ALU Logical Operation Instructions

Instruction          Operation                                             Code                                Cycles  DC Bit
PAND Sx,Sy,Dz        Sx & Sy → Dz, clear LSW of Dz                         111110********** 10010101xxyyzzzz   1       Update
DCT PAND Sx,Sy,Dz    If DC=1, Sx & Sy → Dz, clear LSW of Dz; if 0, nop     111110********** 10010110xxyyzzzz   1
DCF PAND Sx,Sy,Dz    If DC=0, Sx & Sy → Dz, clear LSW of Dz; if 1, nop     111110********** 10010111xxyyzzzz   1
POR Sx,Sy,Dz         Sx | Sy → Dz, clear LSW of Dz                         111110********** 10110101xxyyzzzz   1       Update
DCT POR Sx,Sy,Dz     If DC=1, Sx | Sy → Dz, clear LSW of Dz; if 0, nop     111110********** 10110110xxyyzzzz   1
DCF POR Sx,Sy,Dz     If DC=0, Sx | Sy → Dz, clear LSW of Dz; if 1, nop     111110********** 10110111xxyyzzzz   1
PXOR Sx,Sy,Dz        Sx ^ Sy → Dz, clear LSW of Dz                         111110********** 10100101xxyyzzzz   1       Update
DCT PXOR Sx,Sy,Dz    If DC=1, Sx ^ Sy → Dz, clear LSW of Dz; if 0, nop     111110********** 10100110xxyyzzzz   1
DCF PXOR Sx,Sy,Dz    If DC=0, Sx ^ Sy → Dz, clear LSW of Dz; if 1, nop     111110********** 10100111xxyyzzzz   1

5.3.3 Fixed Decimal Point Multiplication Instructions

Table 5.23 Fixed Decimal Point Multiplication Instructions

Instruction       Operation                       Code                                Cycles  DC Bit
PMULS Se,Sf,Dg    MSW of Se × MSW of Sf → Dg      111110********** 0100eeff0000gg00   1

5.3.4 Shift Operation Instructions

Table 5.24 Arithmetic Shift Instructions

Instruction      Operation                Code    Cycles  DC Bit
PSHA Sx,Sy,Dz    If Sy ≥ 0, Sx< [remainder of table missing from the source text]

Table 5.25 Logical Shift Operation Instructions

Instruction      Operation                Code    Cycles  DC Bit
PSHL Sx,Sy,Dz    If Sy ≥ 0, Sx< [remainder of table missing from the source text]

5.3.5 System Control Instructions

Table 5.26 System Control Instructions

Instruction         Operation                        Code                                Cycles  DC Bit
PLDS Dz,MACH        Dz → MACH                        111110********** 111011010000zzzz   1
PLDS Dz,MACL        Dz → MACL                        111110********** 111111010000zzzz   1
DCT PLDS Dz,MACH    If DC=1, Dz → MACH; if 0, nop    111110********** 111011100000zzzz   1
DCT PLDS Dz,MACL    If DC=1, Dz → MACL; if 0, nop    111110********** 111111100000zzzz   1
DCF PLDS Dz,MACH    If DC=0, Dz → MACH; if 1, nop    111110********** 111011110000zzzz   1
DCF PLDS Dz,MACL    If DC=0, Dz → MACL; if 1, nop    111110********** 111111110000zzzz   1
PSTS MACH,Dz        MACH → Dz                        111110********** 110011010000zzzz   1
PSTS MACL,Dz        MACL → Dz                        111110********** 110111010000zzzz   1
DCT PSTS MACH,Dz    If DC=1, MACH → Dz; if 0, nop    111110********** 110011100000zzzz   1
DCT PSTS MACL,Dz    If DC=1, MACL → Dz; if 0, nop    111110********** 110111100000zzzz   1
DCF PSTS MACH,Dz    If DC=0, MACH → Dz; if 1, nop    111110********** 110011110000zzzz   1
DCF PSTS MACL,Dz    If DC=0, MACL → Dz; if 1, nop    111110********** 110111110000zzzz   1

5.3.6 NOPX and NOPY Instruction Code

When there is no data transfer instruction to be processed in parallel with the DSP operation instruction, a NOPX or NOPY instruction can be written as the data transfer instruction or the instruction can be omitted. The operation code is the same in either case. Table 5.27 shows the NOPX and NOPY instruction code.

Table 5.27 Sample NOPX and NOPY Instruction Code

Instruction                                         Code
PADD X0,Y0,A0  MOVX.W @R4+,X0  MOVY.W @R6+R9,Y0     1111100010110000 1000000010100000
PADD X0,Y0,A0  NOPX            MOVY.W @R6+R9,Y0     1111100000110000 1000000010100000
PADD X0,Y0,A0  NOPX            NOPY                 1111100000000000 1000000010100000
PADD X0,Y0,A0  NOPX                                 (same code as the row above)
PADD X0,Y0,A0                                       (same code as the row above)
               MOVX.W @R4+,X0  MOVY.W @R6+R9,Y0     1111000010110000
               MOVX.W @R4+,X0  NOPY                 1111000010000000
               MOVS.W @R4+,X0                       1111011010000000
               NOPX            MOVY.W @R6+R9,Y0     1111000000110000
                               MOVY.W @R6+R9,Y0     (same code as the row above)
               NOPX            NOPY                 1111000000000000
NOP                                                 0000000000001001

Section 6 Instruction Descriptions

6.1 Instruction Descriptions

Instructions are described in alphabetical order in three sections: CPU instructions, DSP data transfer instructions, and DSP operation instructions. This section describes instructions in alphabetical order using the format shown below in section 6.1.1.
The actual descriptions begin at section 6.1.2.

6.1.1 Sample Description (Name): Classification

Class: Indicates if the instruction is a delayed branch instruction or interrupt disabled instruction

Format                     Assembler input format; imm and disp are numbers, expressions, or symbols
Abstract                   A brief description of operation
Code                       Displayed in order MSB ↔ LSB
Cycle                      Number of cycles when there is no wait state
T Bit                      The value of T bit after the instruction is executed
Applicable Instructions    Indicates whether the instruction applies to the SH-1, SH-2, or SH-DSP

Description: Description of operation

Notes: Notes on using the instruction

Operation: Operation written in C language. The following resources should be used.

Reads data of each length from address addr. An address error will occur if word data is read from an address other than 2n or if longword data is read from an address other than 4n:

    unsigned char read_byte(unsigned long addr);
    unsigned short read_word(unsigned long addr);
    unsigned long read_long(unsigned long addr);

Writes data of each length to address addr. An address error will occur if word data is written to an address other than 2n or if longword data is written to an address other than 4n:

    unsigned char write_byte(unsigned long addr, unsigned long data);
    unsigned short write_word(unsigned long addr, unsigned long data);
    unsigned long write_long(unsigned long addr, unsigned long data);

Starts execution from the slot instruction located at address (addr - 4). For delay_slot(4), execution starts from the instruction at address 0 rather than address 4.
If execution moves from this function to one of the instructions listed below, that instruction is treated as an illegal slot instruction (the listed instructions become illegal slot instructions when used as delay slot instructions):

BF, BT, BRA, BSR, JMP, JSR, RTS, RTE, TRAPA, BF/S, BT/S, BRAF, BSRF

    delay_slot(unsigned long addr);

Returns 2 if the instruction at address (addr - 4) is a 32-bit instruction, and 0 if it is a 16-bit instruction:

    unsigned long is_32bit_inst(unsigned long addr);

List registers:

    unsigned long r[16];
    unsigned long sr,gbr,vbr;
    unsigned long mach,macl,pr;
    unsigned long pc;

Definition of SR structures:

    struct sr0 {
        unsigned long dummy0:4;
        unsigned long rc0:12;
        unsigned long dummy1:4;
        unsigned long dmy0:1;
        unsigned long dmx0:1;
        unsigned long m0:1;
        unsigned long q0:1;
        unsigned long i0:4;
        unsigned long rf10:1;
        unsigned long rf00:1;
        unsigned long s0:1;
        unsigned long t0:1;
    };

Definition of bits in SR:

    #define m   ((*(struct sr0 *)(&sr)).m0)
    #define q   ((*(struct sr0 *)(&sr)).q0)
    #define s   ((*(struct sr0 *)(&sr)).s0)
    #define t   ((*(struct sr0 *)(&sr)).t0)
    #define rf1 ((*(struct sr0 *)(&sr)).rf10)
    #define rf0 ((*(struct sr0 *)(&sr)).rf00)

Error display function:

    error( char *er );

The PC should point to the location four bytes after the current instruction. Therefore, pc = 4; means the instruction starts execution from address 0, not address 4.

Examples: Examples are written in assembler mnemonics and describe status before and after executing the instruction. Characters in italics such as .align are assembler control instructions (listed below). For more information, see the cross assembler user manual.
    .org          Location counter set
    .data.w       Securing integer word data
    .data.l       Securing integer longword data
    .sdata        Securing string data
    .align 2      2-byte boundary alignment
    .align 4      4-byte boundary alignment
    .arepeat 16   16-repeat expansion
    .arepeat 32   32-repeat expansion
    .aendr        End of repeat expansion of specified number

Note that the SuperH series cross assembler version 1.0 does not support the conditional assembler functions.

Notes:

1. In addressing modes that use the displacements listed below (disp), the assembler statements in this manual show the value prior to scaling (×1, ×2, and ×4) according to the operand size. This is done to clarify the LSI operation. Actual assembler statements should follow the rules of the assembler in question.

    @(disp:4, Rn)     Indirect register addressing with displacement
    @(disp:8, GBR)    Indirect GBR addressing with displacement
    @(disp:8, PC)     Indirect PC addressing with displacement
    disp:8, disp:12   PC relative addressing

2. A 16-bit instruction code that is not assigned to an instruction is handled as an ordinary illegal instruction and produces illegal instruction exception processing.

    Example: H'FFFF [ordinary illegal instruction]

3. An ordinary illegal instruction or branch instruction (i.e., an illegal slot instruction) that follows a BRA, BT/S or another delayed branch instruction will cause illegal instruction exception processing.

    Example 1:
        ....
        bra     label
        .data.w h'ffff      ← illegal slot instruction
        ....
        [h'ffff is an ordinary illegal instruction from the start]

    Example 2:
        rte
        bt/s    label       ← illegal slot instruction

4. The delayed branch actually occurs after the slot instruction is executed. Apart from the branch itself, however, operations such as register updates are executed in the order delayed branch instruction, then delay slot instruction. For example, even when the contents of a register that stores a branch destination address are changed in the delay slot, the branch destination remains the register contents prior to the change.

5. When there is an ordinary illegal instruction, a branch instruction, or an instruction that updates the SR, RS or RE register (SETRC, LDRS, etc.) anywhere in a repeat program (loop) of three or fewer instructions, or among the last three instructions of a repeat program (loop) of four or more instructions, illegal instruction exception processing is started. Refer to 4.19, DSP Repeat (Loop) Control, for more information.

6.1.2 ADD (Add Binary): Arithmetic Instruction

Format         Abstract          Code               Cycle  T Bit  Applicable Instructions (SH-1, SH-2, SH-DSP)
ADD Rm,Rn      Rm + Rn → Rn      0011nnnnmmmm1100   1
ADD #imm,Rn    Rn + #imm → Rn    0111nnnniiiiiiii   1

Description: Adds general register Rn data to Rm data, and stores the result in Rn. 8-bit immediate data can be added instead of Rm data. Since the 8-bit immediate data is sign-extended to 32 bits, this instruction can add and subtract immediate data.

Operation:

    add(long m,long n)     /* add rm,rn */
    {
        r[n]+=r[m];
        pc+=2;
    }

    addi(long i,long n)    /* add #imm,rn */
    {
        if ((i&0x80)==0) r[n]+=(0x000000ff & (long)i);
        else r[n]+=(0xffffff00 | (long)i);
        pc+=2;
    }

Examples:

    add r0,r1       ;before execution: r0 = h'7fffffff, r1 = h'00000001
                    ;after execution:  r1 = h'80000000
    add #h'01,r2    ;before execution: r2 = h'00000000
                    ;after execution:  r2 = h'00000001
    add #h'fe,r3    ;before execution: r3 = h'00000001
                    ;after execution:  r3 = h'ffffffff

6.1.3 ADDC (Add with Carry): Arithmetic Instruction

Format          Abstract                         Code               Cycle  T Bit  Applicable Instructions (SH-1, SH-2, SH-DSP)
ADDC Rm,Rn      Rn + Rm + T → Rn, carry → T      0011nnnnmmmm1110   1      Carry

Description: Adds Rm data and the T bit to general register Rn data, and stores the result in Rn. The T bit changes according to the result. This instruction can add data that has more than 32 bits.
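The statement that ADDC can add data wider than 32 bits can be made concrete by chaining the carry through the T bit. The following is a minimal sketch in portable C rather than the manual's own operation listing; `addc32` and `add64` are hypothetical helper names, and the T bit is modelled as a plain `unsigned` value.

```c
#include <stdint.h>

/* Hypothetical model of one ADDC step: *rn = *rn + rm + t.
   Returns the carry-out that would be left in the T bit. */
static unsigned addc32(uint32_t *rn, uint32_t rm, unsigned t)
{
    uint32_t sum = *rn + rm;          /* wraps modulo 2^32               */
    unsigned carry = (sum < *rn);     /* carry out of Rn + Rm            */
    uint32_t res = sum + t;
    carry |= (res < sum);             /* carry out of adding the T bit   */
    *rn = res;
    return carry;
}

/* 64-bit addition hi:lo += add_hi:add_lo built from two ADDC steps.
   Returns the final T bit (carry out of bit 63). */
static unsigned add64(uint32_t *hi, uint32_t *lo,
                      uint32_t add_hi, uint32_t add_lo)
{
    unsigned t = 0;                   /* corresponds to clrt             */
    t = addc32(lo, add_lo, t);        /* low words first                 */
    t = addc32(hi, add_hi, t);        /* high words plus carried T bit   */
    return t;
}
```

The low words must be added first so that their carry is available in T when the high words are added; clearing T beforehand keeps a stale carry from leaking into the low-word sum.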
Operation:

    addc(long m,long n)    /* addc rm,rn */
    {
        unsigned long tmp0,tmp1;

        tmp1=r[n]+r[m];
        tmp0=r[n];
        r[n]=tmp1+t;
        if (tmp0>tmp1) t=1;
        else t=0;
        if (tmp1>r[n]) t=1;
        pc+=2;
    }

Examples:

    clrt            ;r0:r1 (64 bits) + r2:r3 (64 bits) = r0:r1 (64 bits)
    addc r3,r1      ;before execution: t = 0, r1 = h'00000001, r3 = h'ffffffff
                    ;after execution:  t = 1, r1 = h'00000000
    addc r2,r0      ;before execution: t = 1, r0 = h'00000000, r2 = h'00000000
                    ;after execution:  t = 0, r0 = h'00000001

6.1.4 ADDV (Add with V Flag Overflow Check): Arithmetic Instruction

Format          Abstract                        Code               Cycle  T Bit     Applicable Instructions (SH-1, SH-2, SH-DSP)
ADDV Rm,Rn      Rn + Rm → Rn, overflow → T      0011nnnnmmmm1111   1      Overflow

Description: Adds general register Rn data to Rm data, and stores the result in Rn. If an overflow occurs, the T bit is set to 1.

Operation:

    addv(long m,long n)    /* addv rm,rn */
    {
        long dest,src,ans;

        if ((long)r[n]>=0) dest=0;
        else dest=1;
        if ((long)r[m]>=0) src=0;
        else src=1;
        src+=dest;
        r[n]+=r[m];
        if ((long)r[n]>=0) ans=0;
        else ans=1;
        ans+=dest;
        if (src==0 || src==2) {
            if (ans==1) t=1;
            else t=0;
        }
        else t=0;
        pc+=2;
    }

Examples:

    addv r0,r1      ;before execution: r0 = h'00000001, r1 = h'7ffffffe, t = 0
                    ;after execution:  r1 = h'7fffffff, t = 0
    addv r0,r1      ;before execution: r0 = h'00000002, r1 = h'7ffffffe, t = 0
                    ;after execution:  r1 = h'80000000, t = 1

6.1.5 AND (AND Logical): Logic Operation Instruction

Format                     Abstract                             Code               Cycle  T Bit  Applicable Instructions (SH-1, SH-2, SH-DSP)
AND Rm,Rn                  Rn & Rm → Rn                         0010nnnnmmmm1001   1
AND #imm,R0                R0 & imm → R0                        11001001iiiiiiii   1
AND.B #imm,@(R0,GBR)       (R0 + GBR) & imm → (R0 + GBR)        11001101iiiiiiii   3

Description: Logically ANDs the contents of general registers Rn and Rm, and stores the result in Rn. The contents of general register R0 can be ANDed with zero-extended 8-bit immediate data. 8-bit memory data pointed to by GBR relative addressing can be ANDed with 8-bit immediate data.

Note: After AND #imm,R0 is executed, the upper 24 bits of R0 are always cleared to 0.
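The note above is a direct consequence of zero extension: the immediate contributes zeros in bits 8 to 31, so the AND clears those bits of R0. The sketch below contrasts this with the sign extension that the ADD #imm,Rn description specifies. It is an illustration only; `zero_extend8`, `sign_extend8`, `and_imm` and `add_imm` are hypothetical helper names.

```c
#include <stdint.h>

/* Zero extension: upper 24 bits of the operand are 0. */
static uint32_t zero_extend8(uint8_t imm)
{
    return (uint32_t)imm;
}

/* Sign extension: bit 7 of the immediate is copied into bits 8 to 31. */
static uint32_t sign_extend8(uint8_t imm)
{
    return (imm & 0x80u) ? (0xFFFFFF00u | imm) : (uint32_t)imm;
}

/* Model of and #imm,r0: the result can never have bits 8 to 31 set. */
static uint32_t and_imm(uint32_t r0, uint8_t imm)
{
    return r0 & zero_extend8(imm);
}

/* Model of add #imm,rn: an immediate of h'fe acts as -2. */
static uint32_t add_imm(uint32_t rn, uint8_t imm)
{
    return rn + sign_extend8(imm);
}
```

A practical consequence is that AND #imm,R0 cannot be used to mask only the low byte while preserving the upper bits; that requires the register form with a full 32-bit mask in Rm.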
Operation:

    and(long m,long n)     /* and rm,rn */
    {
        r[n]&=r[m];
        pc+=2;
    }

    andi(long i)           /* and #imm,r0 */
    {
        r[0]&=(0x000000ff & (long)i);
        pc+=2;
    }

    andm(long i)           /* and.b #imm,@(r0,gbr) */
    {
        long temp;

        temp=(long)read_byte(gbr+r[0]);
        temp&=(0x000000ff & (long)i);
        write_byte(gbr+r[0],temp);
        pc+=2;
    }

Examples:

    and r0,r1                   ;before execution: r0 = h'aaaaaaaa, r1 = h'55555555
                                ;after execution:  r1 = h'00000000
    and #h'0f,r0                ;before execution: r0 = h'ffffffff
                                ;after execution:  r0 = h'0000000f
    and.b #h'80,@(r0,gbr)       ;before execution: @(r0,gbr) = h'a5
                                ;after execution:  @(r0,gbr) = h'80

6.1.6 BF (Branch if False): Branch Instruction

Format       Abstract                                           Code               Cycle  T Bit  Applicable Instructions (SH-1, SH-2, SH-DSP)
BF label     When T = 0, disp × 2 + PC → PC; when T = 1, nop    10001011dddddddd   3/1

Description: Reads the T bit, and conditionally branches. If T = 0, it branches to the branch destination address. If T = 1, BF executes the next instruction. The branch destination is an address specified by PC + displacement. The PC value used for this address calculation is the address 4 bytes after this instruction. The 8-bit displacement is sign-extended and doubled. Consequently, the relative interval from the branch destination is -256 to +254 bytes. If the displacement is too short to reach the branch destination, use BF with the BRA instruction or the like.

Note: When branching, three cycles; when not branching, one cycle.

Operation:

    bf(long d)    /* bf disp */
    {
        long disp;

        if ((d&0x80)==0) disp=(0x000000ff & (long)d);
        else disp=(0xffffff00 | (long)d);
        if (t==0) pc=pc+(disp<<1);
        else pc+=2;
    }

Example:

    clrt                ;t is always cleared to 0
    bt   trget_t        ;does not branch, because t = 0
    bf   trget_f        ;branches to trget_f, because t = 0
    nop                 ;
    nop                 ;← the pc location is used to calculate the branch destination
    ..........          ;  address of the bf instruction
    trget_f:            ;← branch destination of the bf instruction

6.1.7 BF/S (Branch if False with Delay Slot): Branch Instruction

Format         Abstract                                           Code               Cycle  T Bit  Applicable Instructions (SH-1, SH-2, SH-DSP)
BF/S label     When T = 0, disp × 2 + PC → PC; when T = 1, nop    10001111dddddddd   2/1

Description: Reads the T bit and conditionally branches. If T = 0, it branches after executing the next instruction. If T = 1, BF/S executes the next instruction. The branch destination is an address specified by PC + displacement. The PC value used for this address calculation is the address 4 bytes after this instruction. The 8-bit displacement is sign-extended and doubled. Consequently, the relative interval from the branch destination is -256 to +254 bytes. If the displacement is too short to reach the branch destination, use BF with the BRA instruction or the like.

Note: Since this is a delayed branch instruction, the instruction immediately following is executed before the branch. Neither interrupts nor address errors are accepted between this instruction and the next instruction. When the instruction immediately following is a branch instruction, it is recognized as an illegal slot instruction. When branching, this is a two-cycle instruction; when not branching, one cycle.

Operation:

    bfs(long d)    /* bfs disp */
    {
        long disp;
        unsigned long temp;

        temp=pc;
        if ((d&0x80)==0) disp=(0x000000ff & (long)d);
        else disp=(0xffffff00 | (long)d);
        if (t==0) {
            pc=pc+(disp<<1);
            delay_slot(temp+2);
        }
        else pc+=2;
    }

Example:

    clrt                ;t is always 0
    bt/s trget_t        ;does not branch, because t = 0
    nop                 ;
    bf/s trget_f        ;branches to trget_f, because t = 0
    add  r0,r1          ;executed before branch
    nop                 ;← the pc location is used to calculate the branch destination
    ..........          ;  address of the bf/s instruction
    trget_f:            ;← branch destination of the bf/s instruction

Note: With delayed branching, branching occurs after execution of the slot instruction. However, operations such as register updates are executed in the order of delayed branch instruction, then delay slot instruction. For example, even if the register in which the branch destination address has been loaded is changed by the delay slot instruction, the branch will still be made using the value of the register prior to the change as the branch destination address.

6.1.8 BRA (Branch): Branch Instruction

Format        Abstract               Code               Cycle  T Bit  Applicable Instructions (SH-1, SH-2, SH-DSP)
BRA label     disp × 2 + PC → PC     1010dddddddddddd   2

Description: Branches unconditionally after executing the instruction following this BRA instruction. The branch destination is an address specified by PC + displacement. The PC value used for this address calculation is the address 4 bytes after this instruction. The 12-bit displacement is sign-extended and doubled. Consequently, the relative interval from the branch destination is -4096 to +4094 bytes. If the displacement is too short to reach the branch destination, this instruction must be changed to the JMP instruction. Here, a MOV instruction must be used to transfer the destination address to a register.

Note: Since this is a delayed branch instruction, the instruction after BRA is executed before branching. Neither interrupts nor address errors are accepted between this instruction and the next instruction. If the next instruction is a branch instruction, it is acknowledged as an illegal slot instruction.

Operation:

    bra(long d)    /* bra disp */
    {
        unsigned long temp;
        long disp;

        if ((d&0x800)==0) disp=(0x00000fff & (long) d);
        else disp=(0xfffff000 | (long) d);
        temp=pc;
        pc=pc+(disp<<1);
        delay_slot(temp+2);
    }

Example:

    bra  trget          ;branches to trget
    add  r0,r1          ;executes add before branching
    nop                 ;← the pc location is used to calculate the branch destination
    ..........          ;  address of the bra instruction
    trget:              ;← branch destination of the bra instruction

Note: With delayed branching, branching occurs after execution of the slot instruction.
However, operations such as register updates are executed in the order of delayed branch instruction, then delay slot instruction. For example, even if the register in which the branch destination address has been loaded is changed by the delay slot instruction, the branch will still be made using the value of the register prior to the change as the branch destination address.

6.1.9 BRAF (Branch Far): Branch Instruction

Format       Abstract          Code               Cycle  T Bit  Applicable Instructions (SH-1, SH-2, SH-DSP)
BRAF Rm      Rm + PC → PC      0000mmmm00100011   2

Description: Branches unconditionally. The branch destination is PC + the 32-bit contents of the general register Rm. The PC value used for this address calculation is the address 4 bytes after this instruction.

Note: Since this is a delayed branch instruction, the instruction after BRAF is executed before branching. Neither interrupts nor address errors are accepted between this instruction and the next instruction. If the next instruction is a branch instruction, it is acknowledged as an illegal slot instruction.

Operation:

    braf(long m)    /* braf rm */
    {
        unsigned long temp;

        temp=pc;
        pc+=r[m];
        delay_slot(temp+2);
    }

Example:

    mov.l #(target-braf_pc),r0      ;sets displacement
    braf  r0                        ;branches to target
    add   r0,r1                     ;executes add before branching
    braf_pc:                        ;← the pc location is used to calculate the branch
                                    ;  destination address of the braf instruction
    nop
    ....................
    target:                         ;← branch destination of the braf instruction

Note: With delayed branching, branching occurs after execution of the slot instruction. However, operations such as register updates are executed in the order of delayed branch instruction, then delay slot instruction. For example, even if the register in which the branch destination address has been loaded is changed by the delay slot instruction, the branch will still be made using the value of the register prior to the change as the branch destination address.
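The ordering rule in the note above can be illustrated with a small model: the branch target is computed from Rm when BRAF executes, the delay slot instruction then modifies Rm, and the branch still goes to the target computed earlier. This is a sketch under simplifying assumptions, not the manual's operation listing; `struct cpu`, `slot_add_imm` and `braf_with_slot` are hypothetical names, and `pc` here holds the address of the BRAF instruction itself.

```c
#include <stdint.h>

/* Hypothetical, simplified CPU state for illustration only. */
struct cpu {
    uint32_t r[16];   /* general registers                      */
    uint32_t pc;      /* address of the instruction being run   */
};

/* Delay slot instruction used for the demonstration: add #imm,rn. */
static void slot_add_imm(struct cpu *c, int n, uint32_t imm)
{
    c->r[n] += imm;
}

/* braf rm followed by a delay slot instruction that changes rm. */
static void braf_with_slot(struct cpu *c, int m, uint32_t slot_imm)
{
    /* The PC used in the calculation is 4 bytes after the braf. */
    uint32_t target = (c->pc + 4) + c->r[m];

    slot_add_imm(c, m, slot_imm);   /* delay slot runs before the branch */
    c->pc = target;                 /* target still reflects the old rm  */
}
```

With the BRAF at address h'1000 and Rm = h'20, the model branches to h'1024 even if the slot instruction adds h'100 to Rm, while Rm itself ends up as h'120 at the destination.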
135 6.1.10 bsr (branch to subroutine): branch instruction format abstract code cycle t bit bsr label pc ? pr, disp 2+ pc ? pc 1011dddddddddddd 2 description: branches to the subroutine procedure at a specified address. the pc value is stored in the pr, and the program branches to an address specified by pc + displacement however, in this case it is used for address calculation. the pc is the address 4 bytes after this instruction. the 12-bit displacement is sign-extended and doubled. consequently, the relative interval from the branch destination is ?096 to +4094 bytes. if the displacement is too short to reach the branch destination, the jsr instruction must be used instead. with jsr, the destination address must be transferred to a register by using the mov instruction. this bsr instruction and the rts instruction are used together for a subroutine procedure call. note: since this is a delayed branch instruction, the instruction after bsr is executed before branching. no interrupts and address errors are accepted between this instruction and the next instruction. if the next instruction is a branch instruction, it is acknowledged as an illegal slot instruction. operation: bsr(long d) /* bsr disp */ { long disp; if ((d&0x800)==0) disp=(0x00000fff & (long) d); else disp=(0xfffff000 | (long) d); pr=pc+is_32bit_inst(pr+2); pc=pc+(disp<<1); delay_slot(pr+2); } 136 example: bsr trget ; branches to trget mov r3,r4 ; executes the mov instruction before branching add r0,r1 ; ? the pc location is used to calculate the branch destination address of the bsr instruction (return address for when the subroutine procedure is completed (pr data)) ....... ....... trget: ; ? procedure entrance mov r2,r3 ; rts ;returns to the above add instruction mov #1,r0 ;executes mov before branching note: with delayed branching, branching occurs after execution of the slot instruction. however, instructions such as register changes etc. 
are executed in the order delayed branch instruction, then delay slot instruction. for example, even if the delay slot instruction changes the register that holds the branch destination address, the branch is still made using the register value from before the change.

6.1.11 bsrf (branch to subroutine far): branch instruction
applicable instructions: sh-1 sh-2 sh-dsp
format     abstract                 code               cycle   t bit
bsrf rm    pc → pr, rm + pc → pc    0000mmmm00000011   2

description: branches to the subroutine procedure at a specified address after executing the instruction following this bsrf instruction. the pc value is stored in the pr. the branch destination is pc + the 32-bit contents of general register rm. the pc value used in this address calculation is the address 4 bytes after this instruction. used as a subroutine procedure call in combination with rts.

note: since this is a delayed branch instruction, the instruction after bsrf is executed before branching. no interrupts or address errors are accepted between this instruction and the next instruction. if the next instruction is a branch instruction, it is acknowledged as an illegal slot instruction.

operation:
bsrf(long m) /* bsrf rm */
{
    pr=pc+is_32bit_inst(pr+2);
    pc+=r[m];
    delay_slot(pr+2);
}

example:
mov.l #(target-bsrf_pc),r0   ;sets displacement.
bsrf r0                      ;branches to target
mov r3,r4                    ;executes the mov instruction before branching
bsrf_pc:                     ;← the pc location is used to calculate the branch destination with bsrf.
add r0,r1
.....
.....
target:                      ;← procedure entrance
mov r2,r3                    ;
rts                          ;returns to the above add instruction
mov #1,r0                    ;executes mov before branching

note: with delayed branching, branching occurs after execution of the slot instruction. however, register updates and similar operations are executed in the order delayed branch instruction, then delay slot instruction.
for example, even if the delay slot instruction changes the register that holds the branch destination address, the branch is still made using the register value from before the change.

6.1.12 bt (branch if true): branch instruction
applicable instructions: sh-1 sh-2 sh-dsp
format     abstract                                        code               cycle   t bit
bt label   when t = 1, disp × 2 + pc → pc; when t = 0, nop   10001001dddddddd   3/1

description: reads the t bit, and conditionally branches. if t = 1, bt branches. if t = 0, bt executes the next instruction. the branch destination is an address specified by pc + displacement. the pc value used in this address calculation is the address 4 bytes after this instruction. the 8-bit displacement is sign-extended and doubled. consequently, the branch destination can be -256 to +254 bytes away. if the displacement cannot reach the branch destination, use bt with the bra instruction or the like.

note: when branching, requires three cycles; when not branching, one cycle.

operation:
bt(long d) /* bt disp */
{
    long disp;
    if ((d&0x80)==0) disp=(0x000000ff & (long)d);
    else disp=(0xffffff00 | (long)d);
    if (t==1) pc=pc+(disp<<1);
    else pc+=2;
}

example:
sett            ;t is always 1
bf trget_f      ;does not branch, because t = 1
bt trget_t      ;branches to trget_t, because t = 1
nop             ;
nop             ;← the pc location is used to calculate the branch destination address of the bt instruction
..........
trget_t:        ;← branch destination of the bt instruction

6.1.13 bt/s (branch if true with delay slot): branch instruction
applicable instructions: sh-1 sh-2 sh-dsp
format       abstract                                        code               cycle   t bit
bt/s label   when t = 1, disp × 2 + pc → pc; when t = 0, nop   10001101dddddddd   2/1

description: reads the t bit and conditionally branches. if t = 1, bt/s branches after the following instruction executes. if t = 0, bt/s executes the next instruction. the branch destination is an address specified by pc + displacement.
the pc value used in this address calculation is the address 4 bytes after this instruction. the 8-bit displacement is sign-extended and doubled. consequently, the branch destination can be -256 to +254 bytes away. if the displacement cannot reach the branch destination, use bt/s with the bra instruction or the like.

note: since this is a delayed branch instruction, the instruction immediately following is executed before the branch. no interrupts or address errors are accepted between this instruction and the next instruction. when the immediately following instruction is a branch instruction, it is recognized as an illegal slot instruction. when branching, requires two cycles; when not branching, one cycle.

operation:
bts(long d) /* bts disp */
{
    long disp;
    unsigned long temp;
    temp=pc;
    if ((d&0x80)==0) disp=(0x000000ff & (long)d);
    else disp=(0xffffff00 | (long)d);
    if (t==1) {
        pc=pc+(disp<<1);
        delay_slot(temp+2);
    }
    else pc+=2;
}

example:
sett             ;t is always 1
bf/s target_f    ;does not branch, because t = 1
nop              ;
bt/s target_t    ;branches to target_t, because t = 1
add r0,r1        ;executes before branching.
nop              ;← the pc location is used to calculate the branch destination address of the bt/s instruction
..........
target_t:        ;← branch destination of the bt/s instruction

note: with delayed branching, branching occurs after execution of the slot instruction. however, register updates and similar operations are executed in the order delayed branch instruction, then delay slot instruction. for example, even if the delay slot instruction changes the register that holds the branch destination address, the branch is still made using the register value from before the change.

6.1.14 clrmac (clear mac register): system control instruction
applicable instructions: sh-1 sh-2 sh-dsp
format   abstract   code   cycle   t bit
clrmac   0 →
mach, macl   0000000000101000   1

description: clears the mach and macl registers.

operation:
clrmac() /* clrmac */
{
    mach=0;
    macl=0;
    pc+=2;
}

example:
clrmac              ;clears and initializes the mac register
mac.w @r0+,@r1+     ;multiply and accumulate operation
mac.w @r0+,@r1+     ;

6.1.15 clrt (clear t bit): system control instruction
applicable instructions: sh-1 sh-2 sh-dsp
format   abstract   code               cycle   t bit
clrt     0 → t      0000000000001000   1       0

description: clears the t bit.

operation:
clrt() /* clrt */
{
    t=0;
    pc+=2;
}

example:
clrt    ;before execution: t = 1
        ;after execution: t = 0

6.1.16 cmp/cond (compare conditionally): arithmetic instruction
applicable instructions: sh-1 sh-2 sh-dsp
format            abstract                                      code               cycle   t bit
cmp/eq rm,rn      when rn = rm, 1 → t                           0011nnnnmmmm0000   1       comparison result
cmp/ge rm,rn      when signed and rn ≥ rm, 1 → t                0011nnnnmmmm0011   1       comparison result
cmp/gt rm,rn      when signed and rn > rm, 1 → t                0011nnnnmmmm0111   1       comparison result
cmp/hi rm,rn      when unsigned and rn > rm, 1 → t              0011nnnnmmmm0110   1       comparison result
cmp/hs rm,rn      when unsigned and rn ≥ rm, 1 → t              0011nnnnmmmm0010   1       comparison result
cmp/pl rn         when rn > 0, 1 → t                            0100nnnn00010101   1       comparison result
cmp/pz rn         when rn ≥ 0, 1 → t                            0100nnnn00010001   1       comparison result
cmp/str rm,rn     when a byte in rn equals a byte in rm, 1 → t  0010nnnnmmmm1100   1       comparison result
cmp/eq #imm,r0    when r0 = imm, 1 → t                          10001000iiiiiiii   1       comparison result

description: compares general register rn data with rm data, and sets the t bit to 1 if a specified condition (cond) is satisfied. the t bit is cleared to 0 if the condition is not satisfied. the rn data does not change. the following eight conditions can be specified. conditions pz and pl are the results of comparisons between rn and 0. sign-extended 8-bit immediate data can also be compared with r0 by using condition eq. here, r0 data does not change. table 6.2 shows the mnemonics for the conditions.
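the signed, unsigned, and byte-match conditions described above differ only in how the same 32 bits are interpreted. the following c sketch illustrates three of them; the function names cmp_ge, cmp_hs, and cmp_str are hypothetical helpers that return the resulting t bit, and the signed cast assumes a two's complement target.

```c
#include <stdint.h>

/* cmp/ge rm,rn: signed comparison, returns the t bit */
static int cmp_ge(uint32_t rm, uint32_t rn)
{
    return (int32_t)rn >= (int32_t)rm;
}

/* cmp/hs rm,rn: unsigned comparison, returns the t bit */
static int cmp_hs(uint32_t rm, uint32_t rn)
{
    return rn >= rm;
}

/* cmp/str rm,rn: t = 1 if any of the four byte lanes is equal */
static int cmp_str(uint32_t rm, uint32_t rn)
{
    uint32_t x = rn ^ rm;   /* an equal byte lane becomes h'00 */

    return ((x >> 24) & 0xff) == 0 ||
           ((x >> 16) & 0xff) == 0 ||
           ((x >> 8) & 0xff) == 0 ||
           (x & 0xff) == 0;
}
```

for rm = h'7fffffff and rn = h'80000000, cmp_ge gives 0 because rn is negative when signed, while cmp_hs gives 1 because rn is the larger unsigned value.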
table 6.2 cmp mnemonics
mnemonics         condition
cmp/eq rm,rn      if rn = rm, t = 1
cmp/ge rm,rn      if rn ≥ rm with signed data, t = 1
cmp/gt rm,rn      if rn > rm with signed data, t = 1
cmp/hi rm,rn      if rn > rm with unsigned data, t = 1
cmp/hs rm,rn      if rn ≥ rm with unsigned data, t = 1
cmp/pl rn         if rn > 0, t = 1
cmp/pz rn         if rn ≥ 0, t = 1
cmp/str rm,rn     if a byte in rn equals a byte in rm, t = 1
cmp/eq #imm,r0    if r0 = imm, t = 1

operation:
cmpeq(long m,long n) /* cmp_eq rm,rn */
{
    if (r[n]==r[m]) t=1;
    else t=0;
    pc+=2;
}
cmpge(long m,long n) /* cmp_ge rm,rn */
{
    if ((long)r[n]>=(long)r[m]) t=1;
    else t=0;
    pc+=2;
}
cmpgt(long m,long n) /* cmp_gt rm,rn */
{
    if ((long)r[n]>(long)r[m]) t=1;
    else t=0;
    pc+=2;
}
cmphi(long m,long n) /* cmp_hi rm,rn */
{
    if ((unsigned long)r[n]>(unsigned long)r[m]) t=1;
    else t=0;
    pc+=2;
}
cmphs(long m,long n) /* cmp_hs rm,rn */
{
    if ((unsigned long)r[n]>=(unsigned long)r[m]) t=1;
    else t=0;
    pc+=2;
}
cmppl(long n) /* cmp_pl rn */
{
    if ((long)r[n]>0) t=1;
    else t=0;
    pc+=2;
}
cmppz(long n) /* cmp_pz rn */
{
    if ((long)r[n]>=0) t=1;
    else t=0;
    pc+=2;
}
cmpstr(long m,long n) /* cmp_str rm,rn */
{
    unsigned long temp;
    long hh,hl,lh,ll;
    temp=r[n]^r[m];
    hh=(temp>>24)&0x000000ff; /* one byte lane per variable */
    hl=(temp>>16)&0x000000ff;
    lh=(temp>>8)&0x000000ff;
    ll=temp&0x000000ff;
    hh=hh&&hl&&lh&&ll; /* 0 if any byte lane matched */
    if (hh==0) t=1;
    else t=0;
    pc+=2;
}
cmpim(long i) /* cmp_eq #imm,r0 */
{
    long imm;
    if ((i&0x80)==0) imm=(0x000000ff & (long)i);
    else imm=(0xffffff00 | (long)i);
    if (r[0]==imm) t=1;
    else t=0;
    pc+=2;
}

example:
cmp/ge r0,r1     ;r0 = h'7fffffff, r1 = h'80000000
bt trget_t       ;does not branch because t = 0
cmp/hs r0,r1     ;r0 = h'7fffffff, r1 = h'80000000
bt trget_t       ;branches because t = 1
cmp/str r2,r3    ;r2 = "abcd", r3 = "xycz"
bt trget_t       ;branches because t = 1

6.1.17 div0s (divide step 0 as signed): arithmetic instruction
applicable instructions: sh-1 sh-2 sh-dsp
format        abstract   code   cycle   t bit
div0s rm,rn   msb of rn → q, msb of rm → m, m^q →
t   0010nnnnmmmm0111   1   calculation result

description: div0s is an initialization instruction for signed division. it finds the quotient by repeatedly dividing in combination with the div1 or another instruction that divides for each bit after this instruction. see the description given with div1 for more information.

operation:
div0s(long m,long n) /* div0s rm,rn */
{
    if ((r[n]&0x80000000)==0) q=0;
    else q=1;
    if ((r[m]&0x80000000)==0) m=0;
    else m=1;
    t=!(m==q);
    pc+=2;
}

example: see div1.

6.1.18 div0u (divide step 0 as unsigned): arithmetic instruction
applicable instructions: sh-1 sh-2 sh-dsp
format   abstract    code               cycle   t bit
div0u    0 → m/q/t   0000000000011001   1       0

description: div0u is an initialization instruction for unsigned division. it finds the quotient by repeatedly dividing in combination with the div1 or another instruction that divides for each bit after this instruction. see the description given with div1 for more information.

operation:
div0u() /* div0u */
{
    m=q=t=0;
    pc+=2;
}

example: see div1.

6.1.19 div1 (divide 1 step): arithmetic instruction
applicable instructions: sh-1 sh-2 sh-dsp
format       abstract                    code               cycle   t bit
div1 rm,rn   1 step division (rn ÷ rm)   0011nnnnmmmm0100   1       calculation result

description: uses single-step division to divide one bit of the 32-bit data in general register rn (dividend) by rm data (divisor). it finds a quotient through repetition either independently or used in combination with other instructions. during this repetition, do not rewrite the specified register or the m, q, and t bits. in one-step division, the dividend is shifted one bit left, the divisor is subtracted and the quotient bit reflected in the q bit according to the status (positive or negative). to find the remainder in a division, first find the quotient using a div1 instruction, then find the remainder as follows:

(dividend) - (divisor) × (quotient) = (remainder)

zero division, overflow detection, and remainder operation are not supported.
check for zero division and overflow division before dividing. find the remainder by first finding the product of the divisor and the quotient obtained and then subtracting it from the dividend. that is, first initialize with div0s or div0u. repeat div1 for each bit of the divisor to obtain the quotient. when the quotient requires 17 or more bits, place rotcl before div1. for the division sequence, see the following examples.

operation:
div1(long m,long n) /* div1 rm,rn */
{
    unsigned long tmp0;
    unsigned char old_q,tmp1;
    old_q=q;
    q=(unsigned char)((0x80000000 & r[n])!=0);
    r[n]<<=1;
    r[n]|=(unsigned long)t;
    switch(old_q){
    case 0:switch(m){
        case 0:tmp0=r[n];
            r[n]-=r[m];
            tmp1=(r[n]>tmp0);
            switch(q){
            case 0:q=tmp1; break;
            case 1:q=(unsigned char)(tmp1==0); break;
            }
            break;
        case 1:tmp0=r[n];
            r[n]+=r[m];
            tmp1=(r[n]

example 1:
                   ;r1 (32 bits) / r0 (16 bits) = r1 (16 bits):unsigned
shll16 r0          ;upper 16 bits = divisor, lower 16 bits = 0
tst r0,r0          ;zero division check
bt zero_div        ;
cmp/hs r0,r1       ;overflow check
bt over_div        ;
div0u              ;flag initialization
.arepeat 16        ;
div1 r0,r1         ;repeat 16 times
.aendr             ;
rotcl r1           ;
extu.w r1,r1       ;r1 = quotient

example 2:
                   ;r1:r2 (64 bits) / r0 (32 bits) = r2 (32 bits):unsigned
tst r0,r0          ;zero division check
bt zero_div        ;
cmp/hs r0,r1       ;overflow check
bt over_div        ;
div0u              ;flag initialization
.arepeat 32        ;
rotcl r2           ;repeat 32 times
div1 r0,r1         ;
.aendr             ;
rotcl r2           ;r2 = quotient

example 3:
                   ;r1 (16 bits) / r0 (16 bits) = r1 (16 bits):signed
shll16 r0          ;upper 16 bits = divisor, lower 16 bits = 0
exts.w r1,r1       ;sign-extends the dividend to 32 bits
xor r2,r2          ;r2 = 0
mov r1,r3          ;
rotcl r3           ;
subc r2,r1         ;decrements if the dividend is negative
div0s r0,r1        ;flag initialization
.arepeat 16        ;
div1 r0,r1         ;repeat 16 times
.aendr
exts.w r1,r1       ;
rotcl r1           ;r1 = quotient (one's complement)
addc r2,r1         ;increments and takes the two's complement if the msb of the quotient is 1
exts.w r1,r1       ;r1 = quotient (two's
complement)

example 4:
                   ;r2 (32 bits) / r0 (32 bits) = r2 (32 bits):signed
mov r2,r3          ;
rotcl r3           ;
subc r1,r1         ;sign-extends the dividend to 64 bits (r1:r2)
xor r3,r3          ;r3 = 0
subc r3,r2         ;decrements and takes the one's complement if the dividend is negative
div0s r0,r1        ;flag initialization
.arepeat 32        ;
rotcl r2           ;repeat 32 times
div1 r0,r1         ;
.aendr             ;
rotcl r2           ;r2 = quotient (one's complement)
addc r3,r2         ;increments and takes the two's complement if the msb of the quotient is 1. r2 = quotient (two's complement)

6.1.20 dmuls.l (double-length multiply as signed): arithmetic instruction
applicable instructions: sh-1 sh-2 sh-dsp
format            abstract                           code               cycle    t bit
dmuls.l rm,rn     with sign, rn × rm → mach, macl    0011nnnnmmmm1101   2 to 4

description: performs 32-bit multiplication of the contents of general registers rn and rm, and stores the 64-bit result in the mach and macl registers. the operation is a signed arithmetic operation.

operation:
dmuls(long m,long n) /* dmuls.l rm,rn */
{
    unsigned long rnl,rnh,rml,rmh,res0,res1,res2;
    unsigned long temp0,temp1,temp2,temp3;
    long tempm,tempn,fnlml;
    tempn=(long)r[n];
    tempm=(long)r[m];
    if (tempn<0) tempn=0-tempn;
    if (tempm<0) tempm=0-tempm;
    if ((long)(r[n]^r[m])<0) fnlml=-1;
    else fnlml=0;
    temp1=(unsigned long)tempn;
    temp2=(unsigned long)tempm;
    rnl=temp1&0x0000ffff;
    rnh=(temp1>>16)&0x0000ffff;
    rml=temp2&0x0000ffff;
    rmh=(temp2>>16)&0x0000ffff;
    temp0=rml*rnl;
    temp1=rmh*rnl;
    temp2=rml*rnh;
    temp3=rmh*rnh;
    res2=0;
    res1=temp1+temp2;
    if (res1

6.1.21 dmulu.l (double-length multiply as unsigned): arithmetic instruction
applicable instructions: sh-1 sh-2 sh-dsp
format            abstract                              code               cycle    t bit
dmulu.l rm,rn     without sign, rn × rm → mach, macl    0011nnnnmmmm0101   2 to 4

description: performs 32-bit multiplication of the contents of general registers rn and rm, and stores the 64-bit result in the mach and macl registers. the operation is an unsigned arithmetic operation.
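the 64-bit unsigned product described above can be built from four 16 × 16 partial products using only 32-bit arithmetic. the following c sketch shows one way to do this; it is an illustration under stated assumptions and not a reconstruction of the manual's operation listing. the function name dmulu_model and its output parameters are hypothetical.

```c
#include <stdint.h>

/* 32 x 32 -> 64 unsigned multiply from 16-bit halves (hypothetical helper) */
static void dmulu_model(uint32_t rn, uint32_t rm,
                        uint32_t *mach, uint32_t *macl)
{
    uint32_t rnl = rn & 0xffffu, rnh = rn >> 16;
    uint32_t rml = rm & 0xffffu, rmh = rm >> 16;

    uint32_t p0 = rml * rnl;   /* weight 2^0  */
    uint32_t p1 = rmh * rnl;   /* weight 2^16 */
    uint32_t p2 = rml * rnh;   /* weight 2^16 */
    uint32_t p3 = rmh * rnh;   /* weight 2^32 */

    /* sum of the middle terms can carry out of 32 bits */
    uint32_t mid = p1 + p2;
    uint32_t carry_mid = (mid < p1) ? 0x10000u : 0u;  /* 2^48 seen from the high word */

    /* low word: p0 plus the low half of mid shifted up by 16 */
    uint32_t lo = p0 + (mid << 16);
    uint32_t carry_lo = (lo < p0) ? 1u : 0u;

    *macl = lo;
    *mach = p3 + (mid >> 16) + carry_mid + carry_lo;
}
```

each partial product fits in 32 bits because both factors are at most h'ffff, so only the two additions need explicit carry handling.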
operation:
dmulu(long m,long n) /* dmulu.l rm,rn */
{
    unsigned long rnl,rnh,rml,rmh,res0,res1,res2;
    unsigned long temp0,temp1,temp2,temp3;
    rnl=r[n]&0x0000ffff;
    rnh=(r[n]>>16)&0x0000ffff;
    rml=r[m]&0x0000ffff;
    rmh=(r[m]>>16)&0x0000ffff;
    temp0=rml*rnl;
    temp1=rmh*rnl;
    temp2=rml*rnh;
    temp3=rmh*rnh;
    res2=0;
    res1=temp1+temp2;
    if (res1

6.1.22 dt (decrement and test): arithmetic instruction
applicable instructions: sh-1 sh-2 sh-dsp
format   abstract                                                            code               cycle   t bit
dt rn    rn - 1 → rn; when rn is 0, 1 → t, when rn is nonzero, 0 → t         0100nnnn00010000   1       comparison result

description: the contents of general register rn are decremented by 1 and the result compared to 0 (zero). when the result is 0, the t bit is set to 1. when the result is not zero, the t bit is set to 0.

operation:
dt(long n) /* dt rn */
{
    r[n]--;
    if (r[n]==0) t=1;
    else t=0;
    pc+=2;
}

example:
mov #4,r5       ;sets the number of loops.
loop:
add r0,r1       ;
dt r5           ;decrements the r5 value and checks whether it has become 0.
bf loop         ;branches to loop if t = 0. (in this example, loops 4 times.)

6.1.23 exts (extend as signed): arithmetic instruction
applicable instructions: sh-1 sh-2 sh-dsp
format           abstract                          code               cycle   t bit
exts.b rm,rn     sign-extend rm from byte → rn     0110nnnnmmmm1110   1
exts.w rm,rn     sign-extend rm from word → rn     0110nnnnmmmm1111   1

description: sign-extends general register rm data, and stores the result in rn. if byte length is specified, the bit 7 value of rm is copied into bits 8 to 31 of rn. if word length is specified, the bit 15 value of rm is copied into bits 16 to 31 of rn.
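the bit-copy rule described above can be written as two small c functions. this is a minimal sketch; exts_b and exts_w are hypothetical helper names that take the rm value and return the value that would be stored in rn.

```c
#include <stdint.h>

/* exts.b: copy bit 7 of rm into bits 8 to 31 of the result */
static uint32_t exts_b(uint32_t rm)
{
    return (rm & 0x00000080u) ? (rm | 0xffffff00u) : (rm & 0x000000ffu);
}

/* exts.w: copy bit 15 of rm into bits 16 to 31 of the result */
static uint32_t exts_w(uint32_t rm)
{
    return (rm & 0x00008000u) ? (rm | 0xffff0000u) : (rm & 0x0000ffffu);
}
```

bits above the source width are discarded in both cases, so only the low byte or low word of rm affects the result.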
operation:
extsb(long m,long n) /* exts.b rm,rn */
{
    r[n]=r[m];
    if ((r[m]&0x00000080)==0) r[n]&=0x000000ff;
    else r[n]|=0xffffff00;
    pc+=2;
}
extsw(long m,long n) /* exts.w rm,rn */
{
    r[n]=r[m];
    if ((r[m]&0x00008000)==0) r[n]&=0x0000ffff;
    else r[n]|=0xffff0000;
    pc+=2;
}

examples:
exts.b r0,r1    ;before execution: r0 = h'00000080
                ;after execution: r1 = h'ffffff80
exts.w r0,r1    ;before execution: r0 = h'00008000
                ;after execution: r1 = h'ffff8000

6.1.24 extu (extend as unsigned): arithmetic instruction
applicable instructions: sh-1 sh-2 sh-dsp
format           abstract                          code               cycle   t bit
extu.b rm,rn     zero-extend rm from byte → rn     0110nnnnmmmm1100   1
extu.w rm,rn     zero-extend rm from word → rn     0110nnnnmmmm1101   1

description: zero-extends general register rm data, and stores the result in rn. if byte length is specified, 0s are written in bits 8 to 31 of rn. if word length is specified, 0s are written in bits 16 to 31 of rn.

operation:
extub(long m,long n) /* extu.b rm,rn */
{
    r[n]=r[m];
    r[n]&=0x000000ff;
    pc+=2;
}
extuw(long m,long n) /* extu.w rm,rn */
{
    r[n]=r[m];
    r[n]&=0x0000ffff;
    pc+=2;
}

examples:
extu.b r0,r1    ;before execution: r0 = h'ffffff80
                ;after execution: r1 = h'00000080
extu.w r0,r1    ;before execution: r0 = h'ffff8000
                ;after execution: r1 = h'00008000

6.1.25 jmp (jump): branch instruction
class: delayed branch instruction
applicable instructions: sh-1 sh-2 sh-dsp
format     abstract   code               cycle   t bit
jmp @rm    rm → pc    0100mmmm00101011   2

description: branches unconditionally to the address specified by register indirect addressing. the branch destination is an address specified by the 32-bit data in general register rm.

note: since this is a delayed branch instruction, the instruction after jmp is executed before branching. no interrupts or address errors are accepted between this instruction and the next instruction. if the next instruction is a branch instruction, it is acknowledged as an illegal slot instruction.
operation:
jmp(long m) /* jmp @rm */
{
    unsigned long temp;
    temp=pc;
    pc=r[m]+4;
    delay_slot(temp+2);
}

example:
mov.l jmp_table,r0    ;r0 = address of trget
jmp @r0               ;branches to trget
mov r0,r1             ;executes mov before branching
.align 4
jmp_table: .data.l trget    ;jump table
.................
trget: add #1,r1      ;← branch destination

6.1.26 jsr (jump to subroutine): branch instruction
(class: delayed branch instruction)
applicable instructions: sh-1 sh-2 sh-dsp
format     abstract            code               cycle   t bit
jsr @rm    pc → pr, rm → pc    0100mmmm00001011   2

description: branches to the subroutine procedure at the address specified by register indirect addressing. the pc value is stored in the pr. the jump destination is an address specified by the 32-bit data in general register rm. the stored pc is the address four bytes after this instruction. the jsr instruction and rts instruction are used together for subroutine procedure calls.

note: since this is a delayed branch instruction, the instruction after jsr is executed before branching. no interrupts or address errors are accepted between this instruction and the next instruction. if the next instruction is a branch instruction, it is acknowledged as an illegal slot instruction.

operation:
jsr(long m) /* jsr @rm */
{
    pr=pc;
    pc=r[m]+4;
    delay_slot(pr+2);
}

example:
mov.l jsr_table,r0    ;r0 = address of trget
jsr @r0               ;branches to trget
xor r1,r1             ;executes xor before branching
add r0,r1             ;← return address for when the subroutine procedure is completed (pr data)
...........
.align 4
jsr_table: .data.l trget    ;jump table
trget: nop            ;← procedure entrance
mov r2,r3             ;
rts                   ;returns to the above add instruction
mov #70,r1            ;executes mov before rts

note: when a delayed branch instruction is used, the branching operation takes place after the slot instruction is executed, but the execution of instructions (register update, etc.) takes place in the sequence delayed branch instruction → delay slot instruction.
for example, even if a delay slot instruction is used to change the register where the branch destination address is stored, the register content previous to the change will be used as the branch destination address.

6.1.27 ldc (load to control register): system control instruction
(class: interrupt disabled instruction)
format              abstract                     code               cycle   t bit
ldc rm,sr           rm → sr                      0100mmmm00001110   1       lsb
ldc rm,gbr          rm → gbr                     0100mmmm00011110   1
ldc rm,vbr          rm → vbr                     0100mmmm00101110   1
ldc rm,mod          rm → mod                     0100mmmm01011110   1
ldc rm,re           rm → re                      0100mmmm01111110   1
ldc rm,rs           rm → rs                      0100mmmm01101110   1
ldc.l @rm+,sr       (rm) → sr, rm + 4 → rm       0100mmmm00000111   3       lsb
ldc.l @rm+,gbr      (rm) → gbr, rm + 4 → rm      0100mmmm00010111   3
ldc.l @rm+,vbr      (rm) → vbr, rm + 4 → rm      0100mmmm00100111   3
ldc.l @rm+,mod      (rm) → mod, rm + 4 → rm      0100mmmm01010111   3
ldc.l @rm+,re       (rm) → re, rm + 4 → rm       0100mmmm01110111   3
ldc.l @rm+,rs       (rm) → rs, rm + 4 → rm       0100mmmm01100111   3

description: stores the source operand into control register sr, gbr, vbr, mod, re, or rs.

note: no interrupts are accepted between this instruction and the next instruction. address errors are accepted.
operation:
ldcsr(long m) /* ldc rm,sr */
{
    sr=r[m]&0x0fff0fff;
    pc+=2;
}
ldcgbr(long m) /* ldc rm,gbr */
{
    gbr=r[m];
    pc+=2;
}
ldcvbr(long m) /* ldc rm,vbr */
{
    vbr=r[m];
    pc+=2;
}
ldcmod(long m) /* ldc rm,mod */
{
    mod=r[m];
    pc+=2;
}
ldcre(long m) /* ldc rm,re */
{
    re=r[m];
    pc+=2;
}
ldcrs(long m) /* ldc rm,rs */
{
    rs=r[m];
    pc+=2;
}
ldcmsr(long m) /* ldc.l @rm+,sr */
{
    sr=read_long(r[m])&0x0fff0fff;
    r[m]+=4;
    pc+=2;
}
ldcmgbr(long m) /* ldc.l @rm+,gbr */
{
    gbr=read_long(r[m]);
    r[m]+=4;
    pc+=2;
}
ldcmvbr(long m) /* ldc.l @rm+,vbr */
{
    vbr=read_long(r[m]);
    r[m]+=4;
    pc+=2;
}
ldcmmod(long m) /* ldc.l @rm+,mod */
{
    mod=read_long(r[m]);
    r[m]+=4;
    pc+=2;
}
ldcmre(long m) /* ldc.l @rm+,re */
{
    re=read_long(r[m]);
    r[m]+=4;
    pc+=2;
}
ldcmrs(long m) /* ldc.l @rm+,rs */
{
    rs=read_long(r[m]);
    r[m]+=4;
    pc+=2;
}

examples:
ldc r0,sr          ;before execution: r0 = h'ffffffff, sr = h'00000000
                   ;after execution: sr = h'0fff0fff
ldc.l @r15+,gbr    ;before execution: r15 = h'10000000
                   ;after execution: r15 = h'10000004, gbr = @h'10000000
note: this is the execution result for the sh-dsp.

6.1.28 ldre (load effective address to re register): system control instruction
applicable instructions: sh-1 sh-2 sh-dsp
format              abstract              code               cycle   t bit
ldre @(disp,pc)     disp × 2 + pc → re    10001110dddddddd   1

description: stores the effective address of the source operand in the repeat end register re. the effective address is an address specified by pc + displacement. the pc is the address four bytes after this instruction. the 8-bit displacement is sign-extended and doubled. consequently, the effective address can be -256 to +254 bytes away from the pc.

note: the effective address value designated for the re register is different from the actual repeat end address. refer to table 4.35, rs and re design rule, for more information. when this instruction is placed immediately after a delayed branch instruction, pc becomes the "first address + 2" of the branch destination.
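the effective address rule above (8-bit displacement, sign-extended and doubled, added to the address 4 bytes after the instruction) can be sketched in c as follows. the helper names disp8_to_bytes and ldre_effective_address are hypothetical, and the sketch assumes the instruction is not in a delay slot, so the special case in the note is not modeled.

```c
#include <stdint.h>

/* sign-extend an 8-bit displacement field and double it (byte offset) */
static int32_t disp8_to_bytes(uint8_t d)
{
    int32_t v = d;

    if (v & 0x80) v -= 256;   /* sign extension from bit 7 */
    return v * 2;             /* range: -256 to +254 */
}

/* effective address loaded into re; insn_addr is the address of ldre itself */
static uint32_t ldre_effective_address(uint32_t insn_addr, uint8_t d)
{
    return insn_addr + 4u + (uint32_t)disp8_to_bytes(d);
}
```

the same calculation applies to any instruction in this section that uses an 8-bit pc-relative displacement, such as bt and bt/s.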
operation:
ldre(long d) /* ldre @(disp, pc) */
{
    long disp;
    if ((d&0x80)==0) disp=(0x000000ff & (long)d);
    else disp=(0xffffff00 | (long)d);
    re=pc+(disp<<1);
    pc+=2;
}

example:
ldrs sta       ;set repeat start address to rs.
ldre end       ;set repeat end address to re.
setrc #32      ;repeat 32 times from inst.a to inst.c.
inst.0         ;
sta: inst.a    ;
inst.b         ;
............
end: inst.c    ;
inst.e         ;
............

6.1.29 ldrs (load effective address to rs register): system control instruction
applicable instructions: sh-1 sh-2 sh-dsp
format              abstract              code               cycle   t bit
ldrs @(disp,pc)     disp × 2 + pc → rs    10001100dddddddd   1

description: stores the effective address of the source operand in the repeat start register rs. the effective address is an address specified by pc + displacement. the pc is the address four bytes after this instruction. the 8-bit displacement is sign-extended and doubled. consequently, the effective address can be -256 to +254 bytes away from the pc.

note: when the repeat (loop) program has fewer than 3 instructions, the effective address value designated for the rs register is different from the actual repeat start address. refer to table 4.35, "rs and re setting rule", for more information. if this instruction is placed immediately after a delayed branch instruction, the pc becomes "the first address + 2" of the branch destination.

operation:
ldrs(long d) /* ldrs @(disp, pc) */
{
    long disp;
    if ((d&0x80)==0) disp=(0x000000ff & (long)d);
    else disp=(0xffffff00 | (long)d);
    rs=pc+(disp<<1);
    pc+=2;
}

example:
ldrs sta       ;set repeat start address to rs.
ldre end       ;set repeat end address to re.
setrc #32      ;repeat 32 times from inst.a to inst.c.
inst.0         ;
sta: inst.a    ;
inst.b         ;
............
end: inst.c    ;
inst.d         ;
............

6.1.30 lds (load to system register): system control instruction
class: interrupt disabled instruction
applicable instructions: sh-1 sh-2 sh-dsp
format          abstract     code               cycle   t bit
lds rm,mach     rm → mach    0100mmmm00001010   1
lds rm,macl     rm →
macl   0100mmmm00011010   1
lds rm,pr            rm → pr                       0100mmmm00101010   1
lds rm,dsr           rm → dsr                      0100mmmm01101010   1
lds rm,a0            rm → a0                       0100mmmm01111010   1
lds rm,x0            rm → x0                       0100mmmm10001010   1
lds rm,x1            rm → x1                       0100mmmm10011010   1
lds rm,y0            rm → y0                       0100mmmm10101010   1
lds rm,y1            rm → y1                       0100mmmm10111010   1
lds.l @rm+,mach      (rm) → mach, rm + 4 → rm      0100mmmm00000110   1
lds.l @rm+,macl      (rm) → macl, rm + 4 → rm      0100mmmm00010110   1
lds.l @rm+,pr        (rm) → pr, rm + 4 → rm        0100mmmm00100110   1
lds.l @rm+,dsr       (rm) → dsr, rm + 4 → rm       0100mmmm01100110   1
lds.l @rm+,a0        (rm) → a0, rm + 4 → rm        0100mmmm01110110   1
lds.l @rm+,x0        (rm) → x0, rm + 4 → rm        0100mmmm10000110   1
lds.l @rm+,x1        (rm) → x1, rm + 4 → rm        0100mmmm10010110   1
lds.l @rm+,y0        (rm) → y0, rm + 4 → rm        0100mmmm10100110   1
lds.l @rm+,y1        (rm) → y1, rm + 4 → rm        0100mmmm10110110   1

description: stores the source operand into the system register mach, macl, or pr or the dsp register dsr, a0, x0, x1, y0, or y1. when a0 is designated as the destination, the msb of the data is copied into a0g.

note: no interrupts are accepted between this instruction and the next instruction. address errors are accepted. for the sh-1 cpu, the lower 10 bits are stored in mach. for the sh-2 and sh-dsp cpu, 32 bits are stored in mach.
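the sh-1 behavior in the note above (only the lower 10 bits are stored in mach) can be sketched in c. the sketch assumes, following the masks h'000003ff and h'fffffc00 used in the operation listing, that bit 9 is treated as the sign and copied into the upper bits; the function name mach_sh1 is a hypothetical helper.

```c
#include <stdint.h>

/* value held in mach on the sh-1 after lds rm,mach (10-bit, sign-extended) */
static uint32_t mach_sh1(uint32_t rm)
{
    if ((rm & 0x00000200u) == 0)
        return rm & 0x000003ffu;   /* bit 9 clear: keep the lower 10 bits */
    return rm | 0xfffffc00u;       /* bit 9 set: fill bits 10 to 31 with 1s */
}
```

on the sh-2 and sh-dsp the full 32-bit value is stored, so no such adjustment is needed there.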
operation:
ldsmach(long m) /* lds rm,mach */
{
    mach=r[m];
    if ((mach&0x00000200)==0) mach&=0x000003ff; /* for sh-1 cpu (these 2 lines */
    else mach|=0xfffffc00;                      /* not needed for sh-2 and sh-dsp cpu) */
    pc+=2;
}
ldsmacl(long m) /* lds rm,macl */
{
    macl=r[m];
    pc+=2;
}
ldspr(long m) /* lds rm,pr */
{
    pr=r[m];
    pc+=2;
}
ldsdsr(long m) /* lds rm,dsr */
{
    dsr=r[m]&0x0000000f;
    pc+=2;
}
ldsa0(long m) /* lds rm,a0 */
{
    a0=r[m];
    if((a0&0x80000000)==0) a0g=0x00;
    else a0g=0xff;
    pc+=2;
}
ldsx0(long m) /* lds rm,x0 */
{
    x0=r[m];
    pc+=2;
}
ldsx1(long m) /* lds rm,x1 */
{
    x1=r[m];
    pc+=2;
}
ldsy0(long m) /* lds rm,y0 */
{
    y0=r[m];
    pc+=2;
}
ldsy1(long m) /* lds rm,y1 */
{
    y1=r[m];
    pc+=2;
}
ldsmmach(long m) /* lds.l @rm+,mach */
{
    mach=read_long(r[m]);
    if ((mach&0x00000200)==0) mach&=0x000003ff; /* for sh-1 cpu (these 2 lines */
    else mach|=0xfffffc00;                      /* not needed for sh-2 and sh-dsp cpu) */
    r[m]+=4;
    pc+=2;
}
ldsmmacl(long m) /* lds.l @rm+,macl */
{
    macl=read_long(r[m]);
    r[m]+=4;
    pc+=2;
}
ldsmpr(long m) /* lds.l @rm+,pr */
{
    pr=read_long(r[m]);
    r[m]+=4;
    pc+=2;
}
ldsmdsr(long m) /* lds.l @rm+,dsr */
{
    dsr=read_long(r[m])&0x0000000f;
    r[m]+=4;
    pc+=2;
}
ldsma0(long m) /* lds.l @rm+,a0 */
{
    a0=read_long(r[m]);
    if((a0&0x80000000)==0) a0g=0x00;
    else a0g=0xff;
    r[m]+=4;
    pc+=2;
}
ldsmx0(long m) /* lds.l @rm+,x0 */
{
    x0=read_long(r[m]);
    r[m]+=4;
    pc+=2;
}
ldsmx1(long m) /* lds.l @rm+,x1 */
{
    x1=read_long(r[m]);
    r[m]+=4;
    pc+=2;
}
ldsmy0(long m) /* lds.l @rm+,y0 */
{
    y0=read_long(r[m]);
    r[m]+=4;
    pc+=2;
}
ldsmy1(long m) /* lds.l @rm+,y1 */
{
    y1=read_long(r[m]);
    r[m]+=4;
    pc+=2;
}

examples:
lds r0,pr           ;before execution: r0 = h'12345678, pr = h'00000000
                    ;after execution: pr = h'12345678
lds.l @r15+,macl    ;before execution: r15 = h'10000000
                    ;after execution: r15 = h'10000004, macl = @h'10000000

6.1.31 mac.l (multiply and accumulate calculation long): arithmetic instruction
applicable instructions: sh-1 sh-2 sh-dsp
format             abstract   code   cycle   t bit
mac.l @rm+,@rn+    signed operation, (rn) × (rm) + mac →
mac   0000nnnnmmmm1111   3/(2 to 4)

description: does signed multiplication of 32-bit operands obtained using the contents of general registers rm and rn as addresses. the 64-bit result is added to the contents of the mac register, and the final result is stored in the mac register. every time an operand is read, rm and rn are incremented by four. when the s bit is cleared to 0, the 64-bit result is stored in the coupled mach and macl registers. when the s bit is set to 1, addition to the mac register is a saturation operation of 48 bits starting from the lsb. for the saturation operation, only the lower 48 bits of the mac register are enabled and the result is limited to a range of h'ffff800000000000 (minimum) and h'00007fffffffffff (maximum).

operation:
macl(long m,long n) /* mac.l @rm+,@rn+ */
{
    unsigned long rnl,rnh,rml,rmh,res0,res1,res2;
    unsigned long temp0,temp1,temp2,temp3;
    long tempm,tempn,fnlml;
    tempn=(long)read_long(r[n]);
    r[n]+=4;
    tempm=(long)read_long(r[m]);
    r[m]+=4;
    if ((long)(tempn^tempm)<0) fnlml=-1;
    else fnlml=0;
    if (tempn<0) tempn=0-tempn;
    if (tempm<0) tempm=0-tempm;
    temp1=(unsigned long)tempn;
    temp2=(unsigned long)tempm;
    rnl=temp1&0x0000ffff;
    rnh=(temp1>>16)&0x0000ffff;
    rml=temp2&0x0000ffff;
    rmh=(temp2>>16)&0x0000ffff;
    temp0=rml*rnl;
    temp1=rmh*rnl;
    temp2=rml*rnh;
    temp3=rmh*rnh;
    res2=0;
    res1=temp1+temp2;
    if (res1

        if(((long)res2>0)&&(res2>0x00007fff)){
            res2=0x00007fff;
            res0=0xffffffff;
        };
        mach=res2;
        macl=res0;
    }
    else {
        res0=macl+res0;
        if (macl>res0) res2++;
        res2+=mach;
        mach=res2;
        macl=res0;
    }
    pc+=2;
}

example:
mova tblm,r0        ;table address
mov r0,r1           ;
mova tbln,r0        ;table address
clrmac              ;mac register initialization
mac.l @r0+,@r1+     ;
mac.l @r0+,@r1+     ;
sts macl,r0         ;store result into r0
...............
.align 2            ;
tblm .data.l h'1234abcd    ;
     .data.l h'5678ef01    ;
tbln .data.l h'0123abcd    ;
     .data.l h'4567def0    ;

6.1.32 mac.w (multiply and accumulate calculation word): arithmetic instruction
applicable instructions: sh-1 sh-2 sh-dsp
format             abstract                             code               cycle    t bit
mac.w @rm+,@rn+    with sign, (rn) × (rm) + mac → mac   0100nnnnmmmm1111   3/(2)
mac @rm+,@rn+

description: does signed multiplication of 16-bit operands obtained using the contents of general registers rm and rn as addresses. the 32-bit result is added to the contents of the mac register, and the final result is stored in the mac register. rm and rn data are incremented by 2 after the operation. when the s bit is cleared to 0, the operation is a 16 × 16 + 64 → 64-bit multiply and accumulate and the 64-bit result is stored in the coupled mach and macl registers. when the s bit is set to 1, the operation is a 16 × 16 + 32 → 32-bit multiply and accumulate and addition to the mac register is a saturation operation. for the saturation operation, only the macl register is enabled and the result is stored in the macl register. if an overflow occurs, the lsb of the mach register is set to 1, and the result is limited to h'80000000 (minimum) for overflows in the negative direction and h'7fffffff (maximum) for overflows in the positive direction.

note: when the s bit is 0, the sh-2 and sh-dsp cpu perform a 16 × 16 + 64 → 64-bit multiply and accumulate operation and the sh-1 cpu performs a 16 × 16 + 42 → 42-bit multiply and accumulate operation.
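the saturating case (s bit = 1) described above can be sketched in c with a 64-bit intermediate sum. this is an illustration only and not the manual's operation listing: the function name macw_sat and the overflow output parameter are hypothetical, and the caller would be responsible for reflecting the overflow flag in the lsb of mach.

```c
#include <stdint.h>

/* 16 x 16 + 32 -> 32 multiply and accumulate with saturation (s = 1) */
static int32_t macw_sat(int32_t macl, int16_t a, int16_t b, int *overflow)
{
    /* the product of two 16-bit values always fits in 32 bits */
    int64_t sum = (int64_t)macl + (int64_t)((int32_t)a * (int32_t)b);

    *overflow = 0;
    if (sum > INT32_MAX) {          /* overflow in the positive direction */
        sum = INT32_MAX;            /* h'7fffffff */
        *overflow = 1;
    } else if (sum < INT32_MIN) {   /* overflow in the negative direction */
        sum = INT32_MIN;            /* h'80000000 */
        *overflow = 1;
    }
    return (int32_t)sum;
}
```

widening to 64 bits before the addition keeps the overflow test simple; the manual's own listing instead derives overflow from the signs of the operands and the result.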
Operation:

macw(long m,long n) /* mac.w @rm+,@rn+ */
{
    long tempm,tempn,dest,src,ans;
    unsigned long templ;

    tempn=(long)read_word(r[n]);
    r[n]+=2;
    tempm=(long)read_word(r[m]);
    r[m]+=2;
    templ=macl;
    tempm=((long)(short)tempn*(long)(short)tempm);
    if ((long)macl>=0) dest=0;
    else dest=1;
    if ((long)tempm>=0) {
        src=0;
        tempn=0;
    }
    else {
        src=1;
        tempn=0xffffffff;
    }
    src+=dest;
    macl+=tempm;
    if ((long)macl>=0) ans=0;
    else ans=1;
    ans+=dest;
    if (s==1) {
        if (ans==1) {
            if (src==0 || src==2)       /* for sh-1 cpu (these 2 lines */
                mach|=0x00000001;       /* not needed for sh-2 and sh-dsp cpu) */
            if (src==0) macl=0x7fffffff;
            if (src==2) macl=0x80000000;
        }
    }
    else {
        mach+=tempn;
        if (templ>macl) mach+=1;
        if ((mach&0x00000200)==0)       /* for sh-1 cpu (these 3 lines */
            mach&=0x000003ff;           /* not needed for sh-2 and */
        else mach|=0xfffffc00;          /* sh-dsp cpu) */
    }
    pc+=2;
}

Example:

        mova    tblm,r0     ;table address
        mov     r0,r1       ;
        mova    tbln,r0     ;table address
        clrmac              ;mac register initialization
        mac.w   @r0+,@r1+   ;
        mac.w   @r0+,@r1+   ;
        sts     macl,r0     ;store result into r0
        ...............
        .align  2           ;
tblm    .data.w h'1234      ;
        .data.w h'5678      ;
tbln    .data.w h'0123      ;
        .data.w h'4567      ;

6.1.33 mov (Move Data): Data Transfer Instruction

Format | Abstract | Code | Cycle | T bit | Applicable instructions (sh-1, sh-2, sh-dsp)
mov rm,rn | rm → rn | 0110nnnnmmmm0011 | 1 | |
mov.b rm,@rn | rm → (rn) | 0010nnnnmmmm0000 | 1 | |
mov.w rm,@rn | rm → (rn) | 0010nnnnmmmm0001 | 1 | |
mov.l rm,@rn | rm → (rn) | 0010nnnnmmmm0010 | 1 | |
mov.b @rm,rn | (rm) → sign extension → rn | 0110nnnnmmmm0000 | 1 | |
mov.w @rm,rn | (rm) → sign extension → rn | 0110nnnnmmmm0001 | 1 | |
mov.l @rm,rn | (rm) → rn | 0110nnnnmmmm0010 | 1 | |
mov.b rm,@-rn | rn - 1 → rn, rm → (rn) | 0010nnnnmmmm0100 | 1 | |
mov.w rm,@-rn | rn - 2 → rn, rm → (rn) | 0010nnnnmmmm0101 | 1 | |
mov.l rm,@-rn | rn - 4 → rn, rm → (rn) | 0010nnnnmmmm0110 | 1 | |
mov.b @rm+,rn | (rm) → sign extension → rn, rm + 1 → rm | 0110nnnnmmmm0100 | 1 | |
mov.w @rm+,rn | (rm) → sign extension → rn, rm + 2 → rm | 0110nnnnmmmm0101 | 1 | |
mov.l @rm+,rn | (rm) → rn, rm + 4 → rm | 0110nnnnmmmm0110 | 1 | |
mov.b rm,@(r0,rn) | rm → (r0 + rn) | 0000nnnnmmmm0100 | 1 | |
mov.w rm,@(r0,rn) | rm → (r0 + rn) | 0000nnnnmmmm0101 | 1 | |
mov.l rm,@(r0,rn) | rm → (r0 + rn) | 0000nnnnmmmm0110 | 1 | |
mov.b @(r0,rm),rn | (r0 + rm) → sign extension → rn | 0000nnnnmmmm1100 | 1 | |
mov.w @(r0,rm),rn | (r0 + rm) → sign extension → rn | 0000nnnnmmmm1101 | 1 | |
mov.l @(r0,rm),rn | (r0 + rm) → rn | 0000nnnnmmmm1110 | 1 | |

Description: Transfers the source operand to the destination. When the operand is stored in memory, the transferred data can be a byte, word, or longword. Data loaded from memory is stored in a register after it is sign-extended to a longword.

Operation:

mov(long m,long n) /* mov rm,rn */
{
    r[n]=r[m];
    pc+=2;
}

movbs(long m,long n) /* mov.b rm,@rn */
{
    write_byte(r[n],r[m]);
    pc+=2;
}

movws(long m,long n) /* mov.w rm,@rn */
{
    write_word(r[n],r[m]);
    pc+=2;
}

movls(long m,long n) /* mov.l rm,@rn */
{
    write_long(r[n],r[m]);
    pc+=2;
}

movbl(long m,long n) /* mov.b @rm,rn */
{
    r[n]=(long)read_byte(r[m]);
    if ((r[n]&0x80)==0) r[n]&=0x000000ff;
    else r[n]|=0xffffff00;
    pc+=2;
}

movwl(long m,long n) /* mov.w @rm,rn */
{
    r[n]=(long)read_word(r[m]);
    if ((r[n]&0x8000)==0) r[n]&=0x0000ffff;
    else r[n]|=0xffff0000;
    pc+=2;
}

movll(long m,long n) /* mov.l @rm,rn */
{
    r[n]=read_long(r[m]);
    pc+=2;
}

movbm(long m,long n) /* mov.b rm,@-rn */
{
    write_byte(r[n]-1,r[m]);
    r[n]-=1;
    pc+=2;
}

movwm(long m,long n) /* mov.w rm,@-rn */
{
    write_word(r[n]-2,r[m]);
    r[n]-=2;
    pc+=2;
}

movlm(long m,long n) /* mov.l rm,@-rn */
{
    write_long(r[n]-4,r[m]);
    r[n]-=4;
    pc+=2;
}

movbp(long m,long n) /* mov.b @rm+,rn */
{
    r[n]=(long)read_byte(r[m]);
    if ((r[n]&0x80)==0) r[n]&=0x000000ff;
    else r[n]|=0xffffff00;
    if (n!=m) r[m]+=1;
    pc+=2;
}

movwp(long m,long n) /* mov.w @rm+,rn */
{
    r[n]=(long)read_word(r[m]);
    if ((r[n]&0x8000)==0) r[n]&=0x0000ffff;
    else r[n]|=0xffff0000;
    if (n!=m) r[m]+=2;
    pc+=2;
}

movlp(long m,long n) /* mov.l @rm+,rn */
{
    r[n]=read_long(r[m]);
    if (n!=m) r[m]+=4;
    pc+=2;
}

movbs0(long m,long n) /* mov.b rm,@(r0,rn) */
{
    write_byte(r[n]+r[0],r[m]);
    pc+=2;
}

movws0(long m,long n) /* mov.w rm,@(r0,rn) */
{
    write_word(r[n]+r[0],r[m]);
    pc+=2;
}

movls0(long m,long n) /* mov.l rm,@(r0,rn) */
{
    write_long(r[n]+r[0],r[m]);
    pc+=2;
}

movbl0(long m,long n) /* mov.b @(r0,rm),rn */
{
    r[n]=(long)read_byte(r[m]+r[0]);
    if ((r[n]&0x80)==0) r[n]&=0x000000ff;
    else r[n]|=0xffffff00;
    pc+=2;
}

movwl0(long m,long n) /* mov.w @(r0,rm),rn */
{
    r[n]=(long)read_word(r[m]+r[0]);
    if ((r[n]&0x8000)==0) r[n]&=0x0000ffff;
    else r[n]|=0xffff0000;
    pc+=2;
}

movll0(long m,long n) /* mov.l @(r0,rm),rn */
{
    r[n]=read_long(r[m]+r[0]);
    pc+=2;
}

Example:

mov   r0,r1        ;before execution: r0 = h'ffffffff, r1 = h'00000000
                   ;after execution: r1 = h'ffffffff
mov.w r0,@r1       ;before execution: r0 = h'ffff7f80
                   ;after execution: @r1 = h'7f80
mov.b @r0,r1       ;before execution: @r0 = h'80, r1 = h'00000000
                   ;after execution: r1 = h'ffffff80
mov.w r0,@-r1      ;before execution: r0 = h'aaaaaaaa, r1 = h'ffff7f80
                   ;after execution: r1 = h'ffff7f7e, @r1 = h'aaaa
mov.l @r0+,r1      ;before execution: r0 = h'12345670
                   ;after execution: r0 = h'12345674, r1 = @h'12345670
mov.b r1,@(r0,r2)  ;before execution: r2 = h'00000004, r0 = h'10000000
                   ;after execution: r1 = @h'10000004
mov.w @(r0,r2),r1  ;before execution: r2 = h'00000004, r0 = h'10000000
                   ;after execution: r1 = @h'10000004

6.1.34 mov (Move Immediate Data): Data Transfer Instruction

Format | Abstract | Code | Cycle | T bit | Applicable instructions (sh-1, sh-2, sh-dsp)
mov #imm,rn | imm → sign extension → rn | 1110nnnniiiiiiii | 1 | |
mov.w @(disp,pc),rn | (disp × 2 + pc) → sign extension → rn | 1001nnnndddddddd | 1 | |
mov.l @(disp,pc),rn | (disp × 4 + pc) → rn | 1101nnnndddddddd | 1 | |

Description: Stores immediate data, sign-extended to a longword, into general register rn. If the data is a word or longword, table data stored at the address specified by pc + displacement is accessed. If the data is a word, the 8-bit displacement is zero-extended and doubled. Consequently, the relative interval from the table can be up to pc + 510 bytes. The pc points to the starting address of the second instruction after this mov instruction.
If the data is a longword, the 8-bit displacement is zero-extended and quadrupled. Consequently, the relative interval from the table can be up to pc + 1020 bytes. The pc points to the starting address of the second instruction after this mov instruction, but the lowest two bits of the pc are corrected to b'00.

Note: The best place for the table is at the end of the module or one instruction after an unconditional branch instruction. If that placement is impossible, because there is no unconditional branch instruction within the 510-byte or 1020-byte range or for some other reason, a bra instruction is needed to jump past the table. When this instruction is placed immediately after a delayed branch instruction, the pc becomes the first address of the branch destination + 2.

Operation:

movi(long i,long n) /* mov #imm,rn */
{
    if ((i&0x80)==0) r[n]=(0x000000ff & (long)i);
    else r[n]=(0xffffff00 | (long)i);
    pc+=2;
}

movwi(long d,long n) /* mov.w @(disp,pc),rn */
{
    long disp;
    disp=(0x000000ff & (long)d);
    r[n]=(long)read_word(pc+(disp<<1));
    if ((r[n]&0x8000)==0) r[n]&=0x0000ffff;
    else r[n]|=0xffff0000;
    pc+=2;
}

movli(long d,long n) /* mov.l @(disp,pc),rn */
{
    long disp;
    disp=(0x000000ff & (long)d);
    r[n]=read_long((pc&0xfffffffc)+(disp<<2));
    pc+=2;
}

Example:

address
1000          mov     #h'80,r1      ;r1 = h'ffffff80
1002          mov.w   imm,r2        ;r2 = h'ffff9abc, imm means @(h'08,pc)
1004          add     #-1,r0        ;
1006          tst     r0,r0         ;← pc location used for address calculation for the mov.w instruction
1008          movt    r13           ;
100a          bra     next          ;delayed branch instruction
100c          mov.l   @(4,pc),r3    ;r3 = h'12345678
100e  imm     .data.w h'9abc        ;
1010          .data.w h'1234        ;
1012  next    jmp     @r3           ;branch destination of the bra instruction
1014          cmp/eq  #0,r0         ;← pc location used for address calculation for the mov.l instruction
              .align  4             ;
1018          .data.l h'12345678    ;

6.1.35 mov (Move Peripheral Data): Data Transfer Instruction

Format | Abstract | Code | Cycle | T bit | Applicable instructions (sh-1, sh-2, sh-dsp)
mov.b @(disp,gbr),r0 | (disp + gbr) → sign extension → r0 | 11000100dddddddd | 1 | |
mov.w @(disp,gbr),r0 | (disp × 2 + gbr) → sign extension → r0 | 11000101dddddddd | 1 | |
mov.l @(disp,gbr),r0 | (disp × 4 + gbr) → r0 | 11000110dddddddd | 1 | |
mov.b r0,@(disp,gbr) | r0 → (disp + gbr) | 11000000dddddddd | 1 | |
mov.w r0,@(disp,gbr) | r0 → (disp × 2 + gbr) | 11000001dddddddd | 1 | |
mov.l r0,@(disp,gbr) | r0 → (disp × 4 + gbr) | 11000010dddddddd | 1 | |

Description: Transfers the source operand to the destination. This instruction is optimum for accessing data in the peripheral module area. The data can be a byte, word, or longword, but only the r0 register can be used. A peripheral module base address is set in the gbr.

When the peripheral module data is a byte, the 8-bit displacement is only zero-extended. Consequently, an address within +255 bytes can be specified. When the peripheral module data is a word, the 8-bit displacement is zero-extended and doubled. Consequently, an address within +510 bytes can be specified. When the peripheral module data is a longword, the 8-bit displacement is zero-extended and quadrupled. Consequently, an address within +1020 bytes can be specified.

If the displacement is too short to reach the memory operand, the @(r0,rn) mode described above must be used after the gbr data is transferred to a general register. When the source operand is in memory, the loaded data is stored in the register after it is sign-extended to a longword.

Note: The destination register of a data load is always r0. r0 cannot be accessed by the next instruction until the load instruction is finished. The instruction order shown in figure 6.1 will give better results.
mov.b @(12,gbr),r0          mov.b @(12,gbr),r0
and   #80,r0          →     add   #20,r1
add   #20,r1                and   #80,r0

Figure 6.1 Using r0 after mov

Operation:

movblg(long d) /* mov.b @(disp,gbr),r0 */
{
    long disp;
    disp=(0x000000ff & (long)d);
    r[0]=(long)read_byte(gbr+disp);
    if ((r[0]&0x80)==0) r[0]&=0x000000ff;
    else r[0]|=0xffffff00;
    pc+=2;
}

movwlg(long d) /* mov.w @(disp,gbr),r0 */
{
    long disp;
    disp=(0x000000ff & (long)d);
    r[0]=(long)read_word(gbr+(disp<<1));
    if ((r[0]&0x8000)==0) r[0]&=0x0000ffff;
    else r[0]|=0xffff0000;
    pc+=2;
}

movllg(long d) /* mov.l @(disp,gbr),r0 */
{
    long disp;
    disp=(0x000000ff & (long)d);
    r[0]=read_long(gbr+(disp<<2));
    pc+=2;
}

movbsg(long d) /* mov.b r0,@(disp,gbr) */
{
    long disp;
    disp=(0x000000ff & (long)d);
    write_byte(gbr+disp,r[0]);
    pc+=2;
}

movwsg(long d) /* mov.w r0,@(disp,gbr) */
{
    long disp;
    disp=(0x000000ff & (long)d);
    write_word(gbr+(disp<<1),r[0]);
    pc+=2;
}

movlsg(long d) /* mov.l r0,@(disp,gbr) */
{
    long disp;
    disp=(0x000000ff & (long)d);
    write_long(gbr+(disp<<2),r[0]);
    pc+=2;
}

Examples:

mov.l @(2,gbr),r0  ;before execution: @(gbr + 8) = h'12345670
                   ;after execution: r0 = h'12345670
mov.b r0,@(1,gbr)  ;before execution: r0 = h'ffff7f80
                   ;after execution: @(gbr + 1) = h'80

6.1.36 mov (Move Structure Data): Data Transfer Instruction

Format | Abstract | Code | Cycle | T bit | Applicable instructions (sh-1, sh-2, sh-dsp)
mov.b r0,@(disp,rn) | r0 → (disp + rn) | 10000000nnnndddd | 1 | |
mov.w r0,@(disp,rn) | r0 → (disp × 2 + rn) | 10000001nnnndddd | 1 | |
mov.l rm,@(disp,rn) | rm → (disp × 4 + rn) | 0001nnnnmmmmdddd | 1 | |
mov.b @(disp,rm),r0 | (disp + rm) → sign extension → r0 | 10000100mmmmdddd | 1 | |
mov.w @(disp,rm),r0 | (disp × 2 + rm) → sign extension → r0 | 10000101mmmmdddd | 1 | |
mov.l @(disp,rm),rn | (disp × 4 + rm) → rn | 0101nnnnmmmmdddd | 1 | |

Description: Transfers the source operand to the destination. This instruction is optimum for accessing data in a structure or a stack. The data can be a byte, word, or longword, but when a byte or word is selected, only the r0 register can be used.
when the data is a byte, the only change made is to zero-extend the 4-bit displacement. consequently, an address within +15 bytes can be specified. when the data is a word, the 4-bit displacement is zero-extended and doubled. consequently, an address within +30 bytes can be specified. when the data is a longword, the 4-bit displacement is zero-extended and quadrupled. consequently, an address within +60 bytes can be specified. if the displacement is too short to reach the memory operand, the aforementioned @(r0,rn) mode must be used. when the source operand is in memory, the loaded data is stored in the register after it is sign-extended to a longword. note: when byte or word data is loaded, the destination register is always r0. r0 cannot be accessed by the next instruction until the load instruction is finished. the instruction order in figure 6.2 will give better results. mov.b and add @(2, r1), r0 #80, r0 #20, r1 mov.b add and @(2, r1), r0 #20, r1 #80, r0 figure 6.2 using r0 after mov 195 operation: movbs4(long d,long n) /* mov.b r0,@(disp,rn) */ { long disp; disp=(0x0000000f & (long)d); write_byte(r[n]+disp,r[0]); pc+=2; } movws4(long d,long n) /* mov.w r0,@(disp,rn) */ { long disp; disp=(0x0000000f & (long)d); write_word(r[n]+(disp<<1),r[0]); pc+=2; } movls4(long m,long d,long n) /* mov.l rm,@(disp,rn) */ { long disp; disp=(0x0000000f & (long)d); write_long(r[n]+(disp<<2),r[m]); pc+=2; } movbl4(long m,long d) /* mov.b @(disp,rm),r0 */ { long disp; disp=(0x0000000f & (long)d); r[0]=read_byte(r[m]+disp); if ((r[0]&0x80)==0) r[0]&=0x000000ff; else r[0]|=0xffffff00; pc+=2; } 196 movwl4(long m,long d) /* mov.w @(disp,rm),r0 */ { long disp; disp=(0x0000000f & (long)d); r[0]=read_word(r[m]+(disp<<1)); if ((r[0]&0x8000)==0) r[0]&=0x0000ffff; else r[0]|=0xffff0000; pc+=2; } movll4(long m,long d,long n) /* mov.l @(disp,rm),rn */ { long disp; disp=(0x0000000f & (long)d); r[n]=read_long(r[m]+(disp<<2)); pc+=2; } examples: mov.l @(2,r0),r1 ;before execution: @(r0 + 8) = 
h'12345670 ;after execution: r1 = h'12345670 mov.l r0,@(h'f,r1) ;before execution: r0 = h'ffff7f80 ;after execution: @(r1 + 60) = h'ffff7f80 197 6.1.37 mova (move effective address): data transfer instruction applicable instructions format abstract code cycle t bit sh-1 sh-2 sh- dsp mova @(disp,pc),r0 disp 4 + pc ? r0 11000111dddddddd 1 description: stores the effective address of the source operand into general register r0. the 8-bit displacement is zero-extended and quadrupled. consequently, the relative interval from the operand is pc + 1020 bytes. the pc is the address four bytes after this instruction, but the lowest two bits of the pc are corrected to b'00. note: if this instruction is placed immediately after a delayed branch instruction, the pc must point to an address specified by (the starting address of the branch destination) + 2. operation: mova(long d) /* mova @(disp,pc),r0 */ { long disp; disp=(0x000000ff & (long)d); r[0]=(pc&0xfffffffc)+(disp<<2); pc+=2; } example: address .org h'1006 1006 mova str,r0 ;address of str ? r0 1008 mov.b @r0,r1 ;r1 = ? ? pc location after correcting the lowest two bits 100a add r4,r5 ; ? original pc location for address calculation for the mova instruction .align 4 100c str: .sdata ?yzp12 ............... 2002 bra trget ;delayed branch instruction 2004 mova @(0,pc),r0 ;address of trget + 2 ? r0 2006 nop ; 198 6.1.38 movt (move t bit): data transfer instruction applicable instructions format abstract code cycle t bit sh-1 sh-2 sh- dsp movt rn t ? rn 0000nnnn00101001 1 description: stores the t bit value into general register rn. when t = 1, 1 is stored in rn, and when t = 0, 0 is stored in rn. operation: movt(long n) /* movt rn */ { r[n]=(0x00000001 & sr); pc+=2; } example: xor r2,r2 ;r2 = 0 cmp/pz r2 ;t = 1 movt r0 ;r0 = 1 clrt ;t = 0 movt r1 ;r1 = 0 199 6.1.39 mul.l (multiply long): arithmetic instruction applicable instructions format abstract code cycle t bit sh-1 sh-2 sh- dsp mul.l rm,rn rn rm ? 
macl 0000nnnnmmmm0111 2 (to?) description: performs 32-bit multiplication of the contents of general registers rn and rm, and stores the bottom 32 bits of the result in the macl register. the mach register data does not change. operation: mul.l(long m,long n)/* mul.l rm,rn */ { macl=r[n]*r[m]; pc+=2; } example: mull r0,r1 ;before execution: r0 = h'fffffffe, r1 = h'00005555 ;after execution: macl = h'ffff5556 sts macl,r0 ;operation result 200 6.1.40 muls.w (multiply as signed word): arithmetic instruction applicable instructions format abstract code cycle t bit sh-1 sh-2 sh- dsp muls.w rm,rn muls rm,rn signed operation, rn rm ? macl 0010nnnnmmmm1111 1 (to?) description: performs 16-bit multiplication of the contents of general registers rn and rm, and stores the 32-bit result in the macl register. the operation is signed and the mach register data does not change. operation: muls(long m,long n) /* muls rm,rn */ { macl=((long)(short)r[n]*(long)(short)r[m]); pc+=2; } example: muls r0,r1 ;before execution: r0 = h'fffffffe, r1 = h'00005555 ;after execution: macl = h'ffff5556 sts macl,r0 operation result 201 6.1.41 mulu.w (multiply as unsigned word): arithmetic instruction applicable instructions format abstract code cycle t bit sh-1 sh-2 sh- dsp mulu.w rm,rn mulu rm,rn unsigned, rn rm ? macl 0010nnnnmmmm1110 1 (to?) description: performs 16-bit multiplication of the contents of general registers rn and rm, and stores the 32-bit result in the macl register. the operation is unsigned and the mach register data does not change. operation: mulu(long m,long n) /* mulu rm,rn */ { macl=((unsigned long)(unsigned short)r[n] *(unsigned long)(unsigned short)r[m]); pc+=2; } example: mulu r0,r1 ;before execution: r0 = h'00000002, r1 = h'ffffaaaa ;after execution: macl = h'00015554 sts macl,r0 ; o peration result 202 6.1.42 neg (negate): arithmetic instruction applicable instructions format abstract code cycle t bit sh-1 sh-2 sh- dsp neg rm,rn 0 rm ? 
rn 0110nnnnmmmm1011 1

Description: Takes the two's complement of the data in general register rm and stores the result in rn. This effectively subtracts the rm data from 0 and stores the result in rn.

Operation:

neg(long m,long n) /* neg rm,rn */
{
    r[n]=0-r[m];
    pc+=2;
}

Example:

neg r0,r1  ;before execution: r0 = h'00000001
           ;after execution: r1 = h'ffffffff

6.1.43 negc (Negate with Carry): Arithmetic Instruction

Format | Abstract | Code | Cycle | T bit | Applicable instructions (sh-1, sh-2, sh-dsp)
negc rm,rn | 0 - rm - t → rn, borrow → t | 0110nnnnmmmm1010 | 1 | borrow |

Description: Subtracts the general register rm data and the t bit from 0, and stores the result in rn. If a borrow is generated, the t bit changes accordingly. This instruction is used for inverting the sign of a value that has more than 32 bits.

Operation:

negc(long m,long n) /* negc rm,rn */
{
    unsigned long temp;
    temp=0-r[m];
    r[n]=temp-t;
    if (0

6.1.45 not (not-Logical Complement): Logic Operation Instruction

Format | Abstract | Code | Cycle | T bit | Applicable instructions (sh-1, sh-2, sh-dsp)
not rm,rn | ~rm → rn | 0110nnnnmmmm0111 | 1 | |

Description: Takes the one's complement of the general register rm data and stores the result in rn. This effectively inverts each bit of the rm data and stores the result in rn.

Operation:

not(long m,long n) /* not rm,rn */
{
    r[n]=~r[m];
    pc+=2;
}

Example:

not r0,r1  ;before execution: r0 = h'aaaaaaaa
           ;after execution: r1 = h'55555555

6.1.46 or (or Logical): Logic Operation Instruction

Format | Abstract | Code | Cycle | T bit | Applicable instructions (sh-1, sh-2, sh-dsp)
or rm,rn | rn | rm → rn | 0010nnnnmmmm1011 | 1 | |
or #imm,r0 | r0 | imm → r0 | 11001011iiiiiiii | 1 | |
or.b #imm,@(r0,gbr) | (r0 + gbr) | imm → (r0 + gbr) | 11001111iiiiiiii | 3 | |

(In the abstract column, the "|" between two operands is the logical or operator.)

Description: Logically ors the contents of general registers rn and rm, and stores the result in rn. The contents of general register r0 can also be ored with zero-extended 8-bit immediate data, or 8-bit memory data accessed by using indirect indexed gbr addressing can be ored with 8-bit immediate data.
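The two immediate forms described above can be modeled with a short C sketch. It is illustrative only: the array mem, the size MEM_SIZE, and the function names or_byte_gbr and or_imm are assumptions made for this example and do not appear in the manual.

```c
#include <assert.h>
#include <stdint.h>

#define MEM_SIZE 256u

/* Hypothetical flat memory, used only for illustration. */
static uint8_t mem[MEM_SIZE];

/* Models or.b #imm,@(r0,gbr): read one byte, or in the immediate,
   and write the byte back to the same address. */
static void or_byte_gbr(uint32_t gbr, uint32_t r0, uint8_t imm)
{
    uint32_t addr = gbr + r0;
    assert(addr < MEM_SIZE);

    uint8_t value = mem[addr];
    value |= imm;
    mem[addr] = value;
}

/* Models or #imm,r0: the immediate is zero-extended, so bits 31 to 8
   of r0 are never changed by this form. */
static uint32_t or_imm(uint32_t r0, uint8_t imm)
{
    return r0 | (uint32_t)imm;
}
```

With mem[0x14] holding h'a5, a call such as or_byte_gbr(0x10, 4, 0x50) leaves h'f5 at that address, which matches the or.b example in this section.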
Operation:

or(long m,long n) /* or rm,rn */
{
    r[n]|=r[m];
    pc+=2;
}

ori(long i) /* or #imm,r0 */
{
    r[0]|=(0x000000ff & (long)i);
    pc+=2;
}

orm(long i) /* or.b #imm,@(r0,gbr) */
{
    long temp;
    temp=(long)read_byte(gbr+r[0]);
    temp|=(0x000000ff & (long)i);
    write_byte(gbr+r[0],temp);
    pc+=2;
}

Examples:

or   r0,r1             ;before execution: r0 = h'aaaa5555, r1 = h'55550000
                       ;after execution: r1 = h'ffff5555
or   #h'f0,r0          ;before execution: r0 = h'00000008
                       ;after execution: r0 = h'000000f8
or.b #h'50,@(r0,gbr)   ;before execution: @(r0,gbr) = h'a5
                       ;after execution: @(r0,gbr) = h'f5

6.1.47 rotcl (Rotate with Carry Left): Shift Instruction

Format | Abstract | Code | Cycle | T bit | Applicable instructions (sh-1, sh-2, sh-dsp)
rotcl rn | t ← rn ← t | 0100nnnn00100100 | 1 | msb |

Description: Rotates the contents of general register rn and the t bit to the left by one bit, and stores the result in rn. The bit that is shifted out of the operand is transferred to the t bit (figure 6.3).

[Figure 6.3 Rotate with carry left: rotcl, showing the t bit and rn from msb to lsb]

Operation:

rotcl(long n) /* rotcl rn */
{
    long temp;
    if ((r[n]&0x80000000)==0) temp=0;
    else temp=1;
    r[n]<<=1;
    if (t==1) r[n]|=0x00000001;
    else r[n]&=0xfffffffe;
    if (temp==1) t=1;
    else t=0;
    pc+=2;
}

Example:

rotcl r0  ;before execution: r0 = h'80000000, t = 0
          ;after execution: r0 = h'00000000, t = 1

6.1.48 rotcr (Rotate with Carry Right): Shift Instruction

Format | Abstract | Code | Cycle | T bit | Applicable instructions (sh-1, sh-2, sh-dsp)
rotcr rn | t → rn → t | 0100nnnn00100101 | 1 | lsb |

Description: Rotates the contents of general register rn and the t bit to the right by one bit, and stores the result in rn. The bit that is shifted out of the operand is transferred to the t bit (figure 6.4).
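Because the bit shifted out lands in the t bit and the old t bit is shifted in, a rotate-with-carry can chain a one-bit shift across two registers. The following C sketch models a 64-bit logical right shift built from shlr on the upper word and rotcr on the lower word. The type pair64 and the helper names are hypothetical, chosen only for this illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register pair plus a modeled t bit (always 0 or 1). */
typedef struct {
    uint32_t hi;
    uint32_t lo;
    unsigned t;
} pair64;

/* Models shlr: logical shift right by one; the lsb goes to t. */
static void shlr(uint32_t *r, unsigned *t)
{
    *t = *r & 1u;
    *r >>= 1;
}

/* Models rotcr: the old t enters at the msb; the old lsb goes to t. */
static void rotcr(uint32_t *r, unsigned *t)
{
    assert(*t <= 1u);
    unsigned out = *r & 1u;
    *r = (*r >> 1) | ((uint32_t)*t << 31);
    *t = out;
}

/* 64-bit shift right by one: shift the upper word first, then rotate
   the carried bit into the lower word. */
static void shift_right_64(pair64 *v)
{
    shlr(&v->hi, &v->t);
    rotcr(&v->lo, &v->t);
}
```

The same pattern with shll and rotcl, starting from the lower word, would give a 64-bit left shift.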
[Figure 6.4 Rotate with carry right: rotcr, showing rn from msb to lsb and the t bit]

Operation:

rotcr(long n) /* rotcr rn */
{
    long temp;
    if ((r[n]&0x00000001)==0) temp=0;
    else temp=1;
    r[n]>>=1;
    if (t==1) r[n]|=0x80000000;
    else r[n]&=0x7fffffff;
    if (temp==1) t=1;
    else t=0;
    pc+=2;
}

Examples:

rotcr r0  ;before execution: r0 = h'00000001, t = 1
          ;after execution: r0 = h'80000000, t = 1

6.1.49 rotl (Rotate Left): Shift Instruction

Format | Abstract | Code | Cycle | T bit | Applicable instructions (sh-1, sh-2, sh-dsp)
rotl rn | t ← rn ← msb | 0100nnnn00000100 | 1 | msb |

Description: Rotates the contents of general register rn to the left by one bit, and stores the result in rn (figure 6.5). The bit that is shifted out of the operand is transferred to the t bit.

[Figure 6.5 Rotate left: rotl, showing the t bit and rn from msb to lsb]

Operation:

rotl(long n) /* rotl rn */
{
    if ((r[n]&0x80000000)==0) t=0;
    else t=1;
    r[n]<<=1;
    if (t==1) r[n]|=0x00000001;
    else r[n]&=0xfffffffe;
    pc+=2;
}

Examples:

rotl r0  ;before execution: r0 = h'80000000, t = 0
         ;after execution: r0 = h'00000001, t = 1

6.1.50 rotr (Rotate Right): Shift Instruction

Format | Abstract | Code | Cycle | T bit | Applicable instructions (sh-1, sh-2, sh-dsp)
rotr rn | lsb → rn → t | 0100nnnn00000101 | 1 | lsb |

Description: Rotates the contents of general register rn to the right by one bit, and stores the result in rn (figure 6.6). The bit that is shifted out of the operand is transferred to the t bit.

[Figure 6.6 Rotate right: rotr, showing rn from msb to lsb and the t bit]

Operation:

rotr(long n) /* rotr rn */
{
    if ((r[n]&0x00000001)==0) t=0;
    else t=1;
    r[n]>>=1;
    if (t==1) r[n]|=0x80000000;
    else r[n]&=0x7fffffff;
    pc+=2;
}

Examples:

rotr r0  ;before execution: r0 = h'00000001, t = 0
         ;after execution: r0 = h'80000000, t = 1

6.1.51 rte (Return from Exception): System Control Instruction
Class: delayed branch instruction

Format | Abstract | Code | Cycle | T bit | Applicable instructions (sh-1, sh-2, sh-dsp)
rte | delayed branch, stack area → pc/sr | 0000000000101011 | 4 | lsb |

Description: Returns from an interrupt routine.
The pc and sr values are restored from the stack, and the program continues from the address specified by the restored pc value. The t bit is taken from the lsb of the sr value restored from the stack area.

Note: Since this is a delayed branch instruction, the instruction after this rte is executed before branching. No address errors or interrupts are accepted between this instruction and the next instruction. If the next instruction is a branch instruction, it is acknowledged as an illegal slot instruction.

Operation:

rte() /* rte */
{
    unsigned long temp;
    temp=pc;
    pc=read_long(r[15])+4;
    r[15]+=4;
    sr=read_long(r[15])&0x0fff0fff;
    r[15]+=4;
    delay_slot(temp+2);
}

Example:

rte          ;returns to the original routine
add #8,r14   ;executes add before branching

Note: With delayed branching, branching occurs after execution of the slot instruction. However, register updates and similar effects take place in the order of the delayed branch instruction first, then the delay slot instruction. For example, even if the register in which the branch destination address has been loaded is changed by the delay slot instruction, the branch will still be made using the value of the register prior to the change as the branch destination address.

6.1.52 rts (Return from Subroutine): Branch Instruction (Class: delayed branch instruction)

Format | Abstract | Code | Cycle | T bit | Applicable instructions (sh-1, sh-2, sh-dsp)
rts | delayed branch, pr → pc | 0000000000001011 | 2 | |

Description: Returns from a subroutine procedure. The pc value is restored from the pr, and the program continues from the address specified by the restored pc value. This instruction is used to return to the program from a subroutine called by a bsr, bsrf, or jsr instruction.

Note: Since this is a delayed branch instruction, the instruction after this rts is executed before branching. No address errors or interrupts are accepted between this instruction and the next instruction.
If the next instruction is a branch instruction, it is acknowledged as an illegal slot instruction.

Operation:

rts() /* rts */
{
    unsigned long temp;
    temp=pc;
    pc=pr+4;
    delay_slot(temp+2);
}

Example:

        mov.l   table,r3   ;r3 = address of trget
        jsr     @r3        ;branches to trget
        nop                ;executes nop before branching
        add     r0,r1      ;← return address for when the subroutine procedure is completed (pr data)
        .............
table:  .data.l trget      ;jump table
        .............
trget:  mov     r1,r0      ;← procedure entrance
        rts                ;pr data → pc
        mov     #12,r0     ;executes mov before branching

Note: With delayed branching, branching occurs after execution of the slot instruction. However, register updates and similar effects take place in the order of the delayed branch instruction first, then the delay slot instruction. For example, even if the register in which the branch destination address has been loaded is changed by the delay slot instruction, the branch will still be made using the value of the register prior to the change as the branch destination address.

6.1.53 setrc (Set Repeat Count to rc): System Control Instruction

Format | Abstract | Code | Cycle | T bit | Applicable instructions (sh-1, sh-2, sh-dsp)
setrc rm | rm[11:0] → rc (sr[27:16]), repeat control flag → rf1, rf0 | 0100mmmm00010100 | 1 | |
setrc #imm | imm → rc (sr[23:16]), zeros → sr[27:24], repeat control flag → rf1, rf0 | 10000010iiiiiiii | 1 | |

Description: Sets the repeat count in the rc counter of the sr register. When the operand is a register, the bottom 12 bits are used as the repeat count. When the operand is immediate data, 8 bits are used as the repeat count. The repeat control flags are set in the rf1 and rf0 bits of the sr register. Use of the setrc instruction is subject to certain limitations. Refer to section 4.19, dsp repeat (loop) control, for more information.
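The placement of the repeat count in sr can be sketched in C as plain bit-field packing. This is a simplified illustration under stated assumptions: the macro and function names are hypothetical, the repeat control flags rf1 and rf0 are not modeled, and unlike the manual's operation listing this sketch leaves every sr bit outside the rc field unchanged.

```c
#include <stdint.h>

#define SR_RC_SHIFT 16
#define SR_RC_MASK  (0x0fffu << SR_RC_SHIFT)   /* rc field, sr[27:16] */

/* Register form: the bottom 12 bits of rm become the repeat count. */
static uint32_t sr_set_rc_reg(uint32_t sr, uint32_t rm)
{
    uint32_t count = rm & 0x0fffu;
    return (sr & ~SR_RC_MASK) | (count << SR_RC_SHIFT);
}

/* Immediate form: the 8-bit value lands in sr[23:16] and the upper
   four bits of the rc field, sr[27:24], become zero. */
static uint32_t sr_set_rc_imm(uint32_t sr, uint8_t imm)
{
    return (sr & ~SR_RC_MASK) | ((uint32_t)imm << SR_RC_SHIFT);
}

/* Reads the repeat count back out of sr. */
static uint32_t sr_get_rc(uint32_t sr)
{
    return (sr & SR_RC_MASK) >> SR_RC_SHIFT;
}
```

For instance, sr_set_rc_imm(0, 32) gives h'00200000, the same field value that setrc #32 would place in rc.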
Operation:

setrc(long m) /* setrc rm */
{
    long temp;
    temp=(r[m] & 0x00000fff)<<16;
    sr&=0x00000ff3;
    sr|=temp;
    rf1=repeat_control_flag1;
    rf0=repeat_control_flag0;
    pc+=2;
}

setrci(long i) /* setrc #imm */
{
    long temp;
    temp=((long)i & 0x000000ff)<<16;
    sr&=0x00000fff;
    sr|=temp;
    rf1=repeat_control_flag1;
    rf0=repeat_control_flag0;
    pc+=2;
}

[Figure 6.7 setrc instruction: for setrc #imm, the 8-bit imm (1 ≤ imm ≤ 255) is placed in sr bits 23 to 16; for setrc rm, the 12-bit rm[11:0] (1 ≤ rm[11:0] ≤ 4095) is placed in sr bits 27 to 16; the repeat control flags are also set in sr]

Example:

        ldrs    sta      ;set repeat start address to rs.
        ldre    end      ;set repeat end address to re.
        setrc   #32      ;repeat 32 times from inst.a to inst.c.
        inst.0           ;
sta:    inst.a           ;
        inst.b           ;
        ............
end:    inst.c           ;
        inst.d           ;

6.1.54 sett (Set t Bit): System Control Instruction

Format | Abstract | Code | Cycle | T bit | Applicable instructions (sh-1, sh-2, sh-dsp)
sett | 1 → t | 0000000000011000 | 1 | 1 |

Description: Sets the t bit to 1.

Operation:

sett() /* sett */
{
    t=1;
    pc+=2;
}

Example:

sett  ;before execution: t = 0
      ;after execution: t = 1

6.1.55 shal (Shift Arithmetic Left): Shift Instruction

Format | Abstract | Code | Cycle | T bit | Applicable instructions (sh-1, sh-2, sh-dsp)
shal rn | t ← rn ← 0 | 0100nnnn00100000 | 1 | msb |

Description: Arithmetically shifts the contents of general register rn to the left by one bit, and stores the result in rn. The bit that is shifted out of the operand is transferred to the t bit (figure 6.8).

[Figure 6.8 Shift arithmetic left: shal, showing the t bit, rn from msb to lsb, and 0 shifted in at the lsb]

Operation:

shal(long n) /* shal rn (same as shll) */
{
    if ((r[n]&0x80000000)==0) t=0;
    else t=1;
    r[n]<<=1;
    pc+=2;
}

Example:

shal r0  ;before execution: r0 = h'80000001, t = 0
         ;after execution: r0 = h'00000002, t = 1

6.1.56 shar (Shift Arithmetic Right): Shift Instruction

Format | Abstract | Code | Cycle | T bit | Applicable instructions (sh-1, sh-2, sh-dsp)
shar rn | msb → rn → t | 0100nnnn00100001 | 1 | lsb |

Description: Arithmetically shifts the contents of general register rn to the right by one bit, and stores the result in rn. The bit that is shifted out of the operand is transferred to the t bit (figure 6.9).

[Figure 6.9 Shift arithmetic right: shar, showing rn from msb to lsb and the t bit]

Operation:

shar(long n) /* shar rn */
{
    long temp;
    if ((r[n]&0x00000001)==0) t=0;
    else t=1;
    if ((r[n]&0x80000000)==0) temp=0;
    else temp=1;
    r[n]>>=1;
    if (temp==1) r[n]|=0x80000000;
    else r[n]&=0x7fffffff;
    pc+=2;
}

Example:

shar r0  ;before execution: r0 = h'80000001, t = 0
         ;after execution: r0 = h'c0000000, t = 1

6.1.57 shll (Shift Logical Left): Shift Instruction

Format | Abstract | Code | Cycle | T bit | Applicable instructions (sh-1, sh-2, sh-dsp)
shll rn | t ← rn ← 0 | 0100nnnn00000000 | 1 | msb |

Description: Logically shifts the contents of general register rn to the left by one bit, and stores the result in rn. The bit that is shifted out of the operand is transferred to the t bit (figure 6.10).

[Figure 6.10 Shift logical left: shll, showing the t bit, rn from msb to lsb, and 0 shifted in at the lsb]

Operation:

shll(long n) /* shll rn (same as shal) */
{
    if ((r[n]&0x80000000)==0) t=0;
    else t=1;
    r[n]<<=1;
    pc+=2;
}

Examples:

shll r0  ;before execution: r0 = h'80000001, t = 0
         ;after execution: r0 = h'00000002, t = 1

6.1.58 shlln (Shift Logical Left n Bits): Shift Instruction

Format | Abstract | Code | Cycle | T bit | Applicable instructions (sh-1, sh-2, sh-dsp)
shll2 rn | rn << 2 → rn | 0100nnnn00001000 | 1 | |
shll8 rn | rn << 8 → rn | 0100nnnn00011000 | 1 | |
shll16 rn | rn << 16 → rn | 0100nnnn00101000 | 1 | |

Description: Logically shifts the contents of general register rn to the left by 2, 8, or 16 bits, and stores the result in rn. Bits that are shifted out of the operand are not stored (figure 6.11).
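Since only shift widths of 1, 2, 8, and 16 are available as single instructions, other shift amounts have to be composed from them. The C sketch below shows one possible greedy decomposition. The enum shift_op and the functions plan_left_shift and apply_shifts are hypothetical names for this illustration, and the one-bit step stands for shll, which also updates the t bit (not modeled here).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical labels for the available left-shift instructions. */
typedef enum { OP_SHLL, OP_SHLL2, OP_SHLL8, OP_SHLL16 } shift_op;

/* Breaks a left shift of n bits (0 to 31) into steps of 16, 8, 2, and 1.
   Writes the steps to ops (at most 6 are needed) and returns the count. */
static int plan_left_shift(unsigned n, shift_op ops[6])
{
    int count = 0;
    assert(n < 32);

    if (n >= 16) { ops[count++] = OP_SHLL16; n -= 16; }
    if (n >= 8)  { ops[count++] = OP_SHLL8;  n -= 8; }
    while (n >= 2) { ops[count++] = OP_SHLL2; n -= 2; }
    if (n == 1)  { ops[count++] = OP_SHLL; }
    return count;
}

/* Applies a planned sequence to a value, one step at a time. */
static uint32_t apply_shifts(uint32_t value, const shift_op *ops, int count)
{
    static const unsigned width[] = { 1, 2, 8, 16 };  /* indexed by shift_op */
    for (int i = 0; i < count; i++)
        value <<= width[ops[i]];
    return value;
}
```

A shift by 10 bits, for example, comes out as shll8 followed by shll2.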
[Figure 6.11 Shift logical left n bits: shll2, shll8, and shll16, each showing rn from msb to lsb with 0 shifted in at the lsb]

Operation:

shll2(long n) /* shll2 rn */
{
    r[n]<<=2;
    pc+=2;
}

shll8(long n) /* shll8 rn */
{
    r[n]<<=8;
    pc+=2;
}

shll16(long n) /* shll16 rn */
{
    r[n]<<=16;
    pc+=2;
}

Examples:

shll2  r0  ;before execution: r0 = h'12345678
           ;after execution: r0 = h'48d159e0
shll8  r0  ;before execution: r0 = h'12345678
           ;after execution: r0 = h'34567800
shll16 r0  ;before execution: r0 = h'12345678
           ;after execution: r0 = h'56780000

6.1.59 shlr (Shift Logical Right): Shift Instruction

Format | Abstract | Code | Cycle | T bit | Applicable instructions (sh-1, sh-2, sh-dsp)
shlr rn | 0 → rn → t | 0100nnnn00000001 | 1 | lsb |

Description: Logically shifts the contents of general register rn to the right by one bit, and stores the result in rn. The bit that is shifted out of the operand is transferred to the t bit (figure 6.12).

[Figure 6.12 Shift logical right: shlr, showing 0 shifted in at the msb, rn from msb to lsb, and the t bit]

Operation:

shlr(long n) /* shlr rn */
{
    if ((r[n]&0x00000001)==0) t=0;
    else t=1;
    r[n]>>=1;
    r[n]&=0x7fffffff;
    pc+=2;
}

Examples:

shlr r0  ;before execution: r0 = h'80000001, t = 0
         ;after execution: r0 = h'40000000, t = 1

6.1.60 shlrn (Shift Logical Right n Bits): Shift Instruction

Format | Abstract | Code | Cycle | T bit | Applicable instructions (sh-1, sh-2, sh-dsp)
shlr2 rn | rn >> 2 → rn | 0100nnnn00001001 | 1 | |
shlr8 rn | rn >> 8 → rn | 0100nnnn00011001 | 1 | |
shlr16 rn | rn >> 16 → rn | 0100nnnn00101001 | 1 | |

Description: Logically shifts the contents of general register rn to the right by 2, 8, or 16 bits, and stores the result in rn. Bits that are shifted out of the operand are not stored (figure 6.13).
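A logical right shift always fills the vacated upper bits with 0, whatever the sign of the value. The C sketch below models the three fixed-width shifts on an unsigned 32-bit type and uses two of them to pull out the top byte of a register. The function names are hypothetical and serve only as an illustration.

```c
#include <stdint.h>

/* Models of the fixed-width logical right shifts. Using an unsigned
   type guarantees that the vacated upper bits are filled with 0. */
static uint32_t shlr2(uint32_t r)  { return r >> 2; }
static uint32_t shlr8(uint32_t r)  { return r >> 8; }
static uint32_t shlr16(uint32_t r) { return r >> 16; }

/* Extracts bits 31 to 24 by shifting right 16 and then 8 bits.
   No mask is needed afterward because the shifts are logical. */
static uint32_t top_byte(uint32_t r)
{
    r = shlr16(r);
    r = shlr8(r);
    return r;
}
```

With r = h'80345678, top_byte returns h'00000080; an arithmetic shift such as shar would instead copy the sign bit into the upper bits.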
figure 6.13 shift logical right n bits

operation:

    shlr2(long n)    /* shlr2 rn */
    {
        r[n]>>=2;
        r[n]&=0x3fffffff;
        pc+=2;
    }

    shlr8(long n)    /* shlr8 rn */
    {
        r[n]>>=8;
        r[n]&=0x00ffffff;
        pc+=2;
    }

    shlr16(long n)    /* shlr16 rn */
    {
        r[n]>>=16;
        r[n]&=0x0000ffff;
        pc+=2;
    }

examples:

    shlr2 r0     ;before execution: r0 = h'12345678
                 ;after execution: r0 = h'048d159e
    shlr8 r0     ;before execution: r0 = h'12345678
                 ;after execution: r0 = h'00123456
    shlr16 r0    ;before execution: r0 = h'12345678
                 ;after execution: r0 = h'00001234

6.1.61 sleep (sleep): system control instruction

    format    abstract    code                cycle   t bit   applicable instructions (sh-1 sh-2 sh-dsp)
    sleep     sleep       0000000000011011    3

description: sets the cpu into power-down mode. in power-down mode, instruction execution stops, but the cpu internal status is maintained, and the cpu waits for an interrupt request. if an interrupt is requested, the cpu exits the power-down mode and begins exception processing.

note: the number of cycles given is for the transition to sleep mode.

operation:

    sleep()    /* sleep */
    {
        pc-=2;
        wait_for_exception;
    }

example:

    sleep    ;enters power-down mode

6.1.62 stc (store control register): system control instruction (interrupt disabled instruction)

    format             abstract                     code                cycle   t bit   applicable instructions (sh-1 sh-2 sh-dsp)
    stc sr,rn          sr → rn                      0000nnnn00000010    1
    stc gbr,rn         gbr → rn                     0000nnnn00010010    1
    stc vbr,rn         vbr → rn                     0000nnnn00100010    1
    stc mod,rn         mod → rn                     0000nnnn01010010    1
    stc re,rn          re → rn                      0000nnnn01110010    1
    stc rs,rn          rs → rn                      0000nnnn01100010    1
    stc.l sr,@-rn      rn - 4 → rn, sr → (rn)       0100nnnn00000011    2
    stc.l gbr,@-rn     rn - 4 → rn, gbr → (rn)      0100nnnn00010011    2
    stc.l vbr,@-rn     rn - 4 → rn, vbr → (rn)      0100nnnn00100011    2
    stc.l mod,@-rn     rn - 4 → rn, mod → (rn)      0100nnnn01010011    2
    stc.l re,@-rn      rn - 4 → rn, re → (rn)       0100nnnn01110011    2
    stc.l rs,@-rn      rn - 4 → rn, rs →
(rn) 0100nnnn01100011 2

description: stores control register sr, gbr, vbr, mod, re, or rs data into a specified destination.

note: no interrupts are accepted between this instruction and the next instruction. address errors are accepted.

operation:

    stcsr(long n)      /* stc sr,rn */       { r[n]=sr; pc+=2; }
    stcgbr(long n)     /* stc gbr,rn */      { r[n]=gbr; pc+=2; }
    stcvbr(long n)     /* stc vbr,rn */      { r[n]=vbr; pc+=2; }
    stcmod(long n)     /* stc mod,rn */      { r[n]=mod; pc+=2; }
    stcre(long n)      /* stc re,rn */       { r[n]=re; pc+=2; }
    stcrs(long n)      /* stc rs,rn */       { r[n]=rs; pc+=2; }
    stcmsr(long n)     /* stc.l sr,@-rn */   { r[n]-=4; write_long(r[n],sr); pc+=2; }
    stcmgbr(long n)    /* stc.l gbr,@-rn */  { r[n]-=4; write_long(r[n],gbr); pc+=2; }
    stcmvbr(long n)    /* stc.l vbr,@-rn */  { r[n]-=4; write_long(r[n],vbr); pc+=2; }
    stcmmod(long n)    /* stc.l mod,@-rn */  { r[n]-=4; write_long(r[n],mod); pc+=2; }
    stcmre(long n)     /* stc.l re,@-rn */   { r[n]-=4; write_long(r[n],re); pc+=2; }
    stcmrs(long n)     /* stc.l rs,@-rn */   { r[n]-=4; write_long(r[n],rs); pc+=2; }

examples:

    stc sr,r0          ;before execution: r0 = h'ffffffff, sr = h'00000000
                       ;after execution: r0 = h'00000000
    stc.l gbr,@-r15    ;before execution: r15 = h'10000004
                       ;after execution: r15 = h'10000000, @r15 = gbr

6.1.63 sts (store system register): system control instruction (interrupt disabled instruction)

    format              abstract                      code                cycle   t bit   applicable instructions (sh-1 sh-2 sh-dsp)
    sts mach,rn         mach → rn                     0000nnnn00001010    1
    sts macl,rn         macl → rn                     0000nnnn00011010    1
    sts pr,rn           pr → rn                       0000nnnn00101010    1
    sts dsr,rn          dsr → rn                      0000nnnn01101010    1
    sts a0,rn           a0 → rn                       0000nnnn01111010    1
    sts x0,rn           x0 → rn                       0000nnnn10001010    1
    sts x1,rn           x1 → rn                       0000nnnn10011010    1
    sts y0,rn           y0 → rn                       0000nnnn10101010    1
    sts y1,rn           y1 → rn                       0000nnnn10111010    1
    sts.l mach,@-rn     rn - 4 → rn, mach → (rn)      0100nnnn00000010    1
    sts.l macl,@-rn     rn - 4 → rn, macl → (rn)      0100nnnn00010010    1
    sts.l pr,@-rn       rn - 4 → rn, pr → (rn)        0100nnnn00100010    1
    sts.l dsr,@-rn      rn - 4 → rn, dsr →
(rn) 0100nnnn01100010 1
    sts.l a0,@-rn       rn - 4 → rn, a0 → (rn)        0100nnnn01110010    1
    sts.l x0,@-rn       rn - 4 → rn, x0 → (rn)        0100nnnn10000010    1
    sts.l x1,@-rn       rn - 4 → rn, x1 → (rn)        0100nnnn10010010    1
    sts.l y0,@-rn       rn - 4 → rn, y0 → (rn)        0100nnnn10100010    1
    sts.l y1,@-rn       rn - 4 → rn, y1 → (rn)        0100nnnn10110010    1

description: stores data from system register mach, macl, or pr, or from dsp register dsr, a0, x0, x1, y0, or y1, into a specified destination.

note: no interrupts are accepted between this instruction and the next instruction. address errors are accepted. if the system register is mach in the sh-1 series, the value of bit 9 is transferred to and stored in the higher 22 bits (bits 31 to 10) of the destination. with the sh-2 and sh-dsp, the 32 bits of mach are stored directly.

operation:

    stsmach(long n)    /* sts mach,rn */
    {
        r[n]=mach;
        /* for sh-1 cpu: the sign extension of bit 9 below is
           not needed for sh-2 and sh-dsp cpu */
        if ((r[n]&0x00000200)==0) r[n]&=0x000003ff;
        else r[n]|=0xfffffc00;
        pc+=2;
    }

    stsmacl(long n)    /* sts macl,rn */    { r[n]=macl; pc+=2; }
    stspr(long n)      /* sts pr,rn */      { r[n]=pr; pc+=2; }
    stsdsr(long n)     /* sts dsr,rn */     { r[n]=dsr; pc+=2; }
    stsa0(long n)      /* sts a0,rn */      { r[n]=a0; pc+=2; }
    stsx0(long n)      /* sts x0,rn */      { r[n]=x0; pc+=2; }
    stsx1(long n)      /* sts x1,rn */      { r[n]=x1; pc+=2; }
    stsy0(long n)      /* sts y0,rn */      { r[n]=y0; pc+=2; }
    stsy1(long n)      /* sts y1,rn */      { r[n]=y1; pc+=2; }

    stsmmach(long n)    /* sts.l mach,@-rn */
    {
        r[n]-=4;
        /* for sh-1 cpu */
        if ((mach&0x00000200)==0) write_long(r[n],mach&0x000003ff);
        else write_long(r[n],mach|0xfffffc00);
        /* for sh-2 and sh-dsp cpu (instead of the two lines above) */
        write_long(r[n],mach);
        pc+=2;
    }

    stsmmacl(long n)    /* sts.l macl,@-rn */    { r[n]-=4; write_long(r[n],macl); pc+=2; }
    stsmpr(long n)      /* sts.l pr,@-rn */      { r[n]-=4; write_long(r[n],pr); pc+=2; }
    stsmdsr(long n)     /* sts.l dsr,@-rn */     { r[n]-=4; write_long(r[n],dsr); pc+=2; }
    stsma0(long n)      /* sts.l a0,@-rn */      { r[n]-=4; write_long(r[n],a0); pc+=2; }
    stsmx0(long n)      /* sts.l x0,@-rn */      { r[n]-=4; write_long(r[n],x0); pc+=2; }
    stsmx1(long n)      /* sts.l
x1,@-rn */ { r[n]-=4; write_long(r[n],x1); pc+=2; }
    stsmy0(long n)      /* sts.l y0,@-rn */      { r[n]-=4; write_long(r[n],y0); pc+=2; }
    stsmy1(long n)      /* sts.l y1,@-rn */      { r[n]-=4; write_long(r[n],y1); pc+=2; }

example:

    sts mach,r0       ;before execution: r0 = h'ffffffff, mach = h'00000000
                      ;after execution: r0 = h'00000000
    sts.l pr,@-r15    ;before execution: r15 = h'10000004
                      ;after execution: r15 = h'10000000, @r15 = pr

6.1.64 sub (subtract binary): arithmetic instruction

    format       abstract          code                cycle   t bit   applicable instructions (sh-1 sh-2 sh-dsp)
    sub rm,rn    rn - rm → rn      0011nnnnmmmm1000    1

description: subtracts general register rm data from rn data, and stores the result in rn. to subtract immediate data, use add #imm,rn.

operation:

    sub(long m,long n)    /* sub rm,rn */
    {
        r[n]-=r[m];
        pc+=2;
    }

example:

    sub r0,r1    ;before execution: r0 = h'00000001, r1 = h'80000000
                 ;after execution: r1 = h'7fffffff

6.1.65 subc (subtract with carry): arithmetic instruction

    format        abstract                          code                cycle   t bit    applicable instructions (sh-1 sh-2 sh-dsp)
    subc rm,rn    rn - rm - t → rn, borrow → t      0011nnnnmmmm1010    1       borrow

description: subtracts rm data and the t bit value from general register rn data, and stores the result in rn. the t bit changes according to the result. this instruction is used for subtraction of data that has more than 32 bits.

operation:

    subc(long m,long n)    /* subc rm,rn */
    {
        unsigned long tmp0,tmp1;

        tmp1=r[n]-r[m];
        tmp0=r[n];
        r[n]=tmp1-t;
        if (tmp0

6.1.67 swap (swap register halves): data transfer instruction

    format          abstract                                                    code                cycle   t bit   applicable instructions (sh-1 sh-2 sh-dsp)
    swap.b rm,rn    rm → swap upper and lower halves of lower 2 bytes → rn      0110nnnnmmmm1000    1
    swap.w rm,rn    rm → swap upper and lower word → rn                         0110nnnnmmmm1001    1

description: swaps the upper and lower bytes of the general register rm data, and stores the result in rn. if a byte is specified, bits 0 to 7 of rm are swapped for bits 8 to 15. the upper 16 bits of rm are transferred to the upper 16 bits of rn.
if a word is specified, bits 0 to 15 of rm are swapped for bits 16 to 31.

operation:

    swapb(long m,long n)    /* swap.b rm,rn */
    {
        unsigned long temp0,temp1;

        temp0=r[m]&0xffff0000;
        temp1=(r[m]&0x000000ff)<<8;
        r[n]=(r[m]>>8)&0x000000ff;
        r[n]=r[n]|temp1|temp0;
        pc+=2;
    }

    swapw(long m,long n)    /* swap.w rm,rn */
    {
        unsigned long temp;

        temp=(r[m]>>16)&0x0000ffff;
        r[n]=r[m]<<16;
        r[n]|=temp;
        pc+=2;
    }

examples:

    swap.b r0,r1    ;before execution: r0 = h'12345678
                    ;after execution: r1 = h'12347856
    swap.w r0,r1    ;before execution: r0 = h'12345678
                    ;after execution: r1 = h'56781234

6.1.68 tas (test and set): logic operation instruction

    format       abstract                                   code                cycle   t bit           applicable instructions (sh-1 sh-2 sh-dsp)
    tas.b @rn    when (rn) is 0, 1 → t, 1 → msb of (rn)     0100nnnn00011011    4       test results

description: reads byte data from the address specified by general register rn, and sets the t bit to 1 if the data is 0, or clears the t bit to 0 if the data is not 0. then, data bit 7 is set to 1, and the data is written to the address specified by rn. during this operation, the bus is not released.

operation:

    tas(long n)    /* tas.b @rn */
    {
        long temp;

        temp=(long)read_byte(r[n]);    /* bus lock enable */
        if (temp==0) t=1;
        else t=0;
        temp|=0x00000080;
        write_byte(r[n],temp);    /* bus lock disable */
        pc+=2;
    }

example:

    _loop    tas.b @r7    ;r7 = 1000
             bf _loop     ;loops until data in address 1000 is 0

6.1.69 trapa (trap always): system control instruction

    format        abstract                                        code                cycle   t bit   applicable instructions (sh-1 sh-2 sh-dsp)
    trapa #imm    pc/sr → stack area, (imm × 4 + vbr) → pc        11000011iiiiiiii    8

description: starts the trap exception processing. the pc and sr values are stored on the stack, and the program branches to an address specified by the vector. the vector is a memory address obtained by zero-extending the 8-bit immediate data and then quadrupling it. the pc is the start address of the next instruction. trapa and rte are both used together for system calls.
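the vector lookup and stacking order described above can be sketched in c as follows. this is a simplified model under stated assumptions, not the manual's operation code: cpu_state, trapa_vector_address, trapa_model and the word-indexed mem array are hypothetical names, memory is modeled as mem[address / 4], and the pipeline-related pc adjustments of the real processor are ignored by passing the return address in directly.

```c
#include <stdint.h>

/* hypothetical subset of the cpu registers touched by trapa */
typedef struct { uint32_t sr, pc, vbr, r15; } cpu_state;

/* address of the vector table entry: zero-extended imm, times 4, plus vbr */
uint32_t trapa_vector_address(uint32_t vbr, uint8_t imm)
{
    return vbr + ((uint32_t)imm << 2);
}

/* simplified trapa: push sr, push the return address, fetch the vector */
void trapa_model(cpu_state *cpu, uint32_t *mem, uint8_t imm, uint32_t next_pc)
{
    cpu->r15 -= 4;
    mem[cpu->r15 / 4] = cpu->sr;       /* sr is stacked first */
    cpu->r15 -= 4;
    mem[cpu->r15 / 4] = next_pc;       /* then the address of the next instruction */
    cpu->pc = mem[trapa_vector_address(cpu->vbr, imm) / 4];
}
```

with imm = h'20 the vector is read from vbr + h'80. after the trap, r15 points at the stacked pc and the stacked sr sits 4 bytes above it, which is the layout rte expects when it returns.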
operation:

    trapa(long i)    /* trapa #imm */
    {
        long imm;

        imm=(0x000000ff & i);
        r[15]-=4;
        write_long(r[15],sr);
        r[15]-=4;
        write_long(r[15],pc-2);
        pc=read_long(vbr+(imm<<2))+4;
    }

example:

    address
    vbr+h'80    .data.l 10000000    ;
                ..........
                trapa #h'20         ;branches to an address specified by data in address vbr + h'80
                tst #0,r0           ;← return address from the trap routine (stacked pc value)
                ..........
    10000000    xor r0,r0           ;← trap routine entrance
    10000002    rte                 ;returns to the tst instruction
    10000004    nop                 ;executes nop before rte

6.1.70 tst (test logical): logic operation instruction

    format                    abstract                                         code                cycle   t bit           applicable instructions (sh-1 sh-2 sh-dsp)
    tst rm,rn                 rn & rm, when result is 0, 1 → t                 0010nnnnmmmm1000    1       test results
    tst #imm,r0               r0 & imm, when result is 0, 1 → t                11001000iiiiiiii    1       test results
    tst.b #imm,@(r0,gbr)      (r0 + gbr) & imm, when result is 0, 1 → t        11001100iiiiiiii    3       test results

description: logically ands the contents of general registers rn and rm, and sets the t bit to 1 if the result is 0 or clears the t bit to 0 if the result is not 0. the rn data does not change. the contents of general register r0 can also be anded with zero-extended 8-bit immediate data, or the contents of 8-bit memory accessed by indirect indexed gbr addressing can be anded with 8-bit immediate data. the r0 and memory data do not change.
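the three forms of tst differ only in where the operands come from; in every case the result of the and is discarded and only the t bit is produced. a compact c sketch of that behavior follows. the function names are hypothetical, registers are modeled as uint32_t, and the byte argument of tst_mem stands for the value read from address gbr + r0.

```c
#include <stdint.h>

/* tst rm,rn: returns the new t bit; neither register is modified */
unsigned tst_reg(uint32_t rm, uint32_t rn)
{
    return (rn & rm) == 0;
}

/* tst #imm,r0: the 8-bit immediate is zero-extended before the and */
unsigned tst_imm(uint32_t r0, uint8_t imm)
{
    return (r0 & (uint32_t)imm) == 0;
}

/* tst.b #imm,@(r0,gbr): byte is the memory value at gbr + r0 */
unsigned tst_mem(uint8_t byte, uint8_t imm)
{
    return (byte & imm) == 0;
}
```

a typical use is testing a flag bit before a conditional branch: tst_imm(r0, 0x80) returns 1 exactly when bit 7 of r0 is clear.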
operation:

    tst(long m,long n)    /* tst rm,rn */
    {
        if ((r[n]&r[m])==0) t=1;
        else t=0;
        pc+=2;
    }

    tsti(long i)    /* tst #imm,r0 */
    {
        long temp;

        temp=r[0]&(0x000000ff & (long)i);
        if (temp==0) t=1;
        else t=0;
        pc+=2;
    }

    tstm(long i)    /* tst.b #imm,@(r0,gbr) */
    {
        long temp;

        temp=(long)read_byte(gbr+r[0]);
        temp&=(0x000000ff & (long)i);
        if (temp==0) t=1;
        else t=0;
        pc+=2;
    }

examples:

    tst r0,r0                ;before execution: r0 = h'00000000
                             ;after execution: t = 1
    tst #h'80,r0             ;before execution: r0 = h'ffffff7f
                             ;after execution: t = 1
    tst.b #h'a5,@(r0,gbr)    ;before execution: @(r0,gbr) = h'a5
                             ;after execution: t = 0

6.1.71 xor (exclusive or logical): logic operation instruction

    format                    abstract                           code                cycle   t bit   applicable instructions (sh-1 sh-2 sh-dsp)
    xor rm,rn                 rn ^ rm → rn                       0010nnnnmmmm1010    1
    xor #imm,r0               r0 ^ imm → r0                      11001010iiiiiiii    1
    xor.b #imm,@(r0,gbr)      (r0 + gbr) ^ imm → (r0 + gbr)      11001110iiiiiiii    3

description: exclusive ors the contents of general registers rn and rm, and stores the result in rn. the contents of general register r0 can also be exclusive ored with zero-extended 8-bit immediate data, or 8-bit memory accessed by indirect indexed gbr addressing can be exclusive ored with 8-bit immediate data.

operation:

    xor(long m,long n)    /* xor rm,rn */
    {
        r[n]^=r[m];
        pc+=2;
    }

    xori(long i)    /* xor #imm,r0 */
    {
        r[0]^=(0x000000ff & (long)i);
        pc+=2;
    }

    xorm(long i)    /* xor.b #imm,@(r0,gbr) */
    {
        long temp;

        temp=(long)read_byte(gbr+r[0]);
        temp^=(0x000000ff & (long)i);
        write_byte(gbr+r[0],temp);
        pc+=2;
    }

examples:

    xor r0,r1                ;before execution: r0 = h'aaaaaaaa, r1 = h'55555555
                             ;after execution: r1 = h'ffffffff
    xor #h'f0,r0             ;before execution: r0 = h'ffffffff
                             ;after execution: r0 = h'ffffff0f
    xor.b #h'a5,@(r0,gbr)    ;before execution: @(r0,gbr) = h'a5
                             ;after execution: @(r0,gbr) = h'00

6.1.72 xtrct (extract): data transfer instruction

    format         abstract                            code    cycle   t bit   applicable instructions (sh-1 sh-2 sh-dsp)
    xtrct rm,rn    rm: center 32 bits of rn →
rn 0010nnnnmmmm1101 1 description: extracts the middle 32 bits from the 64 bits of coupled general registers rm and rn, and stores the 32 bits in rn (figure 6.14). rm rn rn msb msb lsb lsb figure 6.14 extract operation: xtrct(long m,long n)/* xtrct rm,rn */ { unsigned long temp; temp=(r[m]<<16)&0xffff0000; r[n]=(r[n]>>16)&0x0000ffff; r[n]|=temp; pc+=2; } example: xtrct r0,r1 ;before execution: r0 = h'01234567, r1 = h'89abcdef ;after execution: r1 = h'456789ab 248 6.2 dsp data transfer instructions table 6.3 lists the dsp data transfer instructions in alphabetical order. table 6.3 dsp data transfer instructions in alphabetical order applicable instructions instruction operation code cycles dc bit sh-1 sh-2 sh- dsp movs.l @-as,ds as? ? as,(as) ? ds 111101aadddd0010 1 movs.l @as,ds (as) ? ds 111101aadddd0110 1 movs.l @as+,ds (as) ? ds,as+4 ? as 111101aadddd1010 1 movs.l @as+ix,ds (as) ? ds,as+ix ? as 111101aadddd1110 1 movs.l ds,@-as as? ? as,ds ? (as) 111101aadddd0011 1 movs.l ds,@as ds ? (as) 111101aadddd0111 1 movs.l ds,@as+ ds ? (as),as+4 ? as 111101aadddd1011 1 movs.l ds,@as+ix ds ? (as),as+ix ? as 111101aadddd1111 1 movs.w @-as,ds as? ? as,(as) ? msw of ds,0 ? lsw of ds 111101aadddd0000 1 movs.w @as,ds (as) ? msw of ds,0 ? lsw of ds 111101aadddd0100 1 movs.w @as+,ds (as) ? msw of ds,0 ? lsw of ds, as+2 ? as 111101aadddd1000 1 movs.w @as+ix,ds (as) ? msw of ds,0 ? lsw of ds, as+ix ? as 111101aadddd1100 1 movs.w ds,@-as as? ? as,msw of ds ? (as) 111101aadddd0001 1 movs.w ds,@as msw of ds ? (as) 111101aadddd0101 1 movs.w ds,@as+ msw of ds ? (as),as+2 ? as 111101aadddd1001 1 movs.w ds,@as+ix msw of ds ? (as),as+ix ? as 111101aadddd1101 1 movx.w @ax,dx (ax) ? msw of dx,0 ? lsw of dx 111100a*d*0*01** 1 movx.w @ax+,dx (ax) ? msw of dx,0 ? lsw of dx,ax+2 ? ax 111100a*d*0*10** 1 249 table 6.3 dsp data transfer instructions in alphabetical order (cont) applicable instructions instruction operation code cycles dc bit sh-1 sh-2 sh- dsp movx.w @ax+ix,dx (ax) ? msw of dx,0 ? 
lsw of dx,ax+ix ? ax 111100a*d*0*11** 1 movx.w da,@ax msw of da ? (ax) 111100a*d*1*01** 1 movx.w da,@ax+ msw of da ? (ax),ax+2 ? ax 111100a*d*1*10** 1 movx.w da,@ax+ix msw of da ? (ax),ax+ix ? ax 111100a*d*1*11** 1 movy.w @ay,dy (ay) ? msw of dy,0 ? lsw of dy 111100*a*d*0**01 1 movy.w @ay+,dy (ay) ? msw of dy,0 ? lsw of dy, ay+2 ? ay 111100*a*d*0**10 1 movy.w @ay+iy,dy (ay) ? msw of dy,0 ? lsw of dy, ay+iy ? ay 111100*a*d*0**11 1 movy.w da,@ay msw of da ? (ay) 111100*a*d*1**01 1 movy.w da,@ay+ msw of da ? (ay),ay+2 ? ay 111100*a*d*1**10 1 movy.w da,@ay+iy msw of da ? (ay),ay+iy ? ay 111100*a*d*1**11 1 nopx no operation 1111000*0*0*00** 1 nopy no operation 111100*0*0*0**00 1 note: msw = high-order word of operand lsw = low-order word of operand 6.2.1 x and y data transfers (movx.w and movy.w) these instructions use the xdb and ydb buses to access x and y memory. areas other than x and y memory cannot be accessed. memory is accessed in word units. since independent bus is used, it does not create access contention with instruction fetches (using the main buses). x and y data transfer instructions are executed regardless of conditions even when the data operation instruction executed in parallel has conditions. figure 6.15 shows the load and store operations in x and y data transfers. 250 instruction code for x data transfer operation r4 [ax] r5 [ax] r6 [ay] r7 [ay] control for x memory control for y memory abx aby 31 0 31 0 15 1 15 1 x data memory 4 kbytes y data memory 4 kbytes xab 15 bits yab 15 bits 16 bits 16 bits xdb ydb x_mem y_mem x r/w y r/w x_mem, y_mem: select signals for x and y data memory instruction code for y data transfer operation dsp data register x0/x1, a0/a1 input/output control dsp data register y0/y1, a0/a1 input/output control figure 6.15 load and store operations in x and y data transfers x memory data transfer operation is shown below. y memory data transfers are the same. 
    if ( !nop ) {
        x_mem=1; xab=abx; x r/w=1;
        if ( load operation ) {
            dx[31:16]=xdb; dx[15:0]=0x0000;    /* dx is x0 or x1 */
        }
        else { xdb=dx[31:16]; x r/w=0; }    /* dx is a0 or a1 */
    }
    else { x_mem=0; xab=unknown; }

6.2.2 single data transfers (movs.w and movs.l)

single data transfers are instructions that load to and store from the dsp register. they are like system register load and store instructions. data transfers between the dsp register and memory use the main buses. like cpu core instructions, data accesses can create access contention with instruction memory accesses. single data transfers can use either word or longword data. figure 6.16 shows the load and store operations in single data transfers.

figure 6.16 load and store operations in single data transfers (iab, idb: main buses)

load and store operations in single data transfers are shown below.

    iab = mab;
    if ( ms!=nls && w/l is word access ) {    /* movs.w */
        if (ls==load) {
            if (ds!=a0g && ds!=a1g) {
                ds[31:16] = idb[15:0];
                ds[15:0] = 0x0000;
                if (ds==a0) a0g[7:0] = idb[15];
                if (ds==a1) a1g[7:0] = idb[15];
            }
            else ds[7:0] = idb[7:0];    /* ds is a0g or a1g */
        }
        else {    /* store */
            if (ds!=a0g && ds!=a1g) idb[15:0] = ds[31:16];
            else idb[15:0] = ds[7:0] with 8-bit sign extension;    /* ds is a0g or a1g */
        }
    }
    else if ( ma!=nls && w/l is longword access ) {    /* movs.l */
        if (ls==load) {
            if (ds!=a0g && ds!=a1g) {
                ds[31:0] = idb[31:0];
                if (ds==a0) a0g[7:0] = idb[31];
                if (ds==a1) a1g[7:0] = idb[31];
            }
            else ds[7:0] = idb[7:0];    /* ds is a0g or a1g */
        }
        else {    /* store */
            if (ds!=a0g && ds!=a1g) idb[31:0] = ds[31:0];
            else idb[31:0] = ds[7:0] with 24-bit sign extension;    /* ds is a0g or a1g */
        }
    }

6.2.3 sample description (name): classification

this section explains the breakdown of instructions, descriptions, etc.
given in the rest of this section (section 12). table 6.4 sample description (name): classification format abstract code cycle dc bit applicable instructions assembler input format. a brief description of operation displayed in order msb ? lsb all dsp instructions execute in 1 cycle the status of the dc bit after the instruction is executed indicates whether the instruction applies to the sh-1, sh-2, or sh-dsp. format: [if cc] op.sz src1,src2,dest [if cc]: condition (unconditional, dct, or dcf) op: operation code 253 sz: size src1: source 1 operand src2: source 2 operand dest: destination table 6.5 operation summary operation description ? , ? direction of transfer (xx) memory operand dc flag bits in the dsr & logical and of each bit | logical or of each bit ^ exclusive or of each bit ~ logical not of each bit < 254 dsp operation instructions: iiiiiii(imm): ?2 to +32 ee(se): 0=x0, 1=x1, 2=y0, 3=a1 ff(sf): 0=y0, 1=y1, 2=x0, 3=a1 xx(sx): 0=x0, 1=x1, 2=a0, 3=a1 yy(sy): 0=y0, 1=y1, 2=m0, 3=m1 gg(dg): 0=m0, 1=m1, 2=a0, 3=a1 uu(du): 0=x0, 1=y0, 2=a0, 3=a1 zzzz(dz): 5=a1, 7=a0, 8=x0, 9=x1, a=y0, b=y1, c=m0, e=m1 dc bit: update: updated according to the operation result and the specifications of the cs (condition select) bits. ? not updated. description: description of operation notes: notes on using the instruction operation: operation written in c language. examples: examples are written in assembler mnemonics and describe status before and after executing the instruction. 255 6.2.4 movs (move single data between memory and dsp register): dsp data transfer instruction applicable instructions format abstract code cycle dc bit sh-1 sh-2 sh- dsp movs.w @-as,ds as? ? as,(as) ? msw of ds,0 ? lsw of ds 111101aadddd0000 1 movs.w @as,ds (as) ? msw of ds,0 ? lsw of ds 111101aadddd0100 1 movs.w @as+,ds (as) ? msw of ds,0 ? lsw of ds, as+2 ? as 111101aadddd1000 1 movs.w @as+ix,ds (as) ? msw of ds,0 ? lsw of ds, as+ix ? as 111101aadddd1100 1 movs.w ds,@-as as? ? as,msw of ds ? 
(as) 111101aadddd0001 1 movs.w ds,@as msw of ds ? (as) 111101aadddd0101 1 movs.w ds,@as+ msw of ds ? (as),as+2 ? as 111101aadddd1001 1 movs.w ds,@as+ix msw of ds ? (as),as+ix ? as 111101aadddd1101 1 movs.l @-as,ds as? ? as,(as) ? ds 111101aadddd0010 1 movs.l @as,ds (as) ? ds 111101aadddd0110 1 movs.l @as+,ds (as) ? ds,as+4 ? as 111101aadddd1010 1 movs.l @as+ix,ds (as) ? ds,as+ix ? as 111101aadddd1110 1 movs.l ds,@-as as? ? as,ds ? (as) 111101aadddd0011 1 movs.l ds,@as ds ? (as) 111101aadddd0111 1 movs.l ds,@as+ ds ? (as),as+4 ? as 111101aadddd1011 1 movs.l ds,@as+ix ds ? (as),as+ix ? as 111101aadddd1111 1 description: transfers the source operand data to the destination. transfer can be from memory to register or register to memory. the transferred data can be a word or longword. when a word is transferred, the source operand is in memory, and the destination operand is a register, the word data is loaded to the top word of the register and the bottom word is cleared with zeros. when the source operand is a register and the destination operand is memory, the top word of the register is 256 stored as the word data . in a longword transfer, the longword data is transferred. when the destination operand is a register with guard bits, the sign is extended and stored in the guard bits. note: when one of the guard bit registers a0g and a1g is the source operand for store processing, the data is output to the bottom 8 bits (bits 0?) and the top 24 bits (bits 31?) become undefined. operation: see figure 6.17. 
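the effect of movs on a register with guard bits can be sketched in c as below. this is an illustrative model only: the struct dsp_acc (8 guard bits plus 32 data bits) and the function names are assumptions introduced here, and address update of as is left out.

```c
#include <stdint.h>

/* hypothetical model of a0/a1: 8 guard bits above 32 data bits */
typedef struct { uint8_t guard; uint32_t data; } dsp_acc;

/* movs.w @as,ds: word goes to the top word, bottom word is cleared,
   and the sign of the word is extended into the guard bits */
dsp_acc movs_w_load(uint16_t word)
{
    dsp_acc a;
    a.data  = (uint32_t)word << 16;
    a.guard = (word & 0x8000u) ? 0xFFu : 0x00u;
    return a;
}

/* movs.l @as,ds: longword is loaded as is, sign goes to the guard bits */
dsp_acc movs_l_load(uint32_t longword)
{
    dsp_acc a;
    a.data  = longword;
    a.guard = (longword & 0x80000000u) ? 0xFFu : 0x00u;
    return a;
}

/* movs.w ds,@as: only the top word of the register is stored */
uint16_t movs_w_store(dsp_acc a) { return (uint16_t)(a.data >> 16); }

/* movs.l ds,@as: the 32 data bits are stored, guard bits are not */
uint32_t movs_l_store(dsp_acc a) { return a.data; }
```

loading the word h'8765 yields guard h'ff and data h'87650000, that is a0 = h'ff87650000, and storing a1 = h'123456789a as a longword writes h'3456789a.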
figure 6.17 the movs instruction (word data transfer and longword data transfer, each shown for memory to register and register to memory; idb: main bus)

examples:

    movs.w @r4+,a0    ;before execution: r4=h'00000400, @r4=h'8765, a0=h'123456789a
                      ;after execution: r4=h'00000402, a0=h'ff87650000
    movs.l a1,@-r3    ;before execution: r3=h'00000800, a1=h'123456789a
                      ;after execution: r3=h'000007fc, @(h'000007fc)=h'3456789a

6.2.5 movx (move between x memory and dsp register): dsp data transfer instruction

    format                abstract                                           code                cycle   dc bit   applicable instructions (sh-1 sh-2 sh-dsp)
    movx.w @ax,dx         (ax) → msw of dx, 0 → lsw of dx                    111100a*d*0*01**    1
    movx.w @ax+,dx        (ax) → msw of dx, 0 → lsw of dx, ax+2 → ax         111100a*d*0*10**    1
    movx.w @ax+ix,dx      (ax) → msw of dx, 0 → lsw of dx, ax+ix → ax        111100a*d*0*11**    1
    movx.w da,@ax         msw of da → (ax)                                   111100a*d*1*01**    1
    movx.w da,@ax+        msw of da → (ax), ax+2 → ax                        111100a*d*1*10**    1
    movx.w da,@ax+ix      msw of da → (ax), ax+ix → ax                       111100a*d*1*11**    1

note: "*" of the instruction code is movy instruction designation area.

description: transfers the source operand data to the destination operand. transfer can be from memory to register or register to memory. the transferred data can only be word length for x memory. when the source operand is in memory, and the destination operand is a register, the word data is loaded to the top word of the register and the bottom word is cleared with zeros. when the source operand is a register and the destination operand is memory, the top word of the register is stored as the word data.

operation: see figure 6.18.
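a load through movx.w combines two effects: the word lands in the top half of dx, and ax may be updated afterwards. the c sketch below models both. the enum ax_update and the function movx_w_load are hypothetical names; x_word stands for the word read from x memory at the address in ax before the update.

```c
#include <stdint.h>

/* hypothetical selector for the three addressing forms */
typedef enum { AX_NO_UPDATE, AX_PLUS_2, AX_PLUS_IX } ax_update;

/* movx.w @ax,dx / @ax+,dx / @ax+ix,dx
   returns the new dx and applies the post-update to *ax */
uint32_t movx_w_load(uint32_t *ax, uint32_t ix, ax_update mode, uint16_t x_word)
{
    uint32_t dx = (uint32_t)x_word << 16;   /* top word loaded, bottom word cleared */

    if (mode == AX_PLUS_2)       *ax += 2;
    else if (mode == AX_PLUS_IX) *ax += ix;
    return dx;
}
```

with ax = h'08010000, x_word = h'5555 and the post-increment form, the call returns h'55550000 and leaves ax = h'08010002.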
figure 6.18 the movx instruction (memory to register and register to memory; xdb[15:0] is loaded to the top word and the bottom word is cleared, or the top word is stored and the bottom word is ignored)

examples:

    movx.w @r4+,x0    ;before execution: r4=h'08010000, @r4=h'5555, x0=h'12345678
                      ;after execution: r4=h'08010002, x0=h'55550000

6.2.6 movy (move between y memory and dsp register): dsp data transfer instruction

    format                abstract                                           code                cycle   dc bit   applicable instructions (sh-1 sh-2 sh-dsp)
    movy.w @ay,dy         (ay) → msw of dy, 0 → lsw of dy                    111100*a*d*0**01    1
    movy.w @ay+,dy        (ay) → msw of dy, 0 → lsw of dy, ay+2 → ay         111100*a*d*0**10    1
    movy.w @ay+iy,dy      (ay) → msw of dy, 0 → lsw of dy, ay+iy → ay        111100*a*d*0**11    1
    movy.w da,@ay         msw of da → (ay)                                   111100*a*d*1**01    1
    movy.w da,@ay+        msw of da → (ay), ay+2 → ay                        111100*a*d*1**10    1
    movy.w da,@ay+iy      msw of da → (ay), ay+iy → ay                       111100*a*d*1**11    1

note: "*" of the instruction code is movx instruction designation area.

description: transfers the source operand data to the destination operand. transfer can be from memory to register or register to memory. the transferred data can only be word length for y memory. when the source operand is in memory, and the destination operand is a register, the word data is loaded to the top word of the register and the bottom word is cleared with zeros. when the source operand is a register and the destination operand is memory, the top word of the register is stored as the word data.

operation: see figure 6.19.
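the store direction with an index register can be sketched the same way. in this c model, which is only an illustration, y_mem is a hypothetical array of 16-bit words standing for y memory starting at address y_base, da holds the 32 data bits of a0 or a1, and the function name is invented for the example.

```c
#include <stdint.h>

/* movy.w da,@ay+iy: store the top word of da at ay, then add iy to ay */
void movy_w_store_indexed(uint32_t da, uint16_t *y_mem, uint32_t y_base,
                          uint32_t *ay, uint32_t iy)
{
    y_mem[(*ay - y_base) / 2] = (uint16_t)(da >> 16);   /* bottom word is ignored */
    *ay += iy;                                          /* post-update by the index */
}
```

with the data bits of a0 equal to h'3456789a, ay = h'08020000 and iy = 6, the word h'3456 is written at h'08020000 and ay becomes h'08020006.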
259 memory to register register to memory ay ay y memory y memory 31 0 31 0 post update post update dy all 0 da s 31 16 0 0 31 16 ydb[15:0] cleared 0, +2, +ly 0, +2, +ly ignored 15 15 figure 6.19 the movy instruction examples: movy.w a0, @r6+,r9 ;before execution: r6=h'08020000, r9=h'00000006, a0=h'123456789a ;after execution: r6=h'08020006, @(h'08020000)=h'3456 260 6.2.7 nopx (no access operation for x memory): dsp data transfer instruction applicable instructions format abstract code cycle dc bit sh-1 sh-2 sh- dsp nopx no operation 1111000*0*0*00** 1 description: no access operation for x memory. 6.2.8 nopy (no access operation for y memory): dsp data transfer instruction applicable instructions format abstract code cycle dc bit sh-1 sh-2 sh- dsp nopy no operation 111100*0*0*0**00 1 description: no access operation for y memory. 261 6.3 dsp operation instructions the dsp operation instructions are listed below in alphabetical order. see section 6.2.3, sample descriptions (name): classification, for an explanation of the format and symbols used in this description. table 6.6 alphabetical listing of dsp operation instructions applicable instructions instruction operation code cycles dc bit sh-1 sh-2 sh- dsp pabs sx,dz if sx 3 0, sx ? dz if sx < 0, 0?x ? dz 111110********** 10001000xx00zzzz 1 update pabs sy,dz if sy 3 0, sy ? dz if sy < 0, 0?y ? dz 111110********** 1010100000yyzzzz 1 update padd sx,sy,dz sx + sy ? dz 111110********** 10110001xxyyzzzz 1 update dct padd sx,sy,dz if dc = 1, sx + sy ? dz; if 0, nop 111110********** 10110010xxyyzzzz 1 dcf padd sx,sy,dz if dc = 0, sx + sy?z; if 1, nop 111110********** 10110011xxyyzzzz 1 padd sx,sy,du pmuls se,sf,dg sx + sy ? du; msw of se msw of sf ? dg 111110********** 0111eeffxxyygguu 1 update* paddc sx,sy,dz sx + sy + dc ? dz 111110********** 10110000xxyyzzzz 1 update pand sx,sy,dz sx & sy ? dz; clear lsw of dz 111110********** 10010101xxyyzzzz 1 update dct pand sx,sy,dz if dc = 1, sx & sy ? 
dz, clear lsw of dz; if 0, nop 111110********** 10010110xxyyzzzz 1 dcf pand sx,sy,dz if dc = 0, sx & sy ? dz, clear lsw of dz; if 1, nop 111110********** 10010111xxyyzzzz 1 pclr dz h'00000000 ? dz 111110********** 100011010000zzzz 1 update dct pclr dz if dc = 1, h'00000000 ? dz; if 0, nop 111110********** 100011100000zzzz 1 dcf pclr dz if dc = 0, h'00000000 ? dz; if 1, nop 111110********** 100011110000zzzz 1 262 table 6.6 alphabetical listing of dsp operation instructions (cont) applicable instructions instruction operation code cycles dc bit sh-1 sh-2 sh- dsp pcmp sx,sy sx ?sy 111110********** 10000100xxyy0000 1 update pcopy sx,dz sx ? dz 111110********** 11011001xx00zzzz 1 update pcopy sy,dz sy ? dz 111110********** 1111100100yyzzzz 1 update dct pcopy sx,dz if dc = 1, sx ? dz; if 0, nop 111110********** 11011010xx00zzzz 1 dct pcopy sy,dz if dc = 1, sy ? dz; if 0, nop 111110********** 1111101000yyzzzz 1 dcf pcopy sx,dz if dc = 0, sx ? dz; if 1, nop 111110********** 11011011xx00zzzz 1 dcf pcopy sy,dz if dc = 0, sy ? dz; if 1, nop 111110********** 1111101100yyzzzz 1 pdec sx,dz msw of sx? ? msw of dz, clear lsw of dz 111110********** 10001001xx00zzzz 1 update pdec sy,dz msw of sy? ? msw of dz, clear lsw of dz 111110********** 10101001xx00zzzz 1 update dct pdec sx,dz if dc = 1, msw of sx? ? msw of dz, clear lsw of dz; if 0, nop 111110********** 10001010xx00zzzz 1 dct pdec sy,dz if dc = 1, msw of sy? ? msw of dz, clear lsw of dz; if 0, nop 111110********** 10101010xx00zzzz 1 dcf pdec sx,dz if dc = 0, msw of sx? ? msw of dz, clear lsw of dz; if 1, nop 111110********** 10001011xx00zzzz 1 dcf pdec sy,dz if dc = 0, msw of sy? ? msw of dz, clear lsw of dz; if 1, nop 111110********** 10101011xx00zzzz 1 pdmsb sx,dz sx data msb position ? msw of dz, clear lsw of dz 111110********** 10011101xx00zzzz 1 update pdmsb sy,dz sy data msb position ? 
msw of dz, clear lsw of dz | 111110********** 1011110100yyzzzz | 1 | update

table 6.6 alphabetical listing of dsp operation instructions (cont)

instruction | operation | code | cycles | dc bit | applicable instructions (sh-1, sh-2, sh-dsp)
dct pdmsb sx,dz | if dc = 1, sx data msb position → msw of dz, clear lsw of dz; if 0, nop | 111110********** 10011110xx00zzzz | 1 |
dct pdmsb sy,dz | if dc = 1, sy data msb position → msw of dz, clear lsw of dz; if 0, nop | 111110********** 1011111000yyzzzz | 1 |
dcf pdmsb sx,dz | if dc = 0, sx data msb position → msw of dz, clear lsw of dz; if 1, nop | 111110********** 10011111xx00zzzz | 1 |
dcf pdmsb sy,dz | if dc = 0, sy data msb position → msw of dz, clear lsw of dz; if 1, nop | 111110********** 1011111100yyzzzz | 1 |
pinc sx,dz | msw of sx + 1 → msw of dz, clear lsw of dz | 111110********** 10011001xx00zzzz | 1 | update
pinc sy,dz | msw of sy + 1 → msw of dz, clear lsw of dz | 111110********** 1011100100yyzzzz | 1 | update
dct pinc sx,dz | if dc = 1, msw of sx + 1 → msw of dz, clear lsw of dz; if 0, nop | 111110********** 10011010xx00zzzz | 1 |
dct pinc sy,dz | if dc = 1, msw of sy + 1 → msw of dz, clear lsw of dz; if 0, nop | 111110********** 1011101000yyzzzz | 1 |
dcf pinc sx,dz | if dc = 0, msw of sx + 1 → msw of dz, clear lsw of dz; if 1, nop | 111110********** 10011011xx00zzzz | 1 |
dcf pinc sy,dz | if dc = 0, msw of sy + 1 → msw of dz, clear lsw of dz; if 1, nop | 111110********** 1011101100yyzzzz | 1 |
plds dz,mach | dz → mach | 111110********** 111011010000zzzz | 1 |
plds dz,macl | dz → macl | 111110********** 111111010000zzzz | 1 |
dct plds dz,mach | if dc = 1, dz → mach; if 0, nop | 111110********** 111011100000zzzz | 1 |
dct plds dz,macl | if dc = 1, dz → macl; if 0, nop | 111110********** 111111100000zzzz | 1 |
dcf plds dz,mach | if dc = 0, dz → mach; if 1, nop | 111110********** 111011110000zzzz | 1 |
dcf plds dz,macl | if dc = 0, dz → macl; if 1, nop | 111110********** 111111110000zzzz | 1 |
pmuls se,sf,dg | msw of se × msw of sf → dg | 111110********** 0100eeff0000gg00 | 1 |
pneg sx,dz | 0 - sx → dz | 111110********** 11001001xx00zzzz | 1 | update
pneg sy,dz | 0 - sy → dz | 111110********** 1110100100yyzzzz | 1 | update
dct pneg sx,dz | if dc = 1, 0 - sx → dz; if 0, nop | 111110********** 11001010xx00zzzz | 1 |
dct pneg sy,dz | if dc = 1, 0 - sy → dz; if 0, nop | 111110********** 1110101000yyzzzz | 1 |
dcf pneg sx,dz | if dc = 0, 0 - sx → dz; if 1, nop | 111110********** 11001011xx00zzzz | 1 |
dcf pneg sy,dz | if dc = 0, 0 - sy → dz; if 1, nop | 111110********** 1110101100yyzzzz | 1 |
por sx,sy,dz | sx|sy → dz, clear lsw of dz | 111110********** 10110101xxyyzzzz | 1 | update
dct por sx,sy,dz | if dc = 1, sx|sy → dz, clear lsw of dz; if 0, nop | 111110********** 10110110xxyyzzzz | 1 |
dcf por sx,sy,dz | if dc = 0, sx|sy → dz, clear lsw of dz; if 1, nop | 111110********** 10110111xxyyzzzz | 1 |
prnd sx,dz | sx + h'00008000 → dz, clear lsw of dz | 111110********** 10011000xx00zzzz | 1 | update
prnd sy,dz | sy + h'00008000 → dz, clear lsw of dz | 111110********** 1011100000yyzzzz | 1 | update
psha sx,sy,dz | if sy ≥ 0, sx << sy → dz; if sy < 0, sx >> sy → dz | 111110********** 10010001xxyyzzzz | 1 | update
dct psha sx,sy,dz | if dc = 1 & sy ≥ 0, sx << sy → dz; if dc = 1 & sy < 0, sx >> sy → dz; if dc = 0, nop | 111110********** 10010010xxyyzzzz | 1 |
dcf psha sx,sy,dz | if dc = 0 & sy ≥ 0, sx << sy → dz; if dc = 0 & sy < 0, sx >> sy → dz; if dc = 1, nop | 111110********** 10010011xxyyzzzz | 1 |
psha #imm,dz | if imm ≥ 0, dz << imm → dz; if imm < 0, dz >> imm → dz | 111110********** 00000iiiiiiizzzz | 1 | update
pshl sx,sy,dz | if sy ≥ 0, sx<
psub sx,sy,dz | sx - sy → dz | 111110********** 10100001xxyyzzzz | 1 | update
dct psub sx,sy,dz | if dc = 1, sx - sy → dz; if 0, nop | 111110********** 10100010xxyyzzzz | 1 |
dcf psub sx,sy,dz | if dc = 0, sx - sy → dz; if 1, nop | 111110********** 10100011xxyyzzzz | 1 |
psub sx,sy,du pmuls se,sf,dg | sx - sy → du; msw of se × msw of sf → dg | 111110********** 0110eeffxxyygguu | 1 | update
psubc sx,sy,dz | sx - sy - dc → dz | 111110********** 10100000xxyyzzzz | 1 | update
pxor sx,sy,dz | sx ^ sy → dz, clear lsw of dz | 111110********** 10100101xxyyzzzz | 1 | update
dct pxor sx,sy,dz | if dc = 1, sx ^ sy → dz, clear lsw of dz; if 0, nop | 111110********** 10100110xxyyzzzz | 1 |
dcf pxor sx,sy,dz | if dc = 0, sx ^ sy → dz, clear lsw of dz; if 1, nop | 111110********** 10100111xxyyzzzz | 1 |

note: updated based on the padd operation results

dsp instructions are explained using the same form as for cpu instructions. however, in the description of operation using c, usage of the following dsp resources is presupposed:

1. dsp register definitions

the dsp register names are defined based on the union named dsp_register_set noted below. this union is composed of 11 longwords; each of these longwords corresponds to one of the 11 dsp registers (a0, a1, m0, m1, x0, x1, y0, y1, ag0, ag1, dsr).
/* definition of union dsp_register_set */
union {
    unsigned long int uli[11];
    unsigned short int usi[22];
    struct {
        struct {
            unsigned short int usi[2];
        } ee[11];
    } dd;
    struct {
        struct {
            union {
                unsigned long int uli;
                unsigned short int usi[2];
                struct {
                    unsigned msb: 1;
                    unsigned : 23;
                    unsigned g_msb: 1;
                    unsigned : 7;
                } bb;
                struct {
                    unsigned : 24;
                    unsigned lsb8: 8;
                } cc;
            } mm;
        } a0, a1, m0, m1, x0, x1, y0, y1, a0g, a1g;
        union {
            unsigned long int uli;
            struct {
                unsigned reserved: 24;
                unsigned gt: 1; /* signed greater than */
                unsigned z: 1;  /* zero value */
                unsigned n: 1;  /* negative value */
                unsigned v: 1;  /* overflow */
                unsigned cs: 3; /* condition selection */
                unsigned dc: 1; /* dsp condition bit */
            } a;
        } dsr;
    } name;
    struct {
        unsigned short int a[2][2];
        unsigned short int m[2][2];
        unsigned short int x[2][2];
        unsigned short int y[2][2];
        unsigned short int ag[2][2];
        unsigned short int dsr[2];
    } word;
} dsp_register_set;

the dsp register names are defined as follows, using the union dsp_register_set noted above.
/* definition of dsp register names */
#define macl dsp_register_set.name.a0.mm.uli
#define a0 dsp_register_set.name.a0.mm.uli
#define a0_hw dsp_register_set.name.a0.mm.usi[0]
#define a0_lw dsp_register_set.name.a0.mm.usi[1]
#define a0_msb dsp_register_set.name.a0.mm.bb.msb
#define mach dsp_register_set.name.a1.mm.uli
#define a1 dsp_register_set.name.a1.mm.uli
#define a1_hw dsp_register_set.name.a1.mm.usi[0]
#define a1_lw dsp_register_set.name.a1.mm.usi[1]
#define a1_msb dsp_register_set.name.a1.mm.bb.msb
#define m0 dsp_register_set.name.m0.mm.uli
#define m0_hw dsp_register_set.name.m0.mm.usi[0]
#define m0_lw dsp_register_set.name.m0.mm.usi[1]
#define m0_msb dsp_register_set.name.m0.mm.bb.msb
#define m1 dsp_register_set.name.m1.mm.uli
#define m1_hw dsp_register_set.name.m1.mm.usi[0]
#define m1_lw dsp_register_set.name.m1.mm.usi[1]
#define m1_msb dsp_register_set.name.m1.mm.bb.msb
#define x0 dsp_register_set.name.x0.mm.uli
#define x0_hw dsp_register_set.name.x0.mm.usi[0]
#define x0_lw dsp_register_set.name.x0.mm.usi[1]
#define x0_msb dsp_register_set.name.x0.mm.bb.msb
#define x1 dsp_register_set.name.x1.mm.uli
#define x1_hw dsp_register_set.name.x1.mm.usi[0]
#define x1_lw dsp_register_set.name.x1.mm.usi[1]
#define x1_msb dsp_register_set.name.x1.mm.bb.msb
#define y0 dsp_register_set.name.y0.mm.uli
#define y0_hw dsp_register_set.name.y0.mm.usi[0]
#define y0_lw dsp_register_set.name.y0.mm.usi[1]
#define y0_msb dsp_register_set.name.y0.mm.bb.msb
#define y1 dsp_register_set.name.y1.mm.uli
#define y1_hw dsp_register_set.name.y1.mm.usi[0]
#define y1_lw dsp_register_set.name.y1.mm.usi[1]
#define y1_msb dsp_register_set.name.y1.mm.bb.msb
#define a0g dsp_register_set.name.a0g.mm.uli
#define a0g_hw dsp_register_set.name.a0g.mm.usi[0]
#define a0g_lw dsp_register_set.name.a0g.mm.usi[1]
#define a0g_lsb8 dsp_register_set.name.a0g.mm.cc.lsb8
#define a0g_msb dsp_register_set.name.a0g.mm.bb.g_msb
#define a1g dsp_register_set.name.a1g.mm.uli
#define a1g_hw dsp_register_set.name.a1g.mm.usi[0]
#define a1g_lw dsp_register_set.name.a1g.mm.usi[1]
#define a1g_lsb8 dsp_register_set.name.a1g.mm.cc.lsb8
#define a1g_msb dsp_register_set.name.a1g.mm.bb.g_msb
#define dsr dsp_register_set.name.dsr.uli

additionally, the individual bits of the dsr register are defined in the same manner, using the union dsp_register_set, as follows:

#define dspgtbit dsp_register_set.name.dsr.a.gt
#define dspzbit dsp_register_set.name.dsr.a.z
#define dspnbit dsp_register_set.name.dsr.a.n
#define dspvbit dsp_register_set.name.dsr.a.v
#define dspcsbits dsp_register_set.name.dsr.a.cs
#define dspdcbit dsp_register_set.name.dsr.a.dc

2. alu input/output and variables representing operation results

the alu input/output is defined based on the union named dsp_alu_set noted below. this union is composed of six longwords. three of these longwords correspond to two inputs and one output (src1, src2, dst). the remaining three longwords are used as guard bits for these two inputs and one output (src1g, src2g, dstg).

/* definition of union dsp_alu_set */
union {
    unsigned long int uli[6];
    unsigned short int usi[12];
    struct {
        struct {
            unsigned msb: 1;
            unsigned: 31;
        } src1, src2, dst;
        struct {
            union {
                unsigned long int uli;
                struct {
                    unsigned: 24;
                    unsigned bit7: 1;
                    unsigned: 7;
                } a;
                struct {
                    unsigned: 24;
                    unsigned lsb8: 8;
                } b;
            } u;
        } src1g, src2g, dstg;
    } n;
} dsp_alu_set;

the alu input/output names are defined as follows, using the union dsp_alu_set noted above.
/* definition of alu input/output in dsp operation instructions */
#define dsp_alu_src1 dsp_alu_set.uli[0]
#define dsp_alu_src2 dsp_alu_set.uli[1]
#define dsp_alu_dst dsp_alu_set.uli[2]
#define dsp_alu_src1g dsp_alu_set.uli[3]
#define dsp_alu_src2g dsp_alu_set.uli[4]
#define dsp_alu_dstg dsp_alu_set.uli[5]
#define dsp_alu_src1_hw dsp_alu_set.usi[0]
#define dsp_alu_src2_hw dsp_alu_set.usi[2]
#define dsp_alu_dst_hw dsp_alu_set.usi[4]
#define dsp_alu_src1_msb dsp_alu_set.n.src1.msb
#define dsp_alu_src2_msb dsp_alu_set.n.src2.msb
#define dsp_alu_dst_msb dsp_alu_set.n.dst.msb
#define dsp_alu_src1g_bit7 dsp_alu_set.n.src1g.u.a.bit7
#define dsp_alu_src2g_bit7 dsp_alu_set.n.src2g.u.a.bit7
#define dsp_alu_dstg_bit7 dsp_alu_set.n.dstg.u.a.bit7
#define dsp_alu_src1g_lsb8 dsp_alu_set.n.src1g.u.b.lsb8
#define dsp_alu_src2g_lsb8 dsp_alu_set.n.src2g.u.b.lsb8
#define dsp_alu_dstg_lsb8 dsp_alu_set.n.dstg.u.b.lsb8

additionally, the variables representing operation results are defined as follows, using the definitions noted above. these variables are used to calculate the dsr register's dc bit within the description of operation of each instruction.

/* definition of variables representing dsp operation results */
/* logical not (!) is used on the 1-bit fields so that the && tests behave as intended */
#define plus_op_g_ov ((!dsp_alu_src1g_bit7 && !dsp_alu_src2g_bit7 && dsp_alu_dstg_bit7) || (dsp_alu_src1g_bit7 && dsp_alu_src2g_bit7 && !dsp_alu_dstg_bit7))
#define minus_op_g_ov ((!dsp_alu_src1g_bit7 && dsp_alu_src2g_bit7 && dsp_alu_dstg_bit7) || (dsp_alu_src1g_bit7 && !dsp_alu_src2g_bit7 && !dsp_alu_dstg_bit7))
#define pos_not_ov ((dsp_alu_dstg_lsb8==0x00) && (dsp_alu_dst_msb==0x0))
#define neg_not_ov ((dsp_alu_dstg_lsb8==0xff) && (dsp_alu_dst_msb==0x1))

3. multiplier input/output

the multiplier input/output is defined based on the union named dsp_mul_set noted below. this union is composed of four longwords. one longword each is allocated for the two inputs, but only the upper 16 bits of both of these (usi[0], usi[2]) are used.
two longwords including guard bit usage (dst, dstg) correspond to the outputs.

/* definition of union dsp_mul_set */
union {
    unsigned long int uli[4];
    struct {
        unsigned short int usi[4];
        struct {
            unsigned msb: 1;
            unsigned: 31;
        } dst;
        struct {
            unsigned: 24;
            unsigned lsb8: 8;
        } dstg;
    } aa;
} dsp_mul_set;

the multiplier input/output names are defined as follows, using the union dsp_mul_set noted above.

/* definition of multiplier input/output in dsp operation instructions */
#define dsp_m_src1 dsp_mul_set.aa.usi[0]
#define dsp_m_src2 dsp_mul_set.aa.usi[2]
#define dsp_m_dst dsp_mul_set.uli[2]
#define dsp_m_dst_msb dsp_mul_set.aa.dst.msb
#define dsp_m_dstg dsp_mul_set.uli[3]
#define dsp_m_dstg_lsb8 dsp_mul_set.aa.dstg.lsb8

4. variables used in the operation descriptions of other instructions, etc.

the following variables are used when describing the operation of dsp operation instructions for which the dct and dcf conditions can be designated. in these definitions, ex_dct and ex_dcf are variables that become true when the dct or dcf condition, respectively, is designated in the instruction. refer to (1) dsp register definitions for dspdcbit.

#define dsp_unconditional_update (!ex_dct && !ex_dcf)
#define dsp_condition_match ((ex_dct && dspdcbit) || (ex_dcf && !dspdcbit))
#define dsp_condition_not_match ((ex_dct && !dspdcbit) || (ex_dcf && dspdcbit))

in dsp arithmetic operations, saturation processing is performed when the sr register's saturation bit is 1. this saturation bit is called sbit when describing the operations.
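the three condition macros above can be exercised outside the manual's register model. the following is a minimal sketch, not the manual's own code: the enum dsp_cond and the two helper functions are hypothetical names introduced for illustration, while ex_dct, ex_dcf and the dc bit follow the definitions above. it assumes, as the instruction descriptions state, that the dsr flags are updated only by the unconditional form.

```c
#include <assert.h>
#include <stdbool.h>

/* hypothetical decode result: which condition prefix the instruction carries */
typedef enum { COND_NONE, COND_DCT, COND_DCF } dsp_cond;

/* true when the destination register is written
   (dsp_unconditional_update or dsp_condition_match in the manual's terms) */
bool dsp_writes_destination(dsp_cond cond, unsigned dc)
{
    assert(dc <= 1);                      /* dc is a single dsr bit */
    bool ex_dct = (cond == COND_DCT);
    bool ex_dcf = (cond == COND_DCF);
    bool unconditional = !ex_dct && !ex_dcf;
    bool match = (ex_dct && dc) || (ex_dcf && !dc);
    return unconditional || match;
}

/* true when the dsr flags (dc, n, z, v, gt) are updated:
   only the unconditional form updates them */
bool dsp_updates_dsr(dsp_cond cond)
{
    return cond == COND_NONE;
}
```

for example, a dct-prefixed instruction with dc = 0 falls into the dsp_condition_not_match case: neither the destination nor the dsr is touched.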
additionally, the following function is defined to be used in common, to simplify the notation when describing operations:

/* function used in common in descriptions of dsp operation instructions */
unsigned char carry_bit, borrow_bit, negative_bit, zero_bit, overflow_bit;

overflow_protection()
{
    if(sbit && overflow_bit) { /* overflow protection enable & overflow */
        if(dsp_alu_dstg_bit7==0) { /* positive value */
            if((dsp_alu_dstg_lsb8!=0x0) || (dsp_alu_dst_msb!=0)) {
                dsp_alu_dstg = 0x0;
                dsp_alu_dst = 0x7fffffff;
            }
        }
        else { /* negative value */
            if((dsp_alu_dstg_lsb8!=0xff) || (dsp_alu_dst_msb!=1)) {
                dsp_alu_dstg = 0xff;
                dsp_alu_dst = 0x80000000;
            }
        }
        overflow_bit = 0; /* no more overflow when protected */
    }
}

the six functions noted below are used for dsr register updating. the dc bit in the dsr register is updated in accordance with the operation results of the dsp operation instructions and the directions of the status selection bit (cs). the other bits in the dsr register are updated in accordance with the operation results of the dsp operation instructions only.
/* function to unconditionally update the dc bit (dspdcbit) with the borrow flag */
dc_always_borrow()
{
    /* dc update policy: the state of dspcsbits is ignored */
    dspdcbit = borrow_bit;
    dspgtbit = ~((negative_bit ^ overflow_bit) | zero_bit);
    dspzbit = zero_bit;
    dspnbit = negative_bit;
    dspvbit = overflow_bit;
}

/* function to unconditionally update the dc bit (dspdcbit) with the carry flag */
dc_always_carry()
{
    /* dc update policy: the state of dspcsbits is ignored */
    dspdcbit = carry_bit;
    dspgtbit = ~((negative_bit ^ overflow_bit) | zero_bit);
    dspzbit = zero_bit;
    dspnbit = negative_bit;
    dspvbit = overflow_bit;
}

/* function to update the dc bit (dspdcbit) upon a subtraction */
minus_dc_bit()
{
    switch (dspcsbits) {
    case 0x0: /* borrow mode */ dspdcbit = borrow_bit; break;
    case 0x1: /* negative value mode */ dspdcbit = negative_bit; break;
    case 0x2: /* zero value mode */ dspdcbit = zero_bit; break;
    case 0x3: /* overflow mode */ dspdcbit = overflow_bit; break;
    case 0x4: /* signed greater than mode */ dspdcbit = ~((negative_bit ^ overflow_bit) | zero_bit); break;
    case 0x5: /* signed greater than or equal mode */ dspdcbit = ~(negative_bit ^ overflow_bit); break;
    case 0x6: /* reserved */
    case 0x7: /* reserved */
        break;
    }
    dspgtbit = ~((negative_bit ^ overflow_bit) | zero_bit);
    dspzbit = zero_bit;
    dspnbit = negative_bit;
    dspvbit = overflow_bit;
}

/* function to update the dc bit (dspdcbit) upon an addition */
plus_dc_bit()
{
    switch (dspcsbits) {
    case 0x0: /* carry mode */ dspdcbit = carry_bit; break;
    case 0x1: /* negative value mode */ dspdcbit = negative_bit; break;
    case 0x2: /* zero value mode */ dspdcbit = zero_bit; break;
    case 0x3: /* overflow mode */ dspdcbit = overflow_bit; break;
    case 0x4: /* signed greater than mode */ dspdcbit = ~((negative_bit ^ overflow_bit) | zero_bit); break;
    case 0x5: /* signed greater than or equal mode */ dspdcbit = ~(negative_bit ^ overflow_bit); break;
    case 0x6: /* reserved */
    case 0x7: /* reserved */
        break;
    }
    dspgtbit =
        ~((negative_bit ^ overflow_bit) | zero_bit);
    dspzbit = zero_bit;
    dspnbit = negative_bit;
    dspvbit = overflow_bit;
}

/* function to update the dc bit (dspdcbit) upon a logical operation */
logical_dc_bit()
{
    switch (dspcsbits) {
    case 0x0: /* carry mode */ dspdcbit = 0; break;
    case 0x1: /* negative value mode */ dspdcbit = negative_bit; break;
    case 0x2: /* zero value mode */ dspdcbit = zero_bit; break;
    case 0x3: /* overflow mode */ dspdcbit = 0; break;
    case 0x4: /* signed greater than mode */ dspdcbit = 0; break;
    case 0x5: /* signed greater than or equal mode */ dspdcbit = 0; break;
    case 0x6: /* reserved */
    case 0x7: /* reserved */
        break;
    }
    dspgtbit = 0;
    dspzbit = zero_bit;
    dspnbit = negative_bit;
    dspvbit = 0;
}

/* function to update the dc bit (dspdcbit) upon a shift operation */
shift_dc_bit()
{
    switch (dspcsbits) {
    case 0x0: /* carry mode */ dspdcbit = carry_bit; break;
    case 0x1: /* negative value mode */ dspdcbit = negative_bit; break;
    case 0x2: /* zero value mode */ dspdcbit = zero_bit; break;
    case 0x3: /* overflow mode */ dspdcbit = overflow_bit; break;
    case 0x4: /* signed greater than mode */ dspdcbit = 0; break;
    case 0x5: /* signed greater than or equal mode */ dspdcbit = 0; break;
    case 0x6: /* reserved */
    case 0x7: /* reserved */
        break;
    }
    dspgtbit = 0;
    dspzbit = zero_bit;
    dspnbit = negative_bit;
    dspvbit = overflow_bit;
}

6.3.1 pabs (absolute): dsp arithmetic operation instruction

format | abstract | code | cycle | dc bit | applicable instructions (sh-1, sh-2, sh-dsp)
pabs sx,dz | if sx ≥ 0, sx → dz; if sx < 0, 0 - sx → dz | 111110********** 10001000xx00zzzz | 1 | update
pabs sy,dz | if sy ≥ 0, sy → dz; if sy < 0, 0 - sy → dz | 111110********** 1010100000yyzzzz | 1 | update

description: finds absolute values. when the sx or sy operand is positive or zero, the contents of the operand are stored in the dz operand. when the operand is negative, its contents are subtracted from 0 and the result is stored in the dz operand. the dc bit of the dsr register is updated according to the specifications of the cs bits.
the n, z, v, and gt bits of the dsr register are updated.

operation:
/* case1: pabs sx,dz */
/* case2: pabs sy,dz */
{
    unsigned char carry_bit, negative_bit, zero_bit, overflow_bit, borrow_bit;
    /* alu sources assignment */
    dsp_alu_src1 = 0;
    dsp_alu_src1g = 0;
    if (case1) { /* pabs sx,dz */
        switch (xx) { /* sx operand selection bit (xx) */
        case 0x0: dsp_alu_src2 = x0;
            if (dsp_alu_src2_msb) dsp_alu_src2g = 0xff; else dsp_alu_src2g = 0x0;
            break;
        case 0x1: dsp_alu_src2 = x1;
            if (dsp_alu_src2_msb) dsp_alu_src2g = 0xff; else dsp_alu_src2g = 0x0;
            break;
        case 0x2: dsp_alu_src2 = a0; dsp_alu_src2g = a0g; break;
        case 0x3: dsp_alu_src2 = a1; dsp_alu_src2g = a1g; break;
        }
    }
    else { /* pabs sy,dz */
        switch (yy) { /* sy operand selection bit (yy) */
        case 0x0: dsp_alu_src2 = y0; break;
        case 0x1: dsp_alu_src2 = y1; break;
        case 0x2: dsp_alu_src2 = m0; break;
        case 0x3: dsp_alu_src2 = m1; break;
        }
        if (dsp_alu_src2_msb) dsp_alu_src2g = 0xff; else dsp_alu_src2g = 0x0;
    }
    /* alu operation */
    if(dsp_alu_src2g_bit7==0) { /* positive value */
        dsp_alu_dst = 0x0 + dsp_alu_src2;
        carry_bit = 0;
        dsp_alu_dstg_lsb8 = 0x0 + dsp_alu_src2g_lsb8 + carry_bit;
    }
    else { /* negative value */
        dsp_alu_dst = 0x0 - dsp_alu_src2;
        borrow_bit = 1;
        dsp_alu_dstg_lsb8 = 0x0 - dsp_alu_src2g_lsb8 - borrow_bit;
    }
    overflow_bit = plus_op_g_ov || !(pos_not_ov || neg_not_ov);
    overflow_protection();
    /* alu destination assignment */
    switch (zzzz) { /* dz operand selection bit (zzzz) */
    case 0x5: a1 = dsp_alu_dst;
        a1g = dsp_alu_dstg & 0x000000ff;
        if(dsp_alu_dstg_bit7) a1g = a1g | 0xffffff00;
        break;
    case 0x7: a0 = dsp_alu_dst;
        a0g = dsp_alu_dstg & 0x000000ff;
        if(dsp_alu_dstg_bit7) a0g = a0g | 0xffffff00;
        break;
    case 0x8: x0 = dsp_alu_dst; break;
    case 0x9: x1 = dsp_alu_dst; break;
    case 0xa: y0 = dsp_alu_dst; break;
    case 0xb: y1 = dsp_alu_dst; break;
    case 0xc: m0 = dsp_alu_dst; break;
    case 0xe: m1 = dsp_alu_dst; break;
    default: printf("\nerror: illegal dsp instruction"); break;
    }
    negative_bit = dsp_alu_dstg_bit7;
    zero_bit = (dsp_alu_dst==0) & (dsp_alu_dstg_lsb8==0);
    /* dsr register update */
    if(dsp_alu_src2g_bit7==0) {
        plus_dc_bit();
    }
    else {
        overflow_bit = minus_op_g_ov || !(pos_not_ov || neg_not_ov);
        minus_dc_bit();
    }
}

examples:
pabs x0, m0  nopx  nopy
;before execution: x0=h'33333333, m0=h'12345678
;after execution: x0=h'33333333, m0=h'33333333
pabs x1, x1  nopx  nopy
;before execution: x1=h'dddddddd
;after execution: x1=h'22222223
the dc bit is updated depending on the state of cs [2:0].

6.3.2 [if cc] padd (addition with condition): dsp arithmetic operation instruction

format | abstract | code | cycle | dc bit | applicable instructions (sh-1, sh-2, sh-dsp)
padd sx,sy,dz | sx + sy → dz | 111110********** 10110001xxyyzzzz | 1 | update
dct padd sx,sy,dz | if dc = 1, sx + sy → dz; if 0, nop | 111110********** 10110010xxyyzzzz | 1 |
dcf padd sx,sy,dz | if dc = 0, sx + sy → dz; if 1, nop | 111110********** 10110011xxyyzzzz | 1 |

description: adds the contents of the sx and sy operands and stores the result in the dz operand. when a dct or dcf condition is specified, the instruction is executed when that condition is true. when it is false, the instruction is not executed. when no condition is specified, the dc bit of the dsr register is updated according to the specifications for the cs bits. the n, z, v, and gt bits of the dsr register are also updated. if a condition is specified, the dc, n, z, v, and gt bits are not updated even if the condition was true and the instruction was executed.
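the addition can also be pictured as plain 40-bit arithmetic (32 data bits plus 8 guard bits). the following is a minimal sketch under stated assumptions, not the manual's operation description: padd40 and sext32 are hypothetical names, both sources are taken to be 32-bit registers without guard bits (x0, x1, y0, y1, m0, m1) and are therefore sign-extended, and the saturation rule is assumed to be the one given by overflow_protection() (clamp to h'007fffffff or h'ff80000000 when the s bit is 1).

```c
#include <assert.h>
#include <stdint.h>

/* sign-extend a 32-bit register value to the 40-bit alu width
   (relies on the usual two's-complement conversion behaviour) */
int64_t sext32(uint32_t v)
{
    return (int64_t)(int32_t)v;
}

/* returns the 40-bit result (guard bits in bits 39..32) of sx + sy;
   when sbit is 1 the result is clamped to the signed 32-bit range */
uint64_t padd40(uint32_t sx, uint32_t sy, int sbit)
{
    assert(sbit == 0 || sbit == 1);
    int64_t sum = sext32(sx) + sext32(sy);
    if (sbit) {
        if (sum > INT32_MAX) sum = INT32_MAX;   /* h'00 7fffffff */
        if (sum < INT32_MIN) sum = INT32_MIN;   /* h'ff 80000000 */
    }
    return (uint64_t)sum & 0xFFFFFFFFFFULL;     /* keep 40 bits */
}
```

with x0 = h'22222222 and y0 = h'33333333 this yields h'0055555555, which agrees with the padd example given for a0 in this section.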
operation:
/* padd sx,sy,dz */
{
    unsigned char carry_bit, negative_bit, zero_bit, overflow_bit;
    /* alu sources assignment */
    switch (xx) { /* sx operand selection bit (xx) */
    case 0x0: dsp_alu_src1 = x0;
        if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0;
        break;
    case 0x1: dsp_alu_src1 = x1;
        if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0;
        break;
    case 0x2: dsp_alu_src1 = a0; dsp_alu_src1g = a0g; break;
    case 0x3: dsp_alu_src1 = a1; dsp_alu_src1g = a1g; break;
    }
    switch (yy) { /* sy operand selection bit (yy) */
    case 0x0: dsp_alu_src2 = y0; break;
    case 0x1: dsp_alu_src2 = y1; break;
    case 0x2: dsp_alu_src2 = m0; break;
    case 0x3: dsp_alu_src2 = m1; break;
    }
    if (dsp_alu_src2_msb) dsp_alu_src2g = 0xff; else dsp_alu_src2g = 0x0;
    /* alu operation */
    dsp_alu_dst = dsp_alu_src1 + dsp_alu_src2;
    carry_bit = ((dsp_alu_src1_msb | dsp_alu_src2_msb) & !dsp_alu_dst_msb) | (dsp_alu_src1_msb & dsp_alu_src2_msb);
    dsp_alu_dstg_lsb8 = dsp_alu_src1g_lsb8 + dsp_alu_src2g_lsb8 + carry_bit;
    overflow_bit = plus_op_g_ov || !(pos_not_ov || neg_not_ov);
    overflow_protection();
    if(dsp_unconditional_update) { /* unconditional operation */
        /* alu destination assignment */
        switch (zzzz) { /* dz operand selection bit (zzzz) */
        case 0x5: a1 = dsp_alu_dst;
            a1g = dsp_alu_dstg & 0x000000ff;
            if(dsp_alu_dstg_bit7) a1g = a1g | 0xffffff00;
            break;
        case 0x7: a0 = dsp_alu_dst;
            a0g = dsp_alu_dstg & 0x000000ff;
            if(dsp_alu_dstg_bit7) a0g = a0g | 0xffffff00;
            break;
        case 0x8: x0 = dsp_alu_dst; break;
        case 0x9: x1 = dsp_alu_dst; break;
        case 0xa: y0 = dsp_alu_dst; break;
        case 0xb: y1 = dsp_alu_dst; break;
        case 0xc: m0 = dsp_alu_dst; break;
        case 0xe: m1 = dsp_alu_dst; break;
        default: printf("\nerror: illegal dsp instruction"); break;
        }
        negative_bit = dsp_alu_dstg_bit7;
        zero_bit = (dsp_alu_dst==0) & (dsp_alu_dstg_lsb8==0);
        /* dsr register update */
        plus_dc_bit();
    }
    else if(dsp_condition_match) { /* conditional operation and match */
        /* alu destination assignment */
        switch
        (zzzz) { /* dz operand selection bit (zzzz) */
        case 0x5: a1 = dsp_alu_dst;
            a1g = dsp_alu_dstg & 0x000000ff;
            if(dsp_alu_dstg_bit7) a1g = a1g | 0xffffff00;
            break;
        case 0x7: a0 = dsp_alu_dst;
            a0g = dsp_alu_dstg & 0x000000ff;
            if(dsp_alu_dstg_bit7) a0g = a0g | 0xffffff00;
            break;
        case 0x8: x0 = dsp_alu_dst; break;
        case 0x9: x1 = dsp_alu_dst; break;
        case 0xa: y0 = dsp_alu_dst; break;
        case 0xb: y1 = dsp_alu_dst; break;
        case 0xc: m0 = dsp_alu_dst; break;
        case 0xe: m1 = dsp_alu_dst; break;
        default: printf("\nerror: illegal dsp instruction"); break;
        }
    }
}

examples:
padd x0,y0,a0  nopx  nopy
;before execution: x0=h'22222222, y0=h'33333333, a0=h'123456789a
;after execution: x0=h'22222222, y0=h'33333333, a0=h'0055555555
in case of unconditional execution, the dc bit is updated depending on the state of the cs [2:0] bits immediately before the operation.

6.3.3 padd pmuls (addition & multiply signed by signed): dsp arithmetic operation instruction

format | abstract | code | cycle | dc bit | applicable instructions (sh-1, sh-2, sh-dsp)
padd sx,sy,du pmuls se,sf,dg | sx + sy → du; msw of se × msw of sf → dg | 111110********** 0111eeffxxyygguu | 1 | update

description: adds the contents of the sx and sy operands and stores the result in the du operand. the contents of the top word of the se and sf operands are multiplied as signed values and the result is stored in the dg operand. these two processes are executed simultaneously in parallel. the dc bit of the dsr register is updated according to the results of the alu operation and the specifications for the cs bits. the n, z, v, and gt bits of the dsr register are also updated according to the results of the alu operation.

note: since pmuls is a fixed-point multiplication, the operation result is different from that of muls even though the source data is the same.
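the difference from an integer multiply comes from the one-bit left shift applied to the product. the following is a minimal sketch under stated assumptions, not the manual's operation description: pmuls_fixed is a hypothetical name, only the 32-bit dg value is modelled (guard bits are omitted), and the shift and the saturation of h'8000 × h'8000 are taken from the multiplier step of the operation description for this instruction.

```c
#include <assert.h>
#include <stdint.h>

/* multiplies the upper words of se and sf as signed 16-bit values and
   shifts the product left by one bit, as pmuls does; when sbit is 1 the
   single overflowing case (h'8000 * h'8000) saturates to h'7fffffff */
int32_t pmuls_fixed(uint32_t se, uint32_t sf, int sbit)
{
    assert(sbit == 0 || sbit == 1);
    int16_t a = (int16_t)(se >> 16);      /* msw of se, signed */
    int16_t b = (int16_t)(sf >> 16);      /* msw of sf, signed */
    if (sbit && a == INT16_MIN && b == INT16_MIN)
        return INT32_MAX;                 /* overflow protection */
    int32_t product = (int32_t)a * b;     /* what an integer multiply gives */
    return (int32_t)((uint32_t)product << 1);
}
```

with msw values 2 and 3 an integer multiply gives 6, while this sketch gives h'0000000c, matching the m0 value in the padd pmuls example of this section.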
operation:
/* padd sx,sy,du pmuls se,sf,dg */
{
    unsigned char carry_bit, negative_bit, zero_bit, overflow_bit;
    /* multiplier sources assignment */
    switch (ee) { /* se operand selection bit (ee) */
    case 0x0: dsp_m_src1 = x0_hw; break;
    case 0x1: dsp_m_src1 = x1_hw; break;
    case 0x2: dsp_m_src1 = y0_hw; break;
    case 0x3: dsp_m_src1 = a1_hw; break;
    }
    switch (ff) { /* sf operand selection bit (ff) */
    case 0x0: dsp_m_src2 = y0_hw; break;
    case 0x1: dsp_m_src2 = y1_hw; break;
    case 0x2: dsp_m_src2 = x0_hw; break;
    case 0x3: dsp_m_src2 = a1_hw; break;
    }
    /* alu sources assignment */
    switch (xx) { /* sx operand selection bit (xx) */
    case 0x0: dsp_alu_src1 = x0;
        if (dsp_alu_src1_msb) dsp_alu_src1g_lsb8 = 0xff; else dsp_alu_src1g_lsb8 = 0x0;
        break;
    case 0x1: dsp_alu_src1 = x1;
        if (dsp_alu_src1_msb) dsp_alu_src1g_lsb8 = 0xff; else dsp_alu_src1g_lsb8 = 0x0;
        break;
    case 0x2: dsp_alu_src1 = a0; dsp_alu_src1g = a0g; break;
    case 0x3: dsp_alu_src1 = a1; dsp_alu_src1g = a1g; break;
    }
    switch (yy) { /* sy operand selection bit (yy) */
    case 0x0: dsp_alu_src2 = y0; break;
    case 0x1: dsp_alu_src2 = y1; break;
    case 0x2: dsp_alu_src2 = m0; break;
    case 0x3: dsp_alu_src2 = m1; break;
    }
    if (dsp_alu_src2_msb) dsp_alu_src2g_lsb8 = 0xff; else dsp_alu_src2g_lsb8 = 0x0;
    /* multiplier operation */
    /* pmuls se, sf, dg */
    if ((sbit==1) && (dsp_m_src1==0x8000) && (dsp_m_src2==0x8000)) {
        dsp_m_dst = 0x7fffffff; /* overflow protection */
    }
    else {
        dsp_m_dst = ((long)(short)dsp_m_src1*(long)(short)dsp_m_src2)<<1;
    }
    if (dsp_m_dst_msb) dsp_m_dstg_lsb8 = 0xff; else dsp_m_dstg_lsb8 = 0x0;
    switch (gg) { /* dg operand selection bit (gg) */
    case 0x0: m0 = dsp_m_dst; break;
    case 0x1: m1 = dsp_m_dst; break;
    case 0x2: a0 = dsp_m_dst;
        if(dsp_m_dstg_lsb8==0x0) a0g = 0x0; else a0g = 0xffffffff;
        break;
    case 0x3: a1 = dsp_m_dst;
        if(dsp_m_dstg_lsb8==0x0) a1g = 0x0; else a1g = 0xffffffff;
        break;
    }
    /* alu operation */
    dsp_alu_dst = dsp_alu_src1 + dsp_alu_src2;
    carry_bit = ((dsp_alu_src1_msb | dsp_alu_src2_msb) & !dsp_alu_dst_msb) |
        (dsp_alu_src1_msb & dsp_alu_src2_msb);
    dsp_alu_dstg_lsb8 = dsp_alu_src1g_lsb8 + dsp_alu_src2g_lsb8 + carry_bit;
    overflow_bit = plus_op_g_ov || !(pos_not_ov || neg_not_ov);
    overflow_protection();
    switch (uu) { /* du operand selection bit (uu) */
    case 0x0: x0 = dsp_alu_dst;
        negative_bit = dsp_alu_dst_msb;
        zero_bit = (dsp_alu_dst==0);
        break;
    case 0x1: y0 = dsp_alu_dst;
        negative_bit = dsp_alu_dst_msb;
        zero_bit = (dsp_alu_dst==0);
        break;
    case 0x2: a0 = dsp_alu_dst;
        a0g = dsp_alu_dstg & 0x000000ff;
        if(dsp_alu_dstg_bit7) a0g = a0g | 0xffffff00;
        negative_bit = dsp_alu_dstg_bit7;
        zero_bit = (dsp_alu_dst==0) & (dsp_alu_dstg_lsb8==0);
        break;
    case 0x3: a1 = dsp_alu_dst;
        a1g = dsp_alu_dstg & 0x000000ff;
        if(dsp_alu_dstg_bit7) a1g = a1g | 0xffffff00;
        negative_bit = dsp_alu_dstg_bit7;
        zero_bit = (dsp_alu_dst==0) & (dsp_alu_dstg_lsb8==0);
        break;
    }
    /* dsr register update */
    plus_dc_bit();
}

examples:
padd a0,m0,a0  pmuls x0,y0,m0  nopx  nopy
;before execution: x0=h'00020000, y0=h'00030000, m0=h'22222222, a0=h'0055555555
;after execution: x0=h'00020000, y0=h'00030000, m0=h'0000000c, a0=h'0077777777
the dc bit is updated based on the result of the padd operation, depending on the state of cs [2:0].

6.3.4 paddc (addition with carry): dsp arithmetic operation instruction

format | abstract | code | cycle | dc bit | applicable instructions (sh-1, sh-2, sh-dsp)
paddc sx,sy,dz | sx + sy + dc → dz | 111110********** 10110000xxyyzzzz | 1 | carry

description: adds the contents of the sx and sy operands and the dc bit, and stores the result in the dz operand. the dc bit of the dsr register is updated as the carry flag. the n, z, v, and gt bits of the dsr register are also updated.

note: the dc bit is updated as the carry flag after execution of the paddc instruction regardless of the cs bits.
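because dc always receives the carry, paddc can chain 32-bit additions into wider ones. the following is a minimal sketch under stated assumptions, not the manual's operation description: paddc_result, paddc32 and add64_via_paddc are hypothetical names, only the 32 data bits are modelled (guard bits and saturation are ignored), and the carry is taken as the carry out of bit 31.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit sum plus the carry that paddc would leave in the dc bit */
typedef struct {
    uint32_t sum;
    unsigned carry;
} paddc_result;

/* sx + sy + dc on the 32 data bits */
paddc_result paddc32(uint32_t sx, uint32_t sy, unsigned dc)
{
    assert(dc <= 1);                      /* dc is a single dsr bit */
    uint64_t wide = (uint64_t)sx + sy + dc;
    paddc_result r = { (uint32_t)wide, (unsigned)(wide >> 32) };
    return r;
}

/* 64-bit addition built from two chained paddc steps:
   low words first with dc = 0, then high words with the carry */
uint64_t add64_via_paddc(uint64_t a, uint64_t b)
{
    paddc_result lo = paddc32((uint32_t)a, (uint32_t)b, 0);
    paddc_result hi = paddc32((uint32_t)(a >> 32), (uint32_t)(b >> 32), lo.carry);
    return ((uint64_t)hi.sum << 32) | lo.sum;
}
```

the two single-step cases used in the test values below are the ones shown in the paddc examples of this section (h'b3333333 + h'55555555 with dc = 0, and h'33333333 + h'55555555 with dc = 1).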
operation:
/* paddc sx,sy,dz */
{
    unsigned char carry_bit, negative_bit, zero_bit, overflow_bit;
    /* alu sources assignment */
    switch (xx) { /* sx operand selection bit (xx) */
    case 0x0: dsp_alu_src1 = x0;
        if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0;
        break;
    case 0x1: dsp_alu_src1 = x1;
        if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0;
        break;
    case 0x2: dsp_alu_src1 = a0; dsp_alu_src1g = a0g; break;
    case 0x3: dsp_alu_src1 = a1; dsp_alu_src1g = a1g; break;
    }
    switch (yy) { /* sy operand selection bit (yy) */
    case 0x0: dsp_alu_src2 = y0; break;
    case 0x1: dsp_alu_src2 = y1; break;
    case 0x2: dsp_alu_src2 = m0; break;
    case 0x3: dsp_alu_src2 = m1; break;
    }
    if (dsp_alu_src2_msb) dsp_alu_src2g = 0xff; else dsp_alu_src2g = 0x0;
    /* alu operation */
    dsp_alu_dst = dsp_alu_src1 + dsp_alu_src2 + dspdcbit;
    carry_bit = ((dsp_alu_src1_msb | dsp_alu_src2_msb) & !dsp_alu_dst_msb) | (dsp_alu_src1_msb & dsp_alu_src2_msb);
    dsp_alu_dstg_lsb8 = dsp_alu_src1g_lsb8 + dsp_alu_src2g_lsb8 + carry_bit;
    overflow_bit = plus_op_g_ov || !(pos_not_ov || neg_not_ov);
    overflow_protection();
    /* alu destination assignment */
    switch (zzzz) { /* dz operand selection bit (zzzz) */
    case 0x5: a1 = dsp_alu_dst;
        a1g = dsp_alu_dstg & 0x000000ff;
        if(dsp_alu_dstg_bit7) a1g = a1g | 0xffffff00;
        break;
    case 0x7: a0 = dsp_alu_dst;
        a0g = dsp_alu_dstg & 0x000000ff;
        if(dsp_alu_dstg_bit7) a0g = a0g | 0xffffff00;
        break;
    case 0x8: x0 = dsp_alu_dst; break;
    case 0x9: x1 = dsp_alu_dst; break;
    case 0xa: y0 = dsp_alu_dst; break;
    case 0xb: y1 = dsp_alu_dst; break;
    case 0xc: m0 = dsp_alu_dst; break;
    case 0xe: m1 = dsp_alu_dst; break;
    default: printf("\nerror: illegal dsp instruction"); break;
    }
    negative_bit = dsp_alu_dstg_bit7;
    zero_bit = (dsp_alu_dst==0) & (dsp_alu_dstg_lsb8==0);
    /* dsr register update */
    dc_always_carry();
}

example:
cs[2:0]=***: always operates in carry or borrow mode, regardless of the setting of the cs bits.
paddc x0,y0,m0  nopx  nopy
;before execution: x0=h'b3333333, y0=h'55555555, m0=h'12345678, dc=0
;after execution: x0=h'b3333333, y0=h'55555555, m0=h'08888888, dc=1
paddc x0,y0,m0  nopx  nopy
;before execution: x0=h'33333333, y0=h'55555555, m0=h'12345678, dc=1
;after execution: x0=h'33333333, y0=h'55555555, m0=h'88888889, dc=0
the dc bit is updated as the carry flag, regardless of the state of the cs bits.

6.3.5 [if cc] pand (logical and): dsp logical operation instruction

format | abstract | code | cycle | dc bit | applicable instructions (sh-1, sh-2, sh-dsp)
pand sx,sy,dz | sx & sy → dz; clear lsw of dz | 111110********** 10010101xxyyzzzz | 1 |
dct pand sx,sy,dz | if dc = 1, sx & sy → dz, clear lsw of dz; if 0, nop | 111110********** 10010110xxyyzzzz | 1 |
dcf pand sx,sy,dz | if dc = 0, sx & sy → dz, clear lsw of dz; if 1, nop | 111110********** 10010111xxyyzzzz | 1 |

description: performs a logical and of the upper word of the sx operand and the upper word of the sy operand, stores the result in the upper word of the dz operand, and clears the bottom word of the dz operand to zero. when dz is a register that has guard bits, the guard bits are also zeroed. when a dct or dcf condition is specified, the instruction is executed when that condition is true. when it is false, the instruction is not executed. when no condition is specified, the dc bit of the dsr register is updated according to the specifications for the cs bits. the n, z, v, and gt bits of the dsr register are also updated. if a condition is specified, the dc, n, z, v, and gt bits are not updated even if the condition was true and the instruction was executed.

note: the bottom word of the destination register and the guard bits are ignored when the dc bit is updated.
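the word-oriented behaviour described above can be checked with ordinary integer code. the following is a minimal sketch under stated assumptions, not the manual's operation description: pand_result and pand_word are hypothetical names, only the 32 data bits of the destination are modelled, and the n and z values are derived from the upper word only, as the note above says.

```c
#include <assert.h>
#include <stdint.h>

/* destination value plus the two flags that depend on the result */
typedef struct {
    uint32_t dz;   /* upper word = and result, lower word cleared */
    unsigned n;    /* msb of the upper word */
    unsigned z;    /* 1 when the upper word is zero */
} pand_result;

pand_result pand_word(uint32_t sx, uint32_t sy)
{
    pand_result r;
    uint16_t hi = (uint16_t)(sx >> 16) & (uint16_t)(sy >> 16);
    r.dz = (uint32_t)hi << 16;
    r.n = (unsigned)(hi >> 15);
    r.z = (hi == 0);
    assert((r.dz & 0xFFFFu) == 0);   /* the lower word is always cleared */
    return r;
}
```

the lower words of both sources never influence the result: two sources that differ only in their lower words give the same dz, n, and z.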
operation:
/* pand sx,sy,dz */
{
    unsigned char carry_bit, negative_bit, zero_bit, overflow_bit;
    /* alu sources assignment */
    switch (xx) { /* sx operand selection bit (xx) */
    case 0x0: dsp_alu_src1 = x0; break;
    case 0x1: dsp_alu_src1 = x1; break;
    case 0x2: dsp_alu_src1 = a0; break;
    case 0x3: dsp_alu_src1 = a1; break;
    }
    switch (yy) { /* sy operand selection bit (yy) */
    case 0x0: dsp_alu_src2 = y0; break;
    case 0x1: dsp_alu_src2 = y1; break;
    case 0x2: dsp_alu_src2 = m0; break;
    case 0x3: dsp_alu_src2 = m1; break;
    }
    dsp_alu_dst_hw = dsp_alu_src1_hw & dsp_alu_src2_hw;
    if(dsp_unconditional_update) { /* unconditional operation */
        /* alu destination assignment */
        switch (zzzz) { /* dz operand selection bit (zzzz) */
        case 0x5: a1_hw = dsp_alu_dst_hw;
            a1_lw = 0x0; /* clear lsw */
            a1g = 0x0; /* clear guard bits */
            break;
        case 0x7: a0_hw = dsp_alu_dst_hw;
            a0_lw = 0x0; /* clear lsw */
            a0g = 0x0; /* clear guard bits */
            break;
        case 0x8: x0_hw = dsp_alu_dst_hw; x0_lw = 0x0; /* clear lsw */ break;
        case 0x9: x1_hw = dsp_alu_dst_hw; x1_lw = 0x0; /* clear lsw */ break;
        case 0xa: y0_hw = dsp_alu_dst_hw; y0_lw = 0x0; /* clear lsw */ break;
        case 0xb: y1_hw = dsp_alu_dst_hw; y1_lw = 0x0; /* clear lsw */ break;
        case 0xc: m0_hw = dsp_alu_dst_hw; m0_lw = 0x0; /* clear lsw */ break;
        case 0xe: m1_hw = dsp_alu_dst_hw; m1_lw = 0x0; /* clear lsw */ break;
        default: printf("\nerror: illegal dsp instruction"); break;
        }
        carry_bit = 0x0;
        negative_bit = dsp_alu_dst_msb;
        zero_bit = (dsp_alu_dst_hw==0);
        overflow_bit = 0x0;
        /* dsr register update */
        logical_dc_bit();
    }
    else if(dsp_condition_match) { /* conditional operation and match */
        /* alu destination assignment */
        switch (zzzz) { /* dz operand selection bit (zzzz) */
        case 0x5: a1_hw = dsp_alu_dst_hw;
            a1_lw = 0x0; /* clear lsw */
            a1g = 0x0; /* clear guard bits */
            break;
        case 0x7: a0_hw = dsp_alu_dst_hw;
            a0_lw = 0x0; /* clear lsw */
            a0g = 0x0; /* clear guard bits */
            break;
        case 0x8: x0_hw = dsp_alu_dst_hw; x0_lw = 0x0; /* clear lsw */ break;
        case 0x9: x1_hw =
            dsp_alu_dst_hw; x1_lw = 0x0; /* clear lsw */ break;
        case 0xa: y0_hw = dsp_alu_dst_hw; y0_lw = 0x0; /* clear lsw */ break;
        case 0xb: y1_hw = dsp_alu_dst_hw; y1_lw = 0x0; /* clear lsw */ break;
        case 0xc: m0_hw = dsp_alu_dst_hw; m0_lw = 0x0; /* clear lsw */ break;
        case 0xe: m1_hw = dsp_alu_dst_hw; m1_lw = 0x0; /* clear lsw */ break;
        default: printf("\nerror: illegal dsp instruction"); break;
        }
    }
}

example:
pand x0,y0,a0  nopx  nopy
;before execution: x0=h'33333333, y0=h'55555555, a0=h'123456789a
;after execution: x0=h'33333333, y0=h'55555555, a0=h'0011110000
in case of unconditional execution, the dc bit is updated depending on the state of the cs [2:0] bits immediately before the operation.

6.3.6 [if cc] pclr (clear): dsp arithmetic operation instruction

format | abstract | code | cycle | dc bit | applicable instructions (sh-1, sh-2, sh-dsp)
pclr dz | h'00000000 → dz | 111110********** 100011010000zzzz | 1 | update
dct pclr dz | if dc = 1, h'00000000 → dz; if 0, nop | 111110********** 100011100000zzzz | 1 |
dcf pclr dz | if dc = 0, h'00000000 → dz; if 1, nop | 111110********** 100011110000zzzz | 1 |

description: clears the dz operand. when a dct or dcf condition is specified, the instruction is executed when that condition is true. when it is false, the instruction is not executed. when no condition is specified, the dc bit of the dsr register is updated according to the specifications for the cs bits. the z bit of the dsr register is set to 1. the n, v, and gt bits are cleared to 0. if a condition is specified, the dc, n, z, v, and gt bits are not updated even if the condition was true and the instruction was executed.
operation:
/* pclr dz */
{
  unsigned char carry_bit, negative_bit, zero_bit, overflow_bit;
  if(dsp_unconditional_update) { /* unconditional operation */
    /* alu destination assignment */
    switch (zzzz) { /* dz operand selection bit (zzzz) */
      case 0x5: a1 = 0x0; a1g = 0x0; break;
      case 0x7: a0 = 0x0; a0g = 0x0; break;
      case 0x8: x0 = 0x0; break;
      case 0x9: x1 = 0x0; break;
      case 0xa: y0 = 0x0; break;
      case 0xb: y1 = 0x0; break;
      case 0xc: m0 = 0x0; break;
      case 0xe: m1 = 0x0; break;
      default: printf("\nerror:illegal dsp instruction"); break;
    }
    carry_bit = 0;
    negative_bit = 0;
    zero_bit = 1;
    overflow_bit = 0;
    /* dsr register update */
    plus_dc_bit();
  }
  else if(dsp_condition_match) { /* conditional operation and match */
    /* alu destination assignment */
    switch (zzzz) { /* dz operand selection bit (zzzz) */
      case 0x5: a1 = 0x0; a1g = 0x0; break;
      case 0x7: a0 = 0x0; a0g = 0x0; break;
      case 0x8: x0 = 0x0; break;
      case 0x9: x1 = 0x0; break;
      case 0xa: y0 = 0x0; break;
      case 0xb: y1 = 0x0; break;
      case 0xc: m0 = 0x0; break;
      case 0xe: m1 = 0x0; break;
      default: printf("\nerror:illegal dsp instruction"); break;
    }
  }
}

example:
pclr a0 nopx nopy
;before execution: a0=h'ff87654321
;after execution:  a0=h'0000000000
in case of unconditional execution, the dc bit is updated depending on the state of the cs [2:0] bits.

6.3.7 pcmp (compare two data): dsp arithmetic operation instruction

applicable instructions  sh-1  sh-2  sh-dsp

format        abstract   code                               cycle  dc bit
pcmp sx, sy   sx - sy    111110********** 10000100xxyy0000  1      update

description: subtracts the contents of the sy operand from the sx operand. the dc bit of the dsr register is updated according to the specifications for the cs bits. the n, z, v, and gt bits of the dsr register are also updated.
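as a rough illustration of the comparison described above, the sketch below computes only the n and z results of a 40-bit subtraction. it is an assumption-laden model, not the manual's code: the type cmp_flags and the function pcmp_flags are hypothetical names, the sx operand is passed as a 40-bit value already sign-extended into 64 bits, and the v, gt, and dc bits are deliberately left out.

```c
#include <stdint.h>
#include <stdbool.h>

/* hypothetical result type: only the n and z bits are modeled */
typedef struct {
    bool n;  /* bit 39 (top guard bit) of the 40-bit difference */
    bool z;  /* all 40 bits of the difference are zero          */
} cmp_flags;

/* sx: 40-bit operand, sign-extended into an int64_t by the caller
 * sy: 32-bit operand, sign-extended here as the guard-bit rule requires */
static cmp_flags pcmp_flags(int64_t sx, int32_t sy)
{
    int64_t diff = sx - (int64_t)sy;
    /* keep the low 40 bits: 8 guard bits plus the 32-bit body */
    uint64_t d40 = (uint64_t)diff & UINT64_C(0xFFFFFFFFFF);
    cmp_flags f;
    f.n = ((d40 >> 39) & 1u) != 0;
    f.z = (d40 == 0);
    return f;
}
```

for instance, pcmp_flags(0x22222222, 0x33333333) yields n = 1 and z = 0, since the difference is negative and nonzero.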
operation:
/* pcmp sx,sy */
{
  unsigned char carry_bit, borrow_bit, negative_bit, zero_bit, overflow_bit;
  /* alu sources assignment */
  switch (xx) { /* sx operand selection bit (xx) */
    case 0x0: dsp_alu_src1 = x0; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break;
    case 0x1: dsp_alu_src1 = x1; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break;
    case 0x2: dsp_alu_src1 = a0; dsp_alu_src1g = a0g; break;
    case 0x3: dsp_alu_src1 = a1; dsp_alu_src1g = a1g; break;
  }
  switch (yy) { /* sy operand selection bit (yy) */
    case 0x0: dsp_alu_src2 = y0; break;
    case 0x1: dsp_alu_src2 = y1; break;
    case 0x2: dsp_alu_src2 = m0; break;
    case 0x3: dsp_alu_src2 = m1; break;
  }
  if (dsp_alu_src2_msb) dsp_alu_src2g = 0xff; else dsp_alu_src2g = 0x0;
  dsp_alu_dst = dsp_alu_src1 - dsp_alu_src2;
  carry_bit =((dsp_alu_src1_msb | !dsp_alu_src2_msb) && !dsp_alu_dst_msb) | (dsp_alu_src1_msb & !dsp_alu_src2_msb);
  borrow_bit = !carry_bit;
  dsp_alu_dstg_lsb8 = dsp_alu_src1g_lsb8 - dsp_alu_src2g_lsb8 - borrow_bit;
  negative_bit = dsp_alu_dstg_bit7;
  zero_bit = (dsp_alu_dst==0) & (dsp_alu_dstg_lsb8==0);
  overflow_bit= minus_op_g_ov || !(pos_not_ov || neg_not_ov);
  overflow_protection();
  /* dsr register update */
  minus_dc_bit();
}

examples:
pcmp x0, y0 nopx nopy
;before execution: x0=h'22222222, y0=h'33333333
;after execution:  x0=h'22222222, y0=h'33333333, n=1, z=0, v=0, gt=0
the dc bit is updated depending on the state of cs [2:0].

6.3.8 [if cc] pcopy (copy with condition): dsp arithmetic operation instruction

applicable instructions  sh-1  sh-2  sh-dsp

format            abstract                         code                               cycle  dc bit
pcopy sx,dz       sx -> dz                         111110********** 11011001xx00zzzz  1      update
pcopy sy,dz       sy -> dz                         111110********** 1111100100yyzzzz  1      update
dct pcopy sx,dz   if dc = 1, sx -> dz; if 0, nop   111110********** 11011010xx00zzzz  1
dct pcopy sy,dz   if dc = 1, sy -> dz; if 0, nop   111110********** 1111101000yyzzzz  1
dcf pcopy sx,dz   if dc = 0, sx ->
dz; if 1, nop   111110********** 11011011xx00zzzz  1
dcf pcopy sy,dz   if dc = 0, sy -> dz; if 1, nop   111110********** 1111101100yyzzzz  1

description: stores the sx or sy operand in the dz operand. when conditions are specified for dct and dcf, the instruction is executed when those conditions are true. when they are false, the instruction is not executed. when conditions are not specified, the dc bit of the dsr register is updated according to the specifications for the cs bits. the n, z, v, and gt bits are also updated. if conditions are specified, the dc, n, z, v, and gt bits are not updated even if the conditions were true and the instruction was executed.

operation:
/* case1 : pcopy sx,dz */
/* case2 : pcopy sy,dz */
{
  unsigned char carry_bit, negative_bit, zero_bit, overflow_bit;
  /* alu sources assignment */
  if (case1) { /* pcopy sx,dz */
    switch (xx) { /* sx operand selection bit (xx) */
      case 0x0: dsp_alu_src1 = x0; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break;
      case 0x1: dsp_alu_src1 = x1; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break;
      case 0x2: dsp_alu_src1 = a0; dsp_alu_src1g = a0g; break;
      case 0x3: dsp_alu_src1 = a1; dsp_alu_src1g = a1g; break;
    }
    dsp_alu_src2 = 0;
    dsp_alu_src2g= 0;
  }
  else { /* pcopy sy,dz */
    dsp_alu_src1 = 0;
    dsp_alu_src1g= 0;
    switch (yy) {
      case 0x0: dsp_alu_src2 = y0; break;
      case 0x1: dsp_alu_src2 = y1; break;
      case 0x2: dsp_alu_src2 = m0; break;
      case 0x3: dsp_alu_src2 = m1; break;
    }
    if (dsp_alu_src2_msb) dsp_alu_src2g = 0xff; else dsp_alu_src2g = 0x0;
  }
  dsp_alu_dst = dsp_alu_src1 + dsp_alu_src2;
  carry_bit = ((dsp_alu_src1_msb | dsp_alu_src2_msb) & !dsp_alu_dst_msb) | (dsp_alu_src1_msb & dsp_alu_src2_msb);
  dsp_alu_dstg_lsb8 = dsp_alu_src1g_lsb8 + dsp_alu_src2g_lsb8 + carry_bit;
  overflow_bit= plus_op_g_ov || !(pos_not_ov || neg_not_ov);
  overflow_protection();
  if(dsp_unconditional_update) { /* unconditional operation */
    /* alu destination assignment */
    switch (zzzz) { /* dz
operand selection bit (zzzz) */
      case 0x5: a1 = dsp_alu_dst; a1g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | 0xffffff00; break;
      case 0x7: a0 = dsp_alu_dst; a0g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | 0xffffff00; break;
      case 0x8: x0 = dsp_alu_dst; break;
      case 0x9: x1 = dsp_alu_dst; break;
      case 0xa: y0 = dsp_alu_dst; break;
      case 0xb: y1 = dsp_alu_dst; break;
      case 0xc: m0 = dsp_alu_dst; break;
      case 0xe: m1 = dsp_alu_dst; break;
      default: printf("\nerror:illegal dsp instruction"); break;
    }
    negative_bit = dsp_alu_dstg_bit7;
    zero_bit = (dsp_alu_dst==0) & (dsp_alu_dstg_lsb8==0);
    /* dsr register update */
    plus_dc_bit();
  }
  else if(dsp_condition_match) { /* conditional operation and match */
    /* alu destination assignment */
    switch (zzzz) { /* dz operand selection bit (zzzz) */
      case 0x5: a1 = dsp_alu_dst; a1g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | 0xffffff00; break;
      case 0x7: a0 = dsp_alu_dst; a0g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | 0xffffff00; break;
      case 0x8: x0 = dsp_alu_dst; break;
      case 0x9: x1 = dsp_alu_dst; break;
      case 0xa: y0 = dsp_alu_dst; break;
      case 0xb: y1 = dsp_alu_dst; break;
      case 0xc: m0 = dsp_alu_dst; break;
      case 0xe: m1 = dsp_alu_dst; break;
      default: printf("\nerror:illegal dsp instruction"); break;
    }
  }
}

examples:
pcopy x0, a0 nopx nopy
;before execution: x0=h'55555555, a0=h'ffffffff
;after execution:  x0=h'55555555, a0=h'0055555555
in case of unconditional execution, the dc bit is updated depending on the state of cs [2:0].

6.3.9 [if cc] pdec (decrement by 1): dsp arithmetic operation instruction

applicable instructions  sh-1  sh-2  sh-dsp

format           abstract                                             code                               cycle  dc bit
pdec sx,dz       msw of sx - 1 -> msw of dz, clear lsw of dz          111110********** 10001001xx00zzzz  1      update
pdec sy,dz       msw of sy - 1 -> msw of dz, clear lsw of dz          111110********** 1010100100yyzzzz  1      update
dct pdec sx,dz   if dc = 1, msw of sx - 1 ->
msw of dz, clear lsw of dz; if 0, nop   111110********** 10001010xx00zzzz  1
dct pdec sy,dz   if dc = 1, msw of sy - 1 -> msw of dz, clear lsw of dz; if 0, nop   111110********** 1010101000yyzzzz  1
dcf pdec sx,dz   if dc = 0, msw of sx - 1 -> msw of dz, clear lsw of dz; if 1, nop   111110********** 10001011xx00zzzz  1
dcf pdec sy,dz   if dc = 0, msw of sy - 1 -> msw of dz, clear lsw of dz; if 1, nop   111110********** 1010101100yyzzzz  1

description: subtracts 1 from the top word of the sx or sy operand, stores the result in the upper word of the dz operand, and clears the bottom word of the dz operand with zeros. when conditions are specified for dct and dcf, the instruction is executed when those conditions are true. when they are false, the instruction is not executed. when conditions are not specified, the dc bit of the dsr register is updated according to the specifications for the cs bits. the n, z, v, and gt bits of the dsr register are also updated. if conditions are specified, the dc, n, z, v, and gt bits are not updated even if the conditions were true and the instruction was executed.

note: the bottom word of the destination register is ignored when the dc bit is updated.

operation:
/* case1 : pdec sx,dz */
/* case2 : pdec sy,dz */
{
  unsigned char carry_bit, borrow_bit, negative_bit, zero_bit, overflow_bit;
  /* alu sources assignment */
  dsp_alu_src2 = 0x1;
  dsp_alu_src2g= 0x0;
  if (case1) { /* msw of sx -1 -> dz */
    switch (xx) { /* sx operand selection bit (xx) */
      case 0x0: dsp_alu_src1 = x0; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break;
      case 0x1: dsp_alu_src1 = x1; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break;
      case 0x2: dsp_alu_src1 = a0; dsp_alu_src1g = a0g; break;
      case 0x3: dsp_alu_src1 = a1; dsp_alu_src1g = a1g; break;
    }
  }
  else { /* msw of sy -1 ->
dz */
    switch (yy) { /* sy operand selection bit (yy) */
      case 0x0: dsp_alu_src1 = y0; break;
      case 0x1: dsp_alu_src1 = y1; break;
      case 0x2: dsp_alu_src1 = m0; break;
      case 0x3: dsp_alu_src1 = m1; break;
    }
    if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0;
  }
  dsp_alu_dst_hw = dsp_alu_src1_hw - 1;
  carry_bit =((dsp_alu_src1_msb | !dsp_alu_src2_msb) && !dsp_alu_dst_msb) | (dsp_alu_src1_msb & !dsp_alu_src2_msb);
  borrow_bit = !carry_bit;
  dsp_alu_dstg_lsb8 = dsp_alu_src1g_lsb8 - dsp_alu_src2g_lsb8 - borrow_bit;
  overflow_bit= plus_op_g_ov || !(pos_not_ov || neg_not_ov);
  overflow_protection();
  if(dsp_unconditional_update) { /* unconditional operation */
    /* alu destination assignment */
    switch (zzzz) { /* dz operand selection bit (zzzz) */
      case 0x5: a1_hw = dsp_alu_dst_hw; a1_lw = 0x0; /* clear lsw */ a1g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | 0xffffff00; break;
      case 0x7: a0_hw = dsp_alu_dst_hw; a0_lw = 0x0; /* clear lsw */ a0g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | 0xffffff00; break;
      case 0x8: x0_hw = dsp_alu_dst_hw; x0_lw = 0x0; /* clear lsw */ break;
      case 0x9: x1_hw = dsp_alu_dst_hw; x1_lw = 0x0; /* clear lsw */ break;
      case 0xa: y0_hw = dsp_alu_dst_hw; y0_lw = 0x0; /* clear lsw */ break;
      case 0xb: y1_hw = dsp_alu_dst_hw; y1_lw = 0x0; /* clear lsw */ break;
      case 0xc: m0_hw = dsp_alu_dst_hw; m0_lw = 0x0; /* clear lsw */ break;
      case 0xe: m1_hw = dsp_alu_dst_hw; m1_lw = 0x0; /* clear lsw */ break;
      default: printf("\nerror:illegal dsp instruction"); break;
    }
    negative_bit = dsp_alu_dstg_bit7;
    zero_bit = (dsp_alu_dst_hw==0) & (dsp_alu_dstg_lsb8==0);
    /* dsr register update */
    minus_dc_bit();
  }
  else if(dsp_condition_match) { /* conditional operation and match */
    /* alu destination assignment */
    switch (zzzz) { /* dz operand selection bit (zzzz) */
      case 0x5: a1_hw = dsp_alu_dst_hw; a1_lw = 0x0; /* clear lsw */ a1g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | 0xffffff00; break;
      case 0x7: a0_hw =
dsp_alu_dst_hw; a0_lw = 0x0; /* clear lsw */ a0g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | 0xffffff00; break;
      case 0x8: x0_hw = dsp_alu_dst_hw; x0_lw = 0x0; /* clear lsw */ break;
      case 0x9: x1_hw = dsp_alu_dst_hw; x1_lw = 0x0; /* clear lsw */ break;
      case 0xa: y0_hw = dsp_alu_dst_hw; y0_lw = 0x0; /* clear lsw */ break;
      case 0xb: y1_hw = dsp_alu_dst_hw; y1_lw = 0x0; /* clear lsw */ break;
      case 0xc: m0_hw = dsp_alu_dst_hw; m0_lw = 0x0; /* clear lsw */ break;
      case 0xe: m1_hw = dsp_alu_dst_hw; m1_lw = 0x0; /* clear lsw */ break;
      default: printf("\nerror:illegal dsp instruction"); break;
    }
  }
}

example:
pdec x0,m0 nopx nopy
;before execution: x0=h'0052330f, m0=h'12345678
;after execution:  x0=h'0052330f, m0=h'00510000
pdec x1,x1 nopx nopy
;before execution: x1=h'fc342855
;after execution:  x1=h'fc330000
in case of unconditional execution, the dc bit is updated depending on the state of cs [2:0].

6.3.10 [if cc] pdmsb (detect msb with condition): dsp arithmetic operation instruction

applicable instructions  sh-1  sh-2  sh-dsp

format            abstract                                                                code                               cycle  dc bit
pdmsb sx,dz       sx data msb position -> msw of dz, clear lsw of dz                      111110********** 10011101xx00zzzz  1      update
pdmsb sy,dz       sy data msb position -> msw of dz, clear lsw of dz                      111110********** 1011110100yyzzzz  1      update
dct pdmsb sx,dz   if dc = 1, sx data msb position -> msw of dz, clear lsw of dz; if 0, nop   111110********** 10011110xx00zzzz  1
dct pdmsb sy,dz   if dc = 1, sy data msb position -> msw of dz, clear lsw of dz; if 0, nop   111110********** 1011111000yyzzzz  1
dcf pdmsb sx,dz   if dc = 0, sx data msb position -> msw of dz, clear lsw of dz; if 1, nop   111110********** 10011111xx00zzzz  1
dcf pdmsb sy,dz   if dc = 0, sy data msb position -> msw of dz, clear lsw of dz; if 1, nop   111110********** 1011111100yyzzzz  1

description: finds the first position at which the bit value changes in the sx or sy operand, scanning down from the msb, and stores the bit position in the dz operand.
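the bit scan in this description can be sketched in c as follows. this is a simplified model under stated assumptions rather than the manual's listing: pdmsb_model is a hypothetical name, the operand is passed as a 40-bit value (8 guard bits plus 32 data bits) already sign-extended into 64 bits, and only the value written to the upper word of dz is returned.

```c
#include <stdint.h>

/* hypothetical helper: returns the value pdmsb would place in the
 * upper word of dz for a 40-bit sign-extended operand */
static int pdmsb_model(int64_t value40)
{
    /* keep the low 40 bits so the scan never sees bits above bit 39 */
    uint64_t bits = (uint64_t)value40 & UINT64_C(0xFFFFFFFFFF);
    unsigned sign = (unsigned)((bits >> 39) & 1u);
    int i = 38;

    /* walk down from bit 38 while each bit still equals the sign bit */
    while (i >= 0 && ((bits >> i) & 1u) == sign)
        i--;

    /* same 30 - i mapping as the operation listing for this instruction */
    return 30 - i;
}
```

with this model, an operand of h'0052330f gives 8 and an operand of h'fc342855 (guard bits all 1) gives 5, matching the examples given for this instruction.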
when conditions are specified for dct and dcf, the instruction is executed when those conditions are true. when they are false, the instruction is not executed. when conditions are not specified, the dc bit of the dsr register is updated according to the specifications for the cs bits. the n, z, v, and gt bits of the dsr register are also updated. if conditions are specified, the dc, n, z, v, and gt bits are not updated even if the conditions were true and the instruction was executed.

operation:
/* case1 : pdmsb sx,dz */
/* case2 : pdmsb sy,dz */
{
  unsigned char carry_bit, borrow_bit, negative_bit, zero_bit, overflow_bit;
  /* alu sources assignment */
  dsp_alu_src2 = 0x0;
  dsp_alu_src2g= 0x0;
  if (case1) { /* msb(sx) -> dz */
    switch (xx) { /* sx operand selection bit (xx) */
      case 0x0: dsp_alu_src1 = x0; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break;
      case 0x1: dsp_alu_src1 = x1; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break;
      case 0x2: dsp_alu_src1 = a0; dsp_alu_src1g = a0g; break;
      case 0x3: dsp_alu_src1 = a1; dsp_alu_src1g = a1g; break;
    }
  }
  else { /* msb(sy) ->
dz */
    switch (yy) { /* sy operand selection bit (yy) */
      case 0x0: dsp_alu_src1 = y0; break;
      case 0x1: dsp_alu_src1 = y1; break;
      case 0x2: dsp_alu_src1 = m0; break;
      case 0x3: dsp_alu_src1 = m1; break;
    }
    if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0;
  }
  {
    short int i;
    unsigned char msb, src1g;
    unsigned long src1=dsp_alu_src1;
    msb= dsp_alu_src1g_bit7;
    src1g=(dsp_alu_src1g_lsb8 << 1);
    for(i=38;((msb==(src1g>>7))&&(i>=32));i--) { src1g <<= 1; }
    if(i==31) {
      for(i;((msb==(src1>>31))&&(i>=0));i--) { src1 <<= 1; }
    }
    dsp_alu_dst = 0x0;
    dsp_alu_dst_hw = (short int) (30-i);
    if (dsp_alu_dst_msb) dsp_alu_dstg_lsb8 = 0xff; else dsp_alu_dstg_lsb8 = 0x0;
  }
  carry_bit = 0;
  if(dsp_unconditional_update) { /* unconditional operation */
    overflow_bit= 0;
    /* alu destination assignment */
    switch (zzzz) { /* dz operand selection bit (zzzz) */
      case 0x5: a1_hw = dsp_alu_dst_hw; a1_lw = 0x0; /* clear lsw */ a1g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | 0xffffff00; break;
      case 0x7: a0_hw = dsp_alu_dst_hw; a0_lw = 0x0; /* clear lsw */ a0g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | 0xffffff00; break;
      case 0x8: x0_hw = dsp_alu_dst_hw; x0_lw = 0x0; /* clear lsw */ break;
      case 0x9: x1_hw = dsp_alu_dst_hw; x1_lw = 0x0; /* clear lsw */ break;
      case 0xa: y0_hw = dsp_alu_dst_hw; y0_lw = 0x0; /* clear lsw */ break;
      case 0xb: y1_hw = dsp_alu_dst_hw; y1_lw = 0x0; /* clear lsw */ break;
      case 0xc: m0_hw = dsp_alu_dst_hw; m0_lw = 0x0; /* clear lsw */ break;
      case 0xe: m1_hw = dsp_alu_dst_hw; m1_lw = 0x0; /* clear lsw */ break;
      default: printf("\nerror:illegal dsp instruction"); break;
    }
    negative_bit = dsp_alu_dstg_bit7;
    zero_bit = (dsp_alu_dst_hw==0) & (dsp_alu_dstg_lsb8==0);
    /* dsr register update */
    plus_dc_bit();
  }
  else if(dsp_condition_match) { /* conditional operation and match */
    /* alu destination assignment */
    switch (zzzz) { /* dz operand selection bit (zzzz) */
      case 0x5: a1_hw = dsp_alu_dst_hw; a1_lw = 0x0; /* clear lsw */ a1g =
dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | 0xffffff00; break;
      case 0x7: a0_hw = dsp_alu_dst_hw; a0_lw = 0x0; /* clear lsw */ a0g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | 0xffffff00; break;
      case 0x8: x0_hw = dsp_alu_dst_hw; x0_lw = 0x0; /* clear lsw */ break;
      case 0x9: x1_hw = dsp_alu_dst_hw; x1_lw = 0x0; /* clear lsw */ break;
      case 0xa: y0_hw = dsp_alu_dst_hw; y0_lw = 0x0; /* clear lsw */ break;
      case 0xb: y1_hw = dsp_alu_dst_hw; y1_lw = 0x0; /* clear lsw */ break;
      case 0xc: m0_hw = dsp_alu_dst_hw; m0_lw = 0x0; /* clear lsw */ break;
      case 0xe: m1_hw = dsp_alu_dst_hw; m1_lw = 0x0; /* clear lsw */ break;
      default: printf("\nerror:illegal dsp instruction"); break;
    }
  }
}

example:
pdmsb x0,m0 nopx nopy
;before execution: x0=h'0052330f, m0=h'12345678
;after execution:  x0=h'0052330f, m0=h'00080000
pdmsb x1,x1 nopx nopy
;before execution: x1=h'fc342855
;after execution:  x1=h'00050000
in case of unconditional execution, the dc bit is updated depending on the state of cs [2:0].

6.3.11 [if cc] pinc (increment by 1 with condition): dsp arithmetic operation instruction

applicable instructions  sh-1  sh-2  sh-dsp

format           abstract                                                           code                               cycle  dc bit
pinc sx,dz       msw of sx + 1 -> msw of dz, clear lsw of dz                        111110********** 10011001xx00zzzz  1      update
pinc sy,dz       msw of sy + 1 -> msw of dz, clear lsw of dz                        111110********** 1011100100yyzzzz  1      update
dct pinc sx,dz   if dc = 1, msw of sx + 1 -> msw of dz, clear lsw of dz; if 0, nop  111110********** 10011010xx00zzzz  1
dct pinc sy,dz   if dc = 1, msw of sy + 1 -> msw of dz, clear lsw of dz; if 0, nop  111110********** 1011101000yyzzzz  1
dcf pinc sx,dz   if dc = 0, msw of sx + 1 -> msw of dz, clear lsw of dz; if 1, nop  111110********** 10011011xx00zzzz  1
dcf pinc sy,dz   if dc = 0, msw of sy + 1 ->
msw of dz, clear lsw of dz; if 1, nop  111110********** 1011101100yyzzzz  1

description: adds 1 to the top word of the sx or sy operand, stores the result in the upper word of the dz operand, and clears the bottom word of the dz operand with zeros. when conditions are specified for dct and dcf, the instruction is executed when those conditions are true. when they are false, the instruction is not executed. when conditions are not specified, the dc bit of the dsr register is updated according to the specifications for the cs bits. the n, z, v, and gt bits of the dsr register are also updated. if conditions are specified, the dc, n, z, v, and gt bits are not updated even if the conditions were true and the instruction was executed.

note: the bottom word of the destination register is ignored when the dc bit is updated.

operation:
/* case1 : pinc sx,dz */
/* case2 : pinc sy,dz */
{
  unsigned char carry_bit, borrow_bit, negative_bit, zero_bit, overflow_bit;
  /* alu sources assignment */
  dsp_alu_src2 = 0x1;
  dsp_alu_src2g= 0x0;
  if (case1) { /* msw of sx +1 -> dz */
    switch (xx) { /* sx operand selection bit (xx) */
      case 0x0: dsp_alu_src1 = x0; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break;
      case 0x1: dsp_alu_src1 = x1; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break;
      case 0x2: dsp_alu_src1 = a0; dsp_alu_src1g = a0g; break;
      case 0x3: dsp_alu_src1 = a1; dsp_alu_src1g = a1g; break;
    }
  }
  else { /* msw of sy +1 ->
dz */
    switch (yy) { /* sy operand selection bit (yy) */
      case 0x0: dsp_alu_src1 = y0; break;
      case 0x1: dsp_alu_src1 = y1; break;
      case 0x2: dsp_alu_src1 = m0; break;
      case 0x3: dsp_alu_src1 = m1; break;
    }
    if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0;
  }
  dsp_alu_dst_hw = dsp_alu_src1_hw + 1;
  carry_bit = ((dsp_alu_src1_msb | dsp_alu_src2_msb) & !dsp_alu_dst_msb) | (dsp_alu_src1_msb & dsp_alu_src2_msb);
  dsp_alu_dstg_lsb8 = dsp_alu_src1g_lsb8 + dsp_alu_src2g_lsb8 + carry_bit;
  overflow_bit= plus_op_g_ov || !(pos_not_ov || neg_not_ov);
  overflow_protection();
  if(dsp_unconditional_update) { /* unconditional operation */
    /* alu destination assignment */
    switch (zzzz) { /* dz operand selection bit (zzzz) */
      case 0x5: a1_hw = dsp_alu_dst_hw; a1_lw = 0x0; /* clear lsw */ a1g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | 0xffffff00; break;
      case 0x7: a0_hw = dsp_alu_dst_hw; a0_lw = 0x0; /* clear lsw */ a0g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | 0xffffff00; break;
      case 0x8: x0_hw = dsp_alu_dst_hw; x0_lw = 0x0; /* clear lsw */ break;
      case 0x9: x1_hw = dsp_alu_dst_hw; x1_lw = 0x0; /* clear lsw */ break;
      case 0xa: y0_hw = dsp_alu_dst_hw; y0_lw = 0x0; /* clear lsw */ break;
      case 0xb: y1_hw = dsp_alu_dst_hw; y1_lw = 0x0; /* clear lsw */ break;
      case 0xc: m0_hw = dsp_alu_dst_hw; m0_lw = 0x0; /* clear lsw */ break;
      case 0xe: m1_hw = dsp_alu_dst_hw; m1_lw = 0x0; /* clear lsw */ break;
      default: printf("\nerror:illegal dsp instruction"); break;
    }
    negative_bit = dsp_alu_dstg_bit7;
    zero_bit = (dsp_alu_dst_hw==0) & (dsp_alu_dstg_lsb8==0);
    /* dsr register update */
    plus_dc_bit();
  }
  else if(dsp_condition_match) { /* conditional operation and match */
    /* alu destination assignment */
    switch (zzzz) { /* dz operand selection bit (zzzz) */
      case 0x5: a1_hw = dsp_alu_dst_hw; a1_lw = 0x0; /* clear lsw */ a1g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | 0xffffff00; break;
      case 0x7: a0_hw = dsp_alu_dst_hw; a0_lw = 0x0;
/* clear lsw */ a0g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | 0xffffff00; break;
      case 0x8: x0_hw = dsp_alu_dst_hw; x0_lw = 0x0; /* clear lsw */ break;
      case 0x9: x1_hw = dsp_alu_dst_hw; x1_lw = 0x0; /* clear lsw */ break;
      case 0xa: y0_hw = dsp_alu_dst_hw; y0_lw = 0x0; /* clear lsw */ break;
      case 0xb: y1_hw = dsp_alu_dst_hw; y1_lw = 0x0; /* clear lsw */ break;
      case 0xc: m0_hw = dsp_alu_dst_hw; m0_lw = 0x0; /* clear lsw */ break;
      case 0xe: m1_hw = dsp_alu_dst_hw; m1_lw = 0x0; /* clear lsw */ break;
      default: printf("\nerror:illegal dsp instruction"); break;
    }
  }
}

example:
pinc x0,m0 nopx nopy
;before execution: x0=h'0052330f, m0=h'12345678
;after execution:  x0=h'0052330f, m0=h'00530000
pinc x1,x1 nopx nopy
;before execution: x1=h'fc342855
;after execution:  x1=h'fc350000
in case of unconditional execution, the dc bit is updated depending on the state of cs [2:0].

6.3.12 [if cc] plds (load system register): dsp system control instruction

applicable instructions  sh-1  sh-2  sh-dsp

format             abstract                           code                               cycle  dc bit
plds dz,mach       dz -> mach                         111110********** 111011010000zzzz  1
plds dz,macl       dz -> macl                         111110********** 111111010000zzzz  1
dct plds dz,mach   if dc = 1, dz -> mach; if 0, nop   111110********** 111011100000zzzz  1
dct plds dz,macl   if dc = 1, dz -> macl; if 0, nop   111110********** 111111100000zzzz  1
dcf plds dz,mach   if dc = 0, dz -> mach; if 1, nop   111110********** 111011110000zzzz  1
dcf plds dz,macl   if dc = 0, dz -> macl; if 1, nop   111110********** 111111110000zzzz  1

description: stores the dz operand in the mach or macl register. when conditions are specified for dct and dcf, the instruction is executed when those conditions are true. when they are false, the instruction is not executed. the dc, n, z, v, and gt bits of the dsr register are not updated.

note: though psts, movx, and movy can be designated in parallel, their execution may take two cycles.

operation:
/* case1 : plds dz,mach */
/* case2 : plds dz,macl */
{
  if(case1){ /* dz ->
mach */
    if(dsp_unconditional_update) { /* unconditional operation */
      /* alu destination assignment */
      switch (zzzz) { /* dz operand selection bit (zzzz) */
        case 0x5: mach = a1; break;
        case 0x7: mach = a0; break;
        case 0x8: mach = x0; break;
        case 0x9: mach = x1; break;
        case 0xa: mach = y0; break;
        case 0xb: mach = y1; break;
        case 0xc: mach = m0; break;
        case 0xe: mach = m1; break;
        default: printf("\nerror:illegal dsp instruction"); break;
      }
    }
    else if(dsp_condition_match) { /* conditional operation and match */
      /* alu destination assignment */
      switch (zzzz) { /* dz operand selection bit (zzzz) */
        case 0x5: mach = a1; break;
        case 0x7: mach = a0; break;
        case 0x8: mach = x0; break;
        case 0x9: mach = x1; break;
        case 0xa: mach = y0; break;
        case 0xb: mach = y1; break;
        case 0xc: mach = m0; break;
        case 0xe: mach = m1; break;
        default: printf("\nerror:illegal dsp instruction"); break;
      }
    }
  }
  else{ /* dz -> macl */
    if(dsp_unconditional_update) { /* unconditional operation */
      /* alu destination assignment */
      switch (zzzz) { /* dz operand selection bit (zzzz) */
        case 0x5: macl = a1; break;
        case 0x7: macl = a0; break;
        case 0x8: macl = x0; break;
        case 0x9: macl = x1; break;
        case 0xa: macl = y0; break;
        case 0xb: macl = y1; break;
        case 0xc: macl = m0; break;
        case 0xe: macl = m1; break;
        default: printf("\nerror:illegal dsp instruction"); break;
      }
    }
    else if(dsp_condition_match) { /* conditional operation and match */
      /* alu destination assignment */
      switch (zzzz) { /* dz operand selection bit (zzzz) */
        case 0x5: macl = a1; break;
        case 0x7: macl = a0; break;
        case 0x8: macl = x0; break;
        case 0x9: macl = x1; break;
        case 0xa: macl = y0; break;
        case 0xb: macl = y1; break;
        case 0xc: macl = m0; break;
        case 0xe: macl = m1; break;
        default: printf("\nerror:illegal dsp instruction"); break;
      }
    }
  }
}

example:
plds a0,mach nopx nopy
;before execution: a0=h'123456789a, mach=h'66666666
;after execution:  a0=h'123456789a, mach=h'3456789a

6.3.13 pmuls (multiply signed by signed): dsp arithmetic
operation instruction

applicable instructions  sh-1  sh-2  sh-dsp

format           abstract                        code                               cycle  dc bit
pmuls se,sf,dg   msw of se * msw of sf -> dg     111110********** 0100eeff0000gg00  1

description: the contents of the top word of the se and sf operands are multiplied as signed values and the result is stored in the dg operand. the dc, n, z, v, and gt bits of the dsr register are not updated.

note: since pmuls performs fixed-point multiplication, the operation result will be different from that of muls, which performs integer multiplication, even though the source data may be the same.

operation:
/* pmuls se,sf,dg */
{
  /* multiplier sources assignment */
  switch (ee) { /* se operand selection bit (ee) */
    case 0x0: dsp_m_src1 = x0_hw; break;
    case 0x1: dsp_m_src1 = x1_hw; break;
    case 0x2: dsp_m_src1 = y0_hw; break;
    case 0x3: dsp_m_src1 = a1_hw; break;
  }
  switch (ff) { /* sf operand selection bit (ff) */
    case 0x0: dsp_m_src2 = y0_hw; break;
    case 0x1: dsp_m_src2 = y1_hw; break;
    case 0x2: dsp_m_src2 = x0_hw; break;
    case 0x3: dsp_m_src2 = a1_hw; break;
  }
  /* multiplier operation */
  if ((sbit==1) && (dsp_m_src1==0x8000) && (dsp_m_src2==0x8000)) {
    dsp_m_dst=0x7fffffff; /* overflow protection */
  }
  else {
    dsp_m_dst=((long)(short)dsp_m_src1*(long)(short)dsp_m_src2)<<1;
  }
  if (dsp_m_dst_msb) dsp_m_dstg_lsb8 = 0xff; else dsp_m_dstg_lsb8 = 0x0;
  /* multiplier destination assignment */
  switch (gg) { /* dg operand selection bit (gg) */
    case 0x0: m0 = dsp_m_dst; break;
    case 0x1: m1 = dsp_m_dst; break;
    case 0x2: a0 = dsp_m_dst; if(dsp_m_dstg_lsb8==0x0) a0g=0x0; else a0g=0xffffffff; break;
    case 0x3: a1 = dsp_m_dst; if(dsp_m_dstg_lsb8==0x0) a1g=0x0; else a1g=0xffffffff; break;
  }
}

examples:
pmuls x0,y0,m0 nopx nopy
;before execution: x0=h'00010000 (2^-15), y0=h'00020000 (2^-14), m0=h'33333333
;after execution:  x0=h'00010000, y0=h'00020000, m0=h'00000004 (2^-29)
the value is doubled when viewed as integer data.
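the doubling in the first example follows from the left shift by one in the multiplier operation. the sketch below restates that step in standalone c. it is a minimal model under stated assumptions: pmuls_q15 is a hypothetical name, the two arguments are the signed upper words of se and sf, and s_bit stands in for the saturation control bit that the listing calls sbit.

```c
#include <stdint.h>
#include <stdbool.h>

/* hypothetical helper: 16-bit x 16-bit signed fixed-point multiply
 * returning the 32-bit result body (guard bits are not modeled) */
static int32_t pmuls_q15(int16_t se_hw, int16_t sf_hw, bool s_bit)
{
    /* h'8000 * h'8000 is the only product that cannot be represented;
     * with saturation enabled the result is clamped to h'7fffffff */
    if (s_bit && se_hw == INT16_MIN && sf_hw == INT16_MIN)
        return INT32_MAX;

    int32_t prod = (int32_t)se_hw * (int32_t)sf_hw;

    /* shift in unsigned arithmetic so the one-bit shift is well defined;
     * converting back assumes a two's complement target */
    return (int32_t)((uint32_t)prod << 1);
}
```

calling pmuls_q15(0x0001, 0x0002, false) returns 4, twice the integer product 2, and pmuls_q15(-2, 1, false) returns -4, which is h'fffffffc.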
pmuls x1,y1,a0 nopx nopy
;before execution: x1=h'fffe2222, y1=h'0001aaaa, a0=h'4444444444
;after execution:  x1=h'fffe2222, y1=h'0001aaaa, a0=h'fffffffffc
( ) : fixed-point value

6.3.14 [if cc] pneg (negate): dsp arithmetic operation instruction

applicable instructions  sh-1  sh-2  sh-dsp

format           abstract                             code                               cycle  dc bit
pneg sx,dz       0 - sx -> dz                         111110********** 11001001xx00zzzz  1      update
pneg sy,dz       0 - sy -> dz                         111110********** 1110100100yyzzzz  1      update
dct pneg sx,dz   if dc = 1, 0 - sx -> dz; if 0, nop   111110********** 11001010xx00zzzz  1
dct pneg sy,dz   if dc = 1, 0 - sy -> dz; if 0, nop   111110********** 1110101000yyzzzz  1
dcf pneg sx,dz   if dc = 0, 0 - sx -> dz; if 1, nop   111110********** 11001011xx00zzzz  1
dcf pneg sy,dz   if dc = 0, 0 - sy -> dz; if 1, nop   111110********** 1110101100yyzzzz  1

description: reverses the sign. subtracts the sx or sy operand from 0 and stores the result in the dz operand. when conditions are specified for dct and dcf, the instruction is executed when those conditions are true. when they are false, the instruction is not executed. when conditions are not specified, the dc bit of the dsr register is updated according to the specifications for the cs bits. the n, z, v, and gt bits of the dsr register are also updated. if conditions are specified, the dc, n, z, v, and gt bits are not updated even if the conditions were true and the instruction was executed.

operation:
/* case1 : pneg sx,dz */
/* case2 : pneg sy,dz */
{
  unsigned char carry_bit, borrow_bit, negative_bit, zero_bit, overflow_bit;
  dsp_alu_src1 = 0;
  dsp_alu_src1g= 0;
  /* alu sources assignment */
  if (case1) { /* 0 - sx ->
dz */
    switch (xx) { /* sx operand selection bit (xx) */
      case 0x0: dsp_alu_src2 = x0; if (dsp_alu_src2_msb) dsp_alu_src2g = 0xff; else dsp_alu_src2g = 0x0; break;
      case 0x1: dsp_alu_src2 = x1; if (dsp_alu_src2_msb) dsp_alu_src2g = 0xff; else dsp_alu_src2g = 0x0; break;
      case 0x2: dsp_alu_src2 = a0; dsp_alu_src2g = a0g; break;
      case 0x3: dsp_alu_src2 = a1; dsp_alu_src2g = a1g; break;
    }
  }
  else { /* 0 - sy -> dz */
    switch (yy) { /* sy operand selection bit (yy) */
      case 0x0: dsp_alu_src2 = y0; break;
      case 0x1: dsp_alu_src2 = y1; break;
      case 0x2: dsp_alu_src2 = m0; break;
      case 0x3: dsp_alu_src2 = m1; break;
    }
    if (dsp_alu_src2_msb) dsp_alu_src2g = 0xff; else dsp_alu_src2g = 0x0;
  }
  dsp_alu_dst = dsp_alu_src1 - dsp_alu_src2;
  carry_bit =((dsp_alu_src1_msb | !dsp_alu_src2_msb) && !dsp_alu_dst_msb) | (dsp_alu_src1_msb & !dsp_alu_src2_msb);
  borrow_bit = !carry_bit;
  dsp_alu_dstg_lsb8 = dsp_alu_src1g_lsb8 - dsp_alu_src2g_lsb8 - borrow_bit;
  overflow_bit= minus_op_g_ov || !(pos_not_ov || neg_not_ov);
  overflow_protection();
  if(dsp_unconditional_update) { /* unconditional operation */
    /* alu destination assignment */
    switch (zzzz) { /* dz operand selection bit (zzzz) */
      case 0x5: a1 = dsp_alu_dst; a1g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | 0xffffff00; break;
      case 0x7: a0 = dsp_alu_dst; a0g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | 0xffffff00; break;
      case 0x8: x0 = dsp_alu_dst; break;
      case 0x9: x1 = dsp_alu_dst; break;
      case 0xa: y0 = dsp_alu_dst; break;
      case 0xb: y1 = dsp_alu_dst; break;
      case 0xc: m0 = dsp_alu_dst; break;
      case 0xe: m1 = dsp_alu_dst; break;
      default: printf("\nerror:illegal dsp instruction"); break;
    }
    negative_bit = dsp_alu_dstg_bit7;
    zero_bit = (dsp_alu_dst==0) & (dsp_alu_dstg_lsb8==0);
    /* dsr register update */
    minus_dc_bit();
  }
  else if(dsp_condition_match) { /* conditional operation and match */
    /* alu destination assignment */
    switch (zzzz) { /* dz operand selection bit (zzzz) */
      case 0x5: a1 = dsp_alu_dst; a1g =
dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | 0xffffff00; break;
      case 0x7: a0 = dsp_alu_dst; a0g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | 0xffffff00; break;
      case 0x8: x0 = dsp_alu_dst; break;
      case 0x9: x1 = dsp_alu_dst; break;
      case 0xa: y0 = dsp_alu_dst; break;
      case 0xb: y1 = dsp_alu_dst; break;
      case 0xc: m0 = dsp_alu_dst; break;
      case 0xe: m1 = dsp_alu_dst; break;
      default: printf("\nerror:illegal dsp instruction"); break;
    }
  }
}

examples:
pneg x0,a0 nopx nopy
;before execution: x0=h'55555555, a0=h'a987654321
;after execution:  x0=h'55555555, a0=h'ffaaaaaaab
pneg x1,y1 nopx nopy
;before execution: y1=h'99999999
;after execution:  y1=h'66666667
in case of unconditional execution, the dc bit is updated depending on the state of cs [2:0].

6.3.15 [if cc] por (logical or): dsp logical operation instruction

applicable instructions  sh-1  sh-2  sh-dsp

format             abstract                                              code                               cycle  dc bit
por sx,sy,dz       sx | sy -> dz, clear lsw of dz                        111110********** 10110101xxyyzzzz  1      update
dct por sx,sy,dz   if dc = 1, sx | sy -> dz, clear lsw of dz; if 0, nop  111110********** 10110110xxyyzzzz  1
dcf por sx,sy,dz   if dc = 0, sx | sy -> dz, clear lsw of dz; if 1, nop  111110********** 10110111xxyyzzzz  1

description: takes the or of the top word of the sx operand and the top word of the sy operand, stores the result in the top word of the dz operand, and clears the bottom word of dz with zeros. when dz is a register that has guard bits, the guard bits are also zeroed. when conditions are specified for dct and dcf, the instruction is executed when those conditions are true. when they are false, the instruction is not executed. when conditions are not specified, the dc bit of the dsr register is updated according to the specifications for the cs bits. the n, z, v, and gt bits of the dsr register are also updated. if conditions are specified, the dc, n, z, v, and gt bits are not updated even if the conditions were true and the instruction was executed.
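the word-level behavior in this description can be sketched as follows. this is an illustrative model under stated assumptions, not the manual's listing: por_result and por_model are hypothetical names, only the 32-bit body of dz plus the n and z results are modeled, and guard bits (which the description says are zeroed) are left out.

```c
#include <stdint.h>
#include <stdbool.h>

/* hypothetical result type for the sketch */
typedef struct {
    uint32_t dz;  /* result in the upper word, lower word cleared */
    bool n;       /* msb of the 16-bit result                     */
    bool z;       /* 16-bit result is zero                        */
} por_result;

/* or the upper words of sx and sy; the lower words never contribute */
static por_result por_model(uint32_t sx, uint32_t sy)
{
    uint16_t hw = (uint16_t)((sx >> 16) | (sy >> 16));
    por_result r;
    r.dz = (uint32_t)hw << 16;   /* lsw of dz is cleared to zero */
    r.n  = (hw & 0x8000u) != 0;
    r.z  = (hw == 0);
    return r;
}
```

for sx = h'33333333 and sy = h'55555555 the model gives dz = h'77770000 with n = 0 and z = 0.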
note: the bottom word of the destination register and the guard bits are ignored when the dc bit is updated. 335 operation: /* por sx,sy,dz */ { unsigned char carry_bit, negative_bit, zero_bit, overflow_bit; /* alu sources assignment */ switch (xx) { /* sx operand selection bit (xx) */ case 0x0: dsp_alu_src1 = x0; break; case 0x1: dsp_alu_src1 = x1; break; case 0x2: dsp_alu_src1 = a0; break; case 0x3: dsp_alu_src1 = a1; break; } switch (yy) { /* sy operand selection bit (yy) */ case 0x0: dsp_alu_src2 = y0; break; case 0x1: dsp_alu_src2 = y1; break; case 0x2: dsp_alu_src2 = m0; break; case 0x3: dsp_alu_src2 = m1; break; } dsp_alu_dst_hw = dsp_alu_src1_hw | dsp_alu_src2_hw; if(dsp_unconditional_update) { /* unconditional operation */ /* alu destination assignment */ switch (zzzz) { /* dz operand selection bit (zzzz) */ case 0x5: a1_hw = dsp_alu_dst_hw; a1_lw = 0x0; /* clear lsw */ a1g = 0x0; /* clear guard bits */ break; case 0x7: a0_hw = dsp_alu_dst_hw; 336 a0_lw = 0x0; /* clear lsw */ a0g = 0x0; /* clear guard bits */ break; case 0x8: x0_hw = dsp_alu_dst_hw; x0_lw = 0x0; /* clear lsw */ break; case 0x9: x1_hw = dsp_alu_dst; x1_lw = 0x0; /* clear lsw */ break; case 0xa: y0_hw = dsp_alu_dst; y0_lw = 0x0; /* clear lsw */ break; case 0xb: y1_hw = dsp_alu_dst; y1_lw = 0x0; /* clear lsw */ break; case 0xc: m0_hw = dsp_alu_dst; m0_lw = 0x0; /* clear lsw */ break; case 0xe: m1_hw = dsp_alu_dst; m1_lw = 0x0; /* clear lsw */ break; default: printf("\nerror:illegal dsp instruction"); break; } carry_bit = 0x0; negative_bit = dsp_alu_dst_msb; zero_bit = (dsp_alu_dst_hw==0); overflow_bit = 0x0; /* dsr register update */ logical_dc_bit(); } else if(dsp_condition_match) { /* conditional operation and match */ /* alu destination assignment */ switch (zzzz) { /* dz operand selection bit (zzzz) */ case 0x5: a1_hw = dsp_alu_dst_hw; 337 a1_lw = 0x0; /* clear lsw */ a1g = 0x0; /* clear guard bits */ break; case 0x7: a0_hw = dsp_alu_dst_hw; a0_lw = 0x0; /* clear lsw */ a0g = 0x0; /* 
clear guard bits */ break; case 0x8: x0_hw = dsp_alu_dst_hw; x0_lw = 0x0; /* clear lsw */ break; case 0x9: x1_hw = dsp_alu_dst; x1_lw = 0x0; /* clear lsw */ break; case 0xa: y0_hw = dsp_alu_dst; y0_lw = 0x0; /* clear lsw */ break; case 0xb: y1_hw = dsp_alu_dst; y1_lw = 0x0; /* clear lsw */ break; case 0xc: m0_hw = dsp_alu_dst; m0_lw = 0x0; /* clear lsw */ break; case 0xe: m1_hw = dsp_alu_dst; m1_lw = 0x0; /* clear lsw */ break; default: printf("\nerror:illegal dsp instruction"); break; } } }

example:
por x0,y0,a0 nopx nopy
;before execution: x0=h'33333333, y0=h'55555555 a0=h'123456789a
;after execution: x0=h'33333333, y0=h'55555555 a0=h'0077770000

In case of unconditional execution, the dc bit is updated depending on the state of cs [2:0].

6.3.16 prnd (rounding): dsp arithmetic operation instruction

applicable instructions
sh-1  sh-2  sh-dsp  format      abstract                               code                               cycle  dc bit
                    prnd sx,dz  sx + h'00008000 → dz, clear lsw of dz  111110********** 10011000xx00zzzz  1      update
                    prnd sy,dz  sy + h'00008000 → dz, clear lsw of dz  111110********** 1011100000yyzzzz  1      update

description: Does rounding. Adds the immediate data h'00008000 to the contents of the sx or sy operand, stores the result in the upper word of the dz operand, and clears the bottom word of dz with zeros. The dc bit of the dsr register is updated according to the specifications for the cs bits. The n, z, v, and gt bits of the dsr register are also updated.

operation:
/* case1 : prnd sx,dz */ /* case2 : prnd sy,dz */ { unsigned char carry_bit, borrow_bit, negative_bit, zero_bit, overflow_bit; /* alu sources assignment */ dsp_alu_src2 = 0x00008000; dsp_alu_src2g= 0x0; if (case1) { /* sx + h'00008000 →
dz; clr dz lw */ switch (xx) { /* sx operand selection bit (xx) */ case 0x0: dsp_alu_src1 = x0; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break; case 0x1: dsp_alu_src1 = x1; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break; 339 case 0x2: dsp_alu_src1 = a0; dsp_alu_src1g = a0g; break; case 0x3: dsp_alu_src1 = a1; dsp_alu_src1g = a1g; break; } } else { /* sy + h'00008000 ? dz; clr dz lw */ switch (yy) { /* sy operand selection bit (yy) */ case 0x0: dsp_alu_src1 = y0; break; case 0x1: dsp_alu_src1 = y1; break; case 0x2: dsp_alu_src1 = m0; break; case 0x3: dsp_alu_src1 = m1; break; } if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; } dsp_alu_dst = (dsp_alu_src1 + dsp_alu_src2) & 0xffff0000; carry_bit = ((dsp_alu_src1_msb | dsp_alu_src2_msb) & !dsp_alu_dst_msb) |(dsp_alu_src1_msb & dsp_alu_src2_msb); dsp_alu_dstg_lsb8 = dsp_alu_src1g_lsb8 + dsp_alu_src2g_lsb8 + carry_bit; overflow_bit= plus_op_g_ov || !(pos_not_ov || neg_not_ov); overflow_protection(); /* alu destination assignment */ switch (zzzz) { /* dz operand selection bit (zzzz) */ case 0x5: a1_hw = dsp_alu_dst_hw; a1_lw = 0x0; /* clear lsw */ a1g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | 0xffffff00; break; 340 case 0x7: a0_hw = dsp_alu_dst_hw; a0_lw = 0x0; /* clear lsw */ a0g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | 0xffffff00; break; case 0x8: x0_hw = dsp_alu_dst_hw; x0_lw = 0x0; /* clear lsw */ break; case 0x9: x1_hw = dsp_alu_dst_hw; x1_lw = 0x0; /* clear lsw */ break; case 0xa: y0_hw = dsp_alu_dst_hw; y0_lw = 0x0; /* clear lsw */ break; case 0xb: y1_hw = dsp_alu_dst_hw; y1_lw = 0x0; /* clear lsw */ break; case 0xc: m0_hw = dsp_alu_dst_hw; m0_lw = 0x0; /* clear lsw */ break; case 0xe: m1_hw = dsp_alu_dst_hw; m1_lw = 0x0; /* clear lsw */ break; default: printf("\nerror:illegal dsp instruction"); break; } negative_bit = dsp_alu_dstg_bit7; zero_bit = (dsp_alu_dst_hw==0) & (dsp_alu_dstg_lsb8==0); 
/* dsr register update */ plus_dc_bit(); }

example:
prnd x0,m0 nopx nopy
;before execution: x0=h'0052330f, m0=h'12345678
;after execution: x0=h'0052330f, m0=h'00520000
prnd x1,x1 nopx nopy
;before execution: x1=h'fc34c087
;after execution: x1=h'fc350000

The dc bit is updated depending on the state of cs [2:0].

6.3.17 [if cc] psha (shift arithmetically with condition): dsp arithmetic shift instruction

applicable instructions
sh-1  sh-2  sh-dsp  format             abstract                             code                               cycle  dc bit
                    psha sx,sy,dz      if sy >= 0, sx << sy → dz            111110********** 10010001xxyyzzzz  1      update
                                       if sy < 0, sx >> sy → dz
                    dct psha sx,sy,dz  if dc = 1 & sy >= 0, sx << sy → dz   111110********** 10010010xxyyzzzz  1      update
                                       if dc = 1 & sy < 0, sx >> sy → dz
                                       if dc = 0, nop
                    dcf psha sx,sy,dz  if dc = 0 & sy >= 0, sx << sy → dz   111110********** 10010011xxyyzzzz  1
                                       if dc = 0 & sy < 0, sx >> sy → dz
                                       if dc = 1, nop
                    psha #imm,dz       if imm >= 0, dz << imm → dz          111110********** 00010iiiiiiizzzz  1
                                       if imm < 0, dz >> imm → dz

description: Arithmetically shifts the contents of the sx or dz operand and stores the result in the dz operand. The amount of the shift is specified by the sy operand or the immediate value imm operand. When the shift amount is positive, it shifts left. When the shift amount is negative, it shifts right. When a condition is specified with dct or dcf, the instruction is executed when that condition is true and is not executed when it is false. When no condition is specified, the dc bit of the dsr register is updated according to the specifications for the cs bits, and the n, z, v, and gt bits of the dsr register are also updated. If a condition is specified, the dc, n, z, v, and gt bits are not updated even if the condition was true and the instruction was executed.

operation:
/* psha sx,sy,dz */
char cnt = (dsp_alu_src2_hw & 0x003f); if(cnt > 32) { printf("\npsha sz,sy,dz \nerror!
shift %2x exceed range.\n",cnt); exit(); } dsp_alu_dst = dsp_alu_src1 << cnt; dsp_alu_dstg = ((dsp_alu_src1g << cnt) | (dsp_alu_src1 >> (32-cnt))) & 0x000000ff; carry_bit = ((dsp_alu_dstg & 0x00000001)==0x1); } else { /* right shift 0< cnt <=32 */ char cnt = ((~dsp_alu_src2_hw & 0x003f)+1); if(cnt > 32) { printf("\npsha sz,sy,dz \nerror! shift -%2x exceed range.\n",cnt); exit(); } if((cnt>8) && dsp_alu_src1g_bit7) { /* msb copy */ dsp_alu_dst=((dsp_alu_src1>>8) | (dsp_alu_src1g<<(32-8))); dsp_alu_dst=(long) dsp_alu_dst >> (cnt-8); } else { dsp_alu_dst=((dsp_alu_src1>>cnt)|(dsp_alu_src1g<<(32-cnt))); } dsp_alu_dstg_lsb8 = (char) dsp_alu_src1g_lsb8 >> cnt-- ; carry_bit = (((dsp_alu_src1 >> cnt) & 0x00000001)==0x1); } overflow_bit = !(pos_not_ov || neg_not_ov); overflow_protection(); if(dsp_unconditional_update) { /* unconditional operation */ /* alu destination assignment */ switch (zzzz) { /* dz operand selection bit (zzzz) */ case 0x5: a1 = dsp_alu_dst; a1g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | 0xffffff00; break; 345 case 0x7: a0 = dsp_alu_dst; a0g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | 0xffffff00; break; case 0x8: x0 = dsp_alu_dst; break; case 0x9: x1 = dsp_alu_dst; break; case 0xa: y0 = dsp_alu_dst; break; case 0xb: y1 = dsp_alu_dst; break; case 0xc: m0 = dsp_alu_dst; break; case 0xe: m1 = dsp_alu_dst; break; default: printf("\nerror:illegal dsp instruction"); break; } negative_bit = dsp_alu_dstg_bit7; zero_bit = (dsp_alu_dst==0) & (dsp_alu_dstg_lsb8==0); /* dsr register update */ shift_dc_bit(); } else if(dsp_condition_match) { /* conditional operation and match */ /* alu destination assignment */ switch (zzzz) { /* dz operand selection bit (zzzz) */ case 0x5: a1 = dsp_alu_dst; a1g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | 0xffffff00; break; case 0x7: a0 = dsp_alu_dst; a0g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | 0xffffff00; break; case 0x8: x0 = dsp_alu_dst; 346 break; 
case 0x9: x1 = dsp_alu_dst; break; case 0xa: y0 = dsp_alu_dst; break; case 0xb: y1 = dsp_alu_dst; break; case 0xc: m0 = dsp_alu_dst; break; case 0xe: m1 = dsp_alu_dst; break; default: printf("\nerror:illegal dspinstruction"); break; } } } /* psha #imm,dz */ 347 case 0xc: dsp_alu_src1 = m0; break; case 0xe: dsp_alu_src1 = m1; break; default: printf("\nerror:illegal dsp instruction"); break; } if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; tmp_imm = (#imm) & 0x0000007f); /* extract 7bit immidiate data */ if((tmp_imm & 0x0040)==0) { /* left shift 0<= cnt <=32 */ char cnt = (tmp_imm & 0x003f); if(cnt > 32) { printf("\npsha dz,#imm,dz \nerror! #imm=%7x exceed range\n",tmp_imm); exit(); } dsp_alu_dst = dsp_alu_src1 << cnt; dsp_alu_dstg = ((dsp_alu_src1g << cnt) |(dsp_alu_src1 >> (32-cnt))) & 0x000000ff; carry_bit = ((dsp_alu_dstg & 0x00000001)==0x1); } else { /* right shift 0< cnt <=32 */ char cnt = ((~tmp_imm & 0x003f)+1); if(cnt > 32) { printf("\npshl dz,#imm,dz \nerror! 
#imm=%7x exceed range\n",tmp_imm); exit(); } if((cnt>8) && dsp_alu_src1g_bit7) { /* msb copy */ dsp_alu_dst=((dsp_alu_src1>>8) | (dsp_alu_src1g<<(32-8))); dsp_alu_dst=(long) dsp_alu_dst >> (cnt-8); } else { dsp_alu_dst=((dsp_alu_src1>>cnt)|(dsp_alu_src1g<<(32-cnt))); } dsp_alu_dstg_lsb8 = (char) dsp_alu_src1g_lsb8 >> cnt--; 348 carry_bit = (((dsp_alu_src1 >> cnt) & 0x00000001)==0x1); } overflow_bit = !(pos_not_ov || neg_not_ov); overflow_protection(); { /* unconditional operation */ /* alu destination assignment */ switch (zzzz) { /* dz operand selection bit (zzzz) */ case 0x5: a1 = dsp_alu_dst; a1g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | 0xffffff00; break; case 0x7: a0 = dsp_alu_dst; a0g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | 0xffffff00; break; case 0x8: x0 = dsp_alu_dst; break; case 0x9: x1 = dsp_alu_dst; break; case 0xa: y0 = dsp_alu_dst; break; case 0xb: y1 = dsp_alu_dst; break; case 0xc: m0 = dsp_alu_dst; break; case 0xe: m1 = dsp_alu_dst; break; default: printf("\nerror:illegal dsp instruction"); break; } negative_bit = dsp_alu_dstg_bit7; zero_bit = (dsp_alu_dst==0) & (dsp_alu_dstg_lsb8==0); /* dsr register update */ shift_dc_bit(); } } 349 examples: psha x0,y0,a0 nopx nopy ;before execution: x0=h'88888888, y0=h'00020000, a0=h'123456789a ;after execution: x0=h'88888888, y0=h'00020000, a0=h'fe22222222 psha x0,y0,x0 nopx nopy ;before execution: x0=h'33333333, y0=h'ffff0000 ;after execution: x0=h'19999999, y0=h'fffe0000 psha #-5,a1 nopx nopy ;before execution: a1=h'aaaaaaaaaa ;after execution: a1=h'fd55555555 in case of unconditional execution, the dc bit is updated depending on the state of cs [2:0]. 
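The shift behaviour described for psha can be sketched in C as follows. This is a minimal model, assuming a 40-bit operand (8 guard bits plus 32 data bits) held in the low bits of a uint64_t. The helper names decode_shift7, ext32to40 and psha40 are hypothetical and do not appear in the manual, and overflow protection (saturation) and the dsr flag updates are deliberately left out.

```c
#include <assert.h>
#include <stdint.h>

#define MASK40 0xFFFFFFFFFFULL   /* 8 guard bits + 32 data bits */
#define SIGN40 0x8000000000ULL   /* bit 39, the sign of the 40-bit value */

/* Decode the 7-bit two's-complement shift amount carried in the low
   seven bits of the top word of sy, or in the #imm field. */
static int decode_shift7(unsigned field)
{
    field &= 0x7Fu;
    return (field & 0x40u) ? (int)field - 128 : (int)field;
}

/* Sign-extend a 32-bit register (x0, x1, y0, ...) to a 40-bit pattern. */
static uint64_t ext32to40(uint32_t r)
{
    return (r & 0x80000000u) ? (0xFF00000000ULL | r) : (uint64_t)r;
}

/* Arithmetic shift of a 40-bit pattern: left for shift >= 0, right otherwise. */
static uint64_t psha40(uint64_t v, int shift)
{
    assert(shift >= -32 && shift <= 32);   /* same range check as the listing */
    v &= MASK40;
    if (shift >= 0)
        return (v << shift) & MASK40;

    unsigned n = (unsigned)(-shift);
    uint64_t r = v >> n;
    if (v & SIGN40)
        r |= MASK40 & ~(MASK40 >> n);      /* copy the sign into vacated bits */
    return r;
}
```

Under these assumptions, psha40(0xAAAAAAAAAA, decode_shift7(0x7B)) models psha #-5,a1 and yields h'fd55555555, which agrees with the third example above.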
350 6.3.18 [if cc] pshl (shift logically with condition): dsp logical shift instruction applicable instructions format abstract code cycle dc bit sh-1 sh-2 sh- dsp pshl sx,sy,dz if sy 3 0, sx< 351 operation: 353 default: printf("\nerror:illegal dsp instruction"); break; } carry_bit = 0x0; negative_bit = dsp_alu_dst_msb; zero_bit = (dsp_alu_dst_hw==0); overflow_bit = 0x0; /* dsr register update */ shift_dc_bit(); } else if(dsp_condition_match) { /* conditional operation and match */ /* alu destination assignment */ switch (zzzz) { /* dz operand selection bit (zzzz) */ case 0x5: a1_hw = dsp_alu_dst_hw; a1_lw = 0x0; /* clear lsw */ a1g = 0x0; /* clear guard bits */ break; case 0x7: a0_hw = dsp_alu_dst_hw; a0_lw = 0x0; /* clear lsw */ a0g = 0x0; /* clear guard bits */ break; case 0x8: x0_hw = dsp_alu_dst_hw; x0_lw = 0x0; /* clear lsw */ break; case 0x9: x1_hw = dsp_alu_dst; x1_lw = 0x0; /* clear lsw */ break; case 0xa: y0_hw = dsp_alu_dst; y0_lw = 0x0; /* clear lsw */ break; case 0xb: y1_hw = dsp_alu_dst; y1_lw = 0x0; /* clear lsw */ break; case 0xc: m0_hw = dsp_alu_dst; m0_lw = 0x0; /* clear lsw */ 354 break; case 0xe: m1_hw = dsp_alu_dst; m1_lw = 0x0; /* clear lsw */ break; default: printf("\nerror:illegal dsp instruction"); break; } } } /* pshl #imm,dz */ 355 if((tmp_imm & 0x0020)==0) { /* left shift 0<= cnt <16 */ char cnt = (tmp_imm & 0x001f); if(cnt > 16) { printf("pshl dz,#imm,dz \nerror! #imm=%6x exceed range\n",tmp_imm); exit(); } dsp_alu_dst_hw = dsp_alu_src1_hw << cnt--; carry_bit = (((dsp_alu_src1_hw << cnt) & 0x8000)==0x8000); } else { /* right shift 0< cnt <=16 */ char cnt = ((~tmp_imm & 0x001f)+1); if(cnt > 16) { printf("pshl dz,#imm,dz \nerror! 
#imm=%6x exceed range\n",tmp_imm); exit(); } dsp_alu_dst_hw = dsp_alu_src1_hw >> cnt--; carry_bit = (((dsp_alu_src1_hw >> cnt) & 0x0001)==0x1); } { /* unconditional operation */ /* alu destination assignment */ switch (zzzz) { /* dz operand selection bit (zzzz) */ case 0x5: a1_hw = dsp_alu_dst_hw; a1_lw = 0x0; /* clear lsw */ a1g = 0x0; /* clear guard bits */ break; case 0x7: a0_hw = dsp_alu_dst_hw; a0_lw = 0x0; /* clear lsw */ a0g = 0x0; /* clear guard bits */ break; case 0x8: x0_hw = dsp_alu_dst_hw; x0_lw = 0x0; /* clear lsw */ break; case 0x9: x1_hw = dsp_alu_dst; x1_lw = 0x0; /* clear lsw */ break; 356 case 0xa: y0_hw = dsp_alu_dst; y0_lw = 0x0; /* clear lsw */ break; case 0xb: y1_hw = dsp_alu_dst; y1_lw = 0x0; /* clear lsw */ break; case 0xc: m0_hw = dsp_alu_dst; m0_lw = 0x0; /* clear lsw */ break; case 0xe: m1_hw = dsp_alu_dst; m1_lw = 0x0; /* clear lsw */ break; default: printf("\nerror:illegal dspinstruction"); break; } carry_bit = 0x0; negative_bit = dsp_alu_dst_msb; zero_bit = (dsp_alu_dst_hw==0); overflow_bit = 0x0; /* dsr register update */ shift_dc_bit(); } } examples: pshl x0,y0,a0 nopx nopy ;before execution: x0=h'22222222, y0=h'00030000, a0=h'123456789a ;after execution: x0=h'22222222, y0=h'00030000, a0=h'0011100000 pshl x1,y1,x1 nopx nopy ;before execution: x1=h'cccccccc, y1=h'fffe0000 ;after execution: x1=h'33330000, y1=h'fffe0000 pshl #7,a1 nopx nopy ;before execution: a1=h'55555555 ;after execution: a1=h'aa800000 in case of unconditional execution, the dc bit is updated depending on the state of cs [2:0]. 357 6.3.19 [if cc] psts (store system register): dsp system control instruction applicable instructions format abstract code cycle dc bit sh-1 sh-2 sh- dsp psts mach,dz mach ? dz 111110********** 110011010000zzzz 1 psts macl,dz macl ? dz 111110********** 110111010000zzzz 1 dct psts mach,dz if dc = 1, mach ? dz if 0, nop 111110********** 110011100000zzzz 1 dct psts macl,dz if dc = 1, macl ? 
dz if 0, nop 111110********** 110111100000zzzz 1 dcf psts mach,dz if dc = 0, mach ? dz if 1, nop 111110********** 110011110000zzzz 1 dcf psts macl,dz if dc = 0, macl ? dz if 1, nop 111110********** 110111110000zzzz 1 description: stores the contents of the mach and macl registers in the dz operand. when conditions are specified for dct and dcf, the instruction is executed when those conditions are true. when they are false, the instruction is not executed. the dc, n, z, v, and gt bits of the dsr register are not updated. 358 note: though psts, movx and movy can be designated in parallel, their execution may take 2 cycles. operation: /* case1 : psts mach,dz */ /* case2 : psts macl,dz */ { if(case1){ /* mach ? dz */ if(dsp_unconditional_update) { /* unconditional operation */ /* alu destination assignment */ switch (zzzz) { /* dz operand selection bit (zzzz) */ case 0x5: a1 = mach; a1g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | 0xffffff00; break; case 0x7: a0 = mach; a0g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | 0xffffff00; break; case 0x8: x0 = mach; break; case 0x9: x1 = mach; break; case 0xa: y0 = mach; break; case 0xb: y1 = mach; break; case 0xc: m0 = mach; break; case 0xe: m1 = mach; break; default: printf("\nerror:illegal dsp instruction"); break; } } 359 else if(dsp_condition_match) { /* conditional operation and match */ /* alu destination assignment */ switch (zzzz) { /* dz operand selection bit (zzzz) */ case 0x5: a1 = mach; a1g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | 0xffffff00; break; case 0x7: a0 = mach; a0g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | 0xffffff00; break; case 0x8: x0 = mach; break; case 0x9: x1 = mach; break; case 0xa: y0 = mach; break; case 0xb: y1 = mach; break; case 0xc: m0 = mach; break; case 0xe: m1 = mach; break; default: printf("\nerror:illegal dsp instruction"); break; } } else{ /* macl ? 
dz */ if(dsp_unconditional_update) { /* unconditional operation */ /* alu destination assignment */ switch (zzzz) { /* dz operand selection bit (zzzz) */ case 0x5: a1 = macl; a1g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | 0xffffff00; break; case 0x7: a0 = macl; 360 a0g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | 0xffffff00; break; case 0x8: x0 = macl; break; case 0x9: x1 = macl; break; case 0xa: y0 = macl; break; case 0xb: y1 = macl; break; case 0xc: m0 = macl; break; case 0xe: m1 = macl; break; default: printf("\nerror:illegal dsp instruction"); break; } } else if(dsp_condition_match) { /* conditional operation and match */ /* alu destination assignment */ switch (zzzz) { /* dz operand selection bit (zzzz) */ case 0x5: a1 = macl; a1g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | 0xffffff00; break; case 0x7: a0 = macl; a0g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | 0xffffff00; break; case 0x8: x0 = macl; break; case 0x9: x1 = macl; break; case 0xa: y0 = macl; break; 361 case 0xb: y1 = macl; break; case 0xc: m0 = macl; break; case 0xe: m1 = macl; break; default: printf("\nerror:illegal dsp instruction"); break; } } } } examples: psts mach,a0 nopx nopy ;before execution: a0=h'123456789a, mach=h'88888888 ;after execution: a0=h'ff88888888, mach=h'88888888 362 6.3.20 [if cc]psub (subtract with condition): dsp arithmetic operation instruction applicable instructions format abstract code cycle dc bit sh-1 sh-2 sh- dsp psub sx,sy,dz sx ?sy ? dz 111110********** 10100001xxyyzzzz 1 update dct psub sx,sy,dz if dc = 1, sx ?sy ? dz if 0, nop 111110********** 10100010xxyyzzzz 1 dcf psub sx,sy,dz if dc = 0, sx ?sy ? dz if 1, nop 111110********** 10100011xxyyzzzz 1 description: subtracts the contents of the sy operand from the sx operand and stores the result in the dz operand. when conditions are specified for dct and dcf, the instruction is executed when those conditions are true. 
when they are false, the instruction is not executed. when conditions are not specified, the dc bit of the dsr register is updated according to the specifications for the cs bits. the n, z, v, and gt bits of the dsr register are updated. if conditions are specified, the dc, n, z, v, and gt bits are not updated even is the conditions were true and the instruction was executed. 363 operation: /* psub sx,sy,dz */ { unsigned char carry_bit, borrow_bit, negative_bit, zero_bit, overflow_bit; /* alu sources assignment */ switch (xx) { /* sx operand selection bit (xx) */ case 0x0: dsp_alu_src1 = x0; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break; case 0x1: dsp_alu_src1 = x1; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break; case 0x2: dsp_alu_src1 = a0; dsp_alu_src1g = a0g; break; case 0x3: dsp_alu_src1 = a1; dsp_alu_src1g = a1g; break; } switch (yy) { /* sy operand selection bit (yy) */ case 0x0: dsp_alu_src2 = y0; break; case 0x1: dsp_alu_src2 = y1; break; case 0x2: dsp_alu_src2 = m0; break; case 0x3: dsp_alu_src2 = m1; break; } if (dsp_alu_src2_msb) dsp_alu_src2g = 0xff; else dsp_alu_src2g = 0x0; dsp_alu_dst = dsp_alu_src1 - dsp_alu_src2; 364 carry_bit =((dsp_alu_src1_msb | !dsp_alu_src2_msb) && !dsp_alu_dst_msb) | (dsp_alu_src1_msb & !dsp_alu_src2_msb); borrow_bit = !carry_bit; dsp_alu_dstg_lsb8 = dsp_alu_src1g_lsb8 - dsp_alu_src2g_lsb8 - borrow_bit; overflow_bit= minus_op_g_ov || !(pos_not_ov || neg_not_ov); overflow_protection(); if(dsp_unconditional_update) { /* unconditional operation */ /* alu destination assignment */ switch (zzzz) { /* dz operand selection bit (zzzz) */ case 0x5: a1 = dsp_alu_dst; a1g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | 0xffffff00; break; case 0x7: a0 = dsp_alu_dst; a0g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | 0xffffff00; break; case 0x8: x0 = dsp_alu_dst; break; case 0x9: x1 = dsp_alu_dst; break; case 0xa: y0 = dsp_alu_dst; break; case 0xb: 
y1 = dsp_alu_dst; break; case 0xc: m0 = dsp_alu_dst; break; case 0xe: m1 = dsp_alu_dst; break; default: printf("\nerror:illegal dsp instruction"); break; } negative_bit = dsp_alu_dstg_bit7; zero_bit = (dsp_alu_dst==0) & (dsp_alu_dstg_lsb8==0); 365 /* dsr register update */ minus_dc_bit(); } else if(dsp_condition_match) { /* conditional operation and match */ /* alu destination assignment */ switch (zzzz) { /* dz operand selection bit (zzzz) */ case 0x5: a1 = dsp_alu_dst; a1g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | 0xffffff00; break; case 0x7: a0 = dsp_alu_dst; a0g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | 0xffffff00; break; case 0x8: x0 = dsp_alu_dst; break; case 0x9: x1 = dsp_alu_dst; break; case 0xa: y0 = dsp_alu_dst; break; case 0xb: y1 = dsp_alu_dst; break; case 0xc: m0 = dsp_alu_dst; break; case 0xe: m1 = dsp_alu_dst; break; default: printf("\nerror:illegal dspinstruction"); break; } } } 366 examples: psub x0,y0,a0 nopx nopy ;before execution: x0=h'55555555, y0=h'33333333, a0=h'123456789a ;after execution: x0=h'55555555, y0=h'33333333, a0=h'0022222222 in case of unconditional execution, the dc bit is updated depending on the state of cs [2:0]. 367 6.3.21 psub pmuls (subtraction & multiply signed by signed): dsp arithmetic operation instruction applicable instructions format abstract code cycle dc bit sh-1 sh-2 sh- dsp psub sx,sy,du sx ?sy ? du 111110********** 1 update pmuls se,sf,dg msw of se msw of sf ? dg 0110eeffxxyygguu description: subtracts the contents of the sy operand from the sx operand and stores the result in the du operand. the contents of the top word of the se and sf operands are multiplied as signed and the result stored in the dg operand. these two processes are executed simultaneously in parallel. the dc bit of the dsr register is updated according to the results of the alu operation and the specifications for the cs bits. 
the n, z, v, and gt bits of the dsr register are also updated according to the results of the alu operation. operation: /* psub sx,sy,du pmuls se,sf,dg */ { unsigned char carry_bit, borrow_bit, negative_bit, zero_bit, overflow_bit; /* multiplier sources assignment */ switch (ee) { /* se operand selection bit (ee) */ case 0x0: dsp_m_src1 = x0_hw; break; case 0x1: dsp_m_src1 = x1_hw; break; case 0x2: dsp_m_src1 = y0_hw; break; case 0x3: dsp_m_src1 = a1_hw; break; } switch (ff) { /* sf operand selection bit (ff) */ 368 case 0x0: dsp_m_src2 = y0_hw; break; case 0x1: dsp_m_src2 = y1_hw; break; case 0x2: dsp_m_src2 = x0_hw; break; case 0x3: dsp_m_src2 = a1_hw; break; } /* alu sources assignment */ switch (xx) { /* sx operand selection bit (xx) */ case 0x0: dsp_alu_src1 = x0; if (dsp_alu_src1_msb) dsp_alu_src1g_lsb8 = 0xff; else dsp_alu_src1g_lsb8 = 0x0; break; case 0x1: dsp_alu_src1 = x1; if (dsp_alu_src1_msb) dsp_alu_src1g_lsb8 = 0xff; else dsp_alu_src1g_lsb8 = 0x0; break; case 0x2: dsp_alu_src1 = a0; dsp_alu_src1g = a0g; break; case 0x3: dsp_alu_src1 = a1; dsp_alu_src1g = a1g; break; } switch (yy) { /* sy operand selection bit (yy) */ case 0x0: dsp_alu_src2 = y0; break; case 0x1: dsp_alu_src2 = y1; break; case 0x2: dsp_alu_src2 = m0; break; case 0x3: dsp_alu_src2 = m1; break; 369 } if (dsp_alu_src2_msb) dsp_alu_src2g_lsb8 = 0xff; else dsp_alu_src2g_lsb8 = 0x0; /* multiplier operation */ /* pmuls se, sf, dg */ if ((sbit==1) && (dsp_m_src1==0x8000) && (dsp_m_src2==0x8000)) { dsp_m_dst=0x7fffffff; /* overflow protection */ } else { dsp_m_dst=((long)(short)dsp_m_src1*(long)(short)dsp_m_src2)<<1; } if (dsp_m_dst_msb) dsp_m_dstg_lsb8 = 0xff; else dsp_m_dstg_lsb8 = 0x0; switch (gg) { /* dg operand selection bit (gg) */ case 0x0: m0 = dsp_m_dst; break; case 0x1: m1 = dsp_m_dst; break; case 0x2: a0 = dsp_m_dst; if(dsp_m_dstg_lsb8==0x0) a0g=0x0; else a0g=0xffffffff; break; case 0x3: a1 = dsp_m_dst; if(dsp_m_dstg_lsb8==0x0) a1g=0x0; else a1g=0xffffffff; break; } /* alu operation 
*/ dsp_alu_dst = dsp_alu_src1 - dsp_alu_src2; carry_bit=((dsp_alu_src1_msb | !dsp_alu_src2_msb)&& !dsp_alu_dst_msb)| (dsp_alu_src1_msb & !dsp_alu_src2_msb); borrow_bit = !carry_bit; dsp_alu_dstg_lsb8=dsp_alu_src1g_lsb8 - dsp_alu_src2g_lsb8 - borrow_bit; 370 overflow_bit= minus_op_g_ov || !(pos_not_ov || neg_not_ov); overflow_protection(); switch (uu) { /* du operand selection bit (uu) */ case 0x0: x0 = dsp_alu_dst; negative_bit = dsp_alu_dst_msb; zero_bit = (dsp_alu_dst==0); break; case 0x1: y0 = dsp_alu_dst; negative_bit = dsp_alu_dst_msb; zero_bit = (dsp_alu_dst==0); break; case 0x2: a0 = dsp_alu_dst; a0g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | 0xffffff00; negative_bit = dsp_alu_dstg_bit7; zero_bit = (dsp_alu_dst==0) & (dsp_alu_dstg_lsb8==0); break; case 0x3: a1 = dsp_alu_dst; a1g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | 0xffffff00; negative_bit = dsp_alu_dstg_bit7; zero_bit = (dsp_alu_dst==0) & (dsp_alu_dstg_lsb8==0); break; } /* dsr register update */ minus_dc_bit(); } 371 examples: psub a0,m0,a0 pmuls x0,y0, m0 nopx nopy ;before execution: x0=h'00020000, y0=h'fffe0000, m0=h'33333333, a0=h'0022222222 ;after execution: x0=h'00020000, y0=h'fffe0000, m0=h'fffffff8, a0=h'55555555 372 6.3.22 psubc (subtraction with carry): dsp arithmetic operation instruction applicable instructions format abstract code cycle dc bit sh-1 sh-2 sh- dsp psubc sx,sy,dz sx ?sy ?dc ? dz 111110********** 10100000xxyyzzzz 1 borrow description: subtracts the contents of the sy operand and the dc bit from the sx operand and stores the result in the dz operand. the dc bit of the dsr register is updated as the borrow flag. the n, z, v, and gt bits of the dsr register are also updated. note: after the psubc instruction is executed, the dc bit is updated as the borrow flag without regard to the cs bit. 
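The borrow handling described above can be illustrated with a short C sketch. It models only the 32-bit data part and the dc bit, assuming the borrow is taken out of the 32-bit data part as the carry_bit expression in the operation listing suggests. Guard bits, overflow protection and the n, z, v and gt flags are omitted, and the function name psubc32 is a hypothetical helper, not something defined by the manual.

```c
#include <assert.h>
#include <stdint.h>

/* Compute sx - sy - dc_in on 32 bits.
   *dc_out receives the borrow: 1 when sx is smaller than sy + dc_in
   with the operands read as unsigned values, otherwise 0. */
static uint32_t psubc32(uint32_t sx, uint32_t sy, unsigned dc_in, unsigned *dc_out)
{
    assert(dc_out != 0);                        /* caller must supply storage */
    uint64_t subtrahend = (uint64_t)sy + (dc_in & 1u);
    *dc_out = ((uint64_t)sx < subtrahend) ? 1u : 0u;
    return (uint32_t)((uint64_t)sx - subtrahend);   /* keep the low 32 bits */
}
```

Feeding the dc output of one call into the next call is what allows a multi-word subtraction to be chained, which is the usual reason for a subtract-with-borrow instruction.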
operation: /* psubc sx,sy,dz */ { unsigned char carry_bit, borrow_bit, negative_bit, zero_bit, overflow_bit; /* alu sources assignment */ switch (xx) { /* sx operand selection bit (xx) */ case 0x0: dsp_alu_src1 = x0; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break; case 0x1: dsp_alu_src1 = x1; if (dsp_alu_src1_msb) dsp_alu_src1g = 0xff; else dsp_alu_src1g = 0x0; break; case 0x2: dsp_alu_src1 = a0; dsp_alu_src1g = a0g; break; case 0x3: dsp_alu_src1 = a1; dsp_alu_src1g = a1g; break; } 373 switch (yy) { /* sy operand selection bit (yy) */ case 0x0: dsp_alu_src2 = y0; break; case 0x1: dsp_alu_src2 = y1; break; case 0x2: dsp_alu_src2 = m0; break; case 0x3: dsp_alu_src2 = m1; break; } if (dsp_alu_src2_msb) dsp_alu_src2g = 0xff; else dsp_alu_src2g = 0x0; dsp_alu_dst = dsp_alu_src1 - dsp_alu_src2 - dspdcbit; carry_bit =((dsp_alu_src1_msb | !dsp_alu_src2_msb) && !dsp_alu_dst_msb) | (dsp_alu_src1_msb & !dsp_alu_src2_msb); borrow_bit = !carry_bit; dsp_alu_dstg_lsb8 = dsp_alu_src1g_lsb8 - dsp_alu_src2g_lsb8 - borrow_bit; overflow_bit= minus_op_g_ov || !(pos_not_ov || neg_not_ov); overflow_protection(); /* alu destination assignment */ switch (zzzz) { /* dz operand selection bit (zzzz) */ case 0x5: a1 = dsp_alu_dst; a1g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a1g = a1g | 0xffffff00; break; case 0x7: a0 = dsp_alu_dst; a0g = dsp_alu_dstg & 0x000000ff; if(dsp_alu_dstg_bit7) a0g = a0g | 0xffffff00; break; case 0x8: x0 = dsp_alu_dst; break; case 0x9: x1 = dsp_alu_dst; break; case 0xa: y0 = dsp_alu_dst; break; case 0xb: y1 = dsp_alu_dst; 374 break; case 0xc: m0 = dsp_alu_dst; break; case 0xe: m1 = dsp_alu_dst; break; default: printf("\nerror:illegal dspinstruction"); break; } negative_bit = dsp_alu_dstg_bit7; zero_bit = (dsp_alu_dst==0) & (dsp_alu_dstg_lsb8==0); /* dsr register update */ dc_always_borrow(); } example : cs[2:0]=***: always carry or borrow mode psubc x0,y0,m0 nopx nopy ;before execution: x0=h'33333333, y0=h'55555555 m0=h'00 
12345678, dc=0
;after execution: x0=h'33333333, y0=h'55555555 m0=h'ffddddddde, dc=1
psubc x0,y0,m0 nopx nopy
;before execution: x0=h'33333333, y0=h'55555555 m0=h'00 12345678, dc=1
;after execution: x0=h'33333333, y0=h'55555555 m0=h'ffdddddddd, dc=1

6.3.23 [if cc] pxor (logical exclusive or): dsp logical operation instruction

applicable instructions
sh-1  sh-2  sh-dsp  format             abstract                                             code                               cycle  dc bit
                    pxor sx,sy,dz      sx ^ sy → dz, clear lsw of dz                        111110********** 10100101xxyyzzzz  1      update
                    dct pxor sx,sy,dz  if dc = 1, sx ^ sy → dz, clear lsw of dz; if 0, nop  111110********** 10100110xxyyzzzz  1
                    dcf pxor sx,sy,dz  if dc = 0, sx ^ sy → dz, clear lsw of dz; if 1, nop  111110********** 10100111xxyyzzzz  1

description: Takes the exclusive or of the top word of the sx operand and the top word of the sy operand, stores the result in the top word of the dz operand, and clears the bottom word of dz with zeros. When dz is a register that has guard bits, the guard bits are also zeroed. When a condition is specified with dct or dcf, the instruction is executed when that condition is true and is not executed when it is false. When no condition is specified, the dc bit of the dsr register is updated according to the specifications for the cs bits, and the n, z, v, and gt bits of the dsr register are also updated. If a condition is specified, the dc, n, z, v, and gt bits are not updated even if the condition was true and the instruction was executed.

note: The bottom word of the destination register and the guard bits are ignored when the dc bit is updated.
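As a rough illustration of the description above, the following C sketch models pxor on 32-bit source registers. The struct and function names are hypothetical. The sketch assumes the n flag mirrors the most significant bit of the result and the z flag is set when the top word is zero, and it leaves the cs-dependent dc update out.

```c
#include <stdint.h>

/* Result of the modeled operation: 8 guard bits, 32 data bits, two flags. */
typedef struct {
    uint8_t  guard;   /* always cleared by pxor */
    uint32_t data;    /* top word = sx_hw ^ sy_hw, bottom word = 0 */
    unsigned n;       /* negative flag: msb of the result */
    unsigned z;       /* zero flag: top word of the result is zero */
} pxor_result;

static pxor_result pxor_model(uint32_t sx, uint32_t sy)
{
    pxor_result r;
    r.guard = 0;
    r.data  = (sx ^ sy) & 0xFFFF0000u;   /* only the top words take part */
    r.n     = (r.data >> 31) & 1u;
    r.z     = ((r.data >> 16) == 0);
    return r;
}
```

With x0=h'33333333 and y0=h'55555555 the model returns guard h'00 and data h'66660000.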
Operation:

/* pxor sx,sy,dz */
{
    unsigned char carry_bit, negative_bit, zero_bit, overflow_bit;

    /* alu sources assignment */
    switch (xx) { /* sx operand selection bit (xx) */
    case 0x0: dsp_alu_src1 = x0; break;
    case 0x1: dsp_alu_src1 = x1; break;
    case 0x2: dsp_alu_src1 = a0; break;
    case 0x3: dsp_alu_src1 = a1; break;
    }
    switch (yy) { /* sy operand selection bit (yy) */
    case 0x0: dsp_alu_src2 = y0; break;
    case 0x1: dsp_alu_src2 = y1; break;
    case 0x2: dsp_alu_src2 = m0; break;
    case 0x3: dsp_alu_src2 = m1; break;
    }

    dsp_alu_dst_hw = dsp_alu_src1_hw ^ dsp_alu_src2_hw;

    if (dsp_unconditional_update) { /* unconditional operation */
        /* alu destination assignment */
        switch (zzzz) { /* dz operand selection bit (zzzz) */
        case 0x5:
            a1_hw = dsp_alu_dst_hw;
            a1_lw = 0x0; /* clear lsw */
            a1g = 0x0;   /* clear guard bits */
            break;
        case 0x7:
            a0_hw = dsp_alu_dst_hw;
            a0_lw = 0x0; /* clear lsw */
            a0g = 0x0;   /* clear guard bits */
            break;
        case 0x8: x0_hw = dsp_alu_dst_hw; x0_lw = 0x0; /* clear lsw */ break;
        case 0x9: x1_hw = dsp_alu_dst_hw; x1_lw = 0x0; /* clear lsw */ break;
        case 0xa: y0_hw = dsp_alu_dst_hw; y0_lw = 0x0; /* clear lsw */ break;
        case 0xb: y1_hw = dsp_alu_dst_hw; y1_lw = 0x0; /* clear lsw */ break;
        case 0xc: m0_hw = dsp_alu_dst_hw; m0_lw = 0x0; /* clear lsw */ break;
        case 0xe: m1_hw = dsp_alu_dst_hw; m1_lw = 0x0; /* clear lsw */ break;
        default: printf("\nerror:illegal dsp instruction"); break;
        }
        carry_bit = 0x0;
        negative_bit = dsp_alu_dst_msb;
        zero_bit = (dsp_alu_dst_hw == 0);
        overflow_bit = 0x0;
        /* dsr register update */
        logical_dc_bit();
    }
    else if (dsp_condition_match) { /* conditional operation and match */
        /* alu destination assignment */
        switch (zzzz) { /* dz operand selection bit (zzzz) */
        case 0x5:
            a1_hw = dsp_alu_dst_hw;
            a1_lw = 0x0; /* clear lsw */
            a1g = 0x0;   /* clear guard bits */
            break;
        case 0x7:
            a0_hw = dsp_alu_dst_hw;
            a0_lw = 0x0; /* clear lsw */
            a0g = 0x0;   /* clear guard bits */
            break;
        case 0x8: x0_hw = dsp_alu_dst_hw; x0_lw = 0x0; /* clear lsw */ break;
        case 0x9: x1_hw = dsp_alu_dst_hw; x1_lw = 0x0; /* clear lsw */ break;
        case 0xa: y0_hw = dsp_alu_dst_hw; y0_lw = 0x0; /* clear lsw */ break;
        case 0xb: y1_hw = dsp_alu_dst_hw; y1_lw = 0x0; /* clear lsw */ break;
        case 0xc: m0_hw = dsp_alu_dst_hw; m0_lw = 0x0; /* clear lsw */ break;
        case 0xe: m1_hw = dsp_alu_dst_hw; m1_lw = 0x0; /* clear lsw */ break;
        default: printf("\nerror:illegal dsp instruction"); break;
        }
    }
}

Example:

    pxor x0,y0,a0  nopx  nopy
    ;before execution: x0=h'33333333, y0=h'55555555, a0=h'123456789a
    ;after execution:  x0=h'33333333, y0=h'55555555, a0=h'0066660000

In case of unconditional execution, the DC bit is updated depending on the state of CS[2:0].

Section 7 Pipeline Operation

This section describes the operation of the pipelines for each instruction. This information is provided to allow calculation of the required number of CPU instruction execution states (system clock cycles).

7.1 Basic Configuration of Pipelines

7.1.1 The Five-Stage Pipeline

Pipelines are composed of the following five stages:

1. IF (instruction fetch): fetches the instruction from the memory where the program is stored.
2. ID (instruction decode): decodes the instruction fetched.
3. EX (instruction execution): does data operations and address calculations according to the results of decoding.
4. MA (memory access): accesses data in memory. Generated by instructions that involve memory access, with some exceptions.
5. WB/DSP (W/D) (write back (CPU core) or DSP (DSP unit)):
   Write back: returns the results of the memory access (data) to a register. Generated by instructions that involve memory loads, with some exceptions.
   DSP: does operations using the DSP unit's ALU and MAC. Also, the results of memory accesses (data) are returned to registers; not generated during writes to memory or no operation (nop).

These stages flow with the execution of the instructions and thereby constitute a pipeline. At a given instant, five instructions are being executed simultaneously.
The basic pipeline flow is as shown in figure 7.1. The period in which a single stage is operating is called a slot and is indicated by two-way arrows (↔). All instructions have at least the three stages IF, ID, and EX, but not all have stages MA and WB/DSP. The way the pipeline flows also varies with the type of instruction. Some pipelines differ, however, because of contention between IF and MA.

[Figure 7.1 Basic structure of pipeline flow: instructions 1 to 5 of the instruction stream, each starting one slot after the previous one and passing through IF, ID, EX, MA, and WB/DSP over time]

7.1.2 Slot and Pipeline Flow

The time period in which a single stage operates is called a slot. Slots must follow the rules described below. Figures 7.2 and 7.3 show impossible pipeline flows.

Instruction execution: Each stage (IF, ID, EX, MA, WB/DSP) of an instruction must be executed in one slot. Two or more stages cannot be executed within one slot (figure 7.2), with the exception of WB and MA. Since WB is executed immediately after MA, some instructions may execute MA and WB within the same slot.

[Figure 7.2 Impossible pipeline flow 1. Note: ID and EX of instruction 1 are executed in the same slot.]

Slot sharing: A maximum of one stage from another instruction may be set per slot, and that stage must be different from the stage of the first instruction. Identical stages from two different instructions may never be executed within the same slot (figure 7.3).
[Figure 7.3 Impossible pipeline flow 2. Note: the same stage of another instruction is being executed in the same slot.]

7.1.3 Slot Length

The number of states (system clock cycles) S for the execution of one slot is calculated with the following conditions:

S = (the cycles of the stage with the highest number of cycles of all instruction stages contained in the slot).

This means that the instruction with the longest stage stalls others with shorter stages.

The number of execution cycles for each stage:
• IF: the number of memory access cycles for instruction fetch
• ID: always one cycle
• EX: always one cycle
• MA: the number of memory access cycles for data access
• WB/DSP: always one cycle

As an example, figure 7.4 shows the flow of a pipeline in which the IF (memory access for instruction fetch) of instructions 1 and 2 are two cycles, the MA (memory access for data access) of instruction 1 is three cycles, and all others are one cycle. The dashes indicate the instruction is being stalled.

[Figure 7.4 Slots requiring multiple cycles: the slots take 2, 2, 1, 3, 1, and 1 cycles in turn]

7.1.4 Number of Instruction Execution Cycles

The number of instruction execution cycles is counted as the interval between execution of EX stages. The number of cycles between the start of the EX stage for instruction 1 and the start of the EX stage for the following instruction (instruction 2) is the execution time for instruction 1. For example, in a pipeline flow like that shown in figure 7.5, the EX stage interval between instructions 1 and 2 is five cycles, so the execution time for instruction 1 is five cycles. Since the interval between EX stages for instructions 2 and 3 is one cycle, the execution time of instruction 2 is one cycle.
If a program ends with instruction 3, the execution time for instruction 3 should be calculated as the interval between the EX stage of instruction 3 and the EX stage of a hypothetical instruction 4, using a mov rm, rn that follows instruction 3. (In figure 7.5, the execution time of instruction 3 would thus be one cycle.) In this example, the MA of instruction 1 and the IF of instruction 4 are in contention. For operation during the contention between the MA and IF, see section 7.2.1, Contention between Instruction Fetch (IF) and Memory Access (MA).

[Figure 7.5 Method for counting instruction execution cycles; the hypothetical instruction 4 is mov rm, rn]

7.2 Contention

Contention occurs in four cases. When it occurs, the slot splits and requires at least two cycles.

1. Contention between instruction fetch (IF) and memory access (MA)
2. Contention when the previous instruction's destination register is used
3. Multiplier access contention
4. Contention between memory stores (MA) and either DSP operations or memory loads (WB/DSP)

7.2.1 Contention between Instruction Fetch (IF) and Memory Access (MA)

Basic operation when IF and MA are in contention (common): The IF and MA stages both access memory, so they cannot operate simultaneously. When the IF and MA stages both try to access memory within the same slot, the slot splits as shown in figure 7.6. When there is a WB, it is executed immediately after the MA ends.
[Figure 7.6 Operation when IF and MA are in contention: the MA of instruction 1 and the IF of instruction 4 contend at slot D, and the MA of instruction 2 and the IF of instruction 5 contend at slot E. When MA and IF are in contention, the slots split at D and at E.]

The slots in which MA and IF contend are split into two cycles. MA is given priority to execute in the first half (when there is a WB, it immediately follows the MA), and the EX, ID, and IF are executed simultaneously in the latter half. For example, in figure 7.6 the MA of instruction 1 is executed in slot D while the EX of instruction 2, the ID of instruction 3, and the IF of instruction 4 are executed simultaneously thereafter. In slot E, the MA of instruction 2 is given priority and the EX of instruction 3, the ID of instruction 4, and the IF of instruction 5 are executed thereafter.

The number of cycles for a slot in which MA and IF are in contention is the sum of the number of memory access cycles for the MA and the number of memory access cycles for the IF.

Relationship between IF and the location of instructions in on-chip ROM/RAM or on-chip memory (SH-1 and SH-2): When the instruction is located in the on-chip memory (ROM or RAM) or on-chip cache of the SuperH microcomputer, the SuperH microcomputer accesses the on-chip memory in 32-bit units. The SuperH microcomputer instructions are all fixed at 16 bits, so basically two instructions can be fetched in a single IF stage access. If an instruction is located on a longword boundary, an IF can get two instructions at each instruction fetch. The IF of the next instruction does not generate a bus cycle to fetch an instruction from memory.
Since the next instruction IF also fetches two instructions, the instruction IFs after that do not generate a bus cycle either. This means that the IF of any instruction in on-chip memory that starts on a longword boundary (the bottom two bits of the instruction address are 00, that is, A1 = 0 and A0 = 0) fetches two instructions, and the IF of the next instruction does not generate a bus cycle. IFs that do not generate bus cycles are written in lower case as "if". These "if"s always take one state.

When branching results in a fetch from an instruction that starts on a word boundary (the bottom two bits of the instruction address are 10, that is, A1 = 1 and A0 = 0), the bus cycle of the IF fetches only the specified instruction rather than two instructions. The IF of the next instruction thus generates a bus cycle, and fetches two instructions. Figure 7.7 illustrates these operations.

[Figure 7.7 Relationship between IF and location of instructions in on-chip memory: two panels over 32-bit-wide on-chip memory or on-chip cache, one fetching from an instruction (instruction 1) located on a longword boundary and one fetching from an instruction (instruction 2) located on a word boundary. IF: bus cycle generated; if: no bus cycle.]

Relationship between position of instructions located in on-chip ROM/RAM or on-chip memory and contention between IF and MA (SH-1 and SH-2): When an instruction is located in on-chip memory (ROM/RAM) or on-chip cache, there are instruction fetch stages ("if", written in lower case) that do not generate bus cycles, as explained above. When an "if" is in contention with an MA, the slot will not split, as it does when an IF and an MA are in contention, because "if"s and MAs can be executed simultaneously. Such slots execute in the number of states the MA requires for memory access, as illustrated in figure 7.8.

When programming, avoid contention of MA and IF whenever possible and pair MAs with "if"s to increase the instruction execution speed. Instructions that have 4 (5)-stage pipelines of IF, ID, EX, MA, (WB) prevent stalls when they start on a longword boundary in on-chip memory (the bottom two bits of the instruction address are 00, that is, A1 = 0 and A0 = 0) because the MA of the instruction falls in the same slot as the "if"s that follow.

[Figure 7.8 Relationship between the location of instructions in on-chip memory and contention between IF and MA. The MA in slot A is in contention with an "if", so no split occurs. The MA in slot B is in contention with an IF, so it splits.]

Relationship between position of instructions located in on-chip memory and contention between IF and MA: When an instruction is located in on-chip memory, there are instruction fetch stages ("if", written in lower case) that do not generate bus cycles. When an "if" is in contention with an MA, the slot will not split, as it does when an IF and an MA are in contention, because "if"s and MAs can be executed simultaneously. Such slots execute in the number of cycles the MA requires for memory access. When programming, avoid contention of MA and IF whenever possible and pair MAs with "if"s to increase the instruction execution speed.

7.2.2 Contention When the Previous Instruction's Destination Register Is Used

Relationship between load instructions and the instructions that follow: Instructions that involve loading from memory return data to the destination register during the WB/DSP stage, which comes at the end of the pipeline. The WB/DSP stage of such a load instruction (load instruction 1) will thus not have ended before the EX stage of the instruction that immediately follows it (instruction 2) begins.

When instruction 2 uses the same destination register as load instruction 1, the contents of that register will not be ready, so any slot containing the MA of instruction 1 and EX of instruction 2 will split. When the destination register of load instruction 1 is the same as the destination, not the source, of instruction 2, it will still split.

When the destination of load instruction 1 is the status register (SR) and the flag in it is fetched by instruction 2 (as addc does), a split occurs. No split occurs, however, in the following cases:
• When instruction 2 is a load instruction and its destination is the same as that of load instruction 1
• When instruction 2 is mac @rm+,@rn+ and rm is the same register as the destination of load instruction 1

The number of cycles in the slot generated by the split is the number of MA cycles plus the number of IF (or "if") cycles, as shown in figure 7.9. This means the execution speed will be lowered if the instruction that will use the results of the load instruction is placed immediately after the load instruction. The instruction that uses the result of the load instruction will not slow down the program if placed one or more instructions after the load instruction.

[Figure 7.9 Effects of memory load instructions on the pipeline (1): load instruction 1 (mov @ra,rb) followed by instruction 2 (add rb,rc)]

When data is loaded to a register in the previous instruction and the following memory access instruction uses that register as an address pointer, the memory access is extended until the data load of the MA stage of the previous instruction ends.

[Figure 7.10 Effects of memory load instructions on the pipeline (2): load instruction 1 (mov @ra,rb) followed by instruction 2 (mov @rb,rc)]

In the DSP unit, all operation instructions are executed in the WB/DSP stage, so transfers and operations do not contend. When the destination of the previous mov instruction is used as the address pointer for the following instruction, however, contention can occur.
[Figure 7.11 Effects of memory load instructions in the DSP unit on the pipeline: load instruction 1 (movx @ra,x0) followed by instruction 2 (padd x0,y0,a0)]

Relationship between data operation instructions and store instructions: When DSP operations are executed by the DSP unit and the results are stored in memory by the next instruction, contention occurs just as with memory load instructions. In such cases, the data store of the MA stage of the following instruction is extended until the data operation of the WB/DSP stage of the previous instruction ends. Since the operation is executed in the EX stage by the CPU core, however, no stall cycle is produced there. Figure 7.12 shows the relationship between DSP unit data operation instructions and store instructions; figure 7.13 shows the relationship for the CPU core.

[Figure 7.12 Relationship between DSP engine operation instructions and store instructions: instruction 1 (padd x0,y0,a0) followed by instruction 2 (movx a0,@ra)]

[Figure 7.13 Relationship between CPU core operation instructions and store instructions: instruction 1 (add ra,rb) followed by instruction 2 (mov rb,@rc)]

Relationship between load and store instructions: When data is loaded from memory to the destination register and the register is then specified as the source operand for a following store instruction, the preceding instruction's load is executed in the WB/DSP stage and the following instruction's store is executed in the MA stage. These stages are executed in exactly the same cycle. Nevertheless, they do not contend. The CPU core and DSP unit use the same data transfer method.
In this case, when the data input to the internal bus is stored to the destination register, the same data is simultaneously output again to the internal bus.

[Figure 7.14 Relationship between load and store instructions in the CPU core: instruction 1 (mov.l @ra,rn) followed by instruction 2 (mov.l rn,@rb)]

[Figure 7.15 Relationship between load and store instructions in the DSP unit: instruction 1 (movs.l @r4,ds) followed by instruction 2 (movs.l ds,@r5)]

Relationship between mac and sts instructions: The mac.w instruction has two MA stages and two MM (multiplier access) stages. When an sts instruction that stores the MACL or MACH register in the rn register comes after a mac.w instruction, the MA stage of the sts instruction is executed after the MM stage of the mac.w instruction ends. Likewise, when an sts instruction that stores the MACL or MACH register in memory comes after a mac.w instruction, the MA stage of the sts instruction is executed after the MM stage of the mac.w instruction ends.

[Figure 7.16 Relationship between mac.w and sts instructions: instruction 1 (mac.w @ra+,@rb+) followed by instruction 2 (sts macl,rc)]

[Figure 7.17 Example of multiplier access contention: mac.l and sts.l instructions]

7.2.3 Multiplier Access Contention

Multiplier type instructions (multiply/accumulate instructions and multiplication instructions) and instructions that access the multiply and accumulate calculation registers (MACH and MACL) contend for multiplier access.
In multiplier type instructions, the multiplier operates for either four cycles (double-length 64-bit instructions) or two cycles (single-length 32-bit instructions) after the MA ends, regardless of the slot. When the MA (or the second MA, if there are two) of a multiplier type instruction (multiply/accumulate instructions and multiplication instructions) contends with the multiplier access (MM) of the previous multiplier type instruction, the bus cycle of the MA is extended until the MM ends. The extended MA becomes a single slot. The ID of the instruction following a double-length instruction also stalls until one slot later.

Multiplier type instructions and instructions that access the multiply and accumulate calculation registers have MA stages, so they also contend with IFs. Figure 7.18 shows an example of multiplier access contention, but it does not address MA and IF contention.

[Figure 7.18 Example of multiplier access contention: mac.l and mac.l instructions]

7.2.4 Contention between Memory Stores and DSP Operations

When an instruction that stores the result of a DSP operation is placed immediately after that DSP operation instruction, the result is not available in time for the store. To prevent this, a stall cycle is inserted. For more information, see section 4.17.2, Single Data Transfers.

7.3 Programming Guide

7.3.1 Types of Contention and Affected Instructions

Types of contention and the instructions they affect are summarized below.
• Instructions without contention
• Instructions with memory accesses (MA) that contend with instruction fetches (IF)
• Instructions that store the result of the immediately preceding DSP operation in memory using the X bus or Y bus
• Instructions with memory accesses (MA) that contend with instruction fetches (IF), also have write backs (WB/DSP), and may cause contention with memory loads
• Instructions with memory accesses (MA) that contend with instruction fetches (IF), also access the multiplier (MM), and may cause contention with the multiplier
• Instructions that store DSP operation results in memory, because the memory access (MA) contends with an instruction fetch (IF)
• Instructions with memory accesses (MA) that contend with instruction fetches (IF), access the multiplier (MM), and may cause contention with the multiplier, and also have write backs (WB/DSP) and may cause contention with memory loads
• Instructions that cause contention with movx.w, movy.w, or movs.l instructions

Table 7.1 shows the correspondence between types of contention and instructions.
Table 7.1 Types of Contention and Instructions

Contention: none
  Cycles      Stages  Instructions
  1           3       Inter-register transfer instructions
                      Inter-register operations (except multiplier type instructions)
                      Inter-register logic operation instructions
                      Shift instructions
                      System control ALU instructions
  2           3       Unconditional branch instructions
  3/1         3       Conditional branch instructions
  2/1         3       Delayed conditional branch instruction
  3           3       sleep instruction
  4           5       rte instruction
  8           9       Trap instruction
  1           5       DSP operation instructions
                      movx.w (load) and movy.w (load) instructions

Contention: MA contends with IF
  Cycles      Stages  Instructions
  1           4       Memory store instructions
                      sts.l instruction (pr)
  2           4       stc.l instruction
  3           6       Memory logic operations
  4           6       tas instruction
  1           5       movs.w (load) and movs.l (load) instructions

Contention: causes DSP operation contention
  Cycles      Stages  Instructions
  1           4       movx.w (store) and movy.w (store) instructions

Contention: MA contends with IF; causes memory load contention
  Cycles      Stages  Instructions
  1           5       Memory load instructions
                      lds.l instruction (pr)
  3           5       ldc.l instruction

Contention: MA contends with IF; causes multiplier contention
  Cycles      Stages  Instructions
  1           4       Register to MAC transfer instructions (MACH/MACL)
                      Memory to MAC transfer instructions (MACH/MACL)
                      MAC to memory transfer instructions (MACH/MACL)
  1 (to 3)*   6       Multiplication instructions
  2 (to 3)*   7       Multiply and accumulate calculation instructions
  2 (to 4)*   9       Double-length multiplication instructions
  2 (to 4)*   9       Double-length multiply and accumulate calculation instructions

Contention: MA contends with IF; causes DSP operation contention
  Cycles      Stages  Instructions
  1           4       movs.w (store) and movs.l (store) instructions

Contention: MA contends with IF; causes multiplier contention; causes DSP operation contention; causes memory load contention
  Cycles      Stages  Instructions
  1           5       sts instruction (except pr)

Contention: causes movx.w, movy.w, movs.w or movs.l instruction contention
  Cycles      Stages  Instructions
  1           5       plds and psts instructions

Note: * indicates the normal number of cycles.
The figures in parentheses are the cycles when contention also occurs with the previous instruction.

7.3.2 Increasing Instruction Execution Speed

Instruction execution speed can be increased by trying, at the programming stage, to keep contention from occurring. Follow these rules when writing programs to minimize contention:

1. A 32-bit DSP instruction can require up to three memory accesses per cycle: one instruction (I-bus), one X-data (X-bus), and one Y-data (Y-bus). The SH-DSP has four independently accessible on-chip memory areas: X-ROM, X-RAM, Y-ROM, and Y-RAM. If more than one access is performed in the same memory area in a cycle, a stall occurs. Locate the program (instructions) and the data arrays that the program accesses in different on-chip memory areas. This prevents memory bank contention in DSP instructions.
2. Follow instructions that compute a value in the DSP unit and write it to a DSP register with instructions that do not store the same register to memory. This prevents DSP register contention, because storing a DSP register that was the destination of a DSP calculation in the previous cycle will cause a stall.
3. Instruction fetch (IF) can conflict with an SH data memory access (MA) because both use the same bus. Whether the instruction fetch occurs in a specific cycle depends on the locations and size (16 bit or 32 bit) of the preceding instructions. Try to locate the SH instructions that perform memory access at longword boundaries in on-chip memory and use a 16-bit instruction as the next instruction. This prevents contention between memory accesses and instruction fetches.
4. Follow instructions that load an SH register (R0 to R15) from memory with instructions that do not use the same register as the load instruction's destination register. This prevents memory load contention caused by write backs (WB/DSP). Note: The DSP registers (A0 to Y1) loaded in the previous cycle can be used in this cycle without causing any stalls.
5. Do not place two instructions that use the multiplier consecutively (the pmuls instruction is excepted from this rule). Also try to keep accesses of the MACH and MACL registers for getting the results from the multiplier away from instructions that use the multiplier. This prevents multiplier contention caused by multiplier accesses (MM).
6. Avoid transferring a DSP register to memory or to a CPU core register immediately after a DSP unit data operation that stores its result in that register. Avoid contention by placing another instruction before the transfer.

7.3.3 Cycles

Basic instructions are designed to execute in one cycle. One-cycle instructions include both instructions that cause contention and instructions that do not. Operations and transfers that occur between registers do not create contention.

Some instructions require two or more cycles even when there is no contention: instructions that change the branch destination address, such as branch instructions; memory logic operation instructions; instructions that access memory twice or more, such as some system control instructions; and instructions that have both memory accesses and multiplier accesses, such as multiplication instructions and multiply and accumulate instructions (excluding pmuls). Instructions that require two or more cycles also include both instructions that cause contention and instructions that do not.

To write efficient programs, it is essential to avoid contention, keep instruction execution speed up, and use instructions with fewer stages.

7.4 Operation of Instruction Pipelines

This section describes the operation of the instruction pipelines. By combining these with the rules described so far, the way pipelines flow in a program and the number of instruction execution cycles can be calculated.

In the following figures, "instruction A" refers to the instruction being discussed.
when ?f?is written in the instruction fetch stage, it may refer to either ?f?or ?f? when there is contention between if and ma, the slot will split, but the manner of the split is not discussed in the tables, with a few exceptions. when a slot has split, see section 7.2.1, contention between instruction 397 fetch (if) and memory access (ma). base your response on the rules for pipeline operation given there. table 7.2 shows the number of instruction stages and number of execution cycles as follows: type: given by function category: categorized by differences in instruction operation stages: the number of stages in the instruction cycles: the number of execution cycles when there is no contention contention: indicates the contention that occurs instructions: gives a mnemonic for the instruction concerned 398 table 7.2 number of instruction stages and execution cycles type category instruction stages cycles contention data transfer instructions register- register transfer instructions mov #imm,rn mov rm,rn mova @(disp,pc),r0 movt rn swap.b rm,rn swap.w rm,rn xtrct rm,rn 31 memory load instructions mov.w @(disp,pc),rn mov.l @(disp,pc),rn mov.b rm,@rn mov.w rm,@rn mov.l rm,@rn mov.b @rm+,rn mov.w @rm+,rn mov.l @rm+,rn mov.b @(disp,rm),r0 mov.w @(disp,rm),r0 mov.l @(disp,rm),rn mov.b @(r0,rm),rn mov.w @(r0,rm),rn mov.l @(r0,rm),rn mov.b @(disp,gbr),r0 mov.w @(disp,gbr),r0 mov.l @(disp,gbr),r0 51 contention occurs if the instruction placed immediately after this cpu instruction uses the same destination register ma contends with if 399 table 7.2 number of instruction stages and execution cycles (cont) type category instruction stages cycles contention data transfer instructions (cont) memory store instructions mov.b @rm,rn mov.w @rm,rn mov.l @rm,rn mov.b rm,@?n mov.w rm,@?n mov.l rm,@?n mov.b r0,@(disp,rn) mov.w r0,@(disp,rn) mov.l rm,@(disp,rn) mov.b rm,@(r0,rn) mov.w rm,@(r0,rn) mov.l rm,@(r0,rn) mov.b r0,@(disp,gbr) mov.w r0,@(disp,gbr) mov.l r0,@(disp,gbr) 4 1 ma 
contends with IF

Table 7.2 Number of Instruction Stages and Execution Cycles (cont)

Type: Arithmetic instructions

  Category: Arithmetic instructions between registers (except multiplication instructions)
  Instructions: ADD Rm,Rn; ADD #imm,Rn; ADDC Rm,Rn; ADDV Rm,Rn; CMP/EQ #imm,R0; CMP/EQ Rm,Rn; CMP/HS Rm,Rn; CMP/GE Rm,Rn; CMP/HI Rm,Rn; CMP/GT Rm,Rn; CMP/PZ Rn; CMP/PL Rn; CMP/STR Rm,Rn; DIV1 Rm,Rn; DIV0S Rm,Rn; DIV0U; DT Rn; EXTS.B Rm,Rn; EXTS.W Rm,Rn; EXTU.B Rm,Rn; EXTU.W Rm,Rn; NEG Rm,Rn; NEGC Rm,Rn; SUB Rm,Rn; SUBC Rm,Rn; SUBV Rm,Rn
  Stages: 3. Cycles: 1.

  Category: Multiply/add instructions
  Instructions: MAC.W @Rm+,@Rn+
  Stages: 7/8 *3. Cycles: 2 (to 3) *1.
  Contention: Multiplier contention occurs when an instruction that uses the multiplier follows a MAC instruction. MA contends with IF.

  Category: Double-length multiply/accumulate instruction
  Instructions: MAC.L @Rm+,@Rn+
  Stages: 9. Cycles: 2 (to ?) *1.
  Contention: Multiplier contention occurs when an instruction that uses the multiplier follows a MAC instruction. MA contends with IF.

  Category: Multiplication instructions
  Instructions: MULS.W Rm,Rn; MULU.W Rm,Rn
  Stages: 6/7 *3. Cycles: 1 (to ?) *1.
  Contention: Multiplier contention occurs when an instruction that uses the multiplier follows a MUL instruction. MA contends with IF.

  Category: Double-length multiply/accumulate instruction
  Instructions: DMULS.L Rm,Rn; DMULU.L Rm,Rn; MUL.L Rm,Rn
  Stages: 9. Cycles: 2 (to ?) *1.
  Contention: Multiplier contention occurs when an instruction that uses the multiplier follows a MAC instruction. MA contends with IF.

Type: Logic operation instructions

  Category: Register-register logic operation instructions
  Instructions: AND Rm,Rn; AND #imm,R0; NOT Rm,Rn; OR Rm,Rn; OR #imm,R0; TST Rm,Rn; TST #imm,R0; XOR Rm,Rn; XOR #imm,R0
  Stages: 3. Cycles: 1.

  Category: Memory logic operation instructions
  Instructions: AND.B #imm,@(R0,GBR); OR.B #imm,@(R0,GBR); TST.B #imm,@(R0,GBR); XOR.B #imm,@(R0,GBR)
  Stages: 6. Cycles: 3.
  Contention: MA contends with IF.

  Category: TAS instruction
  Instructions: TAS.B @Rn
  Stages: 6. Cycles: 4.
  Contention: MA contends with IF.

Type: Shift instructions

  Category: Shift instructions
  Instructions: ROTL Rn; ROTR Rn; ROTCL Rn; ROTCR Rn; SHAL Rn; SHAR Rn; SHLL Rn; SHLR Rn; SHLL2 Rn; SHLR2 Rn; SHLL8 Rn; SHLR8 Rn; SHLL16 Rn; SHLR16 Rn
  Stages: 3. Cycles: 1.

Type: Branch instructions

  Category: Conditional branch instructions
  Instructions: BF label; BT label
  Stages: 3. Cycles: 3/1 *2.

  Category: Delayed conditional branch instructions
  Instructions: BF/S label; BT/S label
  Stages: 3. Cycles: 2/1 *2.

  Category: Unconditional branch instructions
  Instructions: BRA label; BRAF Rm; BSR label; BSRF Rm; JMP @Rm; JSR @Rm; RTS
  Stages: 3. Cycles: 2.

Type: System control instructions

  Category: System control ALU instructions
  Instructions: CLRT; LDC Rm,SR; LDC Rm,GBR; LDC Rm,VBR; LDC Rm,MOD; LDC Rm,RE; LDC Rm,RS; LDRE @(disp,PC); LDRS @(disp,PC); LDS Rm,PR; NOP; SETRC Rm; SETRC #imm; SETT; STC SR,Rn; STC GBR,Rn; STC VBR,Rn; STC MOD,Rn; STC RE,Rn; STC RS,Rn; STS PR,Rn
  Stages: 3. Cycles: 1.

  Category: LDS.L instructions (PR)
  Instructions: LDS.L @Rm+,PR
  Stages: 5. Cycles: 1.
  Contention: Contention occurs when an instruction that uses the same destination register is placed immediately after this instruction. MA contends with IF.

  Category: STS.L instruction (PR)
  Instructions: STS.L PR,@-Rn
  Stages: 4. Cycles: 1.
  Contention: MA contends with IF.

  Category: LDC.L instructions
  Instructions: LDC.L @Rm+,SR; LDC.L @Rm+,GBR; LDC.L @Rm+,VBR; LDC.L @Rm+,MOD; LDC.L @Rm+,RE; LDC.L @Rm+,RS
  Stages: 5. Cycles: 3.
  Contention: Contention occurs when an instruction that uses the same destination register is placed immediately after this instruction. MA contends with IF.

  Category: STC.L instructions
  Instructions: STC.L SR,@-Rn; STC.L GBR,@-Rn; STC.L VBR,@-Rn; STC.L MOD,@-Rn; STC.L RE,@-Rn; STC.L RS,@-Rn
  Stages: 4. Cycles: 2.
  Contention: MA contends with IF.

  Category: Register → MAC transfer instruction
  Instructions: CLRMAC; LDS Rm,MACH; LDS Rm,MACL
  Stages: 4. Cycles: 1.
  Contention: Contention occurs with multiplier. MA contends with IF.

  Category: Register → DSP transfer instruction
  Instructions: LDS Rm,DSR; LDS Rm,A0; LDS Rm,X0; LDS Rm,X1; LDS Rm,Y0; LDS Rm,Y1
  Stages: 4. Cycles: 1.

  Category: Memory → MAC transfer instructions
  Instructions: LDS.L @Rm+,MACH; LDS.L @Rm+,MACL
  Stages: 4. Cycles: 1.
  Contention: Contention occurs with multiplier. MA contends with IF.

  Category: Memory → DSP transfer instructions
  Instructions: LDS.L @Rm+,DSR; LDS.L @Rm+,A0; LDS.L @Rm+,X0; LDS.L @Rm+,X1; LDS.L @Rm+,Y0; LDS.L @Rm+,Y1
  Stages: 4. Cycles: 1.

  Category: MAC → register transfer instruction
  Instructions: STS MACH,Rn; STS MACL,Rn
  Stages: 5. Cycles: 1.
  Contention: Contention occurs with multiplier.

  Category: DSP → register transfer instruction
  Instructions: STS DSR,Rn; STS A0,Rn; STS X0,Rn; STS X1,Rn; STS Y0,Rn; STS Y1,Rn
  Contention: Contention occurs when an instruction that uses the same destination register is placed immediately after this instruction. MA contends with IF.

  Category: MAC → memory transfer instruction
  Instructions: STS.L MACH,@-Rn; STS.L MACL,@-Rn
  Stages: 4. Cycles: 1.
  Contention: Contention occurs with multiplier. MA contends with IF.

  Category: DSP → memory transfer instruction
  Instructions: STS.L DSR,@-Rn; STS.L A0,@-Rn; STS.L X0,@-Rn; STS.L X1,@-Rn; STS.L Y0,@-Rn; STS.L Y1,@-Rn
  Stages: 4. Cycles: 1.

  Category: RTE instruction
  Instructions: RTE
  Stages: 5. Cycles: 4.

  Category: TRAP instruction
  Instructions: TRAPA #imm
  Stages: 9. Cycles: 8.

  Category: SLEEP instruction
  Instructions: SLEEP
  Stages: 3. Cycles: 3.

Notes:
1. The normal minimum number of execution cycles. (The number in parentheses is the number of cycles when there is contention with following instructions.)
2. One state when there is no branch.
3. Number of stages of the SH-1 CPU.

7.4.1 Data Transfer Instructions

Register-Register Transfer Instructions (Common): Includes the following instruction types:

MOV #imm, Rn
MOV Rm, Rn
MOVA @(disp, PC), R0
MOVT Rn
SWAP.B Rm, Rn
SWAP.W Rm, Rn
XTRCT Rm, Rn

Figure 7.19 Register-Register Transfer Instruction Pipeline

Operation: The pipeline ends after three stages: IF, ID, and EX. Data is transferred in the EX stage via the ALU.
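The three-stage flow described above can be visualised with a small sketch. It is an illustration only, not part of the manual: the helper name `pipeline_rows` is hypothetical, and the model assumes one instruction is issued per slot with no contention or slot splitting.

```python
# Minimal sketch (assumption: ideal pipeline, one instruction issued per slot,
# no contention). pipeline_rows is a hypothetical helper, not a Hitachi tool.

def pipeline_rows(stage_lists):
    """Return one text row per instruction; row i is shifted right by i slots."""
    rows = []
    for slot, stages in enumerate(stage_lists):
        cells = ["  "] * slot + list(stages)   # two-character cell per slot
        rows.append(" ".join(cells).rstrip())
    return rows

# Register-register transfers end after IF, ID, and EX.
REG_TRANSFER = ("IF", "ID", "EX")

if __name__ == "__main__":
    # Three back-to-back register-register transfer instructions.
    for row in pipeline_rows([REG_TRANSFER] * 3):
        print(row)
```

Each printed row corresponds to one instruction, so the staggered columns mirror the slot axis used in the manual's pipeline figures.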
Memory Load Instructions (Common): Include the following instruction types:

MOV.W @(disp, PC), Rn
MOV.L @(disp, PC), Rn
MOV.B @Rm, Rn
MOV.W @Rm, Rn
MOV.L @Rm, Rn
MOV.B @Rm+, Rn
MOV.W @Rm+, Rn
MOV.L @Rm+, Rn
MOV.B @(disp, Rm), R0
MOV.W @(disp, Rm), R0
MOV.L @(disp, Rm), Rn
MOV.B @(R0, Rm), Rn
MOV.W @(R0, Rm), Rn
MOV.L @(R0, Rm), Rn
MOV.B @(disp, GBR), R0
MOV.W @(disp, GBR), R0
MOV.L @(disp, GBR), R0

Figure 7.20 Memory Load Instruction Pipeline

The pipeline has five stages: IF, ID, EX, MA, and WB (figure 7.20). If an instruction that uses the same destination register as this instruction is placed immediately after it, contention will occur. (See section 7.2.2, Contention When the Previous Instruction's Destination Register Is Used.)

Memory Store Instructions (Common): Include the following instruction types:

MOV.B Rm, @Rn
MOV.W Rm, @Rn
MOV.L Rm, @Rn
MOV.B Rm, @-Rn
MOV.W Rm, @-Rn
MOV.L Rm, @-Rn
MOV.B R0, @(disp, Rn)
MOV.W R0, @(disp, Rn)
MOV.L Rm, @(disp, Rn)
MOV.B Rm, @(R0, Rn)
MOV.W Rm, @(R0, Rn)
MOV.L Rm, @(R0, Rn)
MOV.B R0, @(disp, GBR)
MOV.W R0, @(disp, GBR)
MOV.L R0, @(disp, GBR)

Figure 7.21 Memory Store Instructions Pipeline

The pipeline has four stages: IF, ID, EX, and MA (figure 7.21). Data is not returned to the register, so there is no WB stage.
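The stage sequences given so far for the data transfer instructions can be collected in a small lookup, sketched here in Python. The dictionary keys and the helpers `writes_back` and `ma_slot` are hypothetical names chosen for illustration. The slot arithmetic assumes one stage per slot with no stalls; whether an MA actually contends with an IF depends on the conditions in section 7.2.1, which this sketch does not model.

```python
# Minimal sketch: stage sequences as stated in section 7.4.1.
# Names below are illustrative assumptions, not taken from the manual.

STAGES = {
    "register-register transfer": ("IF", "ID", "EX"),
    "memory load": ("IF", "ID", "EX", "MA", "WB"),
    "memory store": ("IF", "ID", "EX", "MA"),
}

def writes_back(kind):
    """True if the instruction class returns data to a register (has WB)."""
    return "WB" in STAGES[kind]

def ma_slot(kind, issue_slot=0):
    """Slot of the MA stage in an ideal pipeline, or None if there is no MA.

    Assumes one stage per slot and no stalls (a simplification).
    """
    stages = STAGES[kind]
    if "MA" not in stages:
        return None
    return issue_slot + stages.index("MA")

if __name__ == "__main__":
    for kind in STAGES:
        print(kind, len(STAGES[kind]), writes_back(kind), ma_slot(kind))
```

A store has an MA stage but no WB stage, which is why `writes_back("memory store")` is false while `ma_slot` still returns a slot for it.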
7.4.2 Arithmetic Instructions

Arithmetic Instructions between Registers (Except Multiplication Instructions) (Common, or SH-2 CPU, SH-DSP): Include the following instruction types:

ADD Rm, Rn
ADD #imm, Rn
ADDC Rm, Rn
ADDV Rm, Rn
CMP/EQ #imm, R0
CMP/EQ Rm, Rn
CMP/HS Rm, Rn
CMP/GE Rm, Rn
CMP/HI Rm, Rn
CMP/GT Rm, Rn
CMP/PZ Rn
CMP/PL Rn
CMP/STR Rm, Rn
DIV1 Rm, Rn
DIV0S Rm, Rn
DIV0U
DT Rn (SH-2 CPU, SH-DSP)
EXTS.B Rm, Rn
EXTS.W Rm, Rn
EXTU.B Rm, Rn
EXTU.W Rm, Rn
NEG Rm, Rn
NEGC Rm, Rn
SUB Rm, Rn
SUBC Rm, Rn
SUBV Rm, Rn

Figure 7.22 Pipeline for Arithmetic Instructions between Registers Except Multiplication Instructions

The pipeline has three stages: IF, ID, and EX (figure 7.22). The data operation is completed in the EX stage via the ALU.

Multiply/Accumulate Instruction (SH-1 CPU): Includes the following instruction type:

MAC.W @Rm+, @Rn+

Figure 7.23 Multiply/Accumulate Instruction Pipeline

The pipeline has seven stages: IF, ID, EX, MA, MA, mm, and mm. The second MA reads the memory and accesses the multiplier. mm indicates that the multiplier is operating. mm operates for two cycles after the final MA ends, regardless of slot. The ID of the instruction after the MAC.W instruction is stalled for one slot. The two MAs of the MAC.W instruction, when they contend with IF, split the slots as described in section 7.2.1, Contention between Instruction Fetch (IF) and Memory Access (MA).

When an instruction that does not use the multiplier comes after the MAC.W instruction, the MAC.W instruction may be considered to be a five-stage pipeline instruction of IF, ID, EX, MA, MA. In such cases, the ID of the next instruction simply stalls one slot and thereafter operates like a normal pipeline.
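The seven-stage and five-stage views of the SH-1 MAC.W instruction can be expressed as a short Python sketch. The function names are hypothetical, and the slot numbers assume one stage per slot with no MA/IF slot splitting, which is a simplification of the behaviour described above.

```python
# Minimal sketch of the SH-1 MAC.W stage sequence described in the text.
# Assumption: one stage per slot, no slot splitting; names are illustrative.

MAC_W_SH1 = ("IF", "ID", "EX", "MA", "MA", "mm", "mm")

def visible_stages(stages, next_uses_multiplier):
    """Stages that matter to the following instruction.

    When the following instruction does not use the multiplier, the mm
    stages can be ignored, leaving IF, ID, EX, MA, MA.
    """
    if next_uses_multiplier:
        return tuple(stages)
    return tuple(s for s in stages if s != "mm")

def multiplier_busy_slots(stages, issue_slot=0):
    """Slots in which the multiplier is operating (mm stages)."""
    return [issue_slot + i for i, s in enumerate(stages) if s == "mm"]

if __name__ == "__main__":
    print(visible_stages(MAC_W_SH1, next_uses_multiplier=False))
    print(multiplier_busy_slots(MAC_W_SH1))
```

In this model, a following instruction that needs the multiplier during one of the busy slots is the kind of case in which the manual describes the MA being extended.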
When an instruction that uses the multiplier comes after the MAC.W instruction, however, contention occurs with the multiplier, so operation is different from normal. This occurs in the following cases:

1. When a MAC.W instruction is located immediately after another MAC.W instruction
2. When a MULS.W instruction is located immediately after a MAC.W instruction
3. When an STS (register) instruction is located immediately after a MAC.W instruction
4. When an STS.L (memory) instruction is located immediately after a MAC.W instruction
5. When an LDS (register) instruction is located immediately after a MAC.W instruction
6. When an LDS.L (memory) instruction is located immediately after a MAC.W instruction

1. When a MAC.W instruction is located immediately after another MAC.W instruction

When the second MA of a MAC.W instruction contends with an mm generated by a preceding multiplier-type instruction, the bus cycle of that MA is extended until the mm ends (the M-A shown in the dotted line box in the figure) and that extended MA occupies one slot. If one or more instructions not related to the multiplier are located between the MAC.W instructions, multiplier contention between MAC instructions does not cause stalls (figure 7.24).

Figure 7.24 Unrelated Instructions between MAC.W Instructions

Sometimes consecutive MAC.W instructions may not have multiplier contention even when MA and IF contention causes misalignment of instruction execution. Figure 7.25 illustrates a case of this type. This figure assumes MA and IF contention.

Figure 7.25 Consecutive MAC.Ws without Misalignment

When the second MA of the MAC.W instruction is extended until the mm ends, contention between MA and IF will split the slot, as usual. Figure 7.26 illustrates a case of this type. This figure assumes MA and IF contention.

Figure 7.26 MA and IF Contention

2. When a MULS.W instruction is located immediately after a MAC.W instruction

A MULS.W instruction has an MA stage for accessing the multiplier. When the MA of the MULS.W instruction contends with an operating MAC instruction multiplier (mm), the MA is extended until the mm ends (the M-A shown in the dotted line box in figure 7.27) to create a single slot. When two or more instructions not related to the multiplier come between the MAC.W and MULS.W instructions, MAC.W and MULS.W contention does not cause stalling. When the MULS.W MA and IF contend, the slot is split.

Figure 7.27 MULS.W Instruction Immediately after a MAC.W Instruction

3. When an STS (register) instruction is located immediately after a MAC.W instruction

When the contents of a MAC register are stored in a general-purpose register using an STS instruction, an MA stage for accessing the multiplier is added to the STS instruction, as described later. When the MA of the STS instruction contends with the operating multiplier (mm), the MA is extended until the mm ends (the M-A shown in the dotted line box in figure 7.28) to create a single slot. The MA of the STS contends with the IF. Figure 7.28 illustrates how this occurs, assuming MA and IF contention.

Figure 7.28 STS (Register) Instruction Immediately after a MAC.W Instruction

4. When an STS.L (memory) instruction is located immediately after a MAC.W instruction

When the contents of a MAC register are stored in memory using an STS instruction, an MA stage for accessing the multiplier and writing to memory is added to the STS instruction, as described later. When the MA of the STS instruction contends with the operating multiplier (mm), the MA is extended until one state after the mm ends (the M-A shown in the dotted line box in figure 7.29) to create a single slot. The MA of the STS contends with the IF. Figure 7.29 illustrates how this occurs, assuming MA and IF contention.

Figure 7.29 STS.L (Memory) Instruction Immediately after a MAC.W Instruction

5. When an LDS (register) instruction is located immediately after a MAC.W instruction

When the contents of a MAC register are loaded from a general-purpose register using an LDS instruction, an MA stage for accessing the multiplier is added to the LDS instruction, as described later. When the MA of the LDS instruction contends with the operating multiplier (mm), the MA is extended until the mm ends (the M-A shown in the dotted line box in figure 7.30) to create a single slot. The MA of this LDS contends with IF. Figure 7.30 illustrates how this occurs, assuming MA and IF contention.

Figure 7.30 LDS (Register) Instruction Immediately after a MAC.W Instruction

6. When an LDS.L (memory) instruction is located immediately after a MAC.W instruction

When the contents of a MAC register are loaded from memory using an LDS instruction, an MA stage for accessing the memory and the multiplier is added to the LDS instruction, as described later. When the MA of the LDS instruction contends with the operating multiplier (mm), the MA is extended until the mm ends (the M-A shown in the dotted line box in figure 7.31) to create a single slot. The MA of the LDS contends with IF. Figure 7.31 illustrates how this occurs, assuming MA and IF contention.

Figure 7.31 LDS.L (Memory) Instruction Immediately after a MAC.W Instruction

Double-Length Multiply/Accumulate Instruction (SH-2 CPU, SH-DSP): Includes the following instruction type:

MAC.L @Rm+, @Rn+

Figure 7.32 Multiply/Accumulate Instruction Pipeline

The pipeline has nine stages: IF, ID, EX, MA, MA, mm, mm, mm, and mm (figure 7.32). The second MA reads the memory and accesses the multiplier.
The mm indicates that the multiplier is operating. The mm operates for four cycles after the final MA ends, regardless of slot. The ID of the instruction after the MAC.L instruction is stalled for one slot. The two MAs of the MAC.L instruction, when they contend with IF, split the slots as described in section 7.2.1, Contention between Instruction Fetch (IF) and Memory Access (MA).

When an instruction that does not use the multiplier follows the MAC.L instruction, the MAC.L instruction may be considered to be a five-stage pipeline instruction of IF, ID, EX, MA, MA. In such cases, the ID of the next instruction simply stalls one slot and thereafter the pipeline operates normally. When an instruction that uses the multiplier comes after the MAC.L instruction, contention occurs with the multiplier, so operation is different from normal. This occurs in the following cases:

1. When a MAC.W instruction is located immediately after another MAC.W instruction
2. When a MAC.L instruction is located immediately after a MAC.W instruction
3. When a MULS.W instruction is located immediately after a MAC.W instruction
4. When a DMULS.L instruction is located immediately after a MAC.W instruction
5. When an STS (register) instruction is located immediately after a MAC.W instruction
6. When an STS.L (memory) instruction is located immediately after a MAC.W instruction
7. When an LDS (register) instruction is located immediately after a MAC.W instruction
8. When an LDS.L (memory) instruction is located immediately after a MAC.W instruction

1. When a MAC.W instruction is located immediately after another MAC.W instruction

The second MA of a MAC.W instruction does not contend with an mm generated by a preceding multiplication instruction.

Figure 7.33 MAC.W Instruction That Immediately Follows Another MAC.W Instruction

Sometimes consecutive MAC.W instructions may have misalignment of instruction execution caused by MA and IF contention. Figure 7.34 illustrates a case of this type. This figure assumes MA and IF contention.

Figure 7.34 Consecutive MAC.Ws with Misalignment

When the second MA of the MAC.W instruction contends with IF, the slot will split as usual. Figure 7.35 illustrates a case of this type. This figure assumes MA and IF contention.

Figure 7.35 MA and IF Contention

2. When a MAC.L instruction is located immediately after a MAC.W instruction

The second MA of a MAC.W instruction does not contend with an mm generated by a preceding multiplication instruction (figure 7.36).

Figure 7.36 MAC.L Instructions Immediately after a MAC.W Instruction

3. When a MULS.W instruction is located immediately after a MAC.W instruction

MULS.W instructions have an MA stage for accessing the multiplier. When the MA of the MULS.W instruction contends with an operating MAC.W instruction multiplier (mm), the MA is extended until the mm ends (the M-A shown in the dotted line box in figure 7.37) to create a single slot. When one or more instructions not related to the multiplier come between the MAC.W and MULS.W instructions, MAC.W and MULS.W contention does not cause stalling. There is no MULS.W MA contention while the MAC.W instruction multiplier is operating (mm). When the MULS.W MA and IF contend, the slot is split.

Figure 7.37 MULS.W Instruction Immediately after a MAC.W Instruction

4. When a DMULS.L instruction is located immediately after a MAC.W instruction

DMULS.L instructions have an MA stage for accessing the multiplier, but there is no DMULS.L MA contention while the MAC.W instruction multiplier is operating (mm). When the DMULS.L MA and IF contend, the slot is split (figure 7.38).

Figure 7.38 DMULS.L Instructions Immediately after a MAC.W Instruction

5. When an STS (register) instruction is located immediately after a MAC.W instruction

When the contents of a MAC register are stored in a general-purpose register using an STS instruction, an MA stage for accessing the multiplier is added to the STS instruction, as described later. When the MA of the STS instruction contends with the operating multiplier (mm), the MA is extended until the mm ends (the M-A shown in the dotted line box in figure 7.39) to create a single slot. The MA of the STS contends with the IF. Figure 7.39 illustrates how this occurs, assuming MA and IF contention.

Figure 7.39 STS (Register) Instruction Immediately after a MAC.W Instruction

6. When an STS.L (memory) instruction is located immediately after a MAC.W instruction

When the contents of a MAC register are stored in memory using an STS instruction, an MA stage for accessing the multiplier and writing to memory is added to the STS instruction, as described later.
Figure 7.40 illustrates how this occurs, assuming MA and IF contention.

Figure 7.40 STS.L (Memory) Instruction Immediately after a MAC.W Instruction

7. When an LDS (register) instruction is located immediately after a MAC.W instruction

When the contents of a MAC register are loaded from a general-purpose register using an LDS instruction, an MA stage for accessing the multiplier is added to the LDS instruction, as described later. When the MA of the LDS instruction contends with the operating multiplier (mm), the MA is extended until the mm ends (the M-A shown in the dotted line box in figure 7.41) to create a single slot. The MA of this LDS contends with IF. Figure 7.41 illustrates how this occurs, assuming MA and IF contention.

Figure 7.41 LDS (Register) Instruction Immediately after a MAC.W Instruction

8. When an LDS.L (memory) instruction is located immediately after a MAC.W instruction

When the contents of a MAC register are loaded from memory using an LDS instruction, an MA stage for accessing the multiplier is added to the LDS instruction, as described later. When the MA of the LDS instruction contends with the operating multiplier (mm), the MA is extended until the mm ends (the M-A shown in the dotted line box in figure 7.42) to create a single slot. The MA of the LDS contends with IF. Figure 7.42 illustrates how this occurs, assuming MA and IF contention.
Figure 7.42 LDS.L (Memory) Instruction Immediately after a MAC.W Instruction

Double-Length Multiply/Accumulate Instruction (SH-2 CPU, SH-DSP): Includes the following instruction type:

MAC.L @Rm+, @Rn+ (SH-2 CPU only)

Figure 7.43 Multiply/Accumulate Instruction Pipeline

Operation: The pipeline has nine stages: IF, ID, EX, MA, MA, mm, mm, mm, and mm (figure 7.43). The second MA reads the memory and accesses the multiplier. The mm indicates that the multiplier is operating. The mm operates for four cycles after the final MA ends, regardless of slot. The ID of the instruction after the MAC.L instruction is stalled for one slot. The two MAs of the MAC.L instruction, when they contend with IF, split the slots as described in section 7.2.1, Contention between Instruction Fetch (IF) and Memory Access (MA).

When an instruction that does not use the multiplier follows the MAC.L instruction, the MAC.L instruction may be considered to be a five-stage pipeline instruction of IF, ID, EX, MA, and MA. In such cases, the ID of the next instruction simply stalls one slot and thereafter the pipeline operates normally. When an instruction that uses the multiplier comes after the MAC.L instruction, contention occurs with the multiplier, so operation is different from normal. This occurs in the following cases:

1. When a MAC.L instruction is located immediately after another MAC.L instruction
2. When a MAC.W instruction is located immediately after a MAC.L instruction
3. When a DMULS.L instruction is located immediately after a MAC.L instruction
4. When a MULS.W instruction is located immediately after a MAC.L instruction
5. When an STS (register) instruction is located immediately after a MAC.L instruction
6. When an STS.L (memory) instruction is located immediately after a MAC.L instruction
7. When an LDS (register) instruction is located immediately after a MAC.L instruction
8. When an LDS.L (memory) instruction is located immediately after a MAC.L instruction

1. When a MAC.L instruction is located immediately after another MAC.L instruction

When the second MA of the MAC.L instruction contends with the mm produced by the previous multiplication instruction, the MA bus cycle is extended until the mm ends (the M-A shown in the dotted line box in figure 7.44) to create a single slot. When two or more instructions that do not use the multiplier occur between two MAC.L instructions, the stall caused by multiplier contention between MAC.L instructions is eliminated.

Figure 7.44 MAC.L Instruction Immediately after Another MAC.L Instruction

Sometimes consecutive MAC.L instructions may have less multiplier contention even when there is misalignment of instruction execution caused by MA and IF contention. Figure 7.45 illustrates a case of this type, assuming MA and IF contention.

Figure 7.45 Consecutive MAC.Ls with Misalignment

When the second MA of the MAC.L instruction is extended to the end of the mm, contention between the MA and IF will split the slot in the usual way. Figure 7.46 illustrates a case of this type, assuming MA and IF contention.
Figure 7.46 MA and IF Contention

2. When a MAC.W instruction is located immediately after a MAC.L instruction

When the second MA of the MAC.W instruction contends with the mm produced by the previous multiplication instruction, the MA bus cycle is extended until the mm ends (the M-A shown in the dotted line box in figure 7.47) to create a single slot. When two or more instructions that do not use the multiplier occur between the MAC.L and MAC.W instructions, the stall caused by multiplier contention between the MAC.L and MAC.W instructions is eliminated.

Figure 7.47 MAC.W Instruction Immediately after a MAC.L Instruction

3. When a DMULS.L instruction is located immediately after a MAC.L instruction

DMULS.L instructions have an MA stage for accessing the multiplier. When the second MA of the DMULS.L instruction contends with an operating MAC.L instruction multiplier (mm), the MA is extended until the mm ends (the M-A shown in the dotted line box in figure 7.48) to create a single slot. When two or more instructions not related to the multiplier come between the MAC.L and DMULS.L instructions, MAC.L and DMULS.L contention does not cause stalling. When the DMULS.L MA and IF contend, the slot is split.

Figure 7.48 DMULS.L Instruction Immediately after a MAC.L Instruction

4. When a MULS.W instruction is located immediately after a MAC.L instruction

MULS.W instructions have an MA stage for accessing the multiplier. When the MA of the MULS.W instruction contends with an operating MAC.L instruction multiplier (mm), the MA is extended until the mm ends (the M-A shown in the dotted line box in figure 7.49) to create a single slot. When three or more instructions not related to the multiplier come between the MAC.L and MULS.W instructions, MAC.L and MULS.W contention does not cause stalling. When the MULS.W MA and IF contend, the slot is split.

Figure 7.49 MULS.W Instruction Immediately after a MAC.L Instruction

5. When an STS (register) instruction is located immediately after a MAC.L instruction

When the contents of a MAC register are stored in a general-purpose register using an STS instruction, an MA stage for accessing the multiplier is added to the STS instruction, as described later.
When the MA of the STS instruction contends with the operating multiplier (mm), the MA is extended until the mm ends (the M-A shown in the dotted line box in figure 7.50) to create a single slot. The MA of the STS contends with the IF. Figure 7.50 illustrates how this occurs, assuming MA and IF contention.

Figure 7.50 STS (Register) Instruction Immediately after a MAC.L Instruction

6. When an STS.L (memory) instruction is located immediately after a MAC.L instruction

When the contents of a MAC register are stored in memory using an STS instruction, an MA stage for accessing the multiplier and writing to memory is added to the STS instruction, as described later. The MA of the STS contends with the IF. Figure 7.51 illustrates how this occurs, assuming MA and IF contention.

Figure 7.51 STS.L (Memory) Instruction Immediately after a MAC.L Instruction

7. When an LDS (register) instruction is located immediately after a MAC.L instruction

When the contents of a MAC register are loaded from a general-purpose register using an LDS instruction, an MA stage for accessing the multiplier is added to the LDS instruction, as described later. When the MA of the LDS instruction contends with the operating multiplier (mm), the MA is extended until the mm ends (the M-A shown in the dotted line box in figure 7.52) to create a single slot.
the ma of this lds contends with if. figure 7.52 illustrates how this occurs, assuming ma and if contention.

figure 7.52 lds (register) instruction immediately after a mac.l instruction

8. when an lds.l (memory) instruction is located immediately after a mac.l instruction

when the contents of a mac register are loaded from memory using an lds instruction, an ma stage for accessing the memory and the multiplier is added to the lds instruction, as described later. when the ma of the lds instruction contends with the operating multiplier (mm), the ma is extended until the mm ends (the m-a shown in the dotted line box in figure 7.53) to create a single slot. the ma of the lds contends with if. figure 7.53 illustrates how this occurs, assuming ma and if contention.

figure 7.53 lds.l (memory) instruction immediately after a mac.l instruction

multiplication instructions (sh-1 cpu): include the following instruction types:

muls.w rm, rn
mulu.w rm, rn

figure 7.54 multiplication instruction pipeline

the pipeline has seven stages: if, id, ex, ma, mm, mm, and mm. the ma accesses the multiplier. mm indicates that the multiplier is operating. mm operates for three cycles after the ma ends, regardless of slot.
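the multiplier timing described above can be sketched as a small model. this is only an illustration under stated assumptions, not the manual's definition: it assumes an ideal pipeline in which each instruction enters if one slot after the previous one, it ignores if/ma slot splitting, and it assumes that the extended ma of a following multiply may coincide with the final mm cycle. that last assumption is chosen only because it reproduces the manual's statement that two or more unrelated instructions between two muls.w instructions avoid the stall on the sh-1. the names ma_stall_slots, mm_cycles, and gap are hypothetical.

```python
def ma_stall_slots(mm_cycles, gap):
    """estimate by how many slots the ma of a following single-ma multiply
    (for example muls.w) is extended because the multiplier is still busy.

    mm_cycles: cycles the multiplier (mm) runs after the first ma ends
               (3 for sh-1 muls.w according to the text above)
    gap:       number of unrelated instructions between the two multiplies
    """
    first_ma = 3                    # first multiply: if=0, id=1, ex=2, ma=3
    last_mm = first_ma + mm_cycles  # mm occupies slots first_ma+1 .. last_mm
    # the second multiply enters if (gap + 1) slots later, so without any
    # contention its ma would fall in this slot:
    second_ma = first_ma + gap + 1
    # assumption (not from the manual): the ma may overlap the final mm
    # cycle, so it waits only while earlier mm cycles are still running
    return max(0, last_mm - second_ma)


if __name__ == "__main__":
    for gap in range(4):
        print(gap, ma_stall_slots(3, gap))
```

with mm_cycles set to 3, the model gives a two-slot extension for back-to-back multiplies, one slot with a single intervening instruction, and none once two or more unrelated instructions intervene.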
the ma of the muls.w instruction, when it contends with if, splits the slot as described in section 7.2.1, contention between instruction fetch (if) and memory access (ma). when an instruction that does not use the multiplier comes after the muls.w instruction, the muls.w instruction may be considered to be a four-stage pipeline instruction of if, id, ex, and ma. in such cases, it operates like a normal pipeline. when an instruction that uses the multiplier comes after the muls.w instruction, however, contention occurs with the multiplier, so operation is different from normal. this occurs in the following cases:

1. when a mac.w instruction is located immediately after a muls.w instruction
2. when a muls.w instruction is located immediately after another muls.w instruction
3. when an sts (register) instruction is located immediately after a muls.w instruction
4. when an sts.l (memory) instruction is located immediately after a muls.w instruction
5. when an lds (register) instruction is located immediately after a muls.w instruction
6. when an lds.l (memory) instruction is located immediately after a muls.w instruction

1. when a mac.w instruction is located immediately after a muls.w instruction

when the second ma of a mac.w instruction contends with the mm generated by a preceding multiplication instruction, the bus cycle of that ma is extended until the mm ends (the m-a shown in the dotted line box below) and that extended ma occupies one slot. if one or more instructions not related to the multiplier come between the muls.w and mac.w instructions, multiplier contention between the muls.w and mac.w instructions does not cause stalls (figure 7.55).

figure 7.55 mac.w instruction immediately after a muls.w instruction

2.
when a muls.w instruction is located immediately after another muls.w instruction

muls.w instructions have an ma stage for accessing the multiplier. when the ma of the muls.w instruction contends with the operating multiplier (mm) of another muls.w instruction, the ma is extended until the mm ends (the m-a shown in the dotted line box in figure 7.56) to create a single slot. when two or more instructions not related to the multiplier are located between the two muls.w instructions, contention between the muls.w instructions does not cause stalling. when the muls.w ma and if contend, the slot is split.

figure 7.56 muls.w instruction immediately after another muls.w instruction

when the ma of the muls.w instruction is extended until the mm ends, contention between ma and if will split the slot, as is normal. figure 7.57 illustrates a case of this type, assuming ma and if contention.

figure 7.57 muls.w instruction immediately after another muls.w instruction (if and ma contention)

3. when an sts (register) instruction is located immediately after a muls.w instruction

when the contents of a mac register are stored in a general-purpose register using an sts instruction, an ma stage for accessing the multiplier is added to the sts instruction, as described later. when the ma of the sts instruction contends with the operating multiplier (mm), the ma is extended until the mm ends (the m-a
shown in the dotted line box in figure 7.58) to create a single slot. the ma of the sts contends with the if. figure 7.58 illustrates how this occurs, assuming ma and if contention.

figure 7.58 sts (register) instruction immediately after a muls.w instruction

4. when an sts.l (memory) instruction is located immediately after a muls.w instruction

when the contents of a mac register are stored in memory using an sts instruction, an ma stage for accessing the multiplier and writing to memory is added to the sts instruction, as described later. when the ma of the sts instruction contends with the operating multiplier (mm), the ma is extended until one cycle after the mm ends (the m-a shown in the dotted line box in figure 7.59) to create a single slot. the ma of the sts contends with the if. figure 7.59 illustrates how this occurs, assuming ma and if contention.

figure 7.59 sts.l (memory) instruction immediately after a muls.w instruction

5. when an lds (register) instruction is located immediately after a muls.w instruction

when the contents of a mac register are loaded from a general-purpose register using an lds instruction, an ma stage for accessing the multiplier is added to the lds instruction, as described later. when the ma of the lds instruction contends with the operating multiplier (mm), the ma is extended until the mm ends (the m-a
shown in the dotted line box below) to create a single slot. the ma of this lds contends with if. figure 7.60 illustrates how this occurs, assuming ma and if contention.

figure 7.60 lds (register) instruction immediately after a muls.w instruction

6. when an lds.l (memory) instruction is located immediately after a muls.w instruction

when the contents of a mac register are loaded from memory using an lds instruction, an ma stage for accessing the memory and the multiplier is added to the lds instruction, as described later. when the ma of the lds instruction contends with the operating multiplier (mm), the ma is extended until the mm ends (the m-a shown in the dotted line box in figure 7.61) to create a single slot. the ma of the lds contends with if. figure 7.61 illustrates how this occurs, assuming ma and if contention.

figure 7.61 lds.l (memory) instruction immediately after a muls.w instruction

multiplication instructions (sh-2 cpu, sh-dsp): include the following instruction types:

muls.w rm, rn
mulu.w rm, rn

figure 7.62 multiplication instruction pipeline

operation: the pipeline has six stages: if, id, ex, ma, mm, and mm (figure 7.62). the ma accesses the multiplier. the mm indicates that the multiplier is operating.
the mm operates for two cycles after the ma ends, regardless of the slot. the ma of the muls.w instruction, when it contends with if, splits the slot as described in section 7.4, contention between instruction fetch (if) and memory access (ma). when an instruction that does not use the multiplier comes after the muls.w instruction, the muls.w instruction may be considered to be a four-stage pipeline instruction of if, id, ex, and ma. in such cases, it operates like a normal pipeline. when an instruction that uses the multiplier is located after the muls.w instruction, however, contention occurs with the multiplier, so operation is different from normal. this occurs in the following cases:

1. when a mac.w instruction is located immediately after a muls.w instruction
2. when a mac.l instruction is located immediately after a muls.w instruction
3. when a muls.w instruction is located immediately after another muls.w instruction
4. when a dmuls.l instruction is located immediately after a muls.w instruction
5. when an sts (register) instruction is located immediately after a muls.w instruction
6. when an sts.l (memory) instruction is located immediately after a muls.w instruction
7. when an lds (register) instruction is located immediately after a muls.w instruction
8. when an lds.l (memory) instruction is located immediately after a muls.w instruction

1. when a mac.w instruction is located immediately after a muls.w instruction

the second ma of a mac.w instruction does not contend with the mm generated by a preceding multiplication instruction.

figure 7.63 mac.w instruction immediately after a muls.w instruction

2. when a mac.l instruction is located immediately after a muls.w instruction

the second ma of a mac.l instruction does not contend with the mm generated by a preceding multiplication instruction.
figure 7.64 mac.l instruction immediately after a muls.w instruction

3. when a muls.w instruction is located immediately after another muls.w instruction

muls.w instructions have an ma stage for accessing the multiplier. when the ma of the muls.w instruction contends with the operating multiplier (mm) of another muls.w instruction, the ma is extended until the mm ends (the m-a shown in the dotted line box in figure 7.65) to create a single slot. when one or more instructions not related to the multiplier are located between the two muls.w instructions, contention between the muls.w instructions does not cause stalling. when the muls.w ma and if contend, the slot is split.

figure 7.65 muls.w instruction immediately after another muls.w instruction

when the ma of the muls.w instruction is extended until the mm ends, contention between the ma and if will split the slot in the usual way. figure 7.66 illustrates a case of this type, assuming ma and if contention.

figure 7.66 muls.w instruction immediately after another muls.w instruction (if and ma contention)

4. when a dmuls.l instruction is located immediately after a muls.w instruction

although the second ma of the dmuls.l instruction accesses the multiplier, it does not contend with the operating multiplier (mm) generated by the muls.w instruction.

figure 7.67 dmuls.l instruction immediately after a muls.w instruction

5.
when an sts (register) instruction is located immediately after a muls.w instruction

when the contents of a mac register are stored in a general-purpose register using an sts instruction, an ma stage for accessing the multiplier is added to the sts instruction, as described later. when the ma of the sts instruction contends with the operating multiplier (mm), the ma is extended until the mm ends (the m-a shown in the dotted line box in figure 7.68) to create a single slot. the ma of the sts contends with the if. figure 7.68 illustrates how this occurs, assuming ma and if contention.

figure 7.68 sts (register) instruction immediately after a muls.w instruction

6. when an sts.l (memory) instruction is located immediately after a muls.w instruction

when the contents of a mac register are stored in memory using an sts instruction, an ma stage for accessing the multiplier and writing to memory is added to the sts instruction, as described later. the ma of the sts contends with the if. figure 7.69 illustrates how this occurs, assuming ma and if contention.

figure 7.69 sts.l (memory) instruction immediately after a muls.w instruction

7.
when an lds (register) instruction is located immediately after a muls.w instruction

when the contents of a mac register are loaded from a general-purpose register using an lds instruction, an ma stage for accessing the multiplier is added to the lds instruction, as described later. when the ma of the lds instruction contends with the operating multiplier (mm), the ma is extended until the mm ends (the m-a shown in the dotted line box below) to create a single slot. the ma of this lds contends with if. figure 7.70 illustrates how this occurs, assuming ma and if contention.

figure 7.70 lds (register) instruction immediately after a muls.w instruction

8. when an lds.l (memory) instruction is located immediately after a muls.w instruction

when the contents of a mac register are loaded from memory using an lds instruction, an ma stage for accessing the multiplier is added to the lds instruction, as described later. when the ma of the lds instruction contends with the operating multiplier (mm), the ma is extended until the mm ends (the m-a shown in the dotted line box in figure 7.71) to create a single slot. the ma of the lds contends with if. figure 7.71 illustrates how this occurs, assuming ma and if contention.
figure 7.71 lds.l (memory) instruction immediately after a muls.w instruction

double-length multiplication instructions (sh-2 cpu, sh-dsp): include the following instruction types:

dmuls.l rm, rn
dmulu.l rm, rn
mul.l rm, rn

figure 7.72 multiplication instruction pipeline

the pipeline has nine stages: if, id, ex, ma, ma, mm, mm, mm, and mm (figure 7.72). the second ma accesses the multiplier. the mm indicates that the multiplier is operating. the mm operates for four cycles after the ma ends, regardless of slot. the id of the instruction following the dmuls.l instruction is stalled for 1 slot (see the description of the multiply/accumulate instruction). the two ma stages of the dmuls.l instruction, when they contend with if, split the slot as described in section 7.2.1, contention between instruction fetch (if) and memory access (ma). when an instruction that does not use the multiplier comes after the dmuls.l instruction, the dmuls.l instruction may be considered to be a five-stage pipeline instruction of if, id, ex, ma, and ma. in such cases, it operates like a normal pipeline. when an instruction that uses the multiplier comes after the dmuls.l instruction, however, contention occurs with the multiplier, so operation is different from normal. this occurs in the following cases:

1. when a mac.l instruction is located immediately after a dmuls.l instruction
2. when a mac.w instruction is located immediately after a dmuls.l instruction
3. when a dmuls.l instruction is located immediately after another dmuls.l instruction
4. when a muls.w instruction is located immediately after a dmuls.l instruction
5. when an sts (register) instruction is located immediately after a dmuls.l instruction
6. when an sts.l (memory) instruction is located immediately after a dmuls.l instruction
7.
when an lds (register) instruction is located immediately after a dmuls.l instruction
8. when an lds.l (memory) instruction is located immediately after a dmuls.l instruction

1. when a mac.l instruction is located immediately after a dmuls.l instruction

when the second ma of a mac.l instruction contends with the mm generated by a preceding multiplication instruction, the bus cycle of that ma is extended until the mm ends (the m-a shown in the dotted line box below) and that extended ma occupies one slot. if two or more instructions not related to the multiplier are located between the dmuls.l and mac.l instructions, multiplier contention between the dmuls.l and mac.l instructions does not cause stalls (figure 7.73).

figure 7.73 mac.l instruction immediately after a dmuls.l instruction

7.4.3 logic operation instructions

register-register logic operation instructions (common): include the following instruction types:

and rm, rn
and #imm, r0
not rm, rn
or rm, rn
or #imm, r0
tst rm, rn
tst #imm, r0
xor rm, rn
xor #imm, r0

figure 7.74 register-register logic operation instruction pipeline

the pipeline has three stages: if, id, and ex (figure 7.74). the data operation is completed in the ex stage via the alu.

memory logic operations instructions (common): include the following instruction types:

and.b #imm, @(r0, gbr)
or.b #imm, @(r0, gbr)
tst.b #imm, @(r0, gbr)
xor.b #imm, @(r0, gbr)
figure 7.75 memory logic operation instruction pipeline

the pipeline has six stages: if, id, ex, ma, ex, and ma (figure 7.75). the id of the next instruction stalls for 2 slots. the mas of these instructions contend with if.

tas instruction (common): includes the following instruction type:

tas.b @rn

figure 7.76 tas instruction pipeline

the pipeline has six stages: if, id, ex, ma, ex, and ma (figure 7.76). the id of the next instruction stalls for 3 slots. the ma of the tas instruction contends with if.

7.4.4 shift instructions (common)

rotl rn
rotr rn
rotcl rn
rotcr rn
shal rn
shar rn
shll rn
shlr rn
shll2 rn
shlr2 rn
shll8 rn
shlr8 rn
shll16 rn
shlr16 rn

figure 7.77 general shift instruction pipeline

the pipeline has three stages: if, id, and ex (figure 7.77). the data operation is completed in the ex stage via the alu.

7.4.5 branch instructions

conditional branch instructions (common): include the following instruction types:

bf label
bt label

the pipeline has three stages: if, id, and ex. condition verification is performed in the id stage. conditional branch instructions are not delayed branches.

1. when condition is satisfied

the branch destination address is calculated in the ex stage. the two instructions after the conditional branch instruction (instruction a) are fetched but discarded. the branch destination instruction begins its fetch from the slot following the slot which has the ex stage of instruction a (figure 7.78).

figure 7.78 branch instruction when condition is satisfied

2.
when condition is not satisfied

if it is determined at the id stage that the condition is not satisfied, the ex stage proceeds without doing anything. the next instruction also executes a fetch (figure 7.79).

figure 7.79 branch instruction when condition is not satisfied

delayed conditional branch instructions (sh-2 cpu, sh-dsp): include the following instruction types:

bf/s label
bt/s label

the pipeline has three stages: if, id, and ex. condition verification is performed in the id stage.

1. when condition is satisfied

the branch destination address is calculated in the ex stage. the instruction after the conditional branch instruction (instruction a) is fetched and executed, but the instruction after that is fetched and discarded. the branch destination instruction begins its fetch from the slot following the slot which has the ex stage of instruction a (figure 7.80).

figure 7.80 branch instruction when condition is satisfied

2. when condition is not satisfied

if it is determined at the id stage that the condition is not satisfied, the ex stage proceeds without doing anything. the next instruction also executes a fetch (figure 7.81).

figure 7.81 branch instruction when condition is not satisfied

unconditional branch instructions (common, or sh-2 cpu, sh-dsp): include the following instruction types:

bra label
braf rm (sh-2, sh-dsp cpu)
bsr label
bsrf rm (sh-2, sh-dsp cpu)
jmp @rm
jsr @rm
rts
figure 7.82 unconditional branch instruction pipeline

the pipeline has three stages: if, id, and ex (figure 7.82). unconditional branch instructions are delayed branches. the branch destination address is calculated in the ex stage. the instruction following the unconditional branch instruction (instruction a), that is, the delay slot instruction, is not fetched and discarded as with conditional branch instructions, but is instead executed. note that the id slot of the delay slot instruction does stall for one cycle. the branch destination instruction starts its fetch from the slot after the slot that has the ex stage of instruction a.

7.4.6 system control instructions

system control alu instructions (common, or sh-dsp): include the following instruction types:

clrt
ldc rm,sr
ldc rm,gbr
ldc rm,vbr
ldc rm,mod (sh-dsp)
ldc rm,re (sh-dsp)
ldc rm,rs (sh-dsp)
ldre @(disp,pc)
ldrs @(disp,pc)
lds rm,pr
nop
setrc rm (sh-dsp)
setrc #imm (sh-dsp)
sett
stc sr,rn
stc gbr,rn
stc vbr,rn
stc mod,rn (sh-dsp)
stc re,rn (sh-dsp)
stc rs,rn (sh-dsp)
sts pr,rn

figure 7.83 system control alu instruction pipeline

the pipeline has three stages: if, id, and ex (figure 7.83). the data operation is completed in the ex stage via the alu.

ldc.l instructions (common, or sh-dsp): include the following instruction types:

ldc.l @rm+, sr
ldc.l @rm+, gbr
ldc.l @rm+, vbr
ldc.l @rm+, mod (sh-dsp)
ldc.l @rm+, re (sh-dsp)
ldc.l @rm+, rs (sh-dsp)

figure 7.84 ldc.l instruction pipeline

the pipeline has five stages: if, id, ex, ma, and ex (figure 7.84). the id of the following instruction is stalled two slots.
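the id stall counts stated in sections 7.4.3 through 7.4.6 up to this point can be collected into a small lookup to estimate the slot in which each instruction reaches its id stage. this is a minimal sketch, assuming an ideal pipeline with no if/ma contention and no multiplier contention. the category keys and the function name id_slots are hypothetical; only the stall values are taken from the text.

```python
# slots by which the id of the *next* instruction is stalled, as stated in
# the text for each instruction group (keys are illustrative labels only)
NEXT_ID_STALL = {
    "logic_reg": 0,      # and/or/xor/tst/not rm,rn and #imm,r0 forms
    "logic_mem": 2,      # and.b/or.b/tst.b/xor.b #imm,@(r0,gbr)
    "tas": 3,            # tas.b @rn
    "shift": 0,          # rotl, shll, shlr16, ...
    "branch_uncond": 1,  # bra/bsr/jmp/jsr/rts: delay slot id stalls one cycle
    "sysctl_alu": 0,     # clrt, nop, ldc rm,sr, stc sr,rn, ...
    "ldc.l": 2,          # ldc.l @rm+,sr and similar
}


def id_slots(sequence):
    """return the slot number of the id stage for each instruction.

    the first instruction is fetched in slot 0 and decoded in slot 1; each
    later id follows the previous id by one slot plus the stall that the
    previous instruction imposes.
    """
    slots = []
    slot = 1
    for kind in sequence:
        slots.append(slot)
        slot += 1 + NEXT_ID_STALL[kind]
    return slots


if __name__ == "__main__":
    print(id_slots(["logic_mem", "logic_reg", "tas", "ldc.l", "shift"]))
```

for the sequence in the example, the id stages fall in slots 1, 4, 5, 9, and 12, which shows how the two-slot and three-slot stalls accumulate in straight-line code.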
stc.l instructions (common, or sh-dsp): include the following instruction types:

stc.l sr, @-rn
stc.l gbr, @-rn
stc.l vbr, @-rn
stc.l mod, @-rn (sh-dsp)
stc.l re, @-rn (sh-dsp)
stc.l rs, @-rn (sh-dsp)

figure 7.85 stc.l instruction pipeline

the pipeline has four stages: if, id, ex, and ma (figure 7.85). the id of the next instruction is stalled one slot.

lds.l instruction (common): includes the following instruction type:

lds.l @rm+, pr

figure 7.86 lds.l instructions (pr) pipeline

the pipeline has five stages: if, id, ex, ma, and wb (figure 7.86). it is the same as an ordinary load instruction.

sts.l instruction (common): includes the following instruction type:

sts.l pr, @-rn

figure 7.87 sts.l instruction (pr) pipeline

the pipeline has four stages: if, id, ex, and ma (figure 7.87). it is the same as an ordinary store instruction.

register → mac transfer instructions (common, or sh-dsp): include the following instruction types:

clrmac
lds rm, mach
lds rm, macl
lds rm,dsr (sh-dsp)
lds rm,a0 (sh-dsp)
lds rm,x0 (sh-dsp)
lds rm,x1 (sh-dsp)
lds rm,y0 (sh-dsp)
lds rm,y1 (sh-dsp)

figure 7.88 register → mac transfer instruction pipeline

the pipeline has four stages: if, id, ex, and ma (figure 7.88). ma is a stage for accessing the multiplier. ma contends with if. this makes it the same as ordinary store instructions.
since the multiplier does contend with the ma, however, the items noted for the multiplication, multiply/accumulate, double-length multiplication, and double-length multiply/accumulate instructions apply.

memory → mac transfer instructions (common, or sh-dsp): include the following instruction types:

lds.l @rm+, mach
lds.l @rm+, macl
lds.l @rm+,dsr (sh-dsp)
lds.l @rm+,a0 (sh-dsp)
lds.l @rm+,x0 (sh-dsp)
lds.l @rm+,x1 (sh-dsp)
lds.l @rm+,y0 (sh-dsp)
lds.l @rm+,y1 (sh-dsp)

figure 7.89 memory → mac transfer instruction pipeline

the pipeline has four stages: if, id, ex, and ma (figure 7.89). ma contends with if. ma is a stage for memory access and multiplier access. this makes it the same as ordinary load instructions. since the multiplier does contend with the ma, however, the items noted for the multiplication, multiply/accumulate, double-length multiplication, and double-length multiply/accumulate instructions apply.

mac → register transfer instructions (common, or sh-dsp): include the following instruction types:

sts mach, rn
sts macl, rn
sts dsr,rn
sts a0,rn
sts x0,rn
sts x1,rn
sts y0,rn
sts y1,rn

figure 7.90 mac → register transfer instruction pipeline

the pipeline has five stages: if, id, ex, ma, and wb (figure 7.90). ma is a stage for accessing the multiplier. ma contends with if. this makes it the same as ordinary load instructions. since the multiplier does contend with the ma, however, the items noted for the multiplication, multiply/accumulate, double-length multiplication, and double-length multiply/accumulate instructions apply.

mac →
memory transfer instructions (common, or sh-dsp): include the following instruction types:

sts.l mach, @-rn
sts.l macl, @-rn
sts.l dsr,@-rn (sh-dsp)
sts.l a0,@-rn (sh-dsp)
sts.l x0,@-rn (sh-dsp)
sts.l x1,@-rn (sh-dsp)
sts.l y0,@-rn (sh-dsp)
sts.l y1,@-rn (sh-dsp)

figure 7.91 mac → memory transfer instruction pipeline

the pipeline has four stages: if, id, ex, and ma (figure 7.91). ma is a stage for accessing the memory and multiplier. ma contends with if. this makes it the same as ordinary store instructions. since the multiplier does contend with the ma, however, the items noted for the multiplication, multiply/accumulate, double-length multiplication, and double-length multiply/accumulate instructions apply.

rte instruction (common):

rte

figure 7.92 rte instruction pipeline

the pipeline has five stages: if, id, ex, ma, and ma (figure 7.92). the mas do not contend with if. rte is a delayed branch instruction. the id of the delay slot instruction is stalled 3 slots. the if of the branch destination instruction starts from the slot following the ma of the rte.

trap instruction (common):

trapa #imm

figure 7.93 trap instruction pipeline

the pipeline has nine stages: if, id, ex, ex, ma, ma, ma, ex, and ex (figure 7.93). the mas do not contend with if. trap is not a delayed branch instruction. the two instructions after the trap instruction are fetched, but they are discarded without being executed. the if of the branch destination instruction starts from the slot of the ex in the ninth stage of the trap instruction.

sleep instruction (common):

sleep
figure 7.94 sleep instruction pipeline the pipeline has three stages: if, id and ex (figure 7.94). it is issued until the if of the next instruction. after the sleep instruction is executed, the cpu enters sleep mode or standby mode. 473 7.4.7 exception processing interrupt exception processing (common): the interrupt is received during the id stage of the instruction and everything after the id stage is replaced by the interrupt exception processing sequence. the pipeline has ten stages: if, id, ex, ex, ma, ma, ex, ma, ex, and ex (figure 7.95). interrupt exception processing is not a delayed branch. in interrupt exception processing, an overrun fetch (if) occurs. in branch destination instructions, the if starts from the slot that has the final ex in the interrupt exception processing. interrupt sources are external interrupt request pins such as nmi, user breaks, irq, and on-chip peripheral module interrupts. ex next instruction branch destination if ex interrupt id ex ma ma ex ma ex ex if id if id : slot if ...... figure 7.95 interrupt exception processing pipeline address error exception processing: the address error is received during the id stage of the instruction and everything after the id stage is replaced by the address error exception processing sequence. the pipeline has ten stages: if, id, ex, ex, ma, ma, ex, ma, ex, and ex (figure 7.96). address error exception processing is not a delayed branch. in address error exception processing, an overrun fetch (if) occurs. in branch destination instructions, the if starts from the slot that has the final ex in the address error exception processing. address errors are caused by instruction fetches and by data reads or writes. see the hardware manual for information on the causes of address errors. ex next instruction branch destination if ex interrupt id ex ma ma ex ma ex ex if id if id : slot if ...... 
Figure 7.96 Address Error Exception Processing Pipeline

Illegal Instruction Exception Processing (Common): The illegal instruction is received during the ID stage of the instruction and everything after the ID stage is replaced by the illegal instruction exception processing sequence. The pipeline has nine stages: IF, ID, EX, EX, MA, MA, MA, EX, and EX (figure 7.97). Illegal instruction exception processing is not a delayed branch. In illegal instruction exception processing, overrun fetches (IF) occur. Whether there is an IF only in the next instruction or in the one after that as well depends on the instruction that was to be executed. In branch destination instructions, the IF starts from the slot that has the final EX in the illegal instruction exception processing.

Illegal instruction exception processing is caused by ordinary illegal instructions and by instructions with illegal slots. When undefined code placed somewhere other than the slot directly after the delayed branch instruction (called the delay slot) is decoded, ordinary illegal instruction exception processing occurs. When undefined code placed in the delay slot is decoded, or when an instruction placed in the delay slot to rewrite the program counter is decoded, illegal slot instruction exception processing occurs.

Figure 7.97 Illegal Instruction Exception Processing Pipeline (rows: interrupt, next instruction, branch destination)

Appendix A CPU Instructions

A.1 CPU Instructions

Instructions executed by the CPU core are described in alphabetical order.

Table A.1 CPU Instructions in Alphabetical Order

Instruction | Operation | Code | Cycles | T Bit
ADD #imm,Rn | Rn + imm → Rn | 0111nnnniiiiiiii | 1 | -
ADD Rm,Rn | Rn + Rm → Rn | 0011nnnnmmmm1100 | 1 | -
ADDC Rm,Rn | Rn + Rm + T → Rn, Carry → T | 0011nnnnmmmm1110 | 1 | Carry
ADDV Rm,Rn | Rn + Rm → Rn, Overflow → T | 0011nnnnmmmm1111 | 1 | Overflow
AND #imm,R0 | R0 & imm → R0 | 11001001iiiiiiii | 1 | -
AND Rm,Rn | Rn & Rm → Rn | 0010nnnnmmmm1001 | 1 | -
AND.B #imm,@(R0,GBR) | (R0 + GBR) & imm → (R0 + GBR) | 11001101iiiiiiii | 3 | -
BF label | If T = 0, disp × 2 + PC → PC; if T = 1, nop | 10001011dddddddd | 3/1*1 | -
BF/S label | If T = 0, disp × 2 + PC → PC; if T = 1, nop | 10001111dddddddd | 2/1*1 | -
BRA label | Delayed branch, disp × 2 + PC → PC | 1010dddddddddddd | 2 | -
BRAF Rm | Delayed branch, Rm + PC → PC | 0000mmmm00100011 | 2 | -
BSR label | Delayed branch, PC → PR, disp × 2 + PC → PC | 1011dddddddddddd | 2 | -
BSRF Rm | Delayed branch, PC → PR, Rm + PC → PC | 0000mmmm00000011 | 2 | -
BT label | If T = 1, disp × 2 + PC → PC; if T = 0, nop | 10001001dddddddd | 3/1*1 | -
BT/S label | If T = 1, disp × 2 + PC → PC; if T = 0, nop | 10001101dddddddd | 2/1*1 | -
CLRMAC | 0 → MACH, MACL | 0000000000101000 | 1 | -
CLRT | 0 → T | 0000000000001000 | 1 | 0
CMP/EQ #imm,R0 | If R0 = imm, 1 → T | 10001000iiiiiiii | 1 | Comparison result
CMP/EQ Rm,Rn | If Rn = Rm, 1 → T | 0011nnnnmmmm0000 | 1 | Comparison result
CMP/GE Rm,Rn | If Rn ≥ Rm with signed data, 1 → T | 0011nnnnmmmm0011 | 1 | Comparison result
CMP/GT Rm,Rn | If Rn > Rm with signed data, 1 → T | 0011nnnnmmmm0111 | 1 | Comparison result
CMP/HI Rm,Rn | If Rn > Rm with unsigned data, 1 → T | 0011nnnnmmmm0110 | 1 | Comparison result
CMP/HS Rm,Rn | If Rn ≥ Rm with unsigned data, 1 → T | 0011nnnnmmmm0010 | 1 | Comparison result
CMP/PL Rn | If Rn > 0, 1 → T | 0100nnnn00010101 | 1 | Comparison result
CMP/PZ Rn | If Rn ≥ 0, 1 → T | 0100nnnn00010001 | 1 | Comparison result
CMP/STR Rm,Rn | If Rn and Rm have an equivalent byte, 1 → T | 0010nnnnmmmm1100 | 1 | Comparison result
DIV0S Rm,Rn | MSB of Rn → Q, MSB of Rm → M, M ^ Q → T | 0010nnnnmmmm0111 | 1 | Calculation result
DIV0U | 0 → M/Q/T | 0000000000011001 | 1 | 0
DIV1 Rm,Rn | Single-step division (Rn/Rm) | 0011nnnnmmmm0100 | 1 | Calculation result
DMULS.L Rm,Rn | Signed operation of Rn × Rm → MACH, MACL | 0011nnnnmmmm1101 | 2 to 4*2 | -
DMULU.L Rm,Rn | Unsigned operation of Rn × Rm → MACH, MACL | 0011nnnnmmmm0101 | 2 to 4*2 | -
DT Rn | Rn - 1 → Rn; when Rn is 0, 1 → T; when Rn is nonzero, 0 → T | 0100nnnn00010000 | 1 | Comparison result
EXTS.B Rm,Rn | A byte in Rm is sign-extended → Rn | 0110nnnnmmmm1110 | 1 | -
EXTS.W Rm,Rn | A word in Rm is sign-extended → Rn | 0110nnnnmmmm1111 | 1 | -
EXTU.B Rm,Rn | A byte in Rm is zero-extended → Rn | 0110nnnnmmmm1100 | 1 | -
EXTU.W Rm,Rn | A word in Rm is zero-extended → Rn | 0110nnnnmmmm1101 | 1 | -
JMP @Rm | Delayed branch, Rm → PC | 0100mmmm00101011 | 2 | -
JSR @Rm | Delayed branch, PC → PR, Rm → PC | 0100mmmm00001011 | 2 | -
LDC Rm,GBR | Rm → GBR | 0100mmmm00011110 | 1 | -
LDC Rm,MOD | Rm → MOD | 0100mmmm01011110 | 1 | -
LDC Rm,RE | Rm → RE | 0100mmmm01111110 | 1 | -
LDC Rm,RS | Rm → RS | 0100mmmm01101110 | 1 | -
LDC Rm,SR | Rm → SR | 0100mmmm00001110 | 1 | LSB
LDC Rm,VBR | Rm → VBR | 0100mmmm00101110 | 1 | -
LDC.L @Rm+,GBR | (Rm) → GBR, Rm + 4 → Rm | 0100mmmm00010111 | 3 | -
LDC.L @Rm+,MOD | (Rm) → MOD, Rm + 4 → Rm | 0100mmmm01010111 | 3 | -
LDC.L @Rm+,RE | (Rm) → RE, Rm + 4 → Rm | 0100mmmm01110111 | 3 | -
LDC.L @Rm+,RS | (Rm) → RS, Rm + 4 → Rm | 0100mmmm01100111 | 3 | -
LDC.L @Rm+,SR | (Rm) → SR, Rm + 4 → Rm | 0100mmmm00000111 | 3 | LSB
LDC.L @Rm+,VBR | (Rm) → VBR, Rm + 4 → Rm | 0100mmmm00100111 | 3 | -
LDRE @(disp,PC) | disp × 2 + PC → RE | 10001110dddddddd | 1 | -
LDRS @(disp,PC) | disp × 2 + PC → RS | 10001100dddddddd | 1 | -
LDS Rm,A0 | Rm → A0 | 0100mmmm01111010 | 1 | -
LDS Rm,DSR | Rm → DSR | 0100mmmm01101010 | 1 | -
LDS Rm,MACH | Rm → MACH | 0100mmmm00001010 | 1 | -
LDS Rm,MACL | Rm → MACL | 0100mmmm00011010 | 1 | -
LDS Rm,PR | Rm → PR | 0100mmmm00101010 | 1 | -
LDS Rm,X0 | Rm → X0 | 0100mmmm10001010 | 1 | -
LDS Rm,X1 | Rm → X1 | 0100mmmm10011010 | 1 | -
LDS Rm,Y0 | Rm → Y0 | 0100mmmm10101010 | 1 | -
LDS Rm,Y1 | Rm → Y1 | 0100mmmm10111010 | 1 | -
LDS.L @Rm+,A0 | (Rm) → A0, Rm + 4 → Rm | 0100mmmm01110110 | 1 | -
LDS.L @Rm+,DSR | (Rm) → DSR, Rm + 4 → Rm | 0100mmmm01100110 | 1 | -
LDS.L @Rm+,MACH | (Rm) → MACH, Rm + 4 → Rm | 0100mmmm00000110 | 1 | -
LDS.L @Rm+,MACL | (Rm) → MACL, Rm + 4 → Rm | 0100mmmm00010110 | 1 | -
LDS.L @Rm+,PR | (Rm) → PR, Rm + 4 → Rm | 0100mmmm00100110 | 1 | -
LDS.L @Rm+,X0 | (Rm) → X0, Rm + 4 → Rm | 0100mmmm10000110 | 1 | -
LDS.L @Rm+,X1 | (Rm) → X1, Rm + 4 → Rm | 0100mmmm10010110 | 1 | -
LDS.L @Rm+,Y0 | (Rm) → Y0, Rm + 4 → Rm | 0100mmmm10100110 | 1 | -
LDS.L @Rm+,Y1 | (Rm) → Y1, Rm + 4 → Rm | 0100mmmm10110110 | 1 | -
MAC.L @Rm+,@Rn+ | Signed operation of (Rn) × (Rm) + MAC → MAC | 0000nnnnmmmm1111 | 3/(2 to 4)*2 | -
MAC.W @Rm+,@Rn+ | Signed operation of (Rn) × (Rm) + MAC → MAC | 0100nnnnmmmm1111 | 3/(2)*2 | -
MOV #imm,Rn | #imm → Sign extension → Rn | 1110nnnniiiiiiii | 1 | -
MOV Rm,Rn | Rm → Rn | 0110nnnnmmmm0011 | 1 | -
MOV.B @(disp,GBR),R0 | (disp + GBR) → Sign extension → R0 | 11000100dddddddd | 1 | -
MOV.B @(disp,Rm),R0 | (disp + Rm) → Sign extension → R0 | 10000100mmmmdddd | 1 | -
MOV.B @(R0,Rm),Rn | (R0 + Rm) → Sign extension → Rn | 0000nnnnmmmm1100 | 1 | -
MOV.B @Rm+,Rn | (Rm) → Sign extension → Rn, Rm + 1 → Rm | 0110nnnnmmmm0100 | 1 | -
MOV.B @Rm,Rn | (Rm) → Sign extension → Rn | 0110nnnnmmmm0000 | 1 | -
MOV.B R0,@(disp,GBR) | R0 → (disp + GBR) | 11000000dddddddd | 1 | -
MOV.B R0,@(disp,Rn) | R0 → (disp + Rn) | 10000000nnnndddd | 1 | -
MOV.B Rm,@(R0,Rn) | Rm → (R0 + Rn) | 0000nnnnmmmm0100 | 1 | -
MOV.B Rm,@-Rn | Rn - 1 → Rn, Rm → (Rn) | 0010nnnnmmmm0100 | 1 | -
MOV.B Rm,@Rn | Rm → (Rn) | 0010nnnnmmmm0000 | 1 | -
MOV.L @(disp,GBR),R0 | (disp × 4 + GBR) → R0 | 11000110dddddddd | 1 | -
MOV.L @(disp,PC),Rn | (disp × 4 + PC) → Rn | 1101nnnndddddddd | 1 | -
MOV.L @(disp,Rm),Rn | (disp × 4 + Rm) → Rn | 0101nnnnmmmmdddd | 1 | -
MOV.L @(R0,Rm),Rn | (R0 + Rm) → Rn | 0000nnnnmmmm1110 | 1 | -
MOV.L @Rm+,Rn | (Rm) → Rn, Rm + 4 → Rm | 0110nnnnmmmm0110 | 1 | -
MOV.L @Rm,Rn | (Rm) → Rn | 0110nnnnmmmm0010 | 1 | -
MOV.L R0,@(disp,GBR) | R0 → (disp × 4 + GBR) | 11000010dddddddd | 1 | -
MOV.L Rm,@(disp,Rn) | Rm → (disp × 4 + Rn) | 0001nnnnmmmmdddd | 1 | -
MOV.L Rm,@(R0,Rn) | Rm → (R0 + Rn) | 0000nnnnmmmm0110 | 1 | -
MOV.L Rm,@-Rn | Rn - 4 → Rn, Rm → (Rn) | 0010nnnnmmmm0110 | 1 | -
MOV.L Rm,@Rn | Rm → (Rn) | 0010nnnnmmmm0010 | 1 | -
MOV.W @(disp,GBR),R0 | (disp × 2 + GBR) → Sign extension → R0 | 11000101dddddddd | 1 | -
MOV.W @(disp,PC),Rn | (disp × 2 + PC) → Sign extension → Rn | 1001nnnndddddddd | 1 | -
MOV.W @(disp,Rm),R0 | (disp × 2 + Rm) → Sign extension → R0 | 10000101mmmmdddd | 1 | -
MOV.W @(R0,Rm),Rn | (R0 + Rm) → Sign extension → Rn | 0000nnnnmmmm1101 | 1 | -
MOV.W @Rm+,Rn | (Rm) → Sign extension → Rn, Rm + 2 → Rm | 0110nnnnmmmm0101 | 1 | -
MOV.W @Rm,Rn | (Rm) → Sign extension → Rn | 0110nnnnmmmm0001 | 1 | -
MOV.W R0,@(disp,GBR) | R0 → (disp × 2 + GBR) | 11000001dddddddd | 1 | -
MOV.W R0,@(disp,Rn) | R0 → (disp × 2 + Rn) | 10000001nnnndddd | 1 | -
MOV.W Rm,@(R0,Rn) | Rm → (R0 + Rn) | 0000nnnnmmmm0101 | 1 | -
MOV.W Rm,@-Rn | Rn - 2 → Rn, Rm → (Rn) | 0010nnnnmmmm0101 | 1 | -
MOV.W Rm,@Rn | Rm → (Rn) | 0010nnnnmmmm0001 | 1 | -
MOVA @(disp,PC),R0 | disp × 4 + PC → R0 | 11000111dddddddd | 1 | -
MOVT Rn | T → Rn | 0000nnnn00101001 | 1 | -
MUL.L Rm,Rn | Rn × Rm → MACL | 0000nnnnmmmm0111 | 2 to 4*2 | -
MULS.W Rm,Rn | Signed operation of Rn × Rm → MAC | 0010nnnnmmmm1111 | 1 to 3*2 | -
MULU.W Rm,Rn | Unsigned operation of Rn × Rm → MAC | 0010nnnnmmmm1110 | 1 to 3*2 | -
NEG Rm,Rn | 0 - Rm → Rn | 0110nnnnmmmm1011 | 1 | -
NEGC Rm,Rn | 0 - Rm - T → Rn, Borrow → T | 0110nnnnmmmm1010 | 1 | Borrow
NOP | No operation | 0000000000001001 | 1 | -
NOT Rm,Rn | ~Rm → Rn | 0110nnnnmmmm0111 | 1 | -
OR #imm,R0 | R0 | imm → R0 | 11001011iiiiiiii | 1 | -
OR Rm,Rn | Rn | Rm → Rn | 0010nnnnmmmm1011 | 1 | -
OR.B #imm,@(R0,GBR) | (R0 + GBR) | imm → (R0 + GBR) | 11001111iiiiiiii | 3 | -
ROTCL Rn | T ← Rn ← T | 0100nnnn00100100 | 1 | MSB
ROTCR Rn | T → Rn → T | 0100nnnn00100101 | 1 | LSB
ROTL Rn | T ← Rn ← MSB | 0100nnnn00000100 | 1 | MSB
ROTR Rn | LSB → Rn → T | 0100nnnn00000101 | 1 | LSB
RTE | Delayed branch, stack area → PC/SR | 0000000000101011 | 4 | LSB
RTS | Delayed branch, PR → PC | 0000000000001011 | 2 | -
SETRC #imm | imm → RC (SR[23:16]), 0 → SR[27:24] | 10000010iiiiiiii | 1 | -
SETRC Rm | Rm[11:0] → RC (SR[27:16]) | 0100mmmm00010100 | 1 | -
SETT | 1 → T | 0000000000011000 | 1 | 1
SHAL Rn | T ← Rn ← 0 | 0100nnnn00100000 | 1 | MSB
SHAR Rn | MSB → Rn → T | 0100nnnn00100001 | 1 | LSB
SHLL Rn | T ← Rn ← 0 | 0100nnnn00000000 | 1 | MSB
SHLL2 Rn | Rn << 2 → Rn | 0100nnnn00001000 | 1 | -
SHLL8 Rn | Rn << 8 → Rn | 0100nnnn00011000 | 1 | -
SHLL16 Rn | Rn << 16 → Rn | 0100nnnn00101000 | 1 | -
SHLR Rn | 0 → Rn → T | 0100nnnn00000001 | 1 | LSB
SHLR2 Rn | Rn >> 2 → Rn | 0100nnnn00001001 | 1 | -
SHLR8 Rn | Rn >> 8 → Rn | 0100nnnn00011001 | 1 | -
SHLR16 Rn | Rn >> 16 → Rn | 0100nnnn00101001 | 1 | -
SLEEP | Sleep | 0000000000011011 | 3 | -
STC GBR,Rn | GBR → Rn | 0000nnnn00010010 | 1 | -
STC MOD,Rn | MOD → Rn | 0000nnnn01010010 | 1 | -
STC RE,Rn | RE → Rn | 0000nnnn01110010 | 1 | -
STC RS,Rn | RS → Rn | 0000nnnn01100010 | 1 | -
STC SR,Rn | SR → Rn | 0000nnnn00000010 | 1 | -
STC VBR,Rn | VBR → Rn | 0000nnnn00100010 | 1 | -
STC.L GBR,@-Rn | Rn - 4 → Rn, GBR → (Rn) | 0100nnnn00010011 | 2 | -
STC.L MOD,@-Rn | Rn - 4 → Rn, MOD → (Rn) | 0100nnnn01010011 | 2 | -
STC.L RE,@-Rn | Rn - 4 → Rn, RE → (Rn) | 0100nnnn01110011 | 2 | -
STC.L RS,@-Rn | Rn - 4 → Rn, RS → (Rn) | 0100nnnn01100011 | 2 | -
STC.L SR,@-Rn | Rn - 4 → Rn, SR → (Rn) | 0100nnnn00000011 | 2 | -
STC.L VBR,@-Rn | Rn - 4 → Rn, VBR → (Rn) | 0100nnnn00100011 | 2 | -
STS A0,Rn | A0 → Rn | 0000nnnn01111010 | 1 | -
STS DSR,Rn | DSR → Rn | 0000nnnn01101010 | 1 | -
STS MACH,Rn | MACH → Rn | 0000nnnn00001010 | 1 | -
STS MACL,Rn | MACL → Rn | 0000nnnn00011010 | 1 | -
STS PR,Rn | PR → Rn | 0000nnnn00101010 | 1 | -
STS X0,Rn | X0 → Rn | 0000nnnn10001010 | 1 | -
STS X1,Rn | X1 → Rn | 0000nnnn10011010 | 1 | -
STS Y0,Rn | Y0 → Rn | 0000nnnn10101010 | 1 | -
STS Y1,Rn | Y1 → Rn | 0000nnnn10111010 | 1 | -
STS.L A0,@-Rn | Rn - 4 → Rn, A0 → (Rn) | 0100nnnn01110010 | 1 | -
STS.L DSR,@-Rn | Rn - 4 → Rn, DSR → (Rn) | 0100nnnn01100010 | 1 | -
STS.L MACH,@-Rn | Rn - 4 → Rn, MACH → (Rn) | 0100nnnn00000010 | 1 | -
STS.L MACL,@-Rn | Rn - 4 → Rn, MACL → (Rn) | 0100nnnn00010010 | 1 | -
STS.L PR,@-Rn | Rn - 4 → Rn, PR → (Rn) | 0100nnnn00100010 | 1 | -
STS.L X0,@-Rn | Rn - 4 → Rn, X0 → (Rn) | 0100nnnn10000010 | 1 | -
STS.L X1,@-Rn | Rn - 4 → Rn, X1 → (Rn) | 0100nnnn10010010 | 1 | -
STS.L Y0,@-Rn | Rn - 4 → Rn, Y0 → (Rn) | 0100nnnn10100010 | 1 | -
STS.L Y1,@-Rn | Rn - 4 → Rn, Y1 → (Rn) | 0100nnnn10110010 | 1 | -
SUB Rm,Rn | Rn - Rm → Rn | 0011nnnnmmmm1000 | 1 | -
SUBC Rm,Rn | Rn - Rm - T → Rn, Borrow → T | 0011nnnnmmmm1010 | 1 | Borrow
SUBV Rm,Rn | Rn - Rm → Rn, Underflow → T | 0011nnnnmmmm1011 | 1 | Underflow
SWAP.B Rm,Rn | Rm → Swap the two lowest-order bytes → Rn | 0110nnnnmmmm1000 | 1 | -
SWAP.W Rm,Rn | Rm → Swap two consecutive words → Rn | 0110nnnnmmmm1001 | 1 | -
TAS.B @Rn | If (Rn) is 0, 1 → T; 1 → MSB of (Rn) | 0100nnnn00011011 | 4 | Test result
TRAPA #imm | PC/SR → stack area, (imm × 4 + VBR) → PC | 11000011iiiiiiii | 8 | -
TST #imm,R0 | R0 & imm; if the result is 0, 1 → T | 11001000iiiiiiii | 1 | Test result
TST Rm,Rn | Rn & Rm; if the result is 0, 1 → T | 0010nnnnmmmm1000 | 1 | Test result
TST.B #imm,@(R0,GBR) | (R0 + GBR) & imm; if the result is 0, 1 → T | 11001100iiiiiiii | 3 | Test result
XOR #imm,R0 | R0 ^ imm → R0 | 11001010iiiiiiii | 1 | -
XOR Rm,Rn | Rn ^ Rm → Rn | 0010nnnnmmmm1010 | 1 | -
XOR.B #imm,@(R0,GBR) | (R0 + GBR) ^ imm → (R0 + GBR) | 11001110iiiiiiii | 3 | -
XTRCT Rm,Rn | Middle 32 bits of Rm:Rn → Rn | 0010nnnnmmmm1101 | 1 | -

Notes: 1. The normal minimum number of execution cycles. The number in parentheses is the number of cycles when there is contention with following instructions.
       2. One state when it does not branch.

Added CPU Instructions: Table A.2 shows the CPU instructions in the SH-DSP added since the SH-2 (3 types, 24 instructions). Table A.3 shows the CPU instructions in the SH-2 added since the SH-1 (6 types, 9 instructions).

Table A.2 CPU Instructions in the SH-DSP Added Since the SH-2

Instruction | Operation | Code | Cycles | T Bit
LDC Rm,MOD | Rm → MOD | 0100mmmm01011110 | 1 | -
LDC Rm,RE | Rm → RE | 0100mmmm01111110 | 1 | -
LDC Rm,RS | Rm → RS | 0100mmmm01101110 | 1 | -
LDC.L @Rm+,MOD | (Rm) → MOD, Rm + 4 → Rm | 0100mmmm01010111 | 3 | -
LDC.L @Rm+,RE | (Rm) → RE, Rm + 4 → Rm | 0100mmmm01110111 | 3 | -
LDC.L @Rm+,RS | (Rm) → RS, Rm + 4 → Rm | 0100mmmm01100111 | 3 | -
LDRE @(disp,PC) | disp × 2 + PC → RE | 10001110dddddddd | 1 | -
LDRS @(disp,PC) | disp × 2 + PC → RS | 10001100dddddddd | 1 | -
LDS Rm,DSR | Rm → DSR | 0100mmmm01101010 | 1 | -
LDS Rm,A0 | Rm → A0 | 0100mmmm01111010 | 1 | -
LDS Rm,X0 | Rm → X0 | 0100mmmm10001010 | 1 | -
LDS Rm,X1 | Rm → X1 | 0100mmmm10011010 | 1 | -
LDS Rm,Y0 | Rm → Y0 | 0100mmmm10101010 | 1 | -
LDS Rm,Y1 | Rm → Y1 | 0100mmmm10111010 | 1 | -
LDS.L @Rm+,DSR | (Rm) → DSR, Rm + 4 → Rm | 0100mmmm01100110 | 1 | -
LDS.L @Rm+,A0 | (Rm) → A0, Rm + 4 → Rm | 0100mmmm01110110 | 1 | -
LDS.L @Rm+,X0 | (Rm) → X0, Rm + 4 → Rm | 0100mmmm10000110 | 1 | -
LDS.L @Rm+,X1 | (Rm) → X1, Rm + 4 → Rm | 0100mmmm10010110 | 1 | -
LDS.L @Rm+,Y0 | (Rm) → Y0, Rm + 4 → Rm | 0100mmmm10100110 | 1 | -
LDS.L @Rm+,Y1 | (Rm) → Y1, Rm + 4 → Rm | 0100mmmm10110110 | 1 | -
SETRC Rm | Rm[11:0] → RC (SR[27:16]) | 0100mmmm00010100 | 1 | -
SETRC #imm | imm → RC (SR[23:16]), zeros → SR[27:24] | 10000010iiiiiiii | 1 | -
STC MOD,Rn | MOD → Rn | 0000nnnn01010010 | 1 | -
STC RE,Rn | RE → Rn | 0000nnnn01110010 | 1 | -
STC RS,Rn | RS → Rn | 0000nnnn01100010 | 1 | -
STC.L MOD,@-Rn | Rn - 4 → Rn, MOD → (Rn) | 0100nnnn01010011 | 2 | -
STC.L RE,@-Rn | Rn - 4 → Rn, RE → (Rn) | 0100nnnn01110011 | 2 | -
STC.L RS,@-Rn | Rn - 4 → Rn, RS → (Rn) | 0100nnnn01100011 | 2 | -
STS DSR,Rn | DSR → Rn | 0000nnnn01101010 | 1 | -
STS A0,Rn | A0 → Rn | 0000nnnn01111010 | 1 | -
STS X0,Rn | X0 → Rn | 0000nnnn10001010 | 1 | -
STS X1,Rn | X1 → Rn | 0000nnnn10011010 | 1 | -
STS Y0,Rn | Y0 → Rn | 0000nnnn10101010 | 1 | -
STS Y1,Rn | Y1 → Rn | 0000nnnn10111010 | 1 | -
STS.L DSR,@-Rn | Rn - 4 → Rn, DSR → (Rn) | 0100nnnn01100010 | 1 | -
STS.L A0,@-Rn | Rn - 4 → Rn, A0 → (Rn) | 0100nnnn01110010 | 1 | -
STS.L X0,@-Rn | Rn - 4 → Rn, X0 → (Rn) | 0100nnnn10000010 | 1 | -
STS.L X1,@-Rn | Rn - 4 → Rn, X1 → (Rn) | 0100nnnn10010010 | 1 | -
STS.L Y0,@-Rn | Rn - 4 → Rn, Y0 → (Rn) | 0100nnnn10100010 | 1 | -
STS.L Y1,@-Rn | Rn - 4 → Rn, Y1 → (Rn) | 0100nnnn10110010 | 1 | -

Table A.3 CPU Instructions in the SH-2 Added Since the SH-1

Instruction | Operation | Code | Cycles | T Bit
BF/S label | When T = 0, disp × 2 + PC → PC; when T = 1, nop | 10001111dddddddd | 2/1 | -
BRAF Rm | Delayed branch, Rm + PC → PC | 0000mmmm00100011 | 2 | -
BSRF Rm | Delayed branch, PC → PR, Rm + PC → PC | 0000mmmm00000011 | 2 | -
BT/S label | When T = 1, disp × 2 + PC → PC; when T = 0, nop | 10001101dddddddd | 2/1 | -
DMULS.L Rm,Rn | Signed Rn × Rm → MACH, MACL; 32 × 32 → 64 bits | 0011nnnnmmmm1101 | 2 (to 4) | -
DMULU.L Rm,Rn | Unsigned Rn × Rm → MACH, MACL; 32 × 32 → 64 bits | 0011nnnnmmmm0101 | 2 (to 4) | -
DT Rn | Rn - 1 → Rn; when Rn is 0, 1 → T; when Rn is nonzero, 0 → T | 0100nnnn00010000 | 1 | Comparison result
MAC.L @Rm+,@Rn+ | Signed (Rn) × (Rm) + MAC → MAC | 0000nnnnmmmm1111 | 2 (to 4) | -
MUL.L Rm,Rn | Rn × Rm → MACL | 0000nnnnmmmm0111 | 2 (to 4) | -

SH-1/SH-2/SH-DSP Programming Manual
Publication Date: 1st Edition, September 1994; 4th Edition, March 1999
Published by: Electronic Devices Sales & Marketing Group, Semiconductor & Integrated Circuits Group, Hitachi, Ltd.
Edited by: Technical Documentation Group, UL Media Co., Ltd.
Copyright © Hitachi, Ltd., 1994. All rights reserved. Printed in Japan.
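
The code column of tables A.1 to A.3 gives each 16-bit operation code as a string of fixed bits (0 and 1) and placeholder letters (n and m for register numbers, i for immediate data, d for displacement). As a rough illustration of how such a pattern can be read, the following Python sketch matches an opcode against a few patterns copied from table A.1. The function names match_pattern and decode and the PATTERNS dictionary are hypothetical helpers invented for this example, not part of any Hitachi tool, and the fields are returned as raw unsigned bits (no sign extension or scaling of imm/disp is applied).

```python
def match_pattern(pattern, opcode):
    """Match a 16-bit opcode against a table-style pattern.

    pattern is a 16-character string such as '0011nnnnmmmm1100'.
    Returns a dict mapping each placeholder letter to its raw
    (unsigned) field value, or None if a fixed bit does not match.
    """
    if len(pattern) != 16:
        raise ValueError("pattern must have exactly 16 characters")
    fields = {}
    for pos, ch in enumerate(pattern):
        bit = (opcode >> (15 - pos)) & 1  # pattern is written MSB first
        if ch in "01":
            if bit != int(ch):
                return None
        else:
            # Accumulate placeholder bits, most significant bit first.
            fields[ch] = (fields.get(ch, 0) << 1) | bit
    return fields


# A small, illustrative subset of table A.1 (assumed complete enough
# only for this example; a real decoder would need every row).
PATTERNS = {
    "ADD Rm,Rn": "0011nnnnmmmm1100",
    "ADD #imm,Rn": "0111nnnniiiiiiii",
    "BRA label": "1010dddddddddddd",
    "NOP": "0000000000001001",
}


def decode(opcode):
    """Return (instruction, fields) for the first matching pattern, else None."""
    for name, pattern in PATTERNS.items():
        fields = match_pattern(pattern, opcode)
        if fields is not None:
            return name, fields
    return None


if __name__ == "__main__":
    print(decode(0x312C))  # ADD Rm,Rn with n = 1, m = 2
    print(decode(0x0009))  # NOP, no fields
```

Because every pattern fixes a different set of bits, the order of the entries does not matter for this subset; with the full table the same approach applies as long as no two patterns can match the same opcode.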